Twitter Archive to Markdown

I have been wanting to convert my Twitter Export archive into a more useful format for a while. I had vague dreams of maybe turning them into some sort of digest posts on WordPress but there are a fuckton of tweets (25,000) and it would require a LOT of scrubbing. I could probably manage to find and remove Retweets pretty easily, but then there is the issue of media and getting the images into the digest posts and it’s just not worth the hassle.

What I can do, and did, is preserve the data in a better, more digestible, and searchable format. Specifically, Markdown. Well, ok, the files are not doing anything fancy, so it’s just plaintext, pretending to be Markdown.

I have no idea if Twitter still offers an export dump of your data, I have not visited the site at all in over a year. I left, I deleted everything, I blocked it on my DNS. But assuming they do, or you have one, it’s a big zip file that can be unrolled into a sort of local, Twitter-like interface. There are a lot of files in this ball, and while I am keeping the core archive, I mostly just care about the content.

If you dig in, it’s easy to find. There is a folder called data, and the tweets are in a file called “tweets.js”. It’s some sort of JSON style format. If you want the media, it’s in a folder called “Tweets_Media” or something like that. I skimmed through mine, and most of the images looked familiar because I already have them, so I removed the copy because I didn’t need it.

But I kept the Tweets.js file.

So, what to do with it? It has a bunch of extraneous metadata for each Tweet that makes it a cluttered mess. It’s useful for a huge website, but all I want is the date and the text. Here is a sample Tweet in the file.

{
    "tweet" : {
      "edit_info" : {
        "initial" : {
          "editTweetIds" : [
            "508262277464608768"
          ],
          "editableUntil" : "2014-09-06T15:05:44.661Z",
          "editsRemaining" : "5",
          "isEditEligible" : true
        }
      },
      "retweeted" : false,
      "source" : "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
      "entities" : {
        "hashtags" : [ ],
        "symbols" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
      },
      "display_text_range" : [
        "0",
        "57"
      ],
      "favorite_count" : "0",
      "id_str" : "508262277464608768",
      "truncated" : false,
      "retweet_count" : "0",
      "id" : "508262277464608768",
      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",
      "favorited" : false,
      "full_text" : "\"Sorry, you are over the limit for spam reports!\"  Heh...",
      "lang" : "en"
    }
  },

So I wrote a quick and simple Python Script (it’s below). I probably could have done something fancy with Beautiful Soup or Pandas, but instead I did a quick and basic scan that pulls the data I care about. If a line contains “created_at”, pull it out to get the date; if it has “full_text”, pull it out to get the text.
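If the line scan ever feels too brittle, the export can also be read as actual JSON. This is only a sketch and not what the script below does. It assumes the file is a JavaScript assignment followed by a JSON array (the `window.YTD.tweets.part0` prefix in the sample is my assumption about what the file starts with), and `load_tweets` is a made-up helper name.

```python
import json


def load_tweets(raw):
    """Parse the text of a tweets.js file into (created_at, full_text) pairs."""
    # Assumption: everything before the first "[" and after the last "]"
    # is JavaScript wrapping around a plain JSON array.
    payload = raw[raw.index("["): raw.rindex("]") + 1]
    entries = json.loads(payload)
    return [(e["tweet"]["created_at"], e["tweet"]["full_text"]) for e in entries]


# A trimmed-down version of the sample Tweet, with an assumed file prefix.
sample = r'''window.YTD.tweets.part0 = [
  {
    "tweet" : {
      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",
      "full_text" : "\"Sorry, you are over the limit for spam reports!\"  Heh..."
    }
  }
]'''

pairs = load_tweets(sample)
print(pairs[0])
```

The upside is that the JSON parser handles escaped quotes and odd characters in the Tweet text, so there is no slicing off of trailing quotes and commas.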

Once I was able to output these two lines, I went about cleaning them up a bit. I don’t need the titles, so I started by splitting on “:”. This was quickly problematic if the Tweet contained a colon and because the time contained several colons. Instead I did a split on ‘ ” : ” ‘. Specifically, quote, space, colon, space, quote. Only the breaks I wanted had the spaces and quotes, so that got me through step one. The end quotation mark was easy to slice off as well.
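To make the difference concrete, here is a small sketch of the two splits on the date line from the sample Tweet. It assumes every line of interest ends with a quote, a comma, and a newline, which is true for the sample but may not hold for every export.

```python
# The "created_at" line as it appears in the sample, including indentation.
line = '      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",\n'

# Splitting on a bare colon also cuts the time apart.
naive = line.split(":")
print(len(naive), naive)

# Splitting on quote, space, colon, space, quote only matches the
# break between the key and the value.
value = line.split('" : "')[1]

# Slice off the trailing quote, comma, and newline (three characters).
value = value[:-3]
print(value)
```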

I considered simplifying things by using the same transformation on the date and the text, but the date also had this +0000 in it that I wanted to remove. It’s not efficient, but it was just as simple to just have two, very similar operations.

After some massaging, I was able to output something along the lines of “date – text”.

But then I noticed that for some reason the Tweets are apparently not in date order. I had decided that I was just going to create a series of year based archival files, so I needed them to be in order.
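For anyone who wants true date order rather than just year buckets, the dates can be parsed and sorted. This is a sketch under a couple of assumptions: the dates still include the +0000 offset exactly as in the sample, and the system locale uses English day and month names (the `%a` and `%b` codes depend on locale). The names `TWITTER_DATE_FORMAT` and `parse_created_at` are made up for the example.

```python
from datetime import datetime

# Matches strings like "Sat Sep 06 14:35:44 +0000 2014".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Turn a created_at string from the export into a datetime."""
    return datetime.strptime(value, TWITTER_DATE_FORMAT)


# Two made-up entries, deliberately out of order.
tweets = [
    ("Sat Sep 06 14:35:44 +0000 2014", "second"),
    ("Mon Mar 03 09:00:00 +0000 2008", "first"),
]

# Sort oldest first by the parsed date rather than the raw string.
ordered = sorted(tweets, key=lambda tweet: parse_created_at(tweet[0]))
for created_at, text in ordered:
    print(created_at, ":", text)
```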

So I added a few more steps to sort each Tweet during processing into an array of arrays based on the year. Once again, this isn’t the cleanest code. It assumes a range of something like 2004 to 2026, which covers my needs for certain. I also had some “index out of range” errors with my array of arrays, which probably have a clever loopy solution, but instead it’s just a big pre-initialized copy/paste array.
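One possible version of that "clever loopy solution" is a dictionary keyed by year that creates each list on first use, so there is nothing to pre-initialize and no index to run off the end of. This is a sketch, not what my script does, and it assumes the cleaned-up date string ends with the four-digit year, as it does after the +0000 is removed. The sample data is made up.

```python
from collections import defaultdict

# Made-up entries in the same [date, text] shape the script builds.
tweet_data = [
    ["Sat Sep 06 14:35:44 2014", "a tweet from 2014"],
    ["Mon Mar 03 09:00:00 2008", "a tweet from 2008"],
    ["Sun Sep 07 08:00:00 2014", "another from 2014"],
]

# Each year gets an empty list automatically the first time it is seen.
tweets_by_year = defaultdict(list)
for tweet in tweet_data:
    year = int(tweet[0][-4:])  # assumption: the year is the last four characters
    tweets_by_year[year].append(tweet)

for year in sorted(tweets_by_year):
    print(year, len(tweets_by_year[year]))
```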

Part of the motivation of doing the array of arrays was also that I could make the script output my sorted yearly files directly, but I just did it manually from the big ball final result. The job is done, but it could easily be done by adjusting the lower output block a bit.
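As a rough idea of what that adjustment could look like, here is a sketch that writes one file per year. It assumes the Tweets are grouped in a dictionary keyed by year, which is not how my script stores them, and the `write_yearly_files` name and the `tweets-YYYY.md` file naming are made up for the example. The demo writes into a temporary folder so it does not leave files lying around.

```python
import tempfile
from pathlib import Path


def write_yearly_files(tweets_by_year, out_dir="."):
    """Write one tweets-YYYY.md file per year and return the paths written."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for year, tweets in sorted(tweets_by_year.items()):
        target = out_path / f"tweets-{year}.md"
        with open(target, "w", encoding="utf-8") as output:
            for date, text in tweets:
                # Same "date : text" line format as the single-file output.
                output.write(f"{date} : {text}\n")
        written.append(target)
    return written


# Made-up sample data, written to a throwaway directory.
sample = {
    2014: [("Sat Sep 06 14:35:44 2014", "a tweet from 2014")],
    2008: [("Mon Mar 03 09:00:00 2008", "a tweet from 2008")],
}
with tempfile.TemporaryDirectory() as tmp:
    paths = write_yearly_files(sample, tmp)
    names = [p.name for p in paths]
    first_file = paths[0].read_text(encoding="utf-8")
print(names)
```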

Anyway, here is the code, and a link to a Git repository for it.

# A simple script that takes an exported tweets.js file and outputs it to a markdown text file for archiving.
# In pulling data for this, I noticed that older Twitter exports use a csv file instead of a .js file.
# As such, this is for newer exports.
# The Tweets.js file is in the 'data' directory of a standard Twitter archive export file.

# Open the tweets.js file containing all the tweets, should be in the same folder
with open("tweets.js", encoding="utf-8") as file:
    filedata = file.readlines()

tweet_data = []
current_tweet = []
# The Tweets don't seem to be in order, so I needed to sort them out, this is admittedly ugly
# but I only need to cover so many years of sorting and this was the easiest way to avoid index errors
sorted_tweets = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

# Does a simple search through the file.  It pulls out the date posted and the full text.
# This does not do anything with images, sorry, that gets more complicated, it would be doable
for line in filedata:
    if "created_at" in line:
        # Split on '" : "' to keep the value, drop the +0000 offset, then trim the trailing '",' and newline
        posted_at = line.split("\" : \"")[1].replace(" +0000 ", " ")[:-3]
        current_tweet.append(posted_at)
    elif "full_text" in line:
        current_tweet.append(line.split("\" : \"")[1][:-3])
        #        current_tweet.append(line.split(":")[1].split("\"")[1])
        tweet_data.append(current_tweet)
        current_tweet = []
        # Because full text is always after the date, it just moves on after it gets both
    else:
        pass

# An ugly sort, it simply looks for the year in the date, then creates an array of arrays based on year.
# I did it this way partly in case I wanted to output to separate files based on year, but I can copy/paste that
# It probably is still out of order based on date, but whatever, I just want a simple archive file
for each in tweet_data:
    for year in range(2004, 2026):
        if str(year) in each[0]:
            sorted_tweets[year - 2004].append(each)

# Prints the output and dumps it to a file.
with open("output.md", encoding="utf-8", mode="w") as output:
    for eachyear in sorted_tweets:
        for each in reversed(eachyear):
            output.write(each[0] + " : " + each[1] + "\n")
            print(each[0] + " : " + each[1])

NAS Recovery

What a fun time it’s been with my Synology NAS lately. And before I get going here, I want to make it clear, nothing here is a knock against Synology, or WD for that matter. The NAS I have is like ten years old, if it had failed, I was already pricing out a new, up-to-date, Synology. Heck, I may still get one anyway.

But for now, it seems to be working fine again.

As I mentioned, it’s been like ten years or so. I ran on one 4TB WD Red drive for a long time. Eventually, I did add the second drive to make things RAID and redundant. Sometime last year, my original WD drive died on me, I ordered a replacement and swapped it out, and everything was fine.

Sometime, maybe a month ago now, I received an error about a drive failure. The newer drive was already showing bad. I made up an RMA request with Western Digital, wiped the drive, and then sent it in. They sent me a replacement.

A short time before the replacement arrived, I found another error, “Volume has crashed”. It showed that the, at the time, one drive was “Healthy”, and I could still read all of my data. This was starting to feel a bit suspect. I have everything important backed up online to OneDrive, but just in case, I started pulling things off to other storage as a secondary backup. This basically involved eating up all the spare space on my project server (temporarily) and using a USB enclosure and an old 2TB drive that, seems to be failing, but would work well enough for short-term storage. The point was, if I had to rebuild things, I would not have to download mountains of data off OneDrive. USB transfer is much easier and faster.

With everything backed up, I received the replacement for my RMA drive. My hope was, I could attach the replacement drive, and whatever was causing this Volume to show as crashed would clear itself out. Unfortunately, I could not really interact with the volume at all.

After several attempts at various workarounds, I gave up on recovering the Volume. I had the data, that is what matters.

I pulled the crashed drive out, which allowed me to create a new volume using the new drive. I then recreated the set of shared network folders, Books, Video, Music, Photo, General Files, as well as reestablished the home folders for users.

Fortunately, because I kept the same base names, all of my Network Mapped drives to the NAS, just worked. Fixing my own connections would be easy, hassling with connections on my wife and kids’ laptops, would be a pain. They get all annoyed when I do anything on their laptops.

Unfortunately, the crashed volume seems to have killed all of the apps I had set up. This is not a huge loss honestly, I don’t actually use most of the built-in Synology apps anymore beyond Cloud Sync and the Torrent client. The main one I need to reconfigure is the VPN client. I may just move that to a docker instance on my project PC. Fortunately, last year, I pulled both my email and blog archives off of the NAS. All my email is consolidated again in Outlook, and my blog archive is in a Docker container now. This means I can just remove all of these apps instead of reinstalling them.

I did find that I had failed to do a fresh local backup of my “Family Videos” folder, but I was able to resync that down from the OneDrive backup. Speaking of which, rebuilding all those sync connections was a little tedious since they are spread across two OneDrive accounts, but I got them worked out and thankfully, everything recognized existing files and called it good. I didn’t put everything back on the NAS, since I have a few less important things that I’m just going to store on the file server/project server, but even so, I somehow gained about 1.5TB of space. I’ve repeatedly checked and everything is there as it should be. I can only speculate that there was some sort of residual cruft that had built up over time in logs or something somewhere. I also used to use Surveillance Station, so it’s possible I had a mountain of useless videos stored on it.

In general, it’s actually been a bit of an excuse to clean up a few things. I had some folders in there that used to sync my DropBox and Google Drive, neither of which I use anymore, for example.

I am 99% sure everything is back in working order, and the last step I keep putting off is to wipe the drive from the crashed volume (it still reads healthy) and re-add it to the current, new volume.

It’s been a hassle, but not really that bad. The main hassle is because it’s large amounts of data, it often means starting a copy and just, letting it run for hours.

This Year’s Garden

I meant to post when I planted but did not because, “reasons.” More specifically, a lack of motivation to do so. It’s not exactly anything impressive anyway, but that isn’t really supposed to matter anyway. My 2024 gardening is underway, and if it’s anything like the past several years, it will not be very fruitful.

At our old house, we had a pretty decent garden. I built a nice tiered raised bed pyramid thing, we grew plenty of peppers and tomatoes and the plants were super large and full. We had so many tomatoes we made a ton of salsa and I think I still have hot peppers frozen somewhere, though I doubt they are any good years later now.

The new house has been, not so successful. We get a lot more backyard animals here, and generally speaking, they eat all the fruits and vegetables. We have tried a few things to discourage it, from keeping the plants up high on the back deck to rubber snakes and other things.

I am trying again this year. I moved all of the plants (everything is in pots or buckets) down to the lower deck area, the pots still have the useless rubber snakes. I put my wind chimes down there as well, I am hoping the noise deters the animals a bit. In the past, we could not really use this lower area because we had our dog outside down there fairly often and she would get into things. She passed away a few years ago (like 20 years old, we thought she was immortal). So the lower area is available.

Anyway, also for deterrent, I have planted a bunch of garlic in the bottoms of all the pots, and a few mint plants in small pots nearby. Both are supposed to deter animals due to the smell, or so I hear.

As for what, it’s nothing super fancy, a tomato plant, a cherry tomato plant, a green pepper plant, a poblano pepper plant. I also have some oregano and basil. I also had a cilantro plant but something has already come along and snatched it up completely. Most of the plants I picked up from a sale at the local college agriculture building. They did not have any mint there though so I picked those up at the Kroger. They were conveniently on sale in the vegetable department later the same day I had gone to the college sale.

I also had my leftover plants from last year. Sadly, none of those had made it. We took them inside for the winter but they don’t seem to be coming back at all. Something took and ate my oregano and basil from last year anyway. I also had a Lemon and a Lime tree I had bought on clearance at the end of the summer last year. Both just seem to be dead sticks still.

I don’t have enough plants to make any huge batches of salsa or anything, but hopefully I can start getting some vegetables to eat occasionally.

Housekeeping Matters

It’s been a bit since I’ve last posted, I admit. Today I mostly wanted to address a few housekeeping matters.

I’ve moved the Letterboxd movie post syndication to Lameazoid.com. These posts were filtered from the home page and primarily serve the purpose of archiving what I post on Letterboxd. Frankly, if you cared about these, I’d prefer a follow over there. They also now appear here on Lameazoid. They are still excluded from the main page. I post TV and movie content on Lameazoid, it’s just a “better fit” over there.

On the subject of “mostly for archival purposes”, I believe I have at least made the BlueSky archive posts slightly less ugly looking. Maybe not. I am not entirely sure how to use all of the options on the RSS feed plugin and it’s not really important enough to care that much. It’s primarily for archival purposes, in case BlueSky, or Letterboxd, etc close, I still have my posts. I want to add Threads as well at some point.

My wife is trying to get more serious with her online sales business. We’ve set up a website and Facebook page for it. The website collects the various sales pages mostly. Anyway, I’ve replaced the old “ebay” link with a link to the site. If you want to buy vintage (and some newer) clothing, that’s the primary thing for sale over there.

Also notable, I updated the theme to something else. Mostly, I was bored with the old one.

I keep telling myself I should post and write more, I have ideas and thoughts to post on, but in the end, I just get in the way of myself and don’t. I even started writing in my own private journal a bit more just to sort of, try to get the juices flowing more. Which sort of works, but also not. Anyway, I just wanted to get these couple of housekeeping things mentioned.

Thoughts on Twitter, Musk, and Alternatives…

I have really really tried to mostly avoid discussing Twitter and Musk and everything that has happened over the past, year and a half to two years there. I do occasionally share news in the link blog posts, but even there, I mostly just avoid it. I am pretty outspoken about my dislike of Musk and Twitter on other forums but not on my own forums.

Watching this death spiral is really entertaining though.

And it is a death spiral. It may not actually result in the death of Twitter, god knows we won’t get that lucky, but it’s just increasingly looking shittier and shittier over there. I stopped using Twitter completely the day Musk took over. I deleted a bunch of random secondary meme accounts I had after that, and I did log in a few times to pull all my Tweet archive data. I want to, someday, maybe, write a Python Script that will parse through it all and compile it into a bunch of daily digests I can dump into a WordPress blog, for posterity. I also started running some Python scripts before the API was cut off to delete all my old Tweets from the site. As far as I know, I still have my @ handles, mostly kept to prevent them from getting scooped up by spammers and bots.

I am not sure though. I blocked Twitter shortly after I started using NextDNS (Referral Link) everywhere. I can’t even check on my own accounts without a bunch of extra steps anymore. At this point, I really don’t care. I am not going back ever so long as Musk is even remotely connected to the service and I doubt he ever gives it up. I do keep watch from the sidelines. I see mentions of large businesses or politicians or news outlets moving permanently to Threads. I see people talking about how blue-checked bots are topping all the replies. I see complaints about all the crypto scams and weed gummies being advertised. I see it, and I quietly laugh to myself. Because all of this happening was clearly going to be the outcome of a big whiny racist narcissist forcibly taking things over.

I’m not entirely convinced this wasn’t the intended outcome honestly. People like Musk, with their “free speech advocacy”, generally dislike actual open discussion and speech. They dislike when people can talk openly to each other and let ideas swell and become reality while smashing down stupid racist bull shit and conspiracy lies.

Fun fact, you can post a tweet with phrases like “Transwomen aren’t women” but if you post about “CIS people” you get flagged for using a slur.

Probably the first and biggest stupidity was the new pay-to-play blue check system that was implemented pretty early on. Blue Checks were originally issued as a way to verify people and companies were actually who they were. Someone at Twitter would do due diligence to make sure @McDonalds was actually run by the popular restaurant chain. This also meant not allowing blue checks for “@MacDonalds” or “@McD0nalds” or various other typo-style fake accounts. It meant something. Early on, this was changed so Blue Checks just meant you had a paid subscription. Anyone could get a blue check. It also showed that you were supporting the racist jackass and his company, so a lot of previously verified celebrity types refused to pay. Some were given checks anyway, which also upset these companies and people since it, of course, implies support. It’s essentially a false endorsement.

As more advertisers fled the platform as it became increasingly filled with assholes and bots and scams, the Blue Check system has just been pushed more and more in a desperate attempt to make up for lost ad revenue. The irony being that even if EVERYONE signed up, it’s nowhere near what advertisers were paying. The latest stupidity is that they now require new users to pay in to start posting. It’s pushed as a way to “deter bots”. Twitter doesn’t seem to understand just how cheap $8/month/account is for priority visibility for scams. One might wonder if it’s still worthwhile if so many are jumping ship, but it’s like those scam emails full of spelling errors. The scammers do this to weed out the intelligent users so only the choicest of marks remain. Twitter is doing a GREAT job of weeding out the intelligence from its system leaving nothing but easy marks for these scammers.

I almost would feel bad for these people if they weren’t mostly the same people pushing all the hate-filled stupidity on the world in politics during the past decade. But that’s probably left to another discussion, if ever.

The really funny part is how this isn’t even the first time this has happened to a microblog service centered around “Free speech”. Gab, Truth, Parler, and others I am sure I’ve forgotten are all basically complete failures. They failed to take off and get any real traction after being filled with right-wing extremists, which at best just drives away any legitimate advertisers. Truth recently pushed a scam IPO as a way to grift money for Trump’s lawsuits which is failing pretty spectacularly.

Because of course it is. It was a grift to funnel money in a “legitimate” manner, and now it’s just a bunch of bag holders getting fucked over.

Alternatives

I have not really quite settled on a good alternative to Twitter yet. I’m not entirely sure I really NEED one. I wasn’t using Twitter a lot before the fall, though I had used it since 2006 when it was very very new. The alternatives all have their own sort of pitfalls.

Threads seems to be the most active. It’s run by Facebook and is technically a spin-off of Instagram. I kind of like Threads, because it’s full of people posting Toy photos. Basically, everything I used to like about Instagram, before it became TikTok but with ads every 3 posts, is Threads. I don’t super like that it’s a Facebook property. I also hate how the timeline feels really really algorithm-driven.

BlueSky feels the most like “old Twitter”, and I don’t mean “2021/2022 Twitter”, I mean like, “2007-2008 Twitter”. OLD old Twitter. But it’s also kind of dead as fuck. Even now that it’s open to anyone without the need for invites, it feels a bit deserted.

Mastodon is probably my favorite. People claim it’s “hard to use” but it really isn’t. The real technical hurdles on Mastodon kind of stem from servers and admins who tend to be a little… eccentric, for lack of a better thing to call them. There are admins who will ban entire other instances because ONE user on that other instance says something that is kind of maybe offensive to … somebody. Or heck, even blatantly offensive to everyone. But the whole server gets banned over one person. Which feels a bit shitty, especially since there also feels like a lot of mindset that “once banned, it’s banned forever.”

The federation also had some weirdness. Sometimes I get a new follower, so I go and check them out to see if I want to follow back, but in the app, they LOOK like they have a blank profile. But if I open their profile in a web browser, it’s complete and they have posts. So there is clearly some weird syncing issue there. I’m not familiar enough with how the federation works to know the details, but from what I have gleaned from other discussions, it’s something like that. Or maybe that server is banned for some reason.

It’s also kind of clunky to re-toot something, from that something. If I link to a Toot, and you want to re-toot it, from what I can tell, you need to cut and paste the URL and do a search to find it from your own server. Or do a weird login jaunt from the local server. And it’s all very doable, but it’s kludgy as fuck.

Anyway, I kind of post to all three, sometimes I post the same thing to all three, sometimes I kind of segment it out depending on “audience”. Not that I really have an audience. My pseudo plan is to mostly use Threads for Toy stuff, and BlueSky or Mastodon for everything else. I’m not entirely sure yet. There also aren’t really easy tools to post things like, blog posts, automatically to Threads or BlueSky. This was a factor that always felt like part of why Google Plus failed.