Ramen Junkie

AI Music and the Dead Internet Theory

A man was arrested for creating AI music and using bots to stream it, netting 10 million dollars from Spotify.

https://www.forbes.com/sites/lesliekatz/2024/09/08/man-charged-with-10-million-streaming-scam-using-ai-generated-songs

A few things to note here, and some thoughts on it in general. He was doing this for a while, since 2017 according to the article, so it wasn’t like he made that money in a month. Apparently, he was a music maker himself; he just wasn’t getting anywhere with the music he produced. Since the scheme started in 2017, it predates even ChatGPT by about 5 years, so he was not using the current crop of “AI” tools. My guess is he was just using a script of some kind to compile loop tracks together and mass-produce generic EDM music. Because AI is the current buzzword, news outlets are calling this automation AI.

In the end though, the automation is not the illegal part; it’s the scamming using bots that is illegal, as morally justified as it may be. Spotify is extremely popular, but Spotify doesn’t make artists any money. For example, Snoop Dogg, one of the most popular rap musicians ever, made about $45,000 for a billion plays. And a billion plays is a LOT. My favorite artist, Aurora, has just under a billion plays on her most popular track, Runaway. Her next most popular track has almost half that, and third place sits at about 150 million plays.

Snoop Dogg has a LOT of plays.

The point is that Spotify isn’t exactly the patron saint of supporting artists, so the fraudster in the story above may be a bit morally justified in his efforts. That’s part of why I prefer to buy music: digitally, on CD, on vinyl. A larger chunk goes to the artist that way, especially on Bandcamp Fridays, buying direct from the band’s website, or even direct from the band at a show.

Anyway, I am not here to try to defend the guy in the original article above, just to talk a bit about AI and the Internet. I seriously doubt he is the only one doing this; he is just the first to get caught, or at least the first high-profile one, especially with the current crop of AI tools making it easier than ever to mass-produce garbage. Heck, I am pretty sure record labels themselves use software to pump up numbers on certain artists, less for the Spotify money and more for the marketing.

This likely pushes into other areas too. It would be easy to pull similar tricks on YouTube with bots, or on Kindle Unlimited, with bots turning pages in free, AI-created eBooks.

A long while ago, probably a decade now, I came across a post on 4chan’s /g/ board (/g/ = Technology) with a guide on how to set up a Blogspot blog using scraping tools, add it to a ring of other Blogspot blogs, then run an automated script that would click through the blogs, gathering AdSense money from Google for the benefit of everyone involved. I am pretty sure this was a recurring post too, to keep new people coming in.

It’s the same principle as the automated Spotify system above. Hell, it may even be the brainchild of the same folks.

Which is all in the end just a version of the Dead Internet Theory.


The dead Internet theory is an online conspiracy theory that asserts that the Internet now consists mainly of bot activity and automatically generated content manipulated by algorithmic curation to intentionally manipulate the population and minimize organic human activity.

Which is probably less about “manipulating the population” and more just about extracting wealth through automated systems. Like the morally gray hero at the top of this post, it’s all a sort of not-necessarily-evil activity. It’s very “Digital Robin Hood” in a way, except instead of directly taking from the rich to give to the poor, this Robin Hood is out making posts on 4chan about how to create automated blog systems. I mean, Google has replaced all of its support systems and everything else with bots; why shouldn’t the users replace themselves with bots as well? It’s bots all the way down!

Bots are trivially easy to build as well. One of the lessons in my 100 Days of Python class was making a bot that would play a Cookie Clicker style game as efficiently as possible.
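For the curious, a bot like that is only a handful of lines with Selenium. This is a minimal sketch from memory, not the course code; the URL and the “cookie” element ID are assumptions that would need checking against the real page:

# A minimal clicker bot sketch using Selenium (pip install selenium).
# The URL and the "cookie" element ID are assumptions; verify them against the actual page.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://orteil.dashnet.org/experiments/cookie/")
cookie = driver.find_element(By.ID, "cookie")

# Click as fast as possible for five minutes.
end_time = time.time() + 5 * 60
while time.time() < end_time:
    cookie.click()

driver.quit()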

Even without software, it gets done in manual ways in the real world sometimes, for marketing purposes. It’s all just manipulating the algorithm for money. I guess in the end the trick is to do it in a way that doesn’t harm the “wrong people.” Sometimes I feel like I could be rich if I weren’t so honest, because a lot of this isn’t that hard to do.

On the Doing of the Things

Long time no post. Or, sort of; I have been posting on Lameazoid as part of Blaugust, but even that has sort of fallen apart. I wasn’t planning to do the full 31 days, then it just started happening, but then it just… wasn’t.

I think mostly I have still been in a weird funk lately, and posting was sort of shaking me out of it, but not really. I have also been busy off and on with life stuff. My wife and daughter have rented a shop space. My daughter is opening a vintage shop in the front half, something she has wanted to do for a while, and they will be able to run all their online sales stuff from the back and be better organized and productive with it.

The shop isn’t open yet, but there is a website full of links for the online stuff at RTThrift.com.

The shop itself has needed a bit of cleanup and work to get set up, and though they have been doing a lot of that, I still get recruited to do things like haul hundreds of totes from storage to the shop, or sand the entire upstairs with an upright floor sander. Let me tell you, using that thing was surprisingly fun. Highly recommended.

It’s heavy as fuck though; bring a friend to lift it into the car, even if you split it into two pieces, like we discovered in time for the return trip.

There has already been a lot of interest in it, both from locals in town promoting its coming opening and from people we talked to at one of our many garage sales.

As for the various hobbies I write about here instead of there, there hasn’t been much exciting going on. I have not done any code or electronics projects recently; like blogging, this whole endless funk has me slacking on learning and other things. I have been listening to a ton of music, but nothing I had any impulse to write about.

The garden is going pretty meh, both my lemon and lime trees died, and I am barely getting any peppers or cherry tomatoes. And no regular tomatoes. My basil and oregano aren’t doing great. My mint is going pretty gangbusters but I don’t really know what to do with it. I mostly planted it as a pest deterrent.

I do have a fun Kickstarter device finally shipping that I will have to write about once I receive it and get a chance to play with it some.

I will also add that it’s not really writing that I am down on, just blogging. I have been writing personal journals to Joplin pretty regularly. It’s just not stuff I intend to share.

Twitter Archive to Markdown

I have been wanting to convert my Twitter Export archive into a more useful format for a while. I had vague dreams of maybe turning the tweets into some sort of digest posts on WordPress, but there are a fuckton of tweets (25,000) and it would require a LOT of scrubbing. I could probably manage to find and remove Retweets pretty easily, but then there is the issue of media and getting the images into the digest posts, and it’s just not worth the hassle.

What I can, and did, do is preserve the data in a better, more digestible, and searchable format. Specifically, Markdown. Well, OK, the files are not doing anything fancy, so it’s really just plaintext pretending to be Markdown.

I have no idea if Twitter still offers an export dump of your data; I have not visited the site at all in over a year. I left, I deleted everything, I blocked it on my DNS. But assuming they do, or you have one, it’s a big zip file that can be unrolled into a sort of local, Twitter-like interface. There are a lot of files in this ball, and while I am keeping the core archive, I mostly just care about the content.

If you dig in, it’s easy to find: there is a folder called “data,” and the tweets are in a file called “tweets.js”. It’s essentially one big JSON array assigned to a JavaScript variable. If you want the media, it’s in a folder called “Tweets_Media” or something like that. I skimmed through mine, and most of the images looked familiar because I already have them, so I removed that copy since I didn’t need it.

But I kept the Tweets.js file.

So, what to do with it? It has a bunch of extraneous metadata for each Tweet that makes it a cluttered mess. That’s useful for a huge website, but all I want is the date and the text. Here is a sample Tweet from the file.

{
    "tweet" : {
      "edit_info" : {
        "initial" : {
          "editTweetIds" : [
            "508262277464608768"
          ],
          "editableUntil" : "2014-09-06T15:05:44.661Z",
          "editsRemaining" : "5",
          "isEditEligible" : true
        }
      },
      "retweeted" : false,
      "source" : "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
      "entities" : {
        "hashtags" : [ ],
        "symbols" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
      },
      "display_text_range" : [
        "0",
        "57"
      ],
      "favorite_count" : "0",
      "id_str" : "508262277464608768",
      "truncated" : false,
      "retweet_count" : "0",
      "id" : "508262277464608768",
      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",
      "favorited" : false,
      "full_text" : "\"Sorry, you are over the limit for spam reports!\"  Heh...",
      "lang" : "en"
    }
  },

So I wrote a quick and simple Python script (it’s below). I probably could have done something fancy with Beautiful Soup or Pandas, but instead I did a quick and basic scan that pulls out the data I care about: if a line contains “created_at”, pull it out to get the date; if it contains “full_text”, pull it out to get the text.
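For what it’s worth, the file could also be parsed properly: since it is just one big JSON array assigned to a JavaScript variable, stripping everything before the first bracket and handing the rest to the json module works too. A minimal sketch, assuming the export still opens with a “window.YTD.tweets.part0 = ” style assignment:

# A sketch of parsing tweets.js as real JSON instead of scanning lines.
# Assumes the file starts with a JavaScript assignment before the JSON array.
import json

with open("tweets.js", encoding="utf-8") as file:
    raw = file.read()

# Drop the assignment prefix, keep the JSON array.
tweets = json.loads(raw[raw.index("["):])

for entry in tweets:
    tweet = entry["tweet"]
    print(tweet["created_at"], ":", tweet["full_text"])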

Once I was able to output these two lines, I went about cleaning them up a bit. I don’t need the key names, so I started by splitting on “:”. That was quickly problematic, both when a Tweet contained a colon and because the timestamp contains several colons. Instead I did a split on the full delimiter between fields. Specifically: quote, space, colon, space, quote. Only the breaks I wanted had the spaces and quotes, so that got me through step one. The trailing quotation mark was easy to slice off as well.
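As a quick illustration of the difference, using the sample date line from above:

# Splitting on a bare colon breaks inside the timestamp; the quoted separator does not.
line = '      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",'
print(line.split(":"))
# ['      "created_at" ', ' "Sat Sep 06 14', '35', '44 +0000 2014",']
print(line.split("\" : \""))
# ['      "created_at', 'Sat Sep 06 14:35:44 +0000 2014",']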

I considered simplifying things by using the same transformation on the date and the text, but the date also had this +0000 in it that I wanted to remove. It’s not the most efficient, but it was just as simple to have two very similar operations.

After some massaging, I was able to output something along the lines of “date – text”.

But then I noticed that, for some reason, the Tweets are apparently not in date order. I had decided I was just going to create a series of year-based archive files, so I needed them to be in order.

So I added a few more steps to sort each Tweet during processing into an array of arrays based on the year. Once again, this isn’t the cleanest code; it assumes a range of something like 2004 to 2026, which covers my needs for certain. I also had some “index out of range” errors with my array of arrays, which probably have a clever loopy solution, but instead it’s just a big pre-initialized copy/paste array.
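For the record, the clever solution is probably just a dictionary keyed on year, which sidesteps the index errors entirely. A rough sketch of that idea, not what the script below actually does:

# A sketch of the year-keyed dictionary alternative to the pre-initialized array.
# defaultdict creates the empty list the first time a year shows up, so no index errors.
from collections import defaultdict

tweets_by_year = defaultdict(list)
for posted_at, full_text in tweet_data:
    year = posted_at.rsplit(" ", 1)[-1]  # the year is the last token in the cleaned date
    tweets_by_year[year].append((posted_at, full_text))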

Part of the motivation for doing the array of arrays was that I could make the script output my sorted yearly files directly, but I just did that manually from the big final result. The job is done, but it could easily be automated by adjusting the lower output block a bit, as sketched below.
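That adjustment would look something like this; a rough sketch rather than tested code:

# A sketch of the adjusted output block, writing one file per year.
for offset, eachyear in enumerate(sorted_tweets):
    if not eachyear:
        continue  # skip years with no tweets
    with open(f"tweets-{2004 + offset}.md", encoding="utf-8", mode="w") as output:
        for each in reversed(eachyear):
            output.write(each[0] + " : " + each[1] + "\n")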

Anyway, here is the code, and a link to a Git repository for it.

# A simple script that takes an exported tweets.js file and outputs it to a markdown text file for archiving.
# In pulling data for this, I noticed that older Twitter exports use a csv file instead of a .js file.
# As such, this is for newer exports.
# The Tweets.js file is in the 'data' directory of a standard Twitter archive export file.

# Open the tweets.js file containing all the tweets; it should be in the same folder as this script.
with open("tweets.js", encoding="utf-8") as file:
    filedata = file.readlines()

tweet_data = []
current_tweet = []
# The Tweets don't seem to be in order, so I needed to sort them out. This is admittedly ugly,
# but I only need to cover so many years of sorting and this was the easiest way to avoid index errors
sorted_tweets = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

# Does a simple search through the file. It pulls out the date posted and the full text.
# This does not do anything with images, sorry; that gets more complicated, though it would be doable.
for line in filedata:
    if "created_at" in line:
        timesplit = line.split(":")
        posted_at = line.split("\" : \"")[1].replace(" +0000 ", " ")[:-3]
        current_tweet.append(posted_at)
    elif "full_text" in line:
        current_tweet.append(line.split("\" : \"")[1][:-3])
        #        current_tweet.append(line.split(":")[1].split("\"")[1])
        tweet_data.append(current_tweet)
        current_tweet = []
        # Because full text is always after the date, it just moves on after it gets both
    else:
        pass

# An ugly sort: it simply looks for the year in the date, then builds an array of arrays based on year.
# I did it this way partly in case I wanted to output to separate files based on year, but I can copy/paste that.
# It probably is still out of order within each year, but whatever, I just want a simple archive file.
for each in tweet_data:
    for year in range(2004, 2026):
        if str(year) in each[0]:
            sorted_tweets[year - 2004].append(each)

# Writes the output to a file and echoes it to the console.
# reversed() flips each year's list, since the export file appears to run newest-first.
with open("output.md", encoding="utf-8", mode="w") as output:
    for eachyear in sorted_tweets:
        for each in reversed(eachyear):
            output.write(each[0] + " : " + each[1] + "\n")
            print(each[0] + " : " + each[1])