Programming Projects

Porting Playlists With Python

I had a brief stint last year where I was using Spotify, but I dumped it, mostly for financial reasons, but also because as much as I like the ability to just listen to whatever, I kind of dislike the whole “Music as a service” aspect. I can still find new stuff via YouTube and then add it to my list of “Albums to maybe buy eventually”.

One thing I lost, though, was my playlists. I was worried they were just gone; soon after logging in, I swear they had just vanished, but checking now, they seem to all be there again. Whatever the case, I wanted a backup copy.

This is, of course, an arduous thing to do by hand, particularly with my large “play random tracks” list, which has 1200+ songs. I don’t have time to type all that out, or to search for and find all these tracks on YouTube. There are services for this, but they tend to be limited unless you want to pay, which is more annoying than anything.

Exporting from Spotify

Thankfully, I can use Python. I needed a script that would pull down my playlists and dump them to simple text files. I had originally asked Perplexity to build this script, which it did, but the API method it used didn’t match the one I had previously used during my Python class to make a playlist generator for Spotify.

Instead of doing what would probably be the easier thing, and figuring out what OAuth method the Perplexity script uses, I just rebuilt things using the Spotipy library, which is what I had used previously. So this script is one I made, for the most part.

It connects and gets a list of all the playlists you have, then loops through that list and, for each playlist, pulls down all the track names and writes them to a text file in the format Artist – Album – Track Name.

The credentials go into a file in the same directory called auth.py, with your Spotify Developer credentials in the following format. Keep the quotation marks.

SPOTIPY_CLIENT_ID = "YOUR CLIENT ID"
SPOTIPY_CLIENT_SECRET = "YOUR CLIENT SECRET"
SPOTIPY_REDIRECT_URI = "http://localhost"
SPOTIFY_USERNAME = "YOUR USER ID NUMBER"

And here is the export script itself:

import os
import spotipy
from auth import *
from spotipy.oauth2 import SpotifyOAuth

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id=SPOTIPY_CLIENT_ID,
                                               client_secret=SPOTIPY_CLIENT_SECRET,
                                               redirect_uri=SPOTIPY_REDIRECT_URI,
                                               scope="user-library-read playlist-read-private",
                                               cache_path="token.txt"))

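def get_all_playlists():
    # Spotify returns at most 50 playlists per page, so follow the 'next' links
    results = sp.current_user_playlists(limit=50, offset=0)
    items = results['items']
    while results['next']:
        results = sp.next(results)
        items.extend(results['items'])
    # Keep the shape the main loop expects: a dict with an 'items' list
    return {'items': items}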

## https://stackoverflow.com/questions/39086287/spotipy-how-to-read-more-than-100-tracks-from-a-playlist
def get_playlist_tracks(username,playlist_id):
    results = sp.user_playlist_tracks(username,playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

def save_playlists_to_files(this_list, listname):
    if not os.path.exists('lists'):
        os.makedirs('lists')
    # Sanitize filename for filesystem
    safe_name = listname.replace('/', '_').replace('\\', '_')
    filename = f"lists/{safe_name}.txt"

    with open(filename, 'w', encoding='utf-8') as f:
        f.write(f"Playlist: {listname}\n")
        f.write("Tracks:\n")
        # One "Artist - Album - Track" entry per line
        for eachtrack in this_list:
            f.write(f"{eachtrack}\n")

playlists = get_all_playlists()
for each in playlists['items']:
    this_list = []
    listid = each['id']
    ownerid = each['owner']['id']
    mytracks = get_playlist_tracks(ownerid, listid)
    for eachtrack in mytracks:
        track = eachtrack['track']
        # Removed or unavailable tracks can come back empty; skip them
        if not track:
            continue
        trackentry = f"{track['artists'][0]['name']} - {track['album']['name']} - {track['name']}"
        this_list.append(trackentry)
    save_playlists_to_files(this_list, each['name'])

Everything gets output to a folder called “lists”.
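
The formatting and file-naming steps can also be pulled out into small helpers that run without a Spotify connection, which makes them easy to check. This is only a minimal sketch and not part of the script above: `format_track` and `safe_filename` are made-up names, and the sample dictionary just mimics the few fields of a playlist item that the script reads.

```python
def format_track(item):
    """Return 'Artist - Album - Track' for one playlist item, or None if it has no track."""
    track = item.get("track")
    if not track:
        # Assumption: removed or unavailable tracks show up with an empty 'track' value
        return None
    artist = track["artists"][0]["name"]
    return f"{artist} - {track['album']['name']} - {track['name']}"


def safe_filename(listname):
    """Swap path separators for underscores so the playlist name works as a file name."""
    return listname.replace("/", "_").replace("\\", "_")


# Hand-built stand-in for one entry of results['items']
sample_item = {
    "track": {
        "name": "Clearest Blue",
        "album": {"name": "Every Open Eye"},
        "artists": [{"name": "CHVRCHES"}],
    }
}

print(format_track(sample_item))
print(safe_filename("Rock/Metal Mix"))
```

Only the first listed artist is used, same as in the script, so collaborations get filed under whoever Spotify lists first.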

Importing to YouTube

But what to do with these lists? It’s going to be a bit more complicated to get Python to build them from my private music collection. I have a LOT of the tracks, but not all of them, and the script would need to scan through, well, a fuckton of music files, some tens of thousands, maybe more, decide on a file, and add it to a Winamp or VLC playlist.

What I can do though, for now, is make a big ass YouTube Playlist.  

I have no experience with the YouTube API, so I just asked Perplexity for this script, specifically:

“create a python script that will take a text file list of sings, as an input, one song on each line, formatted “artist – Album – Song title” and search Youtube for the artist and song, and add the first result to a new playlist named after the name of the file”

It did some thinking, then gave me a script and instructions on how to set up OAuth credentials on YouTube. I then did a test run of the script on one of the shorter list files and, sure enough, it worked perfectly. I have included the script below.

You need to create an app here, create OAuth credentials, download the file, and place it in the folder with the script below, renamed to “client_secret.json”.

The script requires the following dependencies.

pip install google-auth-oauthlib google-auth-httplib2 google-api-python-client

Something not mentioned by Perplexity, which I found a solution for on Stack Overflow after getting an error: you need to add users. On the App page (you should be sitting there after creating the app), select the “Audience” tab on the sidebar, then a bit further down, add a “Test User” by email address. This is the email address associated with the YouTube channel that you want to attach the playlists to.

import os
import argparse
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]

def create_playlist_and_add_songs(file_path):
    # Authenticate and build service
    flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
    credentials = flow.run_local_server(port=0)
    youtube = build("youtube", "v3", credentials=credentials)

    # Get playlist name from filename
    playlist_name = os.path.splitext(os.path.basename(file_path))[0]

    # Create new playlist
    playlist = youtube.playlists().insert(
        part="snippet,status",
        body={
            "snippet": {
                "title": playlist_name,
                "description": f"Auto-generated from {playlist_name}"
            },
            "status": {"privacyStatus": "private"}
        }
    ).execute()
    playlist_id = playlist["id"]

    # Process songs
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(" - ", 2)
            if len(parts) != 3:
                print(f"Skipping malformed line: {line}")
                continue

            artist, album, song = parts
            query = f"{artist} {song}"
            
            # Search for video
            search_response = youtube.search().list(
                q=query,
                part="id",
                maxResults=1,
                type="video"
            ).execute()
            
            if not search_response.get("items"):
                print(f"No results for: {query}")
                continue
            
            video_id = search_response["items"][0]["id"]["videoId"]

            # Add to playlist
            youtube.playlistItems().insert(
                part="snippet",
                body={
                    "snippet": {
                        "playlistId": playlist_id,
                        "resourceId": {
                            "kind": "youtube#video",
                            "videoId": video_id
                        }
                    }
                }
            ).execute()
            print(f"Added {artist} - {song}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("input_file", help="Text file containing songs")
    args = parser.parse_args()
    create_playlist_and_add_songs(args.input_file)

And here is the final imported version of my Raffaella playlist from Spotify.
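
The line-parsing step from the script can be tried on its own, without touching the YouTube API. The sketch below is an illustration only: `parse_song_line` and `build_queries` are hypothetical names, and the sample lines imitate a file written by the export script, including its two header lines.

```python
def parse_song_line(line):
    """Split 'Artist - Album - Song' into its three parts, or return None if malformed."""
    parts = line.strip().split(" - ", 2)
    if len(parts) != 3:
        return None
    artist, album, song = parts
    return artist, album, song


def build_queries(lines):
    """Turn export lines into 'artist song' search strings, skipping anything malformed."""
    queries = []
    for line in lines:
        parsed = parse_song_line(line)
        if parsed is None:
            continue  # e.g. the 'Playlist: ...' and 'Tracks:' header lines
        artist, _album, song = parsed
        queries.append(f"{artist} {song}")
    return queries


sample_lines = [
    "Playlist: Test List\n",
    "Tracks:\n",
    "CHVRCHES - Every Open Eye - Clearest Blue\n",
]
print(build_queries(sample_lines))
```

Because the split stops after two separators, a song title containing " - " stays in one piece. An artist or album name containing " - " would still shift the fields, so those lines are worth a manual look.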

Python – Simple URL Extractor

This one is pretty basic tier, but more useful than most. I had a website full of links to pages of video files. I wanted a list that I could stick into yt-dlp. I could do a bunch of copy and pasting, or I could use Python to scrape and take all the links.

It takes a URL as a CLI Argument, specifically like:

> main.py https://website.com

It skims the page with Beautiful Soup and spits out a text file with the date-time-url.txt format. Useful if a site changes over time.

The site I was scraping was using some relative links, so it checks for “http” in the URLs and if it’s present, just dumps the URL; otherwise, it prepends “REPLACEME_” to the link, so it’s easy to do a Find/Replace operation and add whatever the full URL is.

For example, if the URL is “/video/12345.php”, which takes you to “website.com/video/12345.php”, it outputs “REPLACEME_/video/12345.php” on that line. It’s easy to then replace the “REPLACEME_” with the URL on 1-1000+ URLs. I didn’t just add the URL because, at least for my use case, the links added a bit more than just the base URL, and I wanted it to function more universally.
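
As a rough sketch of the marking step and the later Find/Replace, assuming the `REPLACEME_` prefix that the script writes: the helper names here are made up for illustration, and unlike a plain substring check this version treats both http:// and https:// links as already complete.

```python
PLACEHOLDER = "REPLACEME_"


def mark_link(href):
    """Leave absolute links alone; prefix anything else with the placeholder."""
    if href.startswith(("http://", "https://")):
        return href
    return f"{PLACEHOLDER}{href}"


def fill_placeholder(lines, base_url):
    """The later find/replace step: swap the placeholder for a real base URL."""
    return [line.replace(PLACEHOLDER, base_url, 1) for line in lines]


links = ["https://website.com/about", "/video/12345.php"]
marked = [mark_link(link) for link in links]
print(marked)
print(fill_placeholder(marked, "https://website.com"))
```

Keeping the placeholder as a separate step means the same output file can be pointed at whatever prefix the site really needs, which was the reason for not hardcoding the base URL.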

Anyway, here is the script. It does use two non-standard libraries, httplib2 and Beautiful Soup, so it needs a pip install of httplib2 and beautifulsoup4 first; if they are missing, Python will complain about the failed import.

## Simple URL Extractor
## To use, at CLI $> python3 main.py [Replace With URL No Braces]
## Will output a list of all href links on a page to a file with the date, time and URL.
## Useful for pushing to a bulk downloader program, though it does no processing, so URLs may need to be edited
## If there is no full URL, it prepends an easily find/replaceable slug

import httplib2
import sys
from datetime import datetime
from bs4 import BeautifulSoup, SoupStrainer

current_datetime = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

try:
    url = sys.argv[1]
except IndexError:
    print("Error: No URL Defined! Please use main.py [URL]")
    sys.exit(1)

http = httplib2.Http()
status, response = http.request(url)
filename = f"{current_datetime}-{url.split('//')[1].replace('/','_')}.txt"

with open(filename, "x") as f:
    for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a')):
        if link.has_attr('href'):
            #print(link['href'])
            write_link = link['href']
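            # Absolute links (http or https) pass through; everything else gets the slug
            if not write_link.startswith(("http://", "https://")):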
                write_link = f"REPLACEME_{write_link}"
            f.write(f"{write_link}\n")


## Reference
## https://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup
## https://stackoverflow.com/questions/4033723/how-do-i-access-command-line-arguments
## https://stackoverflow.com/questions/14016742/detect-and-print-if-no-command-line-argument-is-provided

Coding with Perplexity AI – Hirst Painting Drawer

I’ve not been using AI a lot. Frankly, I find it to be pretty lame for the most part; the images are almost always weirdly uncanny and ugly, and the writing is just bland. I’ve heard it’s pretty good at coding though, and I had not tried using it for code at all, so I decided to give it a go. I wanted to use it to augment an existing project from when I was taking that 100 Days of Python course: specifically, Day 18, the Hirst Painting Project.

The full original code is here:

import colorgram
import turtle
from turtle import Turtle, Screen
from random import choice

turtle.colormode(255)

#Sample image, cover of CHVRCHES Every Open Eye Album
color_extracted = colorgram.extract("image.jpg", 20)
color_choices = []
for i in color_extracted:
    color_choices.append((i.rgb.r, i.rgb.g, i.rgb.b))

# debug print(color_choices)
#color_choices = [(193, 137, 150), (128, 74, 88), (22, 28, 47), (59, 32, 48), (219, 210, 206), (184, 161, 155), (17, 11, 11), (174, 101, 116), (217, 179, 189), (148, 152, 159), (94, 47, 60), (93, 104, 114), (227, 201, 206), (154, 159, 156), (122, 83, 78), (209, 183, 180), (203, 206, 202), (164, 109, 105), (81, 95, 91), (59, 60, 74)]


marker = Turtle()
marker.speed(100)
marker.penup()
marker.hideturtle()

for y_coord in range(1,10):
    for x_coord in range(0,10):
        marker.setpos(-300 + x_coord * 50,-280 + y_coord * 50)
        marker.pencolor(choice(color_choices))
        marker.dot(30)

screen = Screen()
screen.screensize(600, 600)
screen.exitonclick()

This code, which is also on GitHub along with the sample image (though any image will work), reads the color palette of a file, “image.jpg”, then draws a simple series of dots in the style of a painting by Damien Hirst. I didn’t pick the theme, it was part of the course, but I do think the result is simple and quite neat.

I have, for a while, wanted to make a few updates to this simple program, and sort of tried to a few times, but this time, I let AI do the work. I really wanted two main features.

  • The ability to open any file, instead of having to put a file in the folder and rename it image.jpg
  • The ability to export the result to an image file

I chose Perplexity AI for my assistant. I wanted to use it as a sort of accompanying tool, rather than letting it write all the code. I already have the simple drawing code.

I started by asking it for a simple request:

Can you create a python script that will open a turtle graphics window, 1024x768 in size, draw a circle of 5 pixels thickness, diameter 100 pixels, and include a button that will export the canvas to a png or jpg file

Which it did, I could run the code, it would draw a circle, then I could click a button and save an image of a circle. Though, I did come across an issue I never quite fixed.

It would save the dialogue box along with the canvas. It’s basically just, taking a screen shot.

The solution at the moment is to make sure I drag the dialog box off to the side before saving.

Next:

Can you add a "file Open" button at the bottom that passes the file in as a variable and does not draw anything until a file as been selected

Initially, I wanted to make sure it wouldn’t draw an image until the file was loaded, so the file select box doesn’t actually do anything. Later I changed it to allow for drawing without selecting an image, it just defaults to the ‘Every Open Eye’ color set in my original code.

This worked out as expected as well. Now I had the basic structure to slip my existing code in. I had a file as a variable and a mechanism that drew something (currently, a circle). The code it was giving me used a class structure though, which is fine, but my existing code doesn’t. I managed to insert my dots drawing code fine; this required renaming some variables to align, specifically, all of the ‘marker’ variables became ‘self.pen’ at the appropriate locations. I had trouble, though, getting the colormode to work properly. I wasn’t sure where to put it in the code in relation to the class structure.

I have to say, I probably had it correct, but I also realized later I was having some virtual environment issues between VS Code, my venv, and the system. Despite VS Code showing that the imports were resolved, when running things, I got ‘not found’ errors. I ended up just running the code from a venv-sourced terminal outside of VS Code. It’s a problem to be fixed later.

The first problem that came up here though, Perplexity had added a function that would display the loaded image, as a backdrop behind the dots. This is not the functionality I wanted.

I just found the function and stripped it out manually.

Then I found it again, because it loaded the image as a backdrop when opening a file, and then again when drawing the dots.

Something also notable here. At one point, I took my working code, with the draw dots, and fed it back to Perplexity, telling it, ‘I added some code, please make this the new baseline.’ This worked out, perfectly. Going forward, it worked off my updated code. Even more surprisingly, it detected the new function of the drawing, to draw dots, instead of a single circle, and it renamed the internal references ON ITS OWN.

I was pretty impressed with that.

I had the basic functionality down, but wanted to do some cleanup. After running it over and over, and having to navigate to a directory with pictures each time, I asked it to change the file open and save to default to the user’s home folder. I also asked it to only look for image files, to avoid errors from other file types. I also had it resize the image down to center and fit the dots better.

I couldn’t solve the dialog box problem. I tried to. It added a short delay on the save, but that just resulted in a saved image of the things behind the drawing. I tried to get it to position the save dialog outside of the window, but the code there didn’t seem to actually DO anything.

I also added a few last-minute features. One, is a way to update the background color. It had actually had this feature originally, but I asked it to remove it, because at the time, I didn’t want it.

I also had it add some boxes that allow for selecting how many rows and columns will be drawn. I may look into having it draw larger canvases or maybe things that are not dots in the future. It’s pretty functional as it is though. Well, at least as functional as a program that makes dot images can be.
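
The row and column boxes feed straight into the spacing math, and that math can be checked without opening a turtle window. Here is a minimal sketch of it as a standalone function. `dot_positions` is a made-up name, but the arithmetic follows the draw_dots method in the full code below: the canvas is divided into one more gap than there are dots, so the grid stays centered.

```python
def dot_positions(canvas_width, canvas_height, rows, columns):
    """Return (x, y) centers for a rows x columns grid centered on the origin."""
    spacing_x = canvas_width / (columns + 1)
    spacing_y = canvas_height / (rows + 1)
    start_x = -canvas_width / 2 + spacing_x   # leftmost column
    start_y = canvas_height / 2 - spacing_y   # top row
    return [
        (start_x + col * spacing_x, start_y - row * spacing_y)
        for row in range(rows)
        for col in range(columns)
    ]


positions = dot_positions(600, 550, rows=9, columns=10)
print(len(positions))
print(positions[0], positions[-1])
```

The first and last positions come out as mirror images of each other around (0, 0), which is why the grid looks centered no matter what row and column counts are typed in.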

Anyway, I’ll make a GitHub Repository probably, but for now, the full updated code is below. Also, it turns out I can share my Perplexity chat, so you can also see the full chat here.

## pip install colorgram.py

import turtle
import colorgram
from tkinter import *
from tkinter import filedialog
from tkinter import colorchooser
from random import choice
from PIL import ImageGrab
import os

class TurtleApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Turtle Graphics with Image Open and Save")
        
        self.canvas_width = 600
        self.canvas_height = 550
        
        self.canvas = Canvas(root, width=self.canvas_width, height=self.canvas_height)
        self.canvas.pack()
        
        self.screen = turtle.TurtleScreen(self.canvas)
        self.screen.colormode(255)
        self.screen.bgcolor("white")
        
        self.pen = turtle.RawTurtle(self.screen)
        self.pen.pensize(5)
        self.pen.hideturtle()
        
        self.file_path = None
        self.color_choices = [(193, 137, 150), (128, 74, 88), (22, 28, 47), (59, 32, 48), (219, 210, 206), 
                              (184, 161, 155), (17, 11, 11), (174, 101, 116), (217, 179, 189), (148, 152, 159), 
                              (94, 47, 60), (93, 104, 114), (227, 201, 206), (154, 159, 156), (122, 83, 78), 
                              (209, 183, 180), (203, 206, 202), (164, 109, 105), (81, 95, 91), (59, 60, 74)]
        
        self.rows = 9
        self.columns = 10
        
        self.create_widgets()

    def create_widgets(self):
        button_frame = Frame(self.root)
        button_frame.pack(side=BOTTOM, fill=X)

        open_button = Button(button_frame, text="Open Image", command=self.open_file)
        open_button.pack(side=LEFT, padx=5, pady=5)
        
        draw_button = Button(button_frame, text="Draw Dots", command=self.draw_dots)
        draw_button.pack(side=LEFT, padx=5, pady=5)
        
        save_button = Button(button_frame, text="Export Canvas", command=self.save_canvas)
        save_button.pack(side=LEFT, padx=5, pady=5)

        bg_color_button = Button(button_frame, text="Change Background", command=self.change_background_color)
        bg_color_button.pack(side=LEFT, padx=5, pady=5)

        exit_button = Button(button_frame, text="Exit", command=self.exit_app)
        exit_button.pack(side=LEFT, padx=5, pady=5)

        # Add row and column input
        row_label = Label(button_frame, text="Rows:")
        row_label.pack(side=LEFT, padx=5, pady=5)
        self.row_entry = Entry(button_frame, width=5)
        self.row_entry.insert(0, str(self.rows))
        self.row_entry.pack(side=LEFT, padx=5, pady=5)

        col_label = Label(button_frame, text="Columns:")
        col_label.pack(side=LEFT, padx=5, pady=5)
        self.col_entry = Entry(button_frame, width=5)
        self.col_entry.insert(0, str(self.columns))
        self.col_entry.pack(side=LEFT, padx=5, pady=5)

    def open_file(self):
        # Get the user's home directory
        home_dir = os.path.expanduser("~")
        
        self.file_path = filedialog.askopenfilename(
            initialdir=home_dir,  # Set initial directory to user's home folder
            filetypes=[
                ("Image files", "*.png *.jpg *.jpeg *.gif *.bmp"),
                ("PNG files", "*.png"),
                ("JPEG files", "*.jpg *.jpeg"),
                ("GIF files", "*.gif"),
                ("BMP files", "*.bmp")
            ]
        )
        if self.file_path:
            print(f"Image selected: {self.file_path}")
            self.root.title(f"Turtle Graphics - {os.path.basename(self.file_path)}")
            self.extract_colors()

    def change_background_color(self):
        color = colorchooser.askcolor(title="Choose background color")
        if color[1]:  # color is in the format ((r, g, b), hexcode)
            self.screen.bgcolor(color[1])
            print(f"Background color changed to {color[1]}")

    def extract_colors(self):
        color_extracted = colorgram.extract(self.file_path, 20)
        self.color_choices = []
        for i in color_extracted:
            self.color_choices.append((i.rgb.r, i.rgb.g, i.rgb.b))
        print("Colors extracted from the image")

    def draw_dots(self):
        self.pen.clear()
        
        try:
            self.rows = int(self.row_entry.get())
            self.columns = int(self.col_entry.get())
        except ValueError:
            print("Invalid row or column value. Using default values.")

        self.pen.speed(100)
        self.pen.penup()
        self.pen.hideturtle()

        dot_size = 30
        spacing_x = self.canvas_width / (self.columns + 1)
        spacing_y = self.canvas_height / (self.rows + 1)
        start_x = -self.canvas_width / 2 + spacing_x
        start_y = self.canvas_height / 2 - spacing_y

        for y_coord in range(self.rows):
            for x_coord in range(self.columns):
                self.pen.setpos(start_x + x_coord * spacing_x, start_y - y_coord * spacing_y)
                self.pen.pencolor(choice(self.color_choices))
                self.pen.dot(dot_size)

    def save_canvas(self):
        # Get the main window's position and size
        window_x = self.root.winfo_x()
        window_y = self.root.winfo_y()
        window_width = self.root.winfo_width()
        
        # Calculate the position for the dialog box
        dialog_x = window_x + window_width + 10  # 10 pixels to the right of the main window
        dialog_y = window_y
        
        # Get the user's home directory
        home_dir = os.path.expanduser("~")
        
        # Open the save dialog at the calculated position
        self.root.update()  # Ensure the window size is updated
        save_path = filedialog.asksaveasfilename(
            parent=self.root,
            defaultextension=".png",
            filetypes=[("PNG files", "*.png"), ("JPEG files", "*.jpg *.jpeg")],
            initialdir=home_dir,  # Set initial directory to user's home folder
        )
        
        if save_path:
            # Shift the main window; the save dialog has already closed by this point
            self.root.geometry(f"+{dialog_x}+{dialog_y}")
            
            x0 = self.root.winfo_rootx() + self.canvas.winfo_x()
            y0 = self.root.winfo_rooty() + self.canvas.winfo_y()
            x1 = x0 + self.canvas.winfo_width()
            y1 = y0 + self.canvas.winfo_height()
            
            ImageGrab.grab(bbox=(x0, y0, x1, y1)).save(save_path)
            print(f"Canvas saved as {save_path}")
            
            # Reset the main window position
            self.root.geometry(f"+{window_x}+{window_y}")

    def exit_app(self):
        self.root.quit()
        self.root.destroy()

root = Tk()
app = TurtleApp(root)
root.mainloop()

2024.11.08 – Code Project – Python – Improved FreshRSS Link Lists

Today, I want to talk about recent improvements I have made to my FreshRSS to WordPress Digest Python script, and to make a note of what I would like to do next.

This is the script I use to produce these Link List Posts on [Blogging Intensifies] and Lameazoid. The GitHub repository for it is here.

  • The first version was simple: it pulled from the shared feed of FreshRSS, collected favorited articles, formatted them a bit, then posted a WordPress post of links.
  • Over time, I wanted these posts to be prettier, so I added a bit more massaging to the formatting, and some HTML code so the links would show up in pretty little formatted boxes. I also decided some sort of summary would be useful, so it pulls the first 100 words or so from a feed item as a teaser.
  • Initially I was using the main share feed from FreshRSS, but I have two blogs, each with vague but distinct “themes”. Sharing video game news to [BI] felt a bit silly. Not that it matters, no one reads those lists and it’s mostly for my reference. I found you could narrow things down by personal tags, so I altered the Python script to handle any number of configured blogs, and now both get separate link lists.
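
The “first 100 words or so” teaser is simple enough to sketch in a few lines. This is not the code from the repository, just one plausible way to do it: `teaser` is a made-up name, and the tag stripping is a deliberately crude regex rather than a real HTML parser.

```python
import re
from html import unescape


def teaser(html_text, max_words=100):
    """Strip tags, then keep roughly the first max_words words as a summary."""
    text = re.sub(r"<[^>]+>", " ", html_text)  # crude tag removal, fine for a teaser
    words = unescape(text).split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


sample = "<p>FreshRSS is a <b>self-hosted</b> feed reader.</p>"
print(teaser(sample, max_words=4))
```

Cutting on word count rather than character count keeps the teaser from ending in the middle of a word.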

So, what’s new this round?

Not a lot on the externally visible end. A week or so ago, I found that the Raspberry Pi I had the script running on had died on me. Or, more likely, the SD card did. Whatever the case, the script was not running as scheduled. I have always had a bit of a love/hate relationship with the scheduled run. Some days I barely share anything from the reader, so it makes for weird small posts. Also, it ran at like 10:30 PM, which was kind of late, and occasionally I found myself rushing to get through everything to make sure I flagged anything relevant, because I was flipping through it at 5 minutes till.

Right now, I am running it manually, and at irregular intervals. This created a new problem, but it’s one I had already been planning to fix. The way the FreshRSS shared feed works, you can append a number, X, and get “everything in the last X hours.” When it ran on a cron job every 24 hours, this number was simply hardcoded at 24.

When manually running things, I needed X to be “however many hours since it last ran.” So now, it writes out a simple file with a time stamp after each run. It also pulls in that file and calculates the time difference. If the time difference is less than an hour, it defaults to an hour, because “X=0” just gives the default feed, which may be everything, or may be the last ten items. I am not sure on the limits. If there isn’t a timestamp file, most likely because it’s the first run, it sets the hours difference to 0 and gets everything.
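
The timestamp logic boils down to something like the sketch below. It is an illustration under a few assumptions, not the script from the repository: the function names, the ISO timestamp format, and the choice to round partial hours up (so nothing shared just before the cutoff is missed) are all mine.

```python
import math
import os
import tempfile
from datetime import datetime, timedelta


def hours_since_last_run(stamp_path, now=None):
    """Return the X to append to the shared feed URL, based on a timestamp file."""
    now = now or datetime.now()
    if not os.path.exists(stamp_path):
        return 0  # first run: 0 asks for the default feed
    with open(stamp_path, "r", encoding="utf-8") as f:
        last_run = datetime.fromisoformat(f.read().strip())
    hours = math.ceil((now - last_run).total_seconds() / 3600)
    return max(hours, 1)  # never 0 once a previous run exists


def write_stamp(stamp_path, now=None):
    """Record the time of this run for next time."""
    now = now or datetime.now()
    with open(stamp_path, "w", encoding="utf-8") as f:
        f.write(now.isoformat())


stamp = os.path.join(tempfile.mkdtemp(), "lastrun.txt")
run_time = datetime(2024, 11, 8, 9, 0, 0)
print(hours_since_last_run(stamp, now=run_time))  # no file yet
write_stamp(stamp, now=run_time)
print(hours_since_last_run(stamp, now=run_time + timedelta(minutes=20)))
print(hours_since_last_run(stamp, now=run_time + timedelta(hours=30)))
```

Passing `now` in as a parameter is only there so the behavior can be tried with fixed times instead of waiting around for real hours to pass.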

Something else I added this round: every time I wanted to do modifications, I needed to comment out some lines and uncomment others, so the script would not spam my blog with the same post over and over. Also, I have a time stamp file now that I don’t want to overwrite when testing, since the script would probably stop seeing feed items unless they were marked in the last hour.

So I added a flag variable at the top, and wrapped the business-end output in some conditional statements. Now, when I want to test, I just change the “runmode” variable at the top to “False”, and it stops posting or editing the time stamp file.
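
In sketch form, the flag gating looks something like this. The names are hypothetical (only the idea of a runmode flag comes from the script), and the publish and timestamp steps are passed in as plain functions so the example runs without a blog to post to.

```python
RUNMODE = True  # flip to False while testing


def finish_run(post_html, runmode, publish, save_stamp):
    """Only touch the blog and the timestamp file when runmode is on."""
    if runmode:
        publish(post_html)
        save_stamp()
        return "posted"
    # Dry run: show what would have been posted and leave everything untouched
    print(post_html)
    return "dry run"


published = []
stamps = []
result = finish_run("<p>links</p>", False, published.append, lambda: stamps.append("now"))
print(result, published, stamps)
```

With the flag off, nothing lands in `published` or `stamps`, which is exactly the point: a test run leaves no trace on the blog or in the time stamp file.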

This was also needed for my third new feature. In addition to posting to the blog, it spits everything out into a simple, dated Markdown file. This way, I have a private record of everything shared, since a lot of the point is, “I want to keep these links for reference.” Initially I just spit out the post data, but that was ugly since it was full of HTML tags. So it now compiles a second, Markdown-formatted variable that gets written to the file. Another key difference: in my private files, I dump the entire article, and not just the 100 word summary.

I don’t want to repost entire articles on my blog, it’s rude and ugly, hence the summary, but for a private archive of text data, dumping it all is preferable. Articles disappear ALL the time, this is literally 100% of why I save and archive shit like this in the first place.
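
A bare-bones version of the private Markdown dump could look like the sketch below. Everything specific in it is an assumption for illustration: the function names, the item fields (title, url, text), and the dated file name pattern are not taken from the actual script.

```python
import os
import tempfile
from datetime import date


def build_markdown(items):
    """Build a plain Markdown digest with the full article text for each link."""
    sections = []
    for item in items:
        sections.append(f"## [{item['title']}]({item['url']})\n\n{item['text']}\n")
    return "\n".join(sections)


def archive_path(folder, day=None):
    """One dated file per run, e.g. 2024-11-08-links.md (the naming is an assumption)."""
    day = day or date.today()
    return os.path.join(folder, f"{day.isoformat()}-links.md")


items = [{"title": "Example post", "url": "https://example.com/post", "text": "Full article text."}]
path = archive_path(tempfile.mkdtemp(), date(2024, 11, 8))
with open(path, "w", encoding="utf-8") as f:
    f.write(build_markdown(items))
print(os.path.basename(path))
```

Building the Markdown separately from the HTML post keeps the archive readable in any text editor, with no tags to wade through.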

Some changes I still want to make

  • Right now it dumps everything into one output file, I may split this across blogs/topics
  • Another advantage of the private archive: I can add any number of additional tags to pull that don’t have to be posted anywhere. I can just pull them to a text archive. I already have started a recipes tag, for example. I also added a to-be-used flag in the config for whether a feed gets posted anywhere.
  • I am kind of down on the idea of AI, but I still may look into hooking the summary function to some sort of AI service to create actual summaries. In my very very vague testing, it had a hard time keeping it short, even when instructed to do so.
  • I kind of want to modify the script to also produce a queue of links, then maybe a second script on a schedule that posts any links out of a file on microblogging services like Mastodon.
  • I would love to find a way to share links I find elsewhere to FreshRSS or this script.
  • I kind of want to find a way to sort and group posts under categories (Music, Coding, Video Games, etc). I have ideas on how, but they are not all very… Elegant.
  • I kind of dislike having the timestamp file, I would like to figure out a way to query the WordPress Blog itself for “The last post marked link list,” and go off of that as the “last run date.”

The “Holy Grail” want is the ability to add comments to shares. I put in a suggestion on the GitHub page for a “Notes” feature. I am seriously just considering making my own plugin. This would be super useful for WHY I shared a link. I could use this later in the social sharing queue system as well. The idea would be, as an example, I tag a post for a new CHVRCHES album, then add a little note, “I am super looking forward to this!”, then on the digest, the comment would show up.

Twitter Archive to Markdown

I have been wanting to convert my Twitter Export archive into a more useful format for a while. I had vague dreams of maybe turning them into some sort of digest posts on WordPress but there are a fuckton of tweets (25,000) and it would require a LOT of scrubbing. I could probably manage to find and remove Retweets pretty easily, but then there is the issue of media and getting the images into the digest posts and it’s just not worth the hassle.

What I can, and did do, is preserve the data in a better, more digestible, and searchable format. Specifically, Markdown. Well, ok, the files are not doing anything fancy, so it’s just plaintext, pretending to be Markdown.

I have no idea if Twitter still offers an export dump of your data, I have not visited the site at all in over a year. I left, I deleted everything, I blocked it on my DNS. But assuming they do, or you have one, it’s a big zip file that can be unrolled into a sort of local, Twitter-like interface. There are a lot of files in this ball, and while I am keeping the core archive, I just mostly care about the content.

If you dig in, it’s easy to find: there is a folder called data, and the tweets are in a file called “tweets.js”. It’s a JSON-style format. If you want the media, it’s in a folder called “Tweets_Media” or something like that. I skimmed through mine, and most of the images looked familiar because I already have them, so I removed the copy because I didn’t need it.

But I kept the tweets.js file.

So, what to do with it? It has a bunch of extraneous metadata for each Tweet that makes it a cluttered mess. It’s useful for a huge website, but all I want is the date and the text. Here is a sample Tweet from the file.

{
    "tweet" : {
      "edit_info" : {
        "initial" : {
          "editTweetIds" : [
            "508262277464608768"
          ],
          "editableUntil" : "2014-09-06T15:05:44.661Z",
          "editsRemaining" : "5",
          "isEditEligible" : true
        }
      },
      "retweeted" : false,
      "source" : "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
      "entities" : {
        "hashtags" : [ ],
        "symbols" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
      },
      "display_text_range" : [
        "0",
        "57"
      ],
      "favorite_count" : "0",
      "id_str" : "508262277464608768",
      "truncated" : false,
      "retweet_count" : "0",
      "id" : "508262277464608768",
      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",
      "favorited" : false,
      "full_text" : "\"Sorry, you are over the limit for spam reports!\"  Heh...",
      "lang" : "en"
    }
  },

So I wrote a quick and simple Python script (it’s below). I probably could have done something fancy with Beautiful Soup or Pandas, but instead I did a quick and basic scan that pulls the data I care about. If a line contains “created_at”, pull it out to get the date. If it has “full_text”, pull it out to get the text.
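For comparison, the same two fields could also be pulled with Python's built-in json module instead of a line scan. This is only a sketch. It assumes the file is a JavaScript assignment (something like window.YTD.tweets.part0 = [...]) wrapped around an ordinary JSON array, which is a guess about the export format and not something the sample above confirms. The load_tweets name is made up for the example.

```python
import json


def load_tweets(raw_text):
    """Parse the text of tweets.js into (created_at, full_text) pairs."""
    # Assumption: everything before the first "[" is a JavaScript
    # assignment prefix, and the rest is a plain JSON array.
    start = raw_text.index("[")
    entries = json.loads(raw_text[start:])
    return [(e["tweet"]["created_at"], e["tweet"]["full_text"]) for e in entries]


# A trimmed-down stand-in for the real file (raw string keeps the \" escapes).
sample = r'''window.YTD.tweets.part0 = [
  {
    "tweet" : {
      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",
      "full_text" : "\"Sorry, you are over the limit for spam reports!\"  Heh..."
    }
  }
]'''
pairs = load_tweets(sample)
```

One side effect of going through json.loads is that escaped quotes and newlines inside the tweet text get decoded, where a line scan leaves them as written in the file.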

Once I was able to output these two lines, I went about cleaning them up a bit. I don’t need the titles, so I started by splitting on “:”. This was quickly problematic if the Tweet contained a colon, and because the time contains colons of its own. Instead I did a split on ‘" : "’, specifically: quote, space, colon, space, quote. Only the breaks I wanted had the spaces and quotes, so that got me through step one. The end quotation mark was easy to slice off as well.
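A small illustration of the difference, using the created_at line from the sample Tweet. The variable names are just for the example.

```python
line = '      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",\n'

# Splitting on a bare colon also cuts the time apart.
naive_parts = line.split(":")

# Splitting on quote-space-colon-space-quote only matches the key/value break.
key, value = line.split('" : "', 1)

# Slice off the closing quote, the comma, and the newline.
value = value[:-3]
```

The bare split gives four pieces, with the time scattered across three of them, while the longer separator gives exactly two.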

I considered simplifying things by using the same transformation on the date and the text, but the date also had this +0000 in it that I wanted to remove. It’s not efficient, but it was just as simple to have two very similar operations.
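As a side note, the +0000 can also be handled by actually parsing the date instead of editing the string. The sketch below shows both routes on the sample date. The format string is my reading of the created_at layout, and the %a and %b codes depend on an English locale.

```python
from datetime import datetime

raw_date = "Sat Sep 06 14:35:44 +0000 2014"

# String approach, as described above: just cut the offset out.
trimmed = raw_date.replace(" +0000 ", " ")

# Parsing approach: %z consumes the +0000 offset.
parsed = datetime.strptime(raw_date, "%a %b %d %H:%M:%S %z %Y")
pretty = parsed.strftime("%Y-%m-%d %H:%M")
```

The parsed version costs a little more effort, but it gives a real datetime that can be sorted or reformatted later.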

After some massaging, I was able to output something along the lines of “date – text”.

But then I noticed that for some reason the Tweets are apparently not in date order. I had decided that I was just going to create a series of year based archival files, so I needed them to be in order.

So I added a few more steps to sort each Tweet during processing into an array of arrays based on the year. Once again, this isn’t the cleanest code. It assumes a range of something like 2004 to 2026, which covers my needs for certain. I also had some “index out of range” errors with my array of arrays, which probably have a clever loopy solution, but instead it’s just a big pre-initialized copy/paste array.
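One possible "loopy" alternative, sketched with a defaultdict so no year range has to be set up ahead of time. This is not what the script below does. The sample data is invented, and it assumes the dates have already had the +0000 removed.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample in the same [date, text] shape the script builds.
tweet_data = [
    ["Sat Sep 06 14:35:44 2014", "second tweet of 2014"],
    ["Mon Mar 02 09:00:00 2015", "a tweet from 2015"],
    ["Wed Jan 01 08:00:00 2014", "first tweet of 2014"],
]


def parse_date(text):
    """Turn the trimmed date string into a datetime for grouping and sorting."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


# Group by year; missing years are created on first use.
tweets_by_year = defaultdict(list)
for date_text, body in tweet_data:
    tweets_by_year[parse_date(date_text).year].append([date_text, body])

# Put each year's tweets in true date order.
for year_tweets in tweets_by_year.values():
    year_tweets.sort(key=lambda tweet: parse_date(tweet[0]))
```

Because the dictionary creates a list the first time a year shows up, there is nothing to index out of range.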

Part of the motivation for doing the array of arrays was also that I could make the script output my sorted yearly files directly, but I just did it manually from the big ball final result. The job is done, but it could easily be automated by adjusting the lower output block a bit.
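A rough idea of what that adjusted output step could look like, writing one file per year. The function name and the tweets-YEAR.md file naming are made up for the example, and it takes a mapping of year to a list of [date, text] pairs.

```python
from pathlib import Path


def write_yearly_files(tweets_by_year, out_dir):
    """Write one Markdown file per year and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for year, tweets in sorted(tweets_by_year.items()):
        if not tweets:
            continue  # skip years with nothing in them
        path = out_dir / f"tweets-{year}.md"
        with open(path, mode="w", encoding="utf-8") as output:
            for date_text, body in tweets:
                output.write(f"{date_text} : {body}\n")
        written.append(path)
    return written
```

With a list of lists indexed from 2004, like the one in the script below, something like dict(enumerate(sorted_tweets, start=2004)) would produce the mapping this expects.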

Anyway, here is the code, and a link to a Git repository for it.

# A simple script that takes an exported tweets.js file and outputs it to a markdown text file for archiving.
# In pulling data for this, I noticed that older Twitter exports use a csv file instead of a .js file.
# As such, this is for newer exports.
# The Tweets.js file is in the 'data' directory of a standard Twitter archive export file.

# Open the tweets.js file containing all the tweets, should be in the same folder
with open("tweets.js", encoding="utf-8") as file:
    filedata = file.readlines()

tweet_data = []
current_tweet = []
# The Tweets don't seem to be in order, so I needed to sort them out, this is admittedly ugly
# but I only need to cover so many years of sorting and this was the easiest way to avoid index errors
sorted_tweets = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

# Does a simple search through the file.  It pulls out the date posted and the full text.
# This does not do anything with images, sorry, that gets more complicated, it would be doable
for line in filedata:
    if "created_at" in line:
        # Split on '" : "' so the colons inside the time are left alone, then drop the +0000
        posted_at = line.split("\" : \"")[1].replace(" +0000 ", " ")[:-3]
        current_tweet.append(posted_at)
    elif "full_text" in line:
        current_tweet.append(line.split("\" : \"")[1][:-3])
        #        current_tweet.append(line.split(":")[1].split("\"")[1])
        tweet_data.append(current_tweet)
        current_tweet = []
        # Because full text is always after the date, it just moves on after it gets both
    else:
        pass

# An ugly sort, it simply looks for the year in the date, then creates an array of arrays based on year.
# I did it this way partly in case I wanted to output to separate files based on year, but I can copy/paste that
# It probably is still out of order based on date, but whatever, I just want a simple archive file
for each in tweet_data:
    for year in range(2004, 2026):
        if str(year) in each[0]:
            sorted_tweets[year - 2004].append(each)

# Prints the output and dumps it to a file.
with open("output.md", encoding="utf-8", mode="w") as output:
    for eachyear in sorted_tweets:
        for each in reversed(eachyear):
            output.write(each[0] + " : " + each[1] + "\n")
            print(each[0] + " : " + each[1])