Python

Tuesday 2024-11-12 – Link List

Blogging Intensifies Link List for Tuesday 2024-11-12

2024.11.08 – Code Project – Python – Improved FreshRSS Link Lists

Today, I want to talk about recent improvements I have made to my FreshRSS to WordPress Digest Python script, and to make a note of what I would like to do next.

This is the script I use to produce these Link List posts on [Blogging Intensifies] and Lameazoid. The GitHub repository for it is here.

  • The first version was simple: it pulled from the shared feed of FreshRSS, collected favorited articles, formatted them a bit, then published a WordPress post of links.
  • Over time, I wanted these posts to be prettier, so I added a bit more massaging to the formatting, plus some HTML so the links would show up in pretty little formatted boxes. I also decided some sort of summary would be useful, so it pulls the first 100 words or so from each feed item as a teaser.
  • Initially I was using the main share feed from FreshRSS, but I have two blogs, each with vague but distinct “themes”. Sharing video game news to [BI] felt a bit silly. Not that it matters, no one reads these lists and they are mostly for my reference. I found you could narrow things down by personal tags, so I altered the Python script to handle any number of configured blogs, and now both get separate link lists (a rough sketch of the idea is below).
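
Conceptually, the per-blog setup looks something like the sketch below. This is only an illustration, not the actual structure from the repository; the tag names, keys, and URLs are made up.

# Illustrative sketch only; the real config in the repo may be shaped differently.
# Each blog gets its own FreshRSS tag and its own WordPress endpoint.
BLOGS = [
    {
        "name": "Blogging Intensifies",
        "freshrss_tag": "bi-links",  # hypothetical tag name
        "wordpress_url": "https://blog.example.com/xmlrpc.php",
    },
    {
        "name": "Lameazoid",
        "freshrss_tag": "lz-links",  # hypothetical tag name
        "wordpress_url": "https://games.example.com/xmlrpc.php",
    },
]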

So, what's new this round?

Not a lot on the externally visible end. A week or so ago, I found that the Raspberry Pi I had running the script on had died on me. Or, more likely, the SD card did. Whatever the case, the script was not running as scheduled. I have always had a bit of a love/hate relationship with the scheduled run. Some days I barely share anything from the reader, so it makes for weird, small posts. It also ran at like 10:30 PM, which was kind of late, and occasionally I found myself rushing to get through everything to make sure I flagged anything relevant, because I was flipping through it at five minutes till.

Right now, I am running it manually, and at irregular intervals. This created a new problem, but it's one I had already been planning to fix. The way the FreshRSS shared feed works, you can append a number, X, and get “everything in the last X hours.” When it ran on a cron job schedule every 24 hours, this number was simply hardcoded at 24.

When manually running things, I needed X to be “however many hours since it last ran.” So now, it writes out a simple file with a timestamp after each run. It also pulls that file back in and calculates the time difference. If the time difference is less than an hour, it defaults to an hour, because “X=0” just gives the default feed, which may be everything, or may be the last ten items; I am not sure on the limits. If there isn't a timestamp file, most likely because it's the first run, it sets the hours difference to 0 and gets everything.
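
A minimal sketch of that logic, assuming the timestamp is stored as a plain ISO-format string; the file name and function names here are illustrative, not the actual ones from the script:

from datetime import datetime
from pathlib import Path

TIMESTAMP_FILE = Path("lastrun.txt")  # hypothetical file name

def hours_since_last_run():
    # No timestamp file, probably a first run: 0 means "get everything".
    if not TIMESTAMP_FILE.exists():
        return 0
    last_run = datetime.fromisoformat(TIMESTAMP_FILE.read_text().strip())
    hours = int((datetime.now() - last_run).total_seconds() // 3600)
    # "X=0" just gives the default feed, so clamp anything under an hour to 1.
    return max(hours, 1)

def save_last_run():
    TIMESTAMP_FILE.write_text(datetime.now().isoformat())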

Something else I added this round: every time I wanted to make modifications, I needed to comment out some lines and uncomment others so the script would not spam my blog with the same post over and over. Also, I now have a timestamp file that I don't want to overwrite when testing, since the script would probably stop seeing feed items unless they were marked in the last hour.

So I added a flag variable at the top and encapsulated the business-end output in some conditional statements. Now, when I want to test, I just change the “runmode” variable at the top to “False”, and it stops posting or editing the timestamp file.
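
The guard amounts to something like this sketch; “runmode” matches the variable mentioned above, but the helper functions are placeholders (reusing the save_last_run() idea from the previous sketch):

runmode = True  # set to False while testing

digest_html = build_digest()  # placeholder for all the formatting work

if runmode:
    post_to_wordpress(digest_html)  # placeholder for the actual posting call
    save_last_run()                 # only touch the timestamp on a real run
else:
    print(digest_html)  # just inspect the output while testing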

This was also needed for my third new feature. In addition to posting to the blog, it spits everything out into a simple, dated Markdown file. This way, I have a private record of everything shared, since a lot of the point is, “I want to keep these links for reference.” Initially I just spit out the post data, but that was ugly since it was full of HTML tags. So it now compiles a second, Markdown-formatted variable that gets written to the file. Another key difference: in my private files, I dump the entire article, not just the 100-word summary.
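
Roughly, each feed item now gets formatted twice, once as HTML for the post and once as Markdown for the archive. A simplified sketch, with the item fields and CSS class assumed for illustration:

from datetime import date

html_out = ""
markdown_out = ""

for item in feed_items:  # hypothetical list of parsed feed entries
    # Blog version: a formatted box with the 100-word teaser.
    html_out += f'<div class="linkbox"><a href="{item["url"]}">{item["title"]}</a>'
    html_out += f'<p>{item["summary"]}</p></div>\n'
    # Archive version: Markdown, with the entire article text.
    markdown_out += f'## [{item["title"]}]({item["url"]})\n\n{item["full_text"]}\n\n'

with open(f"{date.today().isoformat()}-links.md", "w", encoding="utf-8") as f:
    f.write(markdown_out)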

I don't want to repost entire articles on my blog; it's rude and ugly, hence the summary. But for a private archive of text data, dumping it all is preferable. Articles disappear ALL the time; this is literally 100% of why I save and archive shit like this in the first place.

Some changes I still want to make

  • Right now it dumps everything into one output file; I may split this across blogs/topics.
  • Another advantage of the private archive: I can add any number of additional tags to pull that don't have to be posted anywhere; I can just pull them to a text archive. I have already started a recipes tag, for example. I also added a flag in the config for whether a feed gets posted anywhere.
  • I am kind of down on the idea of AI, but I still may look into hooking the summary function to some sort of AI service to create actual summaries. In my very, very vague testing, it had a hard time keeping summaries short, even when instructed to do so.
  • I kind of want to modify the script to also produce a queue of links, then have a second script on a schedule that posts links out of that file to microblogging services like Mastodon.
  • I would love to find a way to share links I find elsewhere to FreshRSS or this script.
  • I kind of want to find a way to sort and group posts under categories (Music, Coding, Video Games, etc). I have ideas on how, but they are not all very… elegant.
  • I kind of dislike having the timestamp file; I would like to figure out a way to query the WordPress blog itself for “the last post marked link list” and go off of that as the “last run date” (a rough sketch of that query follows this list).
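
If I go that route, the WordPress REST API can hand back the most recent post in a category, and its date could stand in for the timestamp file. A rough sketch of the idea; the site URL and category ID are placeholders:

import requests
from datetime import datetime

# Placeholder URL and category ID for the "Link List" category.
API_URL = "https://blog.example.com/wp-json/wp/v2/posts"
resp = requests.get(API_URL, params={"categories": 123, "per_page": 1, "orderby": "date"})
resp.raise_for_status()
posts = resp.json()
if posts:
    last_run = datetime.fromisoformat(posts[0]["date"])  # site-local post date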

The “Holy Grail” want is the ability to add comments to shares. I put in a suggestion on the GitHub page for a “Notes” feature, and I am seriously considering just making my own plug-in. This would be super useful for noting WHY I shared a link, and I could use it later in the social sharing queue system as well. The idea would be, as an example: I tag a post about a new CHVRCHES album, then add a little note, “I am super looking forward to this!”, and the comment would show up in the digest.

Keyboard Jiggler with Python and Raspberry Pi

So I did a pretty simple but amusing little project recently, a bit on a whim. Let's say I have a few games that it would be useful to just let idle for experience or whatever. The problem is that these games also have built-in idle deterrence. Your character falls asleep, or you just time out of the game after five or ten minutes.

I initially started off trying to use AutoHotkey, a program that does basically what I wanted here: you program it to press keys at certain intervals, basically just a simple keep-alive movement every couple of minutes.

It turns out at least one of these games detects AutoHotkey as a cheat and won't launch while it's running.

I got to thinking, I could probably program one of my Arduino boards to emulate a keyboard. And sure enough, there are libraries for this very task. Then I discovered that the keyboard library does not work on my old Uno boards. But I found an alternative route with the Raspberry Pi Pico I picked up a few years ago; the Pico could do what I needed as well.

After some digging online, I found plenty of guides on how to build a full-sized keyboard using GPIO pins on the Pico, but nothing quite exactly what I needed. There was enough Python code available, though, that I could figure it out pretty easily by stripping apart some full keyboard code. Instead, I started with a simpler macro button keyboard script. Most of the scripts I came across have code to detect and save a button press, then send the command using the Python keyboard library. I just stripped all that out and put it on a timer loop with some simple, regular input: a set of repeated w presses, followed by repeated presses of a, s, and d.

Essentially, “walk in a little square loop.”

Step one was to set up CircuitPython on the Pi Pico, which is detailed here, though only the initial setup is needed.

The test using Notepad worked perfectly, aside from one annoying issue: I could not easily edit the code while the device was plugged in, because the test loop would spit out wwwaaasssddd every 5 seconds.

With some careful, quick timing, I adjusted that out to every 300 seconds (5 minutes).

It was time to test things out. Before work, I set the game running and the keyboard Jiggler working. When I came back later it was working just fine.

But there were some issues, one of which I could mostly address.

Firstly, the thing just runs forever. It doesn't really need to; I only need it going for around 3-4 hours. I added a counter variable to the script that counts how many runs through the loop have occurred, and if it passes a set amount, it breaks the loop, which stops the scripted movement and lets the game idle out.

The other two issues are less easy to solve.

One, I had a thought that I could remote into my PC and swap games halfway through the day (at lunch). Except when you leave Remote Desktop, it locks the remote PC, meaning the game loses focus and the keyboard jiggler stops working. It is, after all, literally a hardware device that pretends to be a keyboard, so a locked screen defeats it.

The second issue could be solved with some habit changes. I very often use Firefox's Tab Share to send tabs to either my desktop or my laptop. These are articles I want to clip and save, things I may want to buy later, or notes for some project. Basically, it's a way to send myself a reminder of something I don't want to deal with on my phone. When I send a tab, Firefox pops up on the remote machine and takes focus, meaning, once again, the game idle breaker stops working and it idles out.

The solution here is to just, get into the habit of only sending tabs after noon or so.

Another little improvement I added was a bit of randomization. I am not really worried about “detection,” but it's easy to avoid by simply adjusting the 5-minute timer to a random interval between 4 and 5 minutes, as well as randomizing the movement itself a bit. I also added a bit of correction if the player moves too far away from the starting position.

Anyway, below is the completed script.

import time
import random

import board
import digitalio
import usb_hid
## Acquired from https://github.com/adafruit/Adafruit_CircuitPython_HID
from adafruit_hid.keyboard import Keyboard
from adafruit_hid.keyboard_layout_us import KeyboardLayoutUS
from adafruit_hid.keycode import Keycode

time.sleep(1)
keyboard = Keyboard(usb_hid.devices)
keyboard_layout = KeyboardLayoutUS(keyboard)  # We're in the US :)
led = digitalio.DigitalInOut(board.LED)
led.direction = digitalio.Direction.OUTPUT
total_runs = 0
running = True
# These are the keys to randomly choose from; this is standard WASD plus space.
# It could be changed to whatever keys are needed.
keyOptions = ["w","a","s","d"," "]
# "Starting position" is set to 0,0
position = [0,0]

while running:
    # Turn the LED on while doing things
    led.value = True
    
    # Randomly choose 20-50 as an amount of key presses to do
    howMany = random.randint(20, 50)

    for i in range(howMany):
        # For however many key presses, pick a random one and press it
        nextKey = random.choice(keyOptions)
        keyboard_layout.write(nextKey)
        time.sleep(1)
        # This increments the position from 0,0 to track how far we are from start.
        # This whole section could be omitted if movement can be unconstrained.
        if nextKey == "w":
            position[1] += 1
        if nextKey == "s":
            position[1] -= 1
        if nextKey == "a":
            position[0] -= 1
        if nextKey == "d":
            position[0] += 1

        # If we get too far in one direction, correct it by walking back to 0.
        if position[0] >= 10:
            keyboard_layout.write("a" * 10)
            position[0] = 0
        if position[0] <= -10:
            keyboard_layout.write("d" * 10)
            position[0] = 0
        if position[1] >= 10:
            keyboard_layout.write("s" * 10)
            position[1] = 0
        if position[1] <= -10:
            keyboard_layout.write("w" * 10)
            position[1] = 0

    led.value = False
    time.sleep(0.1)
    keyboard.release_all()
    # Sleep a random number of seconds between 200 and 300 seconds
    nextSleep = random.randint(200, 300)
    time.sleep(nextSleep)

    # Increment how many runs have been done
    total_runs = total_runs+1

    # If the total runs is too many, break the loop and essentially "stop".
    if total_runs > 48:
        running = False

Monday 2024-08-26 – Link List

Blogging Intensifies Link List for Monday 2024-08-26

Twitter Archive to Markdown

I have been wanting to convert my Twitter export archive into a more useful format for a while. I had vague dreams of maybe turning it into some sort of digest posts on WordPress, but there are a fuckton of tweets (25,000) and it would require a LOT of scrubbing. I could probably manage to find and remove retweets pretty easily, but then there is the issue of media and getting the images into the digest posts, and it's just not worth the hassle.

What I can, and did, do is preserve the data in a better, more digestible, and searchable format. Specifically, Markdown. Well, OK, the files are not doing anything fancy, so it's really just plain text pretending to be Markdown.

I have no idea if Twitter still offers an export dump of your data; I have not visited the site at all in over a year. I left, I deleted everything, I blocked it on my DNS. But assuming they do, or you have one, it's a big zip file that can be unrolled into a sort of local, Twitter-like interface. There are a lot of files in this ball, and while I am keeping the core archive, I mostly just care about the content.

If you dig in, it's easy to find: there is a folder called data, and the tweets are in a file called “tweets.js”. It's JSON wrapped in a bit of JavaScript. If you want the media, it's in a folder called “Tweets_Media” or something like that. I skimmed through mine, and most of the images looked familiar because I already have them, so I removed that copy since I didn't need it.

But I kept the Tweets.js file.

So, what to do with it? It has a bunch of extraneous metadata for each tweet that makes it a cluttered mess. It's useful for a huge website, but all I want is the date and the text. Here is a sample tweet from the file.

{
    "tweet" : {
      "edit_info" : {
        "initial" : {
          "editTweetIds" : [
            "508262277464608768"
          ],
          "editableUntil" : "2014-09-06T15:05:44.661Z",
          "editsRemaining" : "5",
          "isEditEligible" : true
        }
      },
      "retweeted" : false,
      "source" : "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
      "entities" : {
        "hashtags" : [ ],
        "symbols" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
      },
      "display_text_range" : [
        "0",
        "57"
      ],
      "favorite_count" : "0",
      "id_str" : "508262277464608768",
      "truncated" : false,
      "retweet_count" : "0",
      "id" : "508262277464608768",
      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",
      "favorited" : false,
      "full_text" : "\"Sorry, you are over the limit for spam reports!\"  Heh...",
      "lang" : "en"
    }
  },

So I wrote a quick and simple Python script (it's below). I probably could have done something fancy with Beautiful Soup or Pandas, but instead I did a quick and basic scan that pulls out the data I care about: if a line contains “created_at”, pull it to get the date; if it has “full_text”, pull it to get the text.

Once I was able to output these two lines, I went about cleaning them up a bit. I don't need the titles, so I started by splitting on “:”. This quickly became problematic, both when a tweet contained a colon and because the time contained several colons. Instead I did a split on ‘" : "’. Specifically: quote, space, colon, space, quote. Only the breaks I wanted had the spaces and quotes, so that got me through step one. The end quotation mark was easy to slice off as well.

I considered simplifying things by using the same transformation on the date and the text, but the date also had this +0000 in it that I wanted to remove. It's not efficient, but it was just as simple to have two very similar operations.

After some massaging, I was able to output something along the lines of “date – text”.
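
As a toy demonstration on a sample line from above; note the trailing newline, which is why the slice is [:-3], stripping the closing quote, comma, and newline:

line = '"created_at" : "Sat Sep 06 14:35:44 +0000 2014",\n'
posted_at = line.split("\" : \"")[1].replace(" +0000 ", " ")[:-3]
print(posted_at)  # Sat Sep 06 14:35:44 2014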

But then I noticed that, for some reason, the tweets are apparently not in date order. I had decided I was just going to create a series of year-based archival files, so I needed them to be in order.

So I added a few more steps to sort each tweet during processing into an array of arrays based on the year. Once again, this isn't the cleanest code; it assumes a range of something like 2004 to 2026, which covers my needs for certain. I also had some “index out of range” errors with my array of arrays, which probably have a clever loopy solution, but instead it's just a big pre-initialized copy/paste array.

Part of the motivation for the array of arrays was that I could make the script output the sorted yearly files directly, but I just did that manually from the big final result. The job is done, but it could easily be automated by adjusting the lower output block a bit.

Anyway, here is the code, and a link to a Git repository for it.

# A simple script that takes an exported tweets.js file and outputs it to a markdown text file for archiving.
# In pulling data for this, I noticed that older Twitter exports use a csv file instead of a .js file.
# As such, this is for newer exports.
# The Tweets.js file is in the 'data' directory of a standard Twitter archive export file.

# Open the tweets.js file containing all the tweets; it should be in the same folder
with open("tweets.js", encoding="utf-8") as file:
    filedata = file.readlines()

tweet_data = []
current_tweet = []
# The tweets don't seem to be in order, so I needed to sort them; this is admittedly ugly,
# but I only need to cover so many years and this was the easiest way to avoid index errors
sorted_tweets = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

# Does a simple search through the file.  It pulls out the date posted and the full text.
# This does not do anything with images, sorry, that gets more complicated, it would be doable
for line in filedata:
    if "created_at" in line:
        timesplit = line.split(":")
        posted_at = line.split("\" : \"")[1].replace(" +0000 ", " ")[:-3]
        current_tweet.append(posted_at)
    elif "full_text" in line:
        current_tweet.append(line.split("\" : \"")[1][:-3])
        #        current_tweet.append(line.split(":")[1].split("\"")[1])
        tweet_data.append(current_tweet)
        current_tweet = []
        # Because full text is always after the date, it just moves on after it gets both
    else:
        pass

# An ugly sort: it looks for the year in the date, then files each tweet into an array of arrays by year.
# I did it this way partly in case I wanted to output separate files based on year; I can copy/paste that.
# It is probably still out of order within each year, but whatever, I just want a simple archive file
for each in tweet_data:
    for year in range(2004, 2026):
        if str(year) in each[0]:
            sorted_tweets[year - 2004].append(each)

# Prints the output and dumps it to a file.
with open("output.md", encoding="utf-8", mode="w") as output:
    for eachyear in sorted_tweets:
        for each in reversed(eachyear):
            output.write(each[0] + " : " + each[1] + "\n")
            print(each[0] + " : " + each[1])