Coding

Advent of Code 2024 – Day 01 – Historian Hysteria

Apparently I only attempt to do Advent of Code every other year.  I had this vague thought about doing it in JavaScript this year, but considering I am way better at Python than JS, and I have yet to actually complete one of these 100%, I should stick with what I am good-ish at.

The early days are always stupid easy.  It’s usually around the halfway point that they throw in some super goofy and hard problem that I get stuck on.  The main change this year is doing it through the Bash terminal, though I think I did it this way in 2020 too.  I may swap later, who knows.  The only real annoyance is opening and closing files to do test runs.

I also have not done a lot of coding in a while, so I had to look up how to do a few things.  It was a good starter scenario though; it used a lot of things I had forgotten.

I don’t do anything super fancy with my code on these; I have seen some annoyingly unreadable one-liners a few times.  I always prefer clean code.  The thing runs in essentially zero time anyway, I don’t need to be “milliseconds more efficient”.

Anyway, here is my solution for parts 1 and 2 of Day 1.

with open("Day01Input.txt") as file:
    data = file.read()

# Split each line into its pair of numbers; split() with no argument
# handles any amount of whitespace between them.
sets = [each.split() for each in data.split("\n")]
sets.pop()  # drop the empty entry left by the trailing newline

list1 = []
list2 = []

for each in sets:
    list1.append(int(each[0]))
    list2.append(int(each[1]))

list1.sort()
list2.sort()

total = 0
total2 = 0
for i in range(len(list1)):
    total += abs(list1[i] - list2[i])            # Part 1: total distance
    total2 += list1[i] * list2.count(list1[i])   # Part 2: similarity score

print(total)
print(total2)
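As an aside, part 2 rescans list2 with .count() for every value in list1, which is quadratic. It runs in essentially zero time here anyway, but a Counter version is just as readable; a quick alternative sketch:

from collections import Counter

# Tally each right-list number once, then look each one up in constant time.
counts = Counter(list2)
total2 = sum(n * counts[n] for n in list1)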

Iterating Again, Already

A few days ago I posted about updates to my little FreshRSS script. I have already made some fairly quick, simple updates, though some didn’t quite work out as hoped initially.

Firstly, I didn’t care for the big blob daily article file it pushed out to the archive. So now it spits out a bunch of individual articles, named “YYYY-MM-DD-Article Title.md”. This created my first issue: I got a “Filename too long” error. So I set up a check to truncate the article title to 100 characters, if it’s longer than 100 characters. I can’t just always slice it, because if it’s short, I will get an “index out of range” error. I then ran into a second issue. One of the articles I tagged was about the new Ranma 1/2 series. The system totally choked on the / in 1/2.

I tried a few things to replace any /?%$&#@ symbols, because they all felt like potential catches. Working with slashes is a pain though, since the forward slash is the path separator (and the backslash is the escape character). I ended up just untagging the article so the problem would go away for now.
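For reference, the scrub-and-truncate check can be pretty small. A rough sketch of the idea, using the symbol list and the 100-character cap mentioned above (the function name is made up; and for what it’s worth, a plain slice never raises on a short string in Python, so it covers both lengths):

def safe_filename(title):
    # Swap out anything that tends to upset filesystems or shells
    for ch in "/\\?%$&#@":
        title = title.replace(ch, "-")
    # Slicing is safe even when the title is under 100 characters
    return title[:100]

print(safe_filename("Ranma 1/2 Gets a New Series"))  # Ranma 1-2 Gets a New Series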

The second change I made was to add a conditional for whether the feed has a “Yes” under “Publish”. This allows me to add more feeds that simply write out articles, without posting links anywhere.

After running it, though, I noticed something. None of the new non-posting Notes or Recipes feeds were making files. It seems they were all older than the “last run” time. Since I only needed them to work once, I just forced the feed to look at the last 1000+ hours.

This created another error with the WordPress module. These files-only feeds don’t have URLs or usernames and so on, so when the WP module asked for this information, it just crapped out. I moved the call that creates the connection to inside the “should I post” conditional, which fixed the issue.
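In other words, something like this minimal sketch, assuming a python-wordpress-xmlrpc style Client and a hypothetical config shape:

from wordpress_xmlrpc import Client

# Hypothetical config entries; the real script reads these from its config
feeds = [
    {"publish": "Yes", "url": "https://example.com", "username": "me", "password": "secret"},
    {"publish": "No"},  # a files-only feed, with no URL or credentials at all
]

for feed in feeds:
    if feed.get("publish") == "Yes":
        # Only build the connection when actually posting, so files-only
        # feeds never get asked for credentials they don't have
        wp = Client(feed["url"] + "/xmlrpc.php", feed["username"], feed["password"])
        # ... build and send the WordPress post here ...
    # ... write the archive file either way ...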

Another thing I tried to do, though this is less coding and more convenience: the script runs in a virtual environment right now, which is “good practice” but annoying to use. I’ve been trying to work out how to run it without all the hassle, though so far nothing I have found is really definitive. I have tried running the version of Python in the virtual environment, i.e. “venv/bin/python3 main.py”, but that doesn’t seem to work. I saw a suggestion of using a shebang in the top line, which I tried, but that didn’t work either. So I am just back to “cd Scripts/Python/FreshRSS-Poster/”, then “source venv/bin/activate”, then “python3 main.py”, which is annoying, but it works.
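For what it’s worth, the shebang route normally does work, but only with an absolute path to the venv’s interpreter, and the file has to be marked executable (chmod +x main.py), after which it runs as ./main.py. A sketch of the top of main.py, with a made-up path:

#!/home/user/Scripts/Python/FreshRSS-Poster/venv/bin/python3
# The path above must be absolute; a relative shebang resolves against
# whatever directory you happen to run from, not the script's location.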

There are still a few things I want to work out.

  • When I save files in other methods, it embeds images in them. I would like to adjust the formatting to mimic that.
  • The dates on the files are all the date the script was run, though they should not be. I need to look into that.
  • I should sort the file outputs into directories based on the blog or topic tag.

2024.11.08 – Code Project – Python – Improved FreshRSS Link Lists

Today, I want to talk about recent improvements I have made to my FreshRSS to WordPress Digest Python script, and to make a note of what I would like to do next.

This is the script I use to produce these Link List posts on [Blogging Intensifies] and Lameazoid. The GitHub repository for it is here.

  • The first version was simple, it pulled from the shared feed of FreshRSS, collected favorited articles, formatted them a bit, then posted a WordPress post of links.
  • Over time, I wanted these posts to be prettier, so I added a bit more massaging to the formatting, and some HTML code so the links would show up in pretty little formatted boxes. I also decided some sort of summary would be useful, so it pulls the first 100 words or so from a feed item as a teaser.
  • Initially I was using the main share feed from FreshRSS, but I have two blogs, each with vague but distinct “themes”. Sharing video game news to [BI] felt a bit silly. Not that it matters, no one reads those lists and it’s mostly for my reference. I found you could narrow things down by personal tags, so I altered the Python script to handle any number of configured blogs, and now both get separate link lists.

So, what’s new this round?

Not a lot on the externally visible end. A week or so ago, I found that the Raspberry Pi I had running the script had died on me. Or, more likely, the SD card did. Whatever the case, the script was not running as scheduled. I have always had a bit of a love/hate relationship with the scheduled run. Some days I barely share anything from the reader, so it makes for weird, small posts. Also, it ran at around 10:30 PM, which was kind of late, and occasionally I found myself rushing to get through everything to make sure I flagged anything relevant, because I was flipping through it at five minutes till.

Right now, I am running it manually, and at irregular intervals. This created a new problem, but it’s one I had already been planning to fix. The way the FreshRSS shared feed works, you can append a number, X, and get “everything in the last X hours.” When it ran on a cron job schedule every 24 hours, this number was simply hardcoded at 24.

When manually running things, I needed X to be “however many hours since it last ran.” So now, it writes out a simple file with a timestamp after each run. It also pulls in that file and calculates the time difference. If the time difference is less than an hour, it defaults to an hour, because “X=0” just gives the default feed, which may be everything, or may be the last ten items; I am not sure of the limits. If there isn’t a timestamp file, most likely because it’s the first run, it sets the hours difference to 0 and gets everything.
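A minimal sketch of that logic, assuming a timestamp file name and ISO-format stamp (both made up here):

from datetime import datetime
from pathlib import Path

STAMP_FILE = Path("lastrun.txt")  # hypothetical name

def hours_since_last_run():
    if not STAMP_FILE.exists():
        return 0  # first run: 0 falls through to the feed's default behavior
    last_run = datetime.fromisoformat(STAMP_FILE.read_text().strip())
    hours = int((datetime.now() - last_run).total_seconds() // 3600)
    return max(hours, 1)  # anything under an hour still asks for one full hour

def record_run():
    STAMP_FILE.write_text(datetime.now().isoformat())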

Something else I added this round: every time I wanted to make modifications, I needed to comment out some lines and uncomment others, so the script would not spam my blog with the same post over and over. Also, I now have a timestamp file that I don’t want to overwrite when testing, since the script would probably stop seeing feed items unless they were marked in the last hour.

So I added a flag variable at the top and encapsulated the business-end output in some conditional statements. Now, when I want to test, I just change the “runmode” variable at the top to “False”, and it stops posting and editing the timestamp file.
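The guard itself is nothing fancy, roughly this shape (the helper is a stand-in):

RUNMODE = True  # set to False while testing

def publish(post_html):
    print("posting to WordPress...")  # stand-in for the real posting call

if RUNMODE:
    publish("<p>digest body</p>")
    # ... and only a real run rewrites the timestamp file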

This was also needed for my third new feature. In addition to posting to the blog, it now spits everything out into a simple, dated Markdown file. This way, I have a private record of everything shared, since a lot of the point is “I want to keep these links for reference.” Initially I just spat out the post data, but that was ugly since it was full of HTML tags. So it now compiles a second, Markdown-formatted variable that gets written to the file. Another key difference: in my private files, I dump the entire article, not just the 100-word summary.
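Roughly, the loop now builds the two versions side by side, something like this (the item fields and loop shape are assumptions about the feed data):

# Hypothetical shape of a shared feed entry
shared_items = [
    {"title": "Example", "url": "https://example.com",
     "summary": "the 100-word teaser...", "full_text": "the whole article..."},
]

html_out = ""
md_out = ""
for item in shared_items:
    # HTML version for the blog post, with just the teaser
    html_out += (
        f'<div class="link-box"><a href="{item["url"]}">{item["title"]}</a>'
        f'<p>{item["summary"]}</p></div>\n'
    )
    # Markdown version for the private archive, with the full article
    md_out += f'## [{item["title"]}]({item["url"]})\n\n{item["full_text"]}\n\n'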

I don’t want to repost entire articles on my blog, it’s rude and ugly, hence the summary, but for a private archive of text data, dumping it all is preferable. Articles disappear ALL the time; this is literally 100% of why I save and archive shit like this in the first place.

Some changes I still want to make

  • Right now it dumps everything into one output file; I may split this across blogs/topics.
  • Another advantage of the private archive: I can add any number of additional tags to pull that don’t have to be posted anywhere. I can just pull them to a text archive. I have already started a recipes tag, for example. I also added a flag in the config for whether a feed gets posted anywhere.
  • I am kind of down on the idea of AI, but I still may look into hooking the summary function to some sort of AI service to create actual summaries. In my very, very vague testing, it had a hard time keeping summaries short, even when instructed to do so.
  • I kind of want to modify the script to also produce a queue of links, then maybe a second script on a schedule that posts any links out of a file on microblogging services like Mastodon.
  • I would love to find a way to share links I find elsewhere to FreshRSS or this script.
  • I kind of want to find a way to sort and group posts under categories (Music, Coding, Video Games, etc.). I have ideas on how, but they are not all very… elegant.
  • I kind of dislike having the timestamp file; I would like to figure out a way to query the WordPress blog itself for “the last post marked Link List,” and go off of that as the “last run date.”

The “Holy Grail” want is the ability to add comments to shares. I put in a suggestion on the GitHub page for a “Notes” feature, and I am seriously considering making my own plugin. This would be super useful for recording WHY I shared a link. I could use it later in the social sharing queue system as well. The idea would be, as an example: I tag a post for a new CHVRCHES album, then add a little note, “I am super looking forward to this!”, and on the digest, the comment would show up.

Twitter Archive to Markdown

I have been wanting to convert my Twitter export archive into a more useful format for a while. I had vague dreams of maybe turning them into some sort of digest posts on WordPress, but there are a fuckton of tweets (25,000) and it would require a LOT of scrubbing. I could probably manage to find and remove retweets pretty easily, but then there is the issue of media, and getting the images into the digest posts, and it’s just not worth the hassle.

What I can, and did, do is preserve the data in a better, more digestible, and searchable format. Specifically, Markdown. Well, OK, the files are not doing anything fancy, so it’s really just plain text pretending to be Markdown.

I have no idea if Twitter still offers an export dump of your data; I have not visited the site at all in over a year. I left, I deleted everything, I blocked it on my DNS. But assuming they do, or you have one, it’s a big zip file that can be unrolled into a sort of local, Twitter-like interface. There are a lot of files in this ball, and while I am keeping the core archive, I mostly just care about the content.

If you dig in, it’s easy to find: there is a folder called data, and the tweets are in a file called “tweets.js”. It’s JSON wrapped in a JavaScript variable assignment. If you want the media, it’s in a folder called “Tweets_Media” or something like that. I skimmed through mine, and most of the images looked familiar, because I already have them, so I removed the copy because I didn’t need it.

But I kept the tweets.js file.

So, what to do with it? It has a bunch of extraneous metadata for each Tweet that makes it a cluttered mess. That’s useful for a huge website, but all I want is the date and the text. Here is a sample Tweet from the file.

{
    "tweet" : {
      "edit_info" : {
        "initial" : {
          "editTweetIds" : [
            "508262277464608768"
          ],
          "editableUntil" : "2014-09-06T15:05:44.661Z",
          "editsRemaining" : "5",
          "isEditEligible" : true
        }
      },
      "retweeted" : false,
      "source" : "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
      "entities" : {
        "hashtags" : [ ],
        "symbols" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
      },
      "display_text_range" : [
        "0",
        "57"
      ],
      "favorite_count" : "0",
      "id_str" : "508262277464608768",
      "truncated" : false,
      "retweet_count" : "0",
      "id" : "508262277464608768",
      "created_at" : "Sat Sep 06 14:35:44 +0000 2014",
      "favorited" : false,
      "full_text" : "\"Sorry, you are over the limit for spam reports!\"  Heh...",
      "lang" : "en"
    }
  },

So I wrote a quick and simple Python script (it’s below). I probably could have done something fancy with Beautiful Soup or Pandas, but instead I did a quick and basic scan that pulls out the data I care about: if a line contains “created_at”, pull it out to get the date; if it has “full_text”, pull it out to get the text.

Once I was able to output these two lines, I went about cleaning them up a bit. I don’t need the field names, so I started by splitting on “:”. This quickly got problematic, both when a Tweet contained a colon and because the time contained several colons. Instead I did a split on ‘" : "’. Specifically: quote, space, colon, space, quote. Only the breaks I wanted had the spaces and quotes, so that got me through step one. The end quotation mark was easy to slice off as well.

I considered simplifying things by using the same transformation on the date and the text, but the date also had this +0000 in it that I wanted to remove. It’s not efficient, but it was just as simple to have two very similar operations.

After some massaging, I was able to output something along the lines of “date – text”.

But then I noticed that, for some reason, the Tweets are apparently not in date order. I had decided I was just going to create a series of year-based archive files, so I needed them to be in order.

So I added a few more steps to sort each Tweet during processing into an array of arrays based on the year. Once again, this isn’t the cleanest code; it assumes a range of something like 2004 to 2026, which covers my needs for certain. I also had some “index out of range” errors with my array of arrays, which probably have a clever loopy solution, but instead it’s just a big pre-initialized copy/paste array.

Part of the motivation for the array of arrays was that I could make the script output my sorted yearly files directly, but I just did that manually from the big final result. The job is done, but it could easily be automated by adjusting the lower output block a bit.

Anyway, here is the code, and a link to a Git repository for it.

# A simple script that takes an exported tweets.js file and outputs it to a markdown text file for archiving.
# In pulling data for this, I noticed that older Twitter exports use a csv file instead of a .js file.
# As such, this is for newer exports.
# The tweets.js file is in the 'data' directory of a standard Twitter archive export.

# Open the tweets.js file containing all the tweets; it should be in the same folder.
with open("tweets.js", encoding="utf-8") as file:
    filedata = file.readlines()

tweet_data = []
current_tweet = []
# The Tweets don't seem to be in order, so I needed to sort them out. This is admittedly ugly,
# but I only need to cover so many years of sorting and this was the easiest way to avoid index errors.
sorted_tweets = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

# Does a simple search through the file.  It pulls out the date posted and the full text.
# This does not do anything with images, sorry; that gets more complicated, though it would be doable.
for line in filedata:
    if "created_at" in line:
        # Split on quote-space-colon-space-quote so the colons inside the timestamp
        # don't break things, then strip " +0000 " and the trailing quote/comma.
        posted_at = line.split("\" : \"")[1].replace(" +0000 ", " ")[:-3]
        current_tweet.append(posted_at)
    elif "full_text" in line:
        current_tweet.append(line.split("\" : \"")[1][:-3])
        tweet_data.append(current_tweet)
        current_tweet = []
        # Because full_text always comes after the date, it just moves on once it has both.
    else:
        pass

# An ugly sort: it simply looks for the year in the date, then buckets each Tweet into an array of arrays by year.
# I did it this way partly in case I wanted to output to separate files based on year, but I can copy/paste that.
# Each year is probably still out of order by date, but whatever, I just want a simple archive file.
for each in tweet_data:
    for year in range(2004, 2026):
        if str(year) in each[0]:
            sorted_tweets[year - 2004].append(each)

# Prints the output and dumps it to a file.
with open("output.md", encoding="utf-8", mode="w") as output:
    for eachyear in sorted_tweets:
        for each in reversed(eachyear):
            output.write(each[0] + " : " + each[1] + "\n")
            print(each[0] + " : " + each[1])

Free Code Camp Responsive Web Design

I have a huge pile of online courses bookmarked that I would like to run through. This does have some pitfalls; maybe I’ll get to that another day. Today I want to discuss one I finished: the FreeCodeCamp Responsive Web Design course. You get a little certification for completing these, which mostly just bolsters the part of me that doesn’t really find much value in certifications. I didn’t really need to take this course, but it’s part of the basic “core list” on FreeCodeCamp’s website, and my obsessive completionist mind says I should do those, in addition to anything else I might find on the site. I am not a web design expert (maybe, something something Imposter Syndrome), but of all the coding I have done, web design is what I’ve done the most. I figured this would be a breeze.

Boy was it not.

And not because it was hard.

I have been doing some other courses on this website, and there is a lot of variety in teaching styles, so this is less a criticism of FCC and more a criticism of this course. I will also add that if you were a complete beginner, it probably would be less tedious. It touches a bit on the “maybe I’ll get to that another day” back at the top of this post, in that soooo much online learning is sooooo beginner-oriented that it’s a bit of a trap.

But anyway, the Responsive Web Design course. I’ll run through some of the stuff built in a bit, but I want to address some issues I had with the course, not even the content, just the structure. It’s probably just a limitation of the automated system more than anything else.

It’s incredibly hand-holdy.

To the point of being a bit tedious, and possibly, to some extent, bad for actually learning. One of the praises I have had for that Angela Yu Python course was how well it ramped up its projects. It presented an idea, it held your hand through a project, it guided you through a second project using that concept, then it would tell you to free-form a related project. Repeat, for each new concept. Where I felt this FCC course missed was the part where it eventually lets you do more on your own. One easy example: early on, it covers the basic structure of an HTML page and things to put in the header, like the link to the style sheet or metadata. And then every lesson after, it just keeps repeating the same 3 or 4 steps to add these items.

Ideally, at some point, it would just be a step to “add the standard boilerplate HTML and header,” with no prompts on what goes in there, so that you have to do it entirely on your own.

This sort of thing shows up a lot in later lessons too. It will do things you have done several times before in this clunky step-by-step fashion: “add a width of XXX to this class,” “now set the background color to #XXXXXX,” now set the positioning. At some point, it really feels like it would be beneficial to just say, “set up this div block with these parameters,” so you are forced to do it completely on your own, instead of one step at a time, explicitly spelled out each time.

You can skip anything that isn’t part of the actual five main tests. Which is totally doable, because another issue I found was that the preceding teaching rarely had anything to do with the “free-form test” part. For the first one you build a survey form, and that one matched the lessons pretty well. The Tribute Page was close-ish, but starting to stray. The following two sections end with a Product Landing Page and a Technical Documentation Page, which are basically just slight variations on the Tribute Page. As is the final project, a Personal Portfolio page. The exercises, though, are these slightly painfully slow little CSS-based art projects. You do learn some neat techniques, but honestly, as someone who has done some front-end dev work, a lot of it is simply not practical. The CSS penguin is neat, but if I want a penguin on my webpage, I’m just going to find a PNG.

The final challenges themselves are kind of simple to cheese through as well. There is a checklist of what each one is looking for; if you meet those requirements, you pass. It doesn’t matter if the end result is even functional. For the Portfolio page, I took the code from my existing Github.io page and added some ID tags to it.

There is also this weird inconsistency of methodology. It’s most obvious in colors. There are several ways to assign colors in CSS, and which one this course uses is inconsistent, though it seems to prefer the rgb(R G B) syntax. Personally, I prefer just using hex; it’s simple and easy, just an easy #aaaaaa, that sort of thing. There is a lot in this course that actually kind of feels like an instructor trying to push some supremely anal-retentive and less-used CSS concepts on the world. Using rgb instead of hex doesn’t make you a graphic designer, even though it feels fancier. Also, classes are much preferred to IDs; there are a lot of places using IDs in this course where it should use classes.

Anyway, the projects themselves. I’ve posted the whole thing on GitHub, and I’ll point out my personal highlights.

  • CSS Colored Markers – As tedious as the course was in its teaching methods, the little artsy CSS things do turn out neat, like these little markers. This was probably the most interesting project from the first section.
  • Flexbox Photo Gallery – I actually reused this code to build a new version of my home dashboard, to replace the one I lost when my Raspberry Pi crapped out. This is probably the most valuable lesson in the whole set.
  • CHVRCHES Tribute Page – Not that exciting of a design, but the final project was to build a tribute page, so I made one for CHVRCHES.
  • Balance Sheet – It’s nice looking, but there are jQuery libraries that basically just do this, with tables.
  • Picasso Painting – I have no idea WTF this is supposed to be. Apparently, others don’t either because I noticed that at some point this lesson was replaced with one where you build a cat painting.
  • Piano – I actually want to see about combining this with part of what was learned in the FCC JavaScript class to make the Piano functional at some point.
  • Magazine Layout – It’s kind of a neat layout, I may reuse some of this code at some point, but I don’t know what I would use it for.
  • Penguin – Look at him wave, isn’t he adorable?