Code Project: Fresh RSS to WordPress Digest V 2

A while back, I talked about a simple little project I built that produces a daily RSS digest post on this blog. This of course broke when my RSS reader died on me. I managed to get FreshRSS up and running again in Docker, and I’ve been slowly recovering my feeds, which is incredibly slow and tedious because there are a shitload of feeds. I essentially have to cut and paste each URL into FreshRSS and select the category, and half the time they don’t work, so I need to make a note of it for later checking, and it’s just… slow.

But since it’s mostly working, I decided to re-set up my RSS poster. I may look into setting up a Docker instance just for running Python automations, but for now I put it on a different Pi I have floating around that plays music. The music part will be part of a different post, but for this purpose it runs a script once a day that pulls a feed, formats it, and posts it. It isn’t high overhead.
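The once-a-day part is just a cron job on the Pi. As a minimal sketch (the path and script name here are placeholders for illustration, not my actual setup):

# crontab -e on the Pi: run the digest script every morning at 7:00
# /home/pi/rss-digest/wp_digest.py is a hypothetical path and name
0 7 * * * /usr/bin/python3 /home/pi/rss-digest/wp_digest.py >> /home/pi/rss-digest/digest.log 2>&1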

While poking around on setting this up, I decided to get a bit more ambitious and found out that basically every view in FreshRSS has its own RSS feed. Previously, I was taking the feed from the Starred Articles, but it turns out that Tags each have their own feed as well. This allowed me to do something I wanted from the start, which is to create TWO feeds, one for each of my blogs. So now articles related to Technology, Politics, Food, and Music get fed into Blogging Intensifies, and articles related to toys, movies, and video games go into Lameazoid.

I’ve also filtered both of these out of the main page. I do share these little link digests for others, if they want to read them, but primarily it’s a little record for myself, to know what I found interesting and was reading that day. This way, if, say, my FreshRSS reader crashes, I still have all the old interesting links available.

The other thing I wanted to do was to use some sort of AI system to produce a summary of each article. Right now it just clips off the first 200 characters or so. At the end of the day, this is probably plenty. I’m not really trying to steal content, I just want to share links, but links are also useful with just a wee bit of context to them.

As I mentioned before, making this work involved a bit of tweaking to the scripts I was using. First off is an auth.py file, which has a structure like the one below: one dictionary for each blog, and then each dictionary gets put into a list. Adding additional blogs is as simple as adding a new dictionary and then adding the entry to the list (see the example just after the block). I could have done this with a custom class, but this was simpler.

BLOG1 = {
    "blogtitle": "BLOG1NAME",
    "url": "FEEDURL1",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG1URL",
}

BLOG2 = {
    "blogtitle": "BLOG2NAME",
    "url": "FEEDURL2",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG2URL",
}

blogs = [BLOG1, BLOG2]
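For example, a hypothetical third blog would just be one more dictionary and one more entry in the list:

BLOG3 = {
    "blogtitle": "BLOG3NAME",
    "url": "FEEDURL3",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG3URL",
}

blogs = [BLOG1, BLOG2, BLOG3]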

The script itself got a bit of modification as well, mostly the addition of a loop to go through each blog in the list, plus changing some variables to dictionary lookups instead of straight variables.

Also, please excuse the inconsistency in the f-string use. I got errors at first, so I started editing and removing the f-strings, then realized I just needed to be using Python 3 instead of Python 2.

from auth import *
import feedparser
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost
from wordpress_xmlrpc.methods import posts
import datetime
from io import StringIO
from html.parser import HTMLParser

cur_date = datetime.datetime.now().strftime('%A %Y-%m-%d')

### HTML Stripper from https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs= True
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

# Get News Feed
def get_feed(feed_url):
    NewsFeed = feedparser.parse(feed_url)
    return NewsFeed

# Create the post text
def make_post(NewsFeed, cur_blog):
    # WordPress API Point
    build_url = f'https://{cur_blog["wp_url"]}/xmlrpc.php'
    #print(build_url)
    wp = Client(build_url, cur_blog["wp_user"], cur_blog["wp_pass"])

    # Create the Basic Post Info, Title, Tags, etc. This can be edited to customize the formatting if you know what you are doing.
    post = WordPressPost()
    post.title = f"{cur_date} - Link List"
    post.terms_names = {'category': ['Link List'], 'post_tag': ['links', 'FreshRSS']}
    post.content = f"<p>{cur_blog['blogtitle']} Link List for {cur_date}</p>"
    # Insert each feed item into the post with its posted date, headline, and link to the item, and a brief summary for each.
    for each in NewsFeed.entries:
        if len(strip_tags(each.summary)) > 100:
            post_summary = strip_tags(each.summary)[0:100]
        else:
            post_summary = strip_tags(each.summary)
        post.content += f'{each.published[5:-15].replace(" ", "-")} - <a href="{each.links[0].href}">{each.title}</a></p>' \
                        f'<p>Brief Summary: "{post_summary}"</p>'
        # print(each.summary_detail.value)
        #print(each)

    # Create the actual post.
    post.post_status = 'publish'
    #print(post.content)
    # For troubleshooting and reworking, uncomment the above then comment out the below; this will print results instead of posting.
    post.id = wp.call(NewPost(post))

    try:
        if post.id:
            post.post_status = 'publish'
            wp.call(posts.EditPost(post.id, post))
    except:
        pass
        #print("Error creating post.")

# Get the news feed for each blog.
for each in blogs:
    newsfeed = get_feed(each["url"])
    # If there are posts, make them.
    if len(newsfeed.entries) > 0:
        make_post(newsfeed, each)
        #print(newsfeed.entries)

Sunday 2023-06-11 – Link List

Blogging Intensifies Link List for Sunday 2023-06-11

11-Jun-2023 – Hyundai is Doomed: Porting the 1993 Classic To a Hyundai Head Unit

Brief Summary: “In the natural order of the world, porting DOOM to any newly unlocked computing system is an absolut”

11-Jun-2023 – Marc Andreessen Criticizes ‘AI Doomers’, Warns the Bigger Danger is China Gaining AI Dominance

Brief Summary: “This week venture capitalist Marc Andreessen published “his views on AI, the risks it poses and the “

11-Jun-2023 – How to make digital business cards and share them via QR codes

Brief Summary: ”

A previous employer found it important that the whole team had business c”

11-Jun-2023 – Modern Brownie Camera Talks SD and WiFi

Brief Summary: “If you’re at all into nostalgic cameras, you’ve certainly seen the old Brownie from Kodak. They were”

11-Jun-2023 – Chocolate Mint drink with 10 times the minty refreshment: Is it really as strong as it looks?

Brief Summary: ”
Cafe de Crie chain celebrates its 10th anniversary in a big way.

With the weather heating up, it’s”

Friday 2023-06-09 – Link List

Blogging Intensifies Link List for Friday 2023-06-09

09-Jun-2023 – Trump took nuclear secrets and kept files in shower, charges say

Brief Summary: “Donald Trump is accused of keeping classified documents in a ballroom and bathroom at his Florida ho”

09-Jun-2023 – Recreating an Analog TV Test Pattern

Brief Summary: “While most countries have switched to digital broadcasting, and most broadcasts themselves have prog”

09-Jun-2023 – Apple To Stop Autocorrecting Swear Word To ‘Ducking’ On iPhone

Brief Summary: “At Apple’s developer conference earlier this week, the company said it has tweaked the iPhone’s auto”

09-Jun-2023 – ‘Shadow of the Dragon Queen’: Steel Edition’ Should Be In Any Dragonlance Fan’s Horde

Brief Summary: “It’s time to return to the Pandemonium Warehouse as today I look at Shadow of the Dragonqueen: Steel”

09-Jun-2023 – Elon Musk Says Twitter Is Going To Get Rid Of The Block Feature, Enabling Greater Harassment

Brief Summary: “One of the most important tools for trust and safety efforts is the “block” feature, allowing a user”

09-Jun-2023 – I Have No Sympathy For The Stack Overflow Moderator Strike

Brief Summary: “Well, well, well, what do we have here? The guardians of Stack Overflow, those volunteer moderators “

Re-mulching and other Activities Outside the House

I have been slacking on my posts, though technically still doing better than I had been. It’s a combination of being busy and just being generally meh overall. One thing keeping me busy was re-mulching the flower beds around the house. Not just throwing down new mulch, though; I mean raking up the old and putting down new weed barrier. This meant going around the existing plants, and the little metal stakes that hold the weed barrier down were a pain, because there is a ton of super-packed rock in the area that makes them hard to insert into the ground.

In the case of the tree out back, it also meant digging up the ground around the tree to add a completely new flower bed space. We added a lot of new plants to the area as well, though most are in pots for ease of use.

Then my wife put all her decor out again.

We also started working on the basic garden set up for the year. In the past we’ve had issues with trying to garden at this house because there is a lot of wildlife that comes around and eats or digs up everything. Right now it’s in buckets, though I plan to put legs on these wooden boxes we have to put the buckets into, which is part of what the pile of wood behind the garden plants at the bottom is for. We also may use the stairs as a tiered herb garden. It’s all wood that was salvaged from my parents’ deck, which they recently had replaced.

Anyway, here are some photos of the completed set up.

Here is a random bonus of the backyard from when I was mowing recently.

Dead Memory Cards and Using Docker

More often than it feels like it should, something in technology breaks or fails. I find that this can be frustrating, but often ultimately good, especially for learning something new and forcing myself to clean up something I’ve been meaning to clean up. I have a Raspberry Pi I’ve been using for a while as a little web server for several things. It’s been running for years, probably, but something gave out on it. I’m not entirely sure whether it’s the SD card or the Pi itself, honestly, because I’ve been having a bit of trouble trying to recover through both. It’s sort of pushed me to try a different approach.

But first I needed a new SD card. I have quite a few, but most are “in use”. I say “in use” because many are less in use and more underused. This has resulted in doing a bit of a rebuild on some other projects to make better use of my Micro SD cards. The starting point was an 8GB card with just a basic Raspbian set up on it.

So, for starters, I found that the one I have in my recently set-up music station Raspberry Pi is a whopping 128GB. Contrary to what one might think, I don’t need a 128GB card in my music station; the music is stored on the NAS over the network. It also has some old residual projects on it that should really be cleaned out.

So I stuck the 8GB card in that device and did the minor set up needed for the music station. Specifically, I configured VLC for remote control over the network, then added the network share. Once I plugged it back into my little mixer and verified I could remotely play music, I moved on.

This ended up being an unrelated side project though, because I had been planning on getting a large, speedy Micro SD card to stick in my Retroid Pocket. So I stuck that 128GB card in the Retroid and formatted it. This freed up a smaller 32GB card.

I also have a 64GB card that is basically not being used in my PiGrrl project, which I decided to recover back for use. The project was fun, but the Retroid does the same thing 1000x better, so now it’s mostly just a display piece on a shelf. Literally an overpriced paperweight. I don’t want to lose the PiGrrl configuration though, because it’s been programmed to work with the small display and IO control inputs. So I imaged that card off.

In the end, I didn’t need those Micro SD cards though; I opted for an alternative way to replace the Pi: Docker on my secondary PC. I’ve been meaning to better learn Docker, though I still find it to be a weird and obtuse bit of software. There are a handful of things I care about restoring that I used the Pi for.

  • Youtube DL – There seem to be quite a few nice Web Interfaces for this that will work much better than my old custom system.
  • WordPress Blog Archives – I have exported data files from this but I would like to have it as a WordPress Instance again
  • FreshRSS – My RSS Reader. I already miss my daily news feeds.

YoutubeDL was simple; they provided a nice basic command sequence to get things working.

The others were a bit trickier. Because the old set up died unexpectedly, the data isn’t easily exported for import, which means digging out and recovering off of the raw database files. This isn’t the first time this has happened, but it’s a lot bigger pain this time, which isn’t helped by not being entirely confident in how to manipulate Docker.

I still have not actually gotten the WordPress archive working. I was getting “Connection Reset” errors and now I am getting “Cannot establish Database connection” issues. It may be for nothing after the troubles I have had dealing with recovering FreshRSS.
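For what it’s worth, that second class of error usually comes down to the WordPress container and the database container not agreeing on the host name and credentials (or the database simply not being up yet). A minimal compose pairing looks something like this; all of the names and passwords here are placeholders, not my actual config:

# docker-compose.yml sketch; service names, passwords, and volumes are placeholders
services:
  db:
    image: mariadb
    environment:
      MYSQL_ROOT_PASSWORD: changeme
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wpuser
      MYSQL_PASSWORD: changeme
    volumes:
      - db_data:/var/lib/mysql
  wordpress:
    image: wordpress
    depends_on:
      - db
    ports:
      - "8080:80"
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: wpuser
      WORDPRESS_DB_PASSWORD: changeme
      WORDPRESS_DB_NAME: wordpress
volumes:
  db_data: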

I have gotten FreshRSS fixed though. Getting it running in Docker was easy peasy; getting my data back was… considerably less so. It’s been plaguing me for a few weeks now, but I have a solution. It’s not the BEST solution, but it’s… a solution. So, the core thing I needed was the feeds themselves. Lesson learned, I suppose, but I’m going to find a way to automate a regular dump of the feeds once everything is reloaded. I don’t need or care about favorited articles or the article contents. These were stored in a MySQL database. MySQL, specifically, seems to be what was corrupted and crashed out on the old Pi instance, because I get a failure message on boot and I can’t get it to reinstall or load anymore.

Well, more accurately, I am pretty sure the root cause is that the SD card died, but it affected the DB files.

My struggle now is recovering data from these raw files. I’ve actually done this before after a server crash years ago, but this round has led to many, many hurdles. One, 90% of the results when looking up how to do it are littered with unhelpful replies about using a proper SQL dump instead. If I could open MySQL, I sure as hell would do that. Another issue seems to be that the SQL server running on the Pi was woefully out of date, so there have been file compatibility problems.

There is also the issue that the data may just flat out BE CORRUPTED.

So I’ve spun up and tried to manually move the data to probably a dozen instances of MySQL and MariaDB of various versions, on Pis, in Docker, on WSL, in a Linux install. Nothing, and I mean NOTHING has worked.

I did get the raw data pulled out though.

So I’ve been brute forcing a fix. Opening the .ibd file in a text editor gives a really ugly chunk of funny characters, but strewn throughout it is a bunch of URLs for feeds and websites and, well, mostly that. I did a “Replace” in Notepad++ that stripped out a lot of the characters. Then I opened it up in PyCharm and did a find and replace with blanks on a ton of other ugly characters. Then I wrote up this quick and dirty Python script:

# Control F in Notepad++, replace, extended mode "\x00"
# Replace "   " with " "
# replace "https:" with " https:"
# rename to fresh.txt

## Hard-coded input file name, to skip asking each time
file = "fresh.txt"
## Open and read the pre-processed text file
with open(file, encoding="UTF-8") as logfile:
    log = logfile.read()

datasplit = log.split(" ")
links = []

for each in datasplit:
    if "http" in each:
        links.append(each)

with open("output.txt", mode="w", encoding="UTF-8") as writefile:
    for i in links:
        writefile.write(i+"\n")

This splits everything up into a list and skims through it for anything with “http” in it, to pull out anything that is a URL. That left me with a text file that is full of duplicates and has regular URLs next to feed URLs, though not in EVERY case, because that would be too damn easy. I could probably add a bunch of conditionals to the script to keep only entries with the word “feed”, “rss”, “atom”, or “xml” and get a lot of the cruft removed, but FreshRSS does not seem to have a way to bulk import a plain text list, so I still get to manually cut and paste each URL in and re-sort everything into categories.

It’s tedious, but it’s mindless, and it will get done.

Afterwards, I will need to re-set up my WordPress autoposter script for those little news digests I’ve been sharing that no one cares about.

Slight update: I added some filtering and sorting to the code:

# Control F in Notepad++, replace, extended mode "\x00"
# Replace "   " with " "
# replace "https:" with " https:"
# rename to fresh.txt


## Hard-coded input file name, to skip asking each time
file = "fresh.txt"
## Open and read the pre-processed text file
with open(file, encoding="UTF-8") as logfile:
    log = logfile.read()

datasplit = log.split(" ")
links = []

for each in datasplit:
    if "http" in each:
        if "feed" in each or "rss" in each or "default" in each or "atom" in each or "xml" in each:
            if each not in links:
                links.append(each[each.find("http"):])

links.sort()

with open("output.txt", mode="w", encoding="UTF-8") as writefile:
    for i in links:
        writefile.write(i+"\n")
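One possible follow-up: FreshRSS can import OPML files from its import/export page, even though it won’t take a plain text list, so the recovered URLs could be wrapped in a minimal OPML skeleton and imported in one go. This is an untested sketch building on the output of the script above (the categories would still need sorting by hand afterwards):

## Untested sketch: wrap the recovered feed URLs from output.txt in a minimal OPML file
from xml.sax.saxutils import quoteattr

with open("output.txt", encoding="UTF-8") as infile:
    urls = [line.strip() for line in infile if line.strip()]

with open("feeds.opml", mode="w", encoding="UTF-8") as opml:
    opml.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    opml.write('<opml version="1.0">\n')
    opml.write('<head><title>Recovered Feeds</title></head>\n')
    opml.write('<body>\n')
    for url in urls:
        # quoteattr adds surrounding quotes and escapes anything odd in the URL
        opml.write(f'  <outline type="rss" text={quoteattr(url)} xmlUrl={quoteattr(url)}/>\n')
    opml.write('</body>\n')
    opml.write('</opml>\n')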