Why Can’t I Hold All These RSS Feeds

I’ve mentioned my woes with my RSS reader off and on in posts here, but I almost had another one. Thankfully, I learned my lesson last time. I ended up breaking my Fresh RSS install. I came across this post on Hacker News, where someone had asked for people to post their personal blogs. Someone had set up an OPML Feed for this list and stuck it on GitHub. I thought to myself, “Why not, I like these types of people, surely there are some good things in here”.

So I hooked the OPML up to my Fresh RSS. This tripled how many feeds I was subscribed to. It also broke my reader. I don’t know exactly what happened, but it stopped updating feeds and would not even load the main page. I did some investigation and found that one of the SQL tables had become corrupted. THANKFULLY it was not the one with the feeds themselves. Literally everything else can be rebuilt easily if needed, but recovering the feed list is paramount. I immediately created an export dump of the feed list. After some troubleshooting, I completely deleted the Fresh RSS database, reloaded a months-old backup, and then reimported the recent feed list tables.

The only thing missing was a handful of categories I had added since the last backup. I created some dummy categories, “Category 32, Category 33”, that sort of thing. Due to the relational way databases work, the feeds still pointed at their old category IDs, so they automatically fell into these placeholder categories, which allowed me to figure out what each category’s actual name was. For example, one has some comic and book feeds in it, so clearly, this was originally my “Books and Comics” category.
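The placeholder trick can be sketched with an in-memory SQLite database. This is only an illustration: the table and column names are a simplified, hypothetical schema rather than the real FreshRSS one, and the feed names are made up.

```python
import sqlite3

# Hypothetical, simplified schema; the real FreshRSS tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feed (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
""")

# A category that existed in the old backup.
conn.execute("INSERT INTO category (id, name) VALUES (31, 'Webcomics')")

# The re-imported feed list still points at a category id the backup never had.
conn.executemany(
    "INSERT INTO feed (name, category_id) VALUES (?, ?)",
    [("Some Comic Blog", 32), ("A Book Review Site", 32), ("A Webcomic", 31)],
)


def orphaned_category_ids(db):
    """Return category ids that feeds reference but the category table lacks."""
    rows = db.execute("""
        SELECT DISTINCT feed.category_id
        FROM feed
        LEFT JOIN category ON feed.category_id = category.id
        WHERE category.id IS NULL
        ORDER BY feed.category_id
    """)
    return [row[0] for row in rows]


def feeds_in(db, category_id):
    """List feed names in one category, so its original name can be guessed."""
    rows = db.execute(
        "SELECT name FROM feed WHERE category_id = ? ORDER BY id", (category_id,)
    )
    return [row[0] for row in rows]


missing = orphaned_category_ids(conn)

# Create a placeholder row for each missing id; the feeds fall into it automatically.
for category_id in missing:
    conn.execute(
        "INSERT INTO category (id, name) VALUES (?, ?)",
        (category_id, f"Category {category_id}"),
    )

print(missing, feeds_in(conn, 32))
```

Looking at which feeds land in “Category 32” is then enough to rename it by hand.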

Eventually, I’ll weed some of these feeds out. There are some in languages I don’t understand, nothing personal, but I have plenty to read without hassling with translations. Some feeds tend to post TOO MUCH and dominate the RSS reader. I’m pretty relentless about chopping these, and Hacker News is pretty much the only flooding feed that I allow to remain. Techmeme and Slashdot are sometimes borderline but not usually, so they get to stay as well.

Everything is sorted into categories, and I usually read through in category chunks, and no, I don’t read everything, I skim for interesting headlines or updates from my favorites and read those. I can’t find a good number for how many feeds but I think it’s just over 1200 now, sorted out across categories. Currently, I use the following categories.

  • Anime/Japan
  • Books and Comics
  • Food
  • Friend’s Blogs
  • Games – Deal and Bundles
  • Games – Tabletop
  • Games – VG News
  • Games – VG Reviews
  • Language Learning
  • Lifestyle and Family
  • Movies/TV
  • Music
  • My Blogs
  • News – Conservative Bull Shit (Currently all Muted)
  • News – Illinois/Decatur/Local
  • News – Liberal Opinions
  • News – US News (Empty, they all end up in World)
  • News – World
  • PersBlogs – Tech Enthusiasts
  • PersBlogs – Toy Collectors
  • PersBlogs – Gaming
  • PersBlogs – Nerd Blogs
  • Photography
  • Science/Space
  • Second Life and Virtual Worlds
  • Tech – Coding and IT
  • Tech – Crypto Bullshit
  • Tech – General
  • Tech – Security
  • Tech – VR/XR
  • Toys – Transformers
  • Toys – LEGO
  • Toys – News
  • Uncategorized
  • Webcomics
  • Writing and Writers

This is essentially the gamut of my interests, and sometimes if a category becomes too unwieldy, I’ll break out some of the feeds into a refined category. Which is where the prefixes come from (News, Tech, PersBlogs, Toys).

I mentioned before, I mostly read in the category view. Anything I find interesting I’ll tag with either BI or Lameazoid tags, and then my news digest script goes to work, I think at 11 PM each night. I don’t always check it every day, which is fine, sometimes I check it 2-3 times a day. Often while eating breakfast, sometimes again in the evening.

Code Project: Fresh RSS to WordPress Digest V 2

A while back, I talked about a simple little project that I built that produces a daily RSS digest post on this blog. This of course broke when my RSS Reader died on me. I managed to get Fresh RSS up and running again in Docker, and I’ve been slowly recovering my feeds, which is incredibly slow and tedious because there are a shitload of feeds. I essentially have to cut and paste each URL into FreshRSS and select the category, and half the time they don’t work, so I need to make a note of it for later checking, and it’s just… slow.

But since it’s mostly working, I decided to set my RSS poster back up. I may look into setting up a Docker instance just for running Python automations, but for now, I put it on a different Pi I have floating around that plays music. The music part will be part of a different post, but for this purpose, it runs a script, once a day, that pulls a feed, formats it, and posts it. It isn’t high overhead.

While poking around on setting this up, I decided to get a bit more ambitious and found out that basically every view has its own RSS feed. Previously, I was taking the feed from the Starred Articles. But it turns out that Tags each have their own feed. This allowed me to do something I wanted from the start here, which is create TWO feeds, one for each of my blogs. So now, articles related to Technology, Politics, Food, and Music get fed into Blogging Intensifies, and articles related to toys, movies, and video games go into Lameazoid.

I’ve also filtered both of these out of the main page. I do share these little link digests for others, if they want to read them, but primarily, it’s a little record for myself, to know what I found interesting and was reading that day. This way if say, my Fresh RSS reader crashes, I still have all the old interesting links available.

The other thing I wanted to do was to use some sort of AI system to produce a summary of each article. Right now it just clips off the first 200 characters or so. At the end of the day, this is probably plenty. I’m not really trying to steal content, I just want to share links, but links are also useful with just a wee bit of context to them.

As I mentioned before, making this work involved a bit of tweaking to the scripts I was using. First off is an auth.py file with the structure below: one dictionary for each blog, and then each dictionary gets put in a list. Adding additional blogs would be as simple as adding a new dictionary and then adding the entry to the list. I could have done this with a custom Class but this was simpler.

BLOG1 = {
    "blogtitle": "BLOG1NAME",
    "url": "FEEDURL1",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG1URL",
}

BLOG2 = {
    "blogtitle": "BLOG2NAME",
    "url": "FEEDURL2",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG2URL",
}

blogs = [BLOG1, BLOG2]
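For comparison, here is a rough sketch of what the custom Class route might have looked like, using a dataclass. The `BlogConfig` name, the `xmlrpc_endpoint` property, and the `from_dict` helper are all made up for illustration; the fields simply mirror the dictionary keys above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlogConfig:
    """Hypothetical class-based version of one auth.py dictionary."""
    blogtitle: str
    url: str      # feed URL to pull for this blog
    wp_user: str
    wp_pass: str
    wp_url: str   # blog domain, without the https:// prefix

    @property
    def xmlrpc_endpoint(self) -> str:
        # Same URL shape the script builds before creating its WordPress client.
        return f"https://{self.wp_url}/xmlrpc.php"

    @classmethod
    def from_dict(cls, settings: dict) -> "BlogConfig":
        # Raises TypeError if a key is missing or misspelled.
        return cls(**settings)


BLOG1 = {
    "blogtitle": "BLOG1NAME",
    "url": "FEEDURL1",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG1URL",
}

blog_configs = [BlogConfig.from_dict(BLOG1)]
print(blog_configs[0].xmlrpc_endpoint)
```

The main thing this buys is an early error when a key is missing or misspelled. For two blogs, the plain dictionaries are arguably still the simpler choice.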

The script itself got a bit of modification as well, mostly the addition of a loop to go through each blog in the list, plus some variables changed to be dictionary lookups instead of straight variables.

Also please excuse the inconsistency in the f-string use. I got errors at first, so I started editing and removing the f-strings, and then realized I just needed to be using Python 3 instead of Python 2.

from auth import *
import feedparser
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost
from wordpress_xmlrpc.methods import posts
import datetime
from io import StringIO
from html.parser import HTMLParser

cur_date = datetime.datetime.now().strftime(('%A %Y-%m-%d'))

### HTML Stripper from https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs= True
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

# Get News Feed
def get_feed(feed_url):
    NewsFeed = feedparser.parse(feed_url)
    return NewsFeed

# Create the post text
def make_post(NewsFeed, cur_blog):
    # WordPress API Point
    build_url = f'https://{cur_blog["wp_url"]}/xmlrpc.php'
    #print(build_url)
    wp = Client(build_url, cur_blog["wp_user"], cur_blog["wp_pass"])

    # Create the basic post info (title, tags, etc.). This can be edited to customize the formatting.
    post = WordPressPost()
    post.title = f"{cur_date} - Link List"
    post.terms_names = {'category': ['Link List'], 'post_tag': ['links', 'FreshRSS']}
    post.content = f"<p>{cur_blog['blogtitle']} Link List for {cur_date}</p>"
    # Insert each feed item into the post with its posted date, headline, and link, plus a brief summary.
    for each in NewsFeed.entries:
        # Clip the tag-stripped summary to 100 characters (slicing a shorter string is harmless).
        post_summary = strip_tags(each.summary)[0:100]
        post.content += (f'<p>{each.published[5:-15].replace(" ", "-")} - '
                         f'<a href="{each.links[0].href}">{each.title}</a></p>'
                         f'<p>Brief Summary: "{post_summary}"</p>')
        # print(each.summary_detail.value)
        #print(each)

    # Create the actual post.
    post.post_status = 'publish'
    #print(post.content)
    # For troubleshooting and reworking, uncomment the print above and comment out the call below; this prints the results instead of posting.
    post.id = wp.call(NewPost(post))

    try:
        if post.id:
            post.post_status = 'publish'
            # Re-send the post through the client so it is marked as published.
            wp.call(posts.EditPost(post.id, post))
    except Exception as err:
        print(f"Error updating post: {err}")

# Get the news feed for each blog
for each in blogs:
    newsfeed = get_feed(each["url"])
    # If there are posts, make them.
    if len(newsfeed.entries) > 0:
        make_post(newsfeed, each)
        #print(newsfeed.entries)

Code Project: Fresh RSS to WordPress Digest

I actually briefly mentioned this project when I wrote about moving from TinyTinyRSS to FreshRSS. This has become a bit of an evolving and ongoing project however, so I’ve decided to catalogue it on its own page. This little script worked out much better than I expected, and I’ve modified it a bit over time, and have ideas to modify it even more going forward. Starting off, the code can be found here in this GitHub Gist.

I’ve left in a bit of commented out code that I might use later for troubleshooting or adding additional features. The general gist of the code is that it pulls the last 24 hours worth of news stories I have favorited from my FreshRSS install, then formats them into a digest format and posts it here, in this blog. They get sorted into their own category, you can find them here.

This is basically a thing I’ve seen others do that I’ve wanted to do for a while. It’s also partially just for my own reference, sort of a log of everything I have found interesting on a particular day. Others may or may not find it interesting, which is why I also filter that category out of the home page feed.

Originally, it was just a list of URLs and titles. I realized that it might be useful to have SOME idea what the link was about before clicking it, so I have been playing with the summary as well. My first attempt was a bit dodgy because it actually posted the entire article as the summary. Currently, it just arbitrarily chops it off at a few hundred characters. I want to improve it even further at some point by pushing it through some summarizing AI and getting an actual proper summary but I have not gotten there yet.
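As a small step short of an AI summary, the chopping could at least avoid cutting words in half. This is a sketch rather than what the script does today: `clip_summary` is a made-up helper name, and it leans on `textwrap.shorten` from the standard library, which collapses whitespace and trims at a word boundary.

```python
import textwrap


def clip_summary(text: str, limit: int = 200) -> str:
    """Collapse whitespace and cut at a word boundary, at most `limit` characters."""
    # The "..." placeholder is only appended when something was actually cut off.
    return textwrap.shorten(text, width=limit, placeholder="...")


print(clip_summary("A feed summary that rambles on for quite a while " * 10, 80))
```

The text passed in would be the summary after the HTML tags have been stripped out.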

There are a few other things I want to add but I’m not sure they are easily possible. Firstly, I would love to be able to parse some sort of categories into the digest. So say, all the “Video Game” links are together and Music links are together. FreshRSS has categories but they don’t seem to show up in the feed anywhere.

This would also allow me to split these posts between this blog and my other blog, Lameazoid. I do share interesting video games news from FreshRSS, but I mostly don’t share Toy related articles, because it feels a little TOO FAR out there for what I want to post to this blog. If there were a way to have the categories, I could easily have the script split the feed by categories and post a digest to each blog.

I also wish there was a way to add my own notes and commentary occasionally. I don’t think it showed up in the feed either, but TinyTinyRSS had a notes feature. I am not sure if FreshRSS has that as well. I probably should try to at least suggest these features to the creators on GitHub, or maybe get really adventurous and create my own plug-ins for FreshRSS to accomplish these tasks.

FreshRSS and RSS Feed Posts

Keen observers (ha ha ha no one reads this) might have noticed that a few posts of links showed up in the feed.  These are basically stories I read in my RSS reader that I found interesting and wanted to share, or at least keep track of.  The posts as of now are a little ugly, and I’ll probably clean up the formatting over time, but I wanted to go ahead and write a bit about the process.  I’ll have the code on GitHub at some point.

As for the factors, firstly, this is something I’ve wanted to have on my blog for a while.  Like a long while.  I might even try to see if there are ways to better split up the links by topic later.  A fair number of blogs I subscribe to have these sort of link digest posts, and I’ve always just liked the idea.  It’s also good for personal reference to when I may have read something.  It is limited as it only comes from my RSS Reader.

Speaking of my RSS Reader.  I’ve moved on from TinyTinyRSS, for a few reasons.  One, the interface is a little meh, honestly.  Maybe the newer version is better but it’s only available in Docker, and Docker is such a PItA to use.  Also, while looking for alternatives, it sounds like the folks who make TTRSS are kind of a bunch of gatekeeping jerk types, and I’d rather not support that.  I also find the need to keep the update daemon running with Screen to be a pain.  So I’ve moved over to FreshRSS, which I just run locally on a Raspberry Pi.  I may move it to a publicly accessible machine at some point, but I am not entirely convinced that TT-RSS wasn’t the entry point for my previous server malware woes.

So, like TT-RSS, Fresh RSS has a way to get an RSS feed out of your Favorited posts.  In the past I’ve used tools like IFTTT to automate posting these links around, but I don’t use IFTTT anymore for reasons I’m not going into.  Fortunately, I’ve been working to become a pretty good Python coder for the last month or so.  So instead I wrote a script.  

It’s not even a particularly complicated script.  There are only two things it really needs to do: get new articles, and then post them to WordPress.  Since the script runs locally, on the same Raspberry Pi even, it can easily reach and pull the RSS feed.  One nice thing I noticed with Fresh RSS is that the feed includes a time interval, so just getting new posts was super simple, because the interval is just “24” for “24 hours”.  The script eventually will run on a cronjob at the exact same time daily.  Anyway, after pulling the RSS, the entries are already in an easily usable dictionary, which gets fed into the construction of the WordPress post.

def get_feed(feed_url):
    NewsFeed = feedparser.parse(feed_url)
    return NewsFeed
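If the feed itself could not be limited to the last 24 hours, the same cut could be made on the script side. This is a sketch and not part of the actual script: `recent_entries` is a made-up helper, and it assumes each entry exposes `published_parsed` as a UTC `time.struct_time`, which is how feedparser normally presents parsed dates.

```python
import calendar
import time


def recent_entries(entries, hours=24, now=None):
    """Keep entries whose published_parsed time falls within the last `hours`."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    kept = []
    for entry in entries:
        published = entry.get("published_parsed")  # UTC struct_time, or None
        if published is None:
            continue  # no usable date; dropping these is a judgment call
        # timegm() treats the struct_time as UTC, unlike time.mktime().
        if calendar.timegm(published) >= cutoff:
            kept.append(entry)
    return kept


# Stand-in entries; real ones would come from get_feed(feed_url).entries.
sample_now = 1_700_000_000
sample = [
    {"title": "new", "published_parsed": time.gmtime(sample_now - 3600)},
    {"title": "old", "published_parsed": time.gmtime(sample_now - 48 * 3600)},
    {"title": "undated"},
]
print([e["title"] for e in recent_entries(sample, now=sample_now)])
```

Passing `now` explicitly is only there to make the function easy to test; the daily run would leave it out.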

The posting part was pretty easy as well, WordPress has an API, and Python also has a library that can use that API.  It just needs some log in information and a post payload to send.  

def make_post(NewsFeed):
    wp = Client(f'https://{wp_url}/xmlrpc.php', wp_user, wp_pass)
    post = WordPressPost()
    post.title = f"{cur_date} - Link List"
    post.terms_names = {'category': ['Link List'], 'post_tag': ['links', 'FreshRSS']}
    post.content = f"<p>Blogging Intensifies Link List for {cur_date}</p>"
    for each in NewsFeed.entries:
        post.content += f'<p>{each.published[5:-15].replace(" ", "-")} - <a href="{each.links[0].href}">{each.title}</a></p>'

The trickiest part was formatting the date a bit prettier.  I mentioned cleaning up the formatting a bit, I’m thinking maybe a simple invisible table, so the date and the links don’t wrap oddly like they do now.  I also added a check that if there are no new favorited posts, it will skip making a post.  Otherwise I’ll end up with empty posts on days I forget to check my feed reader.
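The date slicing works as long as the published string always has the same shape. A slightly sturdier alternative might look like the sketch below. It is not what the script uses: `pretty_date` is a made-up helper, and it assumes the feed hands back RFC 822 style dates such as “Tue, 05 Mar 2024 14:30:00 +0000”.

```python
from email.utils import parsedate_to_datetime


def pretty_date(published: str) -> str:
    """Format an RFC 822 style date string as day-month-year, e.g. 05-Mar-2024."""
    try:
        return parsedate_to_datetime(published).strftime("%d-%b-%Y")
    except (TypeError, ValueError):
        # Unparseable date: fall back to the raw string rather than crash the post.
        return published


print(pretty_date("Tue, 05 Mar 2024 14:30:00 +0000"))
```

For a date in that shape, this gives the same text as the slice-and-replace approach, but it does not depend on exact character positions.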

While writing the script, at first I was just outputting a text copy of the post to the console until satisfied.  Eventually, I pushed out a real post, then verified that things worked.  The next day was just a straight test by opening the project, then running it again.  The third day, I copied the files and installed the libraries needed, then posted from the Pi.  Phase 4 of this will be to set up Cron to run it automatically.  If that works then it will certainly “just run” for the foreseeable future.

Tiny Tiny RSS, Possibly my Perfect RSS Solution

So, I mentioned recently, I wanted to migrate off of my shared Hosting to a VPS on Digital Ocean.  One reason cited was more control over what I can do with the server.  It’s essentially just a cloud based Linux machine, I can do anything I would do on a locally hosted Ubuntu box with it.  I came across Tiny Tiny RSS recently, and it’s the perfect example of the kind of thing I wanted the VPS for.

While nowhere near the main reason, the final straw with my tolerance of Google’s increasing level of crap was the closing of Reader, a service I’d depended on pretty much since its inception.  I’d tried a few alternative solutions but nothing really did anything for me next to the simplicity of Google Reader.

Eventually I just sort of lost the want for RSS feeds.  The whole web seems to be abandoning the idea (probably because it’s not nearly as easy to plaster crap ads all over an RSS feed) so I just decided to let it go.

Recently I’ve been trying to find a good solution again.  I really hate not being able to keep up with infrequently updated blogs I find.  That’s like 90% of the reason I liked having Google Reader, so when that interesting niche blog I like that updates once every 4 months updates, I can know.

I looked into some Firefox extensions but using them tends to be clunky.  I’ve tried a few different apps on my phone but nothing is ideal.  The biggest issue is a lack of sync across everything.

Tiny Tiny RSS is a self hosted RSS Reader.  You download it (with Git in this case), set up a database for it, and let it roll.  I’ve set it up on my little sandbox domain BloggingIntensifies.com and added feeds I was pulling with other services to it.

It’s web based, so I can get to it from anywhere.  Need number one.

It’s hosted by me, so I won’t have to worry about some “thinks they know best” company screwing me over again, need number two.

There is a built in API so it can be accessed via mobile with an app.  Need number 3.  BONUS!  There is even a compatible Windows Phone app.

The next step is to figure out what I did with my old list of Google Reader feeds and start loading it up.