RSS

Code Project: Fresh RSS to WordPress Digest V 2

A while back, I talked about a little simple project that I build that produces a daily RSS digest post on this blog. This of course broke when my RSS Reader died on me. I managed to get Fresh RSS up and running again in Docker, and I’ve been slowly recovering my feeds, which is incredibly slow and tedious to do because there are a shitload of feeds, and i essentially have to cut and paste each URL into FreshRSS, and select the category and half the time they don’t work, so I need to make a note of it for later checking and it’s just… slow.

But since it’s mostly working, I decided to reset up my RSS poster. I may look into setting up a Docker instance just for running Python automations, but for now, I put it on a different Pi I have floating around that plays music. The music part will be part of a different post, but for this purpose, it runs a script, once a day, that pulls a feed, formats it, and posts it. It isn’t high overhead.

While poking around on setting this up, I decided to get a bit more ambitious and found out that, basically every view has it’s own RSS feed. Previously, I was taking the feed from the Starred Articles. But it turns out that Tags each have their own feed. This allowed me to do something I wanted from the start here, which is create TWO feeds, for both of my blogs. So now, articles related to Technology, Politics, Food, and Music, get fed into Blogging Intensifies, and articles related to toys, movies, and video games, go into Lameazoid.

I’ve also filtered both of these out of the main page. I do share these little link digests for others, if they want to read them, but primarily, it’s a little record for myself, to know what I found interesting and was reading that day. This way if say, my Fresh RSS reader crashes, I still have all the old interesting links available.

The other thing I wanted to do was to use some sort of AI system to produce a summary of each article. Right now it just clips off the first 200 characters or so. At the end of the day, this is probably plenty. I’m not really trying to steal content, I just want to share links, but links are also useful with just a wee bit of context to them.

I mentioned before, making this work involved a bit to tweaking to the scrips I was using. First off is an auth.py file which has a structure like below, one dictionary for each blog, and then each dictionary gets put in a list. Adding additional blogs would be as simple as adding a new dictionary and then adding the entry to the list. I could have done this with a custom Class but this was simpler.

BLOG1 = {
    "blogtitle": "BLOG1NAME",
    "url": "FEEDURL1",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG1URL",
}

BLOG2 = {
    "blogtitle": "BLOG2NAME",
    "url": "FEEDURL2",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG2URL",
}

blogs = [BLOG1, BLOG2]

The script itself got a bit of modification as well, mostly, the addition of a loop to go through each blog in the list, then some variables changed to be Dictionary look ups instead of straight variables.

Also please excuse the inconsistency on the fstring use. I got errors at first so I started editing and removing the fstrings and then realized I just needed to be using Python3 instead of Python2.

from auth import *
import feedparser
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost
from wordpress_xmlrpc.methods import posts
import datetime
from io import StringIO
from html.parser import HTMLParser

cur_date = datetime.datetime.now().strftime(('%A %Y-%m-%d'))

### HTML Stripper from https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs= True
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

# Get News Feed
def get_feed(feed_url):
    NewsFeed = feedparser.parse(feed_url)
    return NewsFeed

# Create the post text
def make_post(NewsFeed, cur_blog):
    # WordPress API Point
    build_url = f'https://{cur_blog["wp_url"]}/xmlrpc.php'
    #print(build_url)
    wp = Client(build_url, cur_blog["wp_user"], cur_blog["wp_pass"])

    # Create the Basic Post Info, Title, Tags, etc  This can be edited to customize the formatting if you know what you$    post = WordPressPost()
    post.title = f"{cur_date} - Link List"
    post.terms_names = {'category': ['Link List'], 'post_tag': ['links', 'FreshRSS']}
    post.content = f"<p>{cur_blog['blogtitle']} Link List for {cur_date}</p>"
    # Insert Each Feed item into the post with it's posted date, headline, and link to the item.  And a brief summary f$    for each in NewsFeed.entries:
        if len(strip_tags(each.summary)) > 100:
            post_summary = strip_tags(each.summary)[0:100]
        else:
            post_summary = strip_tags(each.summary)
        post.content += f'{each.published[5:-15].replace(" ", "-")} - <a href="{each.links[0].href}">{each.title}</a></$                        f'<p>Brief Summary: "{post_summary}"</p>'
        # print(each.summary_detail.value)
        #print(each)

    # Create the actual post.
    post.post_status = 'publish'
    #print(post.content)
    # For Troubleshooting and reworking, uncomment the above then comment out the below, this will print results instea$    post.id = wp.call(NewPost(post))

    try:
        if post.id:
            post.post_status = 'publish'
            call(posts.EditPost(post.id, post))
    except:
        pass
        #print("Error creating post.")

#Get the news feed
for each in blogs:
    newsfeed = get_feed(each["url"])
# If there are posts, make them.
    if len(newsfeed.entries) > 0:
        make_post(newsfeed, each)
        #print(NewsFeed.entries)

Code Project: Fresh RSS to WordPress Digest

I actually briefly mentioned this project when I write about moving from TinyTinyRSS to FreshRSS. This has become a bit of an evolving and ongoing project however, so I’ve decided to catalogue it in it’s own page. This little script worked out much better than I expected, and I’ve modified it a bit over time, and have ideas to modify it going forward even more. Starting off, the code can be found here in this Github GIST.

I’ve left a bit of commented out code that i might use later for troubleshooting or adding additional features. The general gist of the code, it pulls the last 24 hours worth of news stories I have favorited from my FreshRSS install, then formats them into a digest format and posts it here, in this blog. They get sorted into their own category, you can find them here.

This is basically a thing I’ve seen others do that I’ve wanted to do for a while. It’s also partially just for my reference more than anything, it’s sort of a log of everything I have found interesting on a particular day more than anything. Others may or may not find it interest, which is why I also filter that category out of the home page feed.

Originally, it was just a list of URLs and titles. I realized that it might be useful to have SOME idea what the link was about before clicking it, so I have been playing with the summary as well. My first attempt was a bit dodgy because it actually posted the entire article as the summary. Currently, it just arbitrarily chops it off at a few hundred characters. I want to improve it even farther at some point by pushing it through some summarizing AI and getting an actual proper summary but I have not gotten there yet.

There re a few other things I want to add but I’m not sure they re easily possible. Firstly, I would love to be able to parse some sort of categories into the digest. So say, all the “Video Game” links are together and Music links are together. FreshRSS has categories but they don’t seem to show up in the feed anywhere.

This would also allow me to split these posts between this blog and my other blog, Lameazoid. I do share interesting video games news from FreshRSS, but I mostly don’t share Toy related articles, because it feels a little TOO FAR out there for what I want to post to this blog. If there were a way to have the categories, I could easily have the script split the feed by categories and post a digest to each blog.

I also wish there was a way to add my own notes and commentary occasionally. I don’t think it showed up in the feed either, but TinyTinyRSS had a notes feature. I am not sure if FreshRSS has that as well. I probably should try to at least suggest these features to the creators on GitHub, or maybe get really adventurous and create my own plug-ins for FreshRSS to accomplish these tasks.