Code Project: Fresh RSS to WordPress Digest V 2
A while back, I talked about a simple little project I built that produces a daily RSS digest post on this blog. This of course broke when my RSS reader died on me. I managed to get FreshRSS up and running again in Docker, and I’ve been slowly recovering my feeds. It’s incredibly slow and tedious because there are a shitload of feeds, and I essentially have to cut and paste each URL into FreshRSS and select the category. Half the time they don’t work, so I need to make a note of it for later checking, and it’s just… slow.
But since it’s mostly working, I decided to set my RSS poster up again. I may look into setting up a Docker instance just for running Python automations, but for now, I put it on a different Pi I have floating around that plays music. The music part will be part of a different post, but for this purpose, it runs a script once a day that pulls a feed, formats it, and posts it. It isn’t high overhead.
While poking around on setting this up, I decided to get a bit more ambitious and found out that basically every view has its own RSS feed. Previously, I was taking the feed from the Starred Articles, but it turns out that Tags each have their own feed too. This allowed me to do something I wanted from the start, which is to create TWO feeds, one for each of my blogs. So now, articles related to technology, politics, food, and music get fed into Blogging Intensifies, and articles related to toys, movies, and video games go into Lameazoid.
I’ve also filtered both of these out of the main page. I do share these little link digests for others, if they want to read them, but primarily, it’s a little record for myself, to know what I found interesting and was reading that day. This way if, say, my FreshRSS reader crashes, I still have all the old interesting links available.
The other thing I wanted to do was to use some sort of AI system to produce a summary of each article. Right now it just clips off the first 100 characters or so. At the end of the day, this is probably plenty. I’m not really trying to steal content, I just want to share links, but links are also useful with just a wee bit of context to them.
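For what it's worth, a plain character clip can cut a word in half. Here is a minimal sketch of a slightly friendlier clip using only the standard library. The `clip_summary` name and the `limit` default are my own placeholders for illustration, not something from the actual script.

```python
import textwrap


def clip_summary(text, limit=100):
    """Collapse runs of whitespace and trim to at most `limit` characters.

    textwrap.shorten drops whole words from the end until the result,
    including the placeholder, fits inside `limit`.
    """
    return textwrap.shorten(text, width=limit, placeholder="...")


if __name__ == "__main__":
    # Hypothetical sample text, just to show the word-boundary behaviour.
    print(clip_summary("The quick brown fox jumps over the lazy dog", limit=20))
```

Text that already fits is returned unchanged apart from the whitespace cleanup, so short summaries pass straight through.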
As I mentioned before, making this work involved a bit of tweaking to the scripts I was using. First off is an auth.py file, which has a structure like below: one dictionary for each blog, and then each dictionary gets put in a list. Adding additional blogs would be as simple as adding a new dictionary and then adding the entry to the list. I could have done this with a custom class, but this was simpler.
BLOG1 = {
    "blogtitle": "BLOG1NAME",
    "url": "FEEDURL1",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG1URL",
}

BLOG2 = {
    "blogtitle": "BLOG2NAME",
    "url": "FEEDURL2",
    "wp_user": "YOURUSERNAME",
    "wp_pass": "YOURPASSWORD",
    "wp_url": "BLOG2URL",
}

blogs = [BLOG1, BLOG2]
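For comparison, here is roughly what the custom class route could look like, as a sketch using a dataclass. The `BlogConfig` name and the `xmlrpc_endpoint` helper are made up for illustration. The posting script would also need to switch from dictionary lookups like `cur_blog["wp_url"]` to attribute access like `cur_blog.wp_url`, which is part of why the plain dictionaries were less work.

```python
from dataclasses import dataclass


@dataclass
class BlogConfig:
    """Hypothetical class-based version of one auth.py entry."""
    blogtitle: str
    url: str        # feed URL to pull from
    wp_user: str
    wp_pass: str
    wp_url: str     # blog host name, without the scheme

    def xmlrpc_endpoint(self):
        # Builds the URL the same way the posting script builds it.
        return f"https://{self.wp_url}/xmlrpc.php"


# Same placeholder values as the dictionary version.
blogs = [
    BlogConfig("BLOG1NAME", "FEEDURL1", "YOURUSERNAME", "YOURPASSWORD", "BLOG1URL"),
    BlogConfig("BLOG2NAME", "FEEDURL2", "YOURUSERNAME", "YOURPASSWORD", "BLOG2URL"),
]

if __name__ == "__main__":
    for blog in blogs:
        print(blog.blogtitle, blog.xmlrpc_endpoint())
```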
The script itself got a bit of modification as well: mostly the addition of a loop to go through each blog in the list, plus some variables changed to be dictionary lookups instead of straight variables.
Also, please excuse the inconsistency in the f-string use. I got errors at first, so I started editing and removing the f-strings, and then realized I just needed to be using Python 3 instead of Python 2.
from auth import blogs
import feedparser
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost
from wordpress_xmlrpc.methods import posts
import datetime
from io import StringIO
from html.parser import HTMLParser

cur_date = datetime.datetime.now().strftime('%A %Y-%m-%d')

### HTML Stripper from https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs = True
        self.text = StringIO()

    def handle_data(self, d):
        self.text.write(d)

    def get_data(self):
        return self.text.getvalue()


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

# Get News Feed
def get_feed(feed_url):
    NewsFeed = feedparser.parse(feed_url)
    return NewsFeed

# Create the post text
def make_post(NewsFeed, cur_blog):
    # WordPress API Point
    build_url = f'https://{cur_blog["wp_url"]}/xmlrpc.php'
    #print(build_url)
    wp = Client(build_url, cur_blog["wp_user"], cur_blog["wp_pass"])

    # Create the basic post info: title, tags, etc. This can be edited to customize the formatting.
    post = WordPressPost()
    post.title = f"{cur_date} - Link List"
    post.terms_names = {'category': ['Link List'], 'post_tag': ['links', 'FreshRSS']}
    post.content = f"<p>{cur_blog['blogtitle']} Link List for {cur_date}</p>"

    # Insert each feed item into the post with its posted date, headline, and link to the item, plus a brief summary.
    for each in NewsFeed.entries:
        # Strip the HTML and keep at most the first 100 characters.
        post_summary = strip_tags(each.summary)[0:100]
        # The first string was cut off after "</a></" in the original listing, so it ends at "</a>" here.
        post.content += f'{each.published[5:-15].replace(" ", "-")} - <a href="{each.links[0].href}">{each.title}</a>' \
                        f'<p>Brief Summary: "{post_summary}"</p>'
        # print(each.summary_detail.value)
        #print(each)

    # Create the actual post.
    post.post_status = 'publish'
    #print(post.content)
    # For troubleshooting and reworking, uncomment the print above, then comment out the lines below to print the results instead.
    post.id = wp.call(NewPost(post))
    try:
        if post.id:
            post.post_status = 'publish'
            # EditPost has to go through the client, same as NewPost.
            wp.call(posts.EditPost(post.id, post))
    except Exception:
        pass
        #print("Error creating post.")

# Get the news feed
for each in blogs:
    newsfeed = get_feed(each["url"])
    # If there are posts, make them.
    if len(newsfeed.entries) > 0:
        make_post(newsfeed, each)
    #print(NewsFeed.entries)
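One fragile spot worth pointing out is the `each.published[5:-15]` slice, which only works when the feed's date string has exactly the expected length. Below is a rough sketch of what the slice does, next to an alternative that parses the date with the standard library. This assumes the feed hands back RFC 822 style dates, which is typical for RSS but not something I have verified for every feed. The function names and sample dates are made up for illustration.

```python
from email.utils import parsedate_to_datetime


def short_date_by_slicing(published):
    # Same approach as the script: drop the "Mon, " prefix (5 characters)
    # and the last 15 characters, then swap spaces for dashes.
    return published[5:-15].replace(" ", "-")


def short_date_by_parsing(published):
    # Parse the RFC 822 style date properly, then format it.
    return parsedate_to_datetime(published).strftime("%d-%b-%Y")


if __name__ == "__main__":
    # Hypothetical sample dates, one with a numeric offset and one with "GMT".
    for sample in ("Mon, 06 Nov 2023 14:30:00 +0000",
                   "Mon, 06 Nov 2023 14:30:00 GMT"):
        print(short_date_by_slicing(sample), "|", short_date_by_parsing(sample))
```

With the numeric offset both functions agree, but with the shorter `GMT` suffix the slice eats into the year, while the parsed version still comes out the same.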
Josh Miller aka “Ramen Junkie”. I write about my various hobbies here. Mostly coding, photography, and music. Sometimes I just write about life in general. I also post sometimes about toy collecting and video games at Lameazoid.com.