Code Project: Automated List From Reddit Comments

This is one of those quick and kind of dirty projects I’ve been meaning to do for a while. Basically, I wanted a script that would scrape all of the top-level comments from a Reddit post and push them out to a list. Most commonly, to use on /r/AskReddit style threads like, well, for this example, “What is a song from the 90s that young people should listen to?”

Basically, threads that ask for useful opinions in list form. Sometimes it’s lists of websites or something. Often it’s music. The script here is made for music but could be adjusted for any thread. Here is the script; I’ll touch on it in a bit more detail after.

## Create an app for API secrets here:
## https://www.reddit.com/prefs/apps

import praw

## Thread to scrape goes here, replace the one below
url = "https://www.reddit.com/r/Music/comments/10c4ki0/name_one_90s_song_kids_born_after_2000_should_add/"

## Fill in API information here
reddit = praw.Reddit(
    client_id="",
    client_secret="",
    user_agent="script by u/",  # Your username, not really required though
    redirect_uri="http://localhost:8080",
)


submission = reddit.submission(url=url)
# Skip the "load more comments" stubs so only the already-loaded top level comments are used
submission.comments.replace_more(limit=0)

# Append any comment that looks like "Artist - Song" to the output file
with open("output.txt", mode="a", encoding="UTF-8") as file:
    for x in submission.comments:
        if "-" in x.body:
            file.write(str(x.body) + "\n")
            # print(x.body)

The script uses PRAW, the Python Reddit API Wrapper, a library for working with the Reddit API from Python. It requires free API keys, which can be created here: https://www.reddit.com/prefs/apps. Just create an app; the Client ID is the jumble of letters under the app name, and the Secret is labeled. The User Agent can be whatever really, but it’s meant to be informative.

The thread URL also needs to be filled in.

The script then pulls the thread data and extracts the top-level comments.

I’m interested in text file lists mostly, though for the sake of music-based lists, if I used Spotify, I might combine it with the Spotify playlist maker from my 100 Days of Python course. Like I said before though, this script is made for pulling music suggestions, with this bit of code:

        if "-" in x.body:
            file.write(str(x.body)+"\n")
            # print(x.body)

It’s simple: if the comment contains a dash, as in “Taylor Swift – Shake It Off” or “AC/DC – Back in Black”, it writes it to the file. Otherwise it discards it. There is a chance this means discarding some suggestions, but this isn’t precision work, so I’m OK with that to filter out the chaff. If I were looking for URLs or something, I might look for “http” in the comment instead. I could also eliminate the “if” statement and just have it write all the comments to a file.
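For example, a minimal sketch of that link-collecting variant might look like this (the “links.txt” filename and the bare “http” check are just placeholders for whatever filter fits the thread):

# Hypothetical variant: collect links instead of song suggestions
with open("links.txt", mode="a", encoding="UTF-8") as file:
    for x in submission.comments:
        if "http" in x.body:
            file.write(str(x.body) + "\n")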

FreshRSS and RSS Feed Posts

Keen observers (ha ha ha, no one reads this) might have noticed that a few posts of links showed up in the feed. These are basically stories I read in my RSS reader that I found interesting and wanted to share, or at least keep track of. The posts as of now are a little ugly, and I’ll probably clean up the formatting over time, but I wanted to go ahead and write a bit about the process. I’ll have the code on GitHub at some point.

As for the factors, firstly, this is something I’ve wanted to have on my blog for a while. Like, a long while. I might even try to see if there are ways to better split up the links by topic later. A fair number of blogs I subscribe to have these sorts of link digest posts, and I’ve always just liked the idea. It’s also good for personal reference as to when I may have read something. It is limited in that it only comes from my RSS reader.

Speaking of my RSS reader: I’ve moved on from TinyTinyRSS, for a few reasons. One, the interface is a little meh, honestly. Maybe the newer version is better, but it’s only available in Docker, and Docker is such a PITA to use. Also, while looking for alternatives, it sounds like the folks who make TT-RSS are kind of a bunch of gatekeeping jerk types, and I’d rather not support that. I also find the need to keep the update daemon running with Screen to be a pain. So I’ve moved over to FreshRSS, which I just run locally on a Raspberry Pi. I may move it to a publicly accessible machine at some point, but I am not entirely convinced that TT-RSS wasn’t the entry point for my previous server malware woes.

So, like TT-RSS, FreshRSS has a way to get an RSS feed out of your favorited posts. In the past I’ve used tools like IFTTT to automate posting these links around, but I don’t use IFTTT anymore, for reasons I’m not going into. Fortunately, I’ve been working to become a pretty good Python coder for the last month or so. So instead I wrote a script.

It’s not even a particularly complicated script. There are only two things it really needs to do: get new articles, and then post them to WordPress. Since the script runs locally, on the same Raspberry Pi even, it can easily reach and pull the RSS feed. One nice thing I noticed with FreshRSS: the feed includes a time interval, so just getting new posts was super simple, because the interval is just “24” for “24 hours”. The script eventually will run on a cronjob at the exact same time daily. Anyway, after pulling the RSS, the entries are already in an easily usable dictionary, which gets fed into the construction of the WordPress post.

import feedparser

# Pull the favorites feed from FreshRSS and return the parsed result
def get_feed(feed_url):
    NewsFeed = feedparser.parse(feed_url)
    return NewsFeed

The posting part was pretty easy as well. WordPress has an API, and Python has a library that can use that API. It just needs some login information and a post payload to send.

from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost

def make_post(NewsFeed):
    wp = Client(f'https://{wp_url}/xmlrpc.php', wp_user, wp_pass)
    post = WordPressPost()
    post.title = f"{cur_date} - Link List"
    post.terms_names = {'category': ['Link List'], 'post_tag': ['links', 'FreshRSS']}
    post.content = f"<p>Blogging Intensifies Link List for {cur_date}</p>"
    # One line per favorited article: the (trimmed) published date, then a link to the original post
    for each in NewsFeed.entries:
        post.content += f'<p>{each.published[5:-15].replace(" ", "-")} - <a href="{each.links[0].href}">{each.title}</a></p>'
    # Publish the assembled post to the blog
    post.post_status = 'publish'
    wp.call(NewPost(post))

The trickiest part was formatting the date a bit prettier. I mentioned cleaning up the formatting; I’m thinking maybe a simple invisible table, so the date and the links don’t wrap oddly like they do now. I also added a check so that if there are no new favorited posts, it will skip making a post. Otherwise I’ll end up with empty posts on days I forget to check my feed reader.
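Tying it together is only a few more lines. Here is a rough sketch of the main flow (the FreshRSS favorites feed URL and its interval parameter are placeholders, not the real ones):

# Hypothetical FreshRSS favorites feed URL, asking for the last 24 hours of entries
feed_url = "https://freshrss.example.lan/?get=s&hours=24"

feed = get_feed(feed_url)
# Skip making a post entirely if nothing new was favorited
if feed.entries:
    make_post(feed)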

While writing the script, at first I was just outputting a text copy of the post to the console until I was satisfied. Eventually, I pushed out a real post, then verified that things worked. The next day was just a straight test by opening the project and running it again. The third day, I copied the files and installed the libraries needed, then posted from the Pi. Phase 4 of this will be to set up cron to run it automatically. If that works, then it will certainly “just run” for the foreseeable future.

Code Project: My Home Dashboard

So, this isn’t going to really have any code. I might, sometime in the distant future, publish some code, but this whole thing is very much an “add things as I go” ongoing project. The base code itself isn’t particularly complicated though. It’s a pretty simple HTML/PHP/CSS layout that wraps around various modules I’ve been building. I keep mentioning the Dashboard when talking about the various projects, though, so I figure I should do a quick rundown on what the Dashboard involves.

I’m actually building a more complex iteration of this project at work as well, to be used internally by my work group. The work one is considerably more complex; for example, it has a much more robust Admin area that is growing with features to manage locations, users, user teams, etc. The base layout framework is shared between the two dashboards, but the work one has a lot more actual functionality. Because I am the only one using the home version, I generally just code everything in, so it’s less modular. I also have to translate any code I write for one or the other version between the different database back ends: I use MySQL at home and MS SQL at work.

At its base, it’s just a webpage on my project webserver that displays information. Some of that information is useful; some is just there to fill space and to practice coding something up. As I mentioned above, it’s essentially a header, sidebar, and footer that wrap around a variable content box. On the home page, the content box contains what I have been calling “Quick Cards” with bits of information that sometimes link to larger chunks of data. This is what it looks like, at the moment, on the home page.

I’ll dive in a bit on some of the menus and content but I am going to start with the Quick Card boxes, in order.

The Weather box seemed like an obvious choice for at-a-glance information. I want to make it link to a sub-page with more forecast data, but for now it just displays the current weather conditions for my location. Unfortunately it’s built on the Dark Sky API, which very recently announced it is closing down, so I’ll have to find a new API to use.

Next is the COVID-19 stats widget. This is the other side of the COVID-19 tracking Python script I posted recently. It just pulls and displays the most recent information that the script has collected. I may update this to link to a page with some timeline graphs on it, once I figure out how to put a data graph in a webpage.

Network devices is the most robust of all of the modules I’ve built so far. The Quick Card just shows the current number of active devices on the home network. Clicking it opens the Network Device page I talk about here.

Social Accounts is just a link list to the various Twitter accounts I have. I want to change this to be a modifiable list eventually, but for now it’s just a list. It does do a database pull to build the URLs, but I have not added a configuration page yet.

The next box displays how many unread posts are on each of my TT-RSS accounts. After Google killed Google Reader, I set up TinyTiny-RSS on my webserver and started using it for my feeds. I became overwhelmed, so I broke all of my feeds into themed sub-accounts. I would link to each sub-account, but it’s all the same link, just with a different log in, so the links would be useless. Normally, I just use container tabs to keep the different log-in instances open.

Lastly is a tracker for Reddit Karma for several Reddit accounts I have. Like my RSS feeds, I have broken my Reddit subs out into separate themed accounts. I don’t really care that much about Reddit Karma, but I wanted to play around with APIs and JSON, so I figured this would be an easy project. I will probably post the script used in the future, but it’s essentially identical to the recently posted COVID-19 script. In fact, the COVID-19 script was adapted from the Reddit Karma script.
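The core of that kind of karma check is tiny. As a rough sketch of the idea (not the actual script; the account name is a placeholder, and the real version loops over several accounts and writes to the database instead of printing):

import requests

# Hypothetical account name; the real script loops over several of these
username = "example_account"
headers = {'User-Agent': 'karma tracker script'}
url = f"https://www.reddit.com/user/{username}/about.json"

about = requests.get(url, headers=headers).json()
# link_karma and comment_karma live under the "data" key of the response
print(about['data']['link_karma'], about['data']['comment_karma'])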

Along the top navigation bar are some drop downs with useful links that don’t really have “At a glance” data. The first two, “My Websites” and “My Hosted Apps” are just drop downs with links to the Blogs I manage and my Webhosted apps for Email and TT-RSS.

The next drop down is similar in nature, in that it’s a list of links, but this one has an admin page so I can maintain the list as it changes. It also hasn’t quite found a home yet. I had it in the sidebar for a while, then I had it in a Quick Card, and now it’s in the Navigation menu. It’s a list of links to web services on my internal home network: links to Routers, Raspberry Pis, IP Cameras, my NAS, and various things I have set up on my Project Server.

Next to that is the Gas Tracker, which is very much a WIP page. When I bought my car back in 2014, I decided I wanted to track my gas consumption for the life of the car. Currently this lives in an Excel spreadsheet on OneDrive. I wanted to see about translating it into my own webpage with SQL as the back end. Currently it just displays a table of data that I imported from Excel. There isn’t any way to add new data yet, and it doesn’t calculate the price per gallon or total money paid or anything like that.

Lastly is an old project I did called Tweeter, which I got up and running again and embedded into the Dashboard. Tweeter itself is fairly self-contained, and I will probably do a couple of detailed write-ups on it in the future and post the code at that time. I also want to update it to use SQL as the back end, so I’ll do a second write-up when that happens.

A while back, I was looking for a way to automate posting Tweets, mostly so I could share links to articles but have them spaced out over the course of a day. I couldn’t find a decent free service, and thus Tweeter was born. Tweeter is a two-part solution: a PHP page that writes to a text file, and some scripts (Python and Bash) that run on a schedule and post the contents of that text file to Twitter, one line at a time, on whatever cron schedule is set. I’m not going to go into any more detail here, but I promise to post about it in the future. It’s also a little ugly and probably insecure as hell, but it works well.
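Just to illustrate the idea (this is a rough sketch, not the actual Tweeter code; the queue file path and the tweepy-based posting are assumptions), the scheduled half boils down to popping one line off the queue file and tweeting it:

import tweepy

# Hypothetical path to the text file the PHP page writes
QUEUE_FILE = "/var/www/tweeter/queue.txt"

# Twitter API credentials would be filled in here
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

with open(QUEUE_FILE, "r", encoding="UTF-8") as file:
    lines = file.readlines()

if lines:
    # Post the first queued line, then write the rest back for the next cron run
    api.update_status(lines[0].strip())
    with open(QUEUE_FILE, "w", encoding="UTF-8") as file:
        file.writelines(lines[1:])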

The main fun of integrating Tweeter: the box is 140 characters wide, the same as a tweet. So I had to modify my core code framework to have a toggle for pages that don’t display the sidebar. It wasn’t anything complicated, but I hadn’t considered that need, so I fixed it. That’s kind of part of the fun, and the point, of doing these sorts of code projects.

Tracking COVID-19 into a Database Using Python

At some point I need to do a little write-up on my Home Dashboard project; it’s inspired quite a few minor projects, such as this one, to make little web widgets. The dashboard side is the simple part, since it’s just dumping a database query into a table. Honestly, the script was easy too, because I adapted it from another script I built recently.

With COVID-19 all over the news, I wanted to add some stats to my dashboard for my state. Not so much because there aren’t already 1000 other places to get the numbers, but more to see if I could do it. The hardest part was finding a feed for the stats. Then I found CovidTracking.com, which has a nice little API. I then set to work adapting another script to pull from this API and dump stats for Illinois into the database. I am only interested in Illinois, but the script is built so the user can put a list of states into an array, and then it will loop through and add them all to the database.

The script is below, but this also requires some setup in SQL. Nothing complicated, mostly INT fields: an id as an INT and primary key; negative_cases, positive_cases, and deaths, all as INT; state as a VARCHAR with a length of 2 (though technically the length is optional); and finally date_stamp as a DATETIME field with a default value of the current timestamp. The DATETIME isn’t directly touched here, but it makes it easier to manipulate the data later.
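For reference, a table matching that description could be set up once with something like the sketch below, using the same MySQLdb connection style the script uses. The id is assumed to auto-increment, and the table name matches the INSERT further down:

import MySQLdb

# One-time table setup matching the columns described above
setup_db = MySQLdb.connect(
  host="localhost",
  user="YOUR_DB_USERNAME",
  passwd="YOUR_DB_PASSWORD",
  database="YOUR_DB_NAME"
)
setup_cursor = setup_db.cursor()
setup_cursor.execute("""
  CREATE TABLE IF NOT EXISTS il_covid_stats (
    id INT AUTO_INCREMENT PRIMARY KEY,
    positive_cases INT,
    negative_cases INT,
    deaths INT,
    state VARCHAR(2),
    date_stamp DATETIME DEFAULT CURRENT_TIMESTAMP
  )
""")
setup_db.close()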

The code also requires you to enter your database credentials. I’ve named my table “il_covid_stats”, but you can change that to whatever you want down below in the ‘SQL = “INSERT…”’ line. I’ll leave it up to you what to do with the data; I pull mine into a PHP page.

Anyway, here is the Python code:

# Python Covid Stat Tracking to SQL
# Uses the json, requests, and MySQLdb packages
# Sample URL: https://covidtracking.com/api/states?state=IL

import json
import requests
import MySQLdb

mydb = MySQLdb.connect(
  host="localhost",
  user="YOUR_DB_USERNAME",
  passwd="YOUR_DB_PASSWORD",
  database="YOUR_DB_NAME"
)
mycursor = mydb.cursor()

# States to check, as an array of two-letter abbreviations
states = ['IL']

def data_getter(statename):
  # Build the API URL for one state and pull its current stats
  url = 'https://covidtracking.com/api/states?state='+statename

  user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
  headers = {'User-Agent': user_agent}
  response = requests.get(url,headers=headers)
  html = response.content
  statedata = json.loads(html)

  pos_cases = (statedata['positive'])
  neg_cases = (statedata['negative'])
  deaths = (statedata['death'])

  vals = (pos_cases,neg_cases,deaths,statename)

  mysqlinsert(vals)

def mysqlinsert(vals):
  ## This table name and its columns can be changed, but should be pre-made in your database
  SQL = "INSERT INTO il_covid_stats (positive_cases, negative_cases, deaths, state) VALUES (%s, %s, %s, %s)"
  mycursor.execute(SQL, vals)
  mydb.commit()

# Loop through URLs for each state
for i in states:
  data_getter(i)

Code Project: Network Map Webpage, Making it Better

I wrote a bit about my Network Map webpage recently. It’s part of a larger home dashboard project I’m working on, and as part of that I’ve updated things a bit to make them more streamlined and easier to use. The biggest problem with the page as it was originally coded is that it shows everything. I’ve cycled most of my regularly used electronics onto the network so they could be captured by an ARP scan, though not all of them are on all the time. For example, I still have a Raspberry Pi and Arduino set up to capture temperature data. I also have several Next Thing CHIP devices, though Next Thing has gone out of business. In total, between my IoT stuff, laptops, phones, and tablets, and the duplicate IPs from the network extender, I have 55 devices in the raw table.

So I set out to make this more manageable at a glance. My original query in my PHP code looked something like this:

SELECT ip, arpscans.mac, arpscans_known_macs.device_name, arpscans_known_macs.device_description, last_seen, device_owners.user_name
FROM arpscans
LEFT JOIN arpscans_known_macs ON arpscans_known_macs.mac = arpscans.mac
LEFT JOIN device_owners ON device_owners.id = arpscans_known_macs.device_owner
ORDER BY ip

By slipping in “WHERE last_seen >= NOW() - INTERVAL 5 MINUTE” just before the ORDER BY, I can make the code return only currently connected devices. The ARP scan runs every 5 minutes, and anything that has a last seen time stamp within 5 minutes is assumed to still be attached. This interval could be shortened to almost real time, but I don’t really need that frequent a check.

I can also view all disconnected devices with a simple change to the above clause, making it “WHERE last_seen <= NOW() - INTERVAL 5 MINUTE”. This wouldn’t work if I were still keeping historical data, but I essentially only capture the last seen data for any device. Essentially, what this does is return everything not seen in the last 5 minutes.

I also broke the PHP code that builds my table from my query out into its own PHP function. This way I could set the variable $sql for the active devices, call the function to build the table, then set $sql for the inactive devices and build a second table under the first.

I immediately scrapped this, because it was ugly. Plus, sometimes I do want to see “everything”.

Enter some GET calls and an if/else statement.

	if (isset($_GET['show']) && $_GET['show'] == "active") {
		// SQL for selecting active devices
		$tabletitle = "Active Devices";
		$sql = "SELECT ip, arpscans.mac, arpscans_known_macs.device_name, arpscans_known_macs.device_description, last_seen, device_owners.user_name FROM arpscans LEFT JOIN arpscans_known_macs on arpscans_known_macs.mac = arpscans.mac LEFT JOIN device_owners on device_owners.id = arpscans_known_macs.device_owner WHERE last_seen >= NOW() - INTERVAL 5 MINUTE ORDER BY ip";
	}
	elseif (isset($_GET['show']) && $_GET['show'] == "inactive") {
		// SQL for selecting inactive devices
		$tabletitle = "Inactive Devices";
		$sql = "SELECT ip, arpscans.mac, arpscans_known_macs.device_name, arpscans_known_macs.device_description, last_seen, device_owners.user_name FROM arpscans LEFT JOIN arpscans_known_macs on arpscans_known_macs.mac = arpscans.mac LEFT JOIN device_owners on device_owners.id = arpscans_known_macs.device_owner WHERE last_seen <= NOW() - INTERVAL 5 MINUTE ORDER BY ip";
	}
	else {
		// SQL for selecting all devices
		$tabletitle = "All Devices";
		$sql = "SELECT ip, arpscans.mac, arpscans_known_macs.device_name, arpscans_known_macs.device_description, last_seen, device_owners.user_name FROM arpscans LEFT JOIN arpscans_known_macs on arpscans_known_macs.mac = arpscans.mac LEFT JOIN device_owners on device_owners.id = arpscans_known_macs.device_owner ORDER BY ip";
	}

Basically, if nothing, or a random string, is passed in the URL variable “show”, then it falls through to the end and displays everything when accessing the page at index.php. If it gets index.php?show=active, it sets $sql for showing active devices, and if it gets index.php?show=inactive, it shows inactive devices. It also sets a variable called $tabletitle, which is just echoed out into some header tags. I then added links across the top of the page to each of these filters.

This allows for a quick and easy toggle of which data is pulled and displayed.

Additionally, I updated the way the Add Device form works. Previously, the form would fill in the MAC, a device name, and a device description, then POST to another PHP page that would insert the data into the table, then forward back to the index page with a header redirect. I’m not going to get into too much detail on it here, but I also integrated the Network Map into my dashboard framework, with a header, navigation, sidebar, and footer. It also uses a table-based navigation system, so in order to view the network map, I am hitting “index.php?page=4”. Pages basically all need to be wrapped in this structure to work properly, so to make things flow better, the Add Device form now POSTs back to the Network Map page itself, which checks to see if the POST variables are set and, if they are, inserts the new information before pulling the table.

This also meant slightly altering my page calls to look for “index.php?page=4&show=active” and “index.php?page=4&show=inactive”.

Eventually I want to move the Add Device form to appear at the top of the page, so the whole thing is all handled in one single page.

Lastly, I made up a quick block of code on its own page that simply counts and displays the number of currently connected devices on the network. This block is embedded on the front page of my Dashboard framework and links to the full Network Map page. The general idea of the Dashboard is to have widgets like this that show quick-glance information, with links to detailed information.

I have not built a lot of them yet, but one of the others I have built works somewhat similarly to the ARP scanning system. A script makes a call to my TT-RSS instance for each of the segmented accounts I have, then dumps the unread count into a table on the server. The widget shows how many unread articles each topic/account has. I am still really bad about only actually reading the Basic feed (mostly Toys and Video Games).
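As a rough sketch of that script’s idea (assuming the TT-RSS JSON API with its login and getUnread operations; the instance URL, the accounts, and the table name here are all placeholders, not the real ones):

import requests
import MySQLdb

# Hypothetical TT-RSS instance and segmented accounts
API_URL = "https://ttrss.example.lan/api/"
accounts = {"Basic": ("basic_user", "password"), "News": ("news_user", "password")}

db = MySQLdb.connect(host="localhost", user="YOUR_DB_USERNAME",
                     passwd="YOUR_DB_PASSWORD", database="YOUR_DB_NAME")
cursor = db.cursor()

for topic, (user, password) in accounts.items():
    # Log in to get a session id, then ask for that account's total unread count
    login = requests.post(API_URL, json={"op": "login", "user": user, "password": password}).json()
    sid = login["content"]["session_id"]
    unread = requests.post(API_URL, json={"op": "getUnread", "sid": sid}).json()["content"]["unread"]
    # Store one row per topic/account for the dashboard widget to display (hypothetical table name)
    cursor.execute("INSERT INTO ttrss_unread (topic, unread) VALUES (%s, %s)", (topic, unread))

db.commit()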

But I will get into the Dashboard Widgets thing a bit more in a future post probably.