This is one of those quick and kind of dirty projects I’ve been meaning to do for a while. Basically, I wanted a script that would scrape all of the top level comments from a Reddit post and push them out to a list. Most commonly, to use on /r/AskReddit style threads like, well, for this example, “What is a song from the 90s that young people should listen to.”
Basically, threads that ask for opinions that add up to a useful list. Sometimes it’s lists of websites or something. Often it’s music. The script here is made for music but could be adjusted for any thread. Here is the script; I’ll touch on it in a bit more detail after.
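## Create an APP for Secrets here:
## https://www.reddit.com/prefs/apps
import praw

## Thread to scrape goes here, replace the one below
url = "https://www.reddit.com/r/Music/comments/10c4ki0/name_one_90s_song_kids_born_after_2000_should_add/"

## Fill in API Information here
reddit = praw.Reddit(
    client_id="",  # The jumble of letters under the app name
    client_secret="",  # Labeled "secret" on the app page
    user_agent="script by u/",  # Your Username, not really required though
)

submission = reddit.submission(url=url)
submission.comment_limit = 1
## Drop the "load more comments" placeholders, which have no body
submission.comments.replace_more(limit=0)

for x in submission.comments:
    with open("output.txt", mode="a", encoding="UTF-8") as file:
        if "-" in x.body:
            file.write(x.body + "\n")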
The script uses PRAW, the Python Reddit API Wrapper, a library for working with the Reddit API from Python. It requires free keys, which you can get here: https://www.reddit.com/prefs/apps. Just create an app; the Client ID is the jumble of letters under the name, and the secret is labeled. The User Agent can be whatever, really, but it’s meant to be informative.
The thread URL also needs to be filled in.
The script then pulls the thread data and pulls the top level comments.
I’m interested in text file lists mostly, though for the sake of music based lists, if I used Spotify, I might combine it with the Spotify Playlist maker from my 100 Days of Python course. Like I said before though, this script is made for pulling music suggestions, with this bit of code:
if "-" in x.body:
It’s simple, but if the comment contains a dash, as in “Taylor Swift - Shake it Off” or “ACDC - Back in Black”, it writes it to the file. Otherwise it discards it. There is a chance this means discarding some good suggestions, but this isn’t precision work, so I’m OK with that to filter out the chaff. If I were looking for URLs or something, I might look for “http” in the comment. I could also eliminate the “if” statement and just have it write all the comments to a file.
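To make the filtering options above a bit more concrete, here is a minimal sketch that pulls the check out into its own function so it can be swapped between the dash filter, a URL filter, or no filter at all. The names `keep_comment`, `filter_comments`, and the `mode` values are made up for illustration and are not part of the script above or of PRAW. It works on plain strings, so it can be tried without any API keys.

```python
def keep_comment(body, mode="dash"):
    """Decide whether a comment body is worth writing to the output file."""
    if mode == "dash":
        return "-" in body      # "Artist - Song" style suggestions
    if mode == "url":
        return "http" in body   # comments that contain a link
    if mode == "all":
        return True             # no filtering, keep everything
    raise ValueError(f"unknown mode: {mode}")


def filter_comments(bodies, mode="dash"):
    """Return only the comment bodies that pass the chosen filter."""
    return [body for body in bodies if keep_comment(body, mode)]


# Made-up sample comments standing in for x.body values
sample = [
    "Taylor Swift - Shake it Off",
    "Too many to pick just one",
    "Try https://example.com/playlist",
]
print(filter_comments(sample))
print(filter_comments(sample, "url"))
```

One thing to keep in mind is that the check is for a plain hyphen, so a comment typed with a longer typographic dash would slip through unless that character is checked for as well.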
I want to start off by saying, there isn’t going to be a ton of code here, and if there is code, it’s going to be super dirty. I’m fairly good at making code for “private use” that is pretty insecure, and not so great at code that’s scrubbed up and user friendly to distribute to others.
I’ve been working a bit on some local code projects, specifically for my little private “Dashboard” that runs on my file server. One project I’ve wanted to try for a while is a dynamic network tracker tool. I’ve looked into some options available, and they all seem to run as a plug in for some complicated 3rd party analytics software that often has some goofy complicated set up procedure that’s beyond “apt-get” or even just dumping a bunch of files in a web server directory.
This project is both kind of simple and not. It was fairly simple in set up and execution, but it’s somewhat complex in design. The first job was getting a list of currently connected devices on the network. This is easily done via the command line with an arp-scan request.
sudo arp-scan --localnet
The output of which looks something like this:
Using a redirect, I can shove all of this into a text file, which contains everything above.
sudo arp-scan --localnet > scan.txt
The trick is, how to display this output on a webpage. One way would be to pull it from a database. Pulling data from MySQL is pretty easy, dumping it to a pretty looking table is also easy. The harder part is getting the output of arp-scan to MySQL in a useful manner.
This is where Python comes into play. I am sure there are other methods or languages available, but I like Python, and mostly know how to use Python. Essentially, I wrote a script that would open the file, scan.txt, that was created above. I am only concerned with lines that contain IP addresses, so I used the function “is_number()” to check if the first character of each line is numeric. If it is, the line runs through a couple of operations.
Firstly, the output of arp-scan is tab delimited, so I can use the “split” function on “\t”, and dump the result into a list. This gives me a list of the IP address, MAC address, and Manufacturer. The split leaves a new line stuck on the end of the Manufacturer, so I did a “replace” on \n in the third item of the list. Lastly, I wanted the IPs to be uniformly formatted, so I wrote a little function that would add in leading zeros to the IP octets.
Finally, the Python builds an SQL statement from the line’s list, and makes a call to the server to insert the values. A modified version of this code that just displays the resulting SQL commands instead of executing them is below.
# Open a file
        return True
    except ValueError:
    octets = ipstring.split(".")
    for i in octets:
def add_zero(shortstring):
    return "0"+shortstring
mydb = MySQLdb.connect(
mycursor = mydb.cursor()
fo = open("scan.txt", "r")
#print ("Name of the file: ", fo.name)
# read each line of the list
for line in fo:
    #check for lines that contain IP addresses
    if is_number(line):
        #Convert lines into list
        line_list = line.split("\t")
        #remove line delimiters
        #Make IP Octets 3 digits
        line_list = format_ip(line_list)
        SQL = "INSERT INTO arpscans (ip, mac, mfg) VALUES ("+line_l$
        print SQL
It’s not super pretty, but it was a quick way to make sure everything came out looking correct. The table I used is called “arpscans” and contains columns called, “ip”, “mac”, “mfg”, and “last_seen”. The time stamp is an automatically generated time stamp.
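Since the listing above is cut off in places, here is a minimal, self-contained sketch of the parsing steps as described in the text: check for a leading digit, split on tabs, drop the trailing new line, and pad the IP octets. It is not the original script. The `parse_line` helper and the sample lines are made up for illustration, and the SQL uses `%s` placeholders (the style MySQLdb accepts in `cursor.execute`) rather than building the statement by string concatenation.

```python
def is_number(line):
    """True if the line starts with a digit, i.e. looks like an IP row."""
    return line[:1].isdigit()


def format_ip(ip):
    """Pad every octet to three digits: 192.168.1.5 -> 192.168.001.005."""
    return ".".join(octet.zfill(3) for octet in ip.split("."))


def parse_line(line):
    """Turn one tab-delimited arp-scan row into (ip, mac, mfg), or None."""
    if not is_number(line):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None
    ip, mac, mfg = fields[:3]
    return format_ip(ip), mac, mfg


# Placeholders let the database driver handle the quoting.
SQL = "INSERT INTO arpscans (ip, mac, mfg) VALUES (%s, %s, %s)"

# Made-up sample lines standing in for the contents of scan.txt
sample = [
    "a header line that is not a device\n",
    "192.168.1.5\t00:11:22:33:44:55\tExample Corp\n",
]
for line in sample:
    row = parse_line(line)
    if row is not None:
        # With a real connection this would be mycursor.execute(SQL, row)
        print(SQL, row)
```

Passing the values separately from the statement also sidesteps problems with manufacturer names that contain quote characters.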
I then created a shell script that would run arp-scan, write its output to scan.txt, then run the Python script. I set up this script in the root crontab to run once every half hour. Root is required to run the arp-scan command, so my user crontab wouldn’t cut it. Everything ran fine when I manually did a run of the script using sudo. The PHP on the other end would output the latest values based on the time stamp to a webpage.
This is where I ran into my first major hurdle. The script wasn’t running in cron. After a lot of digging and futzing, I found that basically, when cron runs the script, it works off of different environment variables. I had to specify in my bash file exactly where each command existed. The end result looks something like this:
Eventually the scan was running and posting data automatically as expected. After a few days, I ran into my second major issue. There was, simply put, way too much data for my crappy old “server” to handle. The webpage slowed to a crawl as the table contained something like 9000+ entries. It’s possible and likely that my query was also rubbish, but rather than stress more figuring it out, I modified all of the code again.
Instead of adding a new entry for every MAC address every scan, I changed it to check if there already was an entry, and simply update the last_seen time. I had originally designed the system with the idea of getting legacy data for attached devices, but decided I only really cared about a generic history.
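The check-then-update idea can be sketched like this. To keep the example runnable anywhere, it swaps in Python’s built-in `sqlite3` with an in-memory database instead of MySQL, and the `record_device` helper, the column types, and the sample values are all made up for illustration. The logic is the part that matters: look the MAC up first, then either refresh `last_seen` or insert a new row.

```python
import sqlite3


def record_device(db, ip, mac, mfg, seen_at):
    """Insert a device, or just refresh last_seen if the MAC is known."""
    cur = db.execute("SELECT 1 FROM arpscans WHERE mac = ?", (mac,))
    if cur.fetchone():
        db.execute(
            "UPDATE arpscans SET last_seen = ? WHERE mac = ?",
            (seen_at, mac),
        )
    else:
        db.execute(
            "INSERT INTO arpscans (ip, mac, mfg, last_seen) VALUES (?, ?, ?, ?)",
            (ip, mac, mfg, seen_at),
        )
    db.commit()


# In-memory stand-in for the real arpscans table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE arpscans (ip TEXT, mac TEXT, mfg TEXT, last_seen TEXT)")

# Two scans half an hour apart that see the same device
record_device(db, "192.168.001.005", "00:11:22:33:44:55", "Example Corp", "2020-01-01 10:00")
record_device(db, "192.168.001.005", "00:11:22:33:44:55", "Example Corp", "2020-01-01 10:30")

rows = db.execute("SELECT mac, last_seen FROM arpscans").fetchall()
print(rows)
```

The table stays at one row per device no matter how many scans run, which is what keeps the page from bogging down.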
The new webpage table now displays all devices, current and previously seen, with the last seen date.
A few issues came up on the output end as well, though none of them were super hard to correct. One, I wanted a way to sort the table by clicking the headers. There are several scripts you can toss in your code to do this online.
I also wanted more data about each device, so I added a form where I could fill in more data about each device. Specifically, the network name, if there was one, a description of what the device is, the User of the device (which family member or if it’s just a network device). This also checks and updates based on MAC address.
I also ran into an issue with MAC addresses and my Network extender. When devices are connected to the Network Extender, the first half of the MAC is replaced with the first part of the Extender’s MAC, though they retain the last half. I may eventually write some code to detect and merge these entries, but for now, I’ve simply been labeling them in the description as “(Extender)”, so I know it’s the same device on the other connection.
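As a rough idea of how that detection might work, here is a small sketch that groups MAC addresses by their last half and reports any group with more than one member. It assumes colon-separated addresses and that “last half” means the final three octets. The function names and the addresses are made up for illustration, and a shared ending only marks a possible duplicate, not a certain one.

```python
from collections import defaultdict


def mac_suffix(mac):
    """Last three octets of a MAC, lowercased: the half the extender keeps."""
    return ":".join(mac.lower().split(":")[3:])


def group_by_suffix(macs):
    """Group MACs sharing a suffix; groups of 2+ are possible duplicates."""
    groups = defaultdict(list)
    for mac in macs:
        groups[mac_suffix(mac)].append(mac)
    return {suffix: found for suffix, found in groups.items() if len(found) > 1}


# Made-up addresses: the second is the first as seen through an extender.
macs = [
    "AA:BB:CC:11:22:33",
    "DE:AD:BE:11:22:33",
    "AA:BB:CC:44:55:66",
]
print(group_by_suffix(macs))
```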
The final end result looks something like this:
I used to have the network super organized before I moved, but the new router doesn’t work nicely with my Pi DHCP server, so I have not gotten things quite as nicely sorted as I would like. Everything in the picture is sorted, but above .100, it’s a mess. I also can’t assign IPs to some devices at all, like the DirecTV gear or my Amazon Echos, which is really annoying.
One of my future projects will hopefully correct this, as I want to put a second router on the network with DD-WRT, between the ISP gateway and everything else.
Overall, it’s been a fun little exercise in coding that combined a lot of different techniques together in a fun way.
So, I made an app. A for real, runs on things app. More accurately, I followed a little tutorial to make an app, as part of the Xamarin Challenge over at Thurrott.com. The app itself is a little weather app, it even includes location based weather and forecasting. I know this isn’t particularly impressive but I think it’s pretty neat.
Part of the point of the exercise is that it shows off the cross platform ability of Xamarin as a development platform. The end app uses the same code and runs on Windows 10, iOS and Android. Unfortunately, I don’t own a Mac or an iOS device to test the iOS code, but I was able to run both the Android and Windows 10 versions of the App. I even ran the Android APK on my Fire Tablet.
I can’t say I learned a ton about how to actually make apps using Xamarin, though I plan to poke through the code provided more later. What I learned more of was how to troubleshoot Visual Studio, which seems to be a bit more than buggy. There is a forum set up for people looking for assistance on this contest, and there are a lot of issues that all seem unrelated to each other but all related to issues with Visual Studio.
For example, I had issues getting the location based weather to work, until I went through and updated the Android Emulator files and built an emulator that ran on Android 7. In a later step I found several of the NuGet packages weren’t installing properly. I never really figured out why, but I ended up having to add them individually to each of the app platforms rather than using the blanket “Install this on all platforms” system.
In the end, I did manage to get through and the app says everything was submitted and accepted.