Code Project: Automated List From Reddit Comments

This is one of those quick and kind of dirty projects I’ve been meaning to do for a while. Basically, I wanted a script that would scrape all of the top level comments from a Reddit post and push them out to a list. Most commonly, to use on /r/AskReddit style threads like, well, for this example, “What is a song from the 90s that young people should listen to.”

Basically, threads that ask for useful opinions in list form. Sometimes it’s lists of websites or something. Often it’s music. The script here is made for music but could be adjusted for any thread. Here is the script; I’ll touch on it in a bit more detail after.

## Create an APP for Secrets here:
## https://www.reddit.com/prefs/apps

import praw

## Thread to scrape goes here, replace the one below
url = "https://www.reddit.com/r/Music/comments/10c4ki0/name_one_90s_song_kids_born_after_2000_should_add/"

## Fill in API Information here
reddit = praw.Reddit(
    client_id="",
    client_secret= "",
    user_agent= "script by u/", # Your Username, not really required though
    redirect_uri= "http://localhost:8080",
)


submission = reddit.submission(url=url)
## Drop the "load more comments" stubs so only real comments remain
submission.comments.replace_more(limit=0)

## Open the file once, then loop over the top level comments
with open("output.txt", mode="a", encoding="UTF-8") as file:
    for x in submission.comments:
        if "-" in x.body:
            file.write(str(x.body)+"\n")
            # print(x.body)

The script uses PRAW, the Python Reddit API Wrapper, a library for working with the Reddit API from Python. It requires free keys, which you can get here: https://www.reddit.com/prefs/apps. Just create an app; the Client ID is a jumble of letters under the name, and the secret is labeled. The User Agent can be whatever really, but it’s meant to be informative.

The thread URL also needs to be filled in.

The script then pulls the thread data and pulls the top level comments.

I’m interested in text file lists mostly, though for the sake of music based lists, if I used Spotify, I might combine it with the Spotify Playlist maker from my 100 Days of Python course. Like I said before though, this script is made for pulling music suggestions, with this bit of code:

        if "-" in x.body:
            file.write(str(x.body)+"\n")
            # print(x.body)

It’s simple, but if the comment contains a dash, as in “Taylor Swift - Shake it Off” or “AC/DC - Back in Black”, it writes it to the file. Otherwise it discards it. There is a chance this means discarding some good suggestions, but this isn’t precision work so I’m OK with that to filter out the chaff. If I were looking for URLs or something, I might look for “http” in the comment. I could also eliminate the “if” statement and just have it write all the comments to a file.
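The same filter can be pulled out into a little function so the keyword is easy to swap. This is only a rough sketch that works on a plain list of strings rather than live Reddit data; the name `filter_comments` and the sample comments are made up for illustration.

```python
def filter_comments(comments, keyword=None):
    """Return comments containing keyword; with no keyword, return them all."""
    if keyword is None:
        return list(comments)
    return [body for body in comments if keyword in body]


# Made-up sample comments standing in for the x.body values
sample = [
    "Nirvana - Lithium",
    "Great thread!",
    "Try https://example.com/playlist",
]

songs = filter_comments(sample, "-")      # music style filter
links = filter_comments(sample, "http")   # URL style filter
everything = filter_comments(sample)      # no filter at all
```

In the real script, the list would come from the `body` of each top level comment.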

100 Days of Python, Projects 58-65 #100DaysofCode

Judging by the comments on the lessons, the lessons are getting harder, though frankly, I am finding they are a bit easier.  I feel like a lot of this overlaps with my experience in Web Design.  I also have been looking a bit into how to properly host some of these apps to share here, on my Code Projects Page.  I believe I could set them up to run on a production Flask environment, with different ports, then just map subdirectories on the domain to different ports.

But I am doing my best not to get distracted.  Yes, I have been building a few smaller projects here and there lately, but I don’t want to get TOO distracted.

Day 58 – Bootstrap Overview

Not a lot to really say here; like the Web Foundational section, this was a chunk of the Web Dev course offered by the same Instructor.  I may look into picking it up if it’s on sale for cheap, since at this point I’ve done like half of it.  My next plan for a major learning push is going to be Javascript.  I really need to get familiar with Javascript for some work projects.

The project was essentially using different Bootstrap bits to format a silly Tinder knockoff website, a site for Dogs called Tindog.  Bootstrap is definitely useful, but I generally try to avoid it because it makes things look very samey.  I suppose familiarity in design is useful at times though.

Day 59 and 60 – Blog Capstone Project Part 2

The last part of the Intermediate+ section was the bones of a Clean Blog running in Python Flask.  These two days expanded on the Blog concept a bit, and a few later lessons bring it back around again.  I am almost interested in trying to use the blog once finished, except that I already have plenty of proper outlets for Blogging.  Wordpress works fine.  

The core concept was using Bootstrap to format the blog up nicely, as well as setting it up to reuse Header and Footer files.  It’s a common method for web design, and it’s one I have been using since Geocities, when I discovered something called SHTML which would let me write Blog Texts in HTML, then encapsulate them into a common top and bottom.

Day 61 – Flask Forms

An intro to easier to use Flask based forms and libraries.  The project itself was to construct a simple log in page, which led to a Rick Roll when unlocked.  I could see this piece being useful later to add in on the more complete Blog project.

Which brings me to a bit of an aside topic.  A lot of these projects feel disconnected, but really, they are demonstrating a proper way of handling larger Code Projects.

Break it into parts.

We built a simple blog page that pulled posts from a CSV file.

We formatted that blog page.

We created a log in form, that could easily be slipped into a blog page.

Later lessons work with more dynamic forms and data and SQLite databases.

As the pieces get laid out, the end goal becomes more and more clear.  My prediction is that the later Blog Capstone project will amount to, “Take everything from the last ten lessons and mash it into a functioning Blog app.”

Day 62 – Coffee Shop Wifi Tracker

Like I said, building on the last.  For this lesson, we built a little page that would list Coffee Shops in a pretty Bootstrap table with ratings for Coffee Quality, Wifi Quality, and Power Outlet availability.  Instead of just a form that takes an input and verifies it’s good, it actually inserts new data into the CSV that holds all the coffee shops.

Day 63 – Book Collection App

I may revisit this one later to build an app for my other collections.  And to make it more robust.  The core though is once again, building on the Coffee App, essentially.  Instead of just a tracked list with an Add Form, we added persistence by learning about SQLite databases.  So the data persists even if the server is shut down.

I took a few extra non required steps on this app and reused the Coffee Shop code to make this app look much nicer.  The core app and instructions just had a very basic page with an unordered list.  I turned it into a table with a bit more formatting and color.

Day 64 – Top Ten Movies App

This whole project was essentially the same as the Book Collection App, except it was intended from the start to look prettier.  The main difference is that the add process had a few extra steps to pull from the API of The Movie DataBase website to get data about each film, and have the user select from a list of results.  In the instructions for the class, the idea was to make multiple API calls, first for the list, then for the specific movie data, but I had already worked ahead on the project and instead it pulls the data, the user selects the movie, then it’s just done.  It was a bit complicated to get the data to pass around between pages since it was a dictionary and not just a single variable, but I got it working using Global variables.

I mostly try to avoid Global variables, but it felt like this was a good use case and made sense to use given the way Flask works.

I am thinking I may integrate the log in app created previously with this website and use it to test out mapping some of these projects to a domain to make it a for real, live website. Also, you can’t tell it in the snapshot but the posters do this cool flip over effect when you mouse over them to show more information. Not really a Python thing, but it’s still a neat effect.

Day 65 – Web Design Principles

No project for this day, it’s another slice of the Web Developer course from the same instructor.  I’m definitely going to keep an eye out for a deep discount, especially with Black Friday coming, since I’ve basically completed half of that course now.

I was hoping to fit all of these more advanced Flask based projects into one page but it feels like this post is getting pretty long so I’m going to go ahead and break it up here.

100 Days of Python, Projects 45-50 #100DaysofCode

Things are continuing to be interesting and useful here with the introduction of Beautiful Soup, a tool used to parse unstructured data into usable structured data.  Well, more or less that’s what it does.  Useful for parsing through Scraped Web Page data from sites that do not have their own API available.

As normal, everything is on GitHub.

Day 45 – Must Watch Movies List and Hacker News Headline Scraper

As an introduction to using the tool, Beautiful Soup, we had two simple projects.  The training project actually feels more useful than the official project of the day, though I also remixed the training project a bit.

The “Project of the Day” was to scrape the Empire Magazine top 100 Must watch movies and output them to a text file.  I am pretty sure this list does not change regularly and thus it’s sort of a “one and done” run.

The trainer project was more interesting, because it scraped the news headlines from Hacker News, a Reddit-like site centered around coding and technology that is absolutely bare bones in its interface.  The course notes were just to get the “top headline of the day”, but I modified mine to give a list of all headlines and links.  I will probably combine this with the previously covered email tools to get a digest of stories each day emailed to myself.

Day 46 – Spotify Musical Time Machine

This one combines the web scraping with the use of APIs which was covered previously.  Specifically the Spotify API.  The object is to get the user to input a date, then scrape the Billboard Top 100 for that date and create a Spotify Playlist based on the return.

This one was actually tricky and, as I often do, I added a bit to keep it robust.  Firstly I created a function to verify if the entered date was, in fact, a valid date.  Knowing my luck, there is a function that does this in Date Time, but writing it up was fun.  It could be better though, it only verifies if the day is between 1 and 31, for example.  Something I may clean up later I think.
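As suspected, the standard `datetime` module can do this check, since `datetime.strptime` raises a `ValueError` for impossible dates like February 30. A minimal sketch, assuming the date is typed as YYYY-MM-DD (the format and the name `is_valid_date` are my own picks for illustration):

```python
from datetime import datetime


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if text is a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        # Raised for a wrong format and for dates that do not exist
        return False
    return True
```

This catches things a simple "day is between 1 and 31" check would miss, such as the 31st of a 30 day month.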

The real tricky part was dealing with the scraping.  Billboard’s tables are not very clean and not really scrapable.  I had come up with a way to get all the Song Titles, but the resulting list was full of garbage data.  I set about cleaning the garbage data out by filtering the results list through a second list of keywords, but I noticed someone in the comments had found a simple solution of using Beautiful Soup to search for “li” (list items) with an “h3” (heading 3), which easily returned the proper list.

So I tried the same for the artists, filter by “li” then by “span” which …. returned 900 items.  So I added another filter on the class used by the “span” containing the artist, which did not help at all.  Fortunately, I already had solved this problem before while working on the Song List.  I created a list of keywords and phrases to filter, then ran my result across it, and eventually I was able to output 100 sets of “Song Title – Artist name”.

The real tricky part was using the Spotify API.   Ohhhh boy what a mess.  There seem to be several ways to authenticate, and they don’t work together, and neither the API Documentation for Spotify nor the one for SpotiPy is amazing. It took a lot of digging on searches and testing to get the ball rolling, then some more help with code around the web.  But hey, that’s part of what coding is, “Making it work”.

The first issue was getting logged in, which meant using OAuth and getting a special auth token, which Spotipy would use to authenticate with.

The second issue, once that was working, was to create the playlist, which didn’t end up being too hard, just one line of Spotipy code and output the goofy ID key to a variable from the response. Still, I deleted so many “Test List” playlists from my account.

So, the real tricky part, was that Spotify doesn’t work super great if you just search with “artist” and “track”.  Instead you get the ID of the artist, then search within that artist for the track, which works much more smoothly.  Why? To add tracks to a playlist, you add them by Spotify IDs.  Thankfully, I could throw a whole List of them up at once.

The end result works pretty flawlessly though, which is cool.  Though it also shows some of the holes in the Spotify Catalogue as you get into older tracks.  My playlist for my birthday, in 1979, is missing 23 tracks out of 100.

Also, I may look into whether there is an API for Amazon Music in the future, since that is what I use instead of Spotify, sometimes.

Day 47 – Amazon Price Tracker

Ok, this one will actually be useful to me in the long run.  Like actually useful.  I already use sites like Camel Camel Camel but running my own tracker would be even better.  Especially because one of my other primary hobbies is collecting Plastic Crack (toys).  Getting deals on things is definitely useful, especially given how expensive things are these days.

Also, I have not found a good way to monitor for sales/price drops on eBooks, which is another advantage to straight scraping web pages.  

So I even added to this one a bit.  Instead of looking for one item, it reads links and desired prices from a text file.  Now, if you look at the code, it probably could be cleaned up with a better import, treating it as a CSV instead of raw text, but I wanted to keep things as simple as possible for anyone who might run this script to monitor prices.  It’s just “LINK,PRICE”.  Easy, simple.
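Reading that “LINK,PRICE” format could look something like the sketch below. This is not the code from my repo, just an illustration; `parse_watch_list` and the example links are made up, and it assumes one item per line with the price after the last comma.

```python
def parse_watch_list(lines):
    """Turn 'LINK,PRICE' lines into a list of (link, price) tuples."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma so a comma inside the link does not break it
        link, price = line.rsplit(",", 1)
        items.append((link.strip(), float(price)))
    return items


# Made-up example lines; a real run would read them from the text file
example = [
    "https://example.com/item-one,19.99",
    "",
    "https://example.com/item-two, 5",
]
watch_list = parse_watch_list(example)
```

Each tuple can then be compared against the scraped price to decide whether to send an alert.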

Day 48 – Selenium Chrome Driver

The Day 48 Lesson was an intro to the Selenium Chrome Driver software.  This is a bridge tool that can open its own dummy web browser window, then read and interact with it.  I imagine it can connect to many languages, but in this case we used Python.

So the first bit was just some general examples, followed by actually using it to pull the events list from Python.org and dump them into a dictionary.  I could actually see this being useful for various sites because so few sites have easy to find calendar links for events.  I’m sure there is some way to add calendar events to a calendar with Python.  Just one for the “future projects” list.

Afterwards we learned about some interaction with Selenium, filling in forms and clicking links to navigate Wikipedia.  

Finally the day’s project was to automate playing a Cookie Clicker game.  These “Clicker” games are pretty popular with some folks and basically amount to clicking an object as quickly as possible.  The game includes some upgrades and the assignment itself was pretty open on how to handle upgrades.  There was a sort of side challenge to see who would get the highest “Cookies Per Second”.  I set mine up to scan the prices each round and if something could be bought, buy it.  This got me up to about 50 CPS after 5 minutes.  It could be better.  I may go back and adjust it to stop buying lower levels once a higher level can be bought, which I think might be a better method.  Why buy Grandmas when you could buy Factories.
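The “stop buying Grandmas” idea could be sketched as picking the priciest upgrade that is currently affordable. This is a guess at one possible strategy, not something I have tested in the game; the prices and the name `pick_upgrade` are made up, and the Selenium clicking is left out entirely.

```python
def pick_upgrade(prices, cookies):
    """Return the name of the priciest upgrade we can afford, or None."""
    affordable = {name: cost for name, cost in prices.items() if cost <= cookies}
    if not affordable:
        return None
    # max() over the dict keys, ranked by their cost
    return max(affordable, key=affordable.get)


# Made-up prices for illustration only
prices = {"Cursor": 15, "Grandma": 100, "Factory": 500}
```

Priciest is not automatically the best cookies per second for the money, so this would still need some experimenting.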

Day 49 – LinkedIn Job Applier

So, I completely overhauled this one, but kept it in the spirit of things, because the point is more to practice using Selenium in more complex ways.  The original objective was to make a bot that would open LinkedIn, sign in to your account, go to the Jobs Page, search for “Python Developer”, find jobs with “Easy Apply” and click through the Apply Process.

I am not in the market for a job, so applying for random jobs seems like a dumb idea.  I also use 2-Factor on my LinkedIn account, so logging in automatically would be quite impossible.  It was suggested to make a “Fake Account” to get around this but that seems a bit rude.  It also suggested simply following companies instead of applying, but I’d rather not clutter up my feed with weird false signals.

So instead…

My Bot will open LinkedIn and go to the Jobs Page for each term in an array of job terms individually (for the test I used “Python Developer” and “Java Developer”).  Then it takes those results, strips out the Company Name and URL to the Job Opening, and compiles them into an email digest that it sends out.

One issue I did have is that LinkedIn apparently uses different CSS for Chrome versus Firefox, because I was just NOT getting the results back for the links to each job, and it turns out the link bit has a different Class in Chrome, which Selenium was using, than Firefox, which I was using to inspect code (and use as my browser).

Anyway, it works in the spirit of what was trying to be accomplished, without actually passing any real personal data along.

Day 50 – Tinder Auto Swiper

So, I am really not in the market to use Tinder at all.  I was going to just skip this one.

Then I decided, “You know what, I can make a fake profile with a “https://www.thispersondoesnotexist.com/” profile.”

But then it seems dumb to get people to match with a bot.

No wait, I can set up the bot to Reject everyone, swipe, whatever direction “reject” is.  No matches!

Oh, it needs a log in via Google, Facebook, or Phone Number.  Never mind.

No wait, I have some old Facebook Profiles for a couple of my cats, I will just use one of those to log in with!

Oh, it still wants a phone number.

So anyway, I decided even trying to fake it was not worth the trouble.   But hey, Halfway there!

100 Days of Python, Projects 1-14 #100DaysofCode

I’ve mentioned before the concept of “always learning”.    One of those things is coding.  I’d like to think I’m actually pretty good at basic to intermediate level coding, though I am certainly not an expert.  I kind of feel like I am at a point where I would definitely like to “level up” my ability a bit.  So I’ve been working on the Challenge, through a course on Udemy.  Specifically, 100 days of Python, and specifically, this course.

I’ve also been using this as a good excuse to sharpen up my Github skills a bit, so you can follow my progress along in my Github Repository.  I also figure I could talk about some of my thought processes and flow here as well, though some of the projects are very simple, so there may not be a lot to say about them specifically.  Especially since, frankly, I am already beyond the “Beginner Level” of this course.

I thought about making a post for each module, but that’s kind of overkill as well, so instead I’ll just break it up a bit across maybe the skill levels, or whenever I feel like it.

Project 1 – Band Name Generator

This one is pretty basic, and basically, the same sort of thing you see on Facebook trying to steal your information.  I promise I’m not trying to steal your data though.  It takes some simple inputs, the city you were born in, your pet’s name, and outputs them as a combination for a “Band Name”.  You can really put anything you want into these fields and it will just combine them.  Maybe a fun alternative would be to use the letters of the words entered, to pull a different word from a list or something.  

Project 2 – Tip Calculator

This one is mildly more complicated than the Band Name Generator, but it is still just input fields, this time with MATH!  You enter the total bill, how much to tip, how many people ate, and it splits the bill among the various people evenly.  Be sure you order the steak dinner, so your salad eating friends can foot part of the bill for your expensive steak.
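The math itself boils down to a couple of lines. A minimal sketch, assuming the tip is entered as a percentage (the function name `split_bill` is just for illustration):

```python
def split_bill(total, tip_percent, people):
    """Return each person's share of the bill, tip included, rounded to cents."""
    with_tip = total * (1 + tip_percent / 100)
    return round(with_tip / people, 2)
```

So a 100 dollar bill with a 20 percent tip split four ways works out to 30 dollars each.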

Project 3 – Treasure Island Game

I really enjoyed this one, maybe a little TOO much.  The core project is just practice on “if, elif, else” statements.  You build a little “choose your own adventure” game.  Like a very simple Zork Game.  I kind of just kept getting goofier and goofier with the descriptions though.  I guess that’s the “writer” part of my personality or something.

Project 4 – Rock Paper Scissors Game

Pretty straight forward, and mostly a practice for random numbers.  A rock paper scissors game.  One part I like about this game though is that it introduces the idea of using ascii graphics.  I mean, the core idea is simple, but in all the various online coding classes I have done, none have done this sort of thing.  It really helps these little projects feel way less mundane.

Project 5 – Password Generator

More random number practice, this one actually is probably the most actually useful project so far.  You enter how many letters, numbers, and symbols you want for a password and well, it generates it, randomly.  Especially useful because strong passwords are good to have.  Though using a random password, you probably will want a password keeper, and well, most of those include a random password generator.

Project 6 – Escaping the Maze

Project 6 was a little different, since it wasn’t strictly writing pure code, but instead was using a site called Reeborg’s World.  https://www.reeborg.ca/index_en.html This site has a few puzzles where you use functions to navigate a little robot through some challenges.  Its purpose is mostly to help the learner get better at logic puzzles.  It was interesting and I’ve made a note to go back and do the rest of the puzzles at some point.

Project 7 – Hangman Game

Hey, another game.  This one is honestly, pretty full featured, at least for what it’s supposed to be.  It’s still an ASCII based CLI game but it works like Hangman.  Guess the letters, your little man slowly gets hung.  Better save him.

Project 8 – Caesar Cypher

This round is essentially just a sort of “intro to cryptography,” that I feel like may come back around in a later lesson.   This program builds a basic Caesar Cypher.  You enter some text, pick a Cypher, then it just rotates each letter by the number of letters equal to the Cypher number.  It also lets you decode the messages.
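The rotation can be sketched in a few lines. This is my own rough version and not the course solution; it assumes only the letters a to z get shifted and everything else passes through untouched.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text, shift, decode=False):
    """Rotate each letter by shift places; other characters pass through."""
    if decode:
        shift = -shift
    result = []
    for char in text:
        lower = char.lower()
        if lower in ALPHABET:
            # The modulo wraps around the end of the alphabet in both directions
            rotated = ALPHABET[(ALPHABET.index(lower) + shift) % 26]
            # Keep the original letter's case
            result.append(rotated.upper() if char.isupper() else rotated)
        else:
            result.append(char)
    return "".join(result)
```

For example, `caesar("Hello, World!", 3)` gives `"Khoor, Zruog!"`, and running that back through with `decode=True` returns the original text.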

Project 9 – Secret Auction

This one was kind of neat, though not overly complicated.  Essentially, you enter your name, place a bid, if there are more people, they do the same, then it announces who the highest bidder is.  My main frustration with this project: it’s suggested you clear the screen between bidders.   Simple enough.  But it turns out that Python doesn’t really have a built in clear function.  The samples run on the Replit website, and you have to import a special library from Replit.  This doesn’t work in my local VS Code interpreter.  I looked into this, I could write a custom include for it, but it would only work in Windows, or Mac, or Linux, not all three.  Because it’s essentially a command specific to each OS’s respective shells.

Super annoying.
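For what it’s worth, one commonly used workaround is to check `os.name` and pick the shell command to match. A minimal sketch, assuming the script runs in a normal terminal (the helper names are made up):

```python
import os


def clear_command():
    """Return the shell command that clears the screen on this OS."""
    # os.name is "nt" on Windows; Mac and Linux report "posix"
    return "cls" if os.name == "nt" else "clear"


def clear_screen():
    """Clear the terminal by running the OS specific command."""
    os.system(clear_command())
```

Calling `clear_screen()` between bidders should then behave the same on all three systems, though some IDE output panes may ignore it.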

Project 10 – Calculator

This would almost be really cool, if it were actually clickable and not just something you type numbers and operators into.  Still, it’s one of my favorites I think in the end, because it introduces a really interesting and neat concept involving Dictionaries.  Specifically, you have a dictionary

operations = {
    "+": add,
    "-": sub,
    "*": mult,
    "/": div,
}

You can use the key to assign its value to a variable, then call that variable as a function.  The names inside the dictionary all match the names of functions.  Really slick.  Feels like it reduces code readability quite a lot though.
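To make the idea concrete, here is a small self-contained sketch. The four functions are my own guesses at what sits behind the names in the dictionary, and `calculate` is a made-up helper for illustration.

```python
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


def mult(a, b):
    return a * b


def div(a, b):
    return a / b


# The values are the function objects themselves, not strings
operations = {"+": add, "-": sub, "*": mult, "/": div}


def calculate(a, symbol, b):
    """Look up the function stored under symbol and call it."""
    function = operations[symbol]
    return function(a, b)
```

So `calculate(6, "*", 7)` looks up `mult` and returns 42, with no long if/elif chain needed.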

Project 11 – Blackjack Game

This project was alright to do but it’s a little disappointing because it fakes the cards by just using values.  It doesn’t handle Aces properly, it doesn’t handle splits or anything.   The one mistake I made, that I fixed: originally, I had set it up so the Dealer would always “Hit” if it was below the player score.  Except the dealer wouldn’t know what cards the Player had to know the player score.

Also there is no actual betting, which could make things more interesting I suppose.

Project 12 – Number Guesser

This one almost feels a bit like a filler project, since it’s a pretty straight forward if else level project.  I will also add that the hard mode gives you 5 guesses, which feels extremely low.  The logical method is halving things, so 100 -> 50 -> 25 -> 12 -> 6, so you essentially have a 1/6 chance of getting the answer on the last guess.
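The halving math can be checked another way. A quick sketch, assuming the number is between 1 and 100 and the game answers higher, lower, or correct after each guess (both function names are made up):

```python
def guesses_needed(n):
    """Worst case guesses for a halving strategy over n possible numbers."""
    # Each guess can rule out roughly half, so this is floor(log2(n)) + 1
    return n.bit_length()


def numbers_covered(guesses):
    """How many numbers halving is guaranteed to find within that many guesses."""
    return 2 ** guesses - 1
```

For 100 numbers the worst case is 7 guesses, while 5 guesses only guarantee 31 of the 100, so hard mode really is mostly luck.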

Project 13 – Debugging

No actual projects on day 13, so maybe it’s not actually “100 Projects in 100 days”, except several of the days have two or three projects, so I am sure it makes it up somewhere.   Day 13 was revisiting 3 old projects, using different methods of debugging the code provided.

Project 14 – Higher Lower Game

This game was presented as if it’s super commonly known but I had never heard of it.  It’s almost a variation of the whole Facemash/Hot or Not Idea, but with less focus on looks.  In the example shown from the web, it uses number of Google Results for a topic; in this version it uses number of Instagram Followers.   The user is presented with two celebrities, and they have to guess which has more Instagram Followers.

It’s all pulled from pre built data, so it’s not current follower counts.   I could see this making a return when the course gets into the Web Scraping portions.

And that’s the end of the “Beginner Section”.  The course has several sections, Beginner, Intermediate, Intermediate +, some Web training, Advanced, and Professional. Most of what I’ve done so far is not really anything new.  I could have worked most of these out.  I’ve actually been spicing them up a bit with my own bits to make things more robust, like most of the inputs have some level of input check to make sure it’s valid.  Like is it a number, or did the user enter “t” or “T” or “True” or “true”.
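Those input checks are along the lines of the sketch below. These are simplified stand-ins rather than the exact code in my repo, and both helper names are made up.

```python
def parse_yes(text):
    """Return True for inputs like 't', 'T', 'True' or 'true'."""
    return text.strip().lower() in ("t", "true")


def is_number(text):
    """Return True if text can be read as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True
```

Wrapping the `input()` call in a loop that repeats until one of these passes keeps a typo from crashing the whole program.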

I’m more looking forward to the next sections as it gets into GUI style training.

Learning Python with Udacity


Just a note, this is not any sort of advertisement…

So I know some basic programming syntax, generally centered around C and C++ which I learned in college.  The C was through several Engineering based classes and the C++ was from a single Computer Science course I took when I had a semester to fill before transferring schools and didn’t want to completely lapse on the studying, schooling lifestyle.  I also know how to code HTML but that is barely programming by any stretch. 

I have tried various self taught methods to teach myself more C++ and some Java with little success.  I have some books to make Android apps but I have yet to get anywhere with them.  Then, I believe through the Windows Weekly podcast, I found out about this deal called Udacity. The first course offering is to learn how to code a basic Search Engine using Python.  I’ve found it pretty well designed though a handful of the examples were a little too abstract to be meaningful (I’m looking at the one about cost and RAM and memory and compute cycles which I still don’t understand).

Anyway, I’ve done three out of the seven modules and I’m rather proud of the fact that I’ve actually managed to stick with it and learn some things.  I’ve got a little script now that I could use to extract links from any webpage or even a number of webpages, though right now all I know how to do is display them.  Presumably we’ll learn how to compile them into some sort of file or database.  My biggest hurdle really is I keep wanting to use C and C++ syntax.  Things like adding ; at the end of lines or i++ or variable++.
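The link extraction is mostly string searching. Here is a rough sketch of the idea, written from memory and not the course’s exact code; it assumes links look like `<a href="...">` with double quotes, and `get_all_links` is just an illustrative name.

```python
def get_all_links(page):
    """Collect every URL found in <a href="..."> tags of an HTML string."""
    links = []
    position = 0
    while True:
        start_link = page.find('<a href=', position)
        if start_link == -1:
            break  # no more links
        start_quote = page.find('"', start_link)
        end_quote = page.find('"', start_quote + 1)
        if start_quote == -1 or end_quote == -1:
            break  # malformed tag, stop rather than loop forever
        links.append(page[start_quote + 1:end_quote])
        position = end_quote + 1
    return links


# Made-up page content for illustration
page = '<p><a href="https://example.com/a">A</a> and <a href="https://example.com/b">B</a></p>'
found = get_all_links(page)
```

Real pages are messier than this, which is presumably why proper HTML parsers exist.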

It’s not a terrible problem really.