This is one of those quick and kind of dirty projects I’ve been meaning to do for a while. I wanted a script that would scrape all of the top-level comments from a Reddit post and push them out to a list. It is mostly meant for /r/AskReddit style threads like, well, for this example, “What is a song from the 90s that young people should listen to.”
Basically, threads that ask for opinions in list form. Sometimes it’s lists of websites or something. Often it’s music. The script here is made for music but could be adjusted for any thread. Here is the script; I’ll go over it in a bit more detail after.
    ## Create an app for secrets here:
    ## https://www.reddit.com/prefs/apps

    import praw

    ## Thread to scrape goes here, replace the one below
    url = "https://www.reddit.com/r/Music/comments/10c4ki0/name_one_90s_song_kids_born_after_2000_should_add/"

    ## Fill in API information here
    reddit = praw.Reddit(
        client_id="",
        client_secret="",
        user_agent="script by u/",  # Your username, not really required though
        redirect_uri="http://localhost:8080",
    )

    submission = reddit.submission(url=url)
    submission.comments.replace_more(limit=0)
    submission.comment_limit = 1

    ## Open the output file once, then append every comment that contains a dash
    with open("output.txt", mode="a", encoding="UTF-8") as file:
        for x in submission.comments:
            if "-" in x.body:
                file.write(str(x.body) + "\n")
                # print(x.body)
The script uses praw, the Python Reddit API Wrapper, a library for working with the Reddit API from Python. It requires free API keys, which you can get here: https://www.reddit.com/prefs/apps. Just create an app; the client ID is the jumble of letters under the app name, and the secret is labeled. The user agent can be whatever, really, but it’s meant to be informative.
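If you would rather not paste the keys straight into the script, one option is to read them from environment variables. This is only a rough sketch: the variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USERNAME`), the helper functions, and the app name `comment-scraper` are all made up for illustration. The user agent follows the general `platform:app ID:version (by u/username)` shape that Reddit's API guidelines suggest.

```python
import os


def build_user_agent(platform, app_id, version, username):
    """Build a descriptive user agent string (hypothetical helper)."""
    return f"{platform}:{app_id}:{version} (by u/{username})"


def reddit_settings(env=os.environ):
    """Collect the keyword arguments for praw.Reddit from a mapping.

    The environment variable names are assumptions, not anything
    praw itself requires.
    """
    return {
        "client_id": env.get("REDDIT_CLIENT_ID", ""),
        "client_secret": env.get("REDDIT_CLIENT_SECRET", ""),
        "user_agent": build_user_agent(
            "python", "comment-scraper", "v0.1", env.get("REDDIT_USERNAME", "")
        ),
    }


# Example with a plain dict standing in for the real environment
settings = reddit_settings(
    {
        "REDDIT_CLIENT_ID": "abc123",
        "REDDIT_CLIENT_SECRET": "not-a-real-secret",
        "REDDIT_USERNAME": "example_user",
    }
)
print(settings["user_agent"])
```

In the real script, the resulting dictionary could then be passed along as `praw.Reddit(**settings)`.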
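The thread URL also needs to be filled in.
The script then fetches the thread and loops over its top-level comments. The replace_more(limit=0) call drops the “load more comments” placeholders instead of expanding them, so only the comments that came back with the thread are looked at.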
I’m interested in text file lists mostly, though for the sake of music-based lists, if I used Spotify, I might combine it with the Spotify Playlist maker from my 100 Days of Python course. Like I said before though, this script is made for pulling music suggestions, with this bit of code:
    if "-" in x.body:
        file.write(str(x.body) + "\n")
        # print(x.body)
It’s simple, but if the comment contains a dash, as in “Taylor Swift - Shake it Off” or “AC/DC - Back in Black”, it writes it to the file. Otherwise it discards it. There is a chance this throws away some valid suggestions that don’t use a dash, but this isn’t precision work, so I’m OK with that to filter out the chaff. If I were looking for URLs or something, I might look for “http” in the comment. I could also eliminate the “if” statement and just have it write all the comments to a file.
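The three variations mentioned above (dash, “http”, or no filter at all) could be sketched as swappable test functions. This is not part of the original script, just one way to arrange it. The function names are made up, the sample comments are stand-ins rather than real thread data, and the dash check also accepts en and em dashes on the assumption that some people's keyboards or apps autocorrect a plain hyphen.

```python
from types import SimpleNamespace

# Hyphen, en dash, em dash (written as escapes so they are easy to tell apart)
DASHES = ("-", "\u2013", "\u2014")


def has_dash(text):
    """True if the text looks like an 'Artist - Song' style suggestion."""
    return any(dash in text for dash in DASHES)


def has_link(text):
    """True if the text appears to contain a URL."""
    return "http" in text


def keep_all(text):
    """No filtering at all: every comment is kept."""
    return True


def filter_bodies(comments, keep=has_dash):
    """Return the body of each comment whose text passes the `keep` test."""
    return [c.body for c in comments if keep(c.body)]


# Stand-in objects with a .body attribute, like praw comments have
sample = [
    SimpleNamespace(body="Nirvana - Lithium"),
    SimpleNamespace(body="Radiohead \u2013 Karma Police"),
    SimpleNamespace(body="Too many to pick from"),
    SimpleNamespace(body="https://example.com/playlist"),
]

songs = filter_bodies(sample)
links = filter_bodies(sample, keep=has_link)
everything = filter_bodies(sample, keep=keep_all)
```

With the real script, `submission.comments` would take the place of `sample`, and the returned list could be written to the output file one line at a time.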