The following is a quick and dirty way of pulling a lot of URLs out of a given page’s source code, using two commands in vim, my new favourite text editor. So, right to the point!
Try it out right now on the source code of Imgur’s /r/ScarlettJohansson page.
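If you don’t have vim handy, the same idea can be sketched in Python. This is not the pair of vim commands from the post, just a rough stand-in: it assumes the links you care about are absolute http(s) URLs sitting in the page source, and the `extract_urls` helper and the sample markup are made up for illustration.

```python
import re
from typing import List

# Assumption: a URL runs from "http://" or "https://" up to the next
# whitespace, quote, or angle bracket. Good enough for a quick scrape,
# not a full URL grammar.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(source: str) -> List[str]:
    """Return the unique URLs found in `source`, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(source):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical page source, standing in for whatever you pasted into your editor.
sample = (
    '<a href="http://example.com/a.jpg">a</a> '
    '<img src="http://example.com/b.png"> '
    '<a href="http://example.com/a.jpg">again</a>'
)

for url in extract_urls(sample):
    print(url)
```

Duplicates are dropped on purpose, since the same image tends to be linked more than once on a gallery page.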
Right off the bat, I want to show you the results of this scraping to give you a bit of motivation. Thanks to requests and BeautifulSoup, the scraping itself is trivially easy. Enough talking, let’s get down to the code! As usual, I’ll include the full source code at the bottom of the post.
The data is scraped from the 68 pages of HTML that make up the Steam Search page. I’d post it all here, but you can’t really work with it in your applications that way, so I’ve uploaded it to TinyPaste and PasteBin for your use.
Each entry has a title, metascore, release date, genre, and price.
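To make those five fields a bit more concrete, here is a minimal sketch of how one entry could be held in Python and written out as CSV. It is not the code from the post: the `GameRecord` name, the field types, the CSV layout, and the sample values are all assumptions for illustration, including the guess that some games have no metascore.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class GameRecord:
    """One scraped entry. Names and types here are illustrative assumptions."""
    title: str
    metascore: Optional[int]  # assumption: may be missing for some games
    release_date: str         # kept as the raw string from the page
    genre: str
    price: str                # kept as text, e.g. "$9.99"


def to_csv(records: List[GameRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(GameRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))  # None is written as an empty cell
    return buffer.getvalue()


# Made-up sample entries, not real scraped data.
example = [
    GameRecord("Example Game", 85, "1 Jan 2010", "Action", "$9.99"),
    GameRecord("Another Example", None, "2 Feb 2011", "Strategy", "$4.99"),
]

print(to_csv(example))
```

Keeping the release date and price as plain strings avoids guessing at formats; you can parse them later once you know what the page really gives you.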
Here’s the link to it, and an image of what it looks like in action:
What’s up? It’s been a little while since my last update, but that’s because I’ve been busy with my latest project. I’ve been working on a GUI system, using only XNA so far, and it’s been an interesting experience. I’ll include some screenshots, but remember it’s super basic.