Scraping Data with Python and XPath

I decided to write a short post about how I use Python and XPath to extract web content. I do this often to build research data sets. This post was inspired by another blog post: Luciano Mammino – Extracting data from Wikipedia using curl, grep, cut and other shell commands.

Where Luciano uses a bunch of Linux command line tools to extract data from Wikipedia, I thought I’d demonstrate pulling the same data using Python and XPath. Once I discovered using XPath in Python, my online data collection for research became a whole lot easier! Continue reading