Scraping Data with Python and XPath

I decided to write a short post about how I use Python and XPath to extract web content. I do this often to build research data sets. This post was inspired by another blog post: Luciano Mammino – Extracting data from Wikipedia using curl, grep, cut and other shell commands.

Where Luciano uses a bunch of Linux command line tools to extract data from Wikipedia, I thought I’d demonstrate pulling the same data using Python and XPath. Once I discovered using XPath in Python, my online data collection for research became a whole lot easier!
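
To give a rough feel for the idea, here is a minimal sketch, not the code from the full post. It assumes a small, well-formed snippet of markup standing in for a downloaded page, and it uses the limited XPath subset in Python's standard library `xml.etree.ElementTree`. Real-world HTML is rarely well-formed, so a fuller parser would probably be needed in practice. The `PAGE` content, the `results` class name and the `extract_rows` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <table class="results">
      <tr><th>Year</th><th>Winner</th></tr>
      <tr><td>2001</td><td>Alice</td></tr>
      <tr><td>2002</td><td>Bob</td></tr>
    </table>
  </body>
</html>
"""


def extract_rows(markup):
    """Return the text of each data row in the table with class 'results'."""
    root = ET.fromstring(markup.strip())
    rows = []
    # XPath: every <tr> directly under a <table> whose class is 'results'.
    for tr in root.findall(".//table[@class='results']/tr"):
        cells = [td.text for td in tr.findall("td")]
        # The header row holds only <th> cells, so it yields no <td> and is skipped.
        if cells:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in extract_rows(PAGE):
        print(row)
```

The point of the sketch is that one path expression picks out the rows of interest, where the shell approach would chain several text filters together.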

Analysis – Email list integrity, 96% of organisations handle my email address appropriately

How are businesses and organisations handling your email? I know how they’re handling mine!

For about 10 years I’ve used “burnable email addresses”. These are email addresses that I can use and then expire. Each one is unique to a relationship between me and an organisation, business or blog that I register with. This means I know who has my email address and whether they’ve leaked it. I also know whether they’ve shared it or are spamming it.

I guess that makes me a living honeypot? But, unlike many automated honeypots that try to trap malicious users, the data from my email servers are based on real-world interactions between myself and others.