16 Apr 2015

The challenges of using data to hold power to account

Recently I investigated how much money MPs stepping down at the 2015 general election stand to make on the second homes for which they claimed expenses.

They made this money through both the expenses they claimed on their second homes, which can be extracted from Parliament.uk, and the increase in value of these properties since they took office.

I used a variety of tools to source my data. OutWit Hub was used to scrape data on how much each MP claimed in expenses for their second homes between 2005 and 2012.

Using OutWit Hub involved processing several lists of URLs, each pointing to information on specific claims, broken down by year, on each MP’s page (e.g. David Willetts’ claims for mortgage repayments in 2009). The data is stored in particular <div> tags with simple classes, meaning it was relatively easy to scrape with OutWit Hub.
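For readers who would rather script this step than use OutWit Hub, here is a minimal sketch of the same idea using only Python's standard library. It pulls the text out of every <div> that carries a given class. The class name "claim-amount" and the sample HTML are assumptions made up for illustration; the real class names on the Parliament pages would need to be checked in the page source.

```python
from html.parser import HTMLParser


class ClaimDivParser(HTMLParser):
    """Collect the text inside every <div> carrying a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching <div>
        self.buffer = []    # text pieces of the <div> currently being read
        self.values = []    # finished text of every matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1          # nested <div> inside a match
        elif self.target_class in classes:
            self.depth = 1           # a new matching <div> starts here
            self.buffer = []

    def handle_endtag(self, tag):
        if tag != "div" or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:          # the matching <div> has closed
            self.values.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_claims(html, target_class="claim-amount"):
    """Return the text of each <div class="...target_class...">."""
    parser = ClaimDivParser(target_class)
    parser.feed(html)
    return parser.values


# Hypothetical page fragment, not copied from Parliament.uk.
sample = """
<div class="claim-type">Mortgage interest</div>
<div class="claim-amount">£1,250.00</div>
<div class="claim-amount other">£300.50</div>
"""
print(extract_claims(sample))
```

Running the same function over each URL in a list would give one set of claim values per MP and year, which is roughly what the OutWit Hub scrape produced.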

ScraperWiki was also used, as data for earlier dates was stored in PDFs; there are a variety of similar PDF scrapers online. It is an easy process, but it can be a repetitive, time-consuming exercise.

The drawbacks of estimations

After I found the area and type of property for each MP's second home, eMoov provided estimates of the property values in 2005 and 2015.

This was a part of the process that required more thought. We don't have specific addresses for MPs (and even if we did, there would be data protection questions to answer), and so we have to rely on reports of their estimated London residences for Westminster. These are publicly available, but needed to be cross-referenced.

Then there are the estimates. eMoov could provide average property prices for the areas we supplied, but these were never going to be entirely accurate for each MP's specific home. All they could do was provide an estimate of the profit made by each MP.
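The arithmetic behind an estimate like this is simple, and a short sketch makes its limits easier to see. The function below is an assumption about how the figures could be combined (rise in the area's average price, plus expenses claimed), not the exact calculation used for the story, and the numbers are invented for illustration.

```python
def estimated_gain(price_2005, price_2015, expenses_claimed):
    """Combine an area-average price rise with the expenses claimed.

    All amounts are in whole pounds. The prices are averages for the
    area and property type, so the result is an estimate, not the
    actual profit on a specific home.
    """
    increase = price_2015 - price_2005
    return {
        "increase": increase,
        "increase_pct": round(100 * increase / price_2005, 1),
        "total": increase + expenses_claimed,
    }


# Invented figures, not taken from the investigation.
example = estimated_gain(price_2005=300_000, price_2015=450_000,
                         expenses_claimed=80_000)
print(example)
```

Because both prices are area averages, any error in them carries straight through to the total, which is why the methodology had to be spelled out to readers.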

In the end, it was still obvious that the increases were huge, especially when considering the amount the MPs had claimed for the properties - so we continued and made sure to communicate to the readers the methodology of our property price estimations. There was no alternative, as the story had to be told.

Even when it's about powerful people, it is important to trust the data, from official sources, that you have cross-referenced and fact-checked. Then you contact the people or organisations you are writing about and contact the experts who can verify and comment on the story - all the time making sure you explain your methodology.


Screenshot of scraping data with OutWitHub

What to do in the face of denial?

I collated the data in Google Sheets, using cleaning techniques to wrangle the messy, scraped data from 30 different sheets into the right format. As my data was obtained from a variety of search-, scrape- and contact-based sources, there was a fair bit of cleaning to do. 
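I did this cleaning by hand in Google Sheets, but the same steps can be sketched in Python for anyone who prefers a script. The example assumes each sheet has been exported as CSV with hypothetical columns named "mp" and "amount"; the names and figures are made up. It tidies the MP's name, converts amounts such as "£1,250.50" into numbers, and adds up the claims per MP across sheets.

```python
import csv
import io
from collections import defaultdict


def parse_pounds(text):
    """Turn a string such as ' £1,250.50 ' into a float; blank cells become 0.0."""
    cleaned = text.replace("£", "").replace(",", "").strip()
    return float(cleaned) if cleaned else 0.0


def combine_sheets(sheets):
    """Sum the claimed amounts per MP across several CSV exports."""
    totals = defaultdict(float)
    for sheet in sheets:
        for row in csv.DictReader(io.StringIO(sheet)):
            # Collapse stray whitespace and normalise capitalisation
            # so the same MP matches across sheets.
            name = " ".join(row["mp"].split()).title()
            totals[name] += parse_pounds(row["amount"])
    return dict(totals)


# Two tiny, invented sheets with the kind of mess scraping produces.
sheet_a = 'mp,amount\n"jane  example","£1,250.50"\n'
sheet_b = 'mp,amount\nJane Example,£300\n'

print(combine_sheets([sheet_a, sheet_b]))
```

Name matching this simple will not catch every variant (titles, initials, misspellings), so a manual check of the merged list is still worthwhile.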

As I updated the sheet after calls with MPs, I colour-coded it for the rest of the City AM team: green for confirmed second homes, pink for those who now rent, and red for no reply.

Once I had all of the relevant data, I called each MP’s office to confirm the information was correct – to which I got mixed, and some hostile, responses. As expected, most offices did not want to talk about the story.

This was a big challenge in the process. Although the spreadsheet proved that MPs had indeed made lots of money from their taxpayer-subsidised second homes, what could we do if we couldn't talk to them about it?

It once again became apparent that we had to trust the data. These are facts, published on the official Parliament website, combined with estimates of average property prices in the areas where we knew the MPs lived (in the context of spiralling London prices). We were confident that the numbers showed that these MPs had made lots of money on these homes, for which they had claimed taxpayers' money.

This meant that negative responses were easier to deal with. We gave them all a right to reply to the numbers, and then contacted other relevant authorities and experts.

Screenshot of the analysis process on Google Sheets

Presenting the data

We produced an interactive to help visually enhance the story. This was important, as the story – however great its ability to outrage – could be quite tedious with its list of inaccessible numbers and tax explanations. So a bar chart and a map were used to help communicate the story to the reader.

The map let readers explore the data interactively, giving them a personal, geographical connection to the story. While it did not reveal a pattern across the country, it did present the data in a way that let people find their nearest MP. The bar chart then allowed people to easily compare all 30 MPs side by side.
