The facts are the focus

Fact-checking journalism is on the rise. The number of fact-checking websites has been growing for over a decade now. A recent study from the Duke University Reporters’ Lab has identified almost 60 fact-checking groups globally, with a quarter of them based in the US. Some pop up as part of media organisations, like the Washington Post’s Fact Checker passing judgements in Pinocchios or the Pulitzer Prize-winning PolitiFact at the Tampa Bay Times. But more interestingly, some spring to life prompted by the need to verify specific events and developing stories.

Take Ukrainian StopFake – it verifies information, and refutes distorted information and propaganda about events in Ukraine covered by the media. Available in English and Russian, it has built an impressive audience of over 63,000 in only three months. And it’s growing.

Mainly because there is a pressing demand to verify photos posted on social media. Retweeted by both sides of the conflict, who feel strongly about the cause they are fighting for, such photos go viral quickly but do not necessarily reflect the real situation in the region. This photo of the Sloviansk morgue – published by a Russian news outlet – had been taken five years ago by an Associated Press photographer in Mexico. And here is another fake, dating back to the 1989 Tiananmen Square events in China.

Unlike PolitiFact, which debunks mainly domestic news, StopFake tests the accuracy of content from another country alongside that of its own Ukrainian media and politicians. It handles both official reports and rumours. It was launched by alumni and students of the Mohyla School of Journalism, later joined by journalists, marketing specialists, programmers, translators and other volunteers.

Another interesting example of a fact-checking group forming for the purpose of monitoring developments in an event is FactCheckEU – Europe’s first crowd-checking platform. It was established not only as a watchdog holding European politicians to account in the run-up to the elections in May, but also in a bid to spread awareness of EU politics.

FactCheckEU verified statements made by leading figures on the EU’s political scene and even some EU institutions like the Commission, which was caught misquoting some figures. The team live-checked (and live-tweeted) some of the debates between the EC Presidency candidates and partnered up with other think tanks and institutes across the continent.

And then there is Africa Check, which covers the whole continent, promoting accuracy in public debate and the media. Run by a small team of three core staff, it produces reports for free republication and offers interviews with researchers. What I personally find most interesting are its guides and fact sheets, like this one detailing the abduction of Nigerian girls by Boko Haram.

Interestingly, all three – StopFake, FactCheckEU and Africa Check – share a crowdsourcing element. They appeal to their readers to submit statements which lend themselves to verification. This allows Africa Check to be run by only three staffers, and FactCheckEU to feature politicians from more EU countries than there are languages mastered by the team. All this while the audiences feel part of the process.

The rise of fact-checking journalism has clearly marked the media landscape over the past couple of years. Some see it as a symptom of failing media, others as a popular highlight of mainstream politics. Is it simplified journalism? Or is it the root of journalism – a starting point for all reporters?

With sites such as Faktomat in Germany, Les Décodeurs in France and Chequeado in Argentina, it is difficult to deny that fact-checking offers an interesting insight into detailed political discourse. It obviously has its flaws, but all the sites mentioned above believe it enriches the debate rather than impoverishing it. And it is expanding quicker than you might think. The Poynter Institute is to hold the first global fact-checking summit in London this June, organised by Bill Adair of Duke University. And soon there will even be a browser plug-in to ‘automatically fact-check’ articles, which will tell you how accurate the information on a site is. I am looking forward to seeing further developments in the field.

(Disclaimer: I have been part of the team behind FactCheckEU)

Russian speakers in Ukraine – the media’s take

As tensions escalated in Ukraine, more and more media organisations took to data visualisation to convey the actual state of affairs in the country. One of the arguments in the dispute over Crimea was the large number of people inhabiting the region for whom Russian is the native language. Here’s how different media organisations pictured it:

CNN:

Russian speakers in Ukraine

The New York Times:

Russian speakers in Ukraine

The Guardian:

Russian speakers in Ukraine

Non-media outlet map (the only interactive one, with a pop-out info-window):

Russian speakers in Ukraine

(This map serves only as a reference point for my comparison.)

Legends and Scales

Clearly, the maps differ significantly from one another, mainly because of the different numbers of colour buckets and colour ramps they use. The Guardian opted for the simplest division into “Predominantly Russian-/Ukrainian-speaking”, which gives a clear picture of the most general trends but does not allow more nuanced insight into the situation. Especially when compared to the interactive map, the Guardian’s seems over-simplified. Upon exploring the detailed information in the pop-out window, it’s easy to spot that the territory corresponding to the Guardian’s blue field does not have more than 10% of Russian speakers per region (with the exception of Sumska Oblast in the north). The distribution is therefore relatively even across the regions, which legitimises using just one colour for this part of Ukraine. However, in the territory corresponding to the Guardian’s yellow field, only three regions have as much as 68-77% of Russian speakers, three have a roughly 50% share, and a further three fall in a bracket of only 25-33%. This shows that the Guardian’s take is broadly correct but does not fully accurately reflect the linguistic divisions in the country.

CNN’s map is a bit better in this respect, but the choice of scale seems puzzling, with the bright red colour encompassing a rather broad bracket of 25-74% of Russian speakers. Adding another bucket (to make it 25-50% and 51-74%) would probably make the picture clearer and at the same time add the predominance factor to the map.

The NYT map, on the other hand, does not picture the whole of Ukraine, only its easternmost regions, where the tensions escalated in an especially violent way. What’s more, the map does not explicitly showcase the language division – it rather gives it as context for where the clashes took place. The scale it uses is definitely the most accurate of all, but definitely not the most readable (especially against a greyish background, which sometimes intensifies the shades of blue). All in all, though, it is enough to give the reader the general picture of the situation and therefore fulfils its function as background information.

Colours

In terms of the colour ramps used, my personal preference is with the NYT, because a sequential single-colour scale is best suited to ordered data that progress from low to high. The same applies to the CNN map, which may even be considered better because colour opacity is not at issue.

The Guardian’s choice of Ukraine’s national colours is a nice pick, too. A divergent scheme (two colours) emphasises the extreme ends of the scale (in this case the only two categories: Ukrainian/Russian) but – as mentioned above – it fails to convey a more detailed picture.

Data

One more important aspect to touch on is the source of the data used for the visualisations. Unfortunately, only CNN provided it for their map:

Publication      Data source
CNN              2001 Ukraine Census
NYT              not given
The Guardian     based on Washington Post map
StoryMap/Esri    not given

However, it can be suspected that all of the media outlets based their stories on the 2001 census, since it is the only official data source available at this point in time. The problem with the data is that it is 13 years old, and the current situation might be a far cry from what it was at the beginning of the 2000s. The question is: can the language division therefore be an argument in the case of the conflict in Ukraine?

Conclusion

It is difficult to assess which map did best. As mentioned above, each of them has some shortcomings, and it should therefore be the purpose for which they were created that decides their usefulness for the reader. I think the most accurate one is the StoryMaps/Esri map, but I do not particularly like the scale it adopted (accurate as it is). I think clear milestones (e.g. <10%, 25%, 50%, 75% and >90%) would do a better job.
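To make the milestone suggestion concrete, here is a minimal sketch in Python of how such a bucketed legend could be computed for each region’s share of Russian speakers. The breaks are the milestones proposed above; the labels are my own illustrative choice:

```python
from bisect import bisect

# Milestone breaks suggested above: <10%, 25%, 50%, 75%, >90%
BREAKS = [10, 25, 50, 75, 90]
LABELS = ["<10%", "10-25%", "25-50%", "50-75%", "75-90%", ">90%"]

def bucket(share):
    """Return the legend label whose bucket contains `share` (a percentage)."""
    # bisect() counts how many breaks sit at or below `share`,
    # which is exactly the index of the matching label
    return LABELS[bisect(BREAKS, share)]
```

With this scheme, a region with a 68% share lands in the “50-75%” bucket, which would separate the Guardian’s broad yellow field into far more informative classes.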

Hint: If you need a hand choosing colours for your maps, check out ColorBrewer – it’s a nice little tool to solve all your shades-and-hues problems.
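As a quick illustration of putting a ColorBrewer ramp to work, this sketch maps a region’s percentage share onto the five-class sequential “Blues” palette (hex values copied from colorbrewer2.org; the class breaks here are my own assumption, not part of the palette):

```python
# Five-class sequential "Blues" ramp from ColorBrewer (colorbrewer2.org)
BLUES = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]

def colour_for(share, breaks=(25, 50, 75, 90)):
    """Pick the ramp colour for a percentage: the darker, the higher the share."""
    idx = sum(share >= b for b in breaks)  # number of breaks at or below `share`
    return BLUES[idx]
```

A sequential single-hue ramp like this is the scheme the NYT map uses, and it suits ordered low-to-high data far better than an ad-hoc mix of hues.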

Hint 2: If you want to learn more about mapping, check out tutorials I listed here.

 

Interview with Kiln’s Duncan Clark

 


Photo: kiln.it

Kiln is a design studio specialising in data visualisation, digital storytelling, maps and animation. It was founded and is run by Duncan Clark and Robin Houston, creators behind such projects as Women’s Rights or In flight for the Guardian. In this short interview Duncan Clark talks about how they go about their projects.

How do you choose what subjects to cover in your visualisations?

It’s a mix. Sometimes we have an idea that we know we want to pursue; sometimes the Guardian or another client will approach us with an idea.

What is key for you in the process of designing information?

One golden rule is to let the information speak for itself. There’s no point making a pretty visualisation if it doesn’t make the data clearer to understand and easier to interrogate.

What is your favourite project that kiln.it worked on so far and why? What do you think makes it interesting for people to explore?

“In flight” is certainly the most ambitious thing we’ve done so far and possibly my favourite. I like the fact that almost everyone says “wow” at seeing the sheer number of planes that have flown through the air in the last 24 hours. But I also think it’s interesting as an experiment in combining different approaches to storytelling: it takes elements from documentary making, data visualisation, radio production and live mapping and tries to combine them into a coherent whole.

What’s your work process? How much leeway do you have in your work? Do you get precise instructions for your projects or do you only accept broadly defined commissions?

It varies. Sometimes the starting point of a commission is just a broad subject area; at other times a client might have a very specific visualisation technique in mind from the outset. Most commonly, though, we’re given a dataset and asked to work out how best to turn it into something compelling.

What advice would you give to a budding data journalist?

It depends what kind of data journalist you want to be. If you’re mainly interested in breaking stories, then getting acquainted with how to get unexplored data via Freedom of Information requests might be a good idea. If you’re more interested in interactives and visualisations then learning to code can’t hurt: access to good developers is always a bottleneck for journalists, so being able to do at least some of the coding yourself is a huge advantage. Try getting started with a free HTML, CSS and JavaScript course at Codecademy.

Do you need to be a data scientist to work in data-driven journalism? Interview with import.io’s Beatrice Schofield


Picture: import.io

Do you need to be a data scientist to work in data journalism? What is the difference between data analysis and data science? Beatrice Schofield, Head of Data Intelligence at import.io, debunks the data science myth.

What does your job as Head of Data Intelligence at import.io involve?

On a day-to-day basis I think about new things we can do with data and how to engage new areas whereby people who are not technically trained can start using open data for their fields of research. I also work on news cases by approaching NGOs and data journalists with ideas for stories with data sets. A lot of it is content-driven. It is exploring open data, how to better use it, extract it from sites and build data sets – much of it has traditionally been the realm of people who can programme. I make sure we get the data and give it to people who would be interested in using it but have previously been unable to because they lack this skill and are not data scientists.

Do you approach journalists or media organisations?

It depends. If there is something big coming up, like the budget, I quickly build an extractor which, for instance, allows us to get the data off the BBC on a minute-by-minute update and give it to the Guardian within an hour. This then informs their sentiment analysis, through which they can read what is happening. We often take a pro-active approach. We are responsive: when Nelson Mandela died and the Guardian wanted data quickly, we could respond by predetermining what data might be interesting at that time and providing it to journalists.

Who has import.io cooperated with thus far?

We have provided data for the Financial Times, the Guardian and the New York Times. The big data story that has recently made the news is Oxfam’s analysis showing that the five richest families in the UK have more money than the poorest 20%. We worked with Oxfam to get this data before it became a media sensation. We’re after pre-determining things like this as well.

Do you hack to get data?

I am not technically trained, so I do all my scraping via our tool. On an analytical level, I rely solely on this to get large amounts of data and give it to people in whatever form, so there is no need for me to concentrate any attention on developing skills which aren’t necessary with the tool we’ve got.

What kind of skill set do you use in your work?

I have been doing data analysis since university in different roles. But Excel is where it begins and ends. A lot of it is qualitative and quantitative research because much of my work is content-driven. And on a day-to-day basis I am very much operating as any other data analyst without the need to delve into the realms of data science. It’s pretty much beyond me.

What would you recommend that a trainee data journalist learn in terms of software and skill?

From my perspective it is important to have written something before and to be on the sharp edge of data analysis. Data journalism is now a fundamental part of journalism and you can’t be a journalist without being data-savvy. In terms of developing the right skill set, I don’t think it is necessary to be a good programmer. I think you can focus on other areas. The tools are now here – like import.io to access the data and Tableau to visualise it – and all that is left is analysis and seeing where the stories are. This is what data journalism is about: being quite academic, realising where the holes in the data are, seeing how bias is created by certain data sets. There is a tendency for people to see data as fact and not as a socially constructed set of numbers or letters. It is important to be very critical of what we are being presented with and to look at what is missing as opposed to just what is there.

I certainly think that with data journalism moving forward, you have to have the ability to engage wholly with the amount of data that there is on the web, and the ability to look into it and see what you can do. Because at the moment we are still – for various reasons – only looking at a tiny section of what’s available. It is key to think imaginatively and creatively about how we can build data sets over time and to develop your skills qualitatively and quantitatively as opposed to focusing all your attention on being a good programmer when it’s no longer the time to be one. There are now tools that allow you to have data sets and spend your time focusing on stories.

Is statistical knowledge key, then?

Mostly for the journalist’s own time management. No one wants to spend a lot of time in untidy spreadsheets, cleaning data sets and thinking: “This is a bore”. Being able to do the analysis means you can spot trends and patterns and have insights early on, but in terms of advanced statistical knowledge, I don’t think it’s necessary. I don’t have it myself. Data science is pretty much a fashion statement now.

You mentioned before a line that should be drawn between a data scientist and a data analyst. Where does it lie?

I believe the split lies in the technical skill set. Data scientists traditionally write a lot of scripts and are able to do mining on huge data sets using them. I see a data analyst as being able to perform the same analysis as a data scientist without having the programming skills and science degrees under their belt. But the two come from the same realm.

Do you think newsrooms will start employing data scientists?

I don’t think they can afford them. A data analyst could easily perform the same job by using freely available tools as opposed to their own technical know-how. In terms of mining large data sets, it can be collaborative work between scientists and analysts, but not in terms of assistance to data journalism, which is spotting what you want to see in the stories as opposed to delivering a very methodical, technical approach. I think we are now developing tools that might almost push data scientists to the side.

What would be a prerequisite for becoming a data analyst?

You need to be quantitatively trained in some sense. It doesn’t need to be a degree. For instance, social sciences usually require a quantitative approach. Personally, I have learnt a lot about data analysis while on the job. You can’t really set aside a certain skill set. Obviously there are certain skills, like Excel, that are needed to advance, but beyond that, analysis can be done at a very qualitative level as well. And then you back it up with figures.

You told me about your six-month-long project of monitoring alcohol prices on the Tesco website. What happens when such a time-consuming undertaking does not yield the results you expected?

That’s the nature of it. What you presume might happen might not always happen and your assumptions might be wrong. But with tools like import.io you can run a couple of projects at the same time, so it’s not as if you’re banking on one data set to provide you with the story that you want.

How do you go about generating an initial idea for a project?

I approach my work with an inquisitive mindset. I wonder: “What could you find out from that?” Sometimes I don’t start with a pre-determined outcome, but just with creating databases over time – at some point a story is bound to come out of one of them. It’s all about being imaginative.

And I am having a lot of fun with it. I know data analysis is considered a bit of a dull area but then if you draw the content out of it, you can make it fun. We have been looking at Dulux colours and names of paints because they are absolutely ridiculous and we made a game that pulls the names apart, for example “pomegranate champagne”. Previously we made a game which made people guess which newspaper said which headline. You just need to be creative with it.

I think the Guardian did well. They were the first ones to really push it to the front and say they are very much a data-driven newspaper. But it can be anyone who has the ability to see something unique in data, to bring different insight, different experience and apply it in the data set. This is what I believe sets people apart: the ability to communicate well through visualisation and good analysis and seeing possibilities in data.

Data journalism sits at the split between the sciences and humanities – it relies on both to be performed well. It does not require heavy scientific expertise; it requires intuitive questioning and thinking about external factors, which come from the humanities.

 

Hint: If you want to learn about data and visualisation, check out my list of best tutorials here.

Best tutorials for data journalists

I compiled a round-up of video tutorials and webinars which I found most useful during the last couple of months of my training to become a data journalist.

Data scraping

A series of webinars by Alex Gimson from import.io on:

  1. Auto table extraction
  2. Building a data crawler
  3. Getting data behind passwords
  4. Datasets

And good news – there will be more! Watch this space: http://blog.import.io/

Data visualisation

A series of webinars by Jewel Loree from Tableau on:

  1. Basic Tableau Proficiency 
  2. Actions, Filters and Parameters in Tableau Public
  3. Data Formatting, joins, blends, and table calculations

Two more to come, stay tuned on Tableau Software YouTube channel.

Mapping

A tutorial by Andrew Hill on using CartoDB for mapping:

Online mapping for beginners

Two Google Fusion Tables tutorials which will teach you how to make:

  1. a point map
  2. a polygon map

and here come two webinars you can still take part in:

Obviously, the list is not exhaustive and you would need to do some more reading around the content of the tutorials. Blogs run by the people behind the software should be very helpful in getting more insight into the particular problems you might encounter on the way.

Gerard Baker: 5 pillars of business journalism in the digital age

The lecture room at City University London filled up quickly when Gerard Baker, editor-in-chief of Dow Jones, came to give a talk on the five pillars of business journalism in the digital age.

Baker, who is also the managing editor of the Wall Street Journal, discussed the important shift that media organisations need to make if their online publishing strategies are to survive the fast-growing market of specialised business publications.


Credit: Rèmi Steinegger

Bringing up this year’s 125th anniversary of the WSJ, he reflected on the fact that business journalism has come a long way from being a side project to being a thriving and sought-after news product. The boom started in the 1980s and 1990s, when the end of the Cold War and the ensuing explosion of financial assets boosted readers’ interest in understanding economic events.

But for Baker, the “golden age for business journalism” is now. He cited a plethora of recently launched, digitally native business publications, such as Quartz, Business Insider and BuzzFeed Business, which thrive on society’s demand for business-oriented news.

He identified five pillars of WSJ’s digital strategy, which he considers crucial for every business publication to fulfil this demand:

  1. Genuinely embrace the digital revolution and change the culture of news organisations. “We have to fundamentally rethink our product and reshape it for the digital age,” said Baker. “Taking the newspaper approach and sticking it into the digital format is not viable.”

  2. Preserve and strengthen independence, and resist the temptation of relying on business organisations, even in the face of declining ad revenues.

  3. Maintain the right balance, since an open society needs a thriving press to hold both the public and corporate sectors to account. Journalistic ethics and standards are therefore of the utmost importance in keeping reporters and companies’ staff apart.

  4. Invest in a high degree of specialisation. “The reality of the digital age is niche content and there are many business opportunities in offering deep insights into specific areas of coverage,” said Baker.

  5. Seize the opportunity to become a genuinely global news organisation. Baker believes this is necessary because of economic and financial interdependence. “The only business journalism that will survive will cover a global outlook,” he said.

“Quality journalism is still rare in the world,” he added, pointing out that as emerging economies arrive on the world stage, news organisations such as the Wall Street Journal can use the opportunity to fill the gap.

With its total combined circulation of 2.3 million copies, the WSJ is the largest paid-for title in the US. It is also the only title that decided to charge readers for online news from the beginning of its digital existence, and bets on the subscription model to help it succeed in the future.

For Baker, the current opportunities alongside difficult financial circumstances mean that there has never been a greater hunger nor greater audience for business news.

“We can meet this desire,” he said. “I’m highly optimistic about what we can do.”

How to make a data visualisation with Infogr.am

Infogr.am is a free online tool that helps you make quick and beautiful interactive data visualisations, like the one I prepared for my online journalism blog. Its interface is intuitive and user-friendly, and the majority of its tools are drag-and-drop, which makes Infogr.am easy to operate.

The first step is to choose your data and plan what you want to present in your interactive visualisation. I opted for a data set from the Organic Market Report 2014 compiled by the Soil Association (available on demand).

Once you sign up to Infogr.am and start your creative process, you are invited to choose one of the ready-made templates:

infogram

Choose the colour palette you want to go for and click “Use design”. A dashboard with editable elements appears, to which you can add a chart, a map, text, a photo or a video from the menu on the right.

infogram

Double-click on each element to edit it: change text or open a chart menu. First, give a title to your visualisation. Edit the existing chart or add a new one – make sure you choose the right type of chart for the type of data you have. Double-click on the chart. An Excel-like spreadsheet appears where you can paste your data:

infogram

After the final tweaks to your data, go to the second tab, “Settings”. Depending on the chart you chose, you will find different editing options here: colours, directions, chart size and others.

infogram

Pay close attention to how you manipulate your chart. It is important that it presents the data in a clear and easily understandable way.

After you have finished adjusting your chart, click “Done” and go on to add more elements to your visualisation.

Infogr.am is a great tool, especially for beginners in data-driven journalism, yet it has a couple of major limitations:

  1. It is impossible to copy-paste text to and from text boxes, which makes typing time-consuming and rather laborious.
  2. As you manipulate the data in the Excel-like spreadsheet, the preview of the chart is unavailable, which makes you save and re-edit the chart a couple of times before you achieve the effect you want.
  3. It would be useful to be able to caption the charts directly, as opposed to having to add chart titles and captions as separate elements to your visualisation.