5 things that raised my eyebrow this week

  1. The claim that “25 countries are now involved in the vast search of the Malaysian airliner MH370” is a big understatement. About 3 million volunteers from around the world have been poring over satellite imagery in an effort to find any trace of the plane. Together they have gone through 24,000 square kilometers (9,300 square miles) of high-resolution imagery.
  2. A London-based artist used OpenStreetMap data to generate images of cities at night because those taken from space were too blurry. These are stunning.
  3. What is the ROI on learning a foreign language? The Economist calculated that over a lifetime it can be as much as $128,000, but only if you study German.
  4. Now, did you know the Twitter bird’s name is Larry? Mashable compiled a list of famous tech mascots and the stories behind their creation.
  5. Have you done any origami recently? Good, you are now ready to fold a working (!) paper microscope. This hands-on science experiment could revolutionise healthcare in developing countries.

5 things that surprised me this week

  1. Associated Press reported that pilots often land at the wrong airport when several are close together. There have been 150 such flights since the early 1990s.
  2. You’ll be surprised to know that in Afghanistan there exists a “jihad museum”. It tells the story of a very particular chapter in the history of Afghanistan, namely the Soviet invasion in the 1980s. Watch out, it’s rather grisly.
  3. London’s Euston is getting a redesign. Following the shiny example of King’s Cross, the neighbourhood is to become a lively quarter with shops, offices and new homes up for grabs. Oh, and a Euston Arch.
  4. Norway has unveiled the designs for a memorial to the victims of the worst mass shooting in modern history that killed 77 people in Oslo and Utoya in 2011. It’s nothing short of powerful.
  5. Following the announcement of his campaign in Rome, Martin Schulz’s Twitter bio sneakily changed from “the President of the European Parliament” to the “PES candidate for President of the EU Commission”, thus making all his followers – willingly or not – endorse his candidature (in case you wondered, the EP Presidency got a brand new account). Where’s fair play?

Reflections on my first hackathon

On the weekend of 22/23 February, I took part in Build the News – the first ever coding event organised by The Times and The Sunday Times. This two-day “hackathon” brought together 10 teams of student journalists and developers who were tasked with creating a digital journalism product in one of the four categories:

  • Stretch (or how to develop long form journalism across multiple devices),
  • Crowd (or how to develop a tool or platform that allows effective campaigning),
  • Tactile (or re-thinking Sunday paper reading/sharing experience),
  • Noise (or how to facilitate finding the details and people that matter around big events and moments).

It was my first chance to participate in a hackathon and surely not the last. Inspiring in many ways, it opened my eyes to how solutions to the problems you face every day as a journalist are born.


Photo: MattieTK (Flickr)

So why hack?

Firstly because you stand a chance to tackle the knottiest problems you have always wanted solved. Who said you can’t be the first one to come up with a solution? This way you serve yourself and others, and learn loads in the process, simply by picking others’ brains.

Secondly, because only when seated in front of a task can you unleash creativity you never suspected you had. You are given 48 hours to brainstorm, build and refine a product. Unlike on any other day, you can focus on just this, so all your brainpower is channelled into one particular task. It’s imperative to use it well.

And finally, for fun, because teamwork and filling the gaps others have difficulty with is very rewarding.

What to keep in mind?

Short and intense, hackathons are inherently unpredictable. You might prepare, start toying with an idea in your head, and yet end up doing a totally different thing. So don’t get too attached to your pre-conceived no-matter-how-awesome plans. They are likely to change in the meantime.

Set realistic expectations. Think big but make sure you account for what you can achieve in the given time frame without constraining your creativity.

And precisely because time is so scarce, identify the problem as accurately as possible and define the amount of detail you want to (and can) go into before you find yourself tangled in too many meandering yet insignificant elements of your project.

Learn from others. Hackathons are usually a unique mix of people who decide to give their time and skill to solve different problems. The developer community, unlike its journalistic counterpart, is very open to sharing and exchanging ideas, so make sure you listen up when they start recommending apps, software and tutorial websites. API? GitHub? JS? Find out what they are.

Don’t be afraid of launching your project for public scrutiny. Feedback and self-reflection are key to taking it to another level. Shed all your journalistic assumptions of keeping your idea exclusive and allow others to chip in, even if they pulverise your idea. It’s better to know it’s not working out right up front.

Refuel often. Breaks for meals and nibbles (and fresh air) are crucial to keep your inventiveness going.

Do get a developer for your team. Your idea might be brilliant, yet it won’t see the light of day until a code-savvy person blesses it as feasible.

If you are interested in the project my friends from MA Interactive Journalism and I developed for the Build the News hackathon, take a look at our Tumblr – it documents the whole process.

5 things that made me stop and wonder this week

  1. Here’s a list of companies in which interns earn more than the US median household income. As expected, the list is tech-heavy. An excellent prompt to make you re-think your future career.
  2. Orange juice to disappear from your breakfast table. As sales have been dropping almost every year for the last decade, the start-the-day-with treat is well on its way to becoming a luxury product.
  3. Google joined forces with Lego for virtual brick-building, which means you can now play with all sorts of colourful plastic pieces. And it’s limitless. Welcome Build with Chrome – the new form of procrastination at work.
  4. Starlings in the skies. These little birds don’t like to fly on their own – instead they flock together in what we call “murmurations”. The Atlantic compiled a beautiful gallery of these formations.
  5. And finally, to celebrate 2014 as the year with no leap day in February, meet the “leap second”. Watch Demetrios Matsakis, chief scientist for time services at the US Naval Observatory, explain the concept.


How to scrape data without coding? A step by step tutorial on import.io

Import.io (pronounced import-eye-oh) lets you scrape data from any website into a searchable database. It is perfect for gathering, aggregating and analysing data from websites without the need for coding skills. As Sally Hadadi told Journalism.co.uk, the idea is to “democratise” data: “We want journalists to get the best information possible to encourage and enhance unique, powerful pieces of work and generally make their research much easier.” Different uses for journalists, supplemented by case studies, can be found here.

After downloading and opening the import.io browser, copy the URL of the page you want to scrape into it. I decided to scrape a search results page listing orphanages in London.

After opening the website, press the tiny pink button in the top right corner of the browser and follow up with “Let’s get cracking!” in the bottom right menu which has just appeared.

Then, choose the type of scraping you want to perform. In my case, it’s a Crawler (we’ll be getting data from multiple similar pages on the same site).

And confirm the URL of the website you want to scrape by clicking “I’m there”.

As advised, choose “Detect optimal settings” and confirm.

In the menu “Rows per page” select the format in which data appears on the website, whether it is “single” or “multiple”. I’m opting for multiple as my URL is a listing of multiple search results.

Now, the time has come to “train your rows”, i.e. mark which part of the website you are interested in scraping. Hover over an entire “entry” or “paragraph” and the entry will be highlighted in pink or blue. Press “Train rows”.

Repeat the operation with the next entry/paragraph so that the scraper gets the hang of the pattern of your selections. Two examples should suffice. Scroll down to the bottom of your website to make sure that all entries until the last one are selected (=highlighted in pink or blue alternately).

If they are, press “I’ve got all 50 rows” (the number depends on how many rows you have selected).

Now it’s time to focus on the particular chunks of data you would like to extract. My entries consist of the name of the orphanage, address, phone number and a short description, so I will extract all those to separate columns. Let’s start by adding a column “name”.

Next, highlight the name of the first orphanage in the list and press “Train”.

Your table should automatically fill in with the names of all orphanages in the list.

If it didn’t, try tweaking your selection a bit. Then add another column “address” and extract the address of the orphanage by highlighting the two lines of address and “training” the rows.

Repeat the operation for “phone number” and “description”. Your table should now have all four columns filled in.

*Before passing on to the next column it is worth checking that all rows have filled up. If not, highlighting and training of individual elements might be necessary.

Once you’ve grabbed all that you need, click “I’ve got what I need”. The menu will now ask you if you want to scrape more pages. In this case, the search yielded two pages of search results so I will add another page. In order to do this, go back to your website in your regular browser, choose page 2 (or any next one) of your search results and copy the URL. Paste it into the import.io browser and confirm by clicking “I’m there”.

The scraper should automatically fill in your table for page 2. Click “I’ve got all 45 rows” and “I’ve got what I needed”.

You need to add at least 5 pages, which is a bit frustrating with a smaller data set like this one. The way around it is to add page 2 a couple of times and delete the unnecessary rows in the final table.

Once the cheating is done, click “I’m done training!” and “Upload to import.io”.

Give your Crawler a name, e.g. “Orphanages in London”, and wait for import.io to upload your data. Then run the crawler.

Make sure that the page depth is 10 and click “Go”. If you’re scraping a huge dataset with several pages of search results, you can copy your URLs to Excel, highlight them and drag down with a black cross (bottom right of the cell) to obtain a comprehensive list. Paste it into the “Where to start?” window and press “Go”.
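
If you would rather not drag cells in Excel, the same list of URLs can be produced with a few lines of Python. This is only a rough sketch: the URL pattern and the `build_page_urls` helper are made up for illustration, so swap in the address you see in your own browser when you move from page 1 to page 2 of the search results.

```python
def build_page_urls(url_template: str, first_page: int, last_page: int) -> list[str]:
    """Return one URL per results page, in page order."""
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    return [url_template.format(page=n) for n in range(first_page, last_page + 1)]


# Hypothetical URL pattern; {page} marks where the page number goes.
TEMPLATE = "https://example.com/search?q=orphanages+london&page={page}"

if __name__ == "__main__":
    # One URL per line, ready to paste into the "Where to start?" window.
    print("\n".join(build_page_urls(TEMPLATE, 1, 5)))
```

This only works when the page number is the single part of the address that changes from one results page to the next.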

After the crawling is complete, you can download your data in Excel, HTML, JSON or CSV.
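
If you added page 2 several times to reach the five-page minimum, the download will contain repeated rows. You can delete them by hand, or, as a small sketch, with Python’s built-in `csv` module. The sample rows below are placeholders standing in for the CSV download, and I am assuming the columns carry the names used in this tutorial.

```python
import csv
import io


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())  # a row counts as a duplicate only if every column matches
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up sample standing in for the downloaded CSV file.
SAMPLE_CSV = """name,address,phone number,description
Example Home A,1 Sample Street,020 0000 0001,First placeholder entry
Example Home B,2 Sample Street,020 0000 0002,Second placeholder entry
Example Home B,2 Sample Street,020 0000 0002,Second placeholder entry
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for row in drop_duplicate_rows(rows):
        print(row["name"])
```

With a real download you would open the file with `open(..., newline="")` instead of using `io.StringIO`.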

As a result, we obtain a data set which can be easily turned into a map of orphanages in London.

Do you have any further tips for import.io extraction? Do you know any other good scrapers? Share your thoughts in the comments below.

Hint: If you need to structure and clean your data, here’s how to do it.

In the meantime, look out for another post in which I will explain the next step: how to visualise the data you have.

5 things in tech, language and human nature that caught my eye this week

  1. A restaurant, a swimming pool or a concert hall theater? 16 dormant subway stations in Paris are waiting for re-design. Here’s how one mayoral candidate has imagined them to be.
  2. Now your smartphone can help cure cancer, and for free. Thanks to its incredible processing power, your phone can help researchers compute similarities between different protein sequences. All this when you’re asleep.
  3. Deborah Fallows has recently asked what you think people actually mean when they ask “Where do you live?” or “Where are you from?” Here’s what she found out and it’s really interesting.
    Picture: Paul Thurlby
  4. The annoying typing indicator in online chats is there for a reason. Here’s the man behind the bubble, justifying his paranoia-inducing invention. The logic behind it is quite smart.
  5. Facebook knows when you’re starting a new relationship just by looking at the frequency of your posts. When you think about it, it is rather creepy.

Structuring data: the basics

In order to properly analyse data, you need to structure it first. Here are a few tips and tricks on how to do it in an Excel table if you are only at the beginning of your adventure with data journalism.

  1. You will want to start with a table, which contains rows and columns. Each column corresponds to a variable, and each row corresponds to a record.
  2. Make sure you include only one header row at the very top of the spreadsheet. It should contain column names, one next to another. If you come across a table with multiple headers, simplify it into a single header or divide the data into multiple tables.
  3. Remember to include only one type of data per variable – one column should only include one type of data.
  4. Make copies of spreadsheets with data that you are about to analyse. You might want to use your raw data for another analysis at a later stage so keep the original file untouched.
  5. Add new data to the table as new rows, not new columns. Columns correspond to new variables (which you haven’t looked at before), not new “data entries” or “data records”.
  6. Once the data is structured into an orderly table, it is time to decide what’s needed and what’s simply obscuring the big picture. Remove or modify any rows and columns which are not necessary or sufficiently accurate.
  7. Take care to name your variables (columns) in a clear and concise way. You might not be the only person dealing with the file, so making the names as straightforward as possible is key to making it work.
  8. Make sure your data for each variable is clear and readable. It must be entered in a uniform way into each column.
  9. Look out for any missing data and handle it as appropriate. Leaving a cell empty is in most cases safer than inserting a “0”.
  10. Finally, make sure you format all data according to its type (date/number/location/text…) so that Excel and any other processing software read it correctly.
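
For readers who want to try the same rules outside Excel, here is a minimal Python sketch of tips 2, 4, 7, 8, 9 and 10. The table, the column names and the day/month/year date format are all invented for the example, and the helper functions are mine, not part of any library.

```python
from datetime import datetime


def clean_header(name):
    """Make a column name short and consistent: trimmed, lower case, underscores."""
    return "_".join(name.strip().lower().split())


def clean_cell(value):
    """Trim a cell and turn an empty one into None rather than 0."""
    text = value.strip()
    return text if text else None


def to_int(value):
    """Convert a cleaned cell to a whole number, keeping missing values as None."""
    return None if value is None else int(value.replace(",", ""))


def to_date(value, fmt="%d/%m/%Y"):
    """Convert a cleaned cell to a date, keeping missing values as None."""
    return None if value is None else datetime.strptime(value, fmt).date()


# Made-up raw table: a single header row, then one record per row.
RAW = [
    ["  Name ", "Places Available", "Date Opened"],
    ["Example Home A", "1,200", "01/03/1998"],
    ["Example Home B", "", "15/11/2004"],
]

# RAW itself is never modified, so the original data stays untouched.
header = [clean_header(h) for h in RAW[0]]
records = []
for raw_row in RAW[1:]:
    row = dict(zip(header, (clean_cell(c) for c in raw_row)))
    row["places_available"] = to_int(row["places_available"])
    row["date_opened"] = to_date(row["date_opened"])
    records.append(row)

if __name__ == "__main__":
    for record in records:
        print(record)
```

The empty “Places Available” cell ends up as `None`, not `0`, so it cannot be mistaken for a real count later on.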

Hint: All data structured and cleaned? Visualise it. In this post, I explain how to do it quickly and easily.