- The claim that “25 countries are now involved in the vast search of the Malaysian airliner MH370” is a considerable understatement. About 3 million volunteers from around the world are poring over satellite imagery in an effort to find any trace of the plane. Together they have gone through 24,000 square kilometers (9,300 square miles) of high-resolution imagery.
- A London-based artist used Open Street Map data to generate images of cities at night because those taken from space were too blurry. These are stunning.
- What is the ROI on learning a foreign language? The Economist calculated that over a lifetime it can be as much as $128,000, but only if you study German.
- Now, did you know the Twitter bird’s name is Larry? Mashable compiled a list of famous tech mascots and stories behind their creation.
- Have you done some origami recently? Good, you are now ready to fold a working (!) paper microscope. This hands-on science experiment could revolutionise healthcare in developing countries.
- Associated Press reported that pilots often land at the wrong airport when several are close together. There have been 150 such flights since the early 1990s.
- You’ll be surprised to know that in Afghanistan there exists a “jihad museum”. It tells the story of a very particular chapter in the history of Afghanistan, namely the Soviet invasion in the 1980s. Watch out, it’s rather grisly.
- London’s Euston is getting a redesign. Following the shiny example of King’s Cross, the neighbourhood is to become a lively quarter with shops, offices and new homes up for grabs. Oh, and a Euston Arch.
- Norway has unveiled the designs for a memorial to the victims of the worst mass shooting in modern history, which killed 77 people in Oslo and Utoya in 2011. It’s nothing short of powerful.
- Following the announcement of his campaign in Rome, Martin Schulz’s Twitter bio sneakily changed from “the President of the European Parliament” to the “PES candidate for President of the EU Commission”, thus making all his followers – willingly or not – endorse his candidature. (In case you wondered, the EP Presidency got a brand new account.) Where’s fair play?
On the weekend of 22/23 February, I took part in Build the News – the first ever coding event organised by The Times and The Sunday Times. This two-day “hackathon” brought together 10 teams of student journalists and developers who were tasked with creating a digital journalism product in one of the four categories:
- Stretch (or how to develop long form journalism across multiple devices),
- Crowd (or how to develop a tool or platform that allows effective campaigning),
- Tactile (or re-thinking Sunday paper reading/sharing experience),
- Noise (or how to facilitate finding the details and people that matter around big events and moments).
It was the first time I got a chance to participate in a hackathon and surely not the last one. Inspiring in many ways, it was an eye-opener to how solutions are born to problems you face every day as a journalist.
So why hack?
Firstly because you stand a chance to tackle the knottiest problems you have always wanted solved. Who said you can’t be the first one to come up with a solution? This way you serve yourself and others, and learn loads in the process, simply by picking others’ brains.
Secondly, because only when seated in front of a task can you unleash the creativity you never suspected you had. You are given 48 hours to brainstorm, build and refine a product. Unlike any other day, you can focus on doing just this, so all your brainpower is concentrated on this one particular task. It’s imperative to use it well.
And finally, for fun, because teamwork and filling the gaps others have difficulty with is very rewarding.
What to keep in mind?
Short and intense, hackathons are inherently unpredictable. You might prepare, start toying with an idea in your head, and yet end up doing a totally different thing. So don’t get too attached to your pre-conceived no-matter-how-awesome plans. They are likely to change in the meantime.
Set realistic expectations. Think big but make sure you account for what you can achieve in the given time frame without constraining your creativity.
And precisely because time is so scarce, identify the problem as accurately as possible and define the amount of detail you want to (and can) go into before you find yourself tangled in too many meandering yet insignificant elements of your project.
Don’t be afraid of launching your project for public scrutiny. Feedback and self-reflection are key to taking it to another level. Shed all your journalistic assumptions of keeping your idea exclusive and allow others to chip in, even if they pulverise your idea. It’s better to know it’s not working out right up front.
Refuel often. Breaks for meals and nibbles (and fresh air) are crucial to keep your inventiveness going.
- Here’s a list of companies in which interns earn more than the US median household income. As expected, the list is tech-heavy. An excellent prompt to make you re-think your future career.
- Orange juice to disappear from your breakfast table. As sales have been dropping almost every year for the last decade, the start-the-day-with treat is well on its way to becoming a luxury product.
- Google joined forces with Lego for virtual brick-building, which means you can now play with all sorts of colourful plastic pieces. And it’s limitless. Welcome Build with Chrome – the new form of procrastination at work.
- Starlings in the skies. These little birds don’t like to fly on their own – instead they flock together in what we call “murmurations”. The Atlantic compiled a beautiful gallery of these formations.
- And finally, to celebrate 2014 as the year with no leap day in February, meet the “leap second”. Watch Demetrios Matsakis, chief scientist for time services at the US Naval Observatory, explain the concept.
Import.io (pronounced import-eye-oh) lets you scrape data from any website into a searchable database. It is perfect for gathering, aggregating and analysing data from websites without the need for coding skills. As Sally Hadadi told Journalism.co.uk: The idea is to “democratise” data. “We want journalists to get the best information possible to encourage and enhance unique, powerful pieces of work and generally make their research much easier.” Different uses for journalists, supplemented by case studies, can be found here.
After downloading and opening the import.io browser, copy the URL of the page you want to scrape into it. I decided to scrape the search results page for orphanages in London:
After opening the website, press the tiny pink button in the top right corner of the browser and follow up with “Let’s get cracking!” in the bottom right menu which has just appeared.
And confirm the URL of the website you want to scrape by clicking “I’m there”.
In the menu “Rows per page” select the format in which data appears on the website, whether it is “single” or “multiple”. I’m opting for “multiple” as my URL is a listing of multiple search results:
Repeat the operation with the next entry/paragraph so that the scraper gets the hang of the pattern of your selections. Two examples should suffice. Scroll down to the bottom of your website to make sure that all entries until the last one are selected (=highlighted in pink or blue alternately).
If they are, press “I’ve got all 50 rows” (the number depends on how many rows you have selected).
Now it’s time to focus on particular chunks of data you would like to extract. My entries consist of a name of the orphanage, address, phone number and a short description so I will extract all those to separate columns. Let’s start by adding a column “name”:
If it didn’t, try tweaking your selection a bit. Then add another column “address” and extract the address of the orphanage by highlighting the two lines of address and “training” the rows.
*Before moving on to the next column it is worth checking that all rows have filled up. If not, highlighting and training of individual elements might be necessary.
Once you’ve grabbed all that you need, click “I’ve got what I need”. The menu will now ask you if you want to scrape more pages. In this case, the search yielded two pages of search results so I will add another page. In order to do this, go back to your website in your regular browser, choose page 2 (or any next one) of your search results and copy the URL. Paste it into the import.io browser and confirm by clicking “I’m there”:
The scraper should automatically fill in your table for page 2. Click “I’ve got all 45 rows” and “I’ve got what I needed”.
You need to add at least 5 pages, which is a bit frustrating with a smaller data set like this one. The way around it is to add page 2 a couple of times and delete the unnecessary rows in the final table.
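If you export the final table as a CSV, the repeated rows can also be removed with a short script rather than by hand. Here is a minimal sketch in Python. The column names and sample rows are made up for illustration and are not taken from the actual import.io export:

```python
import csv
import io

# Hypothetical export: page 2 was added twice, so one row repeats.
SAMPLE_CSV = """name,address,phone,description
Home A,1 Example Road,020 0000 0001,First sample entry
Home B,2 Example Road,020 0000 0002,Second sample entry
Home B,2 Example Road,020 0000 0002,Second sample entry
"""


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())  # a hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unique_rows = drop_duplicate_rows(rows)
print(len(rows), "rows before,", len(unique_rows), "rows after")
```

To use it on a real file, swap `io.StringIO(SAMPLE_CSV)` for an opened file. Rows only count as duplicates if every cell matches exactly, so stray spaces would need trimming first.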
Make sure that the page depth is 10 and then click “Go”. If you’re scraping a huge dataset with several pages of search results, you can copy your URLs to Excel, highlight them and drag down with the black cross (bottom right of the cell) to obtain a comprehensive list. Paste it into the “Where to start?” window and press “Go”.
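As an alternative to the Excel drag-down, the list of URLs can be generated with a few lines of Python. This is only a sketch: it assumes the page number appears in the address as a simple counter, and the URL below is a placeholder rather than the site scraped in this post.

```python
# Hypothetical search URL; replace it with the address copied from your browser,
# putting {page} where the page number appears.
BASE_URL = "https://www.example.com/search?q=orphanages+london&page={page}"


def build_page_urls(base_url, first_page, last_page):
    """Return one URL per results page, from first_page to last_page inclusive."""
    return [base_url.format(page=n) for n in range(first_page, last_page + 1)]


urls = build_page_urls(BASE_URL, 1, 5)
print("\n".join(urls))  # one URL per line, ready to paste
```

The printed list can then be pasted into the “Where to start?” window in the same way as the Excel column.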
As a result, we obtain a data set which can be easily turned into a map of orphanages in London.
Do you have any further tips for import.io extraction? Do you know any other good scrapers? Share your thoughts in the comments below.
Hint: If you need to structure and clean your data, here’s how to do it.
In the meantime, look out for another post in which I will explain the next step: how to visualise the data you have.
- A restaurant, a swimming pool or a concert hall theater? 16 dormant subway stations in Paris are waiting for re-design. Here’s how one mayoral candidate has imagined them to be.
- Now your smartphone can help cure cancer, and for free. Thanks to its incredible processing power, your phone can help researchers compute similarities between different protein sequences. All this when you’re asleep.
- Deborah Fallows has recently asked what you think people actually mean when they ask “Where do you live?” or “Where are you from?”. Here’s what she found out and it’s really interesting.
- The annoying typing indicator in online chats is there for a reason. Here’s the man behind the bubble, justifying his paranoia-inducing invention. The logic behind it is quite smart.
- Facebook knows when you’re starting a new relationship just by looking at the frequency of your posts. When you think of it, it is rather creepy.
In order to properly analyse data, you need to structure it first. Here are a few tips and tricks on how to do it in an Excel table if you are only at the beginning of your adventure with data journalism.
- You will want to start with a table, which contains rows and columns. Each column corresponds to a variable, and each row corresponds to a record.
- Make sure you include only one header row at the very top of the spreadsheet. It should contain column names, one next to another. If you come across a table with multiple headers, simplify it into a single header or divide the data into multiple tables.
- Remember to include only one type of data per variable – one column should only include one type of data.
- Make copies of spreadsheets with data that you are about to analyse. You might want to use your raw data for another analysis at a later stage so keep the original file untouched.
- Add new data to the table as new rows, not new columns. Columns correspond to new variables (which you haven’t looked at before), not new “data entries” or “data records”.
- Once the data is structured into an orderly table, it is time to decide what’s needed and what’s simply obscuring the big picture. Remove or modify any rows and columns which are not necessary or sufficiently accurate.
- Take care to name your variables (columns) in a clear and concise way. You might not be the only person dealing with the file, so making the names as straightforward as possible is key to making it work.
- Make sure your data for each variable is clear and readable. It must be entered in a uniform way into each column.
- Look out for any missing data and handle it as appropriate. Leaving a cell empty is in most cases safer than inserting a “0”.
- Finally, make sure you format all data according to its type (date/number/location/text…) so that Excel and any other processing software read it correctly.
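If you are comfortable with a little code, some of these checks can be automated once the spreadsheet is saved as a CSV. The sketch below, in Python, flags empty cells and non-numeric values in columns that should hold numbers. The table and column names are invented for illustration, and the number check is deliberately simple (whole, non-negative numbers only):

```python
import csv
import io

# Made-up sample table: a single header row, then one record per row.
SAMPLE_CSV = """name,founded,capacity
Home A,1998,40
Home B,,25
Home C,2005,n/a
"""


def find_problems(rows, numeric_columns):
    """List cells that are empty or not numeric where a number is expected."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column in numeric_columns:
            value = (row[column] or "").strip()
            if value == "":
                problems.append((line_no, column, "missing"))
            elif not value.isdigit():  # accepts whole, non-negative numbers only
                problems.append((line_no, column, "not a number: " + value))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for problem in find_problems(rows, ["founded", "capacity"]):
    print(problem)
```

The script only reports problems and never changes the data, which fits the advice above to keep the original file untouched and to leave missing cells empty rather than filling them with “0”.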
Hint: All data structured and cleaned? Visualise it. In this post, I explain how to do it quickly and easily.