Hemant Tailor

Hemant is Virtual Clarity's Senior Data Scientist.


Data Science: Magic for the 21st Century

“It’s a kind of magic”
– Queen

Yes! Magic! Let that sink in for a moment... Right now, you are either shouting at the screen, burning with rage, or intrigued by this wonder that will solve all your problems. I hope it’s not the latter!

I began my introduction like this because it surprises me how often I still find myself in conversations where ‘magic’ and ‘data science’ are used in the same sentence, sometimes leading to the belief that they are the same thing. It’s as if people think that data scientists have a mystical toolbox into which they can throw every data source given to them and deliver groundbreaking insight at the push of a button. In an instant, they’re able to deliver a shaft of light that shows the way to one prize, one goal.

Data science isn’t a dark art, but an evolution of what people have been doing for a long time. It is built upon a multidisciplinary field that has refined how the scientific method is applied to industry problems: an applied science. The goal is to bring an academic approach to the analysis, using a range of analytical techniques, and to draw conclusions that lead to better-informed decisions and experiments.

“Any sufficiently advanced technology is indistinguishable from magic.”
– Arthur C. Clarke

To give a generalised 10,000 ft overview, the first step to a successful project is to identify the realistic problem you are trying to solve and determine whether you can divide and conquer: take a big problem and break it into smaller ones. Next, gather the data sources that will help inform or formulate your hypothesis. Data, and getting access to data, is key. At the same time, you should be asking yourself whether the data you need even exists!

In most cases, the initial data analysis is exploratory, because you need to understand whether the data is usable at all. It’s no good being given GBs of data when only 1% is useful; you may need to go back and get more. Data preparation and triage must be done, and at this stage you may find that your hypothesis needs to change.

These are just some of the reasons why it’s a good idea to start data science projects small before developing full proofs of concept. It gives you an opportunity to generate ideas, but more importantly, to fail fast with ideas that don’t work out before any serious investment is made – whether it be intellectual, time or money (in that order).

Now for the next phenomenon: ‘data science / big data envy’. The phrase isn’t new, but it perfectly describes the attitude some people have when making business decisions at a senior level. It’s important to recognise this, since it can be the reason why many data science and big data projects fail at the first hurdle. People hear terms like data science, big data, machine learning and deep learning, feel the anxiety of falling behind their competitors, and decide they must have it. Their biggest mistake is failing to ask “What problem am I trying to solve?” before throwing money at it.

You don't need magical powers to do data science, but get it right and your colleagues will think you are pretty flash. Win big and you may find that your management really want it all and that you and your team really are the champions.