Big data raises all kinds of issues. To succeed, at some stage you will have to wrestle with questions of information strategy, governance, skill shortages, market drivers and various flavours of analytics.
How does your bank automatically know that the person buying clothes with your credit card isn’t you? How can a machine make an instant assessment of your tastes, health and age, just by looking at you? How can a video camera spot a potential terrorist just by the way he walks? The answer, as we shall explain later, lies in Big Data.
But first we must define our terms. What is data? It’s the raw material for something interesting. But it’s only really useful if you cross reference it and give it some context and relevance. For example, data tells you that the red fleshy things you bought from the supermarket are tomatoes. Knowledge tells you that the tomato is a fruit. But experience will tell you never to use it in a fruit salad again.
Big data is about the masses of information out there. It exists in all kinds of forms, both structured (as in the rows and columns of spreadsheets or the fields and tables of databases) and unstructured (as in tweets, emails, Facebook posts, videos and every other type of social media). Big data is the art of amassing this data into one large repository. Data science is applied to layer different data sets onto each other, find patterns and extract meaning. Data science would put the fruit salad question into context for the tomato.
Big data – though nominally the term for the mass of information itself – has become the colloquial term for the practice of analysing masses of data. Confused? You should be. Before you discuss and develop a big data strategy for your organisation, it is important to put your definitions into context.
Questions to ask yourself
What are you doing with all of the data your company collects? How do you want to use it – to save money or to be more productive? What are your sources of information? You have many disparate seams of raw data, ranging from your company’s dark data to partner, employee, customer, and supplier information. This could be cross referenced (or layered over as we say in data science) with public, commercial and social media data.
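The "layering" of data sources described above boils down to joining records on a shared key. A minimal sketch, assuming toy records; the field names (customer_id, region, sentiment) are illustrative, not drawn from any real system:

```python
# Two hypothetical data sources: internal CRM records and social media posts.
crm_records = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
]
social_posts = [
    {"customer_id": 1, "sentiment": "positive"},
    {"customer_id": 2, "sentiment": "negative"},
]

# Index one source by its key, then enrich the other with it.
by_id = {p["customer_id"]: p for p in social_posts}
layered = [
    {**record, "sentiment": by_id[record["customer_id"]]["sentiment"]}
    for record in crm_records
    if record["customer_id"] in by_id
]
```

At scale this same join is what tools such as Hadoop distribute across many machines, but the principle is unchanged: cross-reference on a common key so that patterns spanning sources become visible.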
It all needs to be sliced, diced, linked and cross referenced, so that the connections between pieces of information become apparent and patterns emerge. Then you will be able to understand the value of this intelligence and exploit it fully.
Your company policy for using big data should be based on a simple strategy comprising three elements:
User expectations: Your employees are demanding more access to big data sources. What's your plan to manage access to these information sources? What are the use cases?
Costs: How can you deliver access to big data in a rapid and cost-effective way to support better decision-making?
Tools: How will you link these new sources of diverse data? You need to plan the impact on your data center. Have you identified the processes, tools and technologies you need to support big data in your enterprise?
Your next challenge is to think about how you want to process the data. How do you match the tools with your objectives?
You need to draw more insight from your large and complex datasets, and to predict future customer behaviours, trends and outcomes.
As with big data itself, there are structured and unstructured techniques. The three main structured disciplines are predictive analytics, behavioural analytics and data interpretation.
This helps you assess your intelligence and work out what’s going on, and what’s likely to happen, in the various domains within your organisation. Mobile phone operators, for example, can use it to give their clients a better experience while they are online, the logic being that this will entice them to spend more money. So if users are downloading a film, and customer records suggest that they are valuable clients, the network could be configured to give them preferential access, or bonus minutes on their accounts.
You can use unstructured information here too. Sometimes real time analysis might make an assessment of, say, sentiments being expressed on Facebook or Twitter. This would involve investigation of new data types such as sentiment data, clickstream data, video, images and text.
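Sentiment analysis of this kind reduces, at its crudest, to scoring text against word lists. A toy sketch to illustrate the idea; the word lists and posts are invented, and real systems use trained models rather than keyword lookups:

```python
# Illustrative sentiment lexicons (a real deployment would use a trained model).
POSITIVE = {"love", "great", "happy", "excellent"}
NEGATIVE = {"hate", "awful", "angry", "terrible"}

def sentiment_score(post: str) -> int:
    """Count positive words minus negative words in a short post."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = ["Love the new phone, great battery", "Awful service, really angry"]
scores = [sentiment_score(p) for p in posts]
```

Run over a live stream of posts, even a crude score like this gives a real-time pulse of how customers feel about a product or brand.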
Some retailers use cameras to capture data about shoppers, use this to assess their age, weight and sex, then layer this information onto a database about target audiences for their products. The next time each shopper passes a giant display screen, an advert pops up for a product the machine has decided that person really needs!
How will you tap into complex data sets to create new models to drive business outcomes, decrease costs, drive innovation or improve customer satisfaction?
Researchers at Kingston University have been working on a system that analyses video data and applies algorithms that compare movements and body language with known data sets. They can use this information to automatically spot people who are acting suspiciously (or unusually) and raise an alert. The application is mooted for use in counter terrorism.
What new business analysis can be drawn from your data? How will IT help support insight discovery and new information trends? You need to know which data to integrate for new product innovation.
Banks now use big data analytics to spot anomalies and unusual patterns of spending. This is the intelligence that causes the bank to automatically stop your credit card when you go abroad. The system of data interpretation is relatively new and there is massive scope for improvement.
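The anomaly spotting described above can be sketched with a simple statistical test: flag any transaction that sits far outside the customer's usual spending distribution. This is a minimal illustration only; real fraud systems combine many signals (location, merchant, timing), not amount alone, and the figures below are invented:

```python
from statistics import mean, stdev

# Hypothetical history of a customer's recent card transactions (in pounds).
history = [42.0, 18.5, 60.0, 25.0, 33.0, 48.0, 29.5, 55.0]

def is_anomalous(amount: float, past: list[float], threshold: float = 3.0) -> bool:
    """Flag a transaction more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(past), stdev(past)
    return abs(amount - mu) > threshold * sigma

typical = is_anomalous(41.0, history)    # close to the usual pattern
suspect = is_anomalous(950.0, history)   # far outside it
```

A flagged transaction would not block the card outright but would trigger a second check, which is why a call from your bank usually follows rather than a hard decline.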
The big data market is growing like Japanese knotweed and its appetite seems just as voracious. It is swallowing up IT budgets and talent as it expands faster than any other market in the entire spectrum of technology. Analyst Wikibon forecasts that the big data market will grow at around 58 per cent a year between now and 2017, hitting $50 billion in five years. The boom in machine generated data, created by the rise of machine to machine communications, will accelerate growth even further.
A number of elements have come together at the right time to fuel this perfect storm. The hardware needed to crunch all this data continues to improve by leaps and bounds. Experts in silicon technology are asking how much further the current design for central processing units (CPUs) can go. Some say that in eight years’ time we could come to an impasse. But other factors will compensate for this.
Meanwhile, the open source movement will continue to create systems such as Hadoop, and hardware makers will pioneer techniques such as in-memory processing to make the machinery of data processing ever more powerful. Processing techniques such as the data analysis language R, machine learning algorithms and embedded analytics continue to push the boundaries of what can be achieved by searching data for patterns and meaning. Meanwhile, Splunk, the big data platform for machine data, will help us make sense of the world when machines take over.
Currently, most big data projects involve banking, trading, telecoms, pharmaceuticals, retail, web businesses and computer security. In the majority of cases, big data is used to detect patterns of risk and fraud in those industries. That means we are only scratching the surface: the techniques and disciplines being honed now can be employed on a far wider stage.