I was recently asked by the Managing Partners Forum (MPF) to give a brief overview of the current status and industry trends in Big Data and Data Analytics, topics I’ve been keeping an eye on for several years. The slides are available on Slideshare. The following is a shortened abstract of the presentation.
One of the issues I have with Big Data is just that – the term “Big Data”. It’s fairly abstract and defies a precise definition. I’m guessing the name began as a marketing invention, and we’ve been stuck with it ever since. I’m a registered user of IBM’s Watson Analytical Engine, and its free plan has a dataset limit of 500 MB. So is that ‘Big Data’? In reality it’s all relative. To a small accountancy firm of 20 staff, their payroll spreadsheet is probably big data, whereas the CERN research laboratory in Switzerland works in units of petabytes.
Eric Schmidt (Google) was famously quoted in 2010 as saying “There were 5 exabytes of information created between the dawn of civilisation through 2003, but that much information is now created in 2 days”. We probably don’t need to understand exactly what an ‘exabyte’ is to get a sense that it’s very big. What’s more, the quote gives us a sense of the velocity of information: according to Schmidt, as much information is now created every 2 days as in all of history up to 2003, and that interval is probably shorter still, six years on from his original statement.
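To put Schmidt’s figure into units that are a little easier to picture, a back-of-envelope calculation helps. This is a minimal sketch that takes the 2010 figure at face value and assumes decimal SI units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes); on those assumptions it works out to roughly 29 terabytes of new information every second.

```python
# Back-of-envelope arithmetic for Schmidt's 2010 figure of
# 5 exabytes of new information every 2 days.
# Assumes decimal SI units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.

EXABYTE = 10**18
TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(total_bytes: float, days: float) -> float:
    """Average rate implied by `total_bytes` being created over `days` days."""
    return total_bytes / (days * SECONDS_PER_DAY)


rate = bytes_per_second(5 * EXABYTE, 2)
print(f"About {rate / TERABYTE:.1f} TB of new information per second")
# prints: About 28.9 TB of new information per second
```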
It probably won’t come as a surprise to anyone that most organisations still don’t know what data they actually have, or what they’re creating and storing on a daily basis. Some are beginning to realise that these massive archives of data might hold useful information that could potentially deliver business value. But it takes time to access, analyse and interpret the data and then act on the results of that analysis, and in the meantime, the world has moved on.
According to the “Global Databerg Report” by Veritas Technologies, 55% of all information is considered to be ‘Dark’, in other words, of unknown value. The report goes on to say that where information has been analysed, 33% is considered to be “ROT”: redundant, obsolete or trivial. Hence the ‘credibility’ gap between the rate at which information is being created and our ability to process it and extract value from it before it becomes “ROT”.
But the good news is that more organisations are recognising that there is some potential value in the data and information that they create and store, with growing investment in people and systems that can make use of this information.
The PwC Global Data & Analytics Survey 2016 emphasises the need for companies to establish a data-driven innovation culture – but there is still some way to go. Those using data and analytics are focused on the past, looking back with descriptive (27%) or diagnostic (28%) methods. The more sophisticated organisations (a minority at present) use a forward-looking predictive and prescriptive approach to data.
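The difference between looking back and looking forward can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the monthly sales figures are hypothetical (they do not come from the PwC survey), and the “predictive” step is a deliberately simple least-squares trend line standing in for the far richer models a real organisation would use. Diagnostic and prescriptive analytics are not shown.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical monthly sales figures, for illustration only.
monthly_sales = [120.0, 132.0, 128.0, 141.0, 150.0, 158.0]


def describe(history: list[float]) -> dict[str, float]:
    """Descriptive analytics: summarise what has already happened."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}


def predict_next(history: list[float]) -> float:
    """Predictive analytics (minimal sketch): fit an ordinary least-squares
    straight line through the history and extrapolate one period ahead."""
    n = len(history)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(history)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


print(describe(monthly_sales))                 # looking back
print(round(predict_next(monthly_sales), 1))   # looking forward, prints: 163.9
```

The descriptive summary only reports the past; the trend line, however crude, commits to an estimate of what happens next, which is the shift in mindset the survey describes.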
What is becoming increasingly apparent is that C-suite executives, who have traditionally relied on instinct and experience to make decisions, now have the opportunity to use decision support systems driven by massive amounts of data. Sophisticated machine learning can complement experience and intuition. Today’s business environment is not just about automating business processes – it’s about automating thought processes. Decisions need to be made faster to keep pace with a rapidly changing business environment, so decision making based on a mix of mind and machine is now coming into play.
One of the most interesting by-products of this Big Data era is ‘Machine Learning’, mentioned above. Machine learning can scale across a broad spectrum of functions, including contract management, customer service, finance, legal, sales, pricing and production, because it continually learns and improves. Machine learning algorithms are iterative in nature, constantly learning and seeking to optimise outcomes. Each time a prediction turns out to be wrong, the algorithm measures the error, adjusts its model to reduce it and begins another iteration over the data. Individual iterations can run in milliseconds, which makes machine learning very efficient at optimising decisions and predicting outcomes.
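The “predict, measure the error, adjust, repeat” loop described above is, in its simplest form, what gradient descent does. The sketch below is a minimal illustration under stated assumptions: a one-variable linear model, made-up data generated from y = 3x + 1, and a hand-picked learning rate and iteration count. Real machine learning systems use much larger models and specialised libraries, and not every algorithm is trained this way, but this error-correcting loop is the core idea behind gradient-based training.

```python
from __future__ import annotations


def train(xs: list[float], ys: list[float],
          learning_rate: float = 0.05, iterations: int = 2000) -> tuple[float, float]:
    """Fit y ≈ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start from an uninformed guess
    n = len(xs)
    for _ in range(iterations):
        # Prediction error for every example under the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Correct the parameters by a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 3x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]
w, b = train(xs, ys)
print(f"w ≈ {w:.2f}, b ≈ {b:.2f}")  # prints: w ≈ 3.00, b ≈ 1.00
```

Each pass through the loop is one “iteration” in the sense used above: the model’s errors drive the next adjustment, so the fit improves without anyone hand-tuning `w` and `b`.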
So, where is all of this headed over the next few years? I can’t recall the provenance of the quote “never make predictions, especially about the future”, so treat these predictions with caution:
- Power to business users: Driven by a shortage of big data talent and the ongoing gap between needing business information and being able to get it from the analysts and data scientists, there will be more tools and features that expose information directly to the people who use it. (Source: Information Week 2016)
- Machine generated content: Content that is based on data and analytical information will be turned into natural language writing by technologies that can proactively assemble and deliver information through automated composition engines. Documents currently written by people, such as shareholder reports, legal documents, market reports, press releases and white papers, are prime candidates for these tools. (Source: Gartner 2016)
- Embedding intelligence: Gartner identifies “autonomous agents and things” as one of the up-and-coming trends, embedding intelligence on a mass scale; the trend is already marked by the arrival of robots, autonomous vehicles, virtual personal assistants and smart advisers. (Source: Gartner 2016)
- Shortage of talent: Business consultancy A.T. Kearney reported that 72% of market-leading global companies had a hard time hiring data science talent. (Source: A.T. Kearney 2016)
- Machine learning: Gartner said that an advanced form of machine learning called deep neural nets will create systems that can learn to perceive the world on their own. (Source: Ovum 2016)
- Data as a service: IBM’s acquisition of the Weather Company, with all its data, data streams and predictive analytics, highlighted a coming shift towards data itself being sold as a service. (Source: Forrester 2016)
- Real-time insights: The window for turning data into action is narrowing. The next 12 months will be about distributed streaming alternatives built on open source projects such as Kafka and Spark. (Source: Forrester 2016)
- Roboboss: Some performance measurements can be consumed more swiftly by smart machine managers aka “robo-bosses,” who will perform supervisory duties and make decisions about staffing or management incentives. (Source: Gartner 2016)
- Algorithm markets: Firms will recognise that many algorithms can be acquired rather than developed: “just add data”. Examples of services available today include Algorithmia, DataXu and Kaggle. (Source: Forrester 2016)
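The “machine generated content” trend can be illustrated at its very simplest with template-based text generation, where structured figures are slotted into sentences chosen by simple rules. This is a minimal sketch, not how any particular vendor’s composition engine works; the function name, metrics, figures and the 0.5% “broadly flat” threshold are all hypothetical.

```python
def describe_change(metric: str, previous: float, current: float, unit: str = "") -> str:
    """Compose one sentence of report prose describing a change between periods."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a percentage change")
    change = (current - previous) / previous * 100
    # Simple wording rules choose the verb phrase (threshold is an assumption)
    if abs(change) < 0.5:
        trend = "was broadly flat"
    elif change > 0:
        trend = f"rose by {change:.1f}%"
    else:
        trend = f"fell by {abs(change):.1f}%"
    return f"{metric} {trend}, from {previous:,.0f}{unit} to {current:,.0f}{unit}."


# Hypothetical figures, for illustration only
report = [
    describe_change("Revenue", 1_200_000, 1_350_000, " GBP"),
    describe_change("Headcount", 240, 238),
]
print(" ".join(report))
# prints: Revenue rose by 12.5%, from 1,200,000 GBP to 1,350,000 GBP. Headcount fell by 0.8%, from 240 to 238.
```

Commercial tools go much further (choosing which facts matter, varying phrasing, structuring whole documents), but the principle of turning analytical output directly into readable text is the same.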
The one thing I have taken away from the various reports, papers and blogs I’ve read as part of this research is that you can’t think about Big Data in isolation. It has to be coupled with cognitive technologies: AI, machine learning or whatever label you want to give it. Information is being created at an ever-increasing velocity, and the window for decision making is getting ever narrower. These demands can only be met by coupling Big Data and Data Analytics with AI.