Big Data: disruptive at full scale
When did Big Data actually enter our lives and, more importantly, what made it so widespread? In this article we will look at some of the causal factors behind the popularity of Big Data and their history, to better understand the rise in demand for it and to assess whether Big Data is a buzzword or a genuine medium of opportunity.
In fact, the history of Big Data shows that some Big Data capabilities were present long before the term ‘Big Data’ gained prominence; there simply was not enough demand for them until a certain point in time.
The remainder of this article looks back at a few points in time to see which Big Data competences were already available but seldom used. We will then focus on the significant factors that show demand for Big Data has been, and presumably will keep, rising fast.
A brief history of Big Data
Most accounts of the Big Data space begin with Google’s GFS (Google File System) paper⁽¹⁾, published around 2003, while around 2006 we saw Hadoop⁽²⁾ in action for the first time. However, the history of Big Data did not really begin with the GFS paper, even though it is arguably one of the most important milestones so far.
Eric Brewer’s CAP theorem is one example: it has been around for a while and predates the GFS paper, yet it is still relevant and extremely important in the Big Data domain.
CAP Theorem
The CAP theorem is actually pretty straightforward:
Out of the three guarantees of Consistency, Availability and Partition tolerance, you can only choose two at the same time.
You cannot have all three at once!
To keep the focus of this article, we will not explain how the CAP theorem works in detail, as that would require a more thorough technical discussion; instead we will concentrate on its impact.
Big Data platforms are by nature distributed. Since network partitions are unavoidable in a distributed environment, partition tolerance is a must, and the choice reduces to a trade-off between Consistency and Availability.
Before turning to when the idea behind the CAP theorem first came up, let’s look at two basic examples to understand this trade-off better (a small code sketch after the examples illustrates both):
Example 1:
Think of a financial Big Data system that provides certain account information to its customers. A financial system most likely needs to be extremely precise with the information it provides, so such a system will favor keeping its content consistent over being available at all times. Imagine a customer logging in to two different channels one after the other to check his or her account: it is clearly preferable to always show the same, correct account details, even at the cost of occasionally being unavailable, rather than to be highly available at all times while running the rare risk of showing inconsistent account details.
Example 2:
In contrast to the above, consider an entertainment system. Such a system might favor being available at all times over making sure its content is consistent across all partitions; here the preference is availability rather than strict consistency.
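To make the trade-off concrete, below is a minimal, purely illustrative Python sketch, not a real database: the ToyReplicatedStore class, its "CP"/"AP" modes and the partitioned flag are all invented for this example. In "CP" mode it refuses requests during a partition rather than risk inconsistency, mirroring Example 1; in "AP" mode it keeps answering but may serve stale data, mirroring Example 2.

```python
# Toy illustration of the consistency/availability trade-off during a
# network partition. All names here are invented for this sketch.

class Replica:
    def __init__(self):
        self.data = {}

class ToyReplicatedStore:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode              # "CP": prefer consistency, "AP": prefer availability
        self.primary = Replica()
        self.secondary = Replica()
        self.partitioned = False      # simulates a network partition between the replicas

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            raise RuntimeError("write rejected: replica unreachable")
        self.primary.data[key] = value
        if not self.partitioned:
            self.secondary.data[key] = value   # replication succeeds
        # In AP mode during a partition the write lands on the primary only;
        # the secondary catches up once the partition heals.

    def read(self, key, from_secondary=False):
        if self.partitioned and self.mode == "CP" and from_secondary:
            # Consistency first: better to be unavailable than to return stale data.
            raise RuntimeError("read rejected: value may be stale")
        replica = self.secondary if from_secondary else self.primary
        return replica.data.get(key)   # AP mode may return a stale (pre-partition) value

# Example 1 (financial): CP behaviour - errors during a partition, never stale balances.
bank = ToyReplicatedStore(mode="CP")
bank.write("balance", 100)
bank.partitioned = True
# bank.read("balance", from_secondary=True)  # would raise rather than risk a stale balance

# Example 2 (entertainment): AP behaviour - always answers, possibly with stale content.
catalog = ToyReplicatedStore(mode="AP")
catalog.write("playlist", ["a", "b"])
catalog.partitioned = True
catalog.write("playlist", ["a", "b", "c"])            # accepted on the primary only
print(catalog.read("playlist", from_secondary=True))  # stale: ['a', 'b']
```

Real systems sit somewhere along this spectrum with far more nuance (quorums, eventual consistency, tunable consistency levels), but the basic choice during a partition is the one sketched above.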
Eric Brewer came up with the idea of the CAP theorem around 1998, that is, five years before the GFS paper⁽¹⁾, six years before the MapReduce paper⁽³⁾, eight years before the first Hadoop release and nine years before the first HBase⁽⁴⁾ release. It was not formally proven, and thereby accepted as a theorem, until 2002. It remains valid today: whenever we decide which storage option to use (e.g. HBase vs Cassandra⁽⁵⁾), we should revisit it. So it was already known, at least to some people, how important Big Data concepts like partition tolerance were, alongside availability and consistency.
Only after 2006 (after the GFS paper and the first Hadoop release) did the popularity of the Big Data domain soar. This suggests that back in 1998 only a tiny audience was interested in Big Data: a theorem as important as CAP picked up attention, and was proven, only four years after it was first presented. The attention the CAP theorem has since received, and its importance, are unquestionable; even Werner Vogels (Amazon’s CTO) published a post on the CAP theorem on his personal blog⁽⁶⁾. The fact that such foundational results started receiving significant attention is itself an indicator of the growing importance of Big Data.
Data sits at the core of every data management system. In other words, the demand for Big Data has clearly increased, and the theorems and white papers published around it align with this trend.
Let’s look at this increase in demand from a different perspective. One simplistic definition of Big Data is the 3 Vs (there are additional Vs beyond these three, but let’s focus on these for simplicity): Volume, Velocity and Variety. Volume refers to the fact that Big Data platforms can handle far larger amounts of data than traditional data management systems. Velocity refers to the fact that Big Data platforms are able to process both data-in-motion (streaming data, e.g. live traffic data of New York) and data-at-rest (e.g. reporting-layer fact and dimension tables). Variety refers to the range of data types a platform can handle (e.g. structured data, weblogs, sensor data, video, etc.).
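As a small illustration of the velocity dimension, here is a toy Python sketch computing the same metric over data-at-rest and over data-in-motion. The functions and the traffic-speed values are made up for this example, and no specific streaming framework is implied.

```python
# Data-at-rest vs data-in-motion: the same average computed over a complete
# batch and incrementally over records arriving one by one.
from typing import Iterable, Iterator

def batch_average(speeds: list[float]) -> float:
    """Data-at-rest: the full data set is available, so compute in one pass."""
    return sum(speeds) / len(speeds)

def streaming_average(speeds: Iterable[float]) -> Iterator[float]:
    """Data-in-motion: emit an updated average after every arriving record."""
    total, count = 0.0, 0
    for speed in speeds:
        total += speed
        count += 1
        yield total / count

# Batch (e.g. yesterday's stored measurements)
print(batch_average([42.0, 38.5, 55.0, 47.5]))          # 45.75

# Stream (e.g. live measurements arriving over time)
for running_avg in streaming_average([42.0, 38.5, 55.0, 47.5]):
    print(round(running_avg, 2))                        # 42.0, 40.25, 45.17, 45.75
```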
So, is Big Data just another hyped-up word? The check for the answer is pretty simple: data. One thing is certain: data volume, variety and velocity will keep growing in the near future.
Now, let’s pick one of the three: the volume of data.
An article published by IDC predicts that the total volume of data stored electronically in 2020 will be around 44 zettabytes, up from 4.4 zettabytes in 2013⁽⁷⁾. Given a tenfold growth in seven years, and knowing that the growth is exponential, it is hard to predict how much data will exist electronically 15+ years from now (a rough extrapolation follows below). Before we take a moment to absorb that, we also have to remember another massive factor whose effect has barely been felt so far: the Internet of Things. Most enterprise data generated today centers on people, in other words data generated from interactions among individuals. As IoT usage becomes more common in the coming years, we will be able to capture far more granular and enriched data, since things can interact not only with people but also with other things. There is a long and exciting road ahead for IoT in fields like smart homes, smart manufacturing, smart transportation and healthcare.
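As a back-of-the-envelope check, the two IDC figures quoted above imply a compound annual growth rate of roughly 39%. The short Python sketch below computes it and naively extends the same rate 15 years past 2020, which is a strong assumption made purely for illustration:

```python
# Back-of-the-envelope only: the 4.4 ZB (2013) and 44 ZB (2020) figures come
# from the IDC prediction cited above; the extrapolation simply assumes the
# same compound growth rate continues.
zb_2013, zb_2020 = 4.4, 44.0
years = 2020 - 2013

annual_growth = (zb_2020 / zb_2013) ** (1 / years) - 1
print(f"Implied compound annual growth: {annual_growth:.0%}")   # ~39% per year

# Naively extending the same rate 15 years beyond 2020:
zb_2035 = zb_2020 * (1 + annual_growth) ** 15
print(f"Naive 2035 extrapolation: {zb_2035:,.0f} ZB")           # on the order of thousands of ZB
```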
To sum up, we can see a clear relationship between the rising demand for Big Data on one side and the CAP theorem, the Big Data white papers and the growth in data volume, variety and velocity on the other. Together they indicate that Big Data is, and will continue to be, one of the disruptive technologies.
Another way to look at the CAP theorem, the white papers and the 3 Vs is that they all link to disruptive technologies that depend heavily on data.
Ever heard of the term singularity? Simply put, the singularity refers to radical change brought about by technological advancement. With the growth of data and the advances in AI technologies, we will start, and perhaps have already started, experiencing singularity moments in ever shorter cycles. Technology is accelerating faster than ever before, and the future holds many more disruptive technologies that are approaching fast.
Big Data brings many opportunities when used correctly. It is no longer just a low-cost (lowest cost-per-byte) data storage platform; it opens up new use case possibilities as well. Use cases that were very complex or costly to implement with traditional technologies can now be implemented far more easily in the Big Data domain. Big Data covers a wide spectrum of use cases, focused mainly on advanced analytics, artificial intelligence and machine/deep learning, alongside cheap access to high computation and storage power and the ability to acquire, process and store structured and unstructured, batch and streaming data.
Additionally, one of the major advantages of Big Data use cases is that they are typically new business use cases, ones that have never been implemented before. It is exciting even to think about how much more value Big Data use cases might bring compared to traditional ones.
“Change before you have to.” – Jack Welch
One thing is obvious: with Big Data we have far more ammunition than we are used to. We can reach and make use of all the data available in the organization through new types of high-value use cases. The motivation to operationalize these use cases needs to be present in your organization, because the future will bring more innovation; operationalizing them keeps your organization ready for the continuous, ever-shorter innovation cycles that eventually result in singularity moments.
The future will certainly be shaped by the period of data growth we are currently going through. Data growth will keep accelerating, and the point at which it starts to slow down is, for now, only a prediction. With increasing competitiveness and higher expectations for organizations to reach business results faster, data will play a vital role. Organizations that manage to turn this into a benefit will gain a competitive advantage through new business models emerging from new Big Data based use cases. Organizations that cannot keep up with Big Data and arrive late will most likely still move towards it eventually, but by then the advantage will no longer be competitive. It becomes a comparative advantage: realizing it will not provide much value relative to competitors, while failing to realize it will hurt the organization’s competitiveness.
Previously published at https://www.deloitteforward.nl/en/data-analytics/the-impact-of-big-data-past-and-future/