By Rajesh "Sheik" Kapoor
What's all the buzz around Big Data?
Let’s first agree that “Big Data” has been around for as long as we’ve been saving data to digital media. To give you a little history, here’s my version of the challenges and opportunities of Big Data, starting in 1970. In the 70s, Big Data was considered anything greater than 1 KB; in the 80s it was hundreds of MBs; in the 1990s it was hundreds of GBs; in the past 10 years it’s been hundreds of TBs; and now, in this decade, it’s hundreds of PBs. If history repeats, which it always does, I’m pretty certain the Big Data challenges and opportunities of 2020 are going to involve hundreds of EBs.
Please remember that in the 1970s, 1,000 KB, which equals 1 MB, was Big Data. On today’s scale, you would need four times that amount just to store one song in your music library.
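The decade-by-decade scale above is just powers of 1,000. A few lines of Python make the arithmetic concrete (the units and the ~4 MB song size are the figures from the text; nothing else is assumed):

```python
# Decimal byte units, from KB up to EB (each step is a factor of 1,000).
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9,
         "TB": 10**12, "PB": 10**15, "EB": 10**18}

def to_bytes(amount, unit):
    """Convert a quantity like (1000, 'KB') to raw bytes."""
    return amount * UNITS[unit]

# 1,000 KB is exactly 1 MB -- the 1970s notion of "Big Data".
assert to_bytes(1000, "KB") == to_bytes(1, "MB")

# A typical ~4 MB song is four times that amount.
song = to_bytes(4, "MB")
print(song // to_bytes(1000, "KB"))  # -> 4
```

The same table also shows why each decade's jump is so dramatic: hundreds of PBs is a factor of a billion beyond the 1970s megabyte.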
What does that mean in relative terms?
In 1970, one elephant could remember all the data we had; now you need a herd of elephants to remember everything.
Now that we've reviewed the history…
So many times these days I get asked: what’s the difference between “Big Data” and traditional data? My answer: Big Data analytics has a lot to do with the quantity and speed of the data and the type of analysis performed on it. For example, during the 2012 Summer Olympics in London, 60 GB of information per second was expected to flow across British Telecom networks, and 200,000 hours of Big Data were generated before the Games even started, just from testing the IT systems.
We are collecting more and more “stuff,” and it’s imperative to be able to analyze it, but very few people know what that process looks like or really understand how to dig into the immense amount of “stuff” being collected.
Here are some stats for perspective:
- According to the IDC Digital Universe study, data creation and replication will increase four-fold by 2015, reaching 1.8 zettabytes, and data will grow 50X over the next 10 years.
- 80 percent of the world’s data is unstructured, and today most businesses don’t even attempt to use this data to their advantage, because the effort seems too expensive, they lack the tools and resources to process it, or they simply don’t know what’s possible.
- Unstructured information is growing at 15 times the rate of structured information.
- Raw computational power is growing at such an enormous rate that today’s off-the-shelf commodity box is starting to deliver the power a supercomputer offered half a decade ago.
The technologies to do analytics at a reasonable price are just now coming into existence. These are game-changing technologies such as Hadoop. Hadoop is an open-source technology inspired by papers Google published on MapReduce and the Google File System; it was developed largely at Yahoo and is now available as a download from Apache. Customers that want Hadoop as an enterprise offering can get the software from companies such as Cloudera, Hortonworks, and MapR. This space is interesting, and the NetApp E-Series offering is well positioned to help customers solve Big Data problems today.
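For readers new to Hadoop, its core programming model, MapReduce, is easy to sketch. The toy below is plain single-machine Python, not Hadoop itself: it shows the map, shuffle, and reduce phases on the classic word-count problem, which Hadoop runs in the same shape but distributed across a cluster. The function names and sample documents here are my own illustration, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word, as Hadoop mappers do."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle step: group all values by key across every mapper's output."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "data grows fast"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])  # -> 2 2
```

The value of the pattern is that map and reduce are independent per key, so a cluster can run thousands of mappers and reducers in parallel over petabytes of unstructured input.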
Looking forward to your comments. And feel free to reach out to me on Twitter at @sheik230.