Big Data ABCs in NetApp Blogs


On a recent trip to Phoenix my flight was delayed, then cancelled. Through rerouting and other delays, what should have been a 1.5-hour direct flight turned into a 12-hour ordeal.

 

I wonder: was I a victim of Big Data? Let me explain.

 

One of my favorite stories of how Big Data can force changes in the way you do business, and can make you question recommendations that aren't intuitive, is this hypothetical dilemma.

 

You are the flight operations manager of an airline. You have two planes about to depart. It's snowing hard. The airport calls down to inform you that only one of your two planes will be granted permission to depart before the airport shuts down. One plane has 4 passengers on board; the other is full, with 200 passengers.

What do you do?

You run your new "Flight Operations Optimizer" application. It's a new "Big Data" application that calculates the downstream 72-hour impact of canceling a flight based on multiple data sets, including all passengers impacted, expected weather delays at all downstream destinations, airplane maintenance schedules, and crew schedules.
The Flight Operations Optimizer comes back and advises that you should let the flight with 4 passengers depart. That alternative has the least downstream impact on the airline: a better business outcome.
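
To make the story concrete, here is a minimal sketch of the kind of cost-minimizing choice the optimizer is described as making. Everything here is hypothetical: the cost categories, figures, and function names are my own illustrative assumptions, not how any real airline system works.

```python
# Hypothetical sketch: choose which flight to cancel by minimizing
# the estimated 72-hour downstream cost. All figures are invented.

def downstream_cost(flight):
    """Sum the 72-hour impact across the data sets the story mentions:
    passengers, crew schedules, maintenance, and downstream weather."""
    return (flight["passengers_impacted"] * flight["rebooking_cost"]
            + flight["crew_disruption_cost"]
            + flight["maintenance_penalty"]
            + flight["weather_delay_cost"])

flights = [
    # Canceling the small flight strands a crew and breaks a
    # maintenance rotation, so its downstream cost is high.
    {"id": "4-passenger flight", "passengers_impacted": 4,
     "rebooking_cost": 300, "crew_disruption_cost": 90_000,
     "maintenance_penalty": 45_000, "weather_delay_cost": 20_000},
    {"id": "200-passenger flight", "passengers_impacted": 200,
     "rebooking_cost": 300, "crew_disruption_cost": 5_000,
     "maintenance_penalty": 0, "weather_delay_cost": 2_000},
]

# Cancel whichever flight costs the airline least to cancel;
# the other one departs.
to_cancel = min(flights, key=downstream_cost)
print(f"Cancel the {to_cancel['id']} "
      f"(estimated 72-hour cost: ${downstream_cost(to_cancel):,})")
```

With these invented numbers the optimizer cancels the 200-passenger flight and lets the 4-passenger flight depart: exactly the counterintuitive answer in the story, because the small flight's crew and maintenance cascades outweigh its passenger count.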

To which you say: WHAT? That can't be right. If I do that, I'll have 200 very upset passengers at the counter to deal with, and furthermore I won't make my goal of maximizing on-time passenger departures!

 

The story points out two major elements of Big Data.

  1. If you can analyze enough data, often kept in different silos, the results can well be counterintuitive, leading you to dismiss them and still take the decision that doesn't lead to the best business outcome.
  2. The employee goals you have in place to drive the best corporate outcomes might be driving behaviors that don't actually accomplish that.
Now back to my bad travel karma story.

 

Moments before boarding we were told our flight was now departing from another gate. We hurried over to the new gate, only to be told our flight was delayed four hours. What happened?

 

From eavesdropping on the gate agents' conversation I learned that the plane for the flight to LA had a mechanical problem and that our plane was reallocated to the LA flight. Our flight was delayed while they looked for another plane. Two hours later our flight was canceled.

 

Could it be that the flight operations manager ran the "Flight Operations Optimizer" application, and the result was that giving our plane to the LA flight had the least downstream impact? Then, after most of the passengers on our flight found alternatives to waiting four hours, there were very few left on our delayed Phoenix flight, so the "Flight Operations Optimizer" advised it be cancelled.

And that is how I suspect I fell victim to Big Data.  Best business outcomes don't always mean best personal outcomes.

 

As a footnote to this story, upon my return I received a $200 voucher for the inconvenience. I hope that cost was factored into the "Flight Operations Optimizer."

All's Well That Ends Well.


NetApp has offered Big Data solutions since May of 2011. We offer a portfolio of 10 solutions that address the major use cases of Big Data. These solutions are based on both of our storage platforms: FAS with Clustered Data ONTAP and E-Series with SANtricity.

 

The Big Data market is loud and confusing. In fact, Big Data was named the most confusing term in IT this year, surpassing Cloud, which is now number 2. Most of this confusion results from the fact that no two cases of Big Data are alike. Additionally, new technologies like Hadoop and its ecosystem of tools and applications are disrupting analytic technologies, resulting in many innovations and considerable VC investment in start-up companies.

 

Big Data has captured the imagination of many enterprises as documented success stories point to new ways of doing business that change the game and result in considerable competitive advantage or significantly better business outcomes.

 

NetApp has a credible seat at any customer discussion about Big Data by way of our considerable experience in managing data at scale. Our largest customer has over an exabyte of data, and we have hundreds of customers with over ten petabytes. Many of the storage efficiency innovations that NetApp has led, such as deduplication and thin provisioning, have led to contemplation of “keep forever” data strategies. The “delete key” is no longer the answer to Big Data.

 

What makes Big Data different is that customers reach an inflection point where they can no longer continue to do what they did yesterday, just a little more of it. Indeed, they must fundamentally rethink their data storage strategies. It is at that inflection point where, without new approaches and technologies, data growth and Big Data can become a liability, or, with the right approach, a propellant for the business.

 

It is at this inflection point that NetApp can be a trusted partner, helping customers use Big Data to grow their businesses efficiently and flexibly.


If you haven't noticed, Big Data has created a lot of buzz lately. Much of the buzz comes from the sheer wow factor of how big "big" is. With the number of smartphones nearing 6 billion, all creating content, Facebook generating over 30 billion pieces of content a month, and data expected to grow at 40% year over year, it's easy to see that big really is BIG.

 

In fact, the digital universe has recently broken the zettabyte barrier. A zettabyte is a thousand exabytes, or a billion terabytes. How big is that? To give you an idea of scale, it would take everyone on the planet posting to Twitter 24/7 for 100 years to generate a zettabyte.
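
As a quick sanity check, here is the back-of-the-envelope arithmetic behind that comparison. The inputs are my own rough assumptions (about 7 billion people, about 200 bytes per tweet including metadata), not official figures:

```python
# Rough check of the "everyone tweeting 24/7 for 100 years
# generates about one zettabyte" claim. Inputs are assumed round numbers.
ZETTABYTE = 10**21                 # bytes
people = 7e9                       # approximate world population
bytes_per_tweet = 200              # tweet text plus metadata, assumed
seconds = 100 * 365.25 * 86400     # one hundred years, in seconds

bytes_per_person_second = ZETTABYTE / (people * seconds)
seconds_per_tweet = bytes_per_tweet / bytes_per_person_second

print(f"{bytes_per_person_second:.1f} bytes per person per second")  # ~45.3
print(f"about one tweet every {seconds_per_tweet:.1f} seconds")      # ~4.4
```

So roughly one tweet every four to five seconds, from every person alive, around the clock for a century: the claim holds up as an order-of-magnitude estimate.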

 

So you get the idea - it’s really big. 

 

As an IT organization, you may be thinking that your own data growth will soon be stretching the limits of your infrastructure. One way to define Big Data is to look at your existing infrastructure, the amount of data you have now, and the amount of growth you're experiencing. Is it starting to break your existing processes? If so, where?

 

“Big” refers to a size that's beyond the ability of your current tools to affordably capture, store, manage, and analyze your data. This is a practical definition, since “big” might be a different number for each person trying, but unable, to extract business advantage from their data.

[Image: complexity-speed-volume.jpg]

 

When we talk to our customers, we find that their existing infrastructure is breaking on three major axes:

 

  1. Complexity.  Data is no longer about text and numbers; it includes real-time events and shared infrastructure. Data is now linked at high fidelity and includes multiple types. The sheer complexity of data is skyrocketing, and applying standard algorithms for search, storage, and categorization becomes much more complex.
  2. Speed.  How fast is the data coming at you? High-definition video streaming over the Internet to storage devices and player devices, full-motion video for surveillance – all of these have very high ingestion rates. You have to be able to keep up with the data flow. You need the compute, network, and storage to deliver high definition to thousands of people at once with good viewing quality (see the back-of-the-envelope sketch after this list). For high-performance computing you need systems that can perform trillions of operations per second and store petabytes of data.
  3. Volume.  For all of the data you are collecting and generating, you have to store it securely and make it available forever. IT teams today are having to make decisions about what is “too much data”. They might flush all data each week and start again. But there are certain applications, like healthcare, where you can never delete the data. It has to live forever.
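
To put the Speed axis in concrete terms, here is a quick back-of-the-envelope calculation. The stream bitrate and audience size are illustrative assumptions, not measurements:

```python
# Rough aggregate bandwidth needed to serve HD video at scale.
# Bitrate and viewer count are illustrative assumptions.
hd_bitrate_mbps = 5        # a typical HD stream, assumed
viewers = 10_000           # concurrent audience, assumed

aggregate_gbps = hd_bitrate_mbps * viewers / 1000
print(f"~{aggregate_gbps:.0f} Gb/s of sustained delivery")  # ~50 Gb/s
```

Sustaining tens of gigabits per second, continuously and with good viewing quality, is exactly the kind of load that breaks infrastructure sized for yesterday's workloads.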

 

These trends in data growth are something we at NetApp have been following for quite a while now. We’ve been enhancing ONTAP to deal with the scale needed to handle large repositories of data, and we have also made strategic acquisitions anticipating the need for high-density, high-performance storage (Engenio) and infinite content repositories (Bycast).

 

In conversations with our customers dealing with the onslaught of data, we have noticed three important use cases that are stretching the limits of their existing infrastructure.

 

We’ve named these axes the ABCs of Big Data.

  • Analytics - Analytics for extremely large data sets, to gain insight from that digital universe by turning it into information that helps you make better business decisions.
  • Bandwidth - Performance for data-intensive workloads at very high speeds.
  • Content - Boundless, secure, scalable data storage, so you can write it, find it, and keep it forever.

 

[Image: BigDataABCs.jpg]

 

We are organizing our Big Data product portfolio under these three use cases. In future posts I'll be discussing our solutions in each area.

 

To summarize: behind the hype there are multiple opportunities. You need to be asking: where can I take advantage of my data? What are the insights that can really help my business? Where are the places I can use my data for competitive advantage? Can you link trends in buying patterns to people's physical locations at a point in time to give them a better experience? Can you detect when fraud is about to happen? Can you find the likely hotspots for failure before they fail?


Your universe of data can be a gold mine. Can you find the value and turn it into real business advantage? If you don't, you can be sure your competitor will.