Exposed in NetApp Blogs


Recent Blog Posts


Lead by Agility

Posted by valb Mar 11, 2013

Measuring success is an important part of business, which is great motivation for analyst firms to regularly publish their findings.  The trouble is that the creativity of our technology sector results in so many permutations and combinations for customers to deploy data infrastructure solutions that no single metric has properly captured that success – until now.  IDC (finally) recognizes NetApp’s Data ONTAP as the industry’s #1 Storage Operating System!

 

Inspect what you expect

I’ve always believed that delivering value to customers in our sector should be measured by their success rather than ours. Nevertheless, although capacity per revenue was always a simple, objective metric I used in this regard, it historically failed to capture the underlying innovation that really sets NetApp apart as the vendor with the most significant growth in our sector over the past 15 years.

 

Data is the new IT celebrity

There was a time when hardware infrastructure vendors such as Dell, EMC, IBM, Intel & HP set the direction of the IT industry, followed by the era of software giants such as Microsoft, Oracle & SAP.  Business value has since migrated to Amazon, Facebook, Google, Twitter, Yahoo!, et al., which leverage the abundance of computing resources at their disposal to redefine our expectations of technology.

 

Agility is the new enabler

In order to deliver on these new data-driven expectations, IT organizations are shifting to service orientation while abandoning technology silos.  Shared resources, speed of deployment and reallocation on demand trump legacy principles of heavy metal, size or hierarchy.

 

Storage Infrastructure requirements for our new reality include:

  • Deploying pools of compute / networking / storage that can be non-disruptively expanded to serve apps that haven’t even been conceived yet
  • Shrink-wrapping infrastructure to meet growth as well as reduction / reallocation of demand
  • Shared multi-tenant infrastructure supporting thousands (or millions!) of users and applications without planned or unplanned down-time
  • Ability to start small, grow fast and stay big while management effort grows only logarithmically relative to scale
  • Increased infrastructure usage efficiency to address tension of data growth in the face of reduced budgets

 

Only an Agile Data Infrastructure can get you there!

 

Clustered Data ONTAP rises to the occasion

Built on more than 20 years of innovation, Clustered Data ONTAP has evolved to meet the changing needs of customers and drive their success.  The latest version combines the richest data management feature set in the industry with clustering for unlimited scale, operational efficiency, and non-disruptive operations.

 

The unique ability to plan, build & run a shared virtualized data infrastructure for small, medium, large & huge IT requirements provides NetApp customers with a compelling value proposition that is unmatched in the industry.  No other storage vendor is able to address these requirements with a single, integrated solution.  Quarter-over-quarter acceleration in the adoption of Clustered Data ONTAP is validating our strategy!


[Figure: Data ONTAP – Broadest Consumption Models]

The Best is yet to Come!

Now that we’ve accomplished yet another industry milestone, what continues to excite me about working at NetApp?  The answer lies in our increasingly broad array of consumption models.  As a portfolio company, NetApp now addresses fast-growing market segments such as HPC, Analytics & Web-Scale with technology that augments Data ONTAP.  But the shared virtualized core of our industry is served via a variety of Clustered Data ONTAP solutions including:

 

 

The robustness of these Data ONTAP varieties gives me unbridled optimism that NetApp will continue to lead our industry as we apply our growing innovation to the only constant behavior our customers display – change.

** The following is a GUEST POST by Peter Corbett, Vice President & Chief Architect of NetApp (aka "Father of RAID-DP") **

 

There has been a great deal of interest in recent years in erasure codes.  The class of erasure coding algorithms that people normally associate with the term includes algorithms that add a parameterized amount of computed redundancy to cleartext data, such as Reed-Solomon coding, and algorithms that scramble all data into a number of different chunks, none of which is cleartext.  In both cases, all data can be recovered if and only if m out of n of the distributed chunks can be recovered.  It is worth noting that, strictly speaking, RAID 4, 5, 6 and RAID-DP are also erasure coding algorithms by definition.  However, that is neither here nor there – XOR parity-based schemes have different properties from the “new” algorithms in use in some systems, which are what people have in mind when they talk about “erasure codes”.

[Figure 1: erasure coding layout (credit: http://nisl.wayne.edu/Papers/Tech/dsn-2009.pdf, page 2)]
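To make the "RAID is also an erasure code" point concrete, here is a minimal, purely illustrative Python sketch (not NetApp code) of single XOR parity – the RAID 4/5 case: with m data chunks plus one parity chunk (n = m + 1), any single chunk whose loss is known can be rebuilt from the remaining m.

```python
# A minimal, illustrative sketch (not NetApp code): single XOR parity as an
# erasure code. With m data chunks plus 1 parity chunk (n = m + 1), all data
# is recoverable from any m of the n chunks, i.e. any one known loss is repairable.

def xor_bytes(blocks):
    # XOR a list of equal-length byte strings together.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def encode(data_chunks):
    # Return the data chunks followed by one XOR parity chunk.
    return list(data_chunks) + [xor_bytes(data_chunks)]

def reconstruct(chunks, lost_index):
    # Rebuild the chunk at lost_index -- an *erasure*, since we know which one is gone.
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return xor_bytes(survivors)

if __name__ == "__main__":
    stripe = encode([b"AAAA", b"BBBB", b"CCCC"])   # m = 3 data chunks, n = 4 total
    assert reconstruct(stripe, 1) == b"BBBB"       # recover a known-lost data chunk
```

Dual-parity schemes such as RAID-DP extend the same XOR machinery with a second, diagonal parity set so that any two known losses can be repaired.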

 

Use-cases for each

Erasure codes are being used for deep stores, for distributed data stores and for very scalable stores.  They are commonly used in RAIN systems, where the code covers disk, node and connectivity failures, requiring data reconstruction after a node failure (instead of HA takeover, which is much faster).  These erasure codes are more computationally intensive, both on encode (write) and decode (reconstruct), than XOR parity-based schemes like RAID-DP.  In fact, one of the big motivations for developing RAID-DP was that the “industry-standard” Reed-Solomon code for dual-parity RAID-6, as well as the less widely used Information Dispersal algorithms that protect against more than one failure, are more computationally intensive than RAID-DP.  RAID algorithms are also very suitable for sequential access in memory, as they can work with large word sizes.  Many of the complex erasure codes are based on Galois Field arithmetic that works practically only on small (e.g. 4, 8 or 16 bit) quantities, although there are techniques for parallelizing it in hardware, on GPUs, or using the SSE* instructions on Intel architecture processors.

[Figure 2: XOR/RAID-DP parity layout (credit: http://nisl.wayne.edu/Papers/Tech/dsn-2009.pdf, page 2)]

 

 

Erasing Limitations

Basically, an erasure code only works when you know what you’ve lost.  It provides sufficient redundancy to recover from a defined number of losses (failures).  If you don’t know what you’ve lost, you need an error detection and correction code, which requires a higher level of redundancy.

 

For disks, the known-loss failure modes that we can see are disk failures, sector media failures and platter- or head-scoped failures.  All of these require reconstruction from redundant information, which in our case is dual parity (row and diagonal parity covering all double failures).  There are other failures that can invoke reconstruction as well, e.g. loss of connectivity to a disk shelf.

 

However, there are other modes of disk failure that are more insidious and that require some way to determine which data is bad within a set of data that the disk subsystem claims is good, i.e. silent errors.  To cover those cases, we add metadata and checksums to each block of data.  If we detect a problem there, we can then use parity to reconstruct.  So, just saying that you have an erasure code that protects against more than two losses is not sufficient to claim superior protection of data.  You have to have worked through all the failure modes and made sure you can protect against those failures.  We’ve hardened ONTAP over nearly 20 years to provide a very high level of resiliency against all modes of disk failure, in combination.
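As a toy illustration of the mechanism described above – block-level metadata and checksums turn a silent error into a known erasure, which parity can then repair – here is a hedged Python sketch. It uses a plain CRC32 and single XOR parity purely for brevity; real ONTAP block checksums and RAID-DP dual parity are far richer, so treat this as the shape of the idea, not the implementation.

```python
# A toy sketch of the mechanism above (not ONTAP code): per-block checksums turn
# a *silent* error into a *known* erasure, which XOR parity can then repair.
# CRC32 and single parity are used only for brevity.
import zlib

def write_stripe(blocks):
    # Store each block with its checksum, plus an XOR parity block over the stripe.
    parity = bytearray(len(blocks[0]))
    stored = []
    for block in blocks:
        stored.append({"data": bytearray(block), "crc": zlib.crc32(block)})
        for i, b in enumerate(block):
            parity[i] ^= b
    return stored, bytes(parity)

def read_block(stored, parity, index):
    # Verify the checksum; on mismatch, rebuild the block from parity and its peers.
    entry = stored[index]
    if zlib.crc32(bytes(entry["data"])) == entry["crc"]:
        return bytes(entry["data"])              # block verified good
    rebuilt = bytearray(parity)                  # silent corruption detected
    for i, peer in enumerate(stored):
        if i != index:
            for j, b in enumerate(peer["data"]):
                rebuilt[j] ^= b
    return bytes(rebuilt)

if __name__ == "__main__":
    stored, parity = write_stripe([b"good", b"data", b"here"])
    stored[1]["data"][0] ^= 0xFF                 # simulate a silent bit flip on disk
    assert read_block(stored, parity, 1) == b"data"   # caught by checksum, repaired by parity
```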

 

That said, we are very aware of the trend to larger disks and the impact that has on reconstruction times and on the probability of higher order multiple failures.  While we can’t disclose plans and roadmaps publicly, we do have a good understanding of this area that will allow us to continue to meet the reliability needs of future FAS & E-Series systems as well as future disks and solid state media.

Early during my tenure as Cloud Czar, I was proud to have helped NetApp become the first infrastructure vendor to build an expansive enterprise cloud ecosystem of telco and service provider partners.  Together, we serve over 77 percent of the Fortune 500 companies’ core applications and NetApp is the storage foundation for data served to more than one billion cloud users.


Today, I want to share my enthusiasm for new customer cloud capabilities announced by NetApp and Amazon Web Services (AWS) at the inaugural re:Invent conference.  NetApp Private Storage for AWS combines the availability, performance, agility, and control of NetApp’s enterprise storage with the highly reliable, scalable, and low-cost infrastructure of the AWS cloud.

 

(UPDATE DEC 02, 2012: Here are links to the Solution Page and Data Sheet)


Dev meet Ops, Ops meet Dev, Execs meet your Goals

For the first time, NetApp and AWS are combining the agility of AWS compute with the advantages of data residing on NetApp’s enterprise storage.  Developers are no longer bound by glass ceilings on the compute power at their disposal.  IT managers can still deliver high SLAs and regulatory / legal compliance with data under their complete control.  Business leaders can freely balance capex and opex investments precisely according to market and financial needs rather than legacy technical limitations. This kind of IT flexibility is what separates the winners from the losers as IT continues to evolve.



NetApp + AWS = Use Case Abundance

 

  • Secure synchronous or asynchronous data replication: Seamlessly move data between AWS regions, conquering data gravity while uniquely increasing AWS application redundancy.
  • Gain cost-effective data protection: Deploy disaster recovery and/or avoidance by leveraging Amazon EC2 in failover scenarios. Leverage tiered disk backup and recovery using Amazon S3 (and eventually Glacier) as a more reliable alternative to a tape archive.
  • Run high performance workloads in the cloud: High IOPS workloads with discrete QoS, served directly to respective instances on EC2 from NetApp storage.
  • Power big data analytics at lower cost: Leverage NetApp Private Storage along with Amazon Elastic MapReduce (EMR) to deliver big data analysis at a fraction of the cost of running those queries on premises.
  • Add powerful features like Data Deduplication, Instant Data Virtualization (via Thin FlexClones), NFS and CIFS support to your AWS cloud infrastructure: Easily move traditional Enterprise Apps to the cloud without compromise!


Is this the Beginning of the End for On-Premises Storage?

Quite the contrary!  It’s actually an exciting time for the future of storage.  In order to meet the insatiable appetite of users for the Internet of Things, storage is becoming far more distributed and integrated.  For NetApp and Amazon Web Services, this announcement is the beginning of what we expect will be a long and mutually rewarding relationship offering joint customers unprecedented functionality in the cloud.  This announcement provides a wealth of new storage and compute options to existing NetApp and AWS customers.  I look forward to hearing about the use cases we haven’t even thought of yet!  Please share them and other thoughts in the comments below.

Kamesh Raghavendra is an Advanced Product Manager on my team in the Chief Technology Office at NetApp. He covers new application trends, including Big Data and NoSQL/NewSQL.  After our big Agile Data Center launch last week, he decided to put some of his work into this new context.

 

Can you match Extreme Transactional Scale with Infrastructure Agility?

The advent of cloud computing and the ubiquitous presence of mobile platforms have caused a paradigm shift in the scale of enterprise business operations, not only in the web services (messaging, gaming, social, et al.) space but also in the retail, financial services, media, telecom, cloud service provider, public sector, healthcare and utilities verticals. In order to provide competitive quality of service to their customers, these enterprises need to sustain unprecedented demands for performance, availability & agility to accommodate a fast-growing global scale of operations. This has led to the genesis of a new breed of super-agile applications that can service transactions at this scale by breaking out of the limits of relational models of data organization – the NoSQL (Not Only SQL) applications.

 

Although relational models provide very rich query capabilities and powerful normalization of data, their consistency characteristics are too rigid to allow the read/write availability these workloads demand (per the CAP theorem). As business operations are hammered by Internet scale & multi-geo reach, latency SLAs get squeezed dramatically – leading to extreme demands for availability while maintaining only a bare minimum level of consistency. This scale is also growing at a tremendous rate, pushing vendors toward scaled-out NoSQL applications (where one can simply add more instances/nodes and scale instantly without impacting uptime or performance) for extreme agility.
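To make that consistency/availability trade-off concrete, here is a hedged toy model in Python (not any particular NoSQL product's API): N replicas each hold a versioned value, writes are acknowledged by W replicas and reads consult R replicas, and only when R + W > N is a read guaranteed to overlap the latest write – smaller R and W buy latency and availability at the risk of stale reads.

```python
# Hedged toy model of quorum-tunable consistency (not a real product's API).
# N replicas each hold (version, value); a write is acked by W replicas, a read
# consults R replicas and returns the highest version seen. R + W > N guarantees
# the read overlaps the latest write; smaller R/W favors latency and availability.

N = 3
replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version, w):
    # Only w replicas acknowledge the write; the rest lag,
    # as they would behind a slow link or a partition.
    for replica in replicas[:w]:
        replica["version"], replica["value"] = version, value

def read(r):
    # Return the freshest value among r consulted replicas.
    consulted = replicas[-r:]        # worst case: the laggiest replicas
    return max(consulted, key=lambda rep: rep["version"])["value"]

write("v1", version=1, w=1)          # W=1: fast, weakly durable write
print(read(r=1))                     # None -> stale read, since R + W <= N
print(read(r=3))                     # "v1" -> R + W > N guarantees overlap
```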

The multi-DC/multi-geo requirements preclude the use of file systems (and of volume-level replication) and favor HTTP/RESTful-interfaced systems that scale at a key-value pair/object granularity.

Thus this new species of applications is getting neatly wedged between RDBMSs and traditional file systems.

The key mantras of this species of applications are:

  1. Replicate transactions across the wire to contain the fault domain (server failures)
  2. Cache as much data as possible in memory to reduce read & write latency (cross-wire replication avoids the need to persist to disk) – some applications go to the extent of storing all data in page files mapped to disk, so they need only a virtual memory store manager rather than a complete traditional file service
  3. Keep node-level capacity low (<2 TB) to contain the drain on the network during rebuilds
  4. Use versioning frameworks to achieve tunable/eventual consistency (some applications use quorums, while others use vector clocks to track parallel versions)
  5. Extremely simple scale-out – you can just introduce a new node into the cluster and the application will automatically rebalance (see the consistent-hashing sketch after this list)
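Mantra 5 deserves a sketch. Many (though not all) of these systems rebalance via consistent hashing; the hedged Python illustration below assumes that approach rather than any specific product's implementation. Keys hash onto a ring, each node owns the arc ending at its position, and introducing a new node moves only the keys that land on its new arc instead of reshuffling the whole cluster (production systems add virtual nodes to even out the arcs).

```python
# A hedged consistent-hashing sketch (one common rebalancing scheme, assumed
# here for illustration; not any specific product's implementation).
import bisect
import hashlib

def ring_hash(key):
    # Stable hash onto the ring (MD5 used only for even key spread, not security).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key):
        # First node clockwise from the key's position, wrapping around the ring.
        idx = bisect.bisect(self.points, (ring_hash(key), chr(0x10FFFF)))
        return self.points[idx % len(self.points)][1]

    def add_node(self, node):
        bisect.insort(self.points, (ring_hash(node), node))

if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    ring = Ring(["node-a", "node-b", "node-c"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")                        # scale out by one node
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved / len(keys):.0%} of keys moved")    # only a fraction, not 100%
```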

 

Where does Hadoop fit in?

These applications are very different from Hadoop – and Hadoop is only remotely connected to this world:

  1. Hadoop is relevant in DSS/DW workloads involving batch processing of unstructured data faster than relationally modeled ETL processes can – whereas NoSQL/in-memory/K-V stores are relevant in tier-2 business processing/OLTP workloads that involve unprecedented scale & multi-geo conditions with very stringent per-transaction latency SLAs. In other words, these applications play in the IOPS tier (and hence replace traditional RDBMSs), while Hadoop replaces traditional DW/ETL processes.
  2. Hadoop involves compute-intensive, map-reduce-friendly processes, whereas these applications target latency-sensitive, OLTP-style queries/transactions. Hadoop's ingest performance is in no way acceptable here.

 

The role of Intelligent Storage

However, these applications pose new problems to customers and hence opportunities for IT vendors like NetApp to provide value differentiation:

  1. Cross-wire replication for every transaction cannot be sustained for long – even with 10GbE networks – because the growth in transaction scale far outpaces the growth in network bandwidth
  2. Failure rebuilds drain the network badly – the only remedy is to over-provision the number of nodes (to contain the capacity per node). The average CPU utilization I have seen is around 5%, so there is great demand to perform well with fatter nodes.
  3. These applications create a volume on every node (as they are mostly deployed on internal HDD), and it is a nightmare to perform DR/backup across large clusters. Recoverability is a very tough problem customers face (as opposed to dealing with individual server failures, which is easy).
  4. Memory provisioning is another issue, as latency performance is a function of the total memory in the cluster (some customers talk about "DGM" or "disk greater than memory" as a concern) – I have seen in-production clusters with 10 TB of memory provisioned for a 50 TB working set.
  5. Their replication strategy is brute force – so there is no snapshot-like functionality for test/dev clusters.

 

NetApp's Role

These applications can thus be bundled with best-of-breed external storage in smart ways to bring the following value differentiation:

  1. Use faster & more reliable disks to reduce the amount of replication and enable rebuilds without network I/O
  2. Use host-side presence to pass hints to the external storage for meaningful data management/check-pointing
  3. Simplify DR/backup by condensing the number of data storage volumes with external storage
  4. Reduce the amount of memory provisioned for a given performance SLA – and hence reduce the number of CPUs provisioned (and achieve higher CPU utilization)
  5. De-link object/K-V management from storage management (through the host-side framework) and hence offer multiple storage architecture options beyond internal HDD, including NFS/CIFS.

 

As smartphones continue to proliferate and more business operations leapfrog into web scale, outpacing even Moore's law, IT infrastructures need to match the agility of these newly evolved application paradigms. At NetApp, we have a strong track record of providing our customers with the most flexible data center solutions in the industry. We continue to work with our enterprise customers to build agile data centers that empower their competitive edge through this burst of web-scale operations.
