<span style="FONT-FAMILY: Arial; COLOR: #a2a2a2; FONT-SIZE: 10px">Posted by John Fullbright, Business Application Lead, NetApp Professional Services</span>
“Deploy Exchange on DAS, it’s cheaper and simpler.” We’ve all heard that argument for some time now. It goes against the grain, completely ignoring the value proposition of SAN; or does it? What exactly is the added value that SAN provides for Exchange deployments? It was the promise of performance, availability, and flexibility. In one device you could pool a bunch of disks, carve them up, and present them to one or more servers, while at the same time being resilient to individual disk failures and providing a high degree of availability.
Availability was achieved with RAID. That’s a concept from the 1980s, where we take a bunch of small disks and combine them into redundant arrays. The most popular RAID types are RAID 5 and RAID 10. RAID 10 is striped mirroring, while RAID 5 disperses a parity stripe across the set. RAID 10 devotes half of its spindles to redundant data; RAID 5 devotes the equivalent of 1/n of its spindles to parity, where n is the number of spindles in the set. Both RAID 10 and RAID 5 are optimized for read operations. RAID 10 has a write penalty of 2, because each write lands on both mirrors; RAID 5 has a write penalty of 4, because each write requires reading the data, reading the parity, writing the data, and writing the parity.
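The write penalty determines how many back-end disk operations each host IO generates. A minimal sketch of that arithmetic, with illustrative numbers that are assumptions rather than figures from this post:

```python
def backend_iops(frontend_iops, read_fraction, write_penalty):
    """Back-end disk IOPS generated by a front-end workload.

    Each read costs one disk IO; each write costs `write_penalty`
    disk IOs (2 for RAID 10 mirroring; 4 for RAID 5's
    read-data/read-parity/write-data/write-parity cycle).
    """
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction)
    return reads + writes * write_penalty

# 1000 front-end IOPS at a 1:1 read:write ratio (hypothetical workload):
raid10 = backend_iops(1000, 0.5, 2)  # mirrored writes
raid5 = backend_iops(1000, 0.5, 4)   # parity writes
```

As the read fraction drops, the gap between the two widens, which is why write-heavy workloads punish RAID 5 first.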
These read-optimized arrays were fine in the days of Exchange 5.5, when reads outnumbered writes 4:1 or more. By the time we made it to Exchange 2003, the application read/write ratio had dropped closer to 2:1, and RAID 5 was the first to fall by the wayside: the write penalty was simply too high. That, combined with a high IO rate and a small IO size, resulted in RAID 10 designs short-stroking large numbers of spindles. Thus began the era of “expensive SAN solutions” for Exchange.
Microsoft made major efforts to address storage in Exchange 2007. The move to a 64-bit architecture, support for host cache up to 32GB, and an increase in the database page size from 4K to 8K resulted in a 70% reduction in IO. In fact, most of that reduction was in read IO. In a typical Exchange 2007 design, reads account for 53% of all IO. A design like the one in Table 1 results in space and IO requirements like those in Table 2.
<font size="3">Table 1. Tier-1 User Mailbox Configuration</font>
- Total Number of Tier-1 User Mailboxes
- Projected Mailbox Number Growth
- Send/Receive Capability / Mailbox / Day
- Average Message Size (KB)
- Tier-1 User Mailbox Size Limit (MB)
- Predict IOPS Value?
- IOPS Multiplication Factor
- Tier-1 User IOPS / Mailbox
- Tier-1 Database Read:Write Ratio
- Outlook Mode (Majority of Clients)
<font size="3">Table 2. Disk Space &amp; Performance Requirements (Single Replica)</font>
- Database Space Required / Replica
- Log Space Required / Replica
- Database LUN Disk Space Required / Replica
- Log LUN Disk Space Required / Replica
- Restore LUN Size / Node (and / SCR Targets)
- Total Database Required IOPS / Replica
- Total Log Required IOPS / Replica
- Database Read I/O Percentage
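The totals in a table like Table 2 follow from simple arithmetic on the per-mailbox inputs. A minimal sketch, where the mailbox count, per-mailbox IOPS, and overhead factor are hypothetical assumptions rather than the values behind these tables:

```python
def required_iops(mailboxes, iops_per_mailbox, overhead_factor=1.0):
    """Total database IOPS for a mailbox population.

    `overhead_factor` stands in for the calculator's IOPS
    multiplication factor (headroom for growth, third-party agents).
    """
    return mailboxes * iops_per_mailbox * overhead_factor

# Hypothetical example: 5,000 mailboxes at 0.32 IOPS each.
total = required_iops(5000, 0.32)
reads = total * 0.53        # 53% read share, per the post
writes = total - reads      # the remaining 47% are writes
```

Splitting the total by the read percentage matters because, as shown above, only the write portion is multiplied by the RAID write penalty.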
The IO is now low enough that short-stroking large numbers of spindles to meet the performance requirement is no longer needed. Microsoft IT produced a white paper detailing how it had deployed Exchange 2007 on DAS, and the DAS vs. SAN debate was born.
NetApp has a fundamentally different I/O pipeline architecture from a traditional storage array. Instead of being optimized for reads after a write-in-place, NetApp storage dynamically balances data placement and cache resources between read and write requests in real time. How does this affect the DAS vs. SAN debate? I decided to find out.
Based on the user mailbox configuration in Table 1, I chose the FAS2020 as the comparison platform. Configured with 12 300GB 15K SAS spindles, the FAS2020 provides 2000 IOPS and 2300GB of usable space. For this comparison I used iSCSI to connect from a single host to the FAS2020 (treating the FAS2020 as DAS), and did not license SnapRestore, in order to keep it apples to apples. I compared this to both the HP MSA60 configured with 12 300GB SAS spindles and the Dell PV MD1000 configured with 12 300GB SAS spindles. My first surprise was the cost of the three systems: they were comparable, within $1,000 of each other. Costs do vary depending on where you buy the systems and what discounts are applied, so I encourage you to do your own pricing exercise.
For both the HP and the Dell configuration I chose RAID 10, which provides the highest performance. I was again surprised that even with RAID 10, given the workload consisting of 53% reads, both DAS configurations were only barely able to meet the performance requirement of 1605 IOPS. Configured as RAID 10, neither DAS configuration could meet the space requirement of 2100GB. In fact, even allowing the dedicated restore LUN to be RAID 0, the DAS configurations required 24 spindles, as shown in Table 3.
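A RAID 10 set has to satisfy the larger of two answers: the spindles needed for IOPS and the spindles needed for capacity. A minimal sketch of that sizing for the database alone (the restore LUN and layout constraints add more); the per-spindle IOPS and capacity figures are assumptions for a 15K 300GB SAS drive, not vendor specifications:

```python
import math

def raid10_spindles(frontend_iops, read_fraction, space_gb,
                    spindle_iops=175, spindle_gb=300):
    """Spindle count for a RAID 10 set: max of the performance
    answer and the capacity answer, rounded up to mirror pairs."""
    # Performance: writes are mirrored, so each costs 2 disk IOs.
    backend = (frontend_iops * read_fraction
               + frontend_iops * (1 - read_fraction) * 2)
    perf_spindles = math.ceil(backend / spindle_iops)
    # Capacity: mirroring halves the usable space per spindle.
    cap_spindles = math.ceil(space_gb / (spindle_gb / 2))
    # Mirrors come in pairs, so round up to an even count.
    n = max(perf_spindles, cap_spindles)
    return n + (n % 2)

# The post's database requirements: 1605 IOPS at 53% reads, 2100GB.
db_spindles = raid10_spindles(1605, 0.53, 2100)
```

Under these assumptions the database alone already needs more than the 12 spindles in each DAS shelf, before the restore LUN is even considered.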
<font size="3">Table 3. Restore LUN</font>
You see, it’s not only the performance and space requirements that determine spindle count in a DAS environment; layout, and the size of the spindles relative to the data that resides on them, also play a large role. If I were to deploy this Exchange 2007 design on DAS, I would need to purchase twice as many spindles as I would deploying on a FAS2020. The added spindles not only double the CAPEX of the DAS configurations; they increase OPEX as well.
How does all of this apply to Exchange 2010? In Exchange 2010, Microsoft introduces the Database Availability Group. The idea is not to use RAID at all: availability moves out of storage altogether and up to the application level. Given the results of my sizing exercise, this is targeted not at SAN, but at RAID. In environments where writes dominate the IO workload, traditional RAID is simply not providing value. As one last thought, I decided to see how my FAS2020 configuration stacked up against JBOD with no RAID at all. I was once again surprised: the JBOD configuration required 12 spindles, the same number as the FAS2020 configuration. Given the comparable pricing, the NetApp solution still provides advantages over JBOD.