EMC XtremIO Gen2 and VMware

Synopsis: My organization recently received and deployed one X-Brick of EMC’s initial GA release of the XtremIO Gen2 400GB storage array (raw 10TB flash; usable physical 7.47TB). Since this is a virgin product, virtually no community support or feedback exists, so this is a shout-out for the experience of other organizations and users in the field.

Breakdown: We are a fully virtualized environment running VMware ESXi 5.5 on modern Dell hardware and Cisco Nexus switches (converged to the host; fiber to the storage), with our VMs originally hosted on 3PAR storage. After the initial deployment of the XtremIO array in early December (2013), we began migrating VMs, starting with lower-priority but still production guests.

Within 24 hours, we encountered our first issues when one VM (a Windows 2012 guest) became unresponsive and, upon a soft reboot, failed to boot: it hung at the Windows logo. Without going into too much detail, we hit an All Paths Down situation to the XtremIO array, and later, after rebooting the host, we still could not boot that initial guest. Only when we migrated it (Storage vMotion) back to our 3PAR array could we successfully boot the VM.

Over the past two weeks, we’ve seen unfortunately low levels of deduplication (1.4-1.6 to 1), which we expected to be on par with Pure Storage’s offering. Apparently, Pure’s roughly 4.0-5.0 to 1 reduction ratio is due, at least in part, to their use of compression on their arrays. We were unaware of this feature difference until mid-implementation (when the ratio disparity became clear).
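To put those ratios in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes effective capacity is simply usable physical capacity multiplied by the data reduction ratio, ignoring free-space headroom, snapshots, and metadata overhead. The function name is made up for illustration, and the 4.0 and 5.0 figures are Pure's ratios applied hypothetically to our 7.47TB, not anything we measured.

```python
USABLE_TB = 7.47  # usable physical capacity of our single X-Brick


def effective_capacity_tb(usable_tb, reduction_ratio):
    """Logical data that fits, assuming capacity scales linearly with the ratio."""
    return usable_tb * reduction_ratio


# 1.4 and 1.6 are what we observed; 4.0 and 5.0 are for comparison only.
for ratio in (1.4, 1.6, 4.0, 5.0):
    capacity = effective_capacity_tb(USABLE_TB, ratio)
    print(f"{ratio:.1f}:1 -> {capacity:.2f} TB effective")
```

Under that simplification, the gap between the ratio we expected and the ratio we got is roughly the difference between about 10-12TB and about 30-37TB of logical data on the same X-Brick.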

Additionally, the case of VMs going partially dark and then completely dark after a reboot (which is normally the solution to most Windows problems 🙂) has repeated on three VMs to date (one mere minutes ago), resolved only by migrating back to 3PAR, even if only temporarily. The guests in question have all been running Windows Server 2012 R2, using virtual hardware versions 9 and 10, on ESXi 5.5.0 build 1331820. HBAs are QLogic 8262 CNAs running 5.5-compatible firmware/drivers.

Specific to the XtremIO array, we’ve also experienced an unusual number of “small block IO” alerts, even though we have isolated vSphere HA heartbeats to two volumes/datastores. Furthermore, we see routine, albeit momentary, latency spikes into the tens of milliseconds. These are as yet unexplained, but they might correlate with the small block I/O, which XtremIO admittedly does not handle well due to its 4KB fixed-block deduplication. In contrast, Pure Storage doesn’t handle large block I/O as well due to its variable-block deduplication process.

Aside from these issues (which are significant), the XtremIO array has generally performed well during normal operations with a full load (we migrated as much of our production environment as it would hold–we ran shy of a full migration due to insufficient data reduction ratios).

Without further ado, if you have personal field experience with an XtremIO array in a virtual environment (preferably server virtualization, but VDI is fine, too), please post your stories here. Perhaps a cause of these woes will emerge from the collective exchange. Thank you.

Update (one hour later, 12/26/13): we seem to have determined that XtremIO has an issue with Windows Server 2012 R2 guests and/or EFI firmware boot. Our environment has both 2012 and 2012 R2 guests, but our 2012 guests exclusively use BIOS firmware and our 2012 R2 guests exclusively use EFI firmware. We have not yet been able to test EFI on 2012 (non-R2) or BIOS on 2012 R2. All guests (2008 R2 and later) use LSI Logic SAS controllers.
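The confound is easier to see laid out as a matrix. The short Python sketch below just enumerates the OS/firmware combinations and marks which ones we have actually run on XtremIO. The labels and data structure are illustrative only, not output from any VMware or EMC tool.

```python
from itertools import product

os_versions = ["Windows Server 2012", "Windows Server 2012 R2"]
firmwares = ["BIOS", "EFI"]

# Combinations that exist in our environment and what we have seen so far.
observed = {
    ("Windows Server 2012", "BIOS"): "not among the failing guests",
    ("Windows Server 2012 R2", "EFI"): "boot failures on three VMs",
}


def untested_combinations():
    """Return the OS/firmware pairs we have not yet been able to test."""
    return [c for c in product(os_versions, firmwares) if c not in observed]


for os_name, fw in product(os_versions, firmwares):
    status = observed.get((os_name, fw), "not yet tested")
    print(f"{os_name:<24} {fw:<5} {status}")
```

Until the two untested cells are filled in, we cannot say whether the trigger is the guest OS, the firmware type, or the combination of the two.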

Posted in SAN
