VMworld: vSphere Stretched Clusters, DR & Mobility (BCO2479)

– lots of confusion between disaster recovery (DR) and disaster avoidance (DA)

Part 1: Disaster Avoidance vs. Disaster Recovery

Disaster Avoidance
– you know a host will go down
– Host: vMotion
– Site: vMotion

Disaster Recovery
– unplanned host outage
– Host: VMware HA
– Site: SRM

*** More content forthcoming (to fill in the blanks) ***

VMworld: vSphere PowerCLI Best Practice (VSP1883)

Speakers: Luc Dekens (Eurocontrol), Alan Renouf (VMware)

Luc – blog: http://lucd.info
Alan – blog: http://www.virtu-al.net

BP1: Get-View returns a full copy of the server-side object
– otherwise, the Get-* cmdlets return only a subset of the properties
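A minimal sketch of the difference, assuming an active Connect-VIServer session (the host name is hypothetical):

```powershell
# "esx01.lab.local" is a hypothetical host name.
$vmhost = Get-VMHost -Name "esx01.lab.local"

# Get-VMHost returns a trimmed-down PowerCLI object; Get-View returns
# the full server-side vSphere API object behind it.
$hostView = Get-View -VIObject $vmhost
$hostView.Hardware.CpuInfo   # detail not surfaced on the Get-VMHost object
```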

BP2: Finding Properties
– what you see is not what is there
– – can edit the .ps1xml files to change returned properties
– use Get-Member to return all of the properties
– ex. Get-VMHost | Get-Member
– complex (nested) objects
– – Select-Object -ExpandProperty
– – Get-Member in a loop
– – Format-Custom -Depth
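The techniques above might look like this in practice (a sketch, assuming a live PowerCLI session):

```powershell
# List every property and method the returned object really carries:
Get-VMHost | Get-Member

# Expand a nested (complex) property instead of just seeing its type name:
Get-VMHost | Select-Object -ExpandProperty ExtensionData

# Or dump the whole object tree to a fixed depth:
Get-VMHost | Format-Custom -Depth 2
```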

BP3: Make Your Own Properties
– use the New-VIProperty cmdlet
– adds a CodeProperty to the object
– valid until Remove-VIProperty or end of session
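A sketch of a calculated property; the property name and formula here are my own example, not one from the session:

```powershell
# Define a session-scoped UsedSpaceGB property on Datastore objects.
New-VIProperty -Name UsedSpaceGB -ObjectType Datastore -Value {
    param($ds)
    [math]::Round($ds.CapacityGB - $ds.FreeSpaceGB, 2)
}

Get-Datastore | Select-Object Name, UsedSpaceGB

# Valid until removed or the session ends:
Get-VIProperty -Name UsedSpaceGB | Remove-VIProperty
```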

VMworld: VMware View Performance & Best Practices (EUC3163)

PCoIP server offload card for improved graphical performance & consolidation

View 5
– WAN bandwidth optimizations: ~75% reduction
– CPU optimizations: idle VMs, algorithms, libraries
– Better session resilience: sessions recover from network loss of up to 30 seconds
– PCoIP performance counters
– Provisioning: faster and more parallelism

HP 3PAR: AO Update…Sorta

I wish there were an awesome update that I’ve just been too preoccupied to post, but it’s more of a “well…” After talking with HP/3PAR folks a couple months back and re-architecting things again, our setup is running pretty well in a tiered config, but the caveats in the prior post remain. Furthermore, there are a few stipulations that I think HP/3PAR should disclose to customers, or that customers should weigh themselves, before buying into the tiered concept.

Critical mass of each media type: Think of it like failover capacity (in my case, vSphere clusters). If I have only three hosts in my cluster, I have to leave at least 33% of capacity free to handle the loss of one host. But if I have five hosts, I only have to leave 20% free (or with ten hosts, 10%) to account for a host loss. Tiered media works the same way, though it feels enormously wasteful unless you have a ton of stale/archive data.

Our config only included 24 near-line SATA disks (and our tiered upgrade to our existing array only had 16 disks). While that adds 45TB+ of capacity, realistically those disks can only handle between 1,000 and 2,000 IOPS. Tiering (AO) considers these things, but seems a little underqualified when it comes to virtual environments. Random seeks are the enemy of SATA, and when AO throws tiny chunks of hundreds of VMs onto only two dozen SATA disks (then subtract RAID/parity), it can get bad fast. I’ve found this to especially be the case with OS files. Windows leaves quite a few of them alone after boot…so AO moves them down a tier. Now run some maintenance and reboot those boxes–ouch!
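The back-of-the-napkin math above, sketched out (the ~75 random IOPS per 7.2K NL-SATA spindle is my own planning assumption, not a 3PAR figure):

```powershell
# Failover capacity: with N hosts, keep roughly 1/N of capacity free
# to absorb the loss of one host.
foreach ($n in 3, 5, 10) {
    "{0} hosts -> keep {1:N0}% free" -f $n, (100 / $n)
}

# Small NL-SATA tier: ~75 random IOPS per spindle (assumption).
$disks = 24
"~{0} raw IOPS before RAID/parity overhead" -f ($disks * 75)
```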

VMworld 2012: Impressions

It’s day two or three, depending on what you consider Sunday, here at VMworld 2012, and the feel is very different from prior years. Technicalities aside, just the venue and city set a tone that is distinct from the past. San Francisco bears a cultural appeal all of its own, and the Moscone Center is literally light: daylight isn’t scarce like our IT budgets, or like it is in Vegas.

In many ways, it feels like a transition year, epitomized by the handover of the reins from Paul Maritz to new CEO Pat Gelsinger. The kickoff music came from a drum line embedded in the tops of the four-foot letters and numbers of “VMWORLD 2012” on the stage, accompanied by hip-hop artists. Not that this is foreign to VMworld, but to me it signaled a change in energy.

Technically, the focus has been on vCloud and the bigger, “software-defined” virtual datacenter. I enjoyed the direction and forward-looking nature, but the application was definitely aimed at larger organizations and more abstract, especially to the little guys in the community. Single-site deployments and small shops will be hard-pressed to find more than a detached interest in the focal points. Perhaps, though, they will put these ideas to use at larger organizations later in their IT careers.

The fact that this is a “dot” year (5.1) and not a major-version event also affects things. In 2008, vSphere 4.0 made its debut (if not RTM), which was major news. Then in 2011, 5.0 was showcased–another big deal. Sessions thus far have been half recap of 5.0, with the minor updates that come with 5.1. Not disappointing, but not as wildly exciting as it could be.

I’ve said a lot that could be construed as disappointment, but the reality is that I’ve already been reminded of features and updates that I need to apply in my environments. Some task items are new; others are quite old but require a fresh look at things that six years of virtualization have obscured.

The second keynote is about to begin, so I’ll sign off for now. In the words of VMworld 2008, remember, “Virtually Anything Is Possible”.

EMC Avamar – Epic Fail

Terrible initial implementation. High-downtime expansion. Unreliable backups. Absentee support. That’s EMC Avamar.

On the tiny upside, deduplication works great…when backups work.

In September 2011, our tragedy began. We’re a 99% VMware-virtualized shop and bought into EMC Avamar on the promise that its VMware readiness and design orientation would make for low-maintenance, high-reliability backups. In our minds, this was a sort of near-warm redundancy with backup sets that could restore mission critical systems to another site in <6 hours. Sales even pitched that we could take backups every four to six hours and thus reduce our RPO. Not to be.

Before continuing, I should qualify all that gloom and woe by saying that we have had a few stretches of uneventful reliability, but only when we avoided changing anything. And during one of those supposedly reliable stretches, a bug in the core functionality rendered critical backups unusable. But I digress…

EMC XtremIO Gen2 and VMware

Synopsis: My organization recently received and deployed one X-Brick of EMC’s initial GA release of the XtremIO Gen2 400GB storage array (10TB raw flash; 7.47TB usable physical). Since this is a virgin product, virtually no community support or feedback exists, so this is a shout-out for other org/user experience in the field.

Breakdown: We are a fully virtualized environment running on VMware ESXi 5.5 on modern Dell hardware and Cisco Nexus switches (converged to the host; fiber to the storage), and originally sourced on 3PAR storage. After initial deployment of the XtremIO array in early December (2013), we began our migration of VMs, beginning with lesser-priority, yet still production guests.

Within 24 hours, we encountered our first issue: one VM became unresponsive and, upon a soft reboot (Windows 2012 guest), failed to boot–it hung at the Windows logo. Without going into too much detail, we hit an All Paths Down situation on the XtremIO array, and even after rebooting the host, we still could not boot that initial guest. Only when we migrated it (Storage vMotion) back to our 3PAR array could we successfully boot the VM.

EMC XtremIO and VMware EFI

After a couple weeks of troubleshooting by EMC/XtremIO and VMware engineers, the root cause was determined to be EFI boot handing a 7MB block to the XtremIO array, which filled the queue and never cleared because the array was waiting for more data to complete the exchange (i.e., a deadlock). This seems to happen only with EFI-firmware VMs (tested with Windows 2012 and Windows 2012 R2), and the issue is on the XtremIO end.

The good news is that the problem can be mitigated by adjusting the Disk.DiskMaxIOSize setting on each ESXi host from the default 32MB (32768, in KB) to 4MB (4096). You can find this in vCenter > Host > Configuration > Advanced Settings (the bottom one) > Disk > Disk.DiskMaxIOSize. The XtremIO team is working on a permanent fix in the meantime, and the workaround can be implemented hot with no impact to active operations (aside from a potentially minor increase in host CPU load as ESXi breaks >4MB I/Os into 4MB chunks).
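With many hosts, the same change can be scripted with PowerCLI (a sketch; the cluster name is hypothetical, and Get-/Set-AdvancedSetting require PowerCLI 5.x or later):

```powershell
# Set Disk.DiskMaxIOSize (value in KB; 4096 = 4MB) on every host in a cluster.
Get-Cluster -Name "Prod" | Get-VMHost | ForEach-Object {
    Get-AdvancedSetting -Entity $_ -Name "Disk.DiskMaxIOSize" |
        Set-AdvancedSetting -Value 4096 -Confirm:$false
}
```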

VMware PXE Limitations


Update 12/12/2014: The information below remains valid: VMware does not appear to support a RamDiskTFTPBlockSize above 1432, because it has no support for handling IP fragmentation. However, it has since been found that it is much better to adjust RamDiskTFTPWindowSize instead of RamDiskTFTPBlockSize to speed up TFTP (it reduces the number of ACKs required and does not cause IP fragmentation). VMware and other vendors all appear to handle this scenario perfectly. Modifying the WindowSize is easy from most PXE providers, including WDS (modify the BCD). There is an issue changing this setting in Configuration Manager, because ConfigMgr manages a dynamic BCD itself with no native way to modify the setting (see the comments section for a workaround – thanks, Patrick!). More info throughout the comments section.

Update 3/18/2016: Microsoft has released an update (currently in preview) that allows modifying both the WindowSize and the BlockSize without a hacked DLL. Info here: https://technet.microsoft.com/en-us/library/mt595861.aspx#BKMK_RC1603. This update is in the 1603 Technical Preview and should also be a part of the next Current Branch release of Configuration Manager.

Update 7/22/2016: Microsoft has released a new Current Branch of SCCM, version 1606. I have confirmed that Current Branch version 1606 contains these modifications that were in version 1603 Preview. I have also tested the WindowSize/BlockSize registry keys and they are working great.
Name: RamDiskTFTPWindowSize
Value: 6
Note: After making changes, make sure to restart the Windows Deployment Service.
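On a plain WDS server (no ConfigMgr), the change can be made from PowerShell. The registry path below is the documented WDS TFTP provider key, but verify it on your own build:

```powershell
# Set the TFTP window size for WDS, then restart the service.
$key = "HKLM:\SYSTEM\CurrentControlSet\Services\WDSServer\Providers\WDSTFTP"
New-ItemProperty -Path $key -Name "RamDiskTFTPWindowSize" -Value 6 `
    -PropertyType DWord -Force | Out-Null
Restart-Service -Name WDSServer
```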

Task Sequence Run Command Line Step Fails after VMware Tools Installed

I recently ran into an issue when trying to use the Run Command Line step during a SCCM 2012 R2 SP1 task sequence on a VMware Virtual Machine.

This step worked fine both on physical hardware and on a Hyper-V virtual machine. However, on a VMware virtual machine, the task sequence returned the following error: Incorrect Function (0x00000001). After some additional troubleshooting, I found that VMware Tools appeared to be the culprit. I moved the VMware Tools install to the very last step in my task sequence, and then everything worked correctly.

Some more information: I found that if I changed the commands I was trying to run, it wouldn’t always fail. The issue only occurred consistently when I used the Run As option in the Run Command Line step. For example, I was using sqlcmd.exe to make changes to a SQL Server, which SYSTEM does not have permission to do, so Run As was required. The only solution I could find was to make sure that VMware Tools isn’t installed until all other steps in the task sequence are finished.