Nathanael Iversen, Director of Technical Marketing, Xangati
Having been pulled in to help with dozens of VDI deployments at various stages of success, stall, or outright failure, we’ve found again and again that storage performance has an outsized impact on VDI performance. In fact, the majority of deployments end up suffering from fluctuating storage performance.
For example, take a group of VDI desktops that live on hosts connected to shared storage. Users log in, make their connections, and things run along just fine. Within a couple of minutes, though, as users launch applications and connect to different resources, all of those interactions change: bandwidth shifts, storage demand shifts, and these changes ripple across the network. Over a five-minute period, what happens on the storage can vary widely. Organizations get into trouble because many traditional storage tools average their metrics over a five-minute interval. So even if there were only 30 seconds or a minute and a half of contention, the five-minute graph shows a smooth, flat line. During the actual contention event, however, latency might have reached 300, 400, or 500 milliseconds, and in some cases we’ve seen spikes as high as 2 or 3 seconds while the storage was momentarily bogged down servicing many simultaneous requests. Because those spikes are invisible to a traditional monitoring tool, IT is left without the means to spot the issue and resolve it.
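To make the averaging effect concrete, here is a minimal sketch using a hypothetical latency trace: one sample per second over five minutes, a 5 ms baseline (an assumed value), and a single 30-second contention event at 500 ms. The helper names are illustrative only and do not correspond to any Xangati or storage-vendor API.

```python
from statistics import mean


def latency_trace(baseline_ms, spike_ms, spike_start_s, spike_len_s, window_s=300):
    """Build a hypothetical one-sample-per-second latency trace with one spike."""
    trace = [baseline_ms] * window_s
    for t in range(spike_start_s, spike_start_s + spike_len_s):
        trace[t] = spike_ms
    return trace


def bucket_averages(trace, bucket_s):
    """Average the trace over fixed-size buckets, as an interval-averaging tool would."""
    return [mean(trace[i:i + bucket_s]) for i in range(0, len(trace), bucket_s)]


# Assumed numbers: 5 ms baseline, 30 s of contention at 500 ms starting at t=120 s.
trace = latency_trace(baseline_ms=5, spike_ms=500, spike_start_s=120, spike_len_s=30)

five_min = bucket_averages(trace, 300)  # one point for the whole window
ten_sec = bucket_averages(trace, 10)    # thirty points, fine enough to catch the spike

print(f"5-minute average: {five_min[0]:.1f} ms")  # 54.5 ms
print(f"10-second peak:   {max(ten_sec):.1f} ms")  # 500.0 ms
```

The five-minute point reports 54.5 ms, roughly a tenth of what users actually experienced during the event; only the finer buckets show the 500 ms contention at all.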
What we’ve found is that dynamic interaction tracking is essential. Without it, there is no context for understanding why resources are being consumed the way they are. The storage team typically has no information about how the hypervisor perceives their storage, so they need something that shows how storage looks from the VDI desktop’s point of view. These brief surges also have to be visible, because they often come and go within 10, 15, or 30 seconds. When they go unnoticed, it is very hard to tell whether observed storage latency is tied to IOPS surges, meaning legitimate demand on the hardware, or whether some other problem is at work. We’ve seen causes ranging from very heavy usage all the way to spanning tree problems on the network. Being able to understand and quantify those differences quickly is essential.
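One simple way to express the distinction above is to check, at fine time granularity, whether each latency spike coincides with an IOPS surge. The sketch below is a hypothetical heuristic, not Xangati’s method: the `Sample` record, the 100 ms latency threshold, and the "2x median IOPS" surge rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    t: int             # seconds since start of the trace
    latency_ms: float  # storage latency as seen from the hypervisor
    iops: float        # I/O operations per second at that moment


def classify_spikes(samples, latency_threshold_ms=100.0, iops_surge_ratio=2.0):
    """Label each latency spike as demand-driven or not (hypothetical heuristic).

    A spike is 'demand' if IOPS at that moment is at least iops_surge_ratio
    times the median IOPS; otherwise it is 'non-demand', suggesting a cause
    outside raw load (for example, a network problem).
    """
    baseline_iops = median(s.iops for s in samples)
    findings = []
    for s in samples:
        if s.latency_ms < latency_threshold_ms:
            continue  # not a spike
        surge = s.iops >= iops_surge_ratio * baseline_iops
        findings.append((s.t, s.latency_ms, "demand" if surge else "non-demand"))
    return findings


# Hypothetical 30-second trace: quiet baseline, one load-driven spike,
# and one spike with no matching rise in IOPS.
samples = [Sample(t, 5.0, 1000.0) for t in range(30)]
for t in (10, 11, 12):
    samples[t] = Sample(t, 300.0, 4000.0)  # latency rises with IOPS
for t in (20, 21):
    samples[t] = Sample(t, 400.0, 1000.0)  # latency rises, IOPS flat

findings = classify_spikes(samples)
for t, latency, cause in findings:
    print(t, latency, cause)
```

The spikes at t=10 to 12 are labeled demand-driven, while those at t=20 and 21 are flagged as non-demand, which is the case where looking beyond the storage array (for instance at the network) would be the next step. A real tool would need to align the two metrics carefully in time; with five-minute averages, neither spike would be visible in the first place.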
Consider one Xangati customer, CBRE, which was deploying about 2,500 production VDI users when it brought Xangati online. The team was experiencing intermittent performance issues: desktops seemed to jam up, with no clear pattern to observe, which made the problem very hard to pin down. User resistance was blocking any further rollout. Within hours of being deployed, Xangati found several storage latency issues that were plaguing part of their virtual infrastructure.
The designer and lead architect of that VDI deployment commented that everyone should be doing VDI, and that they should be using Xangati.
Check back tomorrow for Tip #3, where we discuss analyzing the network activity of your VDI sessions.