I was speaking with one of my CIO friends, who works for a multinational in the city, and asked him about a problem he had been having. I’ve removed anything specific to his business, but the problems remain the same.

“I keep telling the guys and the VMware team manager to up the virtualization ratio; we need to be virtualizing more systems. I don’t understand why a few teams have a perception issue, what the challenges are, or why we aren’t getting there more quickly. I keep getting told about dependencies and issues, ‘we’ll get there, but…’

I’ve been given a direct target by the head of the business to virtualize 70% of the server estate, across production and development; they just want to see progress. So I created a small team to do exactly this. The idea was that they would focus on virtualization activities without getting caught up in day-to-day work, and could therefore start arranging and completing P2Vs (physical-to-virtual conversions). That way we could prioritize old systems retrospectively and, at the same time, identify current and future projects that could run in a virtual machine.”

Anyway, we’ve reached 40% and the pace seems to be slowing. It was only when I asked the manager into my office to explain what was going on that she identified our bottlenecks.

I was disappointed on several levels. The team had:

  • Started making their own decisions about what counted as a supported disk configuration or as best practice within their matrix
  • Selected systems based on the age of the hardware rather than on the workload and the application
  • Failed to communicate the business drivers effectively. When I asked to see their intranet pages, they were full of sales pitches and technical detail, with nothing about why our organization was deploying virtualization or what we were seeking to achieve.
  • Divided the virtualization estate by business line. That was a fine idea on paper, but it left me with a problem: the trading ESX servers were ‘full’ while the IT ESX servers had spare capacity. However, IT didn’t want to host trading systems, because that meant its servers had to be more reliable and available, which reduced the scope for testing upgrades or new configurations. It also meant we had idle capacity sitting in some business lines that could not be shared with whoever needed it.

We had to turn everything around. I also asked them what our billing mechanism was. There was no response. Come on, guys: how are we answering people’s questions about depreciation and support costs? How are we getting people to pay for the environment? A £100 internal chargeback is better than nothing whilst we deploy the technology and brainstorm billing methods. Because we had no billing method, the team had fallen back on the business-line split because it was easier: if trading needed more capacity, we asked them to buy more ESX capacity and licenses. Not surprisingly, their response was, in effect, “Superb idea, come back in 2012 just after the Olympics and we might just be interested then. In the meantime, can I have 16 new cheap 1U physical servers, and can you rebuild that DL380 G4? It’s trading, so you can’t say no.”

An interesting issue, but who’s at fault? With the CIO’s knowledge, I spoke to his virtualization team manager over coffee. I was introduced as a server consultant, a vague enough title for her to vent freely about her problems and challenges so we could find out what was going on. I reported the results to them both and have summarized them below:

The Management

  • No clear definition of what is in scope or what the priorities are
  • No billing method to establish how we pay for and fund the environment: software licenses, storage and hardware
  • Lack of day-to-day operational interest: the habit of logging a call and assuming it will fix itself
  • Procedural and process problems: virtual machines were built the same old-fashioned way as a physical box, reducing the benefit of the platform, and the CTI for issues relating to virtual machines was not defined. If a virtual machine dropped off the network, was that a virtualization team issue or a Windows/Unix one? In the meantime, the call got passed around the universe and the issue was not resolved, frustrating the end-user community.

The Virtualization team

  • Worrying for everyone: when asked what the policy was if a user asked for 400GB of storage, they replied that this wasn’t supported; the user could have 40GB and request more when needed. Otherwise, they said, the organization’s storage would quickly be used up.
    • That’s not their issue. Their job is to identify barriers to achieving the 75%, in this case storage, so that the management team can either set policies or buy more storage.
    • As a result, people with larger storage requirements opted for a physical server, creating more work later and damaging the virtualization concept: “they’ll only give you 40GB, so don’t request a virtual machine”.
  • Lack of branding: their intranet page said only which VMware version they used, what kind of rack servers they used and which networks were supported
    • No business benefits, such as agility or ease of upgrades, changes and rollback
    • No guidance on how to request a virtual machine or what specification of machine is needed
    • No success stories, not even small wins such as “we’ve reduced IT from 300 servers to 11 physical ESX servers and 20 physical servers running a mixture of Windows and Linux for the backups and the databases”
  • Waiting for dependencies: the CIO asked how many of the 37 networks were supported. Eleven, he was told. Why? Because the team was waiting for the network team to complete VLAN tagging and specific network upgrades rather than requesting the networks they needed. This ruled out complete environments and applications, comprising development, staging and production systems, that might have been (or were) happy to be virtualized but could not be because the network was not ready yet.
  • Clustering and high availability: too much focus on high availability and not enough on getting the job done. The team had actively held back from virtualizing systems that had particular network or other specific requirements and so could not be put on the high-availability virtual environment. The point is that they hadn’t asked the user base whether they wanted or needed this functionality. Not every system is tier 1, and not every user necessarily needs or wants high availability; as long as users know that a broken system will be available again within 2 to 4 hours, that meets the needs of 75% or more of the end-user base.
  • Inappropriate candidates were selected for virtualization while others were missed. The team picked multi-socket, multi-core, low-latency batch-processing systems on the basis that “it should work”, without clearly defining the deliverables or the rollback procedure. The result was delays, and a “virtualization is rubbish” message spreading around the end-user development estate. At the same time, infrastructure servers were often left out or not ruled in scope for mixed reasons, and systems left out of scope were not flagged for a hardware refresh, redesign or requirements review. The best example was the tape backup system for a separate external environment, made up of some legacy 7U rack servers. The reasoning was “it works and it will be decommissioned soon”, rather than identifying the savings achievable by replacing it with a 1U server and a new tape backup solution. This further reduced the potential savings and options: was the tape backup system even needed, or was it something inherited that nobody knew why they had?
  • Everything, and I mean everything, was stored, published and updated only within the virtualization team’s own world. The CIO had no view of what was happening; he didn’t know that the trading team had recently ordered a new server simply to store 500GB of data, because a virtual machine could not be given that much space. His response: why isn’t it going on the NAS? Who scoped this much storage? What archiving is in place, and how important is the data?

