I was having a chat with an IT Manager of the Windows server team for a financial multinational in the Canary Wharf area. We had dinner shortly before I went on holiday. We had been speaking about the BAU issues he’s going through, what his concerns are and his business drivers. The CIOs are often thinking strategy, thinking about costs, efficiency and delivery, the on the ground management teams often have different opinions and drivers.
Janet, (we’ll call her), was telling me about an unexpected consequence of a recent organizational shift with regards to next business day warranty and with that, I’ll leave her to do speak below, please note as usual, I’ve removed anything that might reference her organization or identify her, we get much better feedback that way, so over to Janet.
We used to have one single support contract that covered all servers regardless of age and platform, so if a system board in our x86 rackmount died, the onsite service provider engineers would get a call from us, they would look into it and liase with the vendor to obtain or repair the faulty components. However, the senior IT management team have been looking at the way we provision and manage the server environment and re-negotiated the contracts. Now what happens is that we buy new servers with a next day business warranty (as per normal) and for the crucial existing systems that we have, the unix stuff, some of the high end tier one wintel servers we have on a support warranty. This poses a number of operational challenges which we have started to experience:
- Internalizing the hardware support – not everyone has experience with server hardware, asking a guy who has previously looked after windows, or unix, to change a system board or a backplane on the servers can be quite a challenge/confidence issues.
- Different part numbers within one vendor platform – power supplies, backplanes, or common components which can make small tasks more complex, 2u rack server that means power supply 18724j not the 187913B, different system boards between the same server model type, the 2.8GHz against the 2.4GHz models.
- Different standards and ways of doing simple similar tasks, the way vendors treat hardware support process, logging a call, what information is mandatory, what isn’t, that firmware and drivers need to be at a specific level, has caused in fact some of the engineers saying that one vendors servers are ‘rubbish, and high maintenance’ simply because of the support process
- Next business day and warranty complication issues – we get mixed signals from the vendors when trying to log a call for a system out of business hours, which is on a next business day warranty, it seems to depend on the guy taking the call and the vendor we’re dealing with. I’ve got the example I gave you about this – see the end of this post.
- Concerns about the platform becoming the commodity with the old school training and ways of doing business – the just rebuild it scenario can be deskilling, in the respect that as the engineers and the end users get more used to rip and replace, when it’s three in the morning and the system dies, we become more reliant on external vendors, where does this leave us and IT going forward?
The example Janet told me was from a week ago, when they called out a vendor for a system with a tier2 server which had a hardware fault. The system was not on the corporate support contract and they therefore had to log a formal call ‘like an end user’ as Janet put it, the conversation was like this, I’ve removed the references to the vendor and the system to remove any cause for concern or concepts being taken out of context: (Janet’s engineer Neil is in italics)
“Welcome to vendor Technical Support, my name is Sammy, how can I help you today?”
“Hi, I’ve got a fault with our server, it’s a model type and number, with serial number 123456xjustmadeup. We rebooted it as part of a systems upgrade and the server powers up to the POST screen, but shows a blank screen. The lights are on, the disks are spinning and lit up but nothing is coming on screen.”
“Ok sir, that’s not a problem, we’ll get that sorted for you, let me get your details here. It will take me a few minutes, can I have your name and telephone number/email in the meantime?”
Neil supplies his contact details.
“Right sir, that server is on a next business day warranty cover. That means we wont be able to progress this call any further. If you call back Monday morning at 9am, we can get a call raised and sent to our first line escalation team. Alternatively I see you have a support contract, we might be able to request this server be added to your support contract via your account manager.”
“Right, well the server is broken, I think it’s either the system board, or one of the cpus, I need to get the server back online as soon as possible, since it’s next business day, it’s Sunday at 4pm, can we log a call with the understanding that nothing will get done with the call until Monday?”
“I’m sorry sir, that’s not how the process works, we can leave your details on the system, but you will need to phone back and log a formal call.”
Result, the multinational waited. The engineer did not have the approval or authorization to add the server to the support call, the engineer escalated to the management team, who told the business account manager, that they weren’t able to log a call as it’s next business day support, and Monday morning a formal call was logged for an engineer to attend on Tuesday afternoon to investigate the failure. The server team, label that vendors platform as ‘high maintenance and earmark it for replacement’.