This interview happened earlier in the week in a Starbucks, with my promised Frappuccino Blended Crème thingy. Chris works for a large bank based in Canary Wharf, and he gets quoted every now and again on the blog. The questions came in no particular order over our coffee. I’d like to thank him, and I hope it’s an interesting read. I’ve paraphrased his answers and removed anything that might refer to his employer.

What are you working on?

  • A number of things really. We’ve got the support stuff to get on with, the incidents/changes/requests to work through and they keep coming in.
  • At the same time, as a result of some of the mergers/acquisitions, we’ve got numerous high-visibility projects: data center moves, hardware refreshes and rebuilds, all to bring servers in line with our own internal standards, on our supported platforms, using our bank build.

What’s today’s challenge?

  • We’ve got 96 blades to build, which in itself isn’t difficult, but they all need different layered products (SQL/Control M/IIS) installed. Managing that can become quite an overhead, as there is a separate process to install each one. So we’ve had to push back with a glorified spreadsheet listing blade name, operating system (Linux/Windows/VMware) and layered products, to make sure we aren’t wasting time rebuilding them over and over again.
  • At the same time, the incident/request queue has been growing for some time, with calls just getting put on hold because of their volume and the fact that we let a few guys go over the last month.
  • We’ve also been asked to perform another inventory of the server estate, which again isn’t world ending, but it is when it gets complicated. Last time they wanted server name, memory, operating system, model type, number of processor sockets, and storage size and type. As much as people think this is automated, it really isn’t. Besides, they forget there are so many separate networks and operating environments, and security won’t let us have one machine that reaches all networks, so we end up with many management servers which we have to log on to, run a script on, and then manually put it all together.
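
The manual “put it all together” step in that last point could, in principle, be scripted once each management server has exported its results. What follows is only a minimal sketch in Python, not the bank’s tooling: it assumes each management server writes a CSV file with a `server_name` column, and the column names, network labels and the `merge_inventories` helper are all hypothetical.

```python
import csv
import io


def merge_inventories(exports):
    """Merge per-network CSV exports into one list of rows.

    `exports` maps a network label to CSV text. Rows are keyed on the
    lower-cased server name; the first occurrence wins, and later
    duplicates are reported rather than silently overwritten.
    """
    merged = {}
    duplicates = []
    for network, text in exports.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = row["server_name"].strip().lower()
            if key in merged:
                duplicates.append((key, network))
                continue
            # Record which network's export the row came from.
            merged[key] = dict(row, network=network)
    return list(merged.values()), duplicates


# Hypothetical exports from two management servers.
exports = {
    "prod": "server_name,os,memory_gb\nWEB01,Windows 2003,4\nSQL01,Windows 2003,16\n",
    "dmz": "server_name,os,memory_gb\nweb01,Windows 2003,4\nPROXY01,Linux,2\n",
}
rows, dupes = merge_inventories(exports)
```

Reporting duplicates instead of overwriting them is a deliberate choice here: a server that shows up on two networks is exactly the kind of thing someone would want to look at by hand.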

What technologies are being deployed in priority?

  • We’re adopting virtualization big time. At the moment it’s VMware, but we’re going to be trying Hyper-V, and the Unix guys keep whispering quietly about Xen. They’re big fans, and it might work very well as a platform for our Citrix applications: a virtualized Xen infrastructure running all of them.
  • We’re looking at upgrading to Windows 2008, but that looks more like part of the five-year plan. We still need to upgrade the remaining boxes to Windows 2003 first, and the build team haven’t yet pressed the go-live button on the Windows 2008 build.
  • We’re looking at upgrading network speeds for a range of applications. This will require some of the older boxes to get a new network card, though we’re hoping to do the window salesman thing and say, well, whilst we’re moving your server’s network, how about we swing in a new box instead? With the pre-provisioning, I’m hoping this will be the case.

What isn’t working?

  • Inventory. We all have different inventories. There’s the mandated gold source, but the debates that occur as a result of just publishing a list of servers we’re planning to work on get tiresome. The challenge is that each business line ‘owns’ its servers, so each has its own inventory, its own list. I might send a list out saying we’re updating firmware on 200 servers and get emails back saying, “actually, that’s not Front Office Risk, it’s Credit”. Because there isn’t a specified data owner, the information isn’t always up to date, and people are protective of the ‘gold data’. This is becoming ever more important as we move servers around, re-brand them for the applications/businesses we bring into our business, and as we upgrade or get asked to produce reports.
  • Support, in the respect that there are still some applications that, through a lack of investment, poor coding or just lack of interest, are causing us a support overhead. The prime example is the web application that for the last six months has needed restarting twice a day because of a memory leak. It’s not a big thing in itself, but it’s the method: a priority two call is logged, which gets everyone excited thinking the world has ended, at which point we have to remind people it’s the memory leak and ask when it is getting fixed, only to get the standard answer: oh, in version 2.18. And that’s out when? Some point in the second quarter of 2010.
  • Retrospective tidying – there needs to be a retrospective tidy-up of the data center from a server standpoint. We keep coming across servers which we email application teams about, and get the answer “that was decommissioned in March 2008”. That’s fine, but we need to remove them from the data center. The challenge is that there’s a debate about whether it’s a production or project cost. In the meantime they sit there switched on, being ‘supported’ (we have to patch them and update the anti-virus because they’re on the network), and at the same time project managers are asking us to rack servers when we haven’t got the power or space.
  • Too many platforms – we’ve now got two or three vendors’ blade and rack servers, which you can say isn’t an issue, but it adds an extra level of debate that has to be undertaken in due diligence around the systems management. With one or two platforms it’s easy to say, right, over the next six months we’re upgrading all iLO to 1.92, or we’re applying array controller and system firmware on all servers. But as we get to the 6 different HP model types, the 6 types of blades we have, plus the other vendor’s equipment, it starts to look like an exercise per vendor, let alone one across the server estate.
  • The ownership aspect is creating unnecessary issues in delivery. Who patches what? Who ‘owns’ what in terms of support and service delivery? The fact that I have admin rights and those that own their servers have admin rights makes it more complex. I want to have rights to what I support and nothing else, end of debate. We got into an email argument last week because a server went down while we’d been performing an inventory on the estate: had our WMI script caused it? After much investigation it was discovered that a faulty memory chip on the system board had caused an ASR. But then if we do this, how does the CIO ask for an inventory, a status of what’s patched and what isn’t? Won’t he or she get something like 9 reports which then have to be put into one Excel spreadsheet for presentation and analysis?
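
The “that’s not Front Office Risk, it’s Credit” debates above come down to comparing each business line’s list against the gold source. As a rough illustration only, here is a small Python sketch that lists the disagreements up front, before a work list goes out. The data shapes, the server names and the `find_owner_conflicts` helper are assumptions made for the example, not anything Chris described using.

```python
def find_owner_conflicts(gold, business_lists):
    """Return (server, gold_owner, claimed_owner) for every disagreement.

    `gold` maps server name -> owning business line in the gold source.
    `business_lists` maps business line -> set of servers it claims.
    """
    conflicts = []
    for line, servers in sorted(business_lists.items()):
        for server in sorted(servers):
            gold_owner = gold.get(server)  # None if missing from the gold source
            if gold_owner != line:
                conflicts.append((server, gold_owner, line))
    return conflicts


# Hypothetical data echoing the example in the interview.
gold = {"APP01": "Front Office Risk", "APP02": "Credit"}
business_lists = {
    "Credit": {"APP01", "APP02"},
    "Front Office Risk": set(),
}
conflicts = find_owner_conflicts(gold, business_lists)
```

A script like this does not settle who the data owner is. It only makes the disagreements visible in one place instead of in a string of reply-all emails.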

Any concerns?

  • Obviously there’s the job element. With headcount freezes, the recruitment market kind of dying and not knowing where we stand, it affects your job satisfaction and makes you question how much effort you should put in and, more to the point, if we’re going through organizational change, whether there is any point. Everything’s going to change, so what’s the point of upgrading firmware on servers if they’ll be replaced in the next quarter? But the challenge is that no one can tell us what is and is not being replaced; the answers keep changing.
  • Career progression, in the respect that I love Wintel support (particularly the web stuff), but I don’t know how much longer I want to be doing it. Where do I go from here? What’s the next-generation IT role to be moving into?
  • 24/7 support element – we’re doing on call more regularly, and what was deemed occasional ‘overnight’ support isn’t. It’s 24/7 availability, it’s being on call, sometimes getting called throughout the night to support those systems that are used globally. I just question at what point we change from hiding this to saying, look, it’s a 24/7 service we need to provide, and then either changing the team and getting someone in to do overnight, giving the ops team the basic rights to do the simple stuff, adding more automation, or asking our friends in other sites to do the support. Any reason New York can’t do that iisreset for me? But that opens up a series of issues in itself.
  • Application complexity – we’re doing two things: making the infrastructure more complex, and making the application more complex by requesting more functionality. We’re implementing a more complex operating environment: first we might have an environment running on VMware, Hyper-V or Xen, then we’ve got the operating system, then the layered applications and components to deliver that service, creating an infrastructure support overhead for complex support issues.
    • One application for a trader might actually comprise three applications and 21 servers, so when a call gets logged to our team saying the application is down, that is insufficient. We need to know what element, what component is down, and on what system, because it might take us hours to go through the logs and individual components to identify an issue. The days of doing a server reboot to fix a problem are nearly over. There are so many feeds, related components and links that this might need stopping, then that part over there, then a reboot, then restart this over there, that component on server 9 and, oh yes, an iisreset here. You just can’t automate that.
