25 February 2007

Easy call operations, who?

Nearly every large organization has a team that is on site 24 hours a day; its task is to monitor the infrastructure and applications. With the aid of monitoring tools such as OpenView and TNG, the team watches the networks, the telecoms, servers, applications and batches. When an alert is received, they log a helpdesk call and then call out the relevant application or infrastructure team.

They remain key to the application support flow. They walk a delicate line between calling out and not calling out an alert: calling it out in error might result in an emotional conversation with an engineer, while missing an alert might mean the batch is delayed, the reports are late and everyone is upset. Good, appropriate monitoring is therefore key, and implementing procedures, what-if scenarios or support workflows will aid this process.

The diagram at web.mac.com/martinmacleod/iWeb/Site/Blog/applicationflow.pdf shows the basic application support flow. Let’s keep it simple: the example is a web site alert at a large investment bank.

A user is trying to log into https://tradingshares.largebank.com and the site hangs. He gets upset, notes the "if there is a problem, call +44 123 45678910" message, and calls the number. The call goes to the web support team, who try the site, note that the system is down, and contact operations. Operations carry out their steps and call the application support/development team (sometimes they are one and the same). The support analyst logs on, carries out a check, discovers there is a problem with COM+, and requests that the web infrastructure support team be called. They log on and restart COM+. Operations then notify everyone (the web support team, so they can notify the end user, and the application team) and close the incident.
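The walkthrough above can be sketched as a small script to make the hand-offs explicit. This is only an illustration: the `Incident` class, the `handle_web_alert` function and the team labels are hypothetical names made up for this sketch, not part of any real helpdesk or monitoring tool.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """A helpdesk incident (hypothetical structure, for illustration only)."""
    summary: str
    log: list = field(default_factory=list)  # (team, action) tuples, in order
    closed: bool = False

    def record(self, team: str, action: str) -> None:
        """Append one step of the support flow to the incident log."""
        self.log.append((team, action))


def handle_web_alert(incident: Incident) -> Incident:
    """Replay the web site alert example, one hand-off per line."""
    incident.record("web support", "confirmed the site is down, contacted operations")
    incident.record("operations", "carried out their steps, called out application support")
    incident.record("application support", "found a COM+ problem, asked for web infrastructure support")
    incident.record("operations", "called out web infrastructure support")
    incident.record("web infrastructure support", "restarted COM+")
    incident.record("operations", "notified web support and the application team")
    incident.record("web support", "notified the end user")
    incident.record("operations", "closed the incident")
    incident.closed = True
    return incident


if __name__ == "__main__":
    result = handle_web_alert(Incident("tradingshares.largebank.com hangs at login"))
    for team, action in result.log:
        print(f"{team}: {action}")
```

Reading the log back, operations appear between every pair of technical teams, which is the ownership role the rest of this post is about.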

The key point is that operations, as well as monitoring the infrastructure, indirectly become service delivery managers: they take ownership of the incident, calling out the teams that they think can assist with it. Keeping operations involved in what’s going on in the infrastructure, what new applications are going live, who’s supporting them, and what teams are involved is therefore not only good service delivery, it’s good business. The operators might not be excellent technically, but their ability to communicate with everyone involved, handle an incident, and guide new joiners on when to escalate is invaluable to the engineer.

So next time you’re called at 3am about a server or application going down, keep in mind that some time you might need their assistance. Complaining to an operator because your application or infrastructure has an issue isn’t aiding the process; it’s poor communication, poor relationship building and ultimately unhelpful.

Yes, it’s 3am, but it’s the cost of doing business. Want to be in production support? Doing on-call is part of the package, just like the free company computer for logging in remotely.



