NBR

A failed oil pressure sensor on a backup generator could be the reason Air New Zealand’s computer system crashed on Sunday for six hours, IBM confirmed this morning.

The IT failure caused disruptions to more than 10,000 Air New Zealand customers, who were delayed by two hours on average. The computer failure affected the company’s booking and check-in systems. Passengers had to be manually checked in, which caused further delays.

Air New Zealand apologised to its customers yesterday saying the incident should never have happened.

We’re becoming increasingly dependent on our IT infrastructure, whether it’s for front office processing, back office reporting or simple things like internet access and email. I wonder whether talking more about outages, or ‘failures’ if we want to call them that, might help others avoid the same events. Much as we talk about ‘best practice’, could that not also include ‘this is what happened when something failed, and this is how we could have done it better’? What haven’t you thought about in that disaster recovery plan? Take the list of servers that tells you where the domain controllers are: will it still be available if your data center is offline?

In reference to the title, sometimes failure can help remind different groups of different things: that your platform is out of support, that your application needs some inbuilt redundancy and, for the technology people, that the processes need improving and that we need to ‘own our infrastructure’. In essence, regardless of how many redundant factors we put in place, it can be the simple things that derail success. You do have a restart plan in the event of power failure in the data center? You know it’s network, storage, then server, and within server it’s dc/sms/file and print, then application? There’s no point switching on servers only to find that nothing works because the domain is offline or the storage isn’t mounted.
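To make that restart order concrete, here is a minimal sketch of how it could be written down as data and checked, rather than kept in someone's head. The tier names, the dependency map and the `start_tier` callback are all hypothetical; in particular, treating dc, sms, and file and print as a strict sequence is my reading of the order above, not a rule for every site.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each tier lists the tiers that must be up first.
# The names and the strict chain are assumptions based on the order in the post.
RESTART_DEPENDENCIES = {
    "network": set(),
    "storage": {"network"},
    "domain_controllers": {"storage"},
    "sms": {"domain_controllers"},
    "file_and_print": {"sms"},
    "applications": {"file_and_print"},
}


def restart_order(dependencies):
    """Return a power-on order in which every tier follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def bring_up(dependencies, start_tier):
    """Start tiers in order and stop at the first one that fails.

    start_tier is a caller-supplied callback (hypothetical) that starts a tier
    and returns True once it is confirmed healthy.
    Returns (tiers_started, failed_tier), where failed_tier is None on success.
    """
    started = []
    for tier in restart_order(dependencies):
        if not start_tier(tier):
            return started, tier
        started.append(tier)
    return started, None


if __name__ == "__main__":
    print(" -> ".join(restart_order(RESTART_DEPENDENCIES)))
```

The point of stopping at the first failed tier is the one made above: there is little value in powering on application servers while the domain or the storage underneath them is still down. A plan like this is also only useful if a copy lives somewhere that survives the data center being offline.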
