TechCrunch

Wow. T-Mobile and Danger, the Microsoft-owned subsidiary that makes the Sidekick, has just announced that they’ve likely lost all user data that was being stored on Microsoft’s servers due to a server failure. That means that any contacts, photos, calendars, or to-do lists that haven’t been locally backed up are gone. Apparently if you don’t turn off your Sidekick and make sure its battery doesn’t run out you can salvage what’s currently stored on the device, otherwise you’re out of luck: Microsoft/Danger is describing the likelihood of recovering the data from their servers as “extremely low”.

T-Mobile Sidekick users have been suffering from a major outage all week, and that issue apparently hasn’t been resolved either.

This goes beyond FAIL, face-palm, or any of the other internet memes we’ve come to associate with incompetence. The fact that T-Mobile and/or Microsoft Danger don’t have a redundant backup is simply inexcusable, especially given the fact that the Sidekick is totally reliant on the cloud because it doesn’t store its data locally. Microsoft acquired Danger for $500 million in February 2008.

I asked my wife what she thought about this and got the response I expected: “What do you mean they don’t have a backup?” It’s an issue that crops up inside and outside the enterprise, whether it’s the fault of the project manager or of the guy who signed the ‘risk form’ (“no, we don’t need it backed up”). So many people simply expect data to be backed up; to them it’s ‘core’, so to speak. That said, here are a few issues to consider before you agree with my wife:

  • How much data was involved in the backup?
  • How long would the backup take?
  • How long does an actual restore take, and how easy is it to do?
  • Consider that from a service delivery standpoint, there is increasingly no concept of downtime
  • Internal process – we don’t back up development systems, and the system then gets ‘made production’ – who checks that the backups are in place when that happens?
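
To put rough numbers on the first three questions, here is a minimal sketch that estimates how long a full backup or restore would take from the data volume and a sustained transfer rate. The function name and every figure in it are hypothetical, chosen only for illustration; nothing here reflects the actual Sidekick environment.

```python
def transfer_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move data_gb gigabytes at a sustained rate in MB/s."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


# Hypothetical figures: 2 TB of user data, with restores assumed to run
# at half the speed of backups.
data_gb = 2000
backup_hours = transfer_hours(data_gb, throughput_mb_per_s=80)
restore_hours = transfer_hours(data_gb, throughput_mb_per_s=40)

print(f"Backup:  {backup_hours:.1f} hours")
print(f"Restore: {restore_hours:.1f} hours")
```

Even with these modest assumed numbers the restore runs well past half a day, which is the kind of window a service with no concept of downtime struggles to accept.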

My colleagues and I have run into this on many occasions. Consider that even for a few thousand users the amount of data could be very significant, and that users might not have been expected to accept the downtime needed to take a proper backup: for example, switching off all services for 48 hours, running the backup, and only then doing the work. At the same time, we have to understand that it can actually be easier to rebuild the system than to spend forever trying to recover the data: “I can rebuild your server in 3 hours, or we can spend the next 72 trying to restore it. What’s your preference?”
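
The earlier question of who checks that backups are in place once a system is ‘made production’ could be answered by a routine audit. The sketch below is one possible shape for it, assuming a simple inventory of systems with an environment label and a last-backup timestamp. The `System` record, the `unprotected` function and the 24-hour threshold are all hypothetical, not taken from any particular backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class System:
    name: str
    environment: str                 # "development" or "production"
    last_backup: Optional[datetime]  # None if never backed up


def unprotected(systems: List[System], now: datetime,
                max_age: timedelta = timedelta(hours=24)) -> List[str]:
    """Names of production systems with no backup, or one older than max_age."""
    return [
        s.name
        for s in systems
        if s.environment == "production"
        and (s.last_backup is None or now - s.last_backup > max_age)
    ]


# Hypothetical inventory: "crm" was promoted to production without a backup.
now = datetime(2009, 10, 12, 9, 0)
inventory = [
    System("mail", "production", now - timedelta(hours=6)),
    System("crm", "production", None),
    System("files", "production", now - timedelta(days=3)),
    System("sandbox", "development", None),
]

print(unprotected(inventory, now))
```

Development systems are deliberately ignored here, which mirrors the process described above; the point is that the check re-runs after a promotion, so a system that changes environment is picked up without anyone having to remember it.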

It’s easy to sit on the outside and ask questions or say what should have happened. Still, consider that with a large user base and ever-growing data volumes, it gets increasingly challenging to keep the backup infrastructure in line with your data. Everyone wants data, everyone wants service, but just watch the reaction when you say “and what about the backups?” People only appreciate the importance of backups when they’re needed. In many an enterprise or organization, backups are typically a cross-platform, cross-business-line service that has no budget and no investment until it stops working. There’s more technical coverage on this site.

With no downtime (the 24-hour business), even simple tasks increasingly carry more risk. Just look at what the IT guys go through when adding disk space to their cluster: if users were told the actual risks, they might not sign off on the risk or accept the change. For example, in allocating more storage to a Windows cluster, we could see the following risks:

  • Data loss or corruption
  • Cluster issues – some resources won’t start or don’t operate as expected
  • Cluster doesn’t work, or won’t operate in a cluster, due to issues relating to the resources/quorum
  • Server might not boot when accessing the storage – a known issue with specific fibre card drivers and Windows 2003 service packs.
