One of my colleagues emailed me about their new HP ProLiant DL980 G7 servers, which run Windows Server 2008 and host their databases: “We’ve been having a problem with one of them; it’s been reporting ASR alerts. Have you seen this before?”
Yes, I replied, and summarized in my email the steps I would take in this scenario. Some brief background follows. There is, of course, the official HP documentation (I found this article talking about it), but all opinions here are my own, and you should always speak with HP or your service provider if you are unsure about anything.
HP ASR is Automatic Server Recovery, which is configured in the system BIOS or configuration menu. It monitors the server hardware to identify system hangs and unexpected behaviour and, if there is a problem, reboots the server to return it to service. The ASR feature can be disabled if you would rather handle events like this manually. In the past, when I have seen an ASR event in the Integrated Management Log (IML), it has usually been related to a software or hardware problem: the ASR itself is a symptom of an underlying issue.
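ASR is commonly described as a watchdog timer: while the server is healthy, the HP health driver keeps resetting the timer, and if the operating system hangs and the timer runs out, the server is rebooted and the event is recorded in the IML. The following is a conceptual Python simulation of that behaviour, not HP's implementation; the class, the `simulate` function and the timeout values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WatchdogTimer:
    """Conceptual model of an ASR-style watchdog (hypothetical, not HP's code)."""
    timeout_s: int            # hypothetical timeout in seconds, for illustration only
    enabled: bool = True      # models ASR being enabled or disabled in the BIOS
    last_reset_s: int = 0

    def reset(self, now_s: int) -> None:
        # A healthy OS/driver "pets" the watchdog periodically.
        self.last_reset_s = now_s

    def expired(self, now_s: int) -> bool:
        # With ASR disabled the timer never triggers a recovery reboot.
        return self.enabled and now_s - self.last_reset_s > self.timeout_s


def simulate(heartbeats, timeout_s: int, end_s: int, enabled: bool = True):
    """Return the second at which a recovery reboot would fire, or None."""
    wd = WatchdogTimer(timeout_s=timeout_s, enabled=enabled)
    pending = sorted(heartbeats)
    for now in range(end_s + 1):
        # Apply every heartbeat that has arrived by this second.
        while pending and pending[0] <= now:
            wd.reset(pending.pop(0))
        if wd.expired(now):
            return now  # here the server would reboot and log an ASR event
    return None
```

For example, with heartbeats every 60 seconds that stop at 300 seconds (simulating a hang) and a 600-second timeout, `simulate(range(0, 301, 60), timeout_s=600, end_s=2000)` returns `901`. With `enabled=False` the same hang returns `None`, which mirrors why disabling ASR leaves a hung server in place for you to investigate manually.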
The first steps are as follows (for your reference, I’ve noted whether each one is intrusive or non-intrusive in terms of changing the system configuration):
- Shut the server down and run the full set of HP diagnostics (preferably the latest version) – non-intrusive
- Shut the server down and check/reseat the components, in particular the memory and PCI cards – intrusive
- Download the latest system, array controller and iLO/device firmware and apply it to rule out any known issues; this can be done with HP Smart Update Manager if you use it – can be intrusive (a reboot is required) but should not affect any applications or functionality
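Before and after applying updates, it helps to compare what is installed against the baseline you intend to apply. Below is a minimal Python sketch, assuming you have recorded the installed and target versions yourself (by hand or from your update tool's report) as dotted numeric strings; the function names, component names and version numbers are hypothetical. Some firmware, such as system ROMs, may use date-style versions instead, which this sketch does not handle.

```python
def parse_version(v: str) -> tuple:
    # Compare dotted versions numerically, so "1.26" < "1.50" and "2.9" < "2.10".
    return tuple(int(part) for part in v.split("."))


def outdated_components(installed: dict, baseline: dict) -> list:
    """Return (component, installed, baseline) for anything missing or older than baseline."""
    out = []
    for name, wanted in sorted(baseline.items()):
        have = installed.get(name)
        if have is None or parse_version(have) < parse_version(wanted):
            out.append((name, have, wanted))
    return out


# Hypothetical inventory, for illustration only.
installed = {"iLO": "1.26", "Array controller": "5.02", "NIC": "2.9"}
baseline = {"iLO": "1.50", "Array controller": "5.02", "NIC": "2.10", "Power supply": "1.0"}
```

With the sample data, `outdated_components(installed, baseline)` flags the NIC, power supply and iLO entries, leaving the array controller alone because it already matches the baseline.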
An ASR is not always an indication of a hardware fault; however, I always want to step back to basics and rule out elements until we can identify where the issue occurs: a physical check, diagnostics, up-to-date firmware and drivers, then possibly disabling ASR if necessary to see whether we witness an error, or cross-referencing the IML with the application and operating system logs.
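Cross-referencing the IML with the operating system and application logs largely comes down to lining up timestamps. A minimal sketch, assuming the entries have already been exported or transcribed into `(timestamp, source, message)` tuples and normalised to the same clock and time zone; the `correlate` function, the 15-minute window and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def correlate(asr_times, other_events, window_minutes=15):
    """For each ASR timestamp, list OS/application events in the preceding window."""
    window = timedelta(minutes=window_minutes)
    events = sorted(other_events)  # chronological order by timestamp
    return {
        asr: [(ts, source, msg) for ts, source, msg in events if asr - window <= ts <= asr]
        for asr in asr_times
    }


# Hypothetical data: one ASR from the IML and a few Windows/application log entries.
asr = datetime(2013, 1, 10, 3, 15)
other = [
    (datetime(2013, 1, 10, 3, 14), "Application", "database timeout"),
    (datetime(2013, 1, 10, 2, 0), "Application", "backup started"),
    (datetime(2013, 1, 10, 3, 5), "System", "disk warning"),
]
```

Here `correlate([asr], other)[asr]` returns the 03:05 and 03:14 entries in time order and ignores the 02:00 backup, which falls outside the window. Widening `window_minutes` trades precision for a better chance of catching a slow-building problem.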
With these steps completed, run a soak test: by that I mean re-run the application(s) and see what behaviour occurs. If you see a repeated ASR, you need to log a call with your hardware service provider or HP, and for that you’ll probably need:
- Server model and serial number
- Versions of firmware applied
- IML/diagnostics logs for analysis
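The soak-test decision and the information to gather beforehand can be sketched as a simple check. This is a hypothetical helper: the function name, dictionary keys and dates are illustrative, and it only restates the advice above rather than reflecting any HP tooling.

```python
from datetime import datetime


def soak_test_outcome(asr_times: list, soak_start: datetime, case_details: dict) -> str:
    """Decide whether to log a hardware call after a soak test (illustrative only)."""
    repeats = [t for t in asr_times if t >= soak_start]
    if not repeats:
        return "No repeat ASR during the soak test; keep monitoring."
    # Items from the checklist above: model, serial, firmware versions, logs.
    required = ["model", "serial_number", "firmware_versions", "logs"]
    missing = [key for key in required if not case_details.get(key)]
    if missing:
        return f"Repeat ASR seen ({len(repeats)}); gather before logging a call: {', '.join(missing)}"
    return f"Repeat ASR seen ({len(repeats)}); log a call with HP or your service provider."
```

An ASR dated before `soak_start` is treated as the original incident, so only recurrences during the soak test count towards logging a call.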