Time to rule out the infrastructure
I was having a chat with Chris, who has been having problems with one of the application servers where he works. The server failed overnight; the Windows engineer was called and rebooted the server, and the application team restarted the application. The next morning the application team suggested that, because the server had to be rebooted, Windows or the server itself must obviously be at fault. “How do I respond to this?” he asked.
First, let’s rule out the hardware:
- Upgrade to the latest disk/array/system and iLO firmware
- Upgrade the driver pack to the latest version you use in production
- Run a full set of diagnostics to see if any obvious errors are thrown - you might have to do this over the weekend/out of hours
Then we rule out the software:
- Get a system report: operating system version, service pack level, hotfixes and uptime
- Perform a health check (what I call it anyway): check the event logs, disk space, memory and processor utilization, and also check fragmentation - that can make a file server very slow
- Verify the antivirus software configuration: don’t scan the same file twice. Scan on either inbound or outbound, read or write, but not both - if your file system has many files this can make a big difference to performance
- Verify the configuration - could we adjust anything to improve performance/stability?
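Part of the system report and health check can be scripted so the same facts are collected the same way each time. The following is only a minimal sketch in Python using the standard library; the function names `system_report` and `disk_findings` and the 15% free-space threshold are my own assumptions for illustration, not part of any standard tool. It covers the operating system details and disk space only; event logs, memory, processor utilization and fragmentation would still need checking separately.

```python
import platform
import shutil


def system_report(path="."):
    """Collect a few basic facts about the host and one volume."""
    usage = shutil.disk_usage(path)  # total, used and free space in bytes
    return {
        "os": platform.system(),
        "release": platform.release(),
        "version": platform.version(),
        "disk_total_gb": round(usage.total / 1024**3, 1),
        "disk_free_pct": round(100 * usage.free / usage.total, 1),
    }


def disk_findings(report, min_free_pct=15.0):
    """Return plain-English findings; the threshold is an assumed example."""
    findings = []
    if report["disk_free_pct"] < min_free_pct:
        findings.append(
            f"Low disk space: {report['disk_free_pct']}% free "
            f"(threshold {min_free_pct}%)"
        )
    return findings


if __name__ == "__main__":
    # Check the system volume: C:\ on Windows, / elsewhere.
    root = "C:\\" if platform.system() == "Windows" else "/"
    report = system_report(root)
    for key, value in report.items():
        print(f"{key}: {value}")
    for finding in disk_findings(report):
        print("FINDING:", finding)
```

Saving the output from each run gives you something concrete to put in front of the application team, rather than an opinion.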
Thirdly we create an action plan:
- Analyze the results and be open with the findings - for example, “we need to upgrade drivers and log a call with Microsoft”
- Agree the next steps - what the IT team and the application team are doing - it might just be the application team checking their logs, and the server team checking the server, with networks checking the network port/route
In Chris’ situation it’s getting emotional. One thing we can always suggest when the other side says “we feel it’s the server” is the swing box. Fine: take a server of similar specification, base build it (install Windows and layered components only), and have the business line/application team test their application on it.


