Distributed Apps, SLAs and Downtime
So the scenario we have is: a Payroll distributed application with x number of components in it.

One of the servers, a Unix box running the Oracle Payroll app, goes bad and a process hogs 100% CPU for x number of hours.
A SCOM alert comes in regarding the CPU, but at this point it is not affecting the distributed app.
The DBA team wish to leave the process at 100% for a few hours to see if it's a job running.
2 hrs in, the Unix server heartbeat fails, affecting the distributed app.
The CPU issue is resolved; the server and the distributed app are OK now.

______________________________________________________

So the complaint is:

The DBAs say the payroll app did not go down.
The Windows guys say resolve the process issue.
The DBA guys say either way their app did not go down, and it should not be reflected in the SLA stats.
The Windows guys say maybe your app did go down but recovered.

Thoughts on this welcome.

Regards,
Robert

---------
You can view my blog at: http://msopsmgr.blogspot.com/
February 24th, 2011 10:33am

Hi Robert

The joys of how to logically break down infrastructure and application. Another issue that typically comes into play here is application teams and infrastructure teams wanting different thresholds.

There is no easy answer here. One option when building the app: you can create one distributed application for infrastructure and target the more generic components (e.g. computer), and another distributed application for the applications team that targets the actual, most detailed-level components (and probably uses some remote checks against database availability and response time in your case). You can then bring both distributed applications together.

I usually mix and match Savision Live Maps with Distributed Applications here to get more flexibility.

Cheers
Graham

View OpsMgr tips and tricks at http://systemcentersolutions.wordpress.com/
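
To illustrate why splitting the model into two distributed applications can settle the SLA argument, here is a minimal Python sketch. It is not how OpsMgr calculates service levels; the `availability` function, the one-day reporting window and the 30-minute heartbeat loss are all hypothetical values chosen for illustration, and it assumes the remote database checks kept passing during the incident.

```python
from datetime import datetime, timedelta

def availability(window_start, window_end, outages):
    """Return percent availability over the window for (start, end) outage intervals.

    Assumes the outage intervals do not overlap each other.
    """
    total = (window_end - window_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting window.
        s = max(start, window_start)
        e = min(end, window_end)
        if e > s:
            down += (e - s).total_seconds()
    return 100.0 * (total - down) / total

# Hypothetical one-day reporting window and a hypothetical 30-minute heartbeat loss.
day_start = datetime(2011, 2, 24, 0, 0)
day_end = day_start + timedelta(days=1)
heartbeat_loss = (datetime(2011, 2, 24, 8, 0), datetime(2011, 2, 24, 8, 30))

# Infrastructure view: the server heartbeat failure counts as downtime.
infra_outages = [heartbeat_loss]
# Application view: assumes the remote database checks kept passing, so no downtime.
app_outages = []

infra_pct = availability(day_start, day_end, infra_outages)
app_pct = availability(day_start, day_end, app_outages)
print(f"infrastructure: {infra_pct:.2f}%")
print(f"application:    {app_pct:.2f}%")
```

With separate views like this, the heartbeat failure shows up against the infrastructure figure while the application figure only drops if the application-level checks actually failed, so each team reports against the components it owns.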
February 24th, 2011 11:01am

This topic is archived. No further replies will be accepted.
