It would be nice to have a health view of the systems in our environment, so that we can click on a server or component and see its health. This would be hugely beneficial for us.
See comments below.
Daniele Muscetta commented
Also, regarding 'state', let me explain in a bit more detail what I mean by 'inferring it from the data'.
You could decide to use the worst severity in the Windows 'System' event log as an indicator of health (the worse of warning and error). In Windows, lower values are worse, except that 0 is 'success' and 4 is 'information', for odd backwards-compatibility reasons. So we use the MIN function:
Type=Event EventLog=System (EventLevel=1 OR EventLevel=2) | Measure Min(EventLevel) by Computer
Now you have to look at the grid and mentally map the '1' values to RED and the '2' values to YELLOW. Or you can add another filter to the query and only pick the most severe level (1), so you only get the 'critical' computers, implying that any computer that does not appear in the list must be in a 'better' state.
Some other data types have similar properties that let you 'rank' the worst known information about a given computer (grouping by the 'Computer' field). For example, in Malware Assessment there are special 'rank' fields precisely for this purpose; higher is worse in this case, so we use the MAX function:
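To make the 'inferred state' idea concrete outside the search UI, here is a minimal Python sketch of what the query above plus the mental RED/YELLOW mapping amount to. The sample records and the COLORS mapping are made up for illustration; this is not how the service itself evaluates the search.

```python
# Minimal sketch: infer a per-computer "state" from System event log records,
# mimicking: Type=Event EventLog=System (EventLevel=1 OR EventLevel=2)
#            | Measure Min(EventLevel) by Computer
# The records below are hypothetical sample data, not real search output.

events = [
    {"Computer": "SRV01", "EventLog": "System", "EventLevel": 2},
    {"Computer": "SRV01", "EventLog": "System", "EventLevel": 1},
    {"Computer": "SRV02", "EventLog": "System", "EventLevel": 2},
    {"Computer": "SRV03", "EventLog": "System", "EventLevel": 4},
    {"Computer": "SRV04", "EventLog": "Application", "EventLevel": 1},
]

# Hypothetical color mapping: 1 (error) -> RED, 2 (warning) -> YELLOW.
COLORS = {1: "RED", 2: "YELLOW"}


def worst_level_by_computer(records):
    """Return {Computer: Min(EventLevel)} over System-log error/warning events."""
    worst = {}
    for r in records:
        # Filter step: Type=Event EventLog=System (EventLevel=1 OR EventLevel=2)
        if r["EventLog"] != "System" or r["EventLevel"] not in (1, 2):
            continue
        # Aggregation step: Measure Min(EventLevel) by Computer (lower is worse)
        c = r["Computer"]
        worst[c] = min(worst.get(c, r["EventLevel"]), r["EventLevel"])
    return worst


def state_by_computer(records):
    """Map each computer's worst level to a color; absent computers are implicitly 'better'."""
    return {c: COLORS[lvl] for c, lvl in worst_level_by_computer(records).items()}


print(state_by_computer(events))  # {'SRV01': 'RED', 'SRV02': 'YELLOW'}
```

Nothing is stored here: the 'state' is recomputed from the records every time the function runs, which is the same point being made about the search.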
Type=ProtectionStatus | measure max(ThreatStatusRank) as WorstRank by Computer
Type=ProtectionStatus | measure max(ProtectionStatusRank) as WorstRank by Computer
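The same pattern with MAX instead of MIN can be sketched like this. The rank values below are invented placeholders; the only assumption taken from the text is that a higher rank means a worse condition.

```python
# Minimal sketch of:
#   Type=ProtectionStatus | measure max(<RankField>) as WorstRank by Computer
# Sample records are hypothetical; rank numbers are arbitrary placeholders.

records = [
    {"Type": "ProtectionStatus", "Computer": "SRV01", "ThreatStatusRank": 10, "ProtectionStatusRank": 20},
    {"Type": "ProtectionStatus", "Computer": "SRV01", "ThreatStatusRank": 30, "ProtectionStatusRank": 20},
    {"Type": "ProtectionStatus", "Computer": "SRV02", "ThreatStatusRank": 20, "ProtectionStatusRank": 50},
    {"Type": "Event", "Computer": "SRV03", "EventLevel": 1},  # different type, filtered out
]


def worst_rank_by_computer(records, rank_field):
    """Return {Computer: WorstRank}, where WorstRank = max(rank_field); higher is worse."""
    worst = {}
    for r in records:
        if r["Type"] != "ProtectionStatus":  # Type=ProtectionStatus
            continue
        c = r["Computer"]
        worst[c] = max(worst.get(c, r[rank_field]), r[rank_field])
    return worst


threat = worst_rank_by_computer(records, "ThreatStatusRank")
protection = worst_rank_by_computer(records, "ProtectionStatusRank")

# List the computers worst-first, like sorting the result grid by WorstRank.
for computer, rank in sorted(threat.items(), key=lambda kv: kv[1], reverse=True):
    print(computer, rank)
```

Only the aggregation function and its direction change between the two examples; the grouping by Computer stays the same.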
See? You can basically 'derive' something like a 'state' by applying statistical functions to the data!
I hope this makes sense and clarifies what I meant by 'not persisting state'.
You can save those searches and pin them to your dashboard and soon see them on your phone!
Also, if you mean 'health' from a 'proactive' angle (rather than reactive monitoring), check out how information is presented in SQL Assessment: 'by Focus Area' and in priority order. You can easily get a different pivot in search, for example by Computer.
We have other example searches in this blog post http://blogs.msdn.com/b/dmuscett/archive/2014/10/19/advisor-searches-collection.aspx and we keep adding to it. Let us know if you need help with a specific query scenario.
If all or any of this helps, please let us know.
If it doesn't, it would be great if you could elaborate on the scenario a bit more.
Daniele Muscetta commented
State in Operations Manager is persisted and updated continuously (with a LOT of database activity and a performance hit), based on a number of 'monitors' present in management packs. In order to know 'state', you need criteria that determine what 'state' even means (when is it 'green'? when is it 'red'?).
Do you intend to SYNCHRONIZE what is in SCOM to the Cloud (i.e., as is now done for the Alert Management IP)?
That might be doable, but it would still only be a 'copy' of what's in SCOM, for consultation purposes. That's why I was asking, but I didn't understand the answer.
In the current thinking and with the type of backend we use, we don't really intend to PERSIST any *state* information in the cloud. We don't even have *objects*. It's not like SCOM. This is all entirely based on DATA.
We have 'types' of data, but they are not really object types; they are just a field name, as described here: http://blogs.msdn.com/b/dmuscett/archive/2014/10/19/advisor-search-first-steps-how-to-filter-data-part-i.aspx
We would rather be able to INFER STATE by looking at the data and the KPIs that matter to you.
I have described some of this, including a conversion between SCOM alerting rules and the equivalent search syntax, in this blog post http://blogs.msdn.com/b/dmuscett/archive/2014/11/05/iis-mp-event-alerting-rules-s-opinsights-searches-equivalents.aspx
The simplest example I can give of this is to look at when a machine has last reported some data - if the most recent piece of data is OLDER than 4 hours, I want to see the computer name in the results
* | measure Max(TimeGenerated) as LastData by Computer | Where LastData < NOW-4HOURS
And if you have results... well, that IS showing you machines in a 'bad state' (i.e., not sending data frequently enough).
And you can pin that to a tile in the dashboard and make it change color if there are more than ZERO results.
There's your 'state' but we have not WRITTEN it anywhere.
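Here is a rough Python equivalent of the 'last reported data' query and the tile rule, with a fixed 'now' so the result is predictable. The timestamps, computer names and the GREEN/RED tile colors are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical records: only the fields the query uses.
records = [
    {"Computer": "SRV01", "TimeGenerated": datetime(2014, 11, 10, 6, 0, tzinfo=timezone.utc)},
    {"Computer": "SRV01", "TimeGenerated": datetime(2014, 11, 10, 11, 30, tzinfo=timezone.utc)},
    {"Computer": "SRV02", "TimeGenerated": datetime(2014, 11, 10, 7, 0, tzinfo=timezone.utc)},
    {"Computer": "SRV03", "TimeGenerated": datetime(2014, 11, 10, 8, 0, tzinfo=timezone.utc)},
]


def stale_computers(records, now, threshold=timedelta(hours=4)):
    """* | measure Max(TimeGenerated) as LastData by Computer | Where LastData < NOW-4HOURS"""
    last_data = {}
    for r in records:
        c, t = r["Computer"], r["TimeGenerated"]
        if c not in last_data or t > last_data[c]:
            last_data[c] = t  # Max(TimeGenerated) per Computer
    cutoff = now - threshold
    return {c: t for c, t in last_data.items() if t < cutoff}


def tile_color(results):
    """Hypothetical tile rule: RED as soon as there is at least one result."""
    return "RED" if len(results) > 0 else "GREEN"


now = datetime(2014, 11, 10, 12, 0, tzinfo=timezone.utc)
stale = stale_computers(records, now)
print(sorted(stale), tile_color(stale))  # ['SRV02'] RED
```

SRV03 sits exactly on the 4-hour boundary and is not reported, because the comparison is strict (<) as in the query. The 'state' is never written anywhere; it is recalculated on each call.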
Or you can just look for a set of 'bad' events or conditions that you know should not happen. As soon as you see a result, that is your 'bad state'.
You just have to PIN the query that shows the 'state' (or rather, the criteria to get to that state) that you are interested in. You are essentially calculating it every time, but with this type of architecture it is actually much faster to do it this way.
In the future those searches could run in real time and produce alerts: http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519198-long-running-saved-searches-or-scheduled-that-ca
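A 'known bad conditions' check can be sketched the same way: any match puts the computer in a bad state, and computers with no match are assumed fine. The (EventLog, EventID) pairs below are placeholders, not real event IDs with any particular meaning.

```python
# Hypothetical list of conditions that "should never happen", as (EventLog, EventID).
BAD_CONDITIONS = {("System", 9001), ("Application", 9002)}

# Hypothetical sample records.
records = [
    {"Computer": "SRV01", "EventLog": "System", "EventID": 9001},
    {"Computer": "SRV01", "EventLog": "Application", "EventID": 100},
    {"Computer": "SRV02", "EventLog": "System", "EventID": 100},
    {"Computer": "SRV03", "EventLog": "Application", "EventID": 9002},
    {"Computer": "SRV03", "EventLog": "Application", "EventID": 9002},
]


def computers_in_bad_state(records, bad=BAD_CONDITIONS):
    """Return a sorted list of computers with at least one matching 'bad' event."""
    return sorted({r["Computer"] for r in records if (r["EventLog"], r["EventID"]) in bad})


print(computers_in_bad_state(records))  # ['SRV01', 'SRV03']
```

Repeated matches for the same computer collapse into a single entry, so the result reads as a list of computers in a 'bad state' rather than a list of events.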
Let us know if this clarifies the current thinking at least a little. We understand this is a shift from previous/traditional/stateful monitoring in Operations Manager, and it is very deliberate.
Tim Carpita commented
Monitored system health state, such as server groups and servers themselves.