Diagnostics and Monitoring

  1. When there is a scaling alert, it should tell me which of my rules caused the scale up or down.

    When I get a scale-up or scale-down email alert for my site, I have no idea which of my rules caused the action. It could have been CPU, memory, or any other scale-up metric I am monitoring. It would be really nice to know which one it was.

    12 votes
    under review  ·  0 comments
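
    To make the request concrete, here is a minimal sketch of an evaluator that returns the rule that fired together with the observed value, so an alert email could name it. The rule model (`ScaleRule`, `evaluate`, `format_alert`) is hypothetical and is not the Azure autoscale API; "first matching rule wins" is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical rule model for illustration only; this is not the Azure autoscale API.
@dataclass
class ScaleRule:
    name: str         # human-readable label shown in the alert
    metric: str       # key into the metric sample, e.g. "cpu_percent"
    operator: str     # ">" or "<"
    threshold: float
    action: str       # "scale_up" or "scale_down"


def evaluate(rules: list[ScaleRule], sample: dict[str, float]) -> Optional[tuple[ScaleRule, float]]:
    """Return the first rule that fires, with the observed value, or None if no rule fires."""
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # metric not reported in this sample
        fired = value > rule.threshold if rule.operator == ">" else value < rule.threshold
        if fired:
            return rule, value
    return None


def format_alert(rule: ScaleRule, value: float) -> str:
    """Build alert text that names the rule responsible for the scaling action."""
    return (f"{rule.action} triggered by rule '{rule.name}' "
            f"({rule.metric} = {value} {rule.operator} {rule.threshold})")


rules = [
    ScaleRule("High CPU", "cpu_percent", ">", 80.0, "scale_up"),
    ScaleRule("High memory", "memory_percent", ">", 90.0, "scale_up"),
    ScaleRule("Low CPU", "cpu_percent", "<", 20.0, "scale_down"),
]
hit = evaluate(rules, {"cpu_percent": 45.0, "memory_percent": 93.5})
if hit is not None:
    print(format_alert(*hit))
```

    Here CPU is within bounds, so the alert would name the "High memory" rule rather than just reporting that a scale-up happened.
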
  2. Allow alerting on searched traces

    The alerting feature only allows alerting on predefined metrics. I want to be able to create an alert on any search I make in the data. For instance, I might log an error and need an alert whenever that happens.

    6 votes
    0 comments
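
    A minimal sketch of what an alert on a saved search could look like: fire when at least one trace matching a pattern was logged within a time window. `Trace` and `search_alert_fires` are hypothetical names, and a regular expression stands in for whatever query language the search actually uses.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trace:
    timestamp: datetime
    message: str


def search_alert_fires(traces: list[Trace], pattern: str, window: timedelta,
                       now: datetime, min_count: int = 1) -> bool:
    """Hypothetical saved-search alert: fire when at least `min_count` traces
    whose message matches `pattern` were logged within `window` before `now`."""
    regex = re.compile(pattern)
    matches = [
        t for t in traces
        if now - window <= t.timestamp <= now and regex.search(t.message)
    ]
    return len(matches) >= min_count


now = datetime(2015, 1, 1, 12, 0)
traces = [
    Trace(now - timedelta(minutes=2), "ERROR payment gateway timeout"),
    Trace(now - timedelta(minutes=1), "INFO request completed"),
]
print(search_alert_fires(traces, r"^ERROR", timedelta(minutes=5), now))  # True
```
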
  3. Fix alert rules

    Alert rules that use a time metric like CPU time or average response time are incorrect.

    Setting a threshold of 1.5 seconds updates the metric graph correctly, showing a dotted line at the 1.5-second mark; however, the test is actually set to a threshold of 1.5 milliseconds. You can confirm this by viewing the alert in the old portal, and by the fact that the alert is still considered active even though the dotted line stays above the blue line in the graph the whole time.

    Please fix it.

    Also, if you can make it more clear where…

    11 votes
    under review  ·  3 comments
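
    One way the reported symptom could arise (an assumption, not a confirmed diagnosis) is a threshold entered in seconds being stored without conversion while the metric is compared in milliseconds. The sketch below contrasts that with normalizing the unit first; `threshold_in_ms` and `alert_active` are hypothetical helpers.

```python
MS_PER_UNIT = {"ms": 1.0, "s": 1000.0}


def threshold_in_ms(value: float, unit: str) -> float:
    """Normalize a user-entered time threshold to milliseconds before storing it."""
    if unit not in MS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * MS_PER_UNIT[unit]


def alert_active(observed_ms: float, threshold_ms: float) -> bool:
    """A 'greater than' rule is active while the observed value exceeds the threshold."""
    return observed_ms > threshold_ms


observed_ms = 300.0                       # average response time, well under 1.5 s
buggy = alert_active(observed_ms, 1.5)    # unit dropped: 1.5 s treated as 1.5 ms
fixed = alert_active(observed_ms, threshold_in_ms(1.5, "s"))  # compares against 1500 ms
print(buggy, fixed)  # True False
```

    The buggy comparison reproduces what the post describes: the graph line sits at 1.5 s, above the data, yet the alert stays active.
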
  4. Fix "no available metric" in the Edit Chart item

    Screenshot:

    http://i.imgur.com/wmTKrRT.png

    No available metric? That's a bit hard to believe. Seems more likely that the service that provides the list of metrics failed.

    2 votes
    under review  ·  0 comments
  5. Support charts with data for many resources in the application

    We have different metrics for cloud services (CPU, memory, etc.), but they are displayed only per cloud service.

    I have an application that consists of many cloud services: frontend, backend, cache, translation, GIS, etc.

    I want to see some merged charts. For example, I want to see the CPU consumption telemetry of all my services on one chart to check whether they are all handling the load. I don't want to go to each service's panel to see its CPU load.

    Thank you.

    3 votes
    under review  ·  0 comments
  6. Add the ability to monitor total RAM usage on a VM

    We have a graph that monitors CPU usage, network traffic, and disk read/write, but it would be very nice to have a graph showing RAM usage on a VM over a period of time (much like the CPU graph), especially when deciding whether to switch between, say, an A2 and an A5.

    27 votes
    under review  ·  1 comment
  7. Prevent the immediate deallocation of scaled virtual machines when starting them manually

    I have a series of Virtual Machines in a cloud instance which switch on or off based on CPU usage.

    Sometimes I want to manually start all virtual machines to apply some configuration changes.

    Unfortunately, autoscaling immediately switches the machines back off. I have tried setting the "Scale down wait time" to 60 minutes, but this setting is ignored if you start the machines manually.

    The only workaround at the moment:

    1) Turn off scaling.
    2) Start all virtual machines and apply configurations
    3) Re-configure scaling settings.

    I suggest you should honor the "Scale…

    1 vote
    0 comments

    So to be clear, you’re suggesting that the cooldown period should be based on any scaling action, not just an autoscaling action? This seems like a reasonable change.
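
    The behavior discussed above (the cool-down restarts after any scaling action, not only an autoscale action) can be sketched as follows. `ScaleDownCooldown` is a hypothetical class for illustration, not how the Azure autoscale engine is implemented.

```python
from datetime import datetime, timedelta
from typing import Optional


class ScaleDownCooldown:
    """Hypothetical cool-down tracker in which any scaling action, manual or
    automatic, restarts the scale-down wait time."""

    def __init__(self, wait: timedelta):
        self.wait = wait
        self.last_action: Optional[datetime] = None

    def record_action(self, when: datetime, source: str) -> None:
        # `source` ("manual" or "autoscale") is accepted but deliberately not
        # used: both kinds of action restart the cool-down equally.
        self.last_action = when

    def may_scale_down(self, now: datetime) -> bool:
        if self.last_action is None:
            return True
        return now - self.last_action >= self.wait


t0 = datetime(2015, 1, 1, 9, 0)
cooldown = ScaleDownCooldown(wait=timedelta(minutes=60))
cooldown.record_action(t0, source="manual")  # admin starts all VMs by hand
print(cooldown.may_scale_down(t0 + timedelta(minutes=10)))  # False
print(cooldown.may_scale_down(t0 + timedelta(minutes=60)))  # True
```

    With this rule, the manually started machines stay up for the full 60-minute "Scale down wait time", which removes the need to turn scaling off and on again.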

  8. Improve the Auto Scaling based on queue feature

    1. Currently we can only specify a queue, and autoscaling is based on the total queue size regardless of the message type. For instance, we have dead-letter messages and scheduled messages, but only the active messages need to be considered for autoscaling. This impacts the cost significantly.

    2. The autoscale scale-up action only supports fixed numbers. Can we support relative numbers? For instance, calculate the number of instances to add based on the number of messages in the queue.

    2 votes
    0 comments
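
    Both requests can be illustrated with a small sketch: compute a target instance count from the active message count only, at a configurable number of messages per instance, clamped to a minimum and maximum. `target_instance_count` and the per-type count dictionary are hypothetical; they do not describe an existing autoscale setting.

```python
import math


def target_instance_count(active_messages: int, messages_per_instance: int,
                          min_instances: int, max_instances: int) -> int:
    """Hypothetical relative scaling: one instance per `messages_per_instance`
    active messages, clamped to [min_instances, max_instances]."""
    if messages_per_instance <= 0:
        raise ValueError("messages_per_instance must be positive")
    wanted = math.ceil(active_messages / messages_per_instance)
    return max(min_instances, min(max_instances, wanted))


# Assumed per-type counts for one queue; only the active messages drive scaling,
# so dead-lettered and scheduled messages do not add instances.
counts = {"active": 450, "dead_letter": 1200, "scheduled": 300}
print(target_instance_count(counts["active"], messages_per_instance=100,
                            min_instances=1, max_instances=10))  # 5
```

    Scaling on the total of 1,950 messages instead would ask for 20 instances and hit the cap of 10, which is the cost problem described in point 1.
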
  9. Azure admin login notifications by email or SMS

    I have been plagued by unbelievable security issues with my ISP, MediaTemple, for more than 2 years now. After contacting more than 10 ISPs over a 1-year period, it seems that there is no functionality in the marketplace that allows the primary admin of an ISP cloud/web hosting account (in this case, the admin user of an Azure account) to be notified by SMS or email when someone logs into the admin account to view or modify settings.

    An example: someone sniffed my packets and passwords, logged into my admin account, changed DNS and zone file records, and put in an MX redirect so that all…

    7 votes
    0 comments
  10. Show memory and network metrics as percentages

    Since each VM / cloud service has 3 major metrics (CPU, memory, and network), diagnostics should collect these metrics by default (logging level: minimal).

    Because there are a lot of instance types, absolute values of the network and memory counters are not informative, and percentages should be used.
    It is more informative to know that an instance is using 90% of its memory than to see a value of 600 MB. The same is true for network channel load. For example, my service is network intensive, and right now it is really hard to tell when the network channel load reaches its limit and the service…

    63 votes
    2 comments

    This is very good feedback. Today we have separate datasources for what your quotas are (e.g. network, memory), and the metrics that we emit. Ideally, we could bring those two points together to give you a percentage of those metrics.

    Also, once these are exposed, they will automatically be available for autoscale. Already today, you can use any exposed metric for scaling.
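
    The percentage described above is simply the emitted metric divided by the corresponding quota. A minimal sketch, where `usage_percent` is a hypothetical helper and the capacities are illustrative placeholders rather than published Azure VM sizes:

```python
def usage_percent(used: float, capacity: float) -> float:
    """Express an absolute counter (MB of memory, Mbps of network, ...) as a
    percentage of the instance's capacity for that resource."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


# The same 600 MB reading means very different things on different instance sizes.
# Capacities here are illustrative placeholders, not published Azure VM sizes.
print(usage_percent(600, 1000))  # 60.0
print(usage_percent(600, 4000))  # 15.0
```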

  11. Adding reports, like SLA/Uptime reports for Virtual Machines, Availability Sets, and Traffic Managers

    Clients like to see reports that show that SLAs are being met or the Uptime of a Virtual Machine, Availability Set and/or Traffic Manager. Could Azure provide reports that could be generated from the data they are already collecting and presenting on the graphs?

    Thanks,
    Scott Weigand

    299 votes
    18 comments
  12. Enable a "before or after" event to be defined in auto-scaling of Cloud Services

    We domain-join the virtual machines associated with our Cloud Service roles. Of course, joining the domain forces a reboot, and when scaling up, this is fine. However, when scaling down, we want to automatically remove the machine's AD account from the domain. I don't see a way to execute a "before or after" event in Azure autoscaling. The current prescription is to override the role's OnStop event. However, there is no way to tell whether it is a simple reboot or an actual deallocation.

    We should be able to explicitly define & execute a separate set of code…

    1 vote
    1 comment

    Is this code that you’d want to define inside your role, or, code that you’d want to define in the autoscale setting?

  13. Auto-Scale explanation of Cool-Down period

    We need a better explanation of autoscaling rules. Even after reading the MSDN documentation, their behavior is unclear.

    Does a cool-down period for scale-down interfere with a scale-up operation? For instance, we have a polling period of 5 minutes for scale-up and a 120-minute cool-down for scale-down. If no further scale-up operation can take place during the 120-minute scale-down cool-down period, this would be awful. Sadly, no information about this is available.

    Or what happens if a CPU metric rule says to scale down, but a scale-up memory metric rule is still firing?

    Lastly, we need information on whether rule sets (more than one) are…

    37 votes
    6 comments
  14. The (auto)scaling graph has a really weird time base. The x-axis needs to autoscale!

    Like the title says, when starting with autoscaling, the graph looks awful and overwrites itself if there are several scaling actions. The x-axis needs to autoscale and be configurable (time range). Furthermore, the legend needs to be explained: how does CPU percentage fit onto the instance-count y-axis?

    3 votes
    0 comments
  15. Performance Query Widgets on Dashboard get reset on new load

    The query settings for the dashboard widgets are partly reset when the dashboard is reopened. Some have extra plots added, others have fewer, and still others have a different time range set than before. Which setting is reset is always the same for a given widget.

    1 vote
    0 comments
  16. Audit and IT fundamental services

    Provide unalterable audit logs for services like ACS.

    Provide management stats and backup/recovery mechanisms and services for all provided services.

    12 votes
    0 comments

    Thanks for the feedback. We’ll look into this. Each service has its own auditing capabilities, but we can consider standardization, if that would be helpful for you. Let us know if you have any specifics you’re looking for or ways this could help you.

  17. Add an Alert Rule so that notification can occur as soon as a web service Web Services DIAGNOSTICS CONNECTION STRINGS response fails

    Add an Alert Rule so that notification can occur as soon as a web service Web Services DIAGNOSTICS CONNECTION STRINGS response fails. Currently, it only alerts on uptime and response times, which are averaged over a minimum of 15 minutes.

    1 vote
    under review  ·  0 comments
  18. Scale page > instances graph is incorrect after using "Specific Dates" feature

    As a test, I used the "Specific Dates" feature yesterday to scale up the # of instances for a web role from 1 -> 10 for just 2 hours. The role scaled back down to 1 instance after that interval. However, today the graph shows the historical # of instances pegged at 10 for the previous 5 days, which is incorrect. See attached image.

    0 votes
    0 comments
  19. Support auto-scaling cloud services by service bus topic / subscription

    At the moment, you can only scale by queue size, not by topic/subscription size. Given that subscriptions can have filters, it would be ideal to be able to scale by subscription size and not just general topic size.

    110 votes
    3 comments

    This is something that we are looking at. In the meantime, a workaround is to forward the items you want to scale by to a queue and scale by that queue.

  20. Metrics Monitor missing (for Websites etc.)

    It was very easy to see the outgoing traffic (and other metrics) in the old portal (e.g., go to a website and then go to Monitor).
    In the new portal, it seems that only HTTP requests are shown. Is this a limitation of the preview? I would like to see other metrics as well.

    4 votes
    1 comment