Diagnostics and Monitoring

How can we improve Azure Diagnostics and Monitoring?


  1. Alerts based on Queue Size

    I would like to be able to set up an alert and monitor a Cloud Service based on queue size. For example, if a queue holds more than 10,000 items for 15 minutes, send an alert.

    441 votes
    under review  ·  15 comments
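
The rule requested in idea 1 (more than 10,000 items for 15 minutes) can be illustrated with a minimal sketch. This is not an Azure API: `should_alert`, its parameters, and the `(timestamp, queue_length)` sample format are hypothetical names chosen for illustration, and the sketch assumes the queue length is sampled periodically by some other means.

```python
from datetime import datetime, timedelta


def should_alert(samples, threshold=10_000, window=timedelta(minutes=15)):
    """Return True if the queue stayed above `threshold` for the whole trailing window.

    `samples` is a list of (timestamp, queue_length) tuples (hypothetical format).
    """
    if not samples:
        return False
    ordered = sorted(samples)
    latest = ordered[-1][0]
    window_start = latest - window
    # Require data reaching back to the start of the window; otherwise we
    # cannot claim the condition held for the full 15 minutes.
    if ordered[0][0] > window_start:
        return False
    recent = [size for ts, size in ordered if ts >= window_start]
    return all(size > threshold for size in recent)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    steady = [(t0 + timedelta(minutes=5 * i), 12_000) for i in range(5)]
    print(should_alert(steady))
```

A single sample at or below the threshold inside the window suppresses the alert, which is one reasonable reading of "for 15 minutes"; an averaging rule would behave differently.
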
  2. Adding reports, like SLA/Uptime reports for Virtual Machines, Availability Sets, and Traffic Managers

    Clients like to see reports that show that SLAs are being met or the Uptime of a Virtual Machine, Availability Set and/or Traffic Manager. Could Azure provide reports that could be generated from the data they are already collecting and presenting on the graphs?

    Thanks,
    Scott Weigand

    234 votes
    17 comments
  3. Support auto-scaling cloud services by service bus topic / subscription

    At the moment you can only scale by queue size, not by topic/subscription size. Given that subscriptions can have filters, it would be ideal to be able to scale by subscription size and not just general topic size.

    110 votes
    3 comments

    This is something that we are looking at. In the meantime, there is a workaround: forward the items you want to scale by to a queue, and scale by that queue.

  4. Show memory and network metrics as percentages

    Since each VM / Cloud Service has 3 major metrics (CPU, Memory, Network), diagnostics should collect these metrics by default (logging level minimal).

    Because there are a lot of instance types, absolute values of the Network and Memory counters are not informative, and percentages should be used.
    It is more informative to know that an instance is using 90% of its memory than to have a value of 600 MB. The same goes for the network channel load. For example, my service is network intensive, and right now it is really hard to understand when the network channel load comes to its limit and service…

    62 votes
    2 comments

    This is very good feedback. Today we have separate data sources for what your quotas are (e.g. network, memory) and for the metrics that we emit. Ideally, we could bring those two together to give you a percentage for those metrics.

    Also, once these are exposed, they will automatically be available for autoscale — already today you can use any exposed metric for scaling.
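
The point of idea 4, that the same absolute value means different things on different instance types, comes down to dividing the emitted metric by the quota. A minimal sketch follows; `percent_used` is a hypothetical helper, and the 1024 MB and 4096 MB limits are made-up numbers for illustration, not actual instance sizes.

```python
def percent_used(used, limit):
    """Express an absolute metric value as a percentage of its quota."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used / limit


if __name__ == "__main__":
    # The same 600 MB reading against two hypothetical memory limits.
    for limit_mb in (1024, 4096):
        print(limit_mb, percent_used(600, limit_mb))
```
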

  5. Have the diagnostics engine push remaining data to storage on shutdown

    I'd like to see the Diagnostics agent be aware of a graceful shutdown scenario (the instance count being lowered, for example). If it has been pushing data to storage on a schedule, it should attempt to push any data collected since the last scheduled transfer before the system is fully shut down.

    If you have a schedule set up to move data every 5 or 10 minutes (or really any time schedule that isn't in the seconds), you could lose a considerable amount of data if the role is shut down between scheduled pushes. It would be nice if an attempt is actually made…

    47 votes
    0 comments

    Thank you for the suggestion. This seems like a reasonable way to make sure that you collect all of the diagnostic data from your virtual machines. I have passed this idea on to the WAD team.
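
The behavior requested in idea 5 can be sketched as a buffer that transfers on a schedule and also makes one final transfer at shutdown. This is only an illustration of the idea, not how the Diagnostics agent is implemented: `BufferedTransfer`, `upload`, and the explicit `now` arguments (seconds, passed in by the caller to keep the example deterministic) are all hypothetical.

```python
class BufferedTransfer:
    """Buffers records and hands them to `upload` on a schedule or at shutdown."""

    def __init__(self, upload, interval_seconds=300.0):
        self._upload = upload            # callable that receives a list of records
        self._interval = interval_seconds
        self._buffer = []
        self._last_transfer = 0.0

    def record(self, item, now):
        self._buffer.append(item)
        # Scheduled transfer: only when the interval has elapsed.
        if now - self._last_transfer >= self._interval:
            self.flush(now)

    def flush(self, now):
        if self._buffer:
            self._upload(list(self._buffer))
            self._buffer.clear()
        self._last_transfer = now

    def shutdown(self, now):
        # Final, out-of-schedule transfer so data since the last push is not lost.
        self.flush(now)


if __name__ == "__main__":
    uploaded = []
    transfer = BufferedTransfer(uploaded.append, interval_seconds=300.0)
    transfer.record("a", now=10.0)
    transfer.record("b", now=20.0)
    transfer.shutdown(now=30.0)
    print(uploaded)
```

Without the `shutdown` call, the two records above would stay in the buffer until the next scheduled transfer, which is the data-loss window the idea describes.
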

  6. Auto-Scale explanation of Cool-Down period

    We need better explanation of auto-scaling rules. Even after reading MSDN documentation, it is unclear.

    Does a Cool-Down period for scale-down interfere with a scale-up operation? For instance, we have a polling period of 5 minutes for scale-up and a 120-minute Cool-Down for scale-down. If, however, no further scale-up operation can take place during the 120-minute Cool-Down period for scale-down, this would be awful. Sadly, no information is available.

    Or what happens if a CPU-Metric-Rule says scale down, but a scale-up Memory-Metric is still firing?

    Lastly, we need information as to whether rule-sets (more than one) are…

    37 votes
    6 comments
  7. Add the ability to monitor total RAM usage on a VM

    We have a graph that monitors CPU usage, network traffic, and disk read/write, but it would be very nice to have a graph that shows RAM usage on a VM over a period of time (much like the CPU graph). This would be especially useful when deciding whether to switch between, say, an A2 and an A5.

    27 votes
    under review  ·  1 comment
  8. When there is a scaling alert it should tell me which one of my rules caused the scaling up or down.

    When I get an email alert about a scale up or down for my site, I have no idea which one of my rules caused the action. It could have been CPU, Memory, or any one of the scale-up metrics I am monitoring. It would be really nice to know which one it is.

    12 votes
    under review  ·  0 comments
  9. Audit and IT fundamental services

    Provide unalterable audit logs for services like ACS.

    Provide management stats and backup/recovery mechanisms and services for all provided services.

    12 votes
    0 comments

    Thanks for the feedback. We’ll look into this. Each service has its own auditing capabilities, but we can consider standardization, if that would be helpful for you. Let us know if you have any specifics you’re looking for or ways this could help you.

  10. Make alerts more actionable (e.g. "Open a support ticket")

    This was feedback given in the MVP Summit session on the current (Old-New) portal.

    When I have an alert, or something is 'limited' and shows me the red exclamation point, I expect to be able to "do something" or "go somewhere for help".

    I would love to get "this problem is causing your service to be down - go here to open up a support ticket for that".

    I also expect any 'limited' message to be archived so that I can go back to them. Many times I've had 'limited' show up and I send a message to someone about…

    11 votes
    0 comments
  11. Fix alert rules

    Alert rules that use a time metric like CPU time or average response time are incorrect.

    Setting a threshold of 1.5 seconds updates the metric graph correctly, showing a dotted line at the 1.5 second mark. However, the test is actually set to a threshold of 1.5 milliseconds. You can confirm this by viewing the alert in the old portal, and by the fact that the alert is still considered active even though the dotted line remains above the blue line in the graph the whole time.

    Please fix it.

    Also, if you can make it more clear where…

    11 votes
    under review  ·  3 comments
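
The symptom reported in idea 11 is what a unit mismatch would produce: a threshold entered as 1.5 seconds but evaluated as 1.5 milliseconds stays active for almost any real response time. The sketch below only illustrates that effect; `alert_active` and its parameters are hypothetical, the 0.8 second sample value is made up, and nothing here reflects the portal's actual code.

```python
def alert_active(value_seconds, threshold, threshold_unit="s"):
    """Compare a measured time (in seconds) against a threshold in the given unit."""
    to_seconds = {"s": 1.0, "ms": 0.001}[threshold_unit]
    return value_seconds > threshold * to_seconds


if __name__ == "__main__":
    response_time = 0.8  # seconds, below the intended 1.5 s threshold
    print(alert_active(response_time, 1.5, "s"))   # threshold read as seconds
    print(alert_active(response_time, 1.5, "ms"))  # threshold read as milliseconds
```
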
  12. Azure Admin Login Notifications by email or SMS.

    I have been plagued by unbelievable security issues with my ISP MediaTemple for more than 2 years now. After contacting more than 10 ISPs over a 1-year period, it seems that there is no functionality in the marketplace that allows the primary admin of an ISP cloud/webhosting account, in this case the admin user of an Azure account, to be notified by SMS or email when someone logs into the admin account to view/modify settings.

    An example: someone sniffed my packets and passwords, logged into my admin account, changed DNS and zone file records, and put in an MX redirect so that all…

    7 votes
    0 comments
  13. Pause the Streaming Log

    When you scroll up in the streaming log it shouldn't continue pushing you to the bottom. And when you scroll back to the bottom, it should continue to stream in the UI.

    6 votes
    0 comments
  14. Allow alerting on searched traces

    The alerting feature only allows alerting on predefined metrics. I want to be able to create an alert on any search I make in the data. For instance, I might log an error and need an alert whenever that happens.

    6 votes
    0 comments
  15. Metrics Monitor missing (for Websites etc.)

    It was very easy to see the outgoing traffic (and other metrics) in the old portal (e.g. go to a Website and then go to Monitor).
    In the new portal it seems that only HTTP requests are shown. Is this a limitation of the preview? I would like to see other metrics as well.

    4 votes
    1 comment
  16. Built-in logging for low-level issues

    Multiple times I have run into issues where roles would not start properly or where log transfer to Blob storage did not work. This is an absolute time killer because the remedy is trial and error, waiting for a forum answer, or contacting WAZ support.
    I would expect Windows Azure to provide low-level "bootstrap" log information for troubleshooting right out of the box. A non-exhaustive list of things I would want to see (all based on personal experience):
    -Assembly binding issues, missing assemblies, assembly version mismatches.
    -Misconfigured diagnostic log connection strings
    -Which overridden role methods were called plus any exceptions that might have…

    4 votes
    0 comments

    Thank you for the feedback, I agree that understanding the issues that are preventing your role from starting would be very useful — I myself have had roles that have failed to start and wanted to know why that was.

  17. Alert on Auto Scaling

    Hi,

    I've just enabled the Auto-Scaling feature for one of our Websites.
    However, I miss the option to be alerted when the number of assigned instances changes.

    Can you please add an alert feature?

    thanks!

    4 votes
    2 comments
    under review  ·  Nir Mashkowski responded

    We currently put an entry in the Operation Log whenever there is a Success or Failure in autoscaling. We also send an email on failure, but not on success, of autoscale. The reason is that success emails would be extremely noisy: we do many thousands of scale actions in a day.

    We may potentially add this as an opt-in only feature in the future, but it’s not on the immediate roadmap.

  18. The (auto) scaling graph has a really weird timebase. The x-axis needs to autoscale!

    Like the title says, when starting with autoscaling, the graph looks awful and overwrites itself if there are several scaling actions. The x-axis needs to autoscale and be configurable (time range). Furthermore, the legend needs to be explained: how does CPU percentage fit into the y-axis of the instance count?

    3 votes
    0 comments
  19. Add a Counter for the Autoscale Instances to the Monitor

    Please add a metric for Autoscale to the Monitor tab of the portal.

    3 votes
    1 comment
  20. Request and errors lens: need ability to navigate to the error description itself

    I see the number of HTTP errors, but anywhere I click (including the clickable number of errors) I get to the list of requests. There I can see the error request time, but it does not navigate to the error description.

    3 votes
    0 comments