Diagnostics and Monitoring

How can we improve Azure Diagnostics and Monitoring?

  1. Alerts based on Queue Size

    I would like to be able to set up an alert and monitor a Cloud Service based on queue size. For example, if a queue has held more than 10,000 items for 15 minutes, send an alert.

    374 votes
    under review · 11 comments · Queues
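
The requested rule boils down to "alert when the queue length stays above a threshold for a whole time window". A minimal sketch of that evaluation, assuming queue-length samples arrive as (timestamp, length) pairs sorted by time; the function `should_alert` and this sampling model are hypothetical, not an Azure API:

```python
from datetime import datetime, timedelta

def should_alert(samples, threshold=10_000, window=timedelta(minutes=15)):
    """Return True if the queue length exceeded `threshold` for the whole window.

    samples: list of (datetime, int) pairs sorted by time (hypothetical input format).
    Each sample is treated as the queue length until the next sample arrives.
    """
    if not samples:
        return False
    start = samples[-1][0] - window
    # Find the last sample taken at or before the window opens; it describes
    # the queue length at the moment the window starts.
    first = None
    for i, (ts, _) in enumerate(samples):
        if ts <= start:
            first = i
    if first is None:
        return False  # not enough history to cover the whole window
    return all(length > threshold for _, length in samples[first:])
```

Requiring history that reaches back to the start of the window keeps a freshly started monitor from alerting on a single spike.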
  • Add reports, such as SLA/Uptime reports, for Virtual Machines, Availability Sets, and Traffic Managers

    Clients like to see reports showing that SLAs are being met, or showing the uptime of a Virtual Machine, Availability Set, and/or Traffic Manager. Could Azure provide reports generated from the data it is already collecting and presenting on the graphs?

    Thanks,
    Scott Weigand

    113 votes
    16 comments
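
An uptime/SLA report is mostly arithmetic over availability data that is already recorded. A minimal sketch, assuming the outages in a reporting period are available as a list of durations in minutes and that the SLA target is supplied by the caller; `uptime_report` and its input format are hypothetical:

```python
def uptime_report(period_minutes, downtime_minutes, sla_target):
    """Return (uptime percentage rounded to 3 places, whether it meets sla_target).

    period_minutes: length of the reporting period in minutes.
    downtime_minutes: list of outage durations (minutes) within the period.
    sla_target: target uptime percentage, e.g. 99.95 (caller-supplied assumption).
    """
    down = sum(downtime_minutes)
    if down > period_minutes:
        raise ValueError("downtime exceeds the reporting period")
    uptime = 100.0 * (period_minutes - down) / period_minutes
    # Compare the unrounded value so rounding never flips the verdict.
    return round(uptime, 3), uptime >= sla_target
```

For a 30-day month (43,200 minutes) with two outages of 10 and 5 minutes, this reports about 99.965% uptime.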
  • Support auto-scaling cloud services by Service Bus topic/subscription

    At the moment you can scale only by queue size, not by topic/subscription size. Given that subscriptions can have filters, it would be ideal to be able to scale by subscription size and not just by overall topic size.

    90 votes
    3 comments

    This is something that we are looking at. In the meantime, one workaround is to forward the items you want to scale by to a queue and scale by that queue.

  • Retention Policy for Diagnostics

    Add a retention policy to Azure Diagnostics, much like the one Azure Storage has for logging and analytics. It is currently WAY too hard to clean up old diagnostics data.

    84 votes
    2 comments
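
At its core, a retention policy is a rule that expires records older than a cutoff. A minimal in-memory sketch of that rule, assuming each diagnostics record carries a timezone-aware UTC timestamp; `apply_retention` and the record shape are hypothetical, and a real cleanup job would still have to issue the deletes against the storage tables or blobs itself:

```python
from datetime import datetime, timedelta, timezone

def apply_retention(records, retention_days, now=None):
    """Split records into (kept, expired) by age.

    records: iterable of dicts with a timezone-aware 'timestamp' key (hypothetical shape).
    retention_days: records older than this many days are expired.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    kept, expired = [], []
    for rec in records:
        (kept if rec["timestamp"] >= cutoff else expired).append(rec)
    return kept, expired
```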
  • Add CPU and Memory usage metrics per instance

    We use New Relic to monitor our cloud services. Sometimes we see an instance go above 80% CPU, which could possibly be solved by rebooting that instance, but it's impossible to identify in the portal which of the instances is in trouble (the Azure portal names instances with _N, while New Relic uses an ID).

    76 votes
    4 comments
  • Show memory and network metrics as percentages

    Since each VM/Cloud Service has three major metrics (CPU, memory, and network), diagnostics should collect these metrics by default (at the minimal logging level).

    Because there are a lot of instance types, absolute values of the network and memory counters are not informative, and percentages should be used.
    It is more informative to know that an instance is using 90% of its memory than to see a value of 600 MB. The same goes for network channel load. For example, my service is network intensive, and right now it is really hard to tell when the network channel load reaches its limit and the service…

    54 votes
    2 comments

    This is very good feedback. Today we have separate data sources for what your quotas are (e.g., network, memory) and for the metrics that we emit. Ideally, we could bring those two together to give you these metrics as percentages.

    Also, once these are exposed, they will automatically be available for autoscale; already today you can use any exposed metric for scaling.
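
The percentage asked for here is the emitted absolute value divided by the instance's capacity, which is the "bringing two data sources together" the response describes. A minimal sketch; the capacities must come from the instance size, and the two sizes below are hypothetical numbers chosen only to show why the same absolute reading means very different things:

```python
def utilization_percent(used, capacity):
    """Return used/capacity as a percentage (e.g. MB of RAM, or Mbps of bandwidth).

    capacity: the instance-size limit in the same unit as `used` (assumed known).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity

def compare_sizes(used, capacities):
    """Express one absolute reading as a percentage of each capacity."""
    return {name: utilization_percent(used, cap) for name, cap in capacities.items()}

# Hypothetical memory capacities in MB, for illustration only.
EXAMPLE_SIZES = {"small": 800, "large": 6000}

if __name__ == "__main__":
    print(compare_sizes(600, EXAMPLE_SIZES))  # {'small': 75.0, 'large': 10.0}
```

The same 600 MB reading is 75% of the hypothetical small instance but only 10% of the large one, which is the poster's argument for percentages.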

  • Have the diagnostics engine push remaining data to storage on shutdown

    I'd like the Diagnostics agent to be aware of graceful shutdown scenarios (the instance count being lowered, for example). If it has been pushing data to storage on a schedule, it should attempt to push any data collected since the last scheduled transfer before the system is fully shut down.

    If you have a schedule set up to move data every 5 or 10 minutes (or really any schedule that isn't measured in seconds), you could lose a considerable amount of data if the role is shut down between scheduled pushes. It would be nice if an attempt were actually made…

    46 votes
    0 comments

    Thank you for the suggestion. This seems like a reasonable way to make sure that you collect all of the diagnostic data from your virtual machines. I have passed this idea on to the WAD team.
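
The request describes a buffered, scheduled transfer that also flushes once on graceful shutdown. A minimal sketch of that pattern, not the WAD agent's actual design; `BufferedShipper` is hypothetical, and the `upload` callback stands in for whatever writes a batch to storage:

```python
import threading

class BufferedShipper:
    """Buffer entries, ship them on a fixed schedule, and flush on shutdown."""

    def __init__(self, upload, interval_seconds):
        self._upload = upload              # callable(list_of_entries); hypothetical storage writer
        self._interval = interval_seconds
        self._buffer = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def add(self, entry):
        with self._lock:
            self._buffer.append(entry)

    def _transfer(self):
        # Swap the buffer out under the lock, then upload outside it.
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._upload(batch)

    def _run(self):
        # Event.wait returns True once shutdown sets the event, ending the loop.
        while not self._stop.wait(self._interval):
            self._transfer()

    def shutdown(self):
        """Stop the schedule, then push whatever arrived since the last transfer."""
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        self._transfer()
```

Joining the scheduler thread before the final transfer guarantees the two transfers never run concurrently, so nothing is shipped twice or dropped between them.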

  • Provide a way around the character limit for API requests

    Hi,
    We are using the new Azure SDK 2.5 Diagnostics for our Cloud Service.
    Since our performance counters are dynamic and rely on the process ID and process identifier (the thing with the #), we cannot set the configuration in the wadcfg files; instead, we have to create the configuration at runtime and upload it via the REST API.

    For example, ChangeDeploymentConfiguration:
    https://management.core.windows.net/{0}/services/hostedservices/{1}/deploymentslots/{2}/?comp=config "POST"

    Today we received an error stating that the public configuration cannot exceed 20480 characters.

    That is not acceptable.
    First, we have so many performance counters that we can easily reach this limit. And second…

    35 votes
    3 comments
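
Until the limit changes, a client that generates the configuration at runtime can at least measure it before calling the REST API instead of discovering the error on upload. A minimal sketch, taking the 20480-character limit from the error quoted in the request; the element and attribute names are modeled loosely on the diagnostics performance-counter configuration and are not the complete schema, and `build_counters_xml` and `check_public_config` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

PUBLIC_CONFIG_LIMIT = 20480  # characters, per the error message in the request

def build_counters_xml(specifiers, sample_rate="PT1M"):
    """Serialize performance-counter entries (illustrative fragment, not the full schema)."""
    root = ET.Element("PerformanceCounters")
    for spec in specifiers:
        ET.SubElement(root, "PerformanceCounterConfiguration",
                      counterSpecifier=spec, sampleRate=sample_rate)
    return ET.tostring(root, encoding="unicode")

def check_public_config(xml_text, limit=PUBLIC_CONFIG_LIMIT):
    """Return how many characters the configuration exceeds the limit by (0 if it fits)."""
    return max(0, len(xml_text) - limit)
```

A few hundred per-process counters are enough to overflow the limit, which matches the poster's experience.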
  • Monitor VM Status

    Add a feature to monitor the status of a VM against some conditions.
    For example, I want to receive an alert when the status of VM "X" is not "Running".

    29 votes
    0 comments · Windows
  • Add the ability to monitor total RAM usage on a VM

    We have graphs that monitor CPU usage, network traffic, and disk read/write, but it would be very nice to have a graph showing RAM usage on a VM over a period of time (much like the CPU graph), especially when deciding whether to switch between, say, an A2 and an A5.

    26 votes
    under review · 1 comment
  • Additional Azure Monitor metric: RAM/Memory resource

    Considering there are various CPU, network, and disk metrics available, could we please also have "RAM/Memory % utilization"?

    19 votes
    4 comments
  • Bring back the dashboard tiles that can show me the performance tier and number of instances my app service plan is running at.

    I have a dashboard with 7 app service plans on it. For each plan, I had one tile that showed me the current tier (S1, S2, S3, etc.) and one tile that showed me the instance count (1, 5, 10, etc.).

    A week or two ago, these tiles stopped working and I got a notice that they had been "Retired".

    Now it seems that there is no replacement tile that provides the same information.

    How can that be? I hope I am wrong.

    Please advise.

    See this for more info: https://social.msdn.microsoft.com/Forums/en-US/9c1c0633-ceff-493f-be9b-f62f8cb279e2/how-can-i-monitore-the-instance-count-of-an-app-service-plan-on-a-dashboard?forum=windowsazurewebsitespreview

    18 votes
    0 comments
  • Create Activity Log Alerts and Action Groups by PowerShell cmdlets

    When we want to create Activity Log Alerts and Action Groups, we can only use the Azure portal or Resource Manager templates.
    It would be easier if we could create these resources with PowerShell cmdlets.

    https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-activity-log-alerts
    https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-action-groups

    18 votes
    0 comments

    Thanks for the suggestion! We’re definitely looking to add PowerShell support for Activity Log Alerts and Action Groups. Currently, we’re hoping that this will be in the October Azure PowerShell release.

    John Kemnetz
    Program Manager, Azure Monitor

  • Add support for enabling diagnostics 1.3 via TFS build

    Prior to SDK 2.5, the diagnostics config was part of the Azure Service Configuration and was enabled automatically when we built and published using TFS / Visual Studio Online.

    We use the Staging > Production VIP swap approach to deployments, so we are always deploying to an empty staging slot.

    Now that diagnostics is an extension, we need to enable diagnostics manually using PowerShell every time we deploy to Staging.

    Furthermore, because we need to wait for the slot to exist before we can enable diagnostics, we have no way of retrieving any diagnostics during role startup.

    For now…

    15 votes
    2 comments
  • Add a button for copying an existing Alert Rule

    I am currently creating duplicates of every Alert Rule so that I can have two versions of each one. I want one copy of each Alert Rule to have a lower threshold, which will go out to the engineers, and another copy with a much higher threshold, which will go out to DevOps.

    It would save me a lot of time if there were a copy button.

    15 votes
    2 comments
  • Auto-Scale explanation of Cool-Down period

    We need a better explanation of auto-scaling rules. Even after reading the MSDN documentation, their behavior is unclear.

    Does a cool-down period for scale-down interfere with a scale-up operation? For instance, we have a polling period of 5 minutes for scale-up and a 120-minute cool-down for scale-down. If no further scale-up operation can take place during that 120-minute scale-down cool-down, this would be awful. Sadly, no information is available.

    And what happens if a CPU metric rule says to scale down while a scale-up memory metric rule is still firing?

    Lastly, we need information as to whether rule sets (more than one) are…

    14 votes
    2 comments
  • Enhance Audit Logs in the new portal with timestamps and Started status

    The old management portal shows the Audit Log in a pretty good way: each record has detailed timestamps down to the second, and it's absolutely clear when an operation Started and when it finished (Succeeded/Failed).

    Please consider enhancing the Audit Logs in the new portal with this information. It is very useful during troubleshooting.

    14 votes
    0 comments

    This is available in the Azure Portal. To configure which columns are shown for the Activity Log, simply use the “columns” button at the top of the Activity Log blade.

    Thanks,

    John Kemnetz
    Program Manager, Azure Monitor

  • Copy Alert Rules from one Azure resource to another

    It would be really great if there were a way to copy a set of Alert Rules from one Azure resource to another.

    Use case: I made 15 Alert Rules on our Staging db, and I want those on the Prod db now. Same thing with our WebApp, CloudService, SQL Server, etc. It takes a really long time to add these manually, and you might forget one or mistype an email address, and then you miss important alerts.

    13 votes
    0 comments
  • Download performance metrics from portal

    It would be really great to be able to download the performance metrics from a chart as a CSV or Excel file.

    13 votes
    0 comments
  • When there is a scaling alert, it should tell me which of my rules caused the scale-up or scale-down.

    When I get a scale-up or scale-down email alert for my site, I have no idea which of my rules caused the action. It could have been CPU, memory, or any other scale-up metric I am monitoring. It would be really nice to know which one it was.

    12 votes
    under review · 0 comments
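
What the request asks for is that the autoscale engine carry the triggering rule through to the notification. A minimal sketch of that idea, assuming each rule is a metric name, a comparison, a threshold, and an action, and that the first matching rule wins; the `Rule` type, `evaluate`, and `describe` are hypothetical and do not reflect how Azure autoscale actually orders or combines rules:

```python
from dataclasses import dataclass
import operator

@dataclass
class Rule:
    metric: str        # e.g. "CPU" or "Memory" (illustrative names)
    op: str            # ">" or "<"
    threshold: float
    action: str        # "scale up" or "scale down"

OPS = {">": operator.gt, "<": operator.lt}

def evaluate(rules, metrics):
    """Return (action, rule) for the first rule that fires, or None.

    metrics: dict of current metric values keyed by metric name.
    Returning the rule with the action is what lets a notification explain itself.
    """
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and OPS[rule.op](value, rule.threshold):
            return rule.action, rule
    return None

def describe(result):
    """Format a notification line naming the rule that triggered the action."""
    if result is None:
        return "no scaling action"
    action, rule = result
    return f"{action} triggered by {rule.metric} {rule.op} {rule.threshold}"
```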