Diagnostics and Monitoring

  1. Alerts based on Queue Size

    I would like to be able to set up an alert and monitor a Cloud Service based on queue size. For example, if a queue holds more than 10,000 items for 15 minutes, send an alert.

    506 votes
    under review · 18 comments
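    The requested rule ("more than 10,000 items for 15 minutes") is a sustained-threshold check. A minimal sketch, assuming queue length is sampled periodically as (timestamp, length) pairs; the function name and input format are hypothetical, not an Azure Monitor API. The alert fires only if every reading covering the trailing 15-minute window is above the threshold.

```python
from datetime import datetime, timedelta


def should_alert(samples, threshold=10_000, window=timedelta(minutes=15)):
    """Return True if the queue length stayed above `threshold` for all of `window`.

    `samples` is a time-sorted list of (timestamp, queue_length) pairs
    (hypothetical input format, not an Azure Monitor API).
    """
    if not samples:
        return False
    window_start = samples[-1][0] - window
    # Without a reading at or before the window start, we cannot tell
    # whether the queue has been large for the full window.
    if samples[0][0] > window_start:
        return False
    # The last reading at or before window_start describes the queue at the
    # beginning of the window; every later reading must also be above threshold.
    start = max(i for i, (ts, _) in enumerate(samples) if ts <= window_start)
    return all(length > threshold for _, length in samples[start:])


t0 = datetime(2024, 1, 1, 12, 0)
# One reading per minute, all above the threshold, spanning 15 minutes.
busy = [(t0 + timedelta(minutes=m), 12_000) for m in range(16)]
print(should_alert(busy))  # True
```

    Requiring coverage of the whole window keeps a short spike from triggering the alert, which is the point of the "for 15 minutes" condition.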
  2. Adding reports, like SLA/Uptime reports for Virtual Machines, Availability Sets, and Traffic Managers

    Clients like to see reports showing that SLAs are being met, or showing the uptime of a Virtual Machine, Availability Set, and/or Traffic Manager. Could Azure provide reports generated from the data it already collects and presents on the graphs?

    Thanks,
    Scott Weigand

    468 votes
    23 comments
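    An uptime report is essentially a ratio over already-collected availability data. A minimal sketch, assuming health-probe results are available as evenly spaced up/down samples; the helper name and the 99.9% default target are illustrative assumptions, not Azure's published SLA terms.

```python
def uptime_report(samples, sla_target=99.9):
    """Summarize availability from evenly spaced health samples.

    `samples` is a list of booleans (True = resource responded); the input
    format and the default target are assumptions for illustration.
    """
    if not samples:
        raise ValueError("no samples")
    up = sum(1 for ok in samples if ok)
    uptime_pct = 100.0 * up / len(samples)
    return {"uptime_pct": uptime_pct, "sla_met": uptime_pct >= sla_target}


# 1,000 probes with 2 failures -> 99.8% uptime, below a 99.9% target.
print(uptime_report([True] * 998 + [False] * 2)["sla_met"])  # False
```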
  3. APIM Load Balancer Metrics to View SNAT Status so I Know Proactively When to Scale

    My enterprise spent the last week on the phone with Microsoft support staff trying to troubleshoot a 500 "Unable to connect to remote server" error.

    The root cause was that SNAT was maxed out on our APIM instance. Having SNAT metrics would allow us to scale it proactively.

    326 votes
    8 comments
  4. E-mail notifications for "Impossible travel to atypical locations" reporting

    Please add e-mail notifications for "Impossible travel to atypical locations" reporting. Presently, admins need to watch this portal screen day and night to see whether there are new entries. An e-mail notification would be much better.

    157 votes
    6 comments
  5. Retention Policy for Diagnostics

    Add a retention policy to Azure Diagnostics much like Azure Storage has for logging and analytics. It is currently WAY too hard to clean up old diagnostics data.

    128 votes
    4 comments
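    Until a built-in retention policy exists, cleanup amounts to deleting records older than a cutoff. A minimal sketch, assuming each diagnostics record carries a timezone-aware timestamp; the record format and `apply_retention` are hypothetical, and a real job would issue deletes against the storage account rather than filter an in-memory list.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(records, retention_days, now=None):
    """Split records into (kept, expired) using a retention window in days.

    Each record is a dict with a timezone-aware "timestamp" (hypothetical format).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    kept = [r for r in records if r["timestamp"] >= cutoff]
    expired = [r for r in records if r["timestamp"] < cutoff]
    return kept, expired
```

    Passing `now` explicitly keeps the function deterministic and easy to test; in a scheduled job it defaults to the current UTC time.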
  6. Support auto-scaling cloud services by service bus topic / subscription

    At the moment, you can scale only by queue size, not by topic/subscription size. Because subscriptions can have filters, it would be ideal to be able to scale by subscription size and not just by overall topic size.

    113 votes
    3 comments

    This is something that we are looking at. In the meantime, a workaround is to forward the items you want to scale by to a queue and scale by that queue.
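    The workaround has two parts: copy the messages that match a subscription's filter into a dedicated queue, then size the deployment from that queue's length. A rough sketch, assuming a predicate stands in for the subscription filter and a rule of roughly one instance per fixed number of queued items; `forward_matching`, `desired_instances`, and the default limits are hypothetical, not the Service Bus SDK or the Azure autoscale algorithm.

```python
import math
from collections import deque


def forward_matching(messages, subscription_filter, queue):
    """Copy messages that pass the subscription's filter into a plain queue.

    `subscription_filter` is a predicate standing in for a topic
    subscription rule (hypothetical; not the Service Bus SDK).
    """
    for msg in messages:
        if subscription_filter(msg):
            queue.append(msg)


def desired_instances(queue_length, per_instance=100, minimum=1, maximum=10):
    """Instance count so that each instance handles about `per_instance` items."""
    needed = math.ceil(queue_length / per_instance)
    return max(minimum, min(maximum, needed))


q = deque()
forward_matching([{"region": "eu"}, {"region": "us"}], lambda m: m["region"] == "eu", q)
print(len(q), desired_instances(250))  # 1 3
```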

  7. Support for PromQL (Prometheus Query Language)

    Prometheus and Grafana have become a de facto standard for metrics in the Cloud Native world, and Azure Monitor is a valid, fully-managed alternative to a self-hosted Prometheus environment.

    Unfortunately, when replacing Prometheus with Azure Monitor, most Grafana dashboard templates no longer work out of the box, because they query the data source in PromQL, the Prometheus Query Language. It would be great if Azure Monitor could also be queried with PromQL and the Grafana data source plugin supported that.

    This would make the replacement of a self-hosted Prometheus with Azure Monitor much smoother!

    104 votes
    1 comment
  8. Monitor Resource Creation in an Azure Subscription.

    DevOps teams and IT teams want to know when a resource has been created in or removed from the subscription they manage.

    Resource creation and deletion may affect a huge range of things, from billing to product functionality.

    As of now, there is no way to get an alert when a resource is created or deleted in the subscription.

    96 votes
    0 comments
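    One way to detect changes without a built-in alert is to diff two snapshots of resource IDs taken at different times, for example from a periodic inventory listing. A minimal sketch; the snapshot source, the IDs, and the helper names are hypothetical.

```python
def diff_resources(previous, current):
    """Return (created, deleted) resource IDs between two inventory snapshots."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)


def change_messages(previous, current):
    """Format one alert line per created or deleted resource."""
    created, deleted = diff_resources(previous, current)
    return [f"created: {r}" for r in created] + [f"deleted: {r}" for r in deleted]


print(change_messages(["vm1", "db1"], ["vm1", "vm2"]))
# ['created: vm2', 'deleted: db1']
```

    Polling only catches net changes between snapshots; a resource created and deleted within one interval would be missed, which is one reason an event-driven alert is preferable.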
  9. Add CPU and Memory usage metrics per instance

    We use New Relic to monitor our cloud services. Sometimes we see an instance going above 80% CPU, which could possibly be solved by rebooting the instance, but it's impossible to identify in the portal which instance is in trouble (the Azure portal uses _N to name the instances, while New Relic uses an ID).

    82 votes
    5 comments
  10. Multiple email addresses per action

    The customer wants to register multiple e-mail addresses in the action group.
    Currently, each action holds one e-mail address.
    Could an action hold multiple e-mail addresses?
    Are there plans to change this in the future?

    Customers are concerned about erasing the action by mistake.

    68 votes
    1 comment
  11. Add custom charts that pull data from our own SQL database (dashboard like features)

    Since I go to the portal to view the metrics around usage of the site plus CPU data, HTTP response codes, etc. it would be nice to be able to add my own custom data to the portal for viewing alongside these existing charts. For me it would be nice to include custom SQL to build a chart that pulls certain metrics from the database that I have attached to my website. I could see using it for tracking customer registrations, order information, etc. Things that are specific to my site but that provide valuable insight into what is going…

    65 votes
    0 comments
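    The kind of custom query described, such as tracking customer registrations, could look like the following. This sketch uses Python's built-in `sqlite3` as a stand-in for the site's database; the `registrations` table and its columns are hypothetical.

```python
import sqlite3


def registrations_per_day(conn):
    """Return [(day, count), ...] ordered by day, ready to feed a chart."""
    return conn.execute(
        "SELECT date(created_at) AS day, COUNT(*) "
        "FROM registrations GROUP BY day ORDER BY day"
    ).fetchall()


# Example data in an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrations (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO registrations (created_at) VALUES (?)",
    [("2024-05-01 09:15:00",), ("2024-05-01 17:40:00",), ("2024-05-02 08:05:00",)],
)
print(registrations_per_day(conn))
# [('2024-05-01', 2), ('2024-05-02', 1)]
```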
  12. Show memory and network metrics as percentages

    Since each VM / Cloud Service has 3 major metrics (CPU, memory, network), diagnostics should collect these metrics by default (logging level minimal).

    Because there are a lot of instance types, absolute values of network and memory counters are not informative, and percentages should be used.
    It is more informative to know that an instance is using 90% of its memory than to have a value of 600 MB. The same is true for network channel load. For example, my service is network intensive, and right now it is really hard to tell when the network channel load reaches its limit and service…

    63 votes
    2 comments

    This is very good feedback. Today we have separate data sources for your quotas (e.g. network, memory) and for the metrics that we emit. Ideally, we could bring those two together to give you those metrics as percentages.

    Also, once these are exposed, they will automatically be available for autoscale; you can already use any exposed metric for scaling today.
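    Combining a quota with an emitted metric is a simple ratio. A sketch, assuming the instance's memory capacity and network bandwidth limit are known; the function name and the example capacities are made up for illustration, not real instance-size figures.

```python
def usage_percent(used, capacity):
    """Express an absolute counter as a percentage of the instance's limit."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


# Hypothetical instance: 4096 MB of memory and a 1000 Mbit/s network limit.
print(round(usage_percent(600, 4096), 1))  # 14.6 (memory, %)
print(usage_percent(900, 1000))            # 90.0 (network, %)
```

    The same 600 MB reading would mean something very different on an instance with less memory, which is why the percentage is the more informative figure.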

  13. NSG Flow Log Export to Event Hub for SIEM Ingestion

    NSG Flow Logs can be viewed in Network Watcher or exported to Storage, but the option to export to Event Hub is missing. Since Event Hubs have become the standard interface for SIEM solutions to access Azure logs, it would be great to be able to handle NSG Flow Logs the same way.

    54 votes
    1 comment
  14. Add monitoring of vCPU usage against quota at subscription level

    Currently there is no out-of-the-box way of monitoring vCPU usage over time and against subscription quotas.

    This really comes into play when using services like Azure Databricks which create and destroy VMs in the subscription very frequently.

    We semi-regularly encounter failures in scaling operations on these services as we bump up against subscription vCPU limits, but there's no easy way of proactively mitigating them: the Azure Portal only shows the current usage, which may look fine and show plenty of spare capacity depending on the time of day. However, there is no real way of seeing a…

    53 votes
    2 comments
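    The point of the request is to compare usage over time, not just the current value, against the quota. A minimal sketch, assuming vCPU usage is sampled periodically into a list; the helper name and the 80% warning level are assumptions.

```python
def quota_headroom(usage_history, quota, warn_fraction=0.8):
    """Summarize vCPU usage history against a subscription quota.

    `usage_history` is a list of vCPU counts sampled over time (hypothetical
    input). Returns the peak, its fraction of the quota, and whether the peak
    crossed the warning level.
    """
    if not usage_history:
        raise ValueError("no samples")
    peak = max(usage_history)
    fraction = peak / quota
    return {"peak": peak, "peak_fraction": fraction, "warn": fraction >= warn_fraction}


# Hourly samples: the current value (90) looks fine, but the peak (300) does not.
print(quota_headroom([40, 120, 300, 90], quota=350)["warn"])  # True
```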
  15. 49 votes
    0 comments
  16. Have the diagnostics engine push remaining data to storage on shutdown

    I'd like the Diagnostics agent to be aware of graceful shutdown scenarios (the instance count being lowered, for example). If it has been pushing data to storage on a schedule, it should attempt to push any data collected since the last scheduled transfer before the system fully shuts down.

    If you have a schedule set up to move data every 5 or 10 minutes (or really any schedule that isn't measured in seconds), you could lose a considerable amount of data if the role is shut down between scheduled pushes. It would be nice if an attempt is actually made…

    47 votes
    0 comments

    Thank you for the suggestion. This seems like a reasonable way to make sure that you collect all of the diagnostic data from your virtual machines. I have passed this idea on to the WAD team.
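    The requested behavior is a scheduled buffer plus a final flush on graceful shutdown. A minimal sketch, assuming an agent that buffers entries and transfers them on a fixed interval; `DiagnosticsBuffer` and its `transfer` callback are hypothetical names, not the Windows Azure Diagnostics API. A real agent would call `on_shutdown` from the platform's stop notification or from a handler registered with `signal.signal(signal.SIGTERM, ...)`.

```python
import time


class DiagnosticsBuffer:
    """Buffers log entries and pushes them to storage on a schedule (hypothetical)."""

    def __init__(self, transfer, interval_seconds=300, clock=time.monotonic):
        self.transfer = transfer          # callable that receives a list of entries
        self.interval = interval_seconds
        self.clock = clock                # injectable for testing
        self.pending = []
        self.last_transfer = clock()

    def add(self, entry):
        self.pending.append(entry)

    def tick(self):
        """Called periodically; transfers only once the interval has elapsed."""
        if self.clock() - self.last_transfer >= self.interval:
            self.flush()

    def flush(self):
        if self.pending:
            self.transfer(self.pending)
            self.pending = []
        self.last_transfer = self.clock()

    def on_shutdown(self):
        """Graceful shutdown: push whatever arrived since the last transfer."""
        self.flush()
```

    Without `on_shutdown`, anything added after the last scheduled transfer would be lost when the instance is removed, which is the data loss the request describes.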

  17. Copy Alert Rules from one Azure resource to another

    It would be really great if there was a way to copy a set of Alert Rules from one Azure resource to another.

    Use case: I made 15 Alert Rules on our Staging db. I want those on the Prod db now. The same goes for our WebApp, CloudService, SQL Server, etc. It takes a really long time to add these manually, and you might forget one or mistype an e-mail address, and then you miss out on important alerts.

    47 votes
    1 comment
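    Copying rules is a matter of cloning each rule and retargeting it. A sketch that treats each alert rule as a plain dictionary; the field names (`name`, `resource_id`, `threshold`, `emails`) are a hypothetical model, not the Azure alert-rule schema. Cloning with `copy.deepcopy` keeps thresholds and recipients identical, which removes the risk of a mistyped address.

```python
import copy


def copy_alert_rules(rules, source_id, target_id, suffix="-prod"):
    """Clone every rule scoped to `source_id` so it targets `target_id`."""
    copies = []
    for rule in rules:
        if rule["resource_id"] != source_id:
            continue
        clone = copy.deepcopy(rule)       # independent copy, including e-mail lists
        clone["resource_id"] = target_id
        clone["name"] = rule["name"] + suffix
        copies.append(clone)
    return copies


rules = [
    {"name": "high-dtu", "resource_id": "staging-db", "threshold": 80, "emails": ["ops@example.com"]},
    {"name": "http-5xx", "resource_id": "staging-web", "threshold": 10, "emails": []},
]
print([r["name"] for r in copy_alert_rules(rules, "staging-db", "prod-db")])
# ['high-dtu-prod']
```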
  18. Add support for exporting ARM templates

    Please add support for exporting Action Groups and Alert Rules as ARM Templates, in the same manner as the Data Factory V2 team allows exporting pipeline definitions and all their related artifacts as ARM templates. This is incredibly useful for cases where we're creating a product with multiple environments.

    44 votes
    0 comments
  19. Auto-Scale explanation of Cool-Down period

    We need a better explanation of auto-scaling rules. Even after reading the MSDN documentation, it is unclear.

    Does a Cool-Down period for scale-down interfere with a scale-up operation? For instance, we have a polling period of 5 minutes for scale-up and a 120-minute Cool-Down for scale-down. If no further scale-up operation can take place during the 120-minute scale-down Cool-Down, that would be awful. Sadly, no information is available.

    And what happens if a CPU metric rule says scale down while a scale-up memory metric rule is still firing?

    Lastly, we need information on whether rule-sets (more than one) are…

    38 votes
    6 comments
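    To make the questions concrete, here is one possible model in which scale-up and scale-down keep separate cool-down timers, so a long scale-down cool-down never blocks scale-up. This is a hypothetical illustration of the behavior the poster hopes for, not documented Azure autoscale behavior; the class name and the rule that scale-up wins over a conflicting scale-down are assumptions.

```python
from datetime import datetime, timedelta


class Autoscaler:
    """Hypothetical autoscaler with independent per-direction cool-downs."""

    def __init__(self, up_cooldown, down_cooldown):
        self.cooldown = {"up": up_cooldown, "down": down_cooldown}
        self.last_action = {"up": None, "down": None}

    def _ready(self, direction, now):
        last = self.last_action[direction]
        return last is None or now - last >= self.cooldown[direction]

    def decide(self, now, scale_up_votes, scale_down_votes):
        """Return "up", "down", or None given the rules firing at time `now`.

        If any rule asks to scale up, scale-down is suppressed (assumed policy),
        so a CPU rule voting down cannot override a memory rule voting up.
        """
        if scale_up_votes and self._ready("up", now):
            self.last_action["up"] = now
            return "up"
        if scale_down_votes and not scale_up_votes and self._ready("down", now):
            self.last_action["down"] = now
            return "down"
        return None


t0 = datetime(2024, 1, 1)
a = Autoscaler(up_cooldown=timedelta(minutes=5), down_cooldown=timedelta(minutes=120))
print(a.decide(t0, [], ["cpu"]))                             # down
print(a.decide(t0 + timedelta(minutes=10), ["memory"], []))  # up
```

    Under this model, the 120-minute scale-down cool-down only delays further scale-down actions; whether Azure behaves this way is exactly what the request asks the documentation to state.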
  20. Alert for Azure data factory pipeline duration

    I am using Data Factory V2. There seems to be no alert configuration for pipeline duration. We have had issues where a pipeline got stuck and kept running for hours. We can't monitor it manually and would like to know whether there is any way to trigger an alert if a pipeline runs longer than a threshold value.

    36 votes
    0 comments
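    The requested check is a duration threshold on runs that are still in progress. A minimal sketch, assuming pipeline-run records with a run ID, start time, and status are available, for example from a periodic query of run history; the record fields, the status string, and the 2-hour default are hypothetical.

```python
from datetime import datetime, timedelta


def long_running(runs, now, threshold=timedelta(hours=2)):
    """Return IDs of in-progress runs that have exceeded `threshold`."""
    return [
        run["run_id"]
        for run in runs
        if run["status"] == "InProgress" and now - run["start"] > threshold
    ]


now = datetime(2024, 1, 1, 12, 0)
runs = [
    {"run_id": "a", "status": "InProgress", "start": now - timedelta(hours=5)},
    {"run_id": "b", "status": "InProgress", "start": now - timedelta(minutes=30)},
    {"run_id": "c", "status": "Succeeded", "start": now - timedelta(hours=6)},
]
print(long_running(runs, now))  # ['a']
```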