Cosmos DB for PostgreSQL – Monitor IO Utilization

This check ensures that IO utilization is actively monitored for Azure Cosmos DB for PostgreSQL clusters. Monitoring IO metrics helps detect performance bottlenecks, prevent resource saturation, and maintain high availability of database workloads.

Check Details

  • Resource: Azure Cosmos DB for PostgreSQL
  • Check: Ensure IO utilization is monitored
  • Risk: Failure to monitor IO utilization may result in undetected resource exhaustion, degraded database performance, increased latency, and potential service disruption.
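One way to evaluate this check from the command line is to count the enabled metric alert rules whose scope includes the cluster. The following is a minimal sketch, not a definitive audit: the function names are hypothetical, the resource group and cluster resource ID are caller-supplied, and it does not inspect which metric each rule watches, so a rule on CPU alone would also count. The comparison of resource IDs is an exact string match, so pass the ID in the same casing Azure returns it.

```shell
#!/usr/bin/env bash
# Sketch: report whether any enabled metric alert rule targets a given cluster.
# Function names are illustrative; arguments are supplied by the caller.

count_alerts_for_cluster() {
  local resource_group="$1" cluster_id="$2"
  # JMESPath: keep enabled rules whose scopes array contains the cluster ID,
  # then return how many rules matched.
  az monitor metrics alert list \
    --resource-group "$resource_group" \
    --query "length([?enabled && contains(scopes, '${cluster_id}')])" \
    --output tsv
}

check_io_monitoring() {
  local count
  # Return 2 if the Azure CLI call itself fails (for example, not logged in).
  count="$(count_alerts_for_cluster "$1" "$2")" || return 2
  if [ "${count:-0}" -gt 0 ]; then
    echo "PASS: ${count} metric alert rule(s) target the cluster"
  else
    echo "FAIL: no enabled metric alert rule targets the cluster"
    return 1
  fi
}
```

It could be called as check_io_monitoring <resource-group> <resource-id>; a non-zero exit status indicates that the check failed or could not be evaluated.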

Remediation via Azure Portal

  1. Log in to the Azure Portal.
  2. Navigate to Azure Cosmos DB for PostgreSQL and select the appropriate cluster.
  3. Under Monitoring, select Metrics.
  4. From the metric drop-down list, select IOPS.
  5. Configure an Alert rule based on IO utilization thresholds appropriate for your workload.
  6. Ensure alerts are linked to an Action Group to notify administrators when thresholds are exceeded.

Remediation via Azure CLI

  1. Open Azure Cloud Shell or a local terminal with Azure CLI installed.
  2. Retrieve available IO metrics for the PostgreSQL cluster:
    az monitor metrics list-definitions \
     --resource <resource-id> \
     --query "[?contains(name.value, 'io')].name.value"
    
  3. Create an alert rule to monitor IO utilization:
    az monitor metrics alert create \
     --name <alert-name> \
     --resource-group <resource-group> \
     --scopes <resource-id> \
     --condition "avg io_utilization > 80" \
     --description "Alert when IO utilization exceeds 80 percent" \
     --evaluation-frequency 5m \
     --window-size 5m \
     --severity 2
    
  4. Verify the alert configuration:
    az monitor metrics alert list \
     --resource-group <resource-group> \
     --output table
    

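Replace <resource-id>, <resource-group>, and <alert-name> with your actual values. Confirm that the metric name used in --condition (io_utilization in the example) matches a name returned in step 2, because the rule can only evaluate a metric that the cluster actually exposes. Set the threshold to a value and unit that suit that metric and your workload.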
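The portal steps link the alert to an Action Group, while the CLI command above creates the rule without one. The sketch below shows one way to do both in a single call by using the --action option of az monitor metrics alert create. The function names are hypothetical, and the metric name, threshold, and Action Group resource ID are all caller-supplied assumptions; the Action Group ID can be looked up with az monitor action-group show --query id.

```shell
#!/usr/bin/env bash
# Sketch: create the IO alert rule and attach an Action Group in one call.
# Function names are illustrative; every value is supplied by the caller.

build_condition() {
  # Usage: build_condition <aggregation> <metric-name> <threshold>
  # Produces the string expected by --condition, e.g. "avg <metric> > <n>".
  printf '%s %s > %s' "$1" "$2" "$3"
}

create_io_alert() {
  local name="$1" resource_group="$2" cluster_id="$3"
  local metric="$4" threshold="$5" action_group_id="$6"
  az monitor metrics alert create \
    --name "$name" \
    --resource-group "$resource_group" \
    --scopes "$cluster_id" \
    --condition "$(build_condition avg "$metric" "$threshold")" \
    --evaluation-frequency 5m \
    --window-size 5m \
    --severity 2 \
    --action "$action_group_id"
}
```

A call would look like create_io_alert <alert-name> <resource-group> <resource-id> <metric-name> <threshold> <action-group-id>, where <metric-name> is one of the names returned by the list-definitions query in step 2.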