Part 1 – Building A Multi-Subscription Azure Savings & Score Card Report – Data Collection & Insights

Cost is something a lot of organisations struggle with when operating in the cloud.
I have authored many posts on mechanisms you can use to reduce your costs, and they broadly apply across all the hyperscalers, but as they say, sunlight is the best disinfectant.

The sunlight I am referring to is cost visibility, something many organisations fail to implement.

The challenge in many organisations is that engineering teams operate in the dark about their spending impact, while finance sees ballooning costs that are hard to correlate to business outcomes.

This isn’t a technical problem. We have two parties: one that is building, pulling architectural levers and spending, and another that is paying bills. The two are almost never in sync with each other.

We need a means to bring these two personas together.

The genesis of this solution came during my work helping a large Australian business with close to 200 Azure subscriptions. Their monthly Azure bills aren’t easily reconcilable to engineering teams, and broadly speaking, engineering teams only adjust their patterns when notified.

What if technical teams at the Azure subscription level could receive a monthly bill showing them their most used services, how their spend compares against other teams, and the potential savings they could make against their stack?

That’s what we want.




So in this blog post series I will show you how I was able to harvest data, analyse it and then visualise it.

You may be thinking: Microsoft environment, why don’t you use Power BI or Fabric? The challenges here are multiple.

  1. The Data I Need : A combination of cost and billing data, Azure Advisor and Azure Monitor.
  2. Look & Feel : Those reports speak Microsoft’s design language. Python grants us the utmost flexibility.
  3. Cost : The overarching aim is to save money; using either of the above does not help my cause.

This post can be broken into a few key steps:

  1. Harvesting Data From Azure (Cost API + Advisor + Monitor)
  2. Drawing Insights With Python (Pandas + Matplotlib)
  3. Visualisation

Harvesting Data From Azure
The first step in this multi-step process is to harvest data. I want this to look like a bill, and if I visualise a bill from my energy retailer (electricity), the first thing I want to see is my history. This means we need to look back months, in my case 6 months. There are challenges in this: hitting Azure’s APIs too hard will result in HTTP 429 errors, as you can see in my example below.

Please ensure you run an az login prior to executing this code.
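If you want to fail fast on authentication before kicking off a long run, a minimal sketch like this (my addition, not part of the main script, using only azure-identity) forces DefaultAzureCredential to fetch an ARM token:

from azure.identity import DefaultAzureCredential

# Fail fast if the az login / environment credentials are not usable
credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default")
print(f"Token acquired, expires at (epoch): {token.expires_on}")

The output below is from a full run of the harvesting script against my personal subscription.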

2025-07-26 20:20:10.989 [INFO   ] Initializing Azure credentials...
2025-07-26 20:20:10.991 [DEBUG  ] Connecting to SubscriptionClient...
2025-07-26 20:20:10.991 [INFO   ] Calculating date ranges for cost analysis...
2025-07-26 20:20:10.991 [DEBUG  ] Generated date range: 2025-02-01 to 2025-02-28 (0.000s)
2025-07-26 20:20:10.991 [DEBUG  ] Generated date range: 2025-03-01 to 2025-03-31 (0.000s)
2025-07-26 20:20:10.991 [DEBUG  ] Generated date range: 2025-04-01 to 2025-04-30 (0.000s)
2025-07-26 20:20:10.991 [DEBUG  ] Generated date range: 2025-05-01 to 2025-05-31 (0.000s)
2025-07-26 20:20:10.991 [DEBUG  ] Generated date range: 2025-06-01 to 2025-06-30 (0.000s)
2025-07-26 20:20:10.991 [DEBUG  ] Generated date range: 2025-07-01 to 2025-07-31 (0.000s)
2025-07-26 20:20:10.991 [INFO   ] Processing 6 months: ['Feb 2025', 'Mar 2025', 'Apr 2025', 'May 2025', 'Jun 2025', 'Jul 2025']
2025-07-26 20:20:10.991 [INFO   ] Initializing output file: azure_cost_report.csv
2025-07-26 20:20:10.992 [DEBUG  ] Writing CSV headers: ['Subscription ID', 'Subscription Name', 'Advisor Score', 'Est. Monthly Savings', 'Feb 2025 Cost', 'Mar 2025 Cost', 'Apr 2025 Cost', 'May 2025 Cost', 'Jun 2025 Cost', 'Jul 2025 Cost', 'Top 5 Services (Current Month)']
2025-07-26 20:20:13.743 [INFO   ] Found 1 subscriptions to process
2025-07-26 20:20:13.743 [DEBUG  ] 
############################################################
2025-07-26 20:20:13.743 [INFO   ] PROCESSING SUBSCRIPTION 1/1
2025-07-26 20:20:13.743 [INFO   ] Display Name: Baldacchino - Personal
2025-07-26 20:20:13.743 [DEBUG  ] Subscription ID: xxxxxxxxxx
2025-07-26 20:20:13.743 [DEBUG  ] Processing Started: 20:20:13
2025-07-26 20:20:13.743 [INFO   ] Initializing Advisor recommendations analysis...
2025-07-26 20:20:13.744 [DEBUG  ] Fetching recommendations...
2025-07-26 20:20:14.983 [INFO   ] Retrieved 2 recommendations in 1.24s
2025-07-26 20:20:14.983 [INFO   ] Analyzing cost recommendations...
2025-07-26 20:20:14.983 [DEBUG  ] Recommendation properties: {'recommendationSubCategory': 'MonitoringAndAlerting', 'maturityLevel': 'GA', 'recommendationOfferingId': 'ed05f3b7-8949-4698-9b38-49ab8abb5ee0'}
2025-07-26 20:20:14.983 [DEBUG  ] Skipping non-cost recommendation: /subscriptions/8c2771cb-048c-401c-a465-3936aee29f05/providers/Microsoft.Advisor/recommendations/09fe3804-16f9-6077-4c40-2632b45392a2
2025-07-26 20:20:14.983 [DEBUG  ] Recommendation properties: {'region': 'eastus', 'recommendationSubCategory': 'HighAvailability', 'maturityLevel': 'GA', 'recommendationOfferingId': '021d60de-795e-468a-953b-cb0067f8542e'}
2025-07-26 20:20:14.983 [DEBUG  ] Skipping non-cost recommendation: /subscriptions/8c2771cb-048c-401c-a465-3936aee29f05/resourcegroups/baldacchino_personal/providers/microsoft.network/virtualnetworks/ubuntupremssdv2-vnet/providers/Microsoft.Advisor/recommendations/53c869a3-f373-d793-5721-68e7b225c836
2025-07-26 20:20:14.983 [INFO   ] Cost analysis completed - 0 relevant recommendations
2025-07-26 20:20:14.983 [DEBUG  ] Savings Breakdown:
  Annual Conversions: 0 recs ($0.00)
  Direct Monthly: 0 recs ($0.00)
  Generic Savings: 0 recs ($0.00)
2025-07-26 20:20:14.983 [INFO   ] No cost recommendations for scoring
2025-07-26 20:20:14.983 [INFO   ] 
Starting historical cost analysis...
2025-07-26 20:20:14.984 [INFO   ] Processing month 1/6: Feb 2025
2025-07-26 20:20:14.984 [DEBUG  ] Executing cost query for Feb 2025...
2025-07-26 20:20:14.984 [DEBUG  ] Initiating Cost Query Feb 2025...
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:20:20.424 [DEBUG  ] Cost Query Feb 2025 succeeded in 5.44s (attempt 1)
2025-07-26 20:20:20.425 [DEBUG  ] Retrieved cost data: $2.17
2025-07-26 20:20:20.425 [INFO   ] Month Feb 2025 cost: $2.17
2025-07-26 20:20:20.425 [DEBUG  ] Month 1/6 completed in 5.44s - Waiting 2.8s
2025-07-26 20:20:23.247 [INFO   ] Processing month 2/6: Mar 2025
2025-07-26 20:20:23.247 [DEBUG  ] Executing cost query for Mar 2025...
2025-07-26 20:20:23.247 [DEBUG  ] Initiating Cost Query Mar 2025...
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:20:26.292 [DEBUG  ] Cost Query Mar 2025 succeeded in 3.04s (attempt 1)
2025-07-26 20:20:26.292 [DEBUG  ] Retrieved cost data: $8.72
2025-07-26 20:20:26.292 [INFO   ] Month Mar 2025 cost: $8.72
2025-07-26 20:20:26.292 [DEBUG  ] Month 2/6 completed in 3.05s - Waiting 4.0s
2025-07-26 20:20:30.261 [INFO   ] Processing month 3/6: Apr 2025
2025-07-26 20:20:30.261 [DEBUG  ] Executing cost query for Apr 2025...
2025-07-26 20:20:30.261 [DEBUG  ] Initiating Cost Query Apr 2025...
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:20:33.488 [DEBUG  ] Cost Query Apr 2025 succeeded in 3.23s (attempt 1)
2025-07-26 20:20:33.488 [DEBUG  ] Retrieved cost data: $5.00
2025-07-26 20:20:33.488 [INFO   ] Month Apr 2025 cost: $5.00
2025-07-26 20:20:33.488 [DEBUG  ] Month 3/6 completed in 3.23s - Waiting 2.7s
2025-07-26 20:20:36.228 [INFO   ] Processing month 4/6: May 2025
2025-07-26 20:20:36.228 [DEBUG  ] Executing cost query for May 2025...
2025-07-26 20:20:36.229 [DEBUG  ] Initiating Cost Query May 2025...
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:20:38.865 [DEBUG  ] Cost Query May 2025 succeeded in 2.64s (attempt 1)
2025-07-26 20:20:38.865 [DEBUG  ] Retrieved cost data: $7.94
2025-07-26 20:20:38.865 [INFO   ] Month May 2025 cost: $7.94
2025-07-26 20:20:38.865 [DEBUG  ] Month 4/6 completed in 2.64s - Waiting 3.3s
2025-07-26 20:20:42.173 [INFO   ] Processing month 5/6: Jun 2025
2025-07-26 20:20:42.173 [DEBUG  ] Executing cost query for Jun 2025...
2025-07-26 20:20:42.173 [DEBUG  ] Initiating Cost Query Jun 2025...
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:20:43.071 [WARNING] RATE LIMIT: Cost Query Jun 2025 | Attempt 1/10 | Retry in 3.5s | Error: (429) Too many requests. Please retry.
Code: 429
Message: Too many requests. Please retry.
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:20:48.024 [WARNING] RATE LIMIT: Cost Query Jun 2025 | Attempt 2/10 | Retry in 8.6s | Error: (429) Too many requests. Please retry.
Code: 429
Message: Too many requests. Please retry.
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:20:57.931 [WARNING] RATE LIMIT: Cost Query Jun 2025 | Attempt 3/10 | Retry in 12.2s | Error: (429) Too many requests. Please retry.
Code: 429
Message: Too many requests. Please retry.
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:21:11.135 [WARNING] RATE LIMIT: Cost Query Jun 2025 | Attempt 4/10 | Retry in 24.3s | Error: (429) Too many requests. Please retry.
Code: 429
Message: Too many requests. Please retry.
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:21:40.536 [DEBUG  ] Cost Query Jun 2025 succeeded in 5.06s (attempt 5)
2025-07-26 20:21:40.536 [DEBUG  ] Retrieved cost data: $2.20
2025-07-26 20:21:40.536 [INFO   ] Month Jun 2025 cost: $2.20
2025-07-26 20:21:40.536 [DEBUG  ] Month 5/6 completed in 58.36s - Waiting 3.0s
2025-07-26 20:21:43.499 [INFO   ] Processing month 6/6: Jul 2025
2025-07-26 20:21:43.500 [DEBUG  ] Executing cost query for Jul 2025...
2025-07-26 20:21:43.500 [DEBUG  ] Initiating Cost Query Jul 2025...
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:21:46.522 [DEBUG  ] Cost Query Jul 2025 succeeded in 3.02s (attempt 1)
2025-07-26 20:21:46.522 [DEBUG  ] Retrieved cost data: $1.82
2025-07-26 20:21:46.522 [INFO   ] Month Jul 2025 cost: $1.82
2025-07-26 20:21:46.522 [DEBUG  ] Month 6/6 completed in 3.02s - Waiting 2.6s
2025-07-26 20:21:49.156 [INFO   ] 
Analyzing service costs for current month...
2025-07-26 20:21:49.157 [DEBUG  ] Executing service breakdown query...
2025-07-26 20:21:49.157 [DEBUG  ] Initiating Service Breakdown...
Datetime with no tzinfo will be considered UTC.
Datetime with no tzinfo will be considered UTC.
2025-07-26 20:21:52.138 [DEBUG  ] Service Breakdown succeeded in 2.98s (attempt 1)
2025-07-26 20:21:52.138 [INFO   ] Top services identified: Azure DNS ($1.41), Storage ($0.41)
2025-07-26 20:21:52.138 [INFO   ] 
Finalizing data for CSV export...
2025-07-26 20:21:52.138 [DEBUG  ] Writing CSV row: ['8c2771cb-048c-401c-a465-3936aee29f05', 'Baldacchino - Personal', 'N/A', 0.0, 2.17, 8.72, 5.0, 7.94, 2.2, 1.82, 'Azure DNS ($1.41), Storage ($0.41)']
2025-07-26 20:21:52.139 [INFO   ] Data successfully committed to CSV
2025-07-26 20:21:52.139 [INFO   ] Subscription completed in 98.4s
Progress: 1/1 (100.0%)
Elapsed: 1.7m | ETA: 0.0m | Remaining: 0 subs
2025-07-26 20:21:52.140 [INFO   ] 
============================================================
2025-07-26 20:21:52.140 [INFO   ] PROCESSING COMPLETE
2025-07-26 20:21:52.140 [INFO   ] Total subscriptions processed: 1
2025-07-26 20:21:52.140 [INFO   ] Total execution time: 101.1s (1.7m)
2025-07-26 20:21:52.140 [INFO   ] Output file: azure_cost_report.csv
2025-07-26 20:21:52.140 [INFO   ] ============================================================



You will see in the code below that I am implementing exponential back-offs. This is needed to avoid the 429 throttling shown above.
This code will:

  • Enumerate every subscription
    • Advisor : Set the scope of Azure Advisor to the subscription level
      • Retrieve the set of recommendations from Azure Advisor, calculate the savings per recommendation and the total savings
    • Cost History : Calculate the past 6 months of costs, broken down by month
      • Exponential back-offs are required as this process makes multiple calls to the Azure Cost Management API
    • Top 5 Services : The current month’s cost is broken down into Azure service families, showing the families (Compute etc.) that make up the bulk of the subscription’s costs

This code will take some time to run. Across 150 subscriptions it takes close to 4 hours of run time. Upon completion it will create a CSV called ‘azure_cost_report.csv’. This CSV contains the raw data.

import csv
import time
import random
from datetime import datetime
from dateutil.relativedelta import relativedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import SubscriptionClient
from azure.mgmt.costmanagement import CostManagementClient
from azure.mgmt.advisor import AdvisorManagementClient
from azure.core.exceptions import HttpResponseError

# Configuration
MAX_RETRIES = 10
INITIAL_DELAY = 2.0
MAX_DELAY = 60.0
JITTER_FACTOR = 0.3
VERBOSE_LOGGING = True

def log(message, level="INFO", verbose=False):
    """Enhanced logging with verbosity control and microsecond precision"""
    if verbose and not VERBOSE_LOGGING:
        return
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    print(f"{timestamp} [{level.ljust(7)}] {message}")

def exponential_backoff(retries):
    """Calculate backoff delay with jitter"""
    delay = min(INITIAL_DELAY * (2 ** retries), MAX_DELAY)
    jitter = delay * JITTER_FACTOR * random.uniform(-1, 1)
    return max(delay + jitter, 0.1)

def query_with_retry(client, scope, query_params, operation_name):
    """Execute query with detailed retry logging"""
    log(f"Initiating {operation_name}...", "DEBUG", verbose=True)
    for attempt in range(1, MAX_RETRIES+1):
        try:
            attempt_start = time.time()
            result = client.query.usage(scope=scope, parameters=query_params)
            elapsed = time.time() - attempt_start
            log(f"{operation_name} succeeded in {elapsed:.2f}s (attempt {attempt})", "DEBUG", verbose=True)
            return result
        except HttpResponseError as e:
            if e.status_code == 429:
                delay = exponential_backoff(attempt)
                log((f"RATE LIMIT: {operation_name} | "
                     f"Attempt {attempt}/{MAX_RETRIES} | "
                     f"Retry in {delay:.1f}s | "
                     f"Error: {e.message}"), "WARNING")
                time.sleep(delay)
                continue
            log(f"HTTP Error {e.status_code}: {e.message}", "ERROR")
            raise
        except Exception as e:
            log(f"Critical failure during {operation_name}: {str(e)}", "CRITICAL")
            raise
    return None

def format_currency(amount):
    """Format currency with thousands separators"""
    return f"${amount:,.2f}"

# Main execution flow
log("Initializing Azure credentials...", "INFO")
credential = DefaultAzureCredential()

log("Connecting to SubscriptionClient...", "DEBUG", verbose=True)
subscription_client = SubscriptionClient(credential)

# Generate 6-month date ranges
log("Calculating date ranges for cost analysis...", "INFO")
today = datetime.utcnow()
six_months = []
for i in range(5, -1, -1):
    calculation_start = time.time()
    target_date = today - relativedelta(months=i)
    first_day = target_date.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    last_day = (first_day + relativedelta(months=1)) - relativedelta(days=1)
    six_months.append({
        'name': first_day.strftime("%b %Y"),
        'start': first_day,
        'end': last_day
    })
    log(f"Generated date range: {first_day.date()} to {last_day.date()} ({time.time()-calculation_start:.3f}s)", 
        "DEBUG", verbose=True)

log(f"Processing {len(six_months)} months: {[m['name'] for m in six_months]}", "INFO")

# CSV Setup
CSV_PATH = 'azure_cost_report.csv'
log(f"Initializing output file: {CSV_PATH}", "INFO")
with open(CSV_PATH, 'w', newline='', buffering=1) as csv_file:
    writer = csv.writer(csv_file)
    
    # Build headers
    headers = [
        'Subscription ID', 
        'Subscription Name', 
        'Advisor Score',
        'Est. Monthly Savings'
    ]
    headers += [f"{m['name']} Cost" for m in six_months]
    headers.append('Top 5 Services (Current Month)')
    
    log(f"Writing CSV headers: {headers}", "DEBUG", verbose=True)
    writer.writerow(headers)
    csv_file.flush()

    total_subscriptions = 0
    global_start = time.time()
    subscription_list = list(subscription_client.subscriptions.list())
    
    log(f"Found {len(subscription_list)} subscriptions to process", "INFO")
    
    for sub in subscription_list:
        sub_id = sub.subscription_id
        sub_name = sub.display_name
        total_subscriptions += 1
        sub_start = time.time()
        
        log(f"\n{'#' * 60}", "DEBUG", verbose=True)
        log(f"PROCESSING SUBSCRIPTION {total_subscriptions}/{len(subscription_list)}", "INFO")
        log(f"Display Name: {sub_name}", "INFO")
        log(f"Subscription ID: {sub_id}", "DEBUG", verbose=True)
        log(f"Processing Started: {datetime.now().strftime('%H:%M:%S')}", "DEBUG", verbose=True)

        # Advisor Recommendations Analysis
        log("Initializing Advisor recommendations analysis...", "INFO")
        advisor_client = AdvisorManagementClient(credential, sub_id)
        cost_recommendations = []
        total_monthly_savings = 0.0
        cost_score = "N/A"
        
        try:
            log("Fetching recommendations...", "DEBUG", verbose=True)
            rec_start = time.time()
            recommendations = list(advisor_client.recommendations.list())
            rec_duration = time.time() - rec_start
            log(f"Retrieved {len(recommendations)} recommendations in {rec_duration:.2f}s", "INFO")

            savings_breakdown = {
                'annual': {'count': 0, 'amount': 0.0},
                'monthly': {'count': 0, 'amount': 0.0},
                'generic': {'count': 0, 'amount': 0.0}
            }

            log("Analyzing cost recommendations...", "INFO")
            # In the recommendation processing loop:
            for idx, recommendation in enumerate(recommendations, 1):
                try:
                    props = getattr(recommendation, 'extended_properties', {}) or {}
                    log(f"Recommendation properties: {props}", "DEBUG", verbose=True)
                    
                    # Validate savings fields
                    valid_saving_keys = ['savingsAmount', 'annualSavingsAmount', 'monthlySavingsAmount']
                    has_valid_savings = any(float(props.get(k, 0)) > 0 for k in valid_saving_keys if k in props)
                    
                    if recommendation.category == 'Cost' and has_valid_savings:
                        cost_recommendations.append(recommendation)
                        
                        # Log recommendation details
                        if hasattr(recommendation, 'short_description') and hasattr(recommendation.short_description, 'text'):
                            desc = recommendation.short_description.text
                        else:
                            desc = "No description available"
                        log(f"Cost Recommendation {idx}: {desc}", "INFO")
                        
                        # Detailed savings processing
                        try:
                            savings = 0.0
                            if 'annualSavingsAmount' in props:
                                savings = float(props['annualSavingsAmount']) / 12
                                savings_breakdown['annual']['count'] += 1
                                savings_breakdown['annual']['amount'] += savings
                            elif 'monthlySavingsAmount' in props:
                                savings = float(props['monthlySavingsAmount'])
                                savings_breakdown['monthly']['count'] += 1
                                savings_breakdown['monthly']['amount'] += savings
                            elif 'savingsAmount' in props:
                                savings = float(props['savingsAmount'])
                                savings_breakdown['generic']['count'] += 1
                                savings_breakdown['generic']['amount'] += savings
                            
                            total_monthly_savings += savings
                            log(f"Applied savings: {format_currency(savings)}", "DEBUG", verbose=True)
                            
                        except Exception as e:
                            log(f"Failed to process savings for recommendation {recommendation.id}: {str(e)}", "ERROR")
                            
                    else:
                        log(f"Skipping non-cost recommendation: {recommendation.id}", "DEBUG", verbose=True)
                        
                except Exception as e:
                    log(f"General error processing recommendation {idx}: {str(e)}", "ERROR")

            log(f"Cost analysis completed - {len(cost_recommendations)} relevant recommendations", "INFO")
            log(("Savings Breakdown:\n" 
                 f"  Annual Conversions: {savings_breakdown['annual']['count']} recs ({format_currency(savings_breakdown['annual']['amount'])})\n"
                 f"  Direct Monthly: {savings_breakdown['monthly']['count']} recs ({format_currency(savings_breakdown['monthly']['amount'])})\n"
                 f"  Generic Savings: {savings_breakdown['generic']['count']} recs ({format_currency(savings_breakdown['generic']['amount'])})"), 
                 "DEBUG", verbose=True)

            # Impact score calculation
            if cost_recommendations:
                log("Calculating Advisor impact score...", "DEBUG", verbose=True)
                impact_levels = {'High': 3, 'Medium': 2, 'Low': 1}
                try:
                    cost_score = max(
                        cost_recommendations,
                        key=lambda x: impact_levels.get(getattr(x, 'impact', 'Low'), 0)
                    ).impact
                    log(f"Determined Advisor score: {cost_score}", "INFO")
                except Exception as e:
                    log(f"Score calculation error: {str(e)}", "ERROR")
                    cost_score = "Error"
            else:
                log("No cost recommendations for scoring", "INFO")

        except Exception as e:
            log(f"Advisor processing failed: {str(e)}", "ERROR")

        # Cost Data Collection
        log("\nStarting historical cost analysis...", "INFO")
        cost_client = CostManagementClient(credential)
        monthly_costs = []
        
        for month_idx, month in enumerate(six_months, 1):
            month_start = time.time()
            log(f"Processing month {month_idx}/6: {month['name']}", "INFO")
            
            try:
                query = {
                    "type": "Usage",
                    "timeframe": "Custom",
                    "timePeriod": {
                        "from": month['start'],
                        "to": month['end']
                    },
                    "dataset": {
                        "granularity": "None",
                        "aggregation": {
                            "totalCost": {
                                "name": "PreTaxCost",
                                "function": "Sum"
                            }
                        }
                    }
                }

                log(f"Executing cost query for {month['name']}...", "DEBUG", verbose=True)
                result = query_with_retry(
                    cost_client,
                    f"/subscriptions/{sub_id}",
                    query,
                    f"Cost Query {month['name']}"
                )

                if result and result.rows:
                    cost = sum(row[0] for row in result.rows)
                    log(f"Retrieved cost data: {format_currency(cost)}", "DEBUG", verbose=True)
                else:
                    cost = 0.0
                    log("No cost data available", "WARNING")
                
                monthly_costs.append(round(cost, 2))
                log(f"Month {month['name']} cost: {format_currency(cost)}", "INFO")

            except Exception as e:
                monthly_costs.append(0.0)
                log(f"Month {month['name']} failed: {str(e)}", "ERROR")
            
            finally:
                delay = random.uniform(1.5, 4.0)
                log(f"Month {month_idx}/6 completed in {time.time()-month_start:.2f}s - Waiting {delay:.1f}s", "DEBUG", verbose=True)
                time.sleep(delay)

        # Service Breakdown Analysis
        log("\nAnalyzing service costs for current month...", "INFO")
        current_month = six_months[-1]
        top_services = []
        
        try:
            service_query = {
                "type": "Usage",
                "timeframe": "Custom",
                "timePeriod": {
                    "from": current_month['start'],
                    "to": current_month['end']
                },
                "dataset": {
                    "granularity": "None",
                    "aggregation": {
                        "totalCost": {
                            "name": "PreTaxCost",
                            "function": "Sum"
                        }
                    },
                    "grouping": [
                        {
                            "type": "Dimension",
                            "name": "ServiceName"
                        }
                    ]
                }
            }

            log("Executing service breakdown query...", "DEBUG", verbose=True)
            result = query_with_retry(
                cost_client,
                f"/subscriptions/{sub_id}",
                service_query,
                "Service Breakdown"
            )

            if result and result.rows:
                sorted_services = sorted(result.rows, key=lambda x: x[0], reverse=True)[:5]
                top_services = [f"{svc[1]} ({format_currency(svc[0])})" for svc in sorted_services]
                log(f"Top services identified: {', '.join(top_services)}", "INFO")
            else:
                log("No service breakdown data available", "WARNING")

        except Exception as e:
            log(f"Service analysis failed: {str(e)}", "ERROR")

        # CSV Writing
        log("\nFinalizing data for CSV export...", "INFO")
        csv_row = [
            sub_id,
            sub_name,
            cost_score,
            round(total_monthly_savings, 2),
            *monthly_costs,
            ", ".join(top_services) if top_services else "No data"
        ]
        
        log(f"Writing CSV row: {csv_row}", "DEBUG", verbose=True)
        writer.writerow(csv_row)
        csv_file.flush()
        log("Data successfully committed to CSV", "INFO")

        # Progress Tracking
        sub_elapsed = time.time() - sub_start
        total_elapsed = time.time() - global_start
        avg_time_per_sub = total_elapsed / total_subscriptions
        remaining_subs = len(subscription_list) - total_subscriptions
        eta = avg_time_per_sub * remaining_subs
        
        log((
            f"Subscription completed in {sub_elapsed:.1f}s\n"
            f"Progress: {total_subscriptions}/{len(subscription_list)} "
            f"({(total_subscriptions/len(subscription_list)*100):.1f}%)\n"
            f"Elapsed: {total_elapsed/60:.1f}m | "
            f"ETA: {eta/60:.1f}m | "
            f"Remaining: {remaining_subs} subs"
        ), "INFO")

log("\n" + "=" * 60, "INFO")
log("PROCESSING COMPLETE", "INFO", verbose=True)
log(f"Total subscriptions processed: {total_subscriptions}", "INFO")
log(f"Total execution time: {time.time()-global_start:.1f}s ({((time.time()-global_start)/60):.1f}m)", "INFO")
log(f"Output file: {CSV_PATH}", "INFO")
log("=" * 60, "INFO")


Drawing Insights With Python (Pandas + Matplotlib)
After we have our CSV full of raw data, we want to draw insights. When I authored this code, what was important to me was to find those subscriptions with high rates of change over time. I am not so fussed about subscriptions with high spend if they have historically been spending large. What I want to know about is subscriptions that have increased spend both in absolute dollars and in percentage terms.
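To illustrate why I rank on both measures, here is a toy example (hypothetical numbers, not taken from any real report): a small subscription that doubles its spend tops the percentage ranking, while a large subscription with a modest percentage uplift tops the absolute ranking.

# Toy illustration (hypothetical numbers): why both absolute and percentage change matter
subs = {
    "Sandbox": (10.0, 20.0),          # small spend, doubled
    "Production": (10000.0, 11000.0)  # large spend, modest growth
}
for name, (start, end) in subs.items():
    absolute = end - start
    percentage = (absolute / start) * 100
    print(f"{name}: +${absolute:,.2f} ({percentage:.0f}%)")
# Sandbox tops the percentage ranking (100%), Production tops the absolute ranking (+$1,000)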

So for this we need to break out some core data engineering Python libraries: Pandas, Matplotlib and NumPy. The following code will analyse azure_cost_report.csv and allow you to draw some quick insights.

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import numpy as np

# Function to parse and analyze the CSV data (updated for wide format)
def analyze_subscription_costs(csv_file):
    # Read the CSV data
    df = pd.read_csv(csv_file)

    # Standardize column names (remove leading/trailing spaces)
    df.columns = [col.strip() for col in df.columns]

    # Identify monthly cost columns (e.g., 'Feb 2025 Cost', ...)
    month_cols = [col for col in df.columns if 'Cost' in col and 'Est.' not in col and 'Savings' not in col]

    # Melt the dataframe to long format
    df_long = df.melt(
        id_vars=['Subscription Name'],
        value_vars=month_cols,
        var_name='Month',
        value_name='Cost'
    )

    # Clean up Month column to just the month (e.g., 'Feb 2025')
    df_long['Month'] = df_long['Month'].str.replace(' Cost', '', regex=False)
    # Parse month to datetime for sorting
    df_long['Month'] = pd.to_datetime(df_long['Month'], format='%b %Y')
    df_long['Cost'] = pd.to_numeric(df_long['Cost'], errors='coerce').fillna(0)

    # Group by subscription and month, summing the costs
    monthly_costs = df_long.groupby(['Month', 'Subscription Name'])['Cost'].sum().reset_index()

    # Get list of unique months and subscriptions
    months = sorted(df_long['Month'].unique())
    subscriptions = df_long['Subscription Name'].unique()

    # Create a results dataframe to store the changes
    results = []

    for subscription in subscriptions:
        # Skip empty subscription names
        if not subscription:
            continue
            
        # Get data for this subscription
        sub_data = monthly_costs[monthly_costs['Subscription Name'] == subscription]
        
        if len(sub_data) <= 1:
            continue  # Skip if not enough data for comparison
            
        # Sort by month
        sub_data = sub_data.sort_values('Month')
        
        # Calculate first and last month costs
        first_month = sub_data.iloc[0]['Month']
        last_month = sub_data.iloc[-1]['Month']
        first_cost = sub_data.iloc[0]['Cost']
        last_cost = sub_data.iloc[-1]['Cost']
        
        # Calculate overall change
        absolute_change = last_cost - first_cost
        if first_cost > 0:
            percentage_change = (absolute_change / first_cost) * 100
        elif absolute_change > 0:
            percentage_change = float('inf')
        else:
            percentage_change = 0
        
        # Calculate month-on-month changes
        mom_changes = []
        for i in range(1, len(sub_data)):
            prev_cost = sub_data.iloc[i-1]['Cost']
            curr_cost = sub_data.iloc[i]['Cost']
            mom_change = (curr_cost - prev_cost) / prev_cost * 100 if prev_cost > 0 else 0
            mom_changes.append({
                'From': sub_data.iloc[i-1]['Month'],
                'To': sub_data.iloc[i]['Month'],
                'Change%': mom_change,
                'ChangeAmount': curr_cost - prev_cost
            })
        
        results.append({
            'SubscriptionName': subscription,
            'FirstMonth': first_month,
            'LastMonth': last_month,
            'StartCost': first_cost,
            'EndCost': last_cost,
            'AbsoluteChange': absolute_change,
            'PercentageChange': percentage_change,
            'MonthlyChanges': mom_changes
        })
    
    # Sort results by percentage change (descending)
    results_by_percentage = sorted(results, key=lambda x: x['PercentageChange'], reverse=True)
    
    # Sort results by absolute change (descending)
    results_by_absolute = sorted(results, key=lambda x: x['AbsoluteChange'], reverse=True)
    
    return results_by_percentage, results_by_absolute, months

# Function to print the results
def print_results(results_by_percentage, results_by_absolute):
    print("===== SUBSCRIPTIONS RANKED BY PERCENTAGE CHANGE (HIGHEST TO LOWEST) =====")
    for i, result in enumerate(results_by_percentage[:50], 1):  # Top 50
        print(f"{i}. {result['SubscriptionName']}")
        print(f"   Period: {result['FirstMonth']} to {result['LastMonth']}")
        print(f"   Start Cost: ${result['StartCost']:.2f} → End Cost: ${result['EndCost']:.2f}")
        pct = result['PercentageChange']
        pct_str = '∞' if pct == float('inf') else f"{pct:.2f}%"
        print(f"   Change: ${result['AbsoluteChange']:.2f} ({pct_str})")
        print("   Month-to-month changes:")
        for change in result['MonthlyChanges']:
            direction = "▲" if change['Change%'] > 0 else "▼"
            print(f"     {change['From']} → {change['To']}: {direction} ${change['ChangeAmount']:.2f} ({change['Change%']:.2f}%)")
        print()
    print("\n===== SUBSCRIPTIONS RANKED BY ABSOLUTE CHANGE (HIGHEST TO LOWEST) =====")
    for i, result in enumerate(results_by_absolute[:50], 1):  # Top 50
        print(f"{i}. {result['SubscriptionName']}")
        print(f"   Period: {result['FirstMonth']} to {result['LastMonth']}")
        print(f"   Start Cost: ${result['StartCost']:.2f} → End Cost: ${result['EndCost']:.2f}")
        pct = result['PercentageChange']
        pct_str = '∞' if pct == float('inf') else f"{pct:.2f}%"
        print(f"   Change: ${result['AbsoluteChange']:.2f} ({pct_str})")
        print()

# Function to visualize the top changing subscriptions
def plot_top_subscriptions(results_by_absolute, months, n=5):
    # Get top n subscriptions by absolute change
    top_subs = results_by_absolute[:n]
    
    plt.figure(figsize=(14, 8))
    
    # For each subscription, plot a line
    for result in top_subs:
        sub_name = result['SubscriptionName']
        monthly_data = {change['To']: change['ChangeAmount'] for change in result['MonthlyChanges']}
        
        # Extract the months and corresponding costs
        x = [str(m) for m in months if m >= result['FirstMonth'] and m <= result['LastMonth']]
        y = [monthly_data.get(m, 0) for m in months if m >= result['FirstMonth'] and m <= result['LastMonth']]
        
        plt.plot(x, y, marker='o', linewidth=2, label=f"{sub_name} (${result['AbsoluteChange']:.2f})")
    
    plt.title('Month-on-month Changes for Top Subscriptions')
    plt.xlabel('Month')
    plt.ylabel('Change Amount (AUD)')
    plt.grid(True, alpha=0.3)
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig('subscription_changes.png')
    plt.show()

# Main execution
csv_file = "azure_cost_report.csv"
results_by_percentage, results_by_absolute, months = analyze_subscription_costs(csv_file)
print_results(results_by_percentage, results_by_absolute)
plot_top_subscriptions(results_by_absolute, months)

# Generate a summary report

def generate_summary_report(results_by_absolute, csv_file):
    # Re-read and melt the data to get month totals
    df = pd.read_csv(csv_file)
    df.columns = [col.strip() for col in df.columns]
    month_cols = [col for col in df.columns if 'Cost' in col and 'Est.' not in col and 'Savings' not in col]
    df_long = df.melt(
        id_vars=['Subscription Name'],
        value_vars=month_cols,
        var_name='Month',
        value_name='Cost'
    )
    df_long['Month'] = df_long['Month'].str.replace(' Cost', '', regex=False)
    df_long['Month'] = pd.to_datetime(df_long['Month'], format='%b %Y')
    df_long['Cost'] = pd.to_numeric(df_long['Cost'], errors='coerce').fillna(0)
    # Calculate total cost per month
    month_totals = df_long.groupby('Month')['Cost'].sum().sort_index()
    # Calculate month-on-month $ and % change
    month_changes = month_totals.diff()
    month_pct_changes = month_totals.pct_change() * 100

    # Calculate month-on-month increase/decrease/unchanged counts for subscriptions
    # Aggregate costs by Subscription Name and Month to avoid duplicates
    grouped = df_long.groupby(['Subscription Name', 'Month'], as_index=False)['Cost'].sum()
    df_pivot = grouped.pivot(index='Subscription Name', columns='Month', values='Cost').fillna(0)
    month_labels = list(df_pivot.columns)
    mom_subs = []
    for i in range(1, len(month_labels)):
        increased = []
        decreased = []
        unchanged = []
        for sub in df_pivot.index:
            prev = df_pivot.loc[sub, month_labels[i-1]]
            curr = df_pivot.loc[sub, month_labels[i]]
            if curr > prev:
                increased.append(sub)
            elif curr < prev:
                decreased.append(sub)
            else:
                unchanged.append(sub)
        mom_subs.append({
            'from': month_labels[i-1].strftime('%B %Y'),
            'to': month_labels[i].strftime('%B %Y'),
            'increased': increased,
            'decreased': decreased,
            'unchanged': unchanged
        })

    # Calculate total change over all months (first to last)
    total_change = month_totals.iloc[-1] - month_totals.iloc[0]
    total_pct_change = (total_change / month_totals.iloc[0]) * 100 if month_totals.iloc[0] != 0 else 0

    total_increase = sum(r['AbsoluteChange'] for r in results_by_absolute if r['AbsoluteChange'] > 0)
    total_decrease = sum(r['AbsoluteChange'] for r in results_by_absolute if r['AbsoluteChange'] < 0)
    net_change = total_increase + total_decrease
    increase_count = len([r for r in results_by_absolute if r['AbsoluteChange'] > 0])
    decrease_count = len([r for r in results_by_absolute if r['AbsoluteChange'] < 0])
    same_count = len([r for r in results_by_absolute if r['AbsoluteChange'] == 0])

    print("\n===== SUMMARY REPORT =====")
    print(f"Total number of subscriptions analyzed: {len(results_by_absolute)}")
    print(f"Subscriptions with increased cost (total duration): {increase_count}")
    print(f"Subscriptions with decreased cost (total duration): {decrease_count}")
    print(f"Subscriptions with unchanged cost (total duration): {same_count}")
    print()
    for mom in mom_subs:
        print(f"{mom['from']} → {mom['to']}: ")
        print(f"  Subscriptions increased: {len(mom['increased'])}")
        print(f"    {', '.join(mom['increased']) if mom['increased'] else 'None'}")
        print(f"  Subscriptions decreased: {len(mom['decreased'])}")
        print(f"    {', '.join(mom['decreased']) if mom['decreased'] else 'None'}")
        print(f"  Subscriptions unchanged: {len(mom['unchanged'])}")
        print(f"    {', '.join(mom['unchanged']) if mom['unchanged'] else 'None'}")
    print()
    for i, (month, total) in enumerate(month_totals.items()):
        month_str = month.strftime('%B %Y')
        if i == 0:
            print(f"{month_str} - Total Cost For All Subscriptions: ${total:,.2f}")
        else:
            change = month_changes.iloc[i]
            pct = month_pct_changes.iloc[i]
            print(f"{month_str} - Total Cost For All Subscriptions: ${total:,.2f} | Change: ${change:,.2f} ({pct:+.2f}%) from prior month")
    print()
    print(f"Total increase amount (subscription level): ${total_increase:.2f}")
    print(f"Total decrease amount (subscription level): ${total_decrease:.2f}")
    print(f"Net change (subscription level): ${net_change:.2f}")
    print(f"Total change in overall cost (all months): ${total_change:,.2f} ({total_pct_change:+.2f}%)")

generate_summary_report(results_by_absolute, csv_file)

Summary

That’s it for part 1; the hard work is in these two snippets of code. One does the harvesting and the other helps make sense of the raw data, helping you find those needles in the haystack.

The true value isn’t in the Python code – it’s in the cultural transformation and visibility it provides, with builders seeing a direct connection between their architecture choices and financial outcomes.

In Part 2 of this post we will create visual output, making the text output here really pop.

Thanks
Shane Baldacchino
