We do a lot of Azure resource automation these days, and one of the things you need to be careful of is accidentally deleting something. The same risk applies when someone deletes something manually by mistake.

Azure has a feature called resource locks, which is pretty handy. We also use PIM and RBAC controls on all of our resources, which deals with a lot of the risk of someone accidentally deleting something manually, but automation still carries a risk.

One example where you can get bitten with Terraform: if the plan detects that a certain property has changed, then depending on the resource and property it will sometimes propose replacing the resource rather than updating it in place. This happened to me once in my dev environment. I wanted to change a property on APIM, didn't read the plan carefully enough, and it replaced APIM rather than updating it. This is one of the reasons we use resource locks as a backup.
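Locks are our backup, but you can also try to catch replacements before applying. As a sketch (not something from our pipeline), the function below reads the JSON form of a saved plan and lists any resource whose planned actions include a delete, which is how a replacement shows up. It assumes Terraform's JSON plan format, where each entry in `resource_changes` has an `address` and a `change.actions` array. The function name `Get-DestructiveChange` is hypothetical.

```powershell
# Hypothetical helper: list resources that a saved Terraform plan would delete or replace.
# Assumes the JSON produced by `terraform show -json <planfile>`.
function Get-DestructiveChange {
    param(
        [Parameter(Mandatory)]
        [string] $PlanJson
    )

    $plan = $PlanJson | ConvertFrom-Json

    foreach ($resourceChange in @($plan.resource_changes)) {
        if ($null -eq $resourceChange) { continue }

        $actions = @($resourceChange.change.actions)

        # A replacement is planned as delete + create (in either order); a plain delete is just "delete"
        if ($actions -contains 'delete') {
            $kind = if ($actions -contains 'create') { 'replace' } else { 'delete' }
            [pscustomobject]@{
                Address = $resourceChange.address
                Kind    = $kind
            }
        }
    }
}

# Example usage (requires the Terraform CLI):
#   terraform plan -out=tfplan
#   $planJson = terraform show -json tfplan | Out-String
#   Get-DestructiveChange -PlanJson $planJson
```

A pipeline step could fail whenever this returns anything, which forces someone to look at the plan before the apply runs.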

You can add a delete lock at the resource group level, which covers everything in the resource group. However, I think that can cause problems with things like deleting deployments, and I also want to be explicit about which things we allow to be deleted. If you remove a resource group level lock, everything in the group becomes deletable, so a mistake in your automation can still cause a problem.

Our automation process doesn't normally need to replace or delete anything on a day-to-day basis. When something does need to be removed and we hadn't identified it up front, the Terraform run fails because the resource lock stops the resource from being deleted. If we genuinely do want to remove the resource, we use Privileged Identity Management (PIM) to activate an eligible group that allows us to remove locks. We then remove the lock on the resources we want to allow to be deleted and run Terraform again.
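Once the PIM group is active, removing the lock on a single resource can be scripted. This is a minimal sketch. It assumes the lock was created by the locking script in this post (so it is named `LockSite`) and that your session is already signed in with `Connect-AzAccount`. `Remove-ResourceDeleteLock` is a hypothetical name.

```powershell
# Hypothetical helper: remove the delete lock from one resource so Terraform can delete or replace it.
# Needs an identity that is allowed to delete locks (e.g. via the PIM-activated group).
function Remove-ResourceDeleteLock {
    param(
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $ResourceType,
        [Parameter(Mandatory)] [string] $ResourceName,
        # Assumption: the lock name used by the locking script
        [string] $LockName = 'LockSite'
    )

    Remove-AzResourceLock -LockName $LockName `
        -ResourceGroupName $ResourceGroupName `
        -ResourceType $ResourceType `
        -ResourceName $ResourceName `
        -Force
}

# Example (placeholder names):
# Remove-ResourceDeleteLock -ResourceGroupName 'MyResourceGroupName-dev' -ResourceType 'Microsoft.ApiManagement/service' -ResourceName 'my-apim-dev'
```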

We found that when a lock is removed, people sometimes forget to put it back on. I wanted a script that runs regularly to check that everything I want locked actually has a lock on it. I knocked up a PowerShell script to do this, which also gives me a place to exclude a resource or resource type from being locked if there is a reason for that. We then set up an Azure DevOps pipeline to run it a few times per week to remove this risk.

A couple of people asked me to share how we did this, so the details are below.

PowerShell Script

# Environment name (lower case) is supplied by a pipeline variable
$mainEnvironmentName = $env:CommonSetting_EnvironmentName_LowerCase
$mainResourceGroupName = "MyResourceGroupName-" + $mainEnvironmentName

# Ensures a CanNotDelete lock exists on the given resource, creating one if none is found
function ApplyResourceLock([string] $resourceGroupName, [string] $resourceType, [string] $resourceName)
{
    $existingLock = Get-AzResourceLock -ResourceName $resourceName -ResourceType $resourceType -ResourceGroupName $resourceGroupName

    if ($null -eq $existingLock) {
        # No lock found, so add a delete lock named LockSite to this resource
        $newLock = New-AzResourceLock -LockLevel CanNotDelete -LockName LockSite -ResourceName $resourceName -ResourceType $resourceType -ResourceGroupName $resourceGroupName -Force
        Write-Host 'New lock created, ID:' $newLock.LockId
    }
    else {
        Write-Host 'A lock is already in place for resource' $resourceName ', lock ID:' $existingLock.LockId
    }
}

Write-Host 'Environment:' $mainEnvironmentName
Write-Host 'Resource group:' $mainResourceGroupName

# Names of resources that should NOT be locked (placeholder entry; replace with your own)
$resourcesToExclude = New-Object "System.Collections.Generic.List[System.String]"
$resourcesToExclude.Add('Add Any resources here you want to exclude' + $mainEnvironmentName)

$resources = Get-AzResource -ResourceGroupName $mainResourceGroupName
foreach ($resource in $resources) {
    Write-Host ''
    Write-Host '=========================='
    Write-Host 'Checking for Resource Lock'
    Write-Host 'Resource: ' $resource.Name
    Write-Host 'Resource Type: ' $resource.Type

    # Skip anything in the exclude list (-notcontains compares case-insensitively)
    if ($resourcesToExclude -notcontains $resource.Name) {
        # Check that a lock is applied to the resource, adding one if it is missing
        ApplyResourceLock -resourceName $resource.Name -resourceType $resource.Type -resourceGroupName $mainResourceGroupName
    }
}
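The script above only excludes by resource name. If you also want to skip whole resource types, one option is a small filter function like the sketch below. `Test-ShouldLock` and the `$typesToExclude` list are hypothetical additions, not part of the script we run.

```powershell
# Hypothetical filter: decide whether a resource should get a delete lock,
# based on lists of excluded resource names and excluded resource types.
function Test-ShouldLock {
    param(
        [Parameter(Mandatory)] [string] $ResourceName,
        [Parameter(Mandatory)] [string] $ResourceType,
        [string[]] $ExcludedNames = @(),
        [string[]] $ExcludedTypes = @()
    )

    # -notcontains compares strings case-insensitively
    return ($ExcludedNames -notcontains $ResourceName) -and ($ExcludedTypes -notcontains $ResourceType)
}

# Example wiring inside the foreach loop (type list is a placeholder):
# $typesToExclude = @('Microsoft.Insights/components')
# if (Test-ShouldLock -ResourceName $resource.Name -ResourceType $resource.Type `
#         -ExcludedNames $resourcesToExclude -ExcludedTypes $typesToExclude) {
#     ApplyResourceLock -resourceName $resource.Name -resourceType $resource.Type -resourceGroupName $mainResourceGroupName
# }
```

Keeping the decision in one function means the loop stays the same whichever way you choose to exclude things.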


DevOps Pipeline

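In the pipeline we install the Az PowerShell modules, because in this case we run it on our local build agent, which doesn't already have them. After that it's just a case of running the Azure PowerShell task with a service principal that has permission to read the resources and create locks in your environment. Creating or deleting locks needs `Microsoft.Authorization/locks/*` permissions, which the built-in Owner and User Access Administrator roles include.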
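On an agent without the Az modules, the install step can be a short inline PowerShell script. This is a sketch that assumes PowerShellGet's `Install-Module` can reach the PowerShell Gallery from the agent. It installs only `Az.Resources` (which provides `Get-AzResource`, `Get-AzResourceLock` and `New-AzResourceLock` and pulls in `Az.Accounts` as a dependency) and skips any module that is already present. `Install-ModuleIfMissing` is a hypothetical name.

```powershell
# Hypothetical helper: install a module from the PowerShell Gallery only if the agent doesn't already have it.
function Install-ModuleIfMissing {
    param(
        [Parameter(Mandatory)] [string] $Name
    )

    if (Get-Module -ListAvailable -Name $Name) {
        Write-Host "$Name is already installed"
    }
    else {
        Write-Host "Installing $Name"
        Install-Module -Name $Name -Scope CurrentUser -Repository PSGallery -Force -AllowClobber
    }
}

# Example: Az.Resources contains the resource and lock cmdlets used by the locking script
# Install-ModuleIfMissing -Name 'Az.Resources'
```

Installing just the module you need is quicker than installing the whole `Az` rollup on every run.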

Release Pipeline

In our case we are still using a classic release pipeline. We set up a scheduled release that runs the agent job against each of our environments a few times a week, which checks that no resource locks get missed.

 
