Sometimes a Virtual Machine breaks down and can’t be restored from a backup, either because no backup exists or because the backup failed for some reason. Fixing a non-bootable VM can be a complicated and time-consuming task. In this post I will describe how to fix a non-booting VM using az cli commands.
The first thing you should consider when creating any new Azure VM (it can be enabled for existing VMs too) is Boot Diagnostics. Boot Diagnostics lets you see a screenshot of your VM’s current state in the Azure Portal. Based on that, you can see what kind of error is preventing the VM from working normally (more about Boot Diagnostics can be found on the official Microsoft page: https://docs.microsoft.com/en-us/azure/virtual-machines/boot-diagnostics).
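For an existing VM, Boot Diagnostics can also be switched on from the command line. The following is a minimal sketch, assuming a recent az cli in which “az vm boot-diagnostics enable” works without a --storage argument (older versions may require a storage account). The resource group and VM names are placeholders, and the DRY_RUN switch and helper functions are my own additions for illustration. Note that get-boot-log returns the serial log; the screenshot itself is viewed in the Portal.

```shell
#!/bin/sh
# Sketch: enable Boot Diagnostics on an existing VM and read its boot log.
set -eu

DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the commands, 0 = execute them
RG="${RG:-broken_vm_resource_group_name}"   # placeholder resource group name
VM="${VM:-broken_vm_name}"                  # placeholder VM name

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = resource group, $2 = VM name
enable_boot_diagnostics() {
  run az vm boot-diagnostics enable -g "$1" -n "$2"
}

# $1 = resource group, $2 = VM name
show_boot_log() {
  run az vm boot-diagnostics get-boot-log -g "$1" -n "$2"
}

enable_boot_diagnostics "$RG" "$VM"
show_boot_log "$RG" "$VM"
```

By default the script only prints the two commands. Run it with DRY_RUN=0 (after logging in with az cli) to execute them.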
Once you have discovered the root cause of the issue, you can start remediating it. Microsoft provides a command that can help repair a non-booting Virtual Machine. Below I will describe two ways of fixing your VM using the “az vm repair” command (more information can be found on the related Microsoft page: https://docs.microsoft.com/en-us/troubleshoot/azure/virtual-machines/repair-windows-vm-using-azure-virtual-machine-repair-commands).
Repair script:
- Open Cloud Shell or a local PowerShell session and log in to Azure using az cli.
- Install the “vm-repair” extension by running: “az extension add -n vm-repair”.
- Create a new repair Virtual Machine by running: “az vm repair create -g broken_vm_resource_group_name -n broken_vm_name --repair-username new_repair_vm_username --repair-password new_repair_vm_password”. This command creates a new ResourceGroup with a repair VM, makes a copy of the broken VM’s OsDisk and attaches it as a DataDisk to the repair VM. You will be asked whether to create a Public IP for the repair VM. In this case we don’t need one.
- After the repair VM has been created, you can run one of several scripts that can help repair your VM. The --run-on-repair switch indicates that the script will run on the repair VM. In this example we choose a script that runs the sfc /scannow command: “az vm repair run -g broken_vm_resource_group_name -n broken_vm_name --run-on-repair --run-id win-sfc-sf-corruption --verbose”. All scripts are located in the Microsoft script library: https://github.com/Azure/repair-script-library. After the script finishes, its output is displayed. Based on that, you can decide whether to run any other scripts or finish the process.
- If you are satisfied with the script outcome, finish the process by swapping the OsDisk of your broken VM with its remediated copy from the repair VM. To complete that step, run: “az vm repair restore -g broken_vm_resource_group_name -n broken_vm_name --verbose”. You will be asked if you want to remove the repair VM and its ResourceGroup. In this case you can confirm the deletion.
- Now the broken VM should be working properly.
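The steps above can be collected into one small script. This is only a sketch under a few assumptions: the names and credentials are the same placeholders used in the steps, and the DRY_RUN and STEP variables and the function names are hypothetical helpers I introduce here, not part of az cli. The restore step is kept separate on purpose, so that you can read the script output before swapping the disk.

```shell
#!/bin/sh
# Sketch: the "repair script" workflow, split into a prepare and a restore step.
set -eu

DRY_RUN="${DRY_RUN:-1}"                          # 1 = only print the commands, 0 = execute them
STEP="${STEP:-prepare}"                          # "prepare" or "restore" (hypothetical switch)
RG="${RG:-broken_vm_resource_group_name}"        # placeholder
VM="${VM:-broken_vm_name}"                       # placeholder
REPAIR_USER="${REPAIR_USER:-new_repair_vm_username}"   # placeholder
REPAIR_PASS="${REPAIR_PASS:-new_repair_vm_password}"   # placeholder; dry-run prints it in clear text
RUN_ID="${RUN_ID:-win-sfc-sf-corruption}"        # script id from the repair script library

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_extension() {
  run az extension add -n vm-repair
}

create_repair_vm() {
  run az vm repair create -g "$RG" -n "$VM" \
    --repair-username "$REPAIR_USER" --repair-password "$REPAIR_PASS"
}

# $1 = id of the script to run on the repair VM
run_repair_script() {
  run az vm repair run -g "$RG" -n "$VM" --run-on-repair --run-id "$1" --verbose
}

restore_disk() {
  run az vm repair restore -g "$RG" -n "$VM" --verbose
}

case "$STEP" in
  prepare)
    install_extension
    create_repair_vm
    run_repair_script "$RUN_ID"
    ;;
  restore)
    restore_disk
    ;;
  *)
    echo "unknown STEP: $STEP" >&2
    exit 1
    ;;
esac
```

With the defaults the script only prints the commands of the prepare step. To execute them, run it with DRY_RUN=0, review the script output, and then run it again with STEP=restore. The interactive questions about the Public IP and the deletion of the repair VM still have to be answered by hand.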
Nested VM:
If running the repair script didn’t solve the issue, you can create a repair VM with nested Hyper-V enabled. This allows more advanced troubleshooting.
- Open Cloud Shell or a local PowerShell session and log in to Azure using az cli.
- Install the az extension by running: “az extension add -n vm-repair” (if it hasn’t been installed before).
- Create a new repair Virtual Machine by running: “az vm repair create -g broken_vm_resource_group_name -n broken_vm_name --repair-username new_repair_vm_username --repair-password new_repair_vm_password --enable-nested --verbose”. This command creates a new ResourceGroup with a repair VM inside, installs Hyper-V on the repair VM, makes a copy of your broken VM’s OsDisk and attaches it as the OsDisk of a Hyper-V VM inside the repair VM. You will be asked if you want to create a Public IP for the repair VM. In this case we need a Public IP.
Now you can RDP to your repair VM and inspect the nested VM.
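As a rough sketch, the nested variant and the lookup of the address to RDP to could look like this. The names of the repair VM and its ResourceGroup are generated by the create command, so REPAIR_RG and REPAIR_VM below are hypothetical placeholders that you would fill in from the command output or from the Portal. The DRY_RUN switch and the function names are likewise my own additions.

```shell
#!/bin/sh
# Sketch: create a repair VM with nested Hyper-V and look up its Public IP.
set -eu

DRY_RUN="${DRY_RUN:-1}"                          # 1 = only print the commands, 0 = execute them
RG="${RG:-broken_vm_resource_group_name}"        # placeholder
VM="${VM:-broken_vm_name}"                       # placeholder
REPAIR_USER="${REPAIR_USER:-new_repair_vm_username}"   # placeholder
REPAIR_PASS="${REPAIR_PASS:-new_repair_vm_password}"   # placeholder; dry-run prints it in clear text
REPAIR_RG="${REPAIR_RG:-repair_vm_resource_group_name}"  # hypothetical: take it from the create output
REPAIR_VM="${REPAIR_VM:-repair_vm_name}"                 # hypothetical: take it from the create output

# Print the command in dry-run mode (arguments are shown unquoted), otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_nested_repair_vm() {
  run az vm repair create -g "$RG" -n "$VM" \
    --repair-username "$REPAIR_USER" --repair-password "$REPAIR_PASS" \
    --enable-nested --verbose
}

# $1 = repair resource group, $2 = repair VM name
repair_vm_public_ip() {
  run az vm list-ip-addresses -g "$1" -n "$2" \
    --query "[0].virtualMachine.network.publicIpAddresses[0].ipAddress" -o tsv
}

create_nested_repair_vm
repair_vm_public_ip "$REPAIR_RG" "$REPAIR_VM"
```

The second command only returns an address if you answered yes when asked about the Public IP during creation.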
Repairing Nested VM using WindowsRE:
- Disable Internet Explorer and download the ISO file that matches your OS version (https://www.microsoft.com/en-us/evalcenter/download-windows-server-2019).
- Mount the ISO to your nested VM (remember to change the boot order so that the VM boots from the CD first).
- Launch the Windows Server setup wizard and choose “Repair your computer”.
- Select “Troubleshoot”.
- Select “Command Prompt”.
In this example I will run the “sfc /scannow” command; it has the same effect as the repair script. First, you need to find the drive letter associated with the Windows partition. To do that, I use the BCDEdit command. Once the correct drive letter is found, you can start the sfc scan by running “sfc /scannow /offbootdir=D:\ /offwindir=D:\Windows”.
After a couple of minutes we get the command results: all corrupted files were repaired. Now we can finish the whole process by swapping the OsDisk and removing the repair VM: “az vm repair restore -g broken-vm-01 -n broken-vm-01 --verbose”. We confirm the VM deletion.