Some Microsoft Azure customers running Ubuntu 18.04 virtual machines are continuing to experience problems, which Microsoft is attributing to DNS issues in the "Bionic" Ubuntu release. Problems began on August 30 at 2 a.m. ET and, according to Microsoft's Azure status page, are still affecting the Azure Kubernetes Service (AKS) and Azure Container Apps (ACA) as of 10:30 a.m. ET on August 31.
Microsoft officials said a bug in Ubuntu 18.04 "will lead to DNS resolution errors," and that "reports of this issue are confined to this single Ubuntu version."
Microsoft officials pointed to the bug report on the Canonical/Ubuntu website for a potential fix. The Azure status page also notes that customers can try rebooting the affected VM instances so that they receive a fresh DHCP lease and new DNS resolver(s).
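For admins trying to tell whether a given VM is hitting the DNS resolution errors described above, a quick check of the system resolver can help before and after a reboot. The following is a minimal Python sketch, not anything Microsoft or Canonical has published; the function names `can_resolve` and `unresolved` are hypothetical, and it assumes only the standard library's `socket.getaddrinfo`.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` defaults to the system resolver; it is a parameter only
    so the check can be exercised without network access (an assumption
    made for this sketch).
    """
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Raised when name resolution fails, which is the symptom
        # described on the Azure status page.
        return False


def unresolved(hostnames, resolver=socket.getaddrinfo):
    """Return the subset of hostnames that currently fail to resolve."""
    return [h for h in hostnames if not can_resolve(h, resolver)]
```

Calling `unresolved(["microsoft.com", "ubuntu.com"])` on a VM (the hostnames here are only illustrative) returns the names that failed. An empty list after a reboot would suggest the VM picked up working DNS resolvers, though it does not confirm that the underlying bug has been fixed.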
Microsoft officials said the company's engineering team deployed an auto-remediation for Azure Kubernetes Service (AKS) clusters. However, there has been a "subset of cases" where AKS nodes weren't covered by the auto-remediation detection and weren't fixed. The team is working on fixing all the clusters across all regions worldwide, officials said in an update just before 8 a.m. ET today, August 31.
Based on a BleepingComputer.com screen capture of the Azure status page from August 30, it appears that more Azure services were initially affected by the Ubuntu issue. While only AKS and ACA are listed as affected today, the screenshot indicates that Azure Database for PostgreSQL, Azure Monitor, Azure Sentinel and the Azure VMware Solution were also affected yesterday.
Microsoft officials said the offending Ubuntu updates have all been removed until further investigation is completed. They said they would provide further updates as warranted.
Update (August 31, 4 p.m. ET): Most of the issues appear to have been mitigated as of noon ET, according to Microsoft's Azure status history post. "We will continue to deliver communications via Azure Service Health for long-tail impact to the remaining subset of customers affected and publish a Preliminary Post Incident Review (PIR) within 3-days," officials said.