Microsoft Azure Application Gateway exposes your backend health API server

Firewall Weakness in Microsoft Azure’s Backplane Health Check

I decided to do this write-up because Microsoft doesn’t really give the full story on their website when describing why ports 65503-65534 need to be open to everything on the internet. Azure customers should be aware of this risk when deploying onto the Azure Cloud.

TLDR: Cloud providers need a way to connect to your instances in order to do health check monitoring for Service Level Agreements (SLA). However, one would not expect the Azure backplane to need public, internet-facing ports open for monitoring.

The Azure Application Gateway

On Azure you can use an Application Gateway for a variety of front-end services:

  • Web application firewall (WAF)
  • Load balancer
  • Cookie-based session affinity (Think: user -> HTTPS -> APPGW -> HTTP -> Backend)
  • SSL offload, centralized SSL settings, HTTP -> HTTPS redirection
  • URL based/multi-site routing
  • Health check monitoring
  • Web server and firewall logs
  • Firewall rules are applied here (or so I’ve been told.)

 Exhibit A: Network layout diagram

I couldn’t find much low-level information about the Application Gateway online. A Microsoft representative told me that the system runs on Windows, which isn’t a huge surprise; it is, after all, Microsoft. They said the system is fully managed by Microsoft, including regular patching. I’ve also been informed that the system uses certificate authentication and that its certificates are rotated regularly. I have doubts about their certificate authentication configuration, but I will get into that later.

Next: Network Security Groups (NSG)

Microsoft Azure provides a firewall feature that they call Network Security Groups (NSG). You can think of the NSG as a firewall running on a VM that sits in front of your Azure systems. Its rules can be applied to subnets, virtual machines, or interfaces. I’m told that these rules are pushed to the Application Gateway from the Azure backend. I can’t find any documentation to corroborate this. It leads me to wonder if the Application Gateway is the same virtual machine that Azure uses for NSG firewall rules.

The diagram in Exhibit A looks straightforward. However, when you scroll down the page where they explain the Application Gateway, you’ll see the following note:

“Note: If there is an NSG on [sic] Application Gateway subnet, port ranges 65503-65534 should be opened on the Application Gateway subnet for Inbound traffic. These ports are required for the backend health API to work.”
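To make the note concrete, here is a rough sketch of what such an inbound rule amounts to. The dictionary loosely follows the shape of an NSG security rule in an ARM template; the rule name and priority are made up for illustration, and the wildcard source is an assumption based on the claim that the source cannot be narrowed down.

```python
# Hypothetical rule, loosely in the shape of an ARM template securityRules entry.
# The name and priority are invented for illustration only.
health_probe_rule = {
    "name": "allow-appgw-health-ports",
    "properties": {
        "direction": "Inbound",
        "access": "Allow",
        "protocol": "Tcp",
        "sourceAddressPrefix": "*",  # assumption: any source, i.e. the whole internet
        "sourcePortRange": "*",
        "destinationAddressPrefix": "*",
        "destinationPortRange": "65503-65534",  # the range quoted in the note
        "priority": 100,
    },
}


def ports_in_range(port_range: str) -> range:
    """Expand a 'low-high' port range string into a range of port numbers."""
    low, high = (int(p) for p in port_range.split("-"))
    return range(low, high + 1)


exposed = ports_in_range(health_probe_rule["properties"]["destinationPortRange"])
print(len(exposed))  # 32
```

That is 32 TCP ports per Application Gateway subnet, reachable from any source address as long as the rule stays this broad.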

If you block the internet from connecting to these ports, Microsoft loses the ability to monitor the health of your systems. No one running anything critical could afford to do this, for obvious reasons. One would assume you could make firewall rules in your NSG to whitelist the Microsoft endpoints that query your Application Gateway. I’m told that isn’t possible as their addresses are dynamic. Some customers who were rightfully not okay with this have found a workaround where they grep through HTTP access logs to spot Microsoft systems and then dynamically open firewall ports to let the health check probes through the firewall. This is obviously not supported by Microsoft and is likely to fail.

What could go wrong if I leave these ports open?

The more open ports you have, the bigger your attack surface is. This gives attackers more opportunity to find weaknesses in your systems. In this case the cloud provider is forcing you to leave ports open for their ability to monitor your system health. There might not be a known issue in their Application Gateway today, but there could be one tomorrow. No software is completely secure, so it is always a good idea to limit exposure when possible.

Masscan, this is what you were made for

The IP blocks used by Azure for Application Gateways can be found fairly easily. For example, Azure offers free trials to try out their cloud offerings. I did an ARIN lookup on my Azure Application Gateway IP and found a few blocks associated with it: a /16, a /15, a /14, a /13, and a /11 CIDR block, which add up to 3,080,192 IP addresses. I’m only sitting on one Azure zone and haven’t looked up which IP addresses Azure uses for different zones. Microsoft doesn’t share how many customers they have in Azure, but scanning their entire network for systems might give you a rough figure. Doing this scan on a monthly basis could also give a rough estimate of how fast their cloud customer base is growing. With that said, one Application Gateway does not equal one customer. There can be many hosts sitting behind it.
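As a quick sanity check, the address count can be derived from the prefix lengths alone, since an IPv4 /n block holds 2^(32-n) addresses. This is only a sketch; the helper name is mine and the actual network addresses are left out.

```python
def block_size(prefix_len: int) -> int:
    """Number of IPv4 addresses in a CIDR block with the given prefix length."""
    return 2 ** (32 - prefix_len)


# Prefix lengths of the blocks returned by the ARIN lookup.
prefixes = [16, 15, 14, 13, 11]

total_addresses = sum(block_size(p) for p in prefixes)
print(total_addresses)  # 3080192
```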

Using masscan, an attacker could easily discover nearly all of Microsoft’s Azure customers who have an SLA (in practice, anyone with production workloads running in the Azure cloud).

Example masscan command:

$ sudo masscan -p65503 x.x.x.x/15 --rate 16000

If you are not familiar with masscan, it is an extremely fast port scanner. I personally have not tried scanning the Azure blocks; doing that would more than likely break their bug bounty rules. However, I can say an Azure-wide scan would probably not take very long to complete. After all, masscan claims you can scan the entire internet (4,294,967,296 IP addresses) on a single port per IP in about 10 hours with “--rate 100000”.
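To put rough numbers on that, a back-of-the-envelope estimate only needs the number of addresses, the number of ports, and the packet rate. The sketch below assumes one probe packet per address and port, no retries, and that the configured rate is actually sustained, so a real scan would be slower. The function name is mine, not masscan's.

```python
def scan_seconds(addresses: int, rate_pps: int, ports: int = 1) -> float:
    """Idealized scan duration: one packet per (address, port) at a fixed rate."""
    return addresses * ports / rate_pps


# Whole IPv4 space, one port, at 100,000 packets per second.
internet_hours = scan_seconds(2 ** 32, 100_000) / 3600
print(f"{internet_hours:.1f} hours")  # 11.9 hours

# The /16, /15, /14, /13 and /11 blocks, one port, at the 16,000 pps
# used in the example command.
azure_addresses = sum(2 ** (32 - p) for p in (16, 15, 14, 13, 11))
azure_minutes = scan_seconds(azure_addresses, 16_000) / 60
print(f"{azure_minutes:.1f} minutes")  # 3.2 minutes
```

The first figure lands in the same ballpark as masscan's own estimate. The second suggests that a single-port sweep of those five blocks is a matter of minutes, not hours, even at a modest rate.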

What are you exposing to the internet on port 65504?

When you connect to the Application Gateway health check API port (65504 for example), you’ll see the following output:

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=us-ascii
Server: Microsoft-HTTPAPI/2.0
Date: Tuesday, March 2107 08:92:31 EDT
Connection: close
Content-Length: 932

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Not Found</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Not Found</h2>
<hr><p>HTTP Error 404. The requested resource is not found.</p>
</BODY></HTML>
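The interesting parts of that response are the status line and the Server header. As far as I know, Microsoft-HTTPAPI/2.0 is the banner sent by the Windows HTTP Server API (HTTP.sys), which fits with the system running on Windows. Below is a minimal sketch of pulling those fields out of a raw response. The function name is mine, and the sample bytes are the headers shown above with the Date header and the body left out for brevity.

```python
def summarize_response(raw: bytes) -> dict:
    """Extract the status code, reason phrase and Server header from a raw HTTP response."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "status": int(status),
        "reason": reason,
        "server": headers.get("server"),
    }


sample = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html; charset=us-ascii\r\n"
    b"Server: Microsoft-HTTPAPI/2.0\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)

print(summarize_response(sample))
# {'status': 404, 'reason': 'Not Found', 'server': 'Microsoft-HTTPAPI/2.0'}
```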

If you were thinking about trying to reverse engineer these endpoints, I recommend you start soon. Once Microsoft locks them down, or blocks them entirely, you might not get another chance.

This endpoint is also hosted over unencrypted HTTP, which surprised me. Certificate authentication over insecure HTTP doesn’t make much sense: client certificates are normally presented and verified during the TLS handshake, and plain HTTP has no handshake in which to do that. Is this even possible? The application might accept the certificate over HTTP, but should it trust it? Going back to the claim Microsoft made about using certificate authentication for their remote monitoring, this doesn’t seem likely. So, either they aren’t using certificates, or I was given incorrect information by the Microsoft representative. Either way, it appears they are doing their authentication for health checks over HTTP.

Final thoughts

Microsoft made a mistake when designing how they locked down their internal SLA monitoring endpoints. Normally one would assume a cloud provider would have their backplane connect to their customers on some hidden address that is unknown to the outside world and customer. I’ve been informed that they are working to resolve this by creating a predefined NSG rule that is automatically updated with the backplane remote addresses.