In this post we’ll cover two basic use cases for Aviatrix that often go hand in hand. The first is what’s known as the Site 2 Cloud connection, whereby we connect a non-Aviatrix resource to our fabric via external connectivity. The second is a solution to a common problem when doing this: IP overlap between the connected resources and resources in the cloud. Aviatrix makes NAT a lot easier to plan and administer, and this kind of NAT is a problem native cloud networking simply cannot solve.
Let’s look at what the expanded Aviatrix topology will look like when we are done:
If it isn’t immediately clear from the diagram, the Spoke 1 VPC in Oregon is using the 10.10.0.0/16 IP address space, which is also being used by the data center we will be connecting to the cloud. Normally this would require setting up a complex NAT on the DC side, or re-addressing the resources in the cloud. With Aviatrix this is pretty simple.
Before we get started, I want to point out that Aviatrix has a LOT of NAT options: Source NAT, Destination NAT, both at once, and custom SNAT/DNAT rules. The use case I’ll show here is really the simplest and most straightforward. In future posts we can dig more into specific use cases and the right NAT to use for each, but for now let’s suppose that the data center in the diagram needs to access resources in the Spoke 1 VPC and vice versa.
There are some workloads communicating back and forth, and they have overlapping IP addresses due to poor IPAM and a lack of communication between developers, the network team, and the cloud team. Access to the other cloud resources isn’t needed at this time, just Spoke 1 and the data center. This is important because we’ll be using the Mapped NAT functionality, which works as a 1:1 NAT between a single source network and a single destination network.
Land the Site 2 Cloud Tunnel
I’m not going to focus on a best-practice deployment in this series, in favor of making things interesting and illustrative, but I will say that in general it’s best to centralize NAT services as much as possible to simplify routing. In this case, the most expedient solution would probably be to create the Site 2 Cloud connection between the DC and the Spoke 1 VPC directly, since the DC only needs to access Spoke 1 resources. However, for educational purposes, we are instead landing the S2C connection in a dedicated landing VPC, on dedicated spoke gateways, so that we can show future NAT use cases without splitting the NAT design all over the place.
First, deploy a new VPC dedicated to the Site 2 Cloud Landing Zone. Unlike previous posts, we’re going to modify the created subnets using the controller this time. For VPCs created by the controller, the default is to carve out one public/private subnet pair per availability zone, aside from any special-purpose subnets such as those created by Firenet. In this case we are restricting the created subnets to two pairs only, one public/private pair in each of two AZs, and specifying that each should have a CIDR block size of /20.
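As a quick aside before we look at the result, here is a minimal sketch of how a /16 landing VPC CIDR breaks down into /20 blocks. This is plain Python using the standard ipaddress module, not the controller’s logic, and the 10.20.0.0/16 CIDR and AZ names are hypothetical values for illustration only.

```python
import ipaddress

# Hypothetical landing-zone VPC CIDR; the controller carves /20 subnets out of whatever you give it.
vpc_cidr = ipaddress.ip_network("10.20.0.0/16")
azs = ["us-west-1a", "us-west-1b"]  # two AZs, one public/private pair each (hypothetical names)

blocks = iter(vpc_cidr.subnets(new_prefix=20))  # slice the /16 into /20 blocks
for az in azs:
    public, private = next(blocks), next(blocks)
    print(f"{az}: public={public}  private={private}")
```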
We can see the VPC subnets created:
Two pairs of subnets with a /20 CIDR, as intended. The controller can save a lot of time in planning subnets for a workload or Aviatrix gateway deployment. Moving on, let’s create a spoke gateway in the landing zone to terminate the Site 2 Cloud connection.
In this case we don’t need any special features enabled on the gateway, so we leave most options unchecked. Though we do deploy an HA gateway, we won’t be landing a redundant tunnel on it in this specific use case, because we have only a single edge router in the data center to connect (and I’d rather not explain the same thing twice). In future posts we will explore other S2C options such as HA tunnels and dynamic routing; for this initial one, let’s stay simple.
The next step is to attach the S2C Landing Zone to the Aviatrix transit.
Hopefully by this point in the series the basic operation seems clear. By attaching the spoke VPC to the transit, we give it connectivity to the other cloud spokes (or one in particular, in this use case).
The next piece of the puzzle is to create the Site 2 Cloud tunnel connection. This tunnel will be used to bring in non-Aviatrix connectivity to the fabric. In this particular setup, we’ll be using an IPSec S2C tunnel landing on a customer router in a remote DC.
As discussed, we’ll be using a mapped (essentially 1:1 source/destination) NAT. The connection will be route-based, meaning that the devices will have to direct traffic to the tunnel via the routing table instead of using a policy. Note that an S2C connection does not require NAT at all; it just fits our particular use case.
We have a lot of options in an S2C connection, starting with which gateway will terminate the S2C tunnel and which remote device IP will be the other end. Below we’ll drill into the NAT details.
By default, the S2C NAT assumes that only the remote end needs to initiate connections to resources in the cloud, so in order to provide bidirectional connectivity we need to add a Custom Mapping and check Local Initiated, Remote Initiated, or both.
In the Locally Initiated Traffic section, enter the real source CIDR as seen from a local (not S2C-based) resource; the virtual source CIDR, a NAT range that represents the local resource to the remote end (remember, IP overlap is the concern here); the real destination CIDR on the remote side; and finally the virtual destination CIDR, a stand-in for that remote end that local resources send traffic towards.
This can break the brain pretty quickly so let’s break that down, and it will make the reverse explanation simpler.
Let’s see the topology again:
In this case, ‘locally-initiated’ traffic would be coming from the Spoke 1 VPC in Oregon. Since the VPC CIDR overlaps the on-prem DC, clearly the workload in Spoke 1 can’t send a packet to 10.10.255.254 (the DC server) because the VPC router would see the destination as local. That means we need a stand-in IP range for the real data center network. We’ll use 10.111.0.0/16 to stand in for 10.10.0.0/16 so that the workload in Spoke 1 has a destination that goes somewhere.
The locally-initiated traffic will also need a stand-in on the remote network, so that data center servers can send traffic back without hitting the same problem. This stand-in range will be 10.211.0.0/16 and is what the data center will send traffic to for Spoke 1-destined packets. These stand-ins are the ‘Virtual’ CIDR ranges, and of course the Real ranges are the overlapping IP ranges themselves. We’ll dig into this more when we get to the packet walk. For now let’s continue setting up the S2C connection.
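To keep the four ranges straight before we move on, here is a rough sketch of the custom mapping expressed as a plain Python structure. This is just bookkeeping for the post, not the Aviatrix API or configuration syntax.

```python
from dataclasses import dataclass

@dataclass
class MappedNatRule:
    local_real: str      # Spoke 1 VPC's actual range
    local_virtual: str   # how Spoke 1 appears to the data center
    remote_real: str     # the data center's actual range
    remote_virtual: str  # how the data center appears to Spoke 1

rule = MappedNatRule(
    local_real="10.10.0.0/16",
    local_virtual="10.211.0.0/16",
    remote_real="10.10.0.0/16",
    remote_virtual="10.111.0.0/16",
)
print(rule)
```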
Once the connection is created, we can edit the S2C configuration and fix any vendor-specific issues. In our case, the CSR1000v we are using to simulate an on-prem router has no idea what its public IP address will be (it’s an Elastic IP), so we need to tweak the configuration to use the CSR’s private address as its identifier for IPSec.
Once that is tweaked, we can (potentially) download the configuration to use on the remote device, similar to how AWS provides CGW configs. While not every device type and OS is supported, a good number are, and there is a Generic download that simply lists the remote-end details to apply if your specific platform isn’t available. Since I am using a CSR1000v router to simulate an on-prem DC, that’s the config to grab.
Here’s what it looks like; it can almost be copy/pasted directly into the router.
Here is the result, via CoPilot.
Routing Requirements
We’ll be diving into the packet walk soon, but there are just two more things to take care of now that the S2C tunnel is up and the NAT configuration is in place.
The spoke gateway will handle the source/dest swap of IP addressing as traffic comes through it matching that traffic pattern, but routing is another concern. How do we ensure that packets sent to a CIDR range that doesn’t actually exist anywhere get to the right place for the NAT to happen?
The answer is to advertise routes for them to attract the traffic toward the device doing the NAT. On the data center side, we simply add a route to the virtual cloud address pointing at the S2C tunnel:
On the Aviatrix side, we need the S2C landing spoke to advertise the CIDR range of the on-prem virtual address. For locally-initiated traffic to 10.111.0.0/16 (a CIDR which doesn’t exist) the route advertisement ensures the traffic destined for the DC reaches the landing spoke for the NAT. Because a custom spoke advertisement overrides any implicit advertisement, it’s important to include the original advertisement (the local VPC CIDR) as well.
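Here is a tiny sanity-check sketch of the routing that has to exist on each side before the NAT can do its job. The landing VPC CIDR is a hypothetical value, and nothing here is actual router or controller configuration.

```python
# Both virtual ranges need a route that pulls traffic toward the device doing the NAT.
landing_vpc_cidr = "10.20.0.0/16"  # hypothetical landing-zone VPC CIDR
virtual_dc = "10.111.0.0/16"       # stand-in for the real DC range, advertised into the fabric
virtual_spoke1 = "10.211.0.0/16"   # stand-in for Spoke 1, routed at the S2C tunnel by the DC router

# Custom advertisement on the landing spoke: it replaces the implicit one,
# so the spoke's own VPC CIDR has to stay in the list.
spoke_advertisement = [landing_vpc_cidr, virtual_dc]

# DC-side routing intent: virtual Spoke 1 range -> S2C tunnel (hypothetical interface name).
dc_static_routes = {virtual_spoke1: "s2c-tunnel-to-aviatrix"}

assert landing_vpc_cidr in spoke_advertisement, "keep the original VPC CIDR in the custom list"
print(spoke_advertisement, dc_static_routes)
```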
Now we are all set to packet walk from the on-prem DC to the Spoke 1 VPC and see everything in between.
Here There Be Dragons!
This packet walk is not for the faint of heart, but I promise to make it make sense by the end.
Here’s a traceroute from the cloud resource to the DC router. Unfortunately the reverse doesn’t work because of the immediate NAT, so we’ll just have to examine this one and I’ll explain the opposite flow as well.
Let’s represent this traffic visually and then step through it hop by hop. We’ll do the basic traffic flow first and then devote a whole section just to the NAT part.
- Ping is initiated by the Spoke 1 VPC host to 10.111.255.254. Note that a Mapped NAT sets up a range of 1:1 NAT entries: in this example, we take anything in 10.10.0.0/16 on the DC side and map it to the equivalent IP in 10.111.0.0/16 (see the short sketch after this list). The Loopback interface on our DC router (simulating a DC server) is configured as 10.10.255.254, so to reach that server the host in Spoke VPC 1 pings 10.111.255.254.
- The primary gateway picks up the packet because of the native VPC route table, which sends anything destined to an RFC 1918 address to the gateway. The gateway consults its own routing table and, because we advertised 10.111.0.0/16 from the S2C Landing spoke gateway, it has an entry for that prefix.
- The packet arrives at the connected transit gateway. Per the inspection policy, any traffic coming from or going to Spoke 1 should be inspected by a firewall. This triggers the Firenet behavior that hashes traffic to a firewall; in this case it hashes to the second firewall, attached to the HA gateway.
- Per the inspection policy the packet is bounced through the firewall and back before the gateway route table takes over.
- The gateway routing table points at the peered transit gateway in the N. California region and the ECMP hash sends the packet to the primary gateway over there.
- The transit gateway in N. California does the same lookup and passes the packet to the HA gateway of the landing zone VPC.
- We lose an ICMP echo reply here because at this point the gateway performing the NAT has the packet; the missing reply is just an artifact of that.
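Since the “equivalent IP” idea in the first step trips people up, here is a minimal sketch of the arithmetic: keep the host bits and swap the /16 network bits between the real and virtual ranges. It’s plain Python for illustration, not anything the gateway actually runs.

```python
import ipaddress

def equivalent_ip(ip: str, real_cidr: str, virtual_cidr: str) -> str:
    """Map an address 1:1 between two same-sized ranges by preserving its host offset."""
    real = ipaddress.ip_network(real_cidr)
    virtual = ipaddress.ip_network(virtual_cidr)
    offset = int(ipaddress.ip_address(ip)) - int(real.network_address)
    return str(virtual.network_address + offset)

# The DC "server" 10.10.255.254 is reachable from Spoke 1 as its virtual twin:
print(equivalent_ip("10.10.255.254", "10.10.0.0/16", "10.111.0.0/16"))  # 10.111.255.254
```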
So far this has been the same traffic behavior we’ve seen in previous posts, so I won’t go into as much detail this time. The next step is the NAT function, and we will tackle that separately.
NAT, the Magic Dragon
When the packet enters the S2C landing spoke gateway, the source and destination look like this:
Src: 10.10.3.8 -> Dst: 10.111.255.254
The gateway consults its Mapped NAT table and determines that this traffic flow matches, and so it performs a source and destination NAT. The source is changed from 10.10.3.8 to 10.211.3.8 because if it remained the same, it would overlap with the DC address space.
The destination IP, 10.111.255.254, doesn’t exist on the DC side, so it is changed to the actual IP of the simulated server, 10.10.255.254. Now the packet looks like this, just before being forwarded to the DC router:
Src: 10.211.3.8 -> Dst: 10.10.255.254
The DC router receives this packet and forwards the ping request to its simulated server interface (itself, a Loopback), and the echo reply will look like this:
Src: 10.10.255.254 -> Dst: 10.211.3.8
The router looks up the next hop in its routing table, sees the static route for 10.211.0.0/16 pointing at the S2C tunnel and forwards the echo reply to the S2C landing spoke gateway.
When the spoke gateway receives the packet, it again consults the Mapped NAT table, sees a match, and performs the same swap it did earlier in reverse. The source IP changes from 10.10.255.254 to 10.111.255.254.
The destination IP changes from 10.211.3.8 to 10.10.3.8, the real IP of the Spoke 1 VPC host. As the packet moves back along the routing path to the host, remember that the source address is the virtual stand-in for the DC prefix (10.111.0.0/16) and the destination is the real address of the Spoke 1 host, which the Aviatrix fabric knows how to reach. This is how Mapped NAT works.
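Putting it all together, here is a minimal end-to-end sketch of the source/destination swap the landing spoke performs, first on the forward packet and then on the echo reply. Again, this is plain Python for illustration only; the real translation happens inside the Aviatrix gateway.

```python
import ipaddress

LOCAL_REAL, LOCAL_VIRT = "10.10.0.0/16", "10.211.0.0/16"    # Spoke 1 real range and its DC-facing stand-in
REMOTE_REAL, REMOTE_VIRT = "10.10.0.0/16", "10.111.0.0/16"  # DC real range and its cloud-facing stand-in

def translate(ip: str, from_cidr: str, to_cidr: str) -> str:
    """Shift an address 1:1 between two same-sized ranges, keeping the host bits."""
    src_net = ipaddress.ip_network(from_cidr)
    dst_net = ipaddress.ip_network(to_cidr)
    offset = int(ipaddress.ip_address(ip)) - int(src_net.network_address)
    return str(dst_net.network_address + offset)

# Forward packet: Spoke 1 host -> virtual DC server address.
src, dst = "10.10.3.8", "10.111.255.254"
print("toward DC:   ", translate(src, LOCAL_REAL, LOCAL_VIRT),   # SNAT -> 10.211.3.8
      "->", translate(dst, REMOTE_VIRT, REMOTE_REAL))            # DNAT -> 10.10.255.254

# Echo reply: DC server -> virtual Spoke 1 host address.
src, dst = "10.10.255.254", "10.211.3.8"
print("toward cloud:", translate(src, REMOTE_REAL, REMOTE_VIRT), # SNAT -> 10.111.255.254
      "->", translate(dst, LOCAL_VIRT, LOCAL_REAL))              # DNAT -> 10.10.3.8
```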
Awesome! What’s the Catch?
The catch to deploying this sort of NAT is that it only allows the DC and the Spoke 1 VPC to communicate. No traffic can be sent to or from other cloud resources, even where the addressing doesn’t overlap. The Mapped NAT is an all-in affair, so if you need to solve overlapping IP addresses AND provide connectivity to multiple resources, the NAT needs to be unmapped and more creatively scoped. That’s entirely possible, and we’ll cover it in a future post. For now, I hope this gives good insight into ways we can connect extra-cloud resources and offer them access to workloads with overlapping IP addressing.