1. (WO2019006042) ASSIGNMENT OF UNIQUE NETWORK ADDRESSES FOR LOGICAL NETWORK ADDRESSES

ASSIGNMENT OF UNIQUE NETWORK ADDRESSES FOR LOGICAL NETWORK ADDRESSES

Yusheng Wang, Donghai Han

BACKGROUND

[0001] Network virtualization plays a crucial role in the operation of datacenters, with two different approaches generally used to achieve network virtualization. In the overlay approach, the physical datacenter network is used as a packet carrier, and the network functionalities (of the logical networks) are separated and implemented in the upper overlaying layer. A common technique is to embed logical L2 (data link layer) packets in physical L3 (network layer) packets. In the underlay approach, the physical network devices (e.g., switches, routers) are programmed based on the logical network model, so that the physical datacenter network acts as both the packet carrier and logical network provider.

[0002] Using overlays provides flexibility, but the network stack includes five layers rather than three for IP networking. The extra protocol layers consume extra physical network bandwidth, which means less bandwidth is available for the actual payload. In addition, the packet encapsulation and resulting fragmentation and checksum calculation use extra CPU cycles, which otherwise would be available for guest workloads (e.g., virtual machines). As such, other techniques for network virtualization would be useful.

BRIEF SUMMARY

[0003] Some embodiments provide a method for implementing multiple logical networks in a physical network without using encapsulation, and without the physical network being required to perform logical network services and processes. Instead, some embodiments map each logical network address to a unique physical network address, and use address replacement on logical network packets rather than encapsulation.

[0004] In some embodiments, a network controller (or cluster of network controllers) maintains a pool of available physical network addresses, and handles requests from managed forwarding elements (MFEs) to assign unique physical addresses to logical network addresses for interfaces that connect to the MFEs. For example, when an interface (e.g., a virtual network interface controller (VNIC)) of a virtual machine (VM) or other data compute node (DCN) corresponding to a logical port of a logical forwarding element attaches to an MFE, that interface is assigned a logical network address. This assignment may be via dynamic host configuration protocol (DHCP), statically assigned or otherwise pre-configured, etc. The MFE notifies the network controller of the new logical network address.

[0005] The network controller receives this notification with the logical network address and assigns a unique physical network address for the interface (i.e., mapped to the logical network address). The network controller provides this physical network address to the requesting MFE, and stores the mapping between logical network address (and interface) and physical network address. In some embodiments, based on its network topology information, the network controller distributes the mapping to other MFEs that could potentially be sending packets to (or receiving packets from) the logical network address, and would thus need the physical mapping. In other embodiments, when a different MFE receives a first packet sent to the logical network address (e.g., from one of its local DCNs) or from the physical network address, that MFE sends a request to the controller for the mapping. The controller notifies the MFE regarding the mapping so that the MFE can use the mapping to process packets, as described further below.

[0006] In some embodiments, each physical network address is not just unique within a particular logical network, but is unique among all logical interfaces for all logical networks implemented within the physical network. That is, while logical address spaces may overlap between separate logical networks (i.e., the same subnet and/or IP address could be used in multiple logical networks), the physical network uses a single network address space. In a typical datacenter, this physical address space is allocated privately (i.e., does not need to be used or known outside of the datacenter), and thus the available address space is fairly large.

[0007] To process a packet at the source MFE (i.e., the MFE that sends the packet onto the physical network, which is often the MFE that first receives the packet from its source DCN), the source MFE first performs logical network processing. This processing may include logically forwarding the packet through one or more logical forwarding elements (e.g., a logical switch, a logical router, and another logical switch), performing logical ACL and distributed firewall checks, etc. If the packet is routed, the time to live and logical MAC address may be changed.

[0008] Once this logical processing is complete, a typical overlay network would encapsulate the packet based on its destination address being mapped to a physical tunnel endpoint address. However, in some embodiments, the MFE determines whether the packet is eligible for address replacement instead of encapsulation. In some embodiments, only unicast packets sent between logical network addresses are eligible for address replacement. That is, multicast/broadcast packets, and packets sent to (or received from) a destination outside of the logical network are not eligible for address replacement. Assuming that the packet is eligible (and the MFE has the mapping information for the source and destination addresses), the source MFE replaces the logical source and destination network (e.g., IP) addresses in the packet with the unique physical addresses to which they are mapped. Some embodiments also modify the source and destination data link (e.g., MAC) addresses with those that would be used for an encapsulated packet (e.g., a source MAC corresponding to the physical interface of the MFE and a destination MAC corresponding to the physical network next hop).

[0009] In addition, a logical interface might send a packet that could cause the physical network routers to perform various unwanted actions when using address replacement (e.g., an ICMP packet). Whereas an encapsulated packet would have this information hidden in the inner header (being encapsulated with, e.g., a TCP or UDP packet), with address replacement the physical network would see this protocol and potentially act upon it. Thus, for certain protocols, the source MFE replaces the protocol header field value with an unused or reserved protocol value that (i) would not cause the physical network to take any unwanted action and (ii) should not be used within the logical network.
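
By way of illustration only, the protocol value replacement described above, together with its reversal at the destination MFE, might be sketched as follows. The function names and the choice of substitute value are assumptions made for this sketch and are not taken from the described embodiments; the value 253 is used because IANA reserves protocol numbers 253 and 254 for experimentation and testing.

```python
# Hypothetical remapping table: original protocol number -> substitute value.
# Using 253 (reserved by IANA for experimentation and testing) is an
# assumption of this sketch; the text only requires an unused or reserved value.
ICMP = 1
PROTOCOL_SUBSTITUTES = {ICMP: 253}
PROTOCOL_ORIGINALS = {sub: orig for orig, sub in PROTOCOL_SUBSTITUTES.items()}


def mask_protocol(protocol):
    """Source MFE: swap a sensitive protocol value for its substitute."""
    return PROTOCOL_SUBSTITUTES.get(protocol, protocol)


def unmask_protocol(protocol):
    """Destination MFE: restore the original protocol value if it was swapped."""
    return PROTOCOL_ORIGINALS.get(protocol, protocol)
```

Because the substitute value must not be used within the logical network, the reverse lookup at the destination is unambiguous; protocol values that are not in the table pass through unchanged in both directions.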

[0010] The packet is then processed through the physical network as normal. Once the packet reaches the destination MFE, additional processing is required to handle the non-encapsulated packet. The destination MFE maps the protocol field value to its original value, if needed (i.e., if the protocol value is one of the unused or reserved values to which a different value was mapped at the source MFE). The physical network addresses are also replaced with the logical network addresses based on the mappings stored by the MFE. To determine the logical data link addresses, some embodiments use the network topology. If the source and destination network addresses are on the same logical switch, then the data link addresses will be those of the corresponding logical interfaces. However, if the source network address is on a different logical switch from the destination, then the data link address of the logical router interface that connects to the logical switch will be the source data link address. Once the data link layer address is also replaced, the MFE can perform any additional required logical processing and deliver the packet to the destination interface.
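
The selection of logical data link addresses at the destination MFE can be illustrated with a minimal sketch. The data structures, switch identifiers, and MAC labels below are hypothetical and serve only to show the same-switch and different-switch cases; they do not represent how any embodiment stores its topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalInterface:
    logical_ip: str
    mac: str
    logical_switch: str


# Hypothetical topology data: MAC of the logical router interface that
# connects to each logical switch. All identifiers are illustrative only.
ROUTER_INTERFACE_MAC = {"ls-1": "MAC E", "ls-2": "MAC F"}


def restore_logical_macs(src, dst):
    """Return (source MAC, destination MAC) for delivery to the destination."""
    if src.logical_switch == dst.logical_switch:
        # Same logical switch: use the MACs of the two logical interfaces.
        return src.mac, dst.mac
    # Different logical switches: the packet was logically routed, so the
    # source MAC is that of the router interface on the destination's switch.
    return ROUTER_INTERFACE_MAC[dst.logical_switch], dst.mac
```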

[0011] The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description and the Drawings is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawing, but rather are to be defined by the appended claims, because the claimed subject matters can be embodied in other specific forms without departing from the spirit of the subject matters.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.

[0013] Figure 1 conceptually illustrates a network controller and its communication with an MFE to provide the MFE with a physical IP address for a newly connected interface.

[0014] Figure 2 conceptually illustrates a process of some embodiments for assigning a physical IP address to map to a logical IP address.

[0015] Figure 3 conceptually illustrates a process of some embodiments for releasing an assigned physical IP address when a logical interface is moved or released.

[0016] Figure 4 conceptually illustrates a set of MFEs that implement at least one logical network within a datacenter network of some embodiments, and the difference in physical network traffic between two logical network endpoints (e.g., VMs) and physical network traffic between a logical network endpoint and an external network.

[0017] Figure 5 conceptually illustrates a process of some embodiments for replacing logical IP addresses with physical IP addresses.

[0018] Figure 6 conceptually illustrates a process of some embodiments for replacing physical IP addresses with logical IP addresses before delivering a packet to an interface.

[0019] Figure 7 conceptually illustrates a logical network and the logical to physical IP address mappings assigned for the endpoints of that network.

[0020] Figures 8 and 9 illustrate examples of packets sent through the physical implementation of that logical network.

[0021] Figure 10 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.

DETAILED DESCRIPTION

[0022] In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

[0023] Some embodiments provide a method for implementing multiple logical networks in a physical network without using encapsulation, and without the physical network being required to perform logical network services and processes. Instead, some embodiments map each logical network address to a unique physical network address, and use address replacement on logical network packets rather than encapsulation.

[0024] In some embodiments, a network controller (or cluster of network controllers) maintains a pool of available physical network addresses, and handles requests from managed forwarding elements (MFEs) to assign unique physical addresses to logical network addresses for interfaces that connect to the MFEs. For example, when an interface (e.g., a virtual network interface controller (VNIC)) of a virtual machine (VM) or other data compute node (DCN) corresponding to a logical port of a logical forwarding element attaches to an MFE, that interface is assigned a logical network address.

[0025] Figure 1 conceptually illustrates such a network controller 100 and its communication with an MFE 105 to provide the MFE 105 with a physical IP address for a newly connected interface. It should be understood that while a single central network controller 100 is shown, in some embodiments a cluster of such controllers operates to communicate with numerous MFEs on numerous host machines.

[0026] As shown, the MFE 105 operates on a host 110, and at least one DCN (in this case a VM 115) attaches to the MFE 105. The MFE 105, in some embodiments, is a virtual switch or other software forwarding element that operates in the virtualization software (e.g., hypervisor) of the host machine 110, and which is configured by a network control system that includes the network controller 100. In some embodiments, a local controller operates on the host machine 110 (e.g., also within the virtualization software). This local controller receives configuration data from the network controller 100 and translates the configuration data from the network controller 100 for the MFE 105. In some such embodiments, the communication between the MFE 105 and the controller 100 is sent through the local controller.

[0027] The VM 115 attaches to the MFE 105 via a VNIC or similar interface. When a VNIC attaches to the network, it will be assigned a logical network address. In the subsequent discussion, Internet Protocol (IP) addresses will be used, but it should be understood that these addresses could be other types of network layer addresses in different embodiments. A logical IP address is the address that the VNIC uses to send/receive traffic on a logical network. As described further below, multiple distinct logical networks may be implemented within a single physical datacenter network, with each logical network having its own address space (which can overlap with the address spaces of other logical networks). The MFEs implement the logical networks based on the configuration data received from the network controllers.

[0028] The assignment of an IP address may be accomplished via dynamic host configuration protocol (DHCP), static assignment, other pre-configuration of the IP, etc. When the MFE 105 identifies the logical IP address of a new interface (by intercepting a DHCP packet, receiving the information from the VNIC, processing a packet from the VNIC, etc.), the MFE 105 notifies the network controller 100 of the new logical network address and interface, so that the network controller 100 can assign a unique physical IP address for the interface (i.e., mapped to the logical network address).

[0029] Figure 2 conceptually illustrates a process 200 of some embodiments for assigning a physical IP address to map to a logical IP address. The process 200 is performed by a network controller (e.g., the controller 100) in response to receiving a request from an MFE (e.g., the MFE 105) for a physical IP address.

[0030] As shown, the process 200 begins by receiving (at 205) a new logical IP address and a corresponding interface from an MFE. Because the logical IP address is not necessarily exclusive to the logical network, an additional identifier is required for the mapping. Some embodiments use a unique VNIC identifier or a unique logical port identifier. Figure 1 illustrates that the MFE 105 sends a message 120 to the network controller 100 with the interface and logical IP address of the VNIC by which the VM 115 connects to the MFE 105. As mentioned, the MFE may have become aware of this after a DHCP request, when a first packet is sent by the VM 115, etc. In some embodiments, the message 120 from the MFE only needs to identify the presence of the logical interface on the host 110, as the controller 100 already has the corresponding logical IP address that has been assigned to the interface.

[0031] The process 200, in response to this request, assigns (at 210) an available unique physical IP address to the logical IP address/interface combination. In some embodiments, each physical network address is not just unique within a particular logical network, but is unique among all logical interfaces for all logical networks implemented within the physical network. That is, while logical address spaces may overlap between separate logical networks (i.e., the same subnet and/or IP address could be used in multiple logical networks), the physical network uses a single network address space. In a typical datacenter, this physical address space is allocated privately (i.e., does not need to be used or known outside of the datacenter), and thus the available address space is fairly large. In some embodiments, the datacenter may use both IPv4 and IPv6 addresses. In such embodiments, these addresses are allocated separately. That is, when a logical IPv4 address is sent to the controller 100, the controller 100 allocates a unique physical IPv4 address, and when a logical IPv6 address is sent to the controller 100, the controller 100 allocates a unique physical IPv6 address.

[0032] The process 200 then provides (at 215) the assigned unique physical IP address to the requesting MFE. As shown in Figure 1, the network controller 100 sends a message 125 with the assigned physical IP address to the MFE 105. As noted, in some embodiments this message is sent to a local controller on the host 110, which in turn provides the data to the MFE 105. The MFE 105 stores this mapping, and uses the mapping to process packets sent to and from the VM 115, as described in more detail below. In some embodiments, the MFE sends a gratuitous ARP packet to notify the physical network of the new IP address.

[0033] The process 200 also stores (at 220) the mapping of logical IP address and interface to the physical IP address. As shown in Figure 1, the network controller 100 stores a physical to logical network address mapping table 130, as well as a pool of available IP addresses 135 and a waiting pool of IP addresses 140. The network controller 100 stores this mapping table (which, in some embodiments, also identifies the host machine for each logical IP address and interface combination) in order to distribute the mappings to other MFEs that need the data. In some embodiments, based on its network topology information, the network controller distributes the mapping to other MFEs that could potentially be sending packets to (or receiving packets from) the logical network address, and would thus need the physical mapping. In other embodiments, when a different MFE receives a first packet sent to the logical network address (e.g., from one of its local DCNs) or from the physical network address, that MFE sends a request to the controller 100 for the mapping. The controller 100 notifies the MFE regarding the mapping so that the MFE can use the mapping to process packets, as described further below.

[0034] As noted, the network controller 100 also includes a pool 135 of available physical IP addresses and a waiting pool 140 of physical IP addresses. The physical IP addresses, as described above, are unique within a datacenter (or other privately-allocated physical network). Thus, the available physical IP addresses pool 135 lists all of the IP addresses available to be used for mapping - i.e., the physical IP addresses that are not currently mapped to a logical IP address of an operating interface. Once the network controller 100 assigns a particular physical IP address to an interface, the controller 100 stores this mapping in the table 130 and removes the physical IP address from the pool 135 of available IPs.
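
The assignment operations of the process 200 can be sketched, under simplifying assumptions, as a pool plus a mapping table keyed on the combination of logical IP address and interface identifier. The class name, method name, interface identifiers, and address range below are hypothetical; the sketch handles a single address family and omits distribution of mappings to other MFEs.

```python
import ipaddress


class PhysicalAddressAllocator:
    """Illustrative stand-in for the available pool and the mapping table."""

    def __init__(self, cidr):
        # Pool of physical IPs not currently mapped to any logical interface.
        self.available = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        # Mapping table: (logical IP, interface ID) -> physical IP.
        self.mappings = {}

    def assign(self, logical_ip, interface_id):
        key = (logical_ip, interface_id)
        if key in self.mappings:
            # Repeated request for the same interface returns the same address.
            return self.mappings[key]
        if not self.available:
            raise RuntimeError("physical address pool exhausted")
        physical_ip = self.available.pop(0)  # remove from the available pool
        self.mappings[key] = physical_ip     # record the mapping
        return physical_ip
```

Keying the table on the interface identifier as well as the logical IP address is what allows the same logical IP address, used in two different logical networks, to receive two distinct physical IP addresses.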

[0035] Figure 3 conceptually illustrates a process 300 of some embodiments for releasing an assigned physical IP address when a logical interface is moved or released. The process 300 is performed by a network controller (e.g., the controller 100) in response to receiving a notification from an MFE (e.g., the MFE 105) that a logical interface is no longer in use.

[0036] As shown, the process 300 begins by receiving (at 305) from an MFE (or a local controller operating on a host with an MFE) a notification that an interface with a logical IP address is no longer present on the MFE. If a VM is migrated to a different host, some embodiments release the physical IP and reassign a new one; other embodiments keep the same logical IP to physical IP mapping. Other circumstances that could cause a logical IP address to no longer be present on an MFE are the removal of that interface from its logical network (i.e., by an administrator changing the logical network configuration) or a change to the logical IP (e.g., also by a change to the logical network configuration).

[0037] In response, the process 300 places (at 310) the physical IP address corresponding to the released logical IP address in a waiting pool for a threshold period of time. As indicated, the network controller 100 includes a waiting pool 140 for physical IP addresses. The waiting pool 140 is used to ensure that a physical IP address is not reallocated too quickly after being released, giving the network time to flush packets that may be sent to the previous interface to which the physical IP address is mapped.

[0038] Thus, the process determines (at 315) whether the period of time has expired. If not, the process continues to evaluate this until the period of time expires. It should be understood that the process 300 (as well as the other processes described herein) is a conceptual process, and that some embodiments do not perform continuous checks for each physical IP address in the waiting pool 140. Instead, some embodiments use an event-driven process that simply waits and then takes action upon the waiting period expiring. Once the period of time has expired, the process 300 moves (at 320) the physical address from the waiting pool into the pool of available physical IP addresses. That is, the network controller 100 moves the IP address from the waiting pool 140 to the available IP address pool 135.
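
One possible way to realize the waiting pool without continuous per-address checks is to order released addresses by their expiry time and drain the expired ones when an event fires. The sketch below is an assumption-laden illustration: the class name, the hold period, and the explicit `now` argument (used here in place of a real clock) are all hypothetical.

```python
import heapq


class WaitingPool:
    """Holds released physical IPs until a threshold period has elapsed."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._heap = []  # entries are (expiry time, physical IP)

    def add(self, physical_ip, now):
        # Record when this address may be returned to the available pool.
        heapq.heappush(self._heap, (now + self.hold_seconds, physical_ip))

    def expire(self, now):
        """Return, in expiry order, the addresses whose hold period has elapsed."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[1])
        return ready
```

A controller following this sketch would append the addresses returned by `expire` to its pool of available physical IP addresses.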

[0039] The above description relates to the network controller operations to assign and manage the logical to physical IP address mappings. Once these mappings are assigned, packets are sent between MFEs without encapsulation (at least for certain packets that meet certain criteria). Figure 4 conceptually illustrates a set of MFEs that implement at least one logical network within a datacenter network 400 of some embodiments. Specifically, this figure illustrates the difference in physical network traffic between two logical network endpoints (e.g., VMs) and physical network traffic between a logical network endpoint and an external network.

[0040] As shown, the datacenter 400 includes two host machines 405 and 410 that host VMs, which belong to the same logical network (they may attach to the same logical switch or different logical switches). The VMs 415 and 420 connect to MFEs 425 and 430, respectively, which operate on the host machines 405 and 410 to implement the logical network. In addition, the logical network to which the VMs 415 and 420 belong includes a connection (e.g., a logical router connection) to an external network 435. This connection is implemented by a gateway 440 operating on a third host machine 445. In some embodiments, the gateway 440 is a separate component of a logical router, and may be implemented in a VM or other DCN on the host 445, in the datapath of the host 445, etc.

[0041] When the VM 420 (or the VM 415) sends traffic to the external network 435 or receives traffic from this external network, the traffic between the gateway 440 and the MFE 430 is encapsulated with the physical IP addresses. As shown by the packet 450, this traffic includes inner IP and Ethernet headers as well as outer (encapsulation) IP and Ethernet headers. For the sake of simplicity, the other inner and outer protocols (e.g., transport protocols) are not shown here. Because the external IP address will not have a mapping to a unique IP address, if the MFE or gateway were to replace this IP in the packet (e.g., with the IP address of a PNIC of the host 445), the receiving MFE/gateway would not be able to map this back to the correct IP address. Instead, encapsulation is used for this communication between logical network endpoints and the external network in order to preserve these addresses.

[0042] On the other hand, when the VM 415 sends a packet to the VM 420 (or vice versa), the MFE 425 performs address replacement to replace the logical IP (and logical MAC) addresses with physical IP and MAC addresses, as indicated by the packet 455. This packet 455 has fewer headers and thus more room for payload without fragmentation if the network is constrained by a maximum transmission size. Address replacement is available for the packet 455 because the traffic is unicast communication between two logical network endpoints that have one-to-one mappings with physical IP addresses. In some embodiments, the MFEs do not use address replacement for multicast/broadcast communications, because the packets are sent to multiple physical destinations. However, in other embodiments, at least some multicast/broadcast packets are replicated into unicast packets by the MFE (e.g., a separate unicast packet for each destination, each packet having a different destination address), and these unicast packets can be sent onto the physical network using address replacement rather than encapsulation.

[0043] Figures 5 and 6 describe processes performed by a source MFE (i.e., the first-hop MFE for a packet) and a destination MFE (the recipient of such a packet via the physical network) to perform address replacement on a packet. These processes assume that the MFEs performing the respective processes have the logical IP to physical IP mapping information, and do not need to request this information from a network controller in order to process the packet.

[0044] The processes of Figures 5 and 6 will be described in part by reference to Figures 7-9. Figure 7 conceptually illustrates a logical network 700 and the logical to physical IP address mappings assigned for the endpoints of that network, while Figures 8 and 9 illustrate examples of packets sent through the physical implementation of that logical network. The logical network 700 includes two logical switches 705 and 710 that are logically connected by a logical router 715. Two VMs (VM1 and VM2) connect to the first logical switch 705 and two VMs (VM3 and VM4) connect to the second logical switch 710. Each of these logical interfaces has a MAC address (MAC A, MAC B, MAC C, and MAC D). In addition, the logical router downlinks (interfaces to the logical switches) have their own logical MAC addresses (MAC E and MAC F).

[0045] The logical to physical IP address mapping table 720 is information that would be stored by a network controller (or network controller cluster), as well as the MFEs that implement the logical network. As shown in this table, the VMs are implemented on three hosts, and thus the three MFEs operating on these hosts would store the information in the mapping table 720. VM1 and VM3 are implemented on a first host, with VM2 on a second host and VM4 on a third host. The first logical switch 705 is assigned a subnet 10.1.1.0/24, and the logical IP addresses of the two VMs on this subnet are 10.1.1.5 and 10.1.1.6. Similarly, the second logical switch 710 is assigned a subnet 10.2.1.0/24, and the logical IP addresses of the two VMs on this subnet are 10.2.1.5 and 10.2.1.6. According to the mapping table 720, each of these logical interfaces maps to a unique physical IP address. While this example shows only a single logical network, if other logical networks were implemented on the hosts (or even on some of the hosts), those hosts would also map the logical IP addresses of the additional logical networks to unique physical IP addresses. A single host could, for example, have numerous mappings for the logical IP address 10.1.1.5, to different physical IP addresses for different interfaces of different logical networks.

[0046] Figure 5 conceptually illustrates a process 500 of some embodiments for replacing logical IP addresses with physical IP addresses. In some embodiments, the source MFE for a packet (i.e., the MFE to which the source interface for the packet connects) performs this process 500 on the packet upon receiving the packet (e.g., from a VNIC).

[0047] As shown, the process 500 begins by receiving (at 505) a packet from an interface with a logical IP address. The packet, as sent, will have logical source and destination IP addresses as well as logical source and destination MAC addresses. The source addresses are those of the interface from which the packet was received (e.g., the VNIC or similar interface) by the MFE. The destination IP address is the address of the ultimate destination for the packet, while the MAC address is either that of the destination (if the destination is on the same logical switch) or of the local logical gateway (if the packet requires logical routing).

[0048] Figures 8 and 9 illustrate examples of such packets as they are sent through the physical network. In Figure 8, VM1 sends a packet 800 to VM2 (on the same logical switch, but operating in a different physical host machine). The packet 800, as sent to an MFE 805, has a source IP address of 10.1.1.5, a destination IP address of 10.1.1.6, a source MAC address of MAC A, and a destination MAC address of MAC B. In addition, the protocol field of the IP header has the value 17 (for User Datagram Protocol (UDP)). In Figure 9, VM1 sends a packet 900 to VM4 (on a different logical switch and operating in a different physical host machine). The packet 900, as sent to the MFE 805, has a source IP address of 10.1.1.5, a destination IP address of 10.2.1.6, a source MAC address of MAC A, and a destination MAC address of MAC E (corresponding to the default gateway for VM1). In addition, the protocol field of the IP header has the value 1 (for Internet Control Message Protocol (ICMP)).

[0049] Returning to Figure 5, the process 500 performs (at 510) logical processing on the received packet. That is, the MFE processes the packet through the logical network, which may include application of ACL and firewall (e.g., distributed firewall) rules, network address translation (NAT) processing, distributed load balancing, etc. The logical processing also includes logical switching and/or routing. If logical routing is required (e.g., for the packet 900 of Figure 9), the logical MAC address is modified and the time to live (TTL) is decremented for the packet.

[0050] After logical processing is completed, the process 500 determines (at 515) whether the packet is eligible for address replacement. In some embodiments, only unicast packets sent between logical network addresses are eligible for address replacement. That is, multicast/broadcast packets, and packets sent to (or received from) a destination outside of the logical network are not eligible for address replacement. Because the logical IP addresses are no longer in the packet at all when address replacement is used, some embodiments only use the technique when there is a 1:1 mapping between the logical IP addresses being replaced and the physical IP addresses that replace them.

[0051] In the case of broadcast/multicast, the MFEs do not use address replacement because the packets are sent to multiple physical destinations. However, in other embodiments, at least some multicast/broadcast traffic is replicated into multiple unicast packets by the MFE, and these unicast packets can be sent onto the physical network using address replacement rather than encapsulation. For packets sent to/from the external network, using address replacement would require assigning unique physical IP addresses for every external IP address that communicated with the logical network(s). Given the large number of such IP addresses and that the nature of the communication is more likely to be transient, there is likely to be less value in such local physical IP address assignment.

[0052] If the packet is not eligible for address replacement (e.g., the packet is a multi-recipient packet, or is addressed to or received from an external IP address that is not a logical network endpoint), the process 500 encapsulates (at 520) the packet. For the encapsulation headers, some embodiments use tunnel endpoint IP addresses that are on the physical network but separate from the unique physical IP addresses used for address replacement. The process 500 then proceeds to 550, described below.
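
The eligibility determination at 515 can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the MFE's actual implementation: the `Packet` type, the function name, and the set of known logical IP addresses are hypothetical, and only the limited broadcast address is checked (detecting subnet-directed broadcasts would additionally require the logical subnet masks).

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal view of a packet after logical processing."""
    src_ip: str
    dst_ip: str


def eligible_for_address_replacement(pkt: Packet, logical_ips: set) -> bool:
    """Return True only for unicast packets between logical network addresses."""
    dst = ipaddress.ip_address(pkt.dst_ip)
    # Multi-recipient packets are encapsulated rather than address-replaced.
    if dst.is_multicast or pkt.dst_ip == "255.255.255.255":
        return False
    # Both endpoints must be logical network addresses; traffic to or from
    # external addresses is not eligible.
    return pkt.src_ip in logical_ips and pkt.dst_ip in logical_ips


# Logical addresses taken from the examples of Figures 8 and 9.
LOGICAL_IPS = {"10.1.1.5", "10.1.1.6", "10.2.1.6"}
```

A packet for which this check returns False would take the encapsulation path (at 520).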

[0053] On the other hand, when the packet is eligible for address replacement, the process identifies (at 525) the unique physical IP addresses for the source and destination logical IP addresses and interfaces. The source MFE identifies the logical IP addresses based on the data in the packet header fields, and the source interface based on the interface from which the packet is received. The destination logical interface is identified by the MFE during the logical processing operations (e.g., during logical forwarding).

[0054] The MFE consults its IP address mapping table to identify the physical IP addresses. In some embodiments, if the MFE does not have a unique physical IP address stored for the destination logical IP address and interface (or the source, if this is the initial packet from the source interface), the MFE sends a message to the network controller requesting the unique physical IP address. In some embodiments (not shown in this process), rather than wait for the controller, the first packet (or first several packets) are encapsulated rather than sent using address replacement, until the MFE receives the corresponding physical IP address from the network controller.
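
The lookup with controller fallback described above might look like the following sketch. The class and function names, the callback used to contact the controller, and the physical address 192.168.2.10 used in the usage note are assumptions made for illustration only. The sketch follows the variant in which packets are encapsulated until the mapping arrives.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (logical IP address, logical interface)


class MappingTable:
    """Hypothetical logical-to-physical IP mapping table kept by an MFE."""

    def __init__(self, request_from_controller: Callable[[Key], None]) -> None:
        self._entries: Dict[Key, str] = {}
        self._pending: Set[Key] = set()
        self._request = request_from_controller

    def learn(self, key: Key, physical_ip: str) -> None:
        """Store a mapping received from the network controller."""
        self._entries[key] = physical_ip
        self._pending.discard(key)

    def lookup(self, key: Key) -> Optional[str]:
        """Return the physical IP, requesting it once if it is unknown."""
        physical_ip = self._entries.get(key)
        if physical_ip is None and key not in self._pending:
            self._pending.add(key)
            self._request(key)
        return physical_ip


def transmit_mode(table: MappingTable, src: Key, dst: Key) -> str:
    """Fall back to encapsulation until both mappings are known."""
    src_phys = table.lookup(src)
    dst_phys = table.lookup(dst)
    if src_phys is None or dst_phys is None:
        return "encapsulate"
    return "replace"
```

For example, with `table = MappingTable(print)` and only the VM1 mapping learned, `transmit_mode(table, ("10.1.1.5", "VM1"), ("10.1.1.6", "VM2"))` returns "encapsulate" and issues a single request for the VM2 mapping; after `table.learn(("10.1.1.6", "VM2"), "192.168.2.10")` the same call returns "replace".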

[0055] Assuming that the physical IP addresses are identified, however, the process 500 replaces (at 530) the logical IP addresses in the packet with the identified unique physical IP addresses. In addition, the process modifies (at 532) the time to live (TTL) field of the packet to account for the number of physical network hops the packet will traverse (each of which will decrement the TTL field). In some embodiments, the TTL field should only be decremented by logical processing (for each logical router that processes the packet). The physical datacenter network will often be stable with respect to the number of physical hops between two physical endpoints (when a logical network interface is migrated, this could change the number of physical network hops, but the interface will be assigned a new unique physical network address at this point). Some embodiments use probe messages or other techniques to determine the number of hops to each possible destination physical IP address, and store this information in the mapping tables (e.g., as another column in the table 720).
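
As a worked illustration of the TTL adjustment at 532, the sketch below adds the stored physical hop count to the TTL so that the value seen at the destination reflects only the logical decrements. The function name and the cap at 255 are assumptions. The cap reflects the 8-bit width of the IPv4 TTL field, and it means that a packet sent with a TTL near the maximum cannot be fully compensated.

```python
MAX_TTL = 255  # the IPv4 TTL field is 8 bits wide


def compensate_ttl(ttl_after_logical_processing: int, physical_hops: int) -> int:
    """Pre-increment the TTL by the expected number of physical hops.

    Each physical router decrements the TTL by one, so adding the hop count
    up front leaves only the logical decrements visible at the destination.
    """
    if physical_hops < 0:
        raise ValueError("physical_hops must be non-negative")
    return min(ttl_after_logical_processing + physical_hops, MAX_TTL)
```

For instance, if a DCN sends a packet with TTL 64, one logical router decrements it to 63, and three physical hops are expected, the source MFE would transmit the packet with TTL 66, and it would arrive with TTL 63.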

[0056] The process 500 also replaces (at 535) the logical MAC addresses with physical network MAC addresses. The source MAC is that of the physical interface to which the source physical IP address corresponds, while the destination MAC is that of the local gateway (unless the destination physical interface is on the same physical switch as the source physical interface).

[0057] Figure 8 illustrates that the packet sent by the source MFE 805 has source and destination physical IP addresses that have replaced the logical IP addresses. The source and destination IP addresses are replaced with the unique physical IP addresses shown in the mapping table 720 as corresponding to 10.1.1.5 (VM1) and 10.1.1.6 (VM2). For the physical MAC addresses, the source MAC (PMAC1) is that of the PNIC to which the 192.168.1.10 address corresponds, while the destination MAC (PMAC2) is that of the local default gateway. Figure 9 illustrates similar address replacement of the source and destination IP and MAC addresses for the packet 900. The same source physical IP address is used, while the destination physical IP address corresponding to 10.2.1.6 (VM4) is used. In this case, the same physical MAC addresses are used as for the first packet, because the packet is again sent to the local default gateway on the physical network.
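
The choice of physical MAC addresses at 535 can be sketched as below. The function name, the subnet 192.168.1.0/24, and the table of directly reachable neighbors are hypothetical. In particular, the sketch treats "same IP subnet with a known neighbor MAC" as a stand-in for "on the same physical switch", which is an assumption rather than something the description specifies.

```python
import ipaddress
from typing import Dict, Tuple


def choose_physical_macs(
    dst_phys_ip: str,
    local_subnet: str,
    pnic_mac: str,
    gateway_mac: str,
    neighbor_macs: Dict[str, str],
) -> Tuple[str, str]:
    """Return (source MAC, destination MAC) for an address-replaced packet."""
    network = ipaddress.ip_network(local_subnet)
    on_link = ipaddress.ip_address(dst_phys_ip) in network
    if on_link and dst_phys_ip in neighbor_macs:
        # Destination is directly reachable: address it without the gateway.
        return pnic_mac, neighbor_macs[dst_phys_ip]
    # Otherwise hand the packet to the local default gateway.
    return pnic_mac, gateway_mac
```

With a destination outside the local subnet, the result is the pair (PMAC1, PMAC2) used in the examples of Figures 8 and 9.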

[0058] In addition to replacing the logical addresses with physical addresses, the process 500 also determines (at 540) whether the protocol field of the IP header matches one of a set of pre-specified values. When the protocol field does match one of these pre-specified values, the process replaces (at 545) the protocol field value with a replacement value. A logical interface (i.e., the DCN to which the logical interface belongs) might send a packet that could cause the physical network routers to perform various unwanted actions when using address replacement (e.g., an ICMP packet). Whereas an encapsulated packet would have this information hidden in the inner header (being encapsulated with, e.g., a TCP or UDP packet), with address replacement the physical network would see this protocol and potentially act upon it. Thus, for certain protocols, the source MFE replaces the protocol header field value with an unused or reserved protocol value that (i) would not cause the physical network to take any unwanted action and (ii) should not be used within the logical network.

[0059] For example, the packet 800 of Figure 8 has the protocol field value 17, which corresponds to UDP. As UDP packets will be forwarded normally by the routers of the physical network, this protocol field value is not modified by the MFE 805. On the other hand, the packet 900 of Figure 9 has the protocol field value 1, which corresponds to ICMP. ICMP packets may be acted upon by the physical routers in ways that are not desired, so the MFE 805 replaces this with the value 143, which is a reserved value that will be ignored by the physical network routers.
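
The protocol field substitution at 540 and 545 amounts to a small table lookup, sketched here. The table and function names are hypothetical; the single entry mirrors the ICMP example above. Because the protocol field is covered by the IPv4 header checksum, a real implementation would also need to update that checksum after the substitution, which this sketch omits.

```python
# Protocol values that the source MFE rewrites before transmission.
# The single entry (ICMP = 1 -> 143) follows the example of Figure 9.
PROTOCOL_REPLACEMENTS = {1: 143}


def replace_protocol(protocol: int) -> int:
    """Return the value to place in the IP protocol field on transmit."""
    return PROTOCOL_REPLACEMENTS.get(protocol, protocol)
```

Here `replace_protocol(17)` leaves UDP unchanged, while `replace_protocol(1)` yields the replacement value for ICMP.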

[0060] Finally, whether the packet is encapsulated or has address replacement performed, the process transmits (at 550) the packet to the physical network (i.e., the physical datacenter network 810). The packet is then processed through the physical network as normal, during which the physical MAC addresses may be modified.

[0061] Figure 6 conceptually illustrates a process 600 of some embodiments for replacing physical IP addresses with logical IP addresses before delivering a packet to an interface. In some embodiments, the destination MFE for a packet (i.e., the MFE to which the destination interface for the packet connects) performs the process 600 on the packet upon receiving the packet from the physical datacenter network.

[0062] As shown, the process 600 begins by receiving (at 605) a logical network packet with physical IP addresses. The packet, as received, will have physical IP addresses that may correspond to logical interfaces or that may be tunnel endpoint addresses in an encapsulation header. These physical IP addresses, in some embodiments, are the IP addresses either added as encapsulation headers or replaced in the packet by the source MFE (e.g., using a process such as that shown in Figure 5). In Figure 8, the packet 800 has the same source and destination physical IP addresses when received by the destination MFE 815 as when sent by the source MFE 805, though different physical MAC addresses owing to the routing through the physical datacenter network 810. The same is true in the example shown in Figure 9.

[0063] Thus, the process 600 determines (at 610) whether the packet is encapsulated. In some embodiments, the IP addresses will be different for encapsulated packets as compared to non-encapsulated packets. Specifically, if the source and destination IP addresses correspond to tunnel endpoints of the source and destination MFEs, then the packet is encapsulated. On the other hand, if the source and destination IP addresses are unique physical IP addresses in the logical to physical IP address mapping table of the MFE, then the packet was sent using address replacement. If the packet is encapsulated, the process decapsulates (at 615) the packet and proceeds to 645, described below. It should be noted that, in some embodiments, the MFE performs additional processing to determine whether the packet is sent to an IP address that is associated with neither a VTEP nor a unique physical IP address that maps to a logical IP address. For example, management traffic or other types of traffic may be received and processed by the MFE in some embodiments.
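
The determination at 610 can be illustrated with the following sketch, which classifies a received packet by looking at its outer IP addresses. The function name, the return labels, and the sample addresses in the usage note are hypothetical.

```python
def classify_received_packet(
    src_ip: str,
    dst_ip: str,
    vtep_ips: set,
    mapped_physical_ips: set,
) -> str:
    """Classify a packet received from the physical network.

    Returns "encapsulated" when both addresses are tunnel endpoints,
    "address_replaced" when both are unique physical IP addresses from the
    mapping table, and "other" for anything else (e.g., management traffic).
    """
    if src_ip in vtep_ips and dst_ip in vtep_ips:
        return "encapsulated"
    if src_ip in mapped_physical_ips and dst_ip in mapped_physical_ips:
        return "address_replaced"
    return "other"
```

For example, with tunnel endpoints {"172.16.0.1", "172.16.0.2"} and mapped addresses {"192.168.1.10", "192.168.2.10"}, a packet from 192.168.1.10 to 192.168.2.10 is classified as "address_replaced" and continues to 620, while a packet between the two tunnel endpoints is decapsulated.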

[0064] If the packet is not encapsulated (i.e., because address replacement was performed on the packet at the source MFE), the process 600 essentially performs the opposite operations of those in Figure 5. The process 600 determines (at 620) whether the protocol field matches one of a set of pre-specified mapped values. This identifies whether the protocol field value is one of the reserved or unused values to which a particular protocol field value (e.g., ICMP) is mapped. If this is the case, the process replaces (at 625) the protocol field value with the original value. For example, in Figure 9, the MFE 905 maps the value 143 (a reserved value) back to the original value of 1 (for ICMP).
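
The reverse protocol mapping at 620 and 625 can be sketched by inverting the table used on the source side. The names are hypothetical, and the sketch assumes that both MFEs are configured with the same table. The inversion is unambiguous only if every replacement value is distinct and is never used as a genuine protocol value within the logical network.

```python
# Table assumed to be shared with the source MFE (ICMP = 1 -> 143).
FORWARD_REPLACEMENTS = {1: 143}

# Invert the table; a collision would make the reverse mapping ambiguous.
REVERSE_REPLACEMENTS = {new: old for old, new in FORWARD_REPLACEMENTS.items()}
if len(REVERSE_REPLACEMENTS) != len(FORWARD_REPLACEMENTS):
    raise ValueError("replacement protocol values must be unique")


def restore_protocol(protocol: int) -> int:
    """Return the original protocol value for a received packet."""
    return REVERSE_REPLACEMENTS.get(protocol, protocol)
```

Values that were never replaced, such as 17 for UDP, pass through unchanged.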

[0065] The process 600 identifies (at 630) the logical IP address and interface for the source and destination physical IP addresses. As noted, each physical IP address maps not just to a logical IP address but also to a logical interface. While the source interface is not necessarily critical for the destination MFE (although it could be, depending on the processing required), the destination interface is important in terms of delivering the packet to the appropriate interface.

[0066] Based on the information identified from the physical IP addresses, the process 600 replaces (at 635) the physical IP addresses in the packet with the identified logical IP addresses. These should be the logical IP addresses that were in the packet prior to address replacement by the source MFE. In addition, the process replaces (at 640) the physical MAC addresses with logical MAC addresses based on the logical network topology. If the source and destination interfaces are on the same logical switch, then the MAC addresses will be those that correspond to these interfaces. However, if the source interface is on a different logical switch from the destination interface, then the MAC address of the logical router interface that connects to the destination logical switch will be the source MAC address.

[0067] In Figure 8, the source and destination IP addresses are converted back into 10.1.1.5 and 10.1.1.6, respectively, by the MFE 815. Similarly, because the source and destination interfaces (VM1 and VM2) are on the same logical switch 705, both the source and destination logical MAC addresses are those that correspond to the interfaces (i.e., the same as when the packet was sent to the MFE 805). However, in Figure 9, the source logical MAC address in the packet 900 as sent from the MFE 905 to the destination VM4 is MAC F, the address of the logical router interface that connects to the logical switch 710. In addition, the destination logical MAC address for the packet is MAC D, the MAC address of the destination VM4. The MFE 905 identifies that the source interface is on a different logical switch 705 based on the network topology, and performs this MAC address replacement.
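
The reverse replacement at 630 through 640 can be sketched as follows. The data structure and function names are hypothetical, as are the physical addresses 192.168.2.10 and 192.168.3.10 and the labels "MAC A" and "MAC B" used in the tests; the logical IP addresses, switch numbers, MAC D, and MAC F follow Figures 8 and 9.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LogicalEndpoint:
    """Hypothetical mapping table entry for one unique physical IP address."""
    ip: str      # logical IP address
    mac: str     # logical MAC address of the interface
    switch: str  # logical switch to which the interface attaches


def restore_logical_addresses(
    src_phys_ip: str,
    dst_phys_ip: str,
    phys_to_logical: Dict[str, LogicalEndpoint],
    router_mac_for_switch: Dict[str, str],
) -> Dict[str, str]:
    """Map physical IP addresses back to logical IP and MAC addresses."""
    src = phys_to_logical[src_phys_ip]
    dst = phys_to_logical[dst_phys_ip]
    if src.switch == dst.switch:
        # Same logical switch: the packet was only logically switched.
        src_mac = src.mac
    else:
        # Logically routed: the source MAC is that of the logical router
        # interface connected to the destination logical switch.
        src_mac = router_mac_for_switch[dst.switch]
    return {
        "src_ip": src.ip,
        "dst_ip": dst.ip,
        "src_mac": src_mac,
        "dst_mac": dst.mac,
    }
```

For the packet 900, the lookup yields a source MAC of MAC F and a destination MAC of MAC D, matching the description above.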

[0068] Having completed the reverse address replacement (or having decapsulated the packet), the process 600 performs (at 645) any additional logical processing, such as applying egress ACL rules, additional distributed firewall rules, etc. The process then delivers (at 650) the packet to the identified destination interface.

[0069] Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.

[0070] In this specification, the term "software" is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.

[0071] Figure 10 conceptually illustrates an electronic system 1000 with which some embodiments of the invention are implemented. The electronic system 1000 can be used to execute any of the control, virtualization, or operating system applications described above. The electronic system 1000 may be a computer (e.g., a desktop computer, personal computer, tablet computer, server computer, mainframe, blade computer, etc.), phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 1000 includes a bus 1005, processing unit(s) 1010, a system memory 1025, a read-only memory 1030, a permanent storage device 1035, input devices 1040, and output devices 1045.

[0072] The bus 1005 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1000. For instance, the bus 1005 communicatively connects the processing unit(s) 1010 with the read-only memory 1030, the system memory 1025, and the permanent storage device 1035.

[0073] From these various memory units, the processing unit(s) 1010 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments.

[0074] The read-only-memory (ROM) 1030 stores static data and instructions that are needed by the processing unit(s) 1010 and other modules of the electronic system. The permanent storage device 1035, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1000 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1035.

[0075] Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 1035, the system memory 1025 is a read-and-write memory device. However, unlike storage device 1035, the system memory is a volatile read-and-write memory, such as a random access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 1025, the permanent storage device 1035, and/or the read-only memory 1030. From these various memory units, the processing unit(s) 1010 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.

[0076] The bus 1005 also connects to the input and output devices 1040 and 1045. The input devices enable the user to communicate information and select commands to the electronic system. The input devices 1040 include alphanumeric keyboards and pointing devices (also called "cursor control devices"). The output devices 1045 display images generated by the electronic system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.

[0077] Finally, as shown in Figure 10, bus 1005 also couples electronic system 1000 to a network 1065 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network ("LAN"), a wide area network ("WAN"), or an Intranet), or a network of networks, such as the Internet. Any or all components of electronic system 1000 may be used in conjunction with the invention.

[0078] Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

[0079] While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.

[0080] As used in this specification, the terms "computer", "server", "processor", and "memory" all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms "display" or "displaying" mean displaying on an electronic device. As used in this specification, the terms "computer readable medium," "computer readable media," and "machine readable medium" are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.

[0081] This specification refers throughout to computational and network environments that include virtual machines (VMs). However, virtual machines are merely one example of data compute nodes (DCNs) or data compute end nodes, also referred to as addressable nodes. DCNs may include non-virtualized physical hosts, virtual machines, containers that run on top of a host operating system without the need for a hypervisor or separate operating system, and hypervisor kernel network interface modules.

[0082] VMs, in some embodiments, operate with their own guest operating systems on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.). The tenant (i.e., the owner of the VM) can choose which applications to operate on top of the guest operating system. Some containers, on the other hand, are constructs that run on top of a host operating system without the need for a hypervisor or separate guest operating system. In some embodiments, the host operating system uses name spaces to isolate the containers from each other and therefore provides operating-system level segregation of the different groups of applications that operate within different containers. This segregation is akin to the VM segregation that is offered in hypervisor-virtualized environments that virtualize system hardware, and thus can be viewed as a form of virtualization that isolates different groups of applications that operate in different containers. Such containers are more lightweight than VMs.

[0083] A hypervisor kernel network interface module, in some embodiments, is a non-VM DCN that includes a network stack with a hypervisor kernel network interface and receive/transmit threads. One example of a hypervisor kernel network interface module is the vmknic module that is part of the ESXi™ hypervisor of VMware, Inc.

[0084] It should be understood that while the specification refers to VMs, the examples given could be any type of DCNs, including physical hosts, VMs, non-VM containers, and hypervisor kernel network interface modules. In fact, the example networks could include combinations of different types of DCNs in some embodiments.

[0085] While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, a number of the figures (including Figures 2, 3, 5, and 6) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.