Wednesday, January 31, 2007

QEMU TAP bridge network configuration

Well, it is interesting to see how many posts out there on the web deal with this topic: "configure tun/tap bridge for QEMU". Yet I have to write one more page about it.

OK, here is why. 1st, while most of them are informative, not a SINGLE page I read (dozens, if not more) really helped me configure the thing easily. 2nd, once I figured it out, it turned out to be unbelievably EASY!!! It is just the lack of documentation that leaves people stumped. Last but not least, some of the docs out there are incorrect or misleading. Following their instructions may lead to a working configuration, but it will cause you trouble later, or send you down the wrong route when you troubleshoot.

Here is what would have saved me 4 hours:

First and foremost, the bridge in this case really is a bridge in the ISO (OSI model) sense: a device that forwards Ethernet frames at layer 2. It is not something specific to QEMU. Actually, the bridge functionality is provided by the Linux kernel, not QEMU. Some docs on the net may lead you to believe the bridge is a QEMU feature (well, QEMU does have this feature, but by using the Linux feature instead of providing the functionality itself).

Back to QEMU network 101. There are several ways to configure the QEMU network. The default, which happens to be the easiest, is "-net user". This is a very neat function implemented by QEMU: it creates a virtual router (with a virtual DHCP server, virtual DNS, etc.) that uses 10.0.2.x addresses.
Just think about a home high-speed internet router. It gives NAT addresses to all the computers at home, connects to the WAN (read: the internet) and forwards traffic back and forth between your computers and the internet. User mode QEMU does the same thing: QEMU gives your "virtual computer" a "virtual IP" (i.e. a 10.0.2.x address) and QEMU itself talks to the real network.
This is usually good enough if you only surf the internet from your virtual computer. However, not every application works through NAT, and since most home networks already use NAT to share one real internet IP, the virtual computer ends up behind 2 NATs. So you may want a more "direct" connection for your virtual computer.
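
As a rough sketch of the user mode setup described above: the snippet assumes a disk image called guest.img (a made-up name, use your own) and a QEMU binary called qemu. "-net nic" gives the guest a virtual NIC and "-net user" plugs it into QEMU's built-in NAT. It only prints the command, so you can check it before running it.

```shell
#!/bin/sh
# Sketch only: guest.img is a hypothetical image name, override it with IMG=...
IMG=${IMG:-guest.img}

# -net nic  : add a virtual network card to the guest
# -net user : connect that card to QEMU's built-in NAT (10.0.2.x)
QEMU_USER_CMD="qemu -hda $IMG -net nic -net user"

# Print the command instead of running it
echo "$QEMU_USER_CMD"
```

Inside the guest, a plain DHCP client should then pick up a 10.0.2.x address from QEMU's virtual DHCP server.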

Now, here comes the confusing part. QEMU supports a tap device ("-net tap" instead of "-net user"), and its official document says "... you can then configure it as if it was a real ethernet card." But how? If everyone were as familiar with TAP devices as the QEMU author, I probably would not need to write this (at least it would have saved me 4 hours if someone had written something like this :-).

A TUN/TAP device is a "virtual ethernet driver". The official doc, tuntap.txt, says: "TUN/TAP provides packet reception and transmission for user space programs. It can be seen as a simple Point-to-Point or Ethernet device, which, instead of receiving packets from physical media, receives them from user space program and instead of sending packets via physical media writes them to the user space program."

So QEMU, which runs on the real computer as a program, can write packets to the TAP device. That's cool! QEMU can simply write all the packets generated by the virtual computer to the TAP device, and forward packets from the TAP device back to the virtual computer, right? Right! But that is only sufficient if your virtual computer wants to talk ONLY to the real computer it runs on. Once the packets from the virtual computer reach the TAP device, how does Linux know where to send them? That is not part of the TAP device's functionality.
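
For comparison with "-net user", here is a minimal sketch of a TAP invocation. The image name guest.img is again made up, and the interface name tap1 is chosen only because the script further down uses it; "ifname=" asks QEMU for a TAP device with that name. Opening a TAP device normally needs root. The snippet only prints the command.

```shell
#!/bin/sh
# Sketch only: guest.img and tap1 are assumptions, override with IMG=... TAP=...
IMG=${IMG:-guest.img}
TAP=${TAP:-tap1}

# -net tap,ifname=NAME : connect the guest NIC to the host TAP device NAME.
# By default QEMU also runs /etc/qemu-ifup with the TAP name as its first argument.
QEMU_TAP_CMD="qemu -hda $IMG -net nic -net tap,ifname=$TAP"

# Print the command instead of running it
echo "$QEMU_TAP_CMD"
```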

So here comes the bridge.
The bridge is (sort of) the most reliable way of connecting your virtual computer to the real network. Unlike what some docs on the internet suggest, you do NOT need to configure an IP for the TAP device for the bridge to work, and no IP forwarding or routing is needed! That's why I said at the beginning that the bridge is a real ISO bridge. Because packets are forwarded at the Ethernet level (layer 2 of the ISO/OSI model), the bridge does not even care about IP (sort of; you will see later why you still need to manipulate your IP a little bit).

For those who are not familiar with Ethernet bridging (this is not a scientific definition, rather an example that tries to give a feeling for what a bridge is): a bridge is a device with 2 (or more) NICs (Network Interface Cards), one NIC connected to one Ethernet, the other connected to a second Ethernet. The bridge then joins the 2 Ethernets together into one. Of course, a bridge can have more than 2 NICs and connect more than 2 networks. A good use of a bridge is, for example, connecting a 1 Gbps Ethernet with a 10 Mbps Ethernet. Since you don't want to slow down the gigabit net with all the 10 Mbps traffic, you keep them separate but connect them with a bridge. The bridge learns which MAC addresses live on which side and only passes along the frames that need to cross. So the two nets can talk to each other, but do not bother each other that much.

OK, back to how to bridge QEMU.
You must have already figured out that you need to bridge your real NIC with the TAP device QEMU uses (once you start QEMU with -net tap, it creates a TAP device automatically).
One thing you need to know: once you start a bridge, you cannot manipulate your eth0 (or whatever the real NIC device is called on your real Linux) directly anymore. "The reason?" I hear you ask. Well, since the bridge works at layer 2 (below the IP level), it needs to see every packet to tell where it is going (by the MAC address in the packet). If you keep using the eth0 device directly, your packets go out to the real network directly, the bridge never gets access to them, and that breaks the bridge. (Think about DHCP on our 1 Gbps network and 10 Mbps network without a working bridge. If you request a DHCP address on your eth0, which is on the 1 Gbps network, the DHCP server has to be on the 1 Gbps network or you won't get an IP address. In this situation you would need 2 DHCP servers, one on the 1 Gbps side and one on the 10 Mbps side, and you would either create 2 segments of dynamic IPs or find a way to synchronize the leases of the 2 DHCP servers.)

So, to solve this problem (well, the problem is only there because you use your bridge, your Linux box, as both a computer and a bridge; a hardware bridge is transparent on the Ethernet) you need to connect to the Ethernet through the bridge interface (once you have configured it, you will see the bridge interface has the eth0 MAC address). The bridge (read: the Linux kernel bridge module) can then check the contents of the packets and send them to the right network.
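
If you want to check the MAC address remark above on your own box, here is a small sketch. It assumes a kernel that exposes /sys/class/net/<iface>/address, and mac_of and same_mac are helper names made up for this example. As far as I know the Linux bridge takes the lowest MAC address among its ports, so it may also end up with the TAP device's address rather than the real NIC's; comparing against both ports tells you which.

```shell
#!/bin/sh
# Assumption: the kernel exposes interface MACs under /sys/class/net.
# SYSFS_NET can be overridden, e.g. to point at a test directory.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# mac_of IFACE: print the MAC address of IFACE as reported in sysfs
mac_of() {
    cat "$SYSFS_NET/$1/address"
}

# same_mac A B: succeed only if interfaces A and B report the same MAC address
same_mac() {
    [ "$(mac_of "$1")" = "$(mac_of "$2")" ]
}

# Example, once the bridge (eth0) and the renamed NIC (reth0) exist:
#   same_mac eth0 reth0 && echo "bridge uses the real NIC's MAC"
#   same_mac eth0 tap1  && echo "bridge uses the TAP device's MAC"
```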

Since the IP layer is layer 3, which means it is built on top of the bridge running at layer 2, you have to start your IP stack on top of the bridge instead of your real NIC (eth0 in most cases).
Finally, here comes the howto:
Be warned, your IP network will have to come down for a little while (for me this was about 5 seconds), because the IP address has to be moved from the real NIC to the bridge interface.
First, release the IP address assigned to your real NIC. I also rename my eth0 to reth0 (it stands for Real ETH0) and create the bridge with the name "eth0". The reason is that this is simpler for your other programs (e.g. iptables, which may use eth0 as an interface name somewhere).
Then you create a bridge and add 2 NICs to it (one is the real NIC, the other is the TAP device).
You then re-acquire a DHCP lease on the bridge interface (in my case it is called eth0 now; hey, I know it is confusing, but it is just a name, right?).
The following is tested on SLES 9. It should work on most Linux distros with little change (e.g. the DHCP client tool may be different). You also need the bridge package (the one that provides brctl), which is not installed by default (at least on SLES 9 it is not).

# 1st, release the DHCP lease and remove all IP addresses associated with the original eth0
/sbin/dhcpcd -k
/sbin/ip addr flush eth0
# then take the interface down so we can rename it
/sbin/ip link set eth0 down
# now rename the original eth0 to reth0 (Real ETH0)
# (if your nameif has no -r option, "ip link set eth0 name reth0" should do the same)
/sbin/nameif -r reth0 eth0
# OK, bring the same interface (with new name though) back up
/sbin/ip link set reth0 up
# 2nd, create a bridge called eth0 so other programs think they are talking to the same old interface
# (actually they will talk to the bridge, which is a clone of the original eth0 - with the same MAC addr)
/sbin/brctl addbr eth0
# then add both the original eth0 (now reth0) and the tap1 device to the bridge;
# tap1 must already exist at this point, otherwise brctl addif fails
/sbin/brctl addif eth0 tap1
/sbin/brctl addif eth0 reth0
echo "showing bridge mac addresses"
/sbin/brctl showmacs eth0
# 3rd, we need to bring the newly created bridge UP
/sbin/ip link set eth0 up
# 4th, renew the DHCP address if possible
/sbin/dhcpcd -n
/sbin/ip addr show
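
To get back to the plain eth0 setup, the steps above can be reversed. The following is only a sketch under the same assumptions as the script (SLES 9 tools, a bridge named eth0, ports reth0 and tap1) and has the same brief downtime. The run helper and the DRY_RUN variable are made up for this example: by default the commands are only printed, and DRY_RUN=0 (as root) executes them.

```shell
#!/bin/sh
# DRY_RUN=1 (default): only print the commands. DRY_RUN=0: really run them (needs root).
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bridge_teardown() {
    # release the DHCP lease held on the bridge and clear its addresses
    run /sbin/dhcpcd -k
    run /sbin/ip addr flush eth0
    # a bridge must be down before it can be deleted
    run /sbin/ip link set eth0 down
    run /sbin/brctl delif eth0 tap1
    run /sbin/brctl delif eth0 reth0
    run /sbin/brctl delbr eth0
    # give the real NIC its old name back and bring it up again
    run /sbin/ip link set reth0 down
    run /sbin/nameif -r eth0 reth0
    run /sbin/ip link set eth0 up
    # re-acquire a DHCP lease on the real eth0
    run /sbin/dhcpcd -n
}

bridge_teardown
```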

Please note, there are other ways of connecting your virtual computer to the real network with a TAP device. For example, you can configure routing on your real computer. I don't see a reason why I would want to do that: if you use NAT, why not use "-net user" instead, which is much simpler; and if you use a real-world IP, why not use a bridge? To configure routing for the TAP device, you need to start the IP stack on the TAP device and turn on IP forwarding on the real Linux, which is unnecessarily complex for my environment.

Another good option is VDE (Virtual Distributed Ethernet). It is a very good solution, and in some cases it may be better than a bridge. Its best feature is that you don't need the brief downtime of the bridge solution. I found VDE to be documented much better than bridging, and hence I will skip that part.