Thursday, July 8, 2010

Writing Open Source ESX Network Drivers

Thankfully, VMware has allowed ESXi 3.5 and 4 to be downloaded and used for free. In my office we find that the ESX hypervisor offers the most usable features (compared with Oracle VM), but at the cost of supporting only a limited number of NICs out of the box.

Fortunately, thanks to the efforts behind websites like http://www.vm-help.com/esx/esx3i/customize_oem_tgz.php and http://sourceforge.net/projects/open-vdrivers, I can now use more common NICs with my whitebox ESX machine. But I have a number of Realtek chipset cards (such as the D-Link DFE 538T) lying around, and I would like to use them rather than invest in Intel E1000 cards.

So, starting with information from these websites, http://www.kernelcrash.com/blog/using-a-marvell-lan-card-with-vmware-esxi-35/2009/08/14/ and http://sourceforge.net/projects/open-vdrivers/forums/forum/915617/topic/3358585?message=7548991, I managed to build an ESX 3.5u4 driver for the RTL8139 chipset (and its family).

This is my experience of building the driver.

At first I tried using the 8139too.c source from kernel 2.4.18, as recommended by KernelCrash. But I hit a roadblock: I could not figure out how to fix the reference to the skb_copy_and_csum_dev function. I gave up on that version and, after looking through various 8139 driver sources, I finally found that the source from 2.4.10 would compile and load into the ESX system.

There is a simple way to check for unresolved module references without needing the actual PC and card. I created a bootable USB stick that runs ESX, put a customized oem.tgz driver file on it, and booted it on any PC that supports it. Then I ran "vmkload_mod new_module.o" to see whether the module loads or complains. Any unresolved references are listed, and you can then go back to the source to fix them.
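
The "unresolved reference" list is essentially the result of matching each undefined symbol in new_module.o against the symbols the running ESX kernel makes available to drivers. The rough userspace sketch below illustrates that check; it is not vmkload_mod's actual implementation, and is_exported() and report_unresolved() are hypothetical helpers written only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if sym appears in the export table, 0 otherwise. */
static int is_exported(const char *sym, const char *const exports[], size_t n_exports)
{
    for (size_t i = 0; i < n_exports; i++)
        if (strcmp(sym, exports[i]) == 0)
            return 1;
    return 0;
}

/* Prints every undefined symbol that no export satisfies and returns the count.
 * Illustrative only: a real loader checks against the live kernel's symbol table. */
static size_t report_unresolved(const char *const undefined[], size_t n_undefined,
                                const char *const exports[], size_t n_exports)
{
    size_t missing = 0;
    for (size_t i = 0; i < n_undefined; i++) {
        if (!is_exported(undefined[i], exports, n_exports)) {
            printf("unresolved symbol: %s\n", undefined[i]);
            missing++;
        }
    }
    return missing;
}
```

On a Linux build machine, `nm -u new_module.o` lists a module's undefined symbols, which gives a quick preview of what the loader will have to resolve before the module is copied onto the USB stick.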

The next problem was that although the new module loaded without errors, the ESX system could not use it. After the driver is loaded, it shows up in the "lspci -p" output but not in the "esxcfg-nics -l" listing, and it does not appear on the console screen (the yellow-banded screen) either. So I started comparing the r8168 and r8169 driver sources, looking for unusual lines. I found 2 lines in these sources that stood out in particular:

SET_MODULE_OWNER(dev);
SET_NETDEV_DEV(dev, &pdev->dev);
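
Why the second macro matters is not obvious from the ESX side. A plausible explanation (an assumption, not confirmed from VMware's code) is that SET_NETDEV_DEV() records the PCI device as the network device's parent, which lets the host match the net_device to the PCI device it already sees in lspci; without that link the NIC would be orphaned. The sketch below is a userspace mock: the structs, the THIS_MODULE token, probe_setup() and find_nic_for_pci() are hypothetical stand-ins, and the macro bodies are simplified versions of the Linux ones (in 2.4, SET_MODULE_OWNER sets dev->owner; in 2.6 kernels, SET_NETDEV_DEV records the parent struct device).

```c
#include <stddef.h>

/* Mock types: stand-ins for the kernel's struct device, pci_dev and
 * net_device, reduced to the fields this sketch needs. */
struct device { const char *bus_id; };
struct pci_dev { struct device dev; };
struct net_device {
    char name[16];
    void *owner;              /* set by SET_MODULE_OWNER */
    struct device *parent;    /* set by SET_NETDEV_DEV */
};

/* Simplified macro definitions; THIS_MODULE is a mock token here. */
static int this_module_token;
#define THIS_MODULE ((void *)&this_module_token)
#define SET_MODULE_OWNER(dev) do { (dev)->owner = THIS_MODULE; } while (0)
#define SET_NETDEV_DEV(net, pdev) ((net)->parent = (pdev))

/* Probe-time setup, done right after alloc_etherdev() in a real driver. */
static void probe_setup(struct net_device *dev, struct pci_dev *pdev)
{
    SET_MODULE_OWNER(dev);
    SET_NETDEV_DEV(dev, &pdev->dev);
}

/* Hypothetical host-side lookup: find the netdev bound to a PCI device.
 * A netdev whose parent pointer was never set cannot be matched. */
static struct net_device *find_nic_for_pci(struct net_device *const nics[], size_t n,
                                           const struct pci_dev *pdev)
{
    for (size_t i = 0; i < n; i++)
        if (nics[i]->parent == &pdev->dev)
            return nics[i];
    return NULL;
}
```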

So I added these lines near the "alloc_etherdev" line, and this time loading the driver seemed to have some effect. Unfortunately, the ESX kernel now crashed when it tried to load the 8139too driver, and the crash had something to do with the copy_from_user routine. Oh, and by the way, I also had to rewrite the "interruptible_sleep_on_timeout()" call to use "schedule_timeout()" and other related lines.
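
For reference, in 2.4 kernels interruptible_sleep_on_timeout(q, timeout) boils down to: add the current task to wait queue q, mark it TASK_INTERRUPTIBLE, call schedule_timeout(timeout), then take the task off the queue and return the jiffies left. An open-coded replacement therefore keeps those steps together. The sketch below mocks the kernel primitives in userspace (the mock bodies, the wake_after variable and sleep_on_timeout_replacement() are hypothetical), so it only illustrates the order of the calls and the return value, not real scheduling; a real TASK_INTERRUPTIBLE sleep also ends early on a signal, which the mock does not model.

```c
/* Userspace mocks of the kernel pieces involved; the real ones live in
 * <linux/sched.h> and <linux/wait.h> and have different signatures. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct wait_queue_head {
    int waiters;                       /* mock: number of tasks queued */
};

static enum task_state current_state = TASK_RUNNING;
static long wake_after = -1;           /* mock: jiffies until a wake-up arrives; -1 = never */

static void set_current_state(enum task_state s) { current_state = s; }
static void add_wait_queue(struct wait_queue_head *q) { q->waiters++; }
static void remove_wait_queue(struct wait_queue_head *q) { q->waiters--; }

/* Mock schedule_timeout(): "sleeps" for up to timeout jiffies and returns
 * the jiffies remaining if woken early, or 0 if the full timeout elapsed. */
static long schedule_timeout(long timeout)
{
    long slept = (wake_after >= 0 && wake_after < timeout) ? wake_after : timeout;
    current_state = TASK_RUNNING;      /* the task runs again after waking */
    return timeout - slept;
}

/* Open-coded equivalent of interruptible_sleep_on_timeout(q, timeout).
 * In kernel code this would be roughly:
 *   DECLARE_WAITQUEUE(wait, current);
 *   add_wait_queue(q, &wait);
 *   set_current_state(TASK_INTERRUPTIBLE);
 *   timeout = schedule_timeout(timeout);
 *   remove_wait_queue(q, &wait);
 */
static long sleep_on_timeout_replacement(struct wait_queue_head *q, long timeout)
{
    add_wait_queue(q);
    set_current_state(TASK_INTERRUPTIBLE);
    timeout = schedule_timeout(timeout);
    remove_wait_queue(q);
    return timeout;
}
```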

Going back to the drawing board, I took the 2.4.18 source again and put in the above 2 lines. Then I replaced the "skb_copy_and_csum_dev()" call with the corresponding lines from the earlier version of the driver, which use a dma_addr mapping and related code. Now the driver compiles, and the only unresolved references left are simple to fix.
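
For anyone hitting the same skb_copy_and_csum_dev() problem: in the 2.4.18 driver that call essentially copies each outgoing frame into a pre-allocated per-descriptor Tx buffer. The earlier drivers' approach, roughly, is to map the packet data for DMA directly when it is 32-bit aligned (presumably because the RTL8139 needs aligned Tx buffer addresses) and to copy it into the pre-allocated buffer only otherwise. The userspace sketch below shows that decision; pci_map_single() is mocked as an identity mapping, and struct tx_slot, prepare_tx() and the constant values are assumptions for illustration, not code from either driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_BUF_SIZE 1536   /* assumed per-descriptor Tx buffer size */
#define ETH_ZLEN 60        /* minimum Ethernet frame length without FCS */

typedef uintptr_t dma_addr_t;  /* mock: bus address == virtual address */

struct tx_slot {
    _Alignas(4) unsigned char bounce[TX_BUF_SIZE]; /* pre-allocated, 32-bit aligned */
    dma_addr_t mapping;                            /* address handed to the NIC */
    int used_bounce;                               /* 1 if the frame was copied */
};

/* Chooses between direct mapping and copying; returns the length to program
 * into the Tx status register, or -1 if the frame is too big (driver drops it). */
static int prepare_tx(struct tx_slot *slot, const unsigned char *data, size_t len)
{
    if (len > TX_BUF_SIZE)
        return -1;

    if (((uintptr_t)data & 3) == 0 && len >= ETH_ZLEN) {
        /* Aligned and long enough: map in place (pci_map_single() in a driver). */
        slot->mapping = (dma_addr_t)data;
        slot->used_bounce = 0;
        return (int)len;
    }

    /* Misaligned or short: copy into the aligned buffer and zero-pad. */
    memcpy(slot->bounce, data, len);
    if (len < ETH_ZLEN)
        memset(slot->bounce + len, 0, ETH_ZLEN - len);
    slot->mapping = (dma_addr_t)slot->bounce;
    slot->used_bounce = 1;
    return (int)(len < ETH_ZLEN ? ETH_ZLEN : len);
}
```

One deliberate choice in this sketch: short frames always go through the zeroed buffer, so the padding up to the 60-byte minimum is zeros rather than whatever memory happens to follow the packet.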

Now I have an 8139too driver that works on the ESXi 3.5 kernel. I have tested it for a while and it seems to work, though I am not sure how efficient it is.

3 comments:

  1. Do you have the oem.tgz file available to the public?

  2. I will provide the oem.tgz file if you can send me your email address. Alternatively, suggest where I can publicly upload the file.

  3. Hi Cheelee, thanks for the post!

    I have installed ESXi 3.5-update5 on an ASUS P5NSLI. Of course the onboard NIC is not supported, so I decided to try an RTL8139.
    - Do you still have the oem.tgz?
    - Can you provide any update on installing the RTL8139 in ESXi 3.5 or above?

    Regards,
    Alex
