Thursday, July 8, 2010

Writing Open Source ESX Network Drivers

Thankfully, VMware has allowed ESXi 3.5 and 4 to be downloaded and used for free. In my office we find that the ESX hypervisor offers the most usable features (compared to Oracle VM), but at the expense of supporting only a limited number of NICs out of the box.

Fortunately, thanks to the efforts of websites like http://www.vm-help.com/esx/esx3i/customize_oem_tgz.php and http://sourceforge.net/projects/open-vdrivers, I can now use more common NICs with my whitebox ESX box. However, I have a number of Realtek chipset cards (such as the D-Link DFE 538T) lying around, and I would rather use them than invest in Intel E1000 cards.

So, starting with information from these websites, http://www.kernelcrash.com/blog/using-a-marvell-lan-card-with-vmware-esxi-35/2009/08/14/ and http://sourceforge.net/projects/open-vdrivers/forums/forum/915617/topic/3358585?message=7548991, I managed to build an ESX 3.5u4 driver for the RTL 8139 chipset (and its family).

This is my experience of building the driver.

At first I tried using the source for 8139too.c from kernel 2.4.18, as recommended by KernelCrash. But I hit a roadblock, as I could not figure out how to fix the reference to the skb_copy_and_csum_dev function. In the end I gave up on that version and, after looking around at various 8139 driver sources, found that I could compile the source from 2.4.10 and load it into the ESX system.

There is a simple way to check for unresolved module references without needing the actual PC and card. I created a bootable USB drive to run ESX, put a customized oem.tgz driver file on it, and booted it on any PC that supports it. Then I used the command "vmkload_mod new_module.o" to see whether the module would load or complain. The list of unresolved references shows up, and you can then go back to the source to fix them.
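
Before rebooting into the USB stick, a rough pre-check on the build machine can list which symbols the module expects the kernel to supply. This is only a sketch, assuming binutils "nm" is available where you compile; the helper name undefined_symbols is my own, and the check says nothing about which symbols the ESX kernel actually exports, so vmkload_mod remains the real test.

```shell
# List the symbols a module expects the kernel to provide.
# Reads `nm -u` output on stdin; undefined symbols appear as "U name".
undefined_symbols() {
    awk '$1 == "U" { print $2 }' | sort -u
}

# Typical use on the build machine:
#   nm -u new_module.o | undefined_symbols
```

Comparing this list between two versions of the driver source shows quickly which new kernel functions a later version has started to depend on.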

The next problem was that although the new module loaded without problems, it was not usable by the ESX system. After the driver was loaded, the card showed up in the "lspci -p" output but not in the "esxcfg-nics -l" listing, and it could not be seen on the console (the yellow band screens). So I started looking for lines in the source code of the r8168 and r8169 drivers that the 8139too source lacked. I found that those sources had 2 additional lines, in particular:

SET_MODULE_OWNER(dev);
SET_NETDEV_DEV(dev, &pdev->dev);

So I added these lines near the "alloc_etherdev" line, and now loading the driver seemed to have some effect. Unfortunately, the ESX kernel now crashed when it tried to load the 8139too driver, and the crash had something to do with the copy_from_user routine. By the way, I also had to rewrite the "interruptible_sleep_on_wait()" call to use "schedule_timeout()", along with other related lines.

Going back to the drawing board, I looked at the 2.4.18 source again and put in the above 2 lines. In the end, I replaced the "skb_copy_and_csum_dev()" call with the set of lines from the earlier version of the driver, which uses dma_addr, etc. Now the driver compiles, and there are no unresolved references other than simple-to-fix ones.

Now I have an 8139too driver that works on the ESXi 3.5 kernel. I have tested it for a while now and it seems to work, though I am not sure how efficient it is.

Saturday, May 29, 2010

Moving VMs

Recently I needed to rebalance the VMs I have running and move some of them from an OracleVM server to a VMware ESX 3i environment. Most of these VMs are running some version of Linux.

I have a number of ways to move the filesystem from the VM in the old server to the new one.

Before you start, make sure that you have created a VM with sufficient disk space to hold all of the used disk space from the old VM. You can run "df -h" on the old server and add up the Used column.
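
Adding up the Used column by hand gets tedious with many mount points, so here is a small sketch that does it with awk. The function name sum_used_kb is my own; it assumes the POSIX output format of "df -P -k", where the third column is the used space in 1K blocks.

```shell
# Sum the "Used" column (3rd field) of `df -P -k` output, in 1K blocks.
# Reads the df listing on stdin, so it works on live output or a saved copy.
sum_used_kb() {
    awk 'NR > 1 { total += $3 } END { printf "%.0f\n", total }'
}

# Typical use on the old server:
#   df -P -k | sum_used_kb
```

Note that this counts every line df prints, including pseudo filesystems such as tmpfs, so the result is an upper bound unless you filter the listing first.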

Note that no process that keeps state in memory, such as a database or application server, should be running during the transfer. Please shut these down before starting.

These are the steps:

1. Create the new VM

After the VM instance has been created, boot it into a minimal Linux distro. In my case I used SystemRescueCD version 1.3.5.

2. Make the partitions on the new Virtual Disk

Next, create the partitions on the new virtual disk. You can use "fdisk" for this. Remember to make a swap partition (type 82), and mark the "/" or "/boot" partition as Active, otherwise the VM may not boot.

Also, the boot partition (or the root partition if you don't want a boot partition) must be as close as possible to the beginning of the virtual disk, due to the typical limitations of "grub" and the boot BIOS.

3. Put a filesystem on them

After the partitions have been created, you need to format them with the appropriate filesystem. Most older Linux distros use EXT3 for the root and boot partitions. There is a caveat here: newer Linux distros default to a 256-byte inode size, which an older "grub" version cannot boot from. Older ext3 partitions use 128 bytes.

To fix this, please use the following command:

# mkfs.ext3 -I 128 /dev/sda1

assuming that the new root/boot partition is on partition 1.
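
To confirm what inode size a partition ended up with (or what the old server uses), the value can be read from the "tune2fs -l" listing. A minimal sketch, with inode_size being my own helper name, and assuming the listing contains the usual "Inode size:" line:

```shell
# Extract the inode size (in bytes) from `tune2fs -l` output read on stdin.
inode_size() {
    awk -F: '/^Inode size:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Typical use from the rescue environment:
#   tune2fs -l /dev/sda1 | inode_size
```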

For the swap partition, you need to use:

# mkswap /dev/sda2

assuming that the swap partition is partition 2.

4. Mount them and copy the filesystem

The next step is to clone the filesystem onto the new virtual disk. Mount the new partition on the SystemRescueCD filesystem. I used a new directory "temp" under the "/mnt" directory.

# mkdir /mnt/temp
# mount /dev/sda1 /mnt/temp

To copy the filesystem from the old VM you must ensure that the new VM can reach it via the network, and so you need to configure the network interface on the SystemRescueCD.

# ifconfig eth0 X.X.X.X
# ping old_server_ip_address

You have a number of different ways to clone the filesystem:

a) using "dump" and "restore"

# cd /mnt/temp
# (ssh root@src_server dump -f - /) | restore -rf -

This assumes that you can log in to the old server via SSH as the root user. If not, you may need to enable this.

Note that this only covers the root ("/") filesystem. Repeat the process for each of the mounted filesystems on the old server. You need not have the same set of partitions on the new virtual disk; you can consolidate all of them into a single large partition, including the /boot partition.

# cd /mnt/temp/boot
# (ssh root@src_server dump -f - /boot) | restore -rf -

b) Using RSYNC

Sometimes I find that the dump/restore fails. In that case, rather than restarting from the beginning, I use RSYNC to transfer only the missing files. For example, if the /u01 partition failed halfway, you can run this command to continue the transfer:

# cd /mnt/temp
# rsync -avz root@src_server:/u01 .

c) Using tar

Previously I used "tar" to do the cloning and it still works.

# cd /mnt/temp
# (ssh root@src_server tar -cf - /) | tar -C /mnt/temp -xvf -
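
If you want to try the tar pipe safely before pointing it at a real server, the same technique works between two local directories. This is only a sketch: copy_tree is a name I made up, and both ends are local here, whereas the real transfer runs the archiving side over ssh.

```shell
# Copy a directory tree through a tar pipe, preserving permissions (-p).
# Same shape as the ssh version, but with both ends on the local machine.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" &&
    (cd "$src" && tar -cf - .) | (cd "$dst" && tar -xpf -)
}

# Typical use:
#   copy_tree /some/source/dir /mnt/temp/some/dir
```

Archiving "." from inside the source directory keeps the paths relative, so the files land directly under the destination. When archiving a whole root filesystem you will probably also want to keep /proc and /sys out of the archive; GNU tar has a --one-file-system option for that.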

d) Using cpio

# (ssh root@src_server "find ./source -depth -print | cpio -cvo" ) | cpio -icvmd

e) Using nc (netcat) command

I have not tried this method, but it is supposed to work:

On the source machine:
# cd /source
# tar -czpsf - . | pv -b | nc -l 3333

On the destination machine (new VM):
# cd /mnt/temp
# nc src_server 3333 | pv -b | tar -xzpsf -

5. Making the new VM bootable

If you are using "grub" to boot the VM, you may need to perform the following steps.

a) copy the minimal devices to the new /dev directory
# cd /mnt/temp/dev
# cp -a /dev/sda* .
# cp -a /dev/core .
# cp -a /dev/null .
# cp -a /dev/zero .
# cp -a /dev/console .
# cp -a /dev/mem .
# mkdir pts

b) fixing the /etc/fstab file
You may need to edit the /etc/fstab file (/mnt/temp/etc/fstab from the rescue environment) so that it points to the correct devices to mount.
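
As an illustration of the kind of fstab edit involved, here is a sketch that renames devices with sed. It assumes the old VM used Xen-style device names (/dev/xvdaN) and the new one uses /dev/sdaN with the same partition numbers; both are assumptions about your setup, and remap_fstab is my own helper name. If you consolidated partitions, or the old fstab uses LABEL= entries, you will need to edit by hand instead.

```shell
# Rewrite device names in an fstab read on stdin.
# ASSUMPTION: old VM used /dev/xvdaN, new VM uses /dev/sdaN, same numbering.
remap_fstab() {
    sed 's|^/dev/xvda\([0-9][0-9]*\)|/dev/sda\1|'
}

# Typical use from the rescue environment (review the result before copying it back):
#   remap_fstab < /mnt/temp/etc/fstab > /tmp/fstab.new
```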

c) running in a chroot environment

# chroot /mnt/temp /bin/bash
# mount -t proc none /proc
# mount -t sysfs none /sys
# mount -t devpts none /dev/pts

d) installing grub

# cd /boot/grub
# grub
grub> root (hd0,0)
grub> find /boot/grub/stage1
grub> setup (hd0)

6. Fixing the kernel and/or the initrd.img

In some cases, you need to install an appropriate kernel into the new VM if the old one cannot be used (e.g. if you were running a Xen DomU kernel).

Or, you may have a kernel whose default initrd.img lacks the correct disk driver module (e.g. mptscsi or sd_mod). You may need to re-create it and update the /boot/grub/grub.conf configuration to reflect the correct initrd.img.

7. Reboot the new VM

UPDATE:

In some cases the server will complain that the filesystem is using unsupported features, especially on EXT3 filesystems. You then need to disable these features with the "debugfs" utility. For example, older versions of CentOS do not support the "resize_inode" feature. In this case, turn it off using:

# debugfs -w /dev/sda1
debugfs: feature -resize_inode
debugfs: close
#
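
To check whether a feature is still enabled afterwards, you can look at the "Filesystem features:" line of the "tune2fs -l" listing. A small sketch (has_feature is my own helper name) that succeeds when the named feature is present in that line:

```shell
# Succeed (exit 0) if the named feature is listed in the
# "Filesystem features:" line of `tune2fs -l` output read on stdin.
has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Typical use:
#   tune2fs -l /dev/sda1 | has_feature resize_inode && echo "still enabled"
```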

Thursday, March 25, 2010

Extending the use of the Lokatoo A1000

I recently received a Lokatoo A1000 GPS unit which has a Windows CE 5.0 OS built in.

The basic user menu and GPS software are usable, but with a (relatively) large screen and SD card expansion capability (up to 4GB), I wanted to see what else I could get out of the device.

You can access the SD card either by using the USB cable (which turns the device into an SD card reader) or by removing it and plugging it into a USB card reader connected to your PC.

Hijacking the Navigation button

First things first. Without making major changes to the system, you can hijack the "Navigation" button on the user menu to start your own program. Any program that you want to run on the device must be compatible with Windows CE 5.0 on the ARM-I CPU. The device comes with 64MB SDRAM.

If you are accessing the device using a USB cable, the SD card appears as the second drive. Within the Windows CE OS, it is mounted as \SDMMC, and that is the path to use for data and programs.

To hijack the "Navigation" menu button, you need to edit a file on the SD card, "GpsRunfile.txt". Put the full path to the program that you want to run into this text file, e.g. "\SDMMC\Garmin\Garmin.exe". Once this is done, you can start the program by clicking the "Navigation" icon on the boot-up menu screen.
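
If the SD card is mounted on a Linux PC, the file can also be written from the shell. This is only a sketch with several assumptions: that GpsRunfile.txt sits at the top level of the card, that the device does not need a particular line ending, and that the card is mounted wherever you pass as the first argument (the /media/sdcard path in the comment is hypothetical). The function name is my own.

```shell
# Point the "Navigation" button at a program by writing its path to GpsRunfile.txt.
# $1: where the SD card is mounted on the PC (hypothetical, e.g. /media/sdcard)
# $2: program path as seen from Windows CE, e.g. \SDMMC\Garmin\Garmin.exe
set_navigation_target() {
    printf '%s' "$2" > "$1/GpsRunfile.txt"
}

# Typical use (single quotes keep the backslashes intact):
#   set_navigation_target /media/sdcard '\SDMMC\Garmin\Garmin.exe'
```

Keep a copy of the original file first, so you can get the stock navigation software back.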

Adding Another User Menu

Since the boot-up user menu cannot be easily changed, I used a program called CEMENU, which you can get from http://sourceforge.net/projects/cemenu/, to create a second-level, user-configurable menu. Read the included documentation on how to configure the XML files used to define the menu options.

Getting to the Windows Explorer

Since this device runs on Windows CE, you can actually start the Windows Explorer on the device. The program is located at "\Windows\explorer.exe". You can either set this in the "GpsRunfile.txt" file or configure it as an option in the CEMENU program mentioned above.

Reading PDF files

One limitation of the default software is that the ebook reader does not work with PDFs. So I tried using Foxit PDF reader, http://www.foxitsoftware.com/pdf/reader/; you can also use PocketXpdf, http://pocketxpdf.sourceforge.net/. However, note that these programs will throw an exception when you try to open their file browser if the Windows Explorer is not running. So run these programs only from the Windows Explorer interface.

Coding in Java

You can also build new programs using Java, but you will need an appropriate JDK to compile and run the Java programs. The JDK / JVM that worked for me is the phoneme version, https://phoneme.dev.java.net/. If you have other successes, please let me know. The phoneme JVM supports both the CLDC and Personal versions of the JavaME standard.

Coding in Embedded Visual C++

Lastly, if you want to code at a native level, you can try using the Embedded Visual C++ IDE from Microsoft, which is free to download. You will need 3 installations:
1. eVC4 (Embedded Visual C++ 4.0)
2. eVC4SP4 (Service Pack 4)
3. Standard_SDK for Windows CE 5.0.

You can install these on Windows XP, develop your application and compile it for ARMi. Then copy the executable to the SD card to run.