A face tracking sample with XNA and Emgu

In my previous post, we saw how to use a simple webcam stream as a background in an XNA application. Now I will go a little further and implement a simple face tracking algorithm.

For this we will need to download the Emgu libraries. These libraries are a C# wrapper for the well-known OpenCV.

Before we start, if you want to understand the principles of face detection, I suggest you start digging from here:

Now we can go on and make some changes to our webcam component. First we need to reference the Emgu DLLs in our project. We will need three of them:

Emgu.CV
Emgu.CV.UI
Emgu.Util

After that, we need to copy the native DLLs that Emgu depends on into our bin\debug folder (if you are using a version more recent than 2.2, copy the x64 or x86 folder to your bin\debug folder).

We no longer need DirectShowLib, because we will handle the capture of the webcam stream with Emgu. So, after removing the DirectShowLib code and changing the method to use Emgu's Capture component, we end up with something like this:

public class VideoEmgu
{
  GraphicsDevice device;
  Texture2D frame;
  Capture capture;
  Image<Bgr, Byte> nextFrame;
  ThreadStart thread;
  public bool IsRunning;
  public Color[] colorData;

  public Texture2D Frame
  {
    get
    {
      if (frame.GraphicsDevice.Textures[0] == frame)
        frame.GraphicsDevice.Textures[0] = null;
      frame.SetData(0, null, colorData, 0, colorData.Length);
      return frame;
    }
  }
  
  public VideoEmgu(GraphicsDevice device)
  {           
    this.device = device;
    capture = new Capture();
    frame = new Texture2D(device, capture.Width, capture.Height);
    colorData = new Color[capture.Width * capture.Height];
  }

  public void Start()
  {
    thread = new ThreadStart(QueryFrame);
    IsRunning = true;
    thread.BeginInvoke(null, null);
  }

  public void Dispose()
  {
    IsRunning = false;
    capture.Dispose();
  }
  
  private void QueryFrame()
  {
    while (IsRunning)
    {
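      //Flip the image to get a mirror effect; QueryFrame() can return null, so check before flipping
      var grabbed = capture.QueryFrame();
      nextFrame = grabbed == null ? null : grabbed.Flip(FLIP.HORIZONTAL);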
      if (nextFrame != null)
      {
        byte[] bgrData = nextFrame.Bytes;
        for (int i = 0; i < colorData.Length; i++)
          colorData[i] = new Color(bgrData[3 * i + 2], bgrData[3 * i + 1], bgrData[3 * i]);
      }
    }
  }
}

The code does exactly the same thing as before. The only variation is a Start() method that starts the webcam explicitly, instead of having it start automatically on creation.
The code that converts the frame to a Texture2D hasn't changed: it is still a conversion from BGR to RGB. I removed the wait part, because QueryFrame() always gives us a full frame.
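
To make the channel swap easier to see in isolation, here is a minimal sketch of the same index arithmetic without XNA or Emgu. The BgrConverter class and its BgrToRgb method are hypothetical names used only for illustration, and the sketch assumes a tightly packed buffer (3 bytes per pixel, no row padding), which is the same assumption the loop above makes.

```csharp
using System;

public static class BgrConverter
{
    // Hypothetical helper: converts a packed BGR byte buffer into one
    // uint per pixel laid out as 0x00RRGGBB, so the channel order is easy to inspect.
    // Assumes 3 bytes per pixel and no padding at the end of each row.
    public static uint[] BgrToRgb(byte[] bgrData, int pixelCount)
    {
        if (bgrData == null)
            throw new ArgumentNullException("bgrData");
        if (pixelCount < 0 || bgrData.Length < pixelCount * 3)
            throw new ArgumentException("Buffer too small for the requested pixel count.");

        var rgb = new uint[pixelCount];
        for (int i = 0; i < pixelCount; i++)
        {
            byte b = bgrData[3 * i];     // blue comes first in a BGR buffer
            byte g = bgrData[3 * i + 1]; // green stays in the middle
            byte r = bgrData[3 * i + 2]; // red comes last
            rgb[i] = ((uint)r << 16) | ((uint)g << 8) | b;
        }
        return rgb;
    }
}
```

In the component above, the same three indices feed the XNA Color constructor, which expects its arguments in red, green, blue order.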

Now we will add some variables to handle the face detection:

private Image<Gray, Byte> gray = null;
//Emgu 2.4+ changed the name of this file, so adjust it accordingly
private HaarCascade haarCascade = new HaarCascade("haarcascade_frontalface_default.xml");

We can now change QueryFrame() like this:

private void QueryFrame()
{
  while (IsRunning)
  {
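    //QueryFrame() can return null, so check before flipping
    var grabbed = capture.QueryFrame();
    nextFrame = grabbed == null ? null : grabbed.Flip(FLIP.HORIZONTAL);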
    if (nextFrame != null)
    {
      gray = nextFrame.Convert<Gray, Byte>();
      MCvAvgComp[][] facesDetected = gray.DetectHaarCascade(haarCascade, 1.2, 2, HAAR_DETECTION_TYPE.DO_CANNY_PRUNING, new System.Drawing.Size(40, 40));
      foreach (MCvAvgComp face in facesDetected[0])
        nextFrame.Draw(face.rect, new Bgr(System.Drawing.Color.White), 2);
      byte[] bgrData = nextFrame.Bytes;
      for (int i = 0; i < colorData.Length; i++)
        colorData[i] = new Color(bgrData[3 * i + 2], bgrData[3 * i + 1], bgrData[3 * i]);
    }
  }
}

What we are doing is grabbing the frame, converting it to grayscale and then calling DetectHaarCascade(). What we get back is a list of detected faces (if any), so we can easily draw a rectangle around each of them before showing the frame on screen.

For the options and tweaks you can use to optimize the face detection, look at Emgu's documentation here.
In my sample I chose values that give a fast detection with a face near the cam. You should consider lowering the minimum size (let's say to 20x20) if you want to detect more people on video. As usual, play with the values to see what happens.
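
To get a feeling for why the minimum size and the scale factor affect speed, here is a rough model of the window sizes a multi-scale detector goes through. This is only a sketch under stated assumptions, not Emgu's or OpenCV's actual code: the ScanWindows class is a hypothetical name, and the base window size you pass in (for example 24) is an assumption about the cascade file, so check it against the cascade you use.

```csharp
using System;
using System.Collections.Generic;

public static class ScanWindows
{
    // Hypothetical model: start from the cascade's base window, grow it by
    // scaleFactor on every pass, skip windows smaller than minSize and stop
    // once the window no longer fits inside the frame.
    public static List<int> Sizes(int baseSize, double scaleFactor, int minSize,
                                  int frameWidth, int frameHeight)
    {
        if (baseSize <= 0)
            throw new ArgumentOutOfRangeException("baseSize", "Must be positive.");
        if (scaleFactor <= 1.0)
            throw new ArgumentOutOfRangeException("scaleFactor", "Must be greater than 1.");

        var sizes = new List<int>();
        int limit = Math.Min(frameWidth, frameHeight);
        for (double scale = 1.0; ; scale *= scaleFactor)
        {
            int size = (int)Math.Round(baseSize * scale);
            if (size > limit)
                break;              // window no longer fits in the frame
            if (size >= minSize)
                sizes.Add(size);    // this pass would actually be scanned
        }
        return sizes;
    }
}
```

Comparing ScanWindows.Sizes(24, 1.2, 40, 640, 480) with ScanWindows.Sizes(24, 1.2, 20, 640, 480) shows that the lower minimum adds extra passes with small windows. Small windows are the expensive ones, since many more of them fit in the frame, which is why a larger minimum size makes the detection faster but misses faces that are far from the cam.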

I have put together a working sample on GitHub for you to play with, here.

If everything works as it should, you will get something like this: