XNA and webcam stream as background

Recently I worked on an XNA project that needed to show the stream from a webcam as the background of the game.

After some searching I found nothing that worked with XNA 4.0, so I ended up putting together all the bits of information I found and created a component that you can use as you like.

For capturing the stream I used the DirectShow.Net library, which contains all the enums, interfaces, and so on of DirectShow. Here is an extract from their site:

The purpose of this library is to allow access to Microsoft’s DirectShow functionality from within .NET applications. This library supports both Visual Basic .NET and C#, and theoretically, should work with any .NET language.
Microsoft’s managed solution to allowing access to DirectShow from .NET isn’t nearly as complete as the DirectShow interfaces for C++. For developers who want the complete range of functionality of DirectShow in .NET, this library provides the enums, structs, and interface definitions to access them.

First of all we need to add a reference to DirectShowLib to our project and create a new class that implements ISampleGrabberCB.
Then we create a constructor that accepts the GraphicsDevice of our game as a parameter (we will need it to handle the frame and its lock during the Draw part).

using DirectShowLib;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

public class VideoCapture : ISampleGrabberCB
{
  protected GraphicsDevice GraphicsDevice;

  public VideoCapture(GraphicsDevice GraphicsDevice)
  {
    this.GraphicsDevice = GraphicsDevice;
    Initialize();           
  }
}

The following piece of code, which initializes and accesses the webcam, is adapted from the DirectShow.Net samples.

protected ICaptureGraphBuilder2 CaptureGraphBuilder;
protected IGraphBuilder GraphBuilder;
protected IMediaControl MediaControl;
protected ISampleGrabber SampleGrabber;
protected Thread UpdateThread;
protected int Width = 640;
protected int Height = 480;
protected int DEVICE_ID = 0;
private bool isRunning;

protected void Initialize()
{
  FrameReady = false;
  frame = new Texture2D(GraphicsDevice, Width, Height, false, SurfaceFormat.Color);
  FrameBGR = new byte[(Width * Height) * 3];
  FrameRGBA = new byte[(Width * Height) * 4];
  GraphBuilder = (IGraphBuilder)new FilterGraph();
  CaptureGraphBuilder = (ICaptureGraphBuilder2)new CaptureGraphBuilder2();
  MediaControl = (IMediaControl)GraphBuilder;
  CaptureGraphBuilder.SetFiltergraph(GraphBuilder);
  object VideoInputObject = null;
  IBaseFilter VideoInput = null;
  IEnumMoniker classEnum;
  ICreateDevEnum devEnum = (ICreateDevEnum)new CreateDevEnum();
  devEnum.CreateClassEnumerator(FilterCategory.VideoInputDevice, out classEnum, 0);
  Marshal.ReleaseComObject(devEnum);
  if (classEnum != null)
  {
    IMoniker[] moniker = new IMoniker[1];
    if (classEnum.Next(moniker.Length, moniker, IntPtr.Zero) == 0) // 0 = S_OK: grabs the first device found
    {
      Guid iid = typeof(IBaseFilter).GUID;
      moniker[0].BindToObject(null, null, ref iid, out VideoInputObject);
    }
    if (moniker[0] != null)
      Marshal.ReleaseComObject(moniker[0]);
    Marshal.ReleaseComObject(classEnum);
    VideoInput = (IBaseFilter)VideoInputObject;
  }
  if (VideoInput != null)
  {
    isRunning = true;
    SampleGrabber = new SampleGrabber() as ISampleGrabber;
    GraphBuilder.AddFilter((IBaseFilter)SampleGrabber, "Render");
    AMMediaType Type = new AMMediaType();
    Type.majorType = MediaType.Video;
    Type.subType = MediaSubType.RGB24;
    Type.formatType = FormatType.VideoInfo;
    SampleGrabber.SetMediaType(Type);
    GraphBuilder.AddFilter(VideoInput, "Camera");
    SampleGrabber.SetBufferSamples(false);
    SampleGrabber.SetOneShot(false);
    SampleGrabber.GetConnectedMediaType(new AMMediaType());
    SampleGrabber.SetCallback((ISampleGrabberCB)this, 1);
    CaptureGraphBuilder.RenderStream(PinCategory.Preview, MediaType.Video, VideoInput, null, SampleGrabber as IBaseFilter);
    UpdateThread = new Thread(new ThreadStart(UpdateBuffer));
    UpdateThread.Start();
    MediaControl.Run();
    Marshal.ReleaseComObject(VideoInput);                
  }
}


As you can see, I do two things here: I start a new thread to handle the frames coming from the webcam, and I use an "isRunning" variable to remember whether capture is running.
The code that handles the frames is as follows:

protected byte[] FrameBGR;
protected byte[] FrameRGBA;
protected bool FrameReady;

protected void UpdateBuffer()
{
 int samplePosRGBA = 0;
 int samplePosRGB24 = 0;
 while (isRunning)
 {
  for (int y = 0, y2 = Height - 1; y < Height; y++, y2--)
  {
   for (int x = 0; x < Width; x++)
   {
    samplePosRGBA = (((y2 * Width) + x) * 4);
    samplePosRGB24 = ((y * Width) + (Width - x - 1)) * 3;
    FrameRGBA[samplePosRGBA + 0] = FrameBGR[samplePosRGB24 + 2]; //R
    FrameRGBA[samplePosRGBA + 1] = FrameBGR[samplePosRGB24 + 1]; //G
    FrameRGBA[samplePosRGBA + 2] = FrameBGR[samplePosRGB24 + 0]; //B
    FrameRGBA[samplePosRGBA + 3] = (byte)255; //Alpha
   }
  }
  FrameReady = false;
  while (!FrameReady) Thread.Sleep(20);
 }
}

public int BufferCB(double SampleTime, IntPtr pBuffer, int BufferLen)
{
 Marshal.Copy(pBuffer, FrameBGR, 0, BufferLen);
 FrameReady = true;
 return 0;
}

public int SampleCB(double SampleTime, IMediaSample pSample)
{
 return 0;
}
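To make the index math in UpdateBuffer easier to follow, here is the same conversion isolated into a standalone helper (the class and method names are mine, for illustration only). It swaps BGR to RGBA, adds the alpha byte, flips the frame vertically (DirectShow RGB24 frames arrive bottom-up) and mirrors it horizontally:

```csharp
using System;

public static class FrameConverter
{
    // Converts a bottom-up BGR24 frame into a top-down, horizontally
    // mirrored RGBA buffer suitable for Texture2D.SetData.
    public static byte[] BgrToRgba(byte[] frameBgr, int width, int height)
    {
        byte[] frameRgba = new byte[width * height * 4];
        for (int y = 0, y2 = height - 1; y < height; y++, y2--)
        {
            for (int x = 0; x < width; x++)
            {
                int posRgba = ((y2 * width) + x) * 4;
                int posBgr = ((y * width) + (width - x - 1)) * 3;
                frameRgba[posRgba + 0] = frameBgr[posBgr + 2]; // R
                frameRgba[posRgba + 1] = frameBgr[posBgr + 1]; // G
                frameRgba[posRgba + 2] = frameBgr[posBgr + 0]; // B
                frameRgba[posRgba + 3] = 255;                  // alpha
            }
        }
        return frameRgba;
    }
}
```

Feeding it a 2x1 BGR frame makes the flip and mirror visible: the last pixel of the input row becomes the first pixel of the output row, with channels reordered.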

What I do is convert the byte array of the image into the array needed by XNA's Texture2D (as you can see, I had to add the bytes for the alpha channel).
Now what is missing is to let our game access the Texture2D containing the webcam's frame.

protected Texture2D frame;
public Texture2D CurrentFrame
{
  get
  {
    if (frame.GraphicsDevice.Textures[0] == frame)
      frame.GraphicsDevice.Textures[0] = null;
    frame.SetData(FrameRGBA);
    return frame;
  }
}

Before calling "frame.SetData<>" we have to unset the current texture from the graphics device, otherwise we will get an InvalidOperationException stating that "You may not call SetData on a resource while it is actively set on the GraphicsDevice. Unset it from the device before calling SetData.".
The last thing to do is to handle the disposal of our object. This is where our isRunning variable comes to the rescue; without it we would end up in what some call "deadlock city". If we let the update thread keep going after the dispose, we end up in a loop: the webcam waits for the update thread to complete, while the thread waits for new frames.

public void Dispose()
{
  isRunning = false; 
  Thread.Sleep(100); // wait for the update thread to finish
  if (MediaControl != null)
    MediaControl.StopWhenReady();
  Marshal.ReleaseComObject(MediaControl);
  Marshal.ReleaseComObject(GraphBuilder);
  Marshal.ReleaseComObject(CaptureGraphBuilder);
  CaptureGraphBuilder = null;
  GraphBuilder = null;
  MediaControl = null;
  frame.Dispose();
  frame = null;
  Marshal.ReleaseComObject(SampleGrabber);
  SampleGrabber = null;
}

Now from our Game1.cs we can do the following to draw the webcam's frames.

private VideoCapture capture;

protected override void LoadContent()
{
  spriteBatch = new SpriteBatch(GraphicsDevice);
  capture = new VideoCapture(GraphicsDevice);
}

protected override void UnloadContent()
{
  capture.Dispose();
}

protected override void Update(GameTime gameTime)
{
  base.Update(gameTime);
}

protected override void Draw(GameTime gameTime)
{
  GraphicsDevice.Clear(Color.CornflowerBlue);
  spriteBatch.Begin();
  spriteBatch.Draw(capture.CurrentFrame, new Rectangle(0, 0, ScreenWidth, ScreenHeight), Color.White);
  spriteBatch.End();
  base.Draw(gameTime);
}

That's all. If you want the complete class you can get it on GitHub here.

In the coming days I will build on the principles explained here to create a simple face detection game (face recognition to follow) with Emgu, and I will also publish a complete sample on GitHub.

Update:
I have implemented the suggestions from Matthew about using a lock. You can find a working sample here.
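The linked sample is the authoritative version; as a rough illustration of the idea (the class and member names below are my own, not necessarily Matthew's exact code), the shared RGBA buffer can be guarded so that the capture thread never overwrites pixels while the game thread is reading them for SetData:

```csharp
using System;

// A minimal sketch of the locking idea: one lock object guards the
// shared pixel buffer used by both the capture and the game threads.
public class FrameBuffer
{
    private readonly object frameLock = new object();
    private readonly byte[] frameRgba;

    public FrameBuffer(int width, int height)
    {
        frameRgba = new byte[width * height * 4];
    }

    // Called from the capture thread (e.g. after converting a new
    // frame to RGBA inside BufferCB).
    public void Write(byte[] rgba)
    {
        lock (frameLock)
        {
            Buffer.BlockCopy(rgba, 0, frameRgba, 0, frameRgba.Length);
        }
    }

    // Called from the game thread (e.g. inside the CurrentFrame getter,
    // right before frame.SetData); returns a stable copy of the pixels.
    public byte[] Read()
    {
        lock (frameLock)
        {
            return (byte[])frameRgba.Clone();
        }
    }
}
```

Returning a copy from Read keeps the lock held only for the duration of the memcpy, so the (comparatively slow) SetData call never blocks the capture callback.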