Audio output for .NET

A practical solution for outputting PCM audio from any source. My solution uses DirectSound, so the SlimDX interop library is required.

You can get the source code here: DirectSound

Please don't forget to get SlimDX as well.

Introduction

After getting familiar with the frameworks that ship with .NET 3.5, I was somewhat surprised by the lack of audio output features. Granted, there is the SoundPlayer class in System.Media to do the job, but once your requirements call for playing a stream that comes from the network you are left alone in the dark. SoundPlayer only gives you the opportunity to play a WAV file.

So how can you output audio under Windows in general? Several APIs have been introduced for this purpose over the years. I've opted for DirectSound, the most recent one Microsoft added, soon to be replaced with something far more spectacular for sure. DirectSound is part of DirectX and is available on Windows XP and later. DirectX was made to suit game developers and as such provides only an unmanaged API. Even though more and more parts of .NET rely on it (the whole of WPF, for example), there is no official managed API available. This gap still hinders the mainstream acceptance of the platform, but luckily the community has stepped in with the SlimDX project, which tries to bridge it.

DirectSound

A quick description of DirectSound is in order, to clarify the need for my current implementation.

DirectSound works with buffers. It has one primary buffer that is directly connected with the audio hardware and many secondary buffers that are held in main memory. Its main task is to mix the contents of all secondary buffers into the primary one. The primary buffer is usually mapped directly into hardware and is where the hardware takes its data from. Not much control over it is provided. Application developers are advised to use any number of secondary buffers instead, which can be configured in far more detail: size, channels, repetitions and more. The playback of each secondary buffer can also be controlled individually. So your application is responsible for filling the buffer with data, and for starting and stopping it when necessary.

This functionality is really neat when you have a static sound you want to play in your game, but a changing stream of audio takes a bit more work.
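To illustrate what that extra work usually looks like, here is a minimal sketch of a ring buffer in plain C#. It only stands in for a looping secondary buffer and does not call DirectSound at all; the RingBuffer class and all of its members are hypothetical and are not taken from the downloadable source. The read position plays the role of the play cursor that a real implementation would query from the sound buffer.

```csharp
using System;

// Hypothetical fixed-size ring that stands in for a looping secondary buffer.
public class RingBuffer
{
    private readonly byte[] data;
    private int writePos; // where the producer writes next
    private int readPos;  // stands in for the play cursor
    private int used;     // bytes written but not yet played

    public RingBuffer(int size)
    {
        data = new byte[size];
    }

    public int Used { get { return used; } }
    public int Free { get { return data.Length - used; } }

    // Copies as much of the source as currently fits and returns the number
    // of bytes accepted. The copy is split in two when it wraps around the end.
    public int Write(byte[] source, int offset, int count)
    {
        int n = Math.Min(count, Free);
        int first = Math.Min(n, data.Length - writePos);
        Buffer.BlockCopy(source, offset, data, writePos, first);
        Buffer.BlockCopy(source, offset + first, data, 0, n - first);
        writePos = (writePos + n) % data.Length;
        used += n;
        return n;
    }

    // Simulates playback consuming up to count bytes, oldest data first.
    public int Read(byte[] target, int offset, int count)
    {
        int n = Math.Min(count, used);
        int first = Math.Min(n, data.Length - readPos);
        Buffer.BlockCopy(data, readPos, target, offset, first);
        Buffer.BlockCopy(data, 0, target, offset + first, n - first);
        readPos = (readPos + n) % data.Length;
        used -= n;
        return n;
    }
}
```

The point of the sketch is the wrap-around: the producer may only write into the region that has already been played, and once nothing is left to play the buffer has to be stopped, otherwise old data would be heard again.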

The Solution

What I really needed was to be able to play a stream of audio coming from the network. Since the network does not guarantee that the stream arrives at the right bit rate, I also implemented a way to iron this out.

The solution that I found most appropriate for these needs was to provide a write-only stream that inherits from System.IO.Stream and does its own buffer management: it starts the playback when data is written and stops playing when the written bytes are used up. The producer (the source of the audio) can write out data at any rate. This way a simple interface is provided and the responsibilities are kept separate.
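To make the idea concrete, here is a minimal sketch of such a write-only stream. It is not the SoundStream class from the download: the name QueueingSoundStream, the Drain method and the IsPlaying property are made up for illustration, and the places where DirectSound would be started and stopped are only marked with comments.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical write-only stream: Write queues PCM bytes, the playback side drains them.
public class QueueingSoundStream : Stream
{
    private readonly Queue<byte[]> chunks = new Queue<byte[]>();
    private readonly object sync = new object();
    private int headOffset;   // bytes already consumed from the oldest chunk
    private int queuedBytes;  // bytes written but not yet played

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    // True while written data is still waiting to be played.
    public bool IsPlaying { get { lock (sync) { return queuedBytes > 0; } } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0) return;
        byte[] copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        lock (sync)
        {
            chunks.Enqueue(copy);
            queuedBytes += count;
        }
        // Assumption: a real implementation would start the sound buffer here.
    }

    // Called by the playback side; returns how many bytes were handed out.
    public int Drain(byte[] target, int offset, int count)
    {
        int copied = 0;
        lock (sync)
        {
            while (copied < count && chunks.Count > 0)
            {
                byte[] head = chunks.Peek();
                int n = Math.Min(count - copied, head.Length - headOffset);
                Buffer.BlockCopy(head, headOffset, target, offset + copied, n);
                headOffset += n;
                copied += n;
                if (headOffset == head.Length) { chunks.Dequeue(); headOffset = 0; }
            }
            queuedBytes -= copied;
        }
        // Assumption: once IsPlaying turns false, playback would be stopped.
        return copied;
    }

    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

Because the writer and the playback side only meet at the queue, the producer can deliver data in bursts while playback consumes it at a steady rate.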

So the interface, which I called ISoundManager, has only one method, GetSoundStream, which returns a write-only stream.

Stream GetSoundStream(int samplesPerSecond, short channels, short bitsPerSample);

The three parameters you need to provide are the configuration of your stream:

  • samplesPerSecond - Rate of playback in PCM samples per second
  • channels - Number of channels interleaved in the PCM stream
  • bitsPerSample - Number of bits per sample, e.g. 8 or 16

ISoundManager also inherits from IDisposable so that all unmanaged resources can be freed.
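The contract described above could be declared roughly as follows. The interface and its method signature are the ones given in this article; the PcmFormat helper underneath is a hypothetical addition of mine that is not part of the download and only shows the arithmetic that relates the three parameters to the amount of data the stream consumes.

```csharp
using System;
using System.IO;

// The manager contract as described in the article.
public interface ISoundManager : IDisposable
{
    // Returns a write-only stream that plays the PCM data written to it.
    Stream GetSoundStream(int samplesPerSecond, short channels, short bitsPerSample);
}

// Hypothetical helper for the usual PCM size arithmetic.
public static class PcmFormat
{
    // Bytes occupied by one sample frame, i.e. one sample for every channel.
    public static int BlockAlign(short channels, short bitsPerSample)
    {
        return channels * (bitsPerSample / 8);
    }

    // Bytes consumed per second of playback.
    public static int BytesPerSecond(int samplesPerSecond, short channels, short bitsPerSample)
    {
        return samplesPerSecond * BlockAlign(channels, bitsPerSample);
    }
}
```

For CD-quality audio (44100 samples per second, 2 channels, 16 bits) this gives 4 bytes per frame and 176400 bytes per second, which is the rate a network source has to sustain on average to avoid gaps.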

The actual manager implementation is the SoundManager class, which needs the window handle of your main application window when it initializes the unmanaged DirectSound code. In my implementation the constructor takes a System.Windows.Window instance for my WPF main window. If you don't use WPF, you can change this code to pass in the window handle instead.

The stream implementation and the DirectSound buffer management can be found in the private SoundStream class.

As DirectSound does the audio mixing, you can have any number of sound streams playing concurrently: just call GetSoundStream once for each stream you need. When the streams use different configurations, some re-sampling has to take place, so please consult the DirectSound documentation for the specifics. It will down-sample all your streams to the slowest one.

With this simple interface you can play any source of audio you can imagine, or write your own tracker application or audio editing tool. One more feature of the System.IO.Stream class is that you can chain streams together, so let's see if somebody will beat me to an Ogg Vorbis decoding stream :)
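As a much simpler stand-in for such a decoder, here is a sketch of a chained stream that accepts 8-bit unsigned PCM and forwards it as 16-bit signed little-endian PCM to whatever stream it wraps. The class name EightToSixteenBitStream is hypothetical and the class is not part of the download; it only shows the chaining pattern.

```csharp
using System;
using System.IO;

// Hypothetical chained stream: converts 8-bit unsigned PCM to 16-bit signed
// little-endian PCM and passes the result on to the wrapped stream.
public class EightToSixteenBitStream : Stream
{
    private readonly Stream inner;

    public EightToSixteenBitStream(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        byte[] converted = new byte[count * 2];
        for (int i = 0; i < count; i++)
        {
            // (sample - 128) << 8: the low byte is always zero and the high
            // byte is the input sample with its top bit flipped.
            converted[2 * i] = 0;
            converted[2 * i + 1] = (byte)(buffer[offset + i] ^ 0x80);
        }
        inner.Write(converted, 0, converted.Length);
    }

    public override void Flush() { inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

Assuming a manager instance named soundManager, the chain would be set up with new EightToSixteenBitStream(soundManager.GetSoundStream(8000, 1, 16)), and the producer then writes its 8-bit data to the outer stream without knowing about the conversion.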