PulseAudio 5.0
Audio Streams

Overview

Audio streams form the central functionality of the sound server. Data is routed, converted and mixed from several sources before it is passed along to a final output. Currently, there are three forms of audio streams:

  • Playback streams - Data flows from the client to the server.
  • Record streams - Data flows from the server to the client.
  • Upload streams - Similar to playback streams, but the data is stored in the sample cache. See Sample Cache for more information about controlling the sample cache.

Creating

To access a stream, a pa_stream object must be created using pa_stream_new() or pa_stream_new_extended(). pa_stream_new() is for PCM streams only, while pa_stream_new_extended() can be used for both PCM and compressed audio streams. At this point the application must specify what stream format(s) it supports. See Sample Format Specifications and Channel Maps for more information on the stream format parameters. FIXME: Those references only talk about PCM parameters, we should also have an overview page for how the pa_format_info based stream format configuration works. Bug filed: https://bugs.freedesktop.org/show_bug.cgi?id=72265

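This first step will only create a client-side object representing the stream. To use the stream, a server-side object must be created and associated with the local object. Depending on which type of stream is desired, a different function is needed:

  • Playback stream - pa_stream_connect_playback()
  • Record stream - pa_stream_connect_record()
  • Upload stream - pa_stream_connect_upload()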

Similar to how connections are done in contexts, connecting a stream will not generate a pa_operation object. Also like contexts, the application should register a state change callback, using pa_stream_set_state_callback(), and wait for the stream to enter an active state.

Note: there is a user-controllable slider in mixer applications such as pavucontrol corresponding to each of the created streams. Multiple (especially identically named) volume sliders for the same application might confuse the user. Also, the server supports only a limited number of simultaneous streams. Because of this, it is not always appropriate to create multiple streams in one application that needs to output multiple sounds. The rough guideline is: if there is no use case that would require separate user-initiated volume changes for each stream, perform the mixing inside the application.

Buffer Attributes

Playback and record streams always have a server-side buffer as part of the data flow. The size of this buffer is a compromise between low latency and sensitivity to buffer overflows/underruns.

The buffer metrics may be controlled by the application. They are described with a pa_buffer_attr structure which contains a number of fields:

  • maxlength - The absolute maximum number of bytes that can be stored in the buffer. If this value is exceeded then data will be lost. It is recommended to pass (uint32_t) -1 here which will cause the server to fill in the maximum possible value.
  • tlength - The target fill level of the playback buffer. The server will only send requests for more data as long as the buffer has less than this number of bytes of data. If you pass (uint32_t) -1 (which is recommended) here the server will choose the longest target buffer fill level possible to minimize the number of necessary wakeups and maximize drop-out safety. This can exceed 2s of buffering. For low-latency applications or applications where latency matters you should pass a proper value here.
  • prebuf - Number of bytes that need to be in the buffer before playback will commence. Start of playback can be forced using pa_stream_trigger() even though the prebuffer size hasn't been reached. If a buffer underrun occurs, prebuffering is enabled again. If playback should never stop in case of a buffer underrun, set this value to 0. In that case the read index of the output buffer overtakes the write index, and hence the fill level of the buffer is negative. If you pass (uint32_t) -1 here (which is recommended) the server will choose the same value as tlength.
  • minreq - Minimum number of free bytes in the playback buffer before the server will request more data. It is recommended to fill in (uint32_t) -1 here. This value influences how much time the sound server has to move data from the per-stream server-side playback buffer to the hardware playback buffer.
  • fragsize - Maximum number of bytes that the server will push in one chunk for record streams. If you pass (uint32_t) -1 (which is recommended) here, the server will choose the longest fragment setting possible to minimize the number of necessary wakeups and maximize drop-out safety. This can exceed 2s of buffering. For low-latency applications or applications where latency matters you should pass a proper value here.
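As a rough illustration of what a "proper value" for tlength might look like, the sketch below converts a target latency in milliseconds into a byte count for a given sample format and leaves the other fields at (uint32_t) -1. The struct buffer_attr_sketch and both helper functions are hypothetical stand-ins that only mirror the five fields described above; real code would fill in a pa_buffer_attr, and the library's own pa_usec_to_bytes() could replace the hand-written conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the five fields described above. */
struct buffer_attr_sketch {
    uint32_t maxlength, tlength, prebuf, minreq, fragsize;
};

/* "Let the server choose", as recommended above. */
#define SERVER_DEFAULT ((uint32_t) -1)

/* Bytes occupied by latency_ms of audio, rounded down to whole frames. */
static uint32_t latency_ms_to_bytes(uint32_t latency_ms, uint32_t rate_hz,
                                    uint32_t channels, uint32_t bytes_per_sample) {
    assert(rate_hz > 0 && channels > 0 && bytes_per_sample > 0);
    uint64_t frames = (uint64_t) latency_ms * rate_hz / 1000;
    return (uint32_t) (frames * channels * bytes_per_sample);
}

/* Request a specific playback target level; everything else stays at the default. */
static struct buffer_attr_sketch low_latency_playback_attr(uint32_t latency_ms, uint32_t rate_hz,
                                                           uint32_t channels, uint32_t bytes_per_sample) {
    struct buffer_attr_sketch a = { SERVER_DEFAULT, SERVER_DEFAULT, SERVER_DEFAULT,
                                    SERVER_DEFAULT, SERVER_DEFAULT };
    a.tlength = latency_ms_to_bytes(latency_ms, rate_hz, channels, bytes_per_sample);
    return a;
}
```

For a record stream the same conversion would apply to fragsize instead of tlength.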

If PA_STREAM_ADJUST_LATENCY is set, the tlength/fragsize parameters are interpreted slightly differently than described above when passed to pa_stream_connect_playback() and pa_stream_connect_record(): the overall latency, which comprises the server-side playback buffer length, the hardware playback buffer length and additional latencies, is adjusted so that it matches tlength or fragsize, respectively. Set PA_STREAM_ADJUST_LATENCY if you want to control the overall playback latency for your stream. Unset it if you want to control only the latency induced by the server-side, rewritable playback buffer. The server will try to fulfill the client's latency requests as well as possible. However, if the underlying hardware cannot change the hardware buffer length, or can change it only within a limited range, the resulting latency might differ from what the client requested. Thus, for synchronization, clients always need to check the actual measured latency via pa_stream_get_latency() or a similar call, and must not make any assumptions about the latency available. The function pa_stream_get_buffer_attr() will always return the actual size of the server-side per-stream buffer in tlength/fragsize, regardless of whether PA_STREAM_ADJUST_LATENCY is set.

The server-side per-stream playback buffers are indexed by a write and a read index. The application writes to the write index and the sound device reads from the read index. The read index increases monotonically, while the write index may be freely controlled by the application. Subtracting the read index from the write index gives the current fill level of the buffer. The read/write indexes are 64-bit values measured in bytes; they never wrap. The current read/write index may be queried using pa_stream_get_timing_info() (see below for more information). In case of a buffer underrun the read index is equal to or larger than the write index. Unless the prebuf value is 0, PulseAudio will temporarily pause playback in such a case and wait until the buffer is filled up to prebuf bytes again. If prebuf is 0, the read index may be larger than the write index, in which case silence is played. If the application writes data to indexes lower than the read index, the data is immediately lost.
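The index arithmetic above can be sketched in a few lines. The function names here are hypothetical and the code only models the rules stated in the text; it does not talk to a server. In a real client the two index values would come from the timing information structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fill level in bytes; negative once the read index has overtaken the write index. */
static int64_t fill_level(int64_t write_index, int64_t read_index) {
    return write_index - read_index;
}

/* Underrun condition as described above: read index equal to or larger than write index. */
static bool is_underrun(int64_t write_index, int64_t read_index) {
    return read_index >= write_index;
}

/* True if playback pauses on this underrun (prebuf != 0) instead of playing silence. */
static bool pauses_on_underrun(int64_t write_index, int64_t read_index, uint32_t prebuf) {
    return is_underrun(write_index, read_index) && prebuf != 0;
}
```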

Transferring Data

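Once the stream is up, data can start flowing between the client and the server. Two different access models can be used to transfer the data:

  • Asynchronous - The application registers a callback using pa_stream_set_write_callback() and pa_stream_set_read_callback() to receive notifications that data can either be written or read.
  • Polled - Query the library for available data/space using pa_stream_writable_size() and pa_stream_readable_size() and transfer data as needed. The sizes are stored locally, in the client end, so there is no delay when reading them.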

It is also possible to mix the two models freely.

Once there is data/space available, it can be transferred using either pa_stream_write() for playback, or pa_stream_peek() / pa_stream_drop() for record. Make sure you do not overflow the playback buffers as data will be dropped.
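One way to avoid overflowing the playback buffer is to cap every write at the space currently reported as writable. The helper below is a hypothetical sketch of that calculation only: writable_bytes stands for a value such as the one returned by pa_stream_writable_size(), and the result is rounded down to a whole number of frames on the assumption that writes should be frame-aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Largest frame-aligned chunk that fits both the pending data and the writable space. */
static size_t next_write_size(size_t pending_bytes, size_t writable_bytes, size_t frame_size) {
    assert(frame_size > 0);
    size_t n = pending_bytes < writable_bytes ? pending_bytes : writable_bytes;
    return n - (n % frame_size);
}
```

A result of 0 means there is nothing to write yet; the application would then wait for the next write notification or poll again later.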

Buffer Control

The transfer buffers can be controlled through a number of operations:

  • pa_stream_cork() - Start or stop the playback or recording.
  • pa_stream_trigger() - Start playback immediately and do not wait for the buffer to fill up to the set trigger level.
  • pa_stream_prebuf() - Reenable the playback trigger level.
  • pa_stream_drain() - Wait for the playback buffer to go empty. Will return a pa_operation object that will indicate when the buffer is completely drained.
  • pa_stream_flush() - Drop all data from the playback or record buffer. Do not wait for it to finish playing.
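How corking, triggering and prebuffering interact can be pictured with a small state model. Everything in the sketch below (the prebuf_model struct and the model_* functions) is hypothetical and exists purely for illustration; it simplifies the server's behavior and is not its implementation.

```c
#include <stdbool.h>
#include <stdint.h>

struct prebuf_model {
    uint32_t prebuf;    /* trigger level in bytes; 0 disables prebuffering */
    uint32_t fill;      /* bytes currently buffered */
    bool prebuffering;  /* true while waiting for the trigger level */
    bool corked;        /* true while stopped via cork */
};

/* Audio is played only when the stream is neither corked nor prebuffering. */
static bool model_is_playing(const struct prebuf_model *m) {
    return !m->corked && !m->prebuffering;
}

/* Data arrives: prebuffering ends once the trigger level is reached. */
static void model_write(struct prebuf_model *m, uint32_t nbytes) {
    m->fill += nbytes;
    if (m->fill >= m->prebuf)
        m->prebuffering = false;
}

/* Counterpart of pa_stream_trigger(): start without waiting for the trigger level. */
static void model_trigger(struct prebuf_model *m) {
    m->prebuffering = false;
}

/* Counterpart of pa_stream_prebuf(): re-arm the trigger level. */
static void model_prebuf(struct prebuf_model *m) {
    m->prebuffering = m->prebuf > 0 && m->fill < m->prebuf;
}

/* Counterpart of pa_stream_cork(): stop (true) or start (false) the stream. */
static void model_cork(struct prebuf_model *m, bool corked) {
    m->corked = corked;
}
```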

Seeking in the Playback Buffer

A client application may freely seek in the playback buffer. To accomplish that the pa_stream_write() function takes a seek mode and an offset argument. The seek mode is one of:

  • PA_SEEK_RELATIVE - seek relative to the current write index
  • PA_SEEK_ABSOLUTE - seek relative to the beginning of the playback buffer (i.e. the first byte that was ever played in the stream)
  • PA_SEEK_RELATIVE_ON_READ - seek relative to the current read index. Use this to write data to the output buffer that should be played as soon as possible
  • PA_SEEK_RELATIVE_END - seek relative to the last byte ever written.

If an application just wants to append some data to the output buffer, PA_SEEK_RELATIVE and an offset of 0 should be used.

After a call to pa_stream_write() the write index will be left at the position right after the last byte of the written data.
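The effect of the four seek modes on the write index can be modeled as follows. The enum and struct are local, hypothetical stand-ins (not the library's types), and end_index is assumed to mean the position right after the highest byte written so far.

```c
#include <stdint.h>

/* Local stand-ins for the four seek modes listed above (not the library's enum). */
enum seek_mode_sketch {
    SEEK_SKETCH_RELATIVE,
    SEEK_SKETCH_ABSOLUTE,
    SEEK_SKETCH_RELATIVE_ON_READ,
    SEEK_SKETCH_RELATIVE_END
};

struct index_state {
    int64_t write_index;
    int64_t read_index;
    int64_t end_index;  /* assumed: position right after the last byte ever written */
};

/* Position at which a write with the given seek mode and offset would start. */
static int64_t seek_target(const struct index_state *s, enum seek_mode_sketch mode, int64_t offset) {
    switch (mode) {
    case SEEK_SKETCH_RELATIVE:         return s->write_index + offset;
    case SEEK_SKETCH_ABSOLUTE:         return offset;
    case SEEK_SKETCH_RELATIVE_ON_READ: return s->read_index + offset;
    case SEEK_SKETCH_RELATIVE_END:     return s->end_index + offset;
    }
    return s->write_index;
}

/* After the write, the write index sits right after the last written byte. */
static void seek_and_write(struct index_state *s, enum seek_mode_sketch mode,
                           int64_t offset, uint32_t nbytes) {
    s->write_index = seek_target(s, mode, offset) + (int64_t) nbytes;
    if (s->write_index > s->end_index)
        s->end_index = s->write_index;
}
```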

Latency

A major problem with networked audio is the increased latency caused by the network. To remedy this, PulseAudio supports an advanced system of monitoring the current latency.

To get the raw data needed to calculate latencies, call pa_stream_get_timing_info(). This will give you a pa_timing_info structure that contains everything that is known about the server side buffer transport delays and the backend active in the server. (Besides other things it contains the write and read index values mentioned above.)

This structure is updated every time a pa_stream_update_timing_info() operation is executed. (That is, before the first call to this function the timing information structure is not available!) Since it is a lot of work to keep this structure up-to-date manually, PulseAudio can do that automatically for you: if PA_STREAM_AUTO_TIMING_UPDATE is passed when connecting the stream, PulseAudio will automatically update the structure every 100 ms and every time a function is called that might invalidate the previously known timing data (such as pa_stream_write() or pa_stream_flush()). Please note, however, that there is always a short time window when the data in the timing information structure is out-of-date. PulseAudio tries to mark these situations by setting the write_index_corrupt and read_index_corrupt fields accordingly.

The raw timing data in the pa_timing_info structure is usually hard to deal with. Therefore a simpler interface is available: you can call pa_stream_get_time() or pa_stream_get_latency(). The former will return the current playback time of the hardware since the stream has been started. The latter returns the overall time a sample that you write now takes to be played by the hardware. These two functions base their calculations on the same data that is returned by pa_stream_get_timing_info(). Hence the same rules for keeping the timing data up-to-date apply here. In case the write or read index is corrupted, these two functions will fail with -PA_ERR_NODATA set.
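To give a feel for how playback time and latency relate to the raw timing data, here is a simplified model. The struct timing_sketch is a hypothetical stand-in that borrows a handful of field names from pa_timing_info, and the formulas assume a playback stream that is currently playing. It approximates the idea and should not be read as the library's exact computation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the timing data, for a playback stream. */
struct timing_sketch {
    int64_t write_index;       /* bytes */
    int64_t read_index;        /* bytes */
    bool write_index_corrupt;
    bool read_index_corrupt;
    uint64_t sink_usec;        /* time a sample spends in the output device buffer */
    uint64_t transport_usec;   /* estimated delay of the timing reply itself */
};

static int64_t bytes_to_usec(int64_t bytes, uint32_t bytes_per_second) {
    assert(bytes_per_second > 0);
    return bytes * 1000000 / bytes_per_second;
}

/* Approximate playback time; returns -1 when the read index is unusable. */
static int64_t playback_time_usec(const struct timing_sketch *t, uint32_t bytes_per_second) {
    if (t->read_index_corrupt)
        return -1;
    int64_t usec = bytes_to_usec(t->read_index < 0 ? 0 : t->read_index, bytes_per_second)
                 + (int64_t) t->transport_usec - (int64_t) t->sink_usec;
    return usec < 0 ? 0 : usec;
}

/* Approximate time until a sample written now is heard; -1 when an index is unusable. */
static int64_t playback_latency_usec(const struct timing_sketch *t, uint32_t bytes_per_second) {
    if (t->write_index_corrupt || t->read_index_corrupt)
        return -1;
    int64_t written = bytes_to_usec(t->write_index < 0 ? 0 : t->write_index, bytes_per_second);
    return written - playback_time_usec(t, bytes_per_second);
}
```

The -1 return value merely stands in for the failure case described above; it is not the library's error convention.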

Since updating the timing info structure usually requires a full network round trip and some applications monitor the timing very often, PulseAudio offers a timing interpolation system. If PA_STREAM_INTERPOLATE_TIMING is passed when connecting the stream, pa_stream_get_time() and pa_stream_get_latency() will try to interpolate the current playback time/latency by estimating the number of samples that have been played back by the hardware since the last regular timing update. It is especially useful to combine this option with PA_STREAM_AUTO_TIMING_UPDATE, which will enable you to monitor the current playback time/latency very precisely and very frequently without requiring a network round trip every time.
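The idea behind interpolation can be illustrated with plain linear extrapolation: take the playback time from the last timing update and add the wall-clock time elapsed since then. The function below is a hypothetical sketch of that idea only, assuming playback advances in real time; the library's actual estimator may be more elaborate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extrapolate the playback time from the most recent timing update. */
static uint64_t interpolated_time_usec(uint64_t last_time_usec,
                                       uint64_t last_update_clock_usec,
                                       uint64_t now_clock_usec,
                                       bool playing) {
    /* While not playing (or if the clock went backwards) the time does not advance. */
    if (!playing || now_clock_usec <= last_update_clock_usec)
        return last_time_usec;
    return last_time_usec + (now_clock_usec - last_update_clock_usec);
}
```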

Overflow and underflow

Even with the best precautions, buffers will sometimes overflow or underflow. To handle this gracefully, the application can be notified when this happens. Callbacks are registered using pa_stream_set_overflow_callback() and pa_stream_set_underflow_callback().

Synchronizing Multiple Playback Streams

PulseAudio allows applications to fully synchronize multiple playback streams that are connected to the same output device. That means the streams will always be played back sample-by-sample synchronously. If stream operations like pa_stream_cork() are issued on one of the synchronized streams, they are simultaneously issued on the others.

To synchronize a stream to another, just pass the "master" stream as the last argument to pa_stream_connect_playback(). To make sure that the freshly created stream doesn't start playback right away, pass PA_STREAM_START_CORKED and, after all streams have been created, uncork them all with a single call to pa_stream_cork() for the master stream.

To make sure that a particular stream doesn't stop playing when a server-side buffer underrun happens on it while the other synchronized streams continue playing and hence deviate, you need to pass a "prebuf" pa_buffer_attr of 0 when connecting it.

Disconnecting

When a stream has served its purpose it must be disconnected with pa_stream_disconnect(). If you only unreference it, then it will live on and eat resources both locally and on the server until you disconnect the context.