
Audio, Timing and Metadata Packet Stream

The current problems

There is no good metadata tagging format for live content that does not require restarting file decoding (as Ogg, for example, does). This causes several issues.

ICY metadata exists, but it has several problems. It is inserted at regular byte intervals, whereas most audio formats and containers work on packets. Additionally, the data you can add is severely limited, although players could be patched to support more fields.

For timing, there is no option other than decoding the audio or being aware of the container format. This is further complicated in long-running streams, where the time the stream started can differ from the time a client joined it.

Partially decoding such streams is also hard without being aware of the container format. For example, you should technically be able to decode any two-second span, given that you have the header and the related packets available.

The solution

The solution presented here is to split the stream into distinct packets and tag each one with the following fields:

  • Frame Type (unsigned varint): see the frame types below.
  • Category (signed varint): used to differentiate between frames of the same type, or between groups.
  • Start Sample Number (signed varint): the sample from which the content applies, counted from the start of the stream (absolute, not relative to when the player started).
  • Duration In Samples (signed varint): duration of the content, in samples.
  • Data Length (unsigned varint): length of the Data field, in bytes.
  • Data (bytes): the payload.
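A minimal Python sketch of reading one packet with the field layout above. The text does not say how varints are encoded; this assumes LEB128-style unsigned varints and zig-zag signed varints, matching Go's encoding/binary Uvarint/Varint, which may or may not be what MeteorLight emits. The names `Packet`, `read_packet`, `encode_packet` and `overlaps` are hypothetical, and the frame type value used below is arbitrary because numeric frame type codes are not defined here.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


def read_uvarint(stream: BinaryIO) -> int:
    # Unsigned varint: 7 payload bits per byte, high bit set means "more bytes follow".
    result, shift = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        if shift > 63:
            raise ValueError("varint does not fit in 64 bits")


def read_varint(stream: BinaryIO) -> int:
    # ASSUMPTION: signed varints are zig-zag encoded, as with Go's binary.PutVarint.
    ux = read_uvarint(stream)
    return ~(ux >> 1) if ux & 1 else ux >> 1


def encode_uvarint(value: int) -> bytes:
    if value < 0:
        raise ValueError("unsigned varint cannot be negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_varint(value: int) -> bytes:
    # Zig-zag: 0, -1, 1, -2, ... map to 0, 1, 2, 3, ...
    return encode_uvarint((value << 1) if value >= 0 else ((~value) << 1) | 1)


@dataclass
class Packet:
    frame_type: int    # numeric frame type codes are not defined in this document
    category: int
    start_sample: int  # absolute sample index from the start of the stream
    duration: int      # in samples
    data: bytes

    def overlaps(self, first_sample: int, end_sample: int) -> bool:
        # True if [start_sample, start_sample + duration) intersects [first_sample, end_sample).
        return self.start_sample < end_sample and first_sample < self.start_sample + self.duration


def read_packet(stream: BinaryIO) -> Packet:
    # Fields are read in the order they are listed above.
    frame_type = read_uvarint(stream)
    category = read_varint(stream)
    start_sample = read_varint(stream)
    duration = read_varint(stream)
    length = read_uvarint(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside packet data")
    return Packet(frame_type, category, start_sample, duration, data)


def encode_packet(packet: Packet) -> bytes:
    return b"".join([
        encode_uvarint(packet.frame_type),
        encode_varint(packet.category),
        encode_varint(packet.start_sample),
        encode_varint(packet.duration),
        encode_uvarint(len(packet.data)),
        packet.data,
    ])


# Round trip: at 44100 Hz, a packet starting at 1 s that lasts 1152 samples.
packet = read_packet(io.BytesIO(encode_packet(Packet(1, -1, 44100, 1152, b"abc"))))
```

Because every packet carries an absolute start sample and a duration, `overlaps` is enough to pick out the packets needed for an arbitrary time window (multiply seconds by the sample rate to get sample bounds) without understanding the container format.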

Because every packet carries a tagged type and category, the following frame types can be supported:

  • Header: Simple metadata about the stream. Payload contains little-endian int64 Channels, int64 SampleRate, int32 len(MimeType), bytes MimeType
  • Audio Data: a raw header, audio frame or packet. Several subtypes exist:
    • KeepLast: keep only the last of these frames, per category.
    • Keep: keep all of these frames.
    • GroupKeep: keep the frames in this group, per category.
    • GroupDiscard: discard the frames previously kept in the group, per category.
    • Discard: this frame can be discarded after use.
  • Identifier: Identifier for the new track. The payload, in MeteorLight, contains the queue_id as a signed varint.
  • Metadata: Metadata for the track. The payload contains a JSON-encoded TrackEntry.Metadata entry.
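A hedged Python sketch of decoding the Header, Identifier and Metadata payloads described above. The header layout follows the text directly; decoding the MIME type as ASCII and reading queue_id as a zig-zag signed varint are assumptions, and the function names are hypothetical. The metadata payload is only decoded as generic JSON, since the fields of TrackEntry.Metadata are not listed here.

```python
import json
import struct
from typing import Any

# Little-endian int64 Channels, int64 SampleRate, int32 len(MimeType); "<" disables padding.
HEADER_PREFIX = struct.Struct("<qqi")


def parse_header_payload(payload: bytes) -> tuple[int, int, str]:
    channels, sample_rate, mime_len = HEADER_PREFIX.unpack_from(payload, 0)
    start = HEADER_PREFIX.size
    if mime_len < 0 or start + mime_len > len(payload):
        raise ValueError("MimeType length out of range")
    # ASSUMPTION: MIME types are plain ASCII.
    mime_type = payload[start:start + mime_len].decode("ascii")
    return channels, sample_rate, mime_type


def parse_identifier_payload(payload: bytes) -> int:
    # ASSUMPTION: queue_id is a zig-zag signed varint (as in Go's encoding/binary).
    ux, shift = 0, 0
    for byte in payload:
        ux |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    else:
        raise ValueError("truncated varint")
    return ~(ux >> 1) if ux & 1 else ux >> 1


def parse_metadata_payload(payload: bytes) -> Any:
    # JSON-encoded TrackEntry.Metadata; its schema is not described here.
    return json.loads(payload)
```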

Even client code that is not aware of audio playback can receive and parse these frames easily.

Send X-Audio-Packet-Stream: 1 in the HTTP request headers. The response will have the X-Audio-Packet-Stream: 1 header set and Content-Type: application/x-audio-packet-stream, and its body will be a continuous sequence of frames.
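As a hedged sketch, the request and response negotiation can be done with Python's standard urllib.request. The helper names are hypothetical, and rejecting a response that lacks either header is a choice made for this example rather than documented server behavior.

```python
import urllib.request
from http.client import HTTPMessage, HTTPResponse

PACKET_STREAM_HEADER = "X-Audio-Packet-Stream"
PACKET_STREAM_CONTENT_TYPE = "application/x-audio-packet-stream"


def build_request(url: str) -> urllib.request.Request:
    # Ask the server for the packet stream instead of the plain audio stream.
    return urllib.request.Request(url, headers={PACKET_STREAM_HEADER: "1"})


def is_packet_stream(headers: HTTPMessage) -> bool:
    # HTTPMessage lookups are case-insensitive; get_content_type() drops parameters.
    return (
        headers.get(PACKET_STREAM_HEADER) == "1"
        and headers.get_content_type() == PACKET_STREAM_CONTENT_TYPE
    )


def open_packet_stream(url: str, timeout: float = 10.0) -> HTTPResponse:
    response = urllib.request.urlopen(build_request(url), timeout=timeout)
    if not is_packet_stream(response.headers):
        response.close()
        raise ValueError("server did not answer with an audio packet stream")
    # The response is file-like; its body is read frame by frame using the field layout above.
    return response
```

Checking both headers before parsing guards against treating a plain audio response as frames, for example when a server does not support the packet stream.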