The Annodex exchange format for time-continuous bitstreams, Version 3.0

Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76, Epping, NSW 1710, Australia
+61 2 9372 4180
Silvia.Pfeiffer@csiro.au
http://www.ict.csiro.au/

Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76, Epping, NSW 1710, Australia
+61 2 9372 4222
Conrad.Parker@csiro.au
http://www.ict.csiro.au/

Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
PO Box 76, Epping, NSW 1710, Australia
+61 2 9372 4222
Andre.Pang@csiro.au
http://www.ict.csiro.au/

This specification defines "Annodex", an exchange format for
annotated and indexed time-continuous bitstreams. Annodex provides
a bitstream format for exchanging multitrack interleaved
time-continuous bitstreams and textual meta information attached
to temporal fragments of the binary bitstreams. The meta information
is given in the Continuous Media Markup Language (CMML).
Annodex enables integration of
time-continuous bitstreams into the browsing and searching
functionality of the World Wide Web.
The specification is not encumbered by patents. The Annodex
format is protected by a trade mark to prevent the use of the
term "Annodex" for any related but non-conformant and therefore
non-interoperable technology. Conformant technology is encouraged
to use the term "Annodex" when referring to the exchange format.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in RFC 2119.
When searching the World Wide Web, time-continuous data
such as audio and video files are currently treated as "dark
matter" outside the existing infrastructure of the World Wide Web:
It is not possible to look inside such files, search for their
content through common text-based search engines, or directly
hyperlink to points of interest inside them. The file can
generally only be consumed in its entirety. In addition, such
files are "dead ends" in that by consuming their content the
hyperlinking functionality of the Web is left behind.
Text documents were enabled for the Web through definition of a
markup language (HTML) for text documents
to enable description of the structure of a document, and thus
allow for the separation of content from presentation. This
specification takes the same approach for time-continuous documents.
The markup language for time-continuous documents is called CMML,
short for Continuous Media Markup Language.
It describes the structure of time-continuous documents and allows
for a clean separation of content from presentation.
To turn text documents into a Web resource that can be
exchanged between different applications, HTML markup is added. Such
an exchange format where CMML is merged with the time-continuous
document(s) it describes is also necessary to turn the time-continuous
document(s) into a Web resource and provide a standard exchange
format between applications. This format is called "Annodex" for
annotated and indexed documents and is defined here.
Annodex uses a container format that allows transport
and storage of interleaved time-synchronous bitstreams. In a clean
layering approach as is familiar from Internet protocols the
functionality of the container format and CMML is explicitly separated.
Each layer solves a specific problem without being dependent on layers
that are further up in functionality. The container format of
Annodex is the Ogg encapsulation format version
0. Annodex is an Ogg bitstream containing a "skeleton" and a
CMML logical bitstream, in addition to other temporally interleaved
data bitstreams. Ogg skeleton is a logical bitstream that describes all
the other logical bitstreams contained in the Ogg physical bitstream
(see section 4). Its purpose is to remove codec-specific information
requirements from the multiplexing/demultiplexing process.
Only an Annodex bitstream that contains a CMML bitstream can be
regarded as a Web resource and as part of the Web, because it can be
searched and browsed. An Ogg bitstream without a CMML bitstream is
not an Annodex bitstream, but only an Ogg bitstream with a "skeleton"
logical bitstream, which is still valuable as a multitrack media
format that can be addressed through temporal
hyperlinks. However, it is not a first class citizen on the
Web because Web search engines cannot index and crawl it.
The file extension of Annodex files is ".anx". This
document also applies for registration of the MIME type
"application/annodex" for Annodex format bitstreams. In
the meantime, "text/x-annodex" will be used. Further MIME types that
this document applies for are "video/annodex" for
Annodex format (possibly multitrack) video and "audio/annodex"
for Annodex format (possibly multitrack) audio.
Please note that this document assumes that the reader understands
the Ogg encapsulation format
version 0. Also, knowledge of the network protocols HTTP and RTP/RTSP
as well as the extension of URIs to address temporal offsets into Web resources are
a prerequisite to understanding this document. To find out more about
the use of Annodex for creating searchable and surfable Web resources,
refer to the specification of the Continuous Media
Markup Language (CMML Version 2.0).
Annodex contains interleaved
bitstreams of time-related data. It is designed to be used
both as a persistent file format and as a streaming format to
exchange temporally addressable bitstreams. It enables encapsulation
of any type of time-continuous bitstream as long as it is streamable
and is based on a regular data sampling rate (called granulerate).
For variable sampling rate bitstreams, a least common multiple of
the used sampling rates must be known.
Using this container format, Annodex is designed to accommodate any
current or future compression format for time-continuous bitstreams.
The container format that Annodex is based on is designed to
allow several tracks of temporally synchronous time-continuous data.
Each track represents codec data for one type of
time-continuous data stream. Here is an example Annodex bitstream
with data bitstreams D1-D3 (for example, a video track and two audio
tracks) and an annotation track A1 (a CMML bitstream).
__________________________________________________________________
D1 | | | | | | | | | | |
__________________________________________________________________
D2 | | | | | | |
__________________________________________________________________
D3 | | | | | | | | | | | | | | | | | | | |
__________________________________________________________________
A1 | clip 1 | -- | clip 2 | clip 3 |
__________________________________________________________________
The time axis t
|----------------------------------------------------------------->
Bitstreams of time-continuous data are regarded as a sequence
of data packets that each have a timestamp representing the time at
which the packet data ends. The packets contain all the data required
to cover the interval from the last packet. If a packet does not cover the
full period, it MUST cover the end part of the interval.
Bitstreams that represent data that is to be presented in one
single time instant are called time-instantaneous bitstreams. Their
timestamp represents the time at which the packet's data starts and
ends. The CMML track A1 above is one such bitstream. Its clips represent
time-instantaneous data that is displayed at the given timestamp.
The subsequent data packet replaces the information of the previous
one. To insert a gap in a data bitstream (as in A1 above), a data
packet MUST be inserted which explicitly annuls the data.
Data bitstreams generally contain the following information:

- setup information for a codec
- content data

The setup information is inserted at the start of a data
bitstream before any content data.
Distribution of Annodex format bitstreams is performed using a
network protocol such as HTTP or
RTP/RTSP. The basic process is the
following: The client dispatches a download or streaming request
to the server with a certain URI. The server resolves the URI
and starts delivering Annodex format bitstreams, taking into
account potential URI addressed offsets. Currently the distribution
with HTTP is clear and discussed in this document, while the
details of a distribution via RTP/RTSP are not yet examined
and thus unspecified - in particular an RTP payload needs to be
defined for Annodex.
The following figure explains the protocol stack:
________ _________ _________ __________
\
| CMML | | Video | | Audio | | ... | |
________ _________ _________ __________ |
|
| skeleton | > Annodex
_____________________________________________ |
|
| Ogg | |
_____________________________________________ /
| HTTP | RTSP |
| _______________________|
|
| | RTP |
_____________________________________________
| TCP | UDP |
_____________________________________________
| IP |
_____________________________________________
The Annodex format has been designed to accommodate both
reliable and unreliable transport. In case of packet loss
due to an unreliable transport, data may get lost; this may be
important to the application or not and thus may need to be addressed.
All data, including CMML data, is treated with the same importance.
For instantaneous data tracks the loss of one packet implies that
the next packet will restore the proper state. We envisage, however,
that a client may require the current state information, so there
should be a protocol request for re-sending the current state. This
will be delivered by the server by inserting another copy of the
instantaneous data into the Annodex bitstream. For example, clips
within an annotation bitstream can be repeated in the Annodex
bitstream by having the same "track" attribute and the same
page_sequence_number as the previous "clip" element. This handling of
unreliable transport relates mostly to the use of Annodex over
RTP/RTSP and UDP and needs further elaboration.
In short, the Annodex bitstream specific features are:

- index clips of Annodex content for retrieval, e.g. with a
  Web search engine.
- crawl Webs of Annodex and other Web resources, e.g.
  during an indexing operation of a Web search engine.
- directly address and retrieve temporal intervals inside the
  Annodex bitstream without a need to decode logical bitstreams
  aside from skeleton.
- directly address and retrieve named clips inside the Annodex
  bitstream without a need to decode any more than the skeleton
  and CMML logical bitstreams.
- extract, cache, and reuse temporal intervals or named clips
  while retaining the annotation and index information.
- browse through Webs of Annodex and other Web resources in
  an integrated manner making time-continuous content first class
  citizen on the World Wide Web.

For authoring of Annodex bitstream information, the CMML is defined. CMML's "stream" tag has
been designed to author the skeleton bitstream and describe the data
bitstreams to be interleaved into an Ogg bitstream. All other
tags of a CMML file provide for authoring of the CMML bitstream.
Use of a CMML bitstream without skeleton is strongly discouraged as
the time referencing and clip recomposition functionality of
Annodexing will get lost.
An Annodex physical bitstream has the following mandatory order of
Ogg pages:

1. skeleton bos page.
2. CMML bos page.
3. bos pages of the other logical bitstreams.
4. secondary header pages of all logical bitstreams, including
   fisbone.
5. skeleton eos page.
6. data and eos pages of logical bitstreams, excluding skeleton,
   multiplexed in a time-synchronous fashion.

Such an Annodex bitstream is identified by the CMML bitstream's magic
number which can be found at Byte position 104 for this version of the
"skeleton" specification. This is calculated through the size of the
skeleton bos page, which is fixed because the skeleton ident header is
of fixed size and the Ogg page encapsulation header is also fixed size.
The Ogg page header has 28 Bytes (including a one Byte segment table
as this page has always less than 255 Bytes packet content), and the
skeleton ident header has 48 Bytes (see further down). Then, the
Byte position amounts to 28+48+28 = 104. The CMML bos page MUST thus also
have less than 255 Bytes packet content, which is a sensible restriction.
The CMML media mapping is defined in the CMML specification. However, for identification
of an Annodex bitstream, the bos page of the CMML logical bitstream
needs to be identifiable, which is provided through the first 12 Bytes
of the CMML ident packet containing the magic numbers and the version
information. Other fields exist and are described in the CMML specification.

- Identifier: an 8 Byte field that identifies this file to
  be of a CMML logical input bitstream. It
  contains the magic numbers:

    0x43 'C'
    0x4d 'M'
    0x4d 'M'
    0x4c 'L'
    0x00 '\0'
    0x00 '\0'
    0x00 '\0'
    0x00 '\0'

- Version major: 2 Byte unsigned integer signifying
  the major version number of the CMML format bitstream.
- Version minor: 2 Byte unsigned integer signifying
  the minor version number of the CMML format bitstream.

For the rest of the CMML media mapping refer to the specification
of the CMML version that is being used (must be larger than 2.0).
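
The identification step described above can be sketched in Python as
follows. This is only an illustration, not a normative parser: the
function name parse_cmml_ident_prefix is hypothetical, and the sketch
assumes that the two version fields are stored LSB first, as this
document states for the skeleton headers.

```python
import struct

# Magic numbers of the CMML ident packet: 'C' 'M' 'M' 'L' and four NUL Bytes.
CMML_MAGIC = b"CMML\x00\x00\x00\x00"


def parse_cmml_ident_prefix(packet: bytes):
    """Return (version_major, version_minor) from the first 12 Bytes.

    Assumption: the 2 Byte version fields are little-endian (LSB first).
    """
    if len(packet) < 12:
        raise ValueError("a CMML ident packet has at least 12 Bytes")
    magic, major, minor = struct.unpack_from("<8sHH", packet)
    if magic != CMML_MAGIC:
        raise ValueError("not a CMML ident packet")
    return major, minor


# Usage: a hand-built prefix for CMML version 2.0 followed by other fields.
example = CMML_MAGIC + struct.pack("<HH", 2, 0) + b"further fields"
print(parse_cmml_ident_prefix(example))
```

Only the 12 Byte prefix is inspected; the remaining fields of the ident
packet are left to a CMML-aware parser.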
The purpose of Ogg skeleton is to provide codec-specific
knowledge that allows parsing, demultiplexing and remultiplexing of
Ogg bitstreams without having to decode.
While the Ogg encapsulation format by itself is capable of
interleaving an unlimited number of time-continuous bitstreams,
it is not possible to identify the type of bitstreams (e.g. audio
or video) and their encoding format (e.g. Vorbis or Speex or Theora)
without decoding at least the bos page of the logical bitstreams.
Also, further general media type information such as the image
dimensions of a frame in a video bitstream or the language of a speech
bitstream may be provided in skeleton. Another limitation of Ogg
is that each logical bitstream defines its own mapping of
granule_position to time, which is therefore also given in the
skeleton.
This situation is not acceptable for Annodex, because an Annodex
server must be able to return media format information for an Annodex
resource without having to understand the codecs involved. And it
must be able to return temporal subparts of an Annodex resource
without needing to decode.
An addition to the Ogg format is thus necessary, which describes
all the logical bitstreams included in the Ogg stream. This is
defined via a logical bitstream called the "skeleton". For Annodex
bitstreams, use of a skeleton bitstream is mandatory. This section
specifies the content of the "skeleton" logical bitstream and how
it is mapped into Ogg. Knowledge of the Ogg bitstream format as
specified in the Ogg RFC is presumed.
Please also refer to that document for descriptions of the terms
used in this document.
The skeleton bitstream has the ability to generically describe
Ogg bitstreams that consist of one or more time-continuous data
bitstreams and one or more time-instantaneous data bitstreams
concurrently interleaved (in Ogg terms: multiplexed). It does
not describe sequentially multiplexed Ogg bitstreams, but
rather expects that a sequentially multiplexed bitstream has
its own skeleton logical bitstream.
The skeleton logical bitstream provides the following functionality
on top of Ogg:

- allows for the identification of the codec format and the
  content type of encapsulated logical bitstreams without the
  need to decode that bitstream's headers or data.
- allows for extraction of a temporal interval of the Ogg
  physical bitstream while retaining the original start
  time offset of that interval.
- allows for attachment of a real-world wall-clock time and a
  date to the Ogg physical bitstream, thus e.g. retaining
  creation date/time or first broadcast date/time.
- allows for temporal offset operations into an Ogg physical
  bitstream without a need to decode any data.
- allows generally for handling of content without a need to
  decode it, such as is necessary in a caching Web proxy.
- allows for attachment of message header fields given as
  name-value pairs that contain some sort of protocol messages
  about the logical bitstream, e.g. the screen size for a video
  bitstream or the number of channels for an audio bitstream.

For authoring of the skeleton bitstream information the CMML can be used. CMML's "stream" tag has
been designed with that purpose in mind. However, it is not mandatory
to use CMML for authoring of skeleton information - that information
may well originate from a different source and be written directly
into the skeleton bitstream. See the CMML Internet-Draft for more
details.
The skeleton logical bitstream starts with an ident header
containing information for the complete Ogg physical bitstream.
The ident header has the following format:
Fields with more than one Byte length are encoded LSB (least
significant Byte) first.
The fields in the skeleton ident header have the following
meaning:

- Identifier: an 8 Byte field that identifies this bitstream
  as a skeleton. It contains the magic numbers:

    0x66 'f'
    0x69 'i'
    0x73 's'
    0x68 'h'
    0x65 'e'
    0x61 'a'
    0x64 'd'
    0x00 '\0'

- Version major: 2 Byte unsigned integer
  signifying the major version number of the skeleton
  bitstream. This document specifies the major version 3.
- Version minor: 2 Byte unsigned integer
  signifying the minor version number of the skeleton
  bitstream. This document specifies the minor version 0.
- Presentationtime numerator & denominator: 8 Byte signed
  integer each. Together they represent the time at which to start
  presenting the Ogg physical bitstream, given as a rational number.
  The denominator represents the temporal resolution at which the
  presentationtime is given. E.g. 5 over 1000 results in a
  presentationtime of 0.005 sec. This enables a very high temporal
  resolution without having to store floating point numbers. In a
  newly created physical bitstream presentationtime and basetime are
  the same. When remultiplexing a subpart of the stream, this number
  MUST be adapted to the requested start time offset of the newly
  created stream.
- Basetime numerator & denominator: 8 Byte signed integer
  each. Together they represent the basetime of the
  Ogg physical bitstream, given as a rational number like the
  presentationtime. This number is fixed once the physical bitstream
  is created and provides a mapping to time for the beginning of
  the physical bitstream when it starts with a granule position of 0.
- UTC: a 20 Byte string containing a UTC time in the form
  of YYYYMMDDTHHMMSS.sssZ. It associates a calendar date and a
  wall-clock time with the basetime. It is a sequence of 20 NUL
  Bytes if not in use, making this ident packet and thus the
  bos page of the skeleton bitstream constant length.

Please note: The possible temporal resolution of the presentation-
and basetime is on the order of 2^-64. For example, the time formats
in use for media that are described in this document range from
1/24 to 1/60 for the different smpte formats. This resolution
is enough for any one of these. It is also expected to accommodate
any future needs of time resolution for any other time format
and time-continuously sampled data.
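
As an illustration, the ident header fields described above can be
unpacked as in the following Python sketch. The layout is assembled from
the field sizes and the field order given in the list above, which is an
assumption of this sketch; the function name parse_fishead and the
returned dictionary keys are hypothetical.

```python
import struct
from datetime import datetime, timezone
from fractions import Fraction

# Assumed layout, LSB first: identifier (8), version major (2),
# version minor (2), presentationtime numerator and denominator (8 each),
# basetime numerator and denominator (8 each), UTC string (20).
FISHEAD = struct.Struct("<8sHHqqqq20s")


def parse_fishead(packet: bytes) -> dict:
    """Unpack a skeleton ident header packet into a dictionary."""
    (ident, major, minor,
     pt_num, pt_den, bt_num, bt_den, utc) = FISHEAD.unpack_from(packet)
    if ident != b"fishead\x00":
        raise ValueError("not a skeleton ident header")
    if utc == b"\x00" * 20:
        utc_time = None  # 20 NUL Bytes: the UTC field is not in use
    else:
        utc_time = datetime.strptime(
            utc.decode("ascii"), "%Y%m%dT%H%M%S.%fZ"
        ).replace(tzinfo=timezone.utc)
    return {
        "version": (major, minor),
        # Fraction raises ZeroDivisionError for a zero denominator.
        "presentationtime": Fraction(pt_num, pt_den),
        "basetime": Fraction(bt_num, bt_den),
        "utc": utc_time,
    }


# Usage: a hand-built header with a presentationtime of 5 over 1000.
header = FISHEAD.pack(b"fishead\x00", 3, 0, 5, 1000, 0, 1000,
                      b"20050101T120000.000Z")
print(parse_fishead(header)["presentationtime"])
```

Keeping the two times as exact rational numbers mirrors the intent of
the numerator and denominator fields and avoids floating point rounding.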
The skeleton secondary headers are a sequence of packets
that each contain information about one of the time-continuous
or time-instantaneous other logical bitstreams contained
within the Ogg physical bitstream.
A skeleton secondary header packet has the following format:
Fields with more than one Byte length are encoded LSB
(least significant Byte) first.
The fields in a skeleton secondary header packet have the
following meaning:

- Identifier: an 8 Byte field that identifies this packet
  as a skeleton secondary header for identifying other
  logical bitstreams. It contains the magic numbers:

    0x66 'f'
    0x69 'i'
    0x73 's'
    0x62 'b'
    0x6f 'o'
    0x6e 'n'
    0x65 'e'
    0x00 '\0'

- Offset to message header fields: 4 Byte unsigned integer
  that contains the number of Bytes used in this packet before the
  message header fields. For the version of the skeleton bitstream
  described in this document this number is fixed to 44. This
  field accommodates future changes to the skeleton
  bitstream, so that message header fields can still be parsed even if
  more fields get inserted before them.
- Serial number: 4 Byte unsigned integer containing the
  bitstream_serial_number of the Ogg logical bitstream described
  by this skeleton secondary header packet and thus connecting
  it to the logical bitstream.
- Number of header packets: a 4 Byte unsigned integer
  that contains the number of header packets of that
  particular logical bitstream consisting of the bos page and the
  secondary header pages.
- Granulerate numerator & denominator: 8 Byte signed integer
  each. They represent the temporal resolution of the
  logical bitstream in Hz given as a rational number in the
  same way as the basetime attribute above.
- Startgranule: 8 Byte signed integer that represents the
  granule number with which this logical bitstream starts, which
  is originally 0, but will be a positive offset when only a
  subpart of the stream is requested.
- Preroll: 4 Byte unsigned integer that contains the number of
  packets to pre-roll in order to decode a current packet
  correctly. This is for example the case with Ogg Vorbis,
  which requires a pre-roll of 2 packets.
- Granuleshift: a 1 Byte unsigned integer describing
  whether to partition the granule_position into two for that
  logical bitstream, and how many of the lower bits to use for
  the partitioning. The upper bits then still signify a
  time-continuous granule position for a directly decodable
  and presentable data granule. The lower bits allow for
  specification of a finer resolution such that for example
  predicted frames of a video can be addressed as well, though
  not decoded without tracing back to the last fully decodable
  data granule. This is e.g. the case with Ogg Theora.
- Padding/future use: 3 Bytes padding data that may be used for
  future requirements and are mandated to zero in this revision.
- Message header fields: header fields, following the generic
  Internet Message Format defined in RFC 2822. Each header field consists
  of a name followed by a colon (":") and the field value.
  Field names are case-insensitive. The field value MAY
  be preceded by any amount of LWS, though a single SP is
  preferred. Header fields can be extended over multiple lines
  by preceding each extra line with at least one SP or HT.

There is one mandatory Message header field for all of the
logical bitstreams: the "Content-type" header field. For an
application that is parsing the Annodex bitstream, this field
contains the MIME type and the character encoding of the data in
the logical bitstream. E.g. for the annotation bitstream, this
field will contain the value "Content-type: text/x-cmml; UTF-8"
if the character set used in the CMML bitstream is UTF-8.
E.g. for a bitstream containing Ogg Vorbis data the value is
"Content-type: audio/x-vorbis". The Content-type message header
field MUST come first for all of the Message header fields such that
it can be found at a fixed location in the skeleton
fisbone packet.
As per RFC 2277, message header
fields are considered protocol data, i.e. it is not expected to
have human readable text in there, and they MUST be entirely encoded
in UTF-8. In addition, the mandatory header fields MUST be encoded
in US-ASCII and it is recommended to also use US-ASCII
code points as much as possible for the optional header fields.
User defined optional message header fields MUST follow the naming
standard given in RFC2822.
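
The secondary header layout and the message header fields can be read
as in the following Python sketch, which is an illustration only. The
layout is assembled from the field sizes and field order listed above.
The sketch further assumes that the offset value counts the Bytes from
the start of the offset field itself, because the listed fields after
the 8 Byte identifier add up to the stated value of 44. The names
parse_fisbone and FISBONE are hypothetical.

```python
import struct
from fractions import Fraction

# Assumed layout, LSB first: identifier (8), offset (4), serial number (4),
# number of header packets (4), granulerate numerator and denominator
# (8 each), startgranule (8), preroll (4), granuleshift (1), padding (3).
FISBONE = struct.Struct("<8sIIIqqqIB3s")


def parse_fisbone(packet: bytes) -> dict:
    """Unpack a skeleton secondary header packet into a dictionary."""
    (ident, offset, serialno, nheaders, gr_num, gr_den,
     startgranule, preroll, granuleshift, _padding) = FISBONE.unpack_from(packet)
    if ident != b"fisbone\x00":
        raise ValueError("not a skeleton secondary header")
    # Assumption: the offset is relative to the offset field, which
    # begins right after the 8 Byte identifier.
    text = packet[8 + offset:].decode("utf-8")
    headers = {}
    name = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and name is not None:
            # Continuation line: belongs to the previous header field.
            headers[name] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            name = name.strip().lower()  # field names are case-insensitive
            headers[name] = value.strip()
    return {
        "serialno": serialno,
        "header_packets": nheaders,
        "granulerate": Fraction(gr_num, gr_den),
        "startgranule": startgranule,
        "preroll": preroll,
        "granuleshift": granuleshift,
        "headers": headers,
    }


# Usage: a hand-built packet describing an Ogg Vorbis logical bitstream.
packet = FISBONE.pack(b"fisbone\x00", 44, 1234, 3, 44100, 1, 0, 2, 0,
                      b"\x00" * 3) + b"Content-type: audio/x-vorbis\r\n"
print(parse_fisbone(packet)["headers"]["content-type"])
```

Reading the offset from the packet instead of hard-coding it lets such a
parser skip fields that a later revision may insert before the message
header fields.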
The media mapping for skeleton into Ogg is as follows:

- The skeleton ident (fishead) header is mapped into the skeleton
  bos page.
- The secondary header pages of a skeleton logical bitstream
  consist of the fisbone header packets that each describe one
  particular logical data bitstream within the Ogg physical
  bitstream.
- There are no content pages or data packets. As the skeleton
  eos page is included before the first data page of any logical
  bitstream, there actually cannot be any content data packets.
- The skeleton eos page contains one packet of length zero.

When using a skeleton logical bitstream in Ogg, a further
restriction on the order in which Ogg pages appear is introduced
to allow for easier identification:

- The skeleton bos page is the very first bos page. This allows its
  differentiation from other Ogg bitstreams that don't contain
  a skeleton logical bitstream.
- The bos pages of the other logical bitstreams come next as
  is a requirement of the Ogg bitstream format.
- The secondary header pages of all the logical bitstreams
  in the Ogg physical bitstream come next, as is also a
  requirement of Ogg. The skeleton secondary header pages
  are also included here.
- Before any data pages of any of the logical bitstreams appear
  in the Ogg physical bitstream, the skeleton eos page MUST end
  the skeleton logical bitstream. This is necessary to end the
  control section of the bitstream. If an Ogg stream parser reaches
  the skeleton eos page, it knows that it has received all the bos
  and secondary header pages and can start setting up its decoding
  or parsing environment.

With time-continuous data like Annodex, one needs to handle
data at four different levels:

- at the Bytes level, upon seeking.
- at the packets level, upon encapsulating.
- at the granules level, upon recomposing.
- at the time level, upon displaying and addressing.

This section explains how they all fit together.
Annodex bitstreams inherently represent one timeline only, where the
different logical bitstreams can be thought of as content tracks on that
timeline. All of these tracks relate to the same timeline which starts
at a certain time point and ends when the last bitstream ends.
An example bitstream can be seen in the following figure. It consists
of an Annodex bitstream that contains 4 media bitstreams and one CMML
bitstream. The picture is a conceptual representation of the
time intervals covered by the different logical bitstreams and the
Ogg pages used to encapsulate the data. In the flat representation these
are multiplexed such that the data packets of each of these bitstreams
occur at the correct time.
|
----------------------------------------------------------------------
|clip1 | clip 2 |/clip 3///////////////| clip 4 |
----------------------------------------------------------------------
CMML bitstream
----------------------------------------------
| | | | | | | | | | |//| | | | |
----------------------------------------------
audio bitstream 1
-------------------------------------------------------------
| | | |/////| | | | | | |
-------------------------------------------------------------
video bitstream 1
----------------------------------------------------
| | | | |//| | | | | | | | | | | | |
----------------------------------------------------
audio bitstream 2
-------------------------------
| |/////| | | |
-------------------------------
video bitstream 2
The time point at which an Annodex bitstream starts (t_0 in the
above diagram) is called the "basetime" and represents the time in
seconds associated with the granule position of 0 on all logical
bitstreams. Typically, a newly created Annodex file starts all its
logical bitstreams at granule position 0, and a typical extract
of an Annodex bitstream, such as the one starting at t_url in the
image above, starts each of its logical bitstreams at
a different granule position. These granule positions are stored
in the "startgranule" field of the skeleton secondary header packets.
The "basetime" of an Annodex bitstream may be 0, but it can also
be any positive time. For example, in professional video production,
the first frame of video of a program normally refers to a SMPTE
basetime of 01:00:00:00, not 00:00:00:00 (see also the temporal URI addressing specification).
Associating such a practice to a digital video resource requires
a way to store that basetime with the resource and to interpret it
correctly when addressing offsets such as t_uri. Annodex provides
such a mapping through the basetime field in the skeleton ident header.
Also associated with the basetime is a calendar date and
wall-clock time (a "UTC base") which represent a real-world time
giving some meaningful calendar date association to the content
such as the creation time or the first presentation time.
The UTC base is specified in the UTC field of the skeleton
ident header.
Each one of the encapsulated data bitstreams and the
CMML bitstream have their own temporal resolution at which
they provide data to cover the given timeline. This temporal
resolution is usually given through the sampling rate of the
particular bitstream. For example, a raw audio bitstream at CD
quality is sampled with a sampling rate of 44100 Hz. A video
bitstream may be sampled with a frame rate of 25 frames per
second.
This temporal resolution is called the "granulerate".
A granule is a data element that is based on a regular data rate
specific to the content type, such as the frame rate for video or
the sampling rate for audio.
It even exists for bitstreams that are not sampled at a regular
rate - then it is the highest resolution of any of the used
sampling rates. The granulerate is specified in the skeleton
secondary header packets for each logical bitstream.
Each one of the bitstreams inserts data into the Ogg bitstream
through packets which have an associated temporal duration based on
the encoder packaging. Packets are packaged into Ogg pages, which
have a granule position associated with them. Not taking the special
case of a granuleshift into account, the granule position
specifies the number of granules that have been encapsulated since
the implicit start of the original bitstream until and including the
given Ogg page.
The granule position together with the granulerate and granuleshift
information of the skeleton secondary header packets for the particular
logical bitstream are used for the calculation of the time position
for which a data packet of the logical bitstream completes data.
A granule position of -1 indicates a special case and MUST NOT be
used for calculation of a mapping to time.
In principle, the granule position of an Ogg page divided by the
granulerate of this page's logical bitstream provides the time
position that is reached in that bitstream after decoding all data
packets finished on this page. However, the granule_position field
in an Ogg page allows for a more fine-grained description of
the temporal position. The following image explains the composition
of the granule_position field in an Ogg page:
The granuleshift field of the skeleton secondary header packets
describes how many of the granule_position's 64 bits are being used
for the keyoffset. The keyoffset part of the granule_position is
commonly used when the logical bitstream consists of packets that
can only be fully decoded when referring back to a previous packet.
For example, video streams often consist of inter and intra coded
frames, where the intra frames are fully decodable and the inter
frames are intermediate frames that require backtracking to the
last intra frame for accurate decoding. Another example is a
logical bitstream that is mapped as instantaneous information (i.e.
its granule_position represents the start time and the end time of
the packet data), but actually has a duration associated with it, which
is provided through a subsequent packet. CMML is such an example.
The keyindex part of the granule_position is then used to provide
the temporal position of the reference packet
and the keyoffset part provides a counter for the data in between.
The calculation of the temporal position of an Ogg page in Annodex
is thus specified through the following algorithm:
The basetime provides the time offset used at the beginning of the
logical bitstream for the first data packet and thus MUST be
added for a correct calculation of the temporal position.
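
A minimal Python sketch of this calculation is given here. It is
reconstructed from the field descriptions and the worked examples in
this section rather than taken from a normative listing. It assumes that
the upper bits (keyindex) and the lower bits (keyoffset) of the
granule_position are added to obtain the granule count; the function
name granule_to_time and its parameters are illustrative only.

```python
from fractions import Fraction


def granule_to_time(granule_position, granulerate, granuleshift=0,
                    basetime=Fraction(0)):
    """Map an Ogg page's granule_position to a time position in seconds.

    granulerate and basetime are rational numbers (numerator over
    denominator) as stored in the skeleton headers.
    """
    if granule_position == -1:
        # Special case: MUST NOT be used for a mapping to time.
        raise ValueError("granule position -1 has no time mapping")
    keyindex = granule_position >> granuleshift
    keyoffset = granule_position - (keyindex << granuleshift)
    # Assumption: the granule count is keyindex plus keyoffset.
    return Fraction(keyindex + keyoffset) / granulerate + basetime


# Audio: granulerate 44100, granuleshift 0, basetime 4 sec.
print(granule_to_time(88200, Fraction(44100), 0, Fraction(4)))

# Video: granulerate 25, granuleshift 3, keyindex 62 and keyoffset 5.
print(float(granule_to_time((62 << 3) | 5, Fraction(25), 3)))
```

With a granuleshift of 0 the keyoffset is always 0, so the calculation
reduces to the granule position divided by the granulerate, plus the
basetime.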
As an example, consider an audio bitstream that has a granulerate
of 44100 (i.e. 44100 samples per second), a granuleshift of 0,
and starts at 4 sec. When reaching a granule_position of 88200, this
maps to a time position of 4 + 88200/44100 = 6 seconds.
This signifies that the bitstream has reached 2 seconds into the
audio bitstream after the end of decoding this page's packets, but
maps to 6 seconds because of the basetime.
As another example consider a video bitstream that has a granulerate
of 25 (i.e. 25 frames per 1 second), a granuleshift of 3 (because
it encodes - say - 7 partial frames between each fully encoded frame),
and starts at 0 sec. When reaching a granule_position of 501, i.e.
a keyindex of 62 and a keyoffset of 5, this maps to a fully decodable
time position of (62 + 5) / 25 = 2.68 seconds.
The granulerate of a time-instantaneous bitstream such as
the CMML bitstream can be chosen arbitrarily by the bitstream
multiplexer. By default, a granulerate of 1000 is used, which
is the resolution of npt. The resolution of all the time schemes
is given as:
npt: 1000 (milliseconds)
smpte-24: 24 (24 fps)
smpte-24-drop: 24/1.001 = 23.976 (approx. as per SMPTE)
smpte-25: 25
smpte-30: 30
smpte-30-drop: 30/1.001 = 29.970 (approx. as per SMPTE)
smpte-50: 50
smpte-60: 60
smpte-60-drop: 60/1.001 = 59.940 (approx. as per SMPTE)
The granule position of the page finishing data of a
time-instantaneous bitstream packet MUST signify the start
time of that packet. For example, a CMML bitstream with a granulerate
of 1000, a basetime of 0, and a clip that lasts from npt=12.020
till npt=15.0 will get a granule_position of 12020. In contrast, the
granule_position of the page finishing data of e.g. an audio
bitstream with granulerate 44100, basetime 0 and containing
data from npt=12.020 to npt=15.0 will be 661500.
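The two granule positions above can be reproduced with a small Python helper. This is a sketch under stated assumptions: the granuleshift is 0, times are passed as decimal strings to avoid floating-point error, and fractional granules are truncated (the rounding rule is an assumption of this sketch). The name time_to_granule is hypothetical.

```python
from fractions import Fraction


def time_to_granule(time_sec, granulerate, basetime="0"):
    """Map a time in seconds to a granule position (granuleshift of 0 assumed).

    `time_sec` and `basetime` may be ints, Fractions or decimal strings
    such as "12.020"; the result is truncated to a whole granule.
    """
    granules = (Fraction(time_sec) - Fraction(basetime)) * Fraction(granulerate)
    return int(granules)


# CMML clip from npt=12.020 to npt=15.0: the page carries the START time.
cmml_granule = time_to_granule("12.020", 1000)

# Audio data from npt=12.020 to npt=15.0: the page carries the END time.
audio_granule = time_to_granule("15.0", 44100)
```

Here cmml_granule evaluates to 12020 and audio_granule to 661500, matching the values given in the text.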
A note about field overflows: an overflow of the granule
position field can destroy the temporal integrity of the Annodex
physical bitstream. In this case, a multiplexer MUST end the Annodex
physical bitstream and start a new one, resetting the counter to 0 and
adjusting the basetime appropriately. This is also called
sequential multiplexing in Ogg. The same measure MUST be taken
in case of an overflow of the page_sequence_number on one of
the logical bitstreams.
Addressing into an Annodex bitstream is
possible with the temporal URI
addressing scheme. Time is specified as a temporal offset
from the "beginning" of the stream, making use of the basetime
field. Time offsets can also be specified as calendar dates and
times. The UTC base is then used as a basis for offsetting.
The basetime makes it possible to correctly map a temporal offset point such as
a temporal URI to a Byte position in the stream. In the above figure,
take t_uri=npt:14.0 as the temporal offset addressed on a stream with
t_0=npt:5.0 as the basetime: this requires offsetting into the stream by only
9 sec to reach the appropriate granule position in each of the bitstreams,
marked in the figure through patterned pages.
The seeking action is performed on the interleaved bitstream, in
which the data packets occur in temporally consecutive order based
on the time at which their data ends. These times are represented in
the granule positions of the Ogg pages, which are only allowed to
increase monotonically within one logical bitstream. This
implies that once an Ogg page has been found with a granule position
that maps to a given seek time (i.e. covers the time or ends at it),
the seek has found the right location. This applies over all logical
bitstreams. In the above example, this means that the Byte position of
the first occurring page of the patterned pages has been found.
There is a complication to the seeking: some logical bitstreams have
backwards dependencies in their data packets and these have to be taken
into account for seeking. For example, a logical bitstream may require
several of its previous packets to allow a correct and complete decoding
of the actual packet that occurs at the seek time. This is the case for
Theora, which requires going back to the previous keyframe when decoding
from a time offset. It is also the case for Vorbis, which requires the
previous 2 packets for accurate setup of the frequency transform; Speex
needs approximately 2 packets for similar reasons. Even instantaneous
bitstreams such as CMML may require going back to a previous packet to
recover the last state information, i.e. the currently active clip in the
case of CMML.
Therefore, once seeking has located the correct Byte position that
refers to the given temporal offset, it MUST seek back. For logical
bitstreams that have a non-zero "granuleshift" in the skeleton, it MUST
seek back to the Ogg page that has a "keyindex" granule position. For
logical bitstreams that have a non-zero "preroll" in the skeleton, it
MUST seek back that many packets. The earliest Byte position that
satisfies all these requirements is the correct seek position.
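The seek-back rule can be illustrated with the following Python sketch. It is not a demultiplexer: it assumes that the pages of each logical bitstream have already been indexed in memory as (byte_offset, granule_position) pairs, that the page ending a key packet carries a granule position with a keyoffset of 0, and that each page finishes exactly one packet, so that a preroll of n packets equals n pages. All names are hypothetical.

```python
from collections import namedtuple

# One Ogg page of a single logical bitstream (illustrative model only).
Page = namedtuple("Page", "byte_offset granule_position")


def seek_back_index(pages, hit, granuleshift, preroll):
    """Index of the earliest page needed to decode pages[hit] correctly."""
    start = hit
    if granuleshift > 0:
        # Granule position of the reference packet: same keyindex, keyoffset 0.
        keyindex = pages[hit].granule_position >> granuleshift
        key_granule = keyindex << granuleshift
        i = hit
        while i > 0 and pages[i].granule_position > key_granule:
            i -= 1
        start = min(start, i)
    if preroll > 0:
        # Assumption: one packet per page, so go back `preroll` pages.
        start = min(start, max(0, hit - preroll))
    return start


def seek_position(streams):
    """Earliest Byte position satisfying all logical bitstreams.

    `streams` is an iterable of (pages, hit, granuleshift, preroll) tuples,
    where `hit` is the index of the page located by the initial seek.
    """
    return min(
        pages[seek_back_index(pages, hit, shift, preroll)].byte_offset
        for pages, hit, shift, preroll in streams
    )
```

A real implementation would also have to handle packets that span several pages and pages that finish several packets, which this sketch deliberately leaves out.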
A player that presents from an offset MUST take into account that
the bitstream may contain some packets that are only there to allow
accurate decoding of the seek time. When the backwards dependencies
were resolved for a specific logical bitstream, several non-relevant
Ogg pages may also have ended up in
between. These have to be skipped by a player. The time that a
player MUST start presenting from is given in the "presentationtime"
in the skeleton ident header.
When a subpart of an Annodex bitstream is requested, such as through
a temporal URI query request from a Web server, the bitstream MUST be
recomposed and a remultiplexed bitstream served out. There are several
aims for performing the remultiplexing with as little effort and
therefore as little delay as possible:
no decoding of the logical bitstreams is performed.
no changes to the pages, in particular to the granule
positions are made.
changes occur only to the control section.
The fields of the skeleton track allow achievement of all these aims.
Remultiplexing is essentially achieved by seeking to the position as
described above and then including from each logical bitstream only the
relevant Ogg pages into the new stream. Changes to fields in the
bitstream are restricted to the control section:
the "presentationtime" MUST be adjusted to the requested start
time
the "startgranule" for each logical bitstream MUST be adjusted to
the granule position at which each logical bitstream starts. This
is not the first granule position of the Ogg pages included into
the bitstream, but rather the last one that did not get included,
as it represents the start time of the bitstream.
Everything else, and in particular the Ogg pages, stays the same. This is
important also to allow caching of such files as is required for Web
proxies and described in temporal URI
addressing.
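The rule for the "startgranule" field is easy to get wrong by one page, so the following Python sketch spells it out. It assumes that the granule positions of one logical bitstream's pages are available as a list in stream order, and that the original startgranule is kept when the very first page is included; the latter case is not covered by the text above and is an assumption of this sketch. The function name is hypothetical.

```python
def remux_startgranule(granule_positions, first_included, original_startgranule=0):
    """Return the startgranule for a remultiplexed logical bitstream.

    `granule_positions` lists the granule position of every page of the
    bitstream in order; `first_included` is the index of the first page
    copied into the new stream.
    """
    if first_included == 0:
        # Nothing was dropped: keep the original value (assumption).
        return original_startgranule
    # The last page that was NOT included marks where the new stream starts.
    return granule_positions[first_included - 1]
```

For pages with granule positions 0, 4410, 8820 and 13230, a new stream beginning with the third page would get a startgranule of 4410, not 8820.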
This section contains the registration information for the
"application/annodex" media type. While this media type is not
approved by the IANA, "application/x-annodex" may be used.
To: ietf-types@iana.org
Subject: Registration of MIME media type application/annodex
MIME media type name: application
MIME subtype name: annodex
Required parameters: none
Optional parameters: none
Encoding Considerations: Annodex is an exchange format for
any type of encoded time-continuously sampled data
stream. The authoring software MUST provide for the encoders,
providing the MIME type (and potentially the charset for
text-based formats) in the "Content-type" Message header field
of each bitstream. The client software can select an
appropriate decoder based on this information.
Security considerations: see next section.
Interoperability considerations: the Annodex bitstream
format is a free specification that is independent of any media
encoding format. It is designed to provide interoperability with
the existing World Wide Web.
Additional information:
Magic numbers: "OggS" identifies an Ogg page at Byte position 0,
"fishead\0" identifies a skeleton logical bitstream at Byte
position 28. In the second Ogg page at Byte position 28
the magic number "CMML\0\0\0\0" can be found, identifying this
as an Annodex bitstream.
File extension: .anx
Macintosh File Type Code: "ANDX"
Intended usage: COMMON
As Annodex bitstreams are time-continuous Web resources,
hyperlinking into Annodex bitstreams via URIs is possible
with the temporal URI query and
fragment specification. For the query case, an Annodex
server must support the "X-Accept-TimeURI" HTTP header field
(see the temporal URI query
specification for more details). The
"X-Accept-Range-Redirect" and "X-Range-Redirect" HTTP header
fields MAY also be supported by an Annodex server and user agent.
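The magic numbers listed under "Additional information" above can be checked with a few lines of Python. This sketch assumes the standard Ogg page layout (a 27-byte header whose last byte gives the number of segment-table entries, followed by the segment table and the page body) and that the whole start of the file is available in memory; looks_like_annodex is a hypothetical helper, not part of any Annodex library.

```python
def page_length(buf, pos):
    """Total length in bytes of the Ogg page that starts at `pos`."""
    segment_count = buf[pos + 26]
    segment_table = buf[pos + 27 : pos + 27 + segment_count]
    # 27-byte fixed header + segment table + sum of the lacing values.
    return 27 + segment_count + sum(segment_table)


def looks_like_annodex(buf):
    """Check the magic numbers of the first two Ogg pages in `buf`."""
    # First page: "OggS" at Byte 0 and "fishead\0" at Byte 28.
    if buf[0:4] != b"OggS" or buf[28:36] != b"fishead\0":
        return False
    # Second page: "OggS" at its start and "CMML\0\0\0\0" at its Byte 28.
    second = page_length(buf, 0)
    return (
        buf[second : second + 4] == b"OggS"
        and buf[second + 28 : second + 36] == b"CMML\0\0\0\0"
    )
```

The fixed offset of 28 only holds when the page has a single segment-table entry, which is what the magic number positions given above imply.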
As Annodex bitstreams contain CMML logical bitstreams, URI
addressing of clips via their name given in the "id" attribute is
also supported. The same mechanisms as specified in the
CMML specification apply to
Annodex analogously. In particular, the id addressing is also
regarded as an alias for a time offset and an Annodex conformant
server that supports Annodex temporal URI addressing MUST also
support named URI addressing (see the CMML
specification for more details).
Examples for valid URI addresses:
http://example.com/sample.anx?t=npt:4 --- relates to
an Annodex bitstream composed by the server from sample.anx by
starting it at an offset of 4 seconds.
http://example.com/sample.anx?id=dolphin ---
relates to the clip whose id attribute value is
"dolphin" and all further clips after that.
http://example.com/sample.anx?id="dolphin/" ---
relates only to the clip whose id attribute value is "dolphin".
http://example.com/sample.anx?id="intro/goldfish" ---
relates to all the clips from the "intro" clip to
the "goldfish" clip.
http://example.com/sample.anx#t=npt:4 --- start using the
Annodex bitstream from a 4 second offset.
http://example.com/sample.anx#dolphin --- use the clip with
id="dolphin" only.
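To make the difference between the query and fragment forms concrete, the following Python sketch classifies the example addresses above. It only handles the shapes shown in these examples and is not a full implementation of the temporal URI or CMML addressing specifications; the function name and the returned tuple layout are assumptions of this sketch.

```python
from urllib.parse import urlsplit


def parse_annodex_address(uri):
    """Classify an Annodex URI as (where, kind, value), or None.

    where: "query" (resolved by the server) or "fragment" (resolved by
           the user agent)
    kind:  "time" or "id"
    """
    parts = urlsplit(uri)
    if parts.query:
        where, spec = "query", parts.query
    elif parts.fragment:
        where, spec = "fragment", parts.fragment
    else:
        return None
    name, separator, value = spec.partition("=")
    if not separator:
        # A bare fragment such as "#dolphin" names a clip id.
        return where, "id", name
    value = value.strip('"')
    if name == "t":
        scheme, _, offset = value.partition(":")
        return where, "time", (scheme, offset)
    if name == "id":
        return where, "id", value
    return None
```

For instance, the first example address is classified as a server-side time offset with scheme "npt" and offset "4", while "#dolphin" is classified as a client-side clip id.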
The Annodex file and the CMML file that can be extracted
from it are very tightly related to each other: the
CMML file contains all annotation and indexing information
including basetime and UTC time about the Annodex file.
Therefore, receiving the CMML file instead of the Annodex
file is like receiving all information about the bitstreams
in the Annodex file except for the data bitstreams themselves.
This situation can be taken advantage of with the
"Accept" header of HTTP. When an Annodex file is requested
from an HTTP server and the acceptable content types given in
the "Accept" message header field contain "text/x-cmml"
with a higher priority than "application/x-annodex", then
the HTTP server SHOULD return the CMML file instead of the
requested Annodex file itself. As is standard, the HTTP
response will contain a "Content-type" field indicating what
content was actually returned. A Web crawler of a search
engine, for example, can thus avoid extra network load and retrieve
more easily parsable information. It SHOULD set the "Accept" HTTP
header to "Accept: text/x-cmml" for every requested Annodex
URI. For example:
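A minimal Python sketch of both sides of this negotiation follows. The client part builds a request with the "Accept" header set as recommended, without sending it; the server part compares the priorities of the two media types. The quality-value handling is simplified (wildcards such as "*/*" are ignored), and the function names and the example URI are illustrative assumptions.

```python
import urllib.request


def cmml_request(uri):
    """Build (but do not send) a crawler request that asks for CMML."""
    return urllib.request.Request(uri, headers={"Accept": "text/x-cmml"})


def parse_accept(header):
    """Parse an Accept header into {media_type: q}; q defaults to 1.0."""
    preferences = {}
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for parameter in fields[1:]:
            key, _, value = parameter.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        preferences[media_type] = quality
    return preferences


def should_serve_cmml(accept_header):
    """True if text/x-cmml is preferred over application/x-annodex."""
    preferences = parse_accept(accept_header)
    cmml = preferences.get("text/x-cmml", 0.0)
    annodex = preferences.get("application/x-annodex", 0.0)
    return cmml > annodex


request = cmml_request("http://example.com/sample.anx")
```

A server using this sketch would return the CMML file for the request built above, and the Annodex file for a header such as "application/x-annodex, text/x-cmml;q=0.5".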
This section contains the registration information for the
"video/annodex" media type. While this media type is not
approved by the IANA, "video/x-annodex" may be used.
To: ietf-types@iana.org
Subject: Registration of MIME media type "video/annodex"
MIME media type name: video
MIME subtype name: annodex
Required parameters: none
Optional parameters: none
Encoding Considerations: Annodex video is a subclass of
Annodex data where there is at least one video track encapsulated
together with the skeleton and CMML tracks, and a potentially
unlimited number of other audio and video tracks.
Security considerations: as in "application/annodex" MIME application.
Interoperability considerations: as in "application/annodex" MIME
application.
Additional information:
Magic numbers: as in "application/annodex" MIME application.
File extension: .axv
Macintosh File Type Code: "ANXV"
Intended usage: COMMON
URI addressing and HTTP header field use of "application/annodex"
type content apply analogously to "video/annodex".
This section contains the registration information for the
"audio/annodex" media type. While this media type is not
approved by the IANA, "audio/x-annodex" may be used.
To: ietf-types@iana.org
Subject: Registration of MIME media type "audio/annodex"
MIME media type name: audio
MIME subtype name: annodex
Required parameters: none
Optional parameters: none
Encoding Considerations: Annodex audio is a subclass of
Annodex data where there is at least one audio track encapsulated
together with the skeleton and CMML tracks, and a potentially
unlimited number of other audio tracks.
Security considerations: as in "application/annodex" MIME
application.
Interoperability considerations: as in "application/annodex" MIME
application.
Additional information:
Magic numbers: as in "application/annodex" MIME application.
File extension: .axa
Macintosh File Type Code: "ANXA"
Intended usage: COMMON
URI addressing and HTTP header field use of "application/annodex"
type content apply analogously to "audio/annodex".
Annodex format bitstreams contain several multiplexed
binary media and one XML annotation bitstream. There is no
generic encryption or signing mechanism provided for the
complete bitstream or any one of its parts. As the format of the
encapsulated media bitstreams is not prescribed and is
identified through the "Content-type" Message header field in
that bitstream's skeleton secondary header packet,
it is possible to encrypt or sign
that media bitstream and then mark it accordingly with a MIME
type that signifies the encryption. It is up to the applications
that use this bitstream to provide an appropriate codec to
handle such bitstreams.
As Annodex format bitstreams contain binary media
bitstreams, it is possible to include executable content in
them. This can be an issue with applications that decode these
bitstreams, especially when they are used in a network
scenario. Such applications MUST ensure correct handling of
manipulated bitstreams, buffer overflows and the like.
draft-pfeiffer-annodex-01:
Annodex version 2.0: changes because of renamings of CMML
tags and changes to the temporal and named URI addressing.
draft-pfeiffer-annodex-02:
Annodex version 3.0: The changes pertain to the bitstream
format to allow for a stronger decoupling of Annodex
and CMML. The Annodex format is now using the Ogg format with a
"skeleton" and a "CMML" logical bitstream. This change has
reinforced a layered approach that fits better with existing
practice in Internet protocols, where each layer solves a specific
problem without being dependent on other layers further up.
Key words for use in RFCs to Indicate Requirement Levels. Harvard University.
Extensible Markup Language (XML) 1.0. World Wide Web Consortium.
HTML 4.01 Specification. World Wide Web Consortium.
XHTML(TM) 1.0 The Extensible Hyper Text Markup Language. World Wide Web Consortium.
Uniform Resource Identifiers (URI): Generic Syntax. World Wide Web Consortium; Day Software; Adobe Systems Incorporated.
Hypertext Transfer Protocol -- HTTP/1.1. University of California, Irvine; World Wide Web Consortium; Compaq Computer Corporation; Xerox Corporation; Microsoft Corporation.
Internet Message Format. QUALCOMM Incorporated.
IETF Policy on Character Sets and Languages. UNINETT.
Real Time Streaming Protocol (RTSP). Columbia University; Netscape Communications Corp.; RealNetworks.
Tags for the Identification of Languages. UNINETT.
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. Innosoft International, Inc.; First Virtual Holdings.
XML Media Types. University of California, Irvine; Fuji Xerox Information Systems.
The Ogg encapsulation format version 0. Commonwealth Scientific and Industrial Research Organisation (CSIRO).
SMPTE STANDARD for Television, Audio and Film - Time and Control Code. The Society of Motion Picture and Television Engineers.
Data elements and interchange formats -- Information interchange -- Representation of dates and times. International Organization for Standardization.
Specifying time intervals in URI queries and fragments of time-based Web resources (work in progress). Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia.
The Continuous Media Markup Language (CMML), Version 2.0 (work in progress). Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia.
any sequence of
binary data that represents an analog-time signal sampled in
discrete time steps. In contrast to actual discrete-time signals
as known from signal processing, time-continuously sampled data
may also come in compressed form, such that a block of numbers
represents an interval of time.
a time-continuously
sampled data stream where the components provide information for
a specific time-instant.
a time-continuously
sampled data stream where the components provide ongoing
information as time goes by.
a temporal section of a time-continuous
data stream.
a free-text, unstructured description
of a clip.
a name-value pair that provides a
structured, database-like description of the content.
a Uniform Resource Identifier (URI).
collection of information
about a data stream, which may include annotations, hyperlinks,
and metadata.
a subpart of a media document
covering some temporal interval.
XML tags and their content used to
describe a media document.
encapsulated
time-continuous bitstream with head and clip elements.
the task of giving textual
descriptions to fragments of media documents.
the task of identifying index points
for media documents or fragments thereof.
the task of linking from one Web
resource to another. If a link has an offset into the
resource, this is sometimes called deep hyperlinking.
CMML data containing information on
an Annodexed media file.
a block of digital data that
represents a temporal subpart of a stream of continuous
media. Media packets of one continuous media file do not
overlap in time.
a sequence of time-continuous data.
Continuous Media Markup Language.
Document Type Declaration.
eXtensible Markup Language.
Continuous Media Web.
World Wide Web.
Uniform Resource Identifier.
The authors gratefully acknowledge the contributions of Rob Collins,
Zentaro Kavanagh, Andrew Nesbit and Simon Lai in developing this
specification.