Network Working Group S. Pfeiffer Internet-Draft C. Parker Intended status: Informational Annodex Expires: May 4, 2008 November 2007 The "skeleton" meta information track for Ogg draft-pfeiffer-oggskeleton-00 Status of this Memo This document is an Internet-Draft and is subject to all provisions of Section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she become aware will be disclosed, in accordance with RFC 3668. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on May 4, 2008. Pfeiffer & Parker Expires May 4, 2008 [Page 1] Internet-Draft SKELETON November 2007 Abstract This specification defines "Skeleton", a logical bitstream for the Ogg encapsulation format version 0 [Ogg]. Skeleton is a header-style bitstream that describes the content of the other logical bitstreams encapsulated inside an Ogg container. Its purpose is to remove codec-specific information requirements from the multiplexing/ demultiplexing process. It provides default structure and semantic information to describe multitrack physical Ogg bitstreams. There is also a mechanism through which more information than the default can be provided. 
Please note that this document assumes that the reader understands the Ogg encapsulation format version 0 [Ogg]. The specification of Skeleton is not encumbered by patents.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [rfc2119].

Table of Contents

1. Features of Ogg and Skeleton
2. The Ogg skeleton logical bitstream
   2.1. The format of the skeleton ident header
   2.2. The format of the skeleton secondary headers
   2.3. Media mapping of skeleton into Ogg
3. Handling time in an Ogg format bitstream
   3.1. Conceptual overview
   3.2. Mapping a granule position to a time position
   3.3. Seeking into the bitstream
   3.4. Remultiplexing an Ogg bitstream using Skeleton
4. Security considerations
5. References
Appendix A. Acknowledgments
Authors' Addresses
Intellectual Property and Copyright Statements

1. Features of Ogg and Skeleton

Ogg is a container format for encapsulation of several tracks of temporally interleaved bitstreams of time-continuous data. It enables encapsulation of any type of time-continuous data stream as long as it is streamable. Each track represents codec data for only one type of time-continuous data stream.
Ogg is designed to be used both as a persistent file format and as a streaming format to exchange temporally addressable bitstreams.

Skeleton adds to Ogg a means to describe the codec tracks contained inside Ogg. It assumes that each logical bitstream has a regular data sampling rate (called granulerate). For variable sampling rate bitstreams, it assumes that a common multiple of the sampling rates in use serves as the granulerate.

Codec tracks generally contain the following information:

o setup information for a codec

o content data

The setup information is inserted at the start of a data bitstream before any content data. Skeleton extracts the key information about the codecs from their headers and places it in a defined location in a defined manner, such that no decoding of logical bitstreams is required to find out about the tracks of content encapsulated inside Ogg.

An Ogg physical bitstream with a Skeleton track has the following mandatory order of Ogg pages:

1. skeleton bos page.

2. bos pages of the other logical bitstreams.

3. secondary header pages of all logical bitstreams, including fisbone.

4. skeleton eos page.

5. data and eos pages of logical bitstreams, excluding skeleton, multiplexed in a time-synchronous fashion.

2. The Ogg skeleton logical bitstream

The purpose of Ogg skeleton is to provide codec-specific knowledge that allows parsing, demultiplexing and remultiplexing of Ogg bitstreams without having to decode them. While the Ogg encapsulation format by itself is capable of interleaving an unlimited number of time-continuous bitstreams, it is not possible to identify the type of bitstreams (e.g. audio or video) and their encoding format (e.g. Vorbis or Speex or Theora) without decoding at least the bos page of the logical bitstreams.
Also, further general media type information such as the image dimensions of a frame in a video bitstream or the language of a speech bitstream may be provided in skeleton. Another limitation of Ogg is that each logical bitstream defines its own mapping of granule_position to time, which is therefore also given in the skeleton.

This section specifies the content of the "skeleton" logical bitstream and how it is mapped into Ogg. Knowledge of the Ogg bitstream format as specified in the Ogg RFC [Ogg] is presumed. Please also refer to that document for descriptions of the terms used in this document.

The skeleton bitstream has the ability to generically describe Ogg bitstreams that consist of one or more time-continuous data bitstreams and one or more time-instantaneous data bitstreams concurrently interleaved (in Ogg terms: multiplexed). It does not describe sequentially multiplexed Ogg bitstreams, but rather expects that each sequentially multiplexed bitstream has its own skeleton logical bitstream.

The skeleton logical bitstream provides the following functionality on top of Ogg:

o allows for the identification of the codec format and the content type of encapsulated logical bitstreams without the need to decode that bitstream's headers or data.

o allows for extraction of a temporal interval of the Ogg physical bitstream while retaining the original start time offset of that interval.

o allows for attachment of a real-world wall-clock time and a date to the Ogg physical bitstream, thus e.g. retaining creation date/time or first broadcast date/time.

o allows for temporal offset operations into an Ogg physical bitstream without a need to decode any data.

o allows generally for handling of content without a need to decode it, such as is necessary in a caching Web proxy.
o allows for attachment of message header fields given as name-value pairs that contain some sort of protocol messages about the logical bitstream, e.g. the screen size for a video bitstream or the number of channels for an audio bitstream.

2.1. The format of the skeleton ident header

The skeleton logical bitstream starts with an ident header containing information for the complete Ogg physical bitstream. The ident header has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'fishead\0'                                        | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version major                 | Version minor                 | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Presentationtime numerator                                    | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Presentationtime denominator                                  | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Basetime numerator                                            | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Basetime denominator                                          | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| UTC                                                           | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 52-55
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 56-59
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 60-63
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fields with more than one Byte length are encoded LSB (least significant Byte) first.

The fields in the skeleton ident header have the following meaning:

1. Identifier: an 8 Byte field that identifies this bitstream as a skeleton. It contains the magic numbers:

      0x66 'f'
      0x69 'i'
      0x73 's'
      0x68 'h'
      0x65 'e'
      0x61 'a'
      0x64 'd'
      0x00 '\0'

2. Version major: 2 Byte unsigned integer signifying the major version number of the skeleton bitstream. This document specifies the major version 3.

3. Version minor: 2 Byte unsigned integer signifying the minor version number of the skeleton bitstream. This document specifies the minor version 0.

4. Presentationtime numerator & denominator: 8 Byte signed integer each. Together they represent the time at which to start presenting the Ogg physical bitstream, given as a rational number. The denominator represents the temporal resolution at which the presentationtime is given. E.g. a numerator of 5 over a denominator of 1000 results in a presentationtime of 0.005 sec. This enables a very high temporal resolution without having to store floating point numbers. In a newly created physical bitstream presentationtime and basetime are the same. When remultiplexing a subpart of the stream, this number MUST be adapted to the requested start time offset of the newly created stream. Presentationtime must always be greater than or equal to zero.

5. Basetime numerator & denominator: 8 Byte signed integer each. Together they represent the basetime of the Ogg physical bitstream, given as a rational number like the presentationtime. This number is fixed once the physical bitstream is created and provides a mapping to time for the beginning of the physical bitstream when it starts with a granule position of 0.

6. UTC [ISO8601]: a 20 Byte string containing a UTC time in the form of YYYYMMDDTHHMMSS.sssZ.
It associates a calendar date and a wall-clock time with the basetime. It is a sequence of 20 NUL Bytes if not in use, making this ident packet and thus the bos page of the skeleton bitstream constant length.

Please note: The possible temporal resolution of the presentationtime and basetime is on the order of 2^-63, since each denominator is an 8 Byte signed integer. For example, the time formats in use for media that are described in this document range from 1/24 to 1/60 for the different SMPTE formats [SMPTE]. This resolution is enough for any one of these. It is also expected to accommodate any future needs of time resolution for any other time format and time-continuously sampled data.

Please note further: A denominator of 0 in either presentationtime or basetime is regarded as a special value and sets the respective time to 0, no matter what the value of the numerator.

2.2. The format of the skeleton secondary headers

The skeleton secondary headers are a sequence of packets that each contain information about one of the other time-continuous or time-instantaneous logical bitstreams contained within the Ogg physical bitstream.
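As an illustration of the ident header layout given in section 2.1, the following is a minimal Python sketch of a parser for a fishead packet. The names `FISHEAD_FORMAT`, `Fishead`, `rational_time` and `parse_fishead` are hypothetical and not part of this specification. The sketch assumes the packet has already been extracted from the skeleton bos page, and it treats a denominator of 0 as a time of 0, as noted above.

```python
import struct
from fractions import Fraction
from typing import NamedTuple

# Layout taken from the ident header diagram: 8 Byte magic, two 2 Byte
# unsigned versions, four 8 Byte signed integers, and a 20 Byte UTC string.
# "<" selects little-endian (LSB first) encoding without padding.
FISHEAD_FORMAT = "<8sHHqqqq20s"
FISHEAD_SIZE = struct.calcsize(FISHEAD_FORMAT)  # 64 Bytes


class Fishead(NamedTuple):
    version_major: int
    version_minor: int
    presentationtime: Fraction  # seconds
    basetime: Fraction          # seconds
    utc: str                    # empty when the field is all NUL Bytes


def rational_time(numerator: int, denominator: int) -> Fraction:
    """A denominator of 0 is the special value that sets the time to 0."""
    if denominator == 0:
        return Fraction(0)
    return Fraction(numerator, denominator)


def parse_fishead(packet: bytes) -> Fishead:
    """Parse a skeleton ident header packet (hypothetical helper)."""
    if len(packet) < FISHEAD_SIZE:
        raise ValueError("packet too short for a skeleton ident header")
    (magic, major, minor, pres_num, pres_den,
     base_num, base_den, utc) = struct.unpack_from(FISHEAD_FORMAT, packet)
    if magic != b"fishead\x00":
        raise ValueError("not a skeleton ident header")
    return Fishead(
        version_major=major,
        version_minor=minor,
        presentationtime=rational_time(pres_num, pres_den),
        basetime=rational_time(base_num, base_den),
        utc=utc.rstrip(b"\x00").decode("ascii"),
    )
```

Exact rational arithmetic (`fractions.Fraction`) is used here so that the numerator/denominator pairs are not rounded to floating point; this is a design choice of the sketch, not a requirement of the format.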
A skeleton secondary header packet has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'fisbone\0'                                        | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Offset to message header fields                               | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Serial number                                                 | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of header packets                                      | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate numerator                                         | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate denominator                                       | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Startgranule                                                  | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Preroll                                                       | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granuleshift  | Padding/future use                            | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message header fields ...                                     | 52-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fields with more than one Byte length are encoded LSB (least significant Byte) first.

The fields in a skeleton secondary header packet have the following meaning:

1. Identifier: an 8 Byte field that identifies this packet as a skeleton secondary header for identifying other logical bitstreams.
It contains the magic numbers:

      0x66 'f'
      0x69 'i'
      0x73 's'
      0x62 'b'
      0x6f 'o'
      0x6e 'n'
      0x65 'e'
      0x00 '\0'

2. Offset to message header fields: 4 Byte unsigned integer that contains the number of Bytes used in this packet before the message header fields. For the version of the skeleton bitstream described in this document this number is fixed to 44. This field accommodates future changes to the skeleton bitstream, allowing message header fields to be parsed even if more fields get inserted before them.

3. Serial number: 4 Byte unsigned integer containing the bitstream_serial_number of the Ogg logical bitstream described by this skeleton secondary header packet, thus connecting the packet to that logical bitstream.

4. Number of header packets: 4 Byte unsigned integer that contains the number of header packets of that particular logical bitstream, consisting of the bos page and the secondary header pages.

5. Granulerate numerator & denominator: 8 Byte signed integer each. They represent the temporal resolution of the logical bitstream in Hz, given as a rational number in the same way as the basetime attribute above.

6. Startgranule: 8 Byte signed integer that represents the granule number with which this logical bitstream starts, which is originally 0, but will be a positive offset when only a subpart of the stream is requested.

7. Preroll: 4 Byte unsigned integer that contains the number of packets to pre-roll in order to decode a current packet correctly. This is for example the case with Ogg Vorbis, which requires a pre-roll of 2 packets.

8. Granuleshift: 1 Byte unsigned integer describing whether to partition the granule_position into two for that logical bitstream, and how many of the lower bits to use for the partitioning.
The upper bits signify a time-continuous granule position for an independently decodable and presentable data granule. The lower bits are generally used to specify the relative offset of dependent packets, such as predicted frames of a video. Hence these can be addressed, though not decoded without tracing back to the last fully decodable data granule. This is the case with Ogg Theora; the general procedure is given in section 3.2.

9. Padding/future use: 3 Bytes of padding data that may be used for future requirements and are mandated to be zero in this revision.

10. Message header fields: header fields, following the generic Internet Message Format defined in RFC 2822 [Headers]. Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive. The field value MAY be preceded by any amount of LWS, though a single SP is preferred. Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.

There is one mandatory Message header field for all of the logical bitstreams: the "Content-type" header field. For an application that is parsing the Ogg bitstream, this field contains the MIME type and the character encoding of the data in the logical bitstream. E.g. for a bitstream containing Ogg Vorbis data the field is "Content-type: audio/x-vorbis". The Content-type message header field MUST come first among the Message header fields such that it can be found at a fixed location in the skeleton fisbone packet.

As per RFC 2277 [I18N], message header fields are considered protocol data, i.e. they are not expected to contain human readable text, and they MUST be entirely encoded in UTF-8. In addition, the mandatory header fields MUST be encoded in US-ASCII and it is recommended to also use US-ASCII code points as much as possible for the optional header fields. User defined optional message header fields MUST follow the naming standard given in RFC 2822 [Headers].

2.3. Media mapping of skeleton into Ogg

The media mapping for skeleton into Ogg is as follows:

o The skeleton ident (fishead) header is mapped into the skeleton bos page.

o The secondary header pages of a skeleton logical bitstream consist of the fisbone header packets that each describe one particular logical data bitstream within the Ogg physical bitstream.

o There are no content pages or data packets. As the skeleton eos page is included before the first data page of any logical bitstream, there actually cannot be any content data packets.

o The skeleton eos page MUST contain one packet of length zero.

When using a skeleton logical bitstream in Ogg, a further restriction on the order in which Ogg pages appear is introduced to allow for easier identification:

1. The skeleton bos page is the very first bos page. This allows its differentiation from other Ogg bitstreams that don't contain a skeleton logical bitstream.

2. The bos pages of the other logical bitstreams come next, as is a requirement of the Ogg bitstream format.

3. The secondary header pages of all the logical bitstreams in the Ogg physical bitstream come next, as is also a requirement of Ogg. The skeleton secondary header pages are also included here.

4. Before any data pages of any of the logical bitstreams appear in the Ogg physical bitstream, the skeleton eos page MUST end the skeleton logical bitstream. This is necessary to end the control section of the bitstream. If an Ogg stream parser reaches the skeleton eos page, it knows that it has received all the bos and secondary header pages and can start setting up its decoding or parsing environment.

3. Handling time in an Ogg format bitstream

With time-continuous data inside Ogg, one needs to handle data at four different levels:

o at the Bytes level, upon seeking.
o at the packets level, upon encapsulating.

o at the granules level, upon recomposing.

o at the time level, upon displaying and addressing.

This section explains how they all fit together.

3.1. Conceptual overview

Ogg bitstreams inherently represent one timeline only, where the different logical bitstreams can be thought of as content tracks on that timeline. All of these tracks relate to the same timeline which starts at a certain time point and ends when the last bitstream ends.

An example bitstream can be seen in the following figure. It consists of an Ogg bitstream that contains 4 media bitstreams. The picture is a conceptual representation of the time intervals covered by the different logical bitstreams and the Ogg pages used to encapsulate the data. In the flat representation these are multiplexed such that the data packets of each of these bitstreams occur at the correct time.

                             t_uri
                               |
t_0                            v                                   t_n
|------------------------------------------------------------------->|

----------------------------------------------
|  |  |  |  |  |  |  |  |  |  |//|  |  |  |  |
----------------------------------------------
audio bitstream 1

         -------------------------------------------------------------
         |     |     |     |/////|     |     |     |     |     |     |
         -------------------------------------------------------------
         video bitstream 1

                  ----------------------------------------------------
                  |  |  |  |  |//|  |  |  |  |  |  |  |  |  |  |  |  |
                  ----------------------------------------------------
                  audio bitstream 2

                      -------------------------------
                      |     |/////|     |     |     |
                      -------------------------------
                      video bitstream 2

The time point at which an Ogg bitstream starts (t_0 in the above diagram) is called the "basetime" and represents the time in seconds associated with the granule position of 0 on all logical bitstreams.
Typically, a newly created Ogg file starts all its logical bitstreams at granule position 0, whereas a typical extract of an Ogg bitstream, such as the one starting at t_uri in the figure above, starts each of its logical bitstreams at a different granule position. These granule positions are stored in the "startgranule" field of the skeleton secondary header packets.

The "basetime" of an Ogg bitstream may be 0, but it can also be any positive time. For example, in professional video production, the first frame of video of a program normally refers to a SMPTE basetime [SMPTE] of 01:00:00:00, not 00:00:00:00 (see also the temporal URI addressing [timedURI] specification). Associating such a practice with a digital video resource requires a way to store that basetime with the resource and to interpret it correctly when addressing offsets such as t_uri. Skeleton provides such a mapping through the basetime field in the skeleton ident header.

Also associated with the basetime are a calendar date [ISO8601] and a wall-clock time (a "UTC base"), which represent a real-world time giving some meaningful calendar date association to the content, such as the creation time or the first presentation time. The UTC base is specified in the UTC field of the skeleton ident header.

3.2. Mapping a granule position to a time position

Each one of the encapsulated data bitstreams has its own temporal resolution at which it provides data to cover the given timeline. This temporal resolution is usually given through the sampling rate of the particular bitstream. For example, a raw audio bitstream at CD quality is sampled with a sampling rate of 44100 Hz. A video bitstream may be sampled with a frame rate of 25 frames per second. This temporal resolution is called the "granulerate".
A granule is a data element that is based on a regular data rate specific to the content type, such as the frame rate for video or the sampling rate for audio. It even exists for bitstreams that are not sampled at a regular rate; in that case it is the highest resolution of any of the sampling rates in use. The granulerate is specified in the skeleton secondary header packets for each logical bitstream.

Each one of the bitstreams inserts data into the Ogg bitstream through packets which have an associated temporal duration based on the encoder packaging. Packets are packaged into Ogg pages, which have a granule position associated with them. Not taking the special case of a granuleshift into account, the granule position specifies the number of granules that have been encapsulated since the implicit start of the original bitstream up to and including the given Ogg page.

The granule position, together with the granulerate and granuleshift information of the skeleton secondary header packets for the particular logical bitstream, is used to calculate the time position for which a data packet of the logical bitstream completes data. A granule position of -1 indicates a special case and MUST NOT be used for calculation of a mapping to time.

In principle, the granule position of an Ogg page divided by the granulerate of this page's logical bitstream provides the time position that is reached in that bitstream after decoding all data packets finished on this page. However, the granule_position field in an Ogg page allows for a more finely-grained description of the temporal position.
The following figure explains the composition of the granule_position field in an Ogg page:

                granule_position
------------------------------------------------
|        keyindex        |      keyoffset      |
------------------------------------------------

The granuleshift field of the skeleton secondary header packets describes how many of the granule_position's 64 bits are being used for the keyoffset.

The keyoffset part of the granule_position is commonly used when the logical bitstream consists of packets that can only be fully decoded when referring back to a previous packet. For example, video streams often consist of inter and intra coded frames, where the intra frames are fully decodable and the inter frames are intermediate frames that require backtracking to the last intra frame for accurate decoding. Another example is a logical bitstream that is mapped as instantaneous information (i.e. its granule_position represents both the start time and the end time of the packet data), but actually has a duration associated with it, which is provided through a subsequent packet. CMML is such an example. The keyindex part of the granule_position is then used to provide the temporal position of the reference packet and the keyoffset part provides a counter for the data in between.

The calculation of the temporal position of an Ogg page using Skeleton is thus specified through the following formula:

   t_page = basetime + ((keyindex + keyoffset) / granulerate)

The basetime provides the time offset used at the beginning of the logical bitstream for the first data packet and thus MUST be added for a correct calculation of the temporal position.

As an example, consider an audio bitstream that has a granulerate of 44100 (i.e. 44100 samples per 1 sec), a granuleshift of 0, and starts at 4 sec.
When reaching a granule_position of 88200, this maps to a time position of 6 seconds:

   t_page = 4 + ((88200 + 0) / 44100) = 6

This signifies that, after decoding this page's packets, the audio bitstream has reached the end of its second second of content, which maps to 6 seconds because of the basetime.

As another example, consider a video bitstream that has a granulerate of 25 (i.e. 25 frames per 1 second), a granuleshift of 3 (because it encodes - say - 7 partial frames between each fully encoded frame), and starts at 0 sec. When reaching a granule_position of 501, i.e. a keyindex of 62 and a keyoffset of 5, this maps to a fully decodable time position of 2.68 seconds:

   t_page = 0 + ((62 + 5) / 25) = 2.68 sec

The granulerate of a time-instantaneous bitstream such as a CMML bitstream can be chosen arbitrarily by the bitstream multiplexer. By default, a granulerate of 1000 is used, which is the resolution of npt. The resolution of all the time schemes is given as:

o npt: 1000 (milliseconds)

o smpte-24: 24 (24 fps)

o smpte-24-drop: 24/1.001 = 23.976 (approx. as per SMPTE)

o smpte-25: 25

o smpte-30: 30

o smpte-30-drop: 30/1.001 = 29.970 (approx. as per SMPTE)

o smpte-50: 50

o smpte-60: 60

o smpte-60-drop: 60/1.001 = 59.940 (approx. as per SMPTE)

The granule position of the page finishing data of a time-instantaneous bitstream packet MUST signify the start time of that packet. For example, a CMML bitstream with a granulerate of 1000, a basetime of 0, and a clip that lasts from npt=12.020 till npt=15.0 will get a granule_position of 12020. In contrast, the granule_position of the page finishing data of e.g. an audio bitstream with granulerate 44100, basetime 0 and containing data from npt=12.020 to npt=15.0 will be 661500.

A note about field overflows: an overflow of the granule position field can destroy the temporal integrity of the Ogg physical bitstream.
In this case, a multiplexer MUST end the Ogg physical bitstream and start a new one, resetting the counter to 0 and adjusting the basetime appropriately. This is also called sequential multiplexing in Ogg. The same measure MUST be taken in case of an overflow of the page_sequence_number on one of the logical bitstreams.

3.3. Seeking into the bitstream

Seeking to a time offset inside an Ogg logical bitstream is a fundamental activity frequently performed on media data. Time inside an Ogg with a Skeleton track is specified as a temporal offset from the "beginning" of the stream, making use of the basetime field. Time offsets can also be specified as calendar dates and times. The UTC base is then used as a basis for offsetting.

The basetime allows a temporal offset point such as a temporal URI to be mapped correctly to a Byte position in the stream. In the above figure take t_uri=npt:14.0 as the temporal offset addressed on a stream with t_0=npt:5.0 as the basetime - this requires a stream offsetting of only 9 sec to the appropriate granule position in each of the bitstreams, in the figure marked through patterned pages.

The seeking action is performed on the interleaved bitstream, in which the data packets occur in a temporally consecutive order based on the time at which their data ends. These times are represented in the granule positions of the Ogg pages, which are only allowed to monotonically increase within one logical bitstream. This implies that when having found an Ogg page with a granule position that maps to a given seek time (i.e. covers the time or ends at it), the seek has found the right location. This applies over all logical bitstreams. In the above example, this means that the Byte position of the first occurring page of the patterned pages has been found.
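Deciding whether a page "maps to a given seek time" relies on the granule position to time mapping of section 3.2. The following Python sketch illustrates that mapping and the offset calculation for a seek. The function names are hypothetical and not part of this specification; `target_granule` additionally assumes a granuleshift of 0, and exact rationals are used only to avoid rounding in the illustration.

```python
from fractions import Fraction


def split_granule_position(granule_position, granuleshift):
    """Split a granule_position into (keyindex, keyoffset).

    The lower `granuleshift` bits hold the keyoffset, the upper bits
    hold the keyindex.
    """
    keyindex = granule_position >> granuleshift
    keyoffset = granule_position & ((1 << granuleshift) - 1)
    return keyindex, keyoffset


def page_time(granule_position, granulerate, granuleshift=0, basetime=0):
    """t_page = basetime + ((keyindex + keyoffset) / granulerate), in sec."""
    if granule_position == -1:
        # -1 is the special case that MUST NOT be mapped to a time.
        raise ValueError("granule position -1 carries no time mapping")
    keyindex, keyoffset = split_granule_position(granule_position, granuleshift)
    return basetime + Fraction(keyindex + keyoffset) / granulerate


def target_granule(seek_time, granulerate, basetime=0):
    """Granule count a seek to `seek_time` aims for (granuleshift 0 assumed)."""
    return int((Fraction(seek_time) - basetime) * granulerate)


# Audio bitstream with granulerate 44100 that starts at 4 sec.
print(page_time(88200, 44100, basetime=4))    # 6
# Seek to npt:14.0 on a stream whose basetime is npt:5.0 (9 sec offset).
print(target_granule(14, 44100, basetime=5))  # 396900
```

A seek implementation would compare `page_time` of candidate pages against the requested time, for example during a bisection search over Byte positions; that search itself is outside the scope of this sketch.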
There is a complication to the seeking: some logical bitstreams have backwards dependencies in their data packets and these have to be taken into account for seeking. For example, a logical bitstream may require several of its previous packets to allow a correct and complete decoding of the actual packet that occurs at the seektime. This is the case for Theora which requires to go back to the previous keyframe when decoding from a time offset. It is also the case for Vorbis which requires the previous 2 packets for accurate setup of the frequency transform - Speex needs approximately 2 packets for similar reasons. Even instantaneous bitstreams such as CMML may require to go back to a previous packet to recover the last state information - the currently active clip in the case of CMML. Therefore, once seeking has located the correct Byte position that refers to the given temporal offset, it MUST seek back. For logical bitstreams that have a non-zero "granuleshift" in the skeleton, it MUST seek back to the Ogg page that has a "keyindex" granule position. For logical bitstreams that have a non-zero "preroll" in the skeleton, it MUST seek back that many packets. The earliest Byte position that satisfies all these requirements is the correct seek position. A player that presents from an offset MUST take into account that the bitstream may contain some packets that are only there to allow accurate decoding of the seek time. When the backwards dependencies were resolved for a specific logical bitstream, several non-relevant Ogg pages of may also have ended up in the intermediate. These have to be skipped by a player. The time that a player MUST start presenting from is given in the "presentationtime" in the skeleton ident header. Pfeiffer & Parker Expires May 4, 2008 [Page 18] Internet-Draft SKELETON November 2007 3.4. 
Remultiplexing an Ogg bitstream using Skeleton

Ogg with a Skeleton track allows for the creation of mashups of a file without actual decoding and re-encoding. A mashup in the sense used here is when a subpart of an Ogg physical bitstream is required, such as a temporal sub-interval from the whole file. Skeleton allows the creation of the mashup bitstream through recomposition and remultiplexing.

There are several aims for performing the remultiplexing with as little effort and therefore as little delay as possible:

   o  no decoding of the logical bitstreams is performed.

   o  no changes to the pages, in particular to the granule positions, are made.

   o  changes occur only to the control section.

The fields of the skeleton track allow achievement of all these aims.

Remultiplexing is essentially achieved by seeking to the position as described above and then including from each logical bitstream only the relevant Ogg pages into the new stream. Changes to fields in the bitstream are restricted to the control section:

   o  the "presentationtime" MUST be adjusted to the requested start time.

   o  the "startgranule" for each logical bitstream MUST be adjusted to the granule position at which each logical bitstream starts. This is not the first granule position of the Ogg pages included into the bitstream, but rather the last one that did not get included, as it represents the start time of the bitstream.

Everything else, and in particular the Ogg pages, stays the same. This is also important to allow caching of such files, as is required for Web proxies and described in temporal URI addressing [timedURI].

4. Security considerations

Ogg format bitstreams contain several multiplexed binary and non-binary data bitstreams. There is no generic encryption or signing mechanism provided for the complete bitstream or any one of its parts.
As the format of the encapsulated media bitstreams is not prescribed and is identified through the "Content-type" Message header field in that bitstream's skeleton secondary header packet, it is possible to encrypt or sign that media bitstream and then mark it accordingly with a MIME type that signifies the encryption. It is up to the applications that use this bitstream to provide an appropriate codec to handle such bitstreams.

As Ogg format bitstreams generally contain binary media bitstreams, it is possible to include executable content in them. This can be an issue with applications that decode these bitstreams, especially when they are used in a network scenario. Such applications MUST ensure correct handling of manipulated bitstreams, including buffer overflows and the like.

5. References

[CMML] Pfeiffer, S., Parker, C., and A. Pang, "The Continuous Media Markup Language (CMML), Version 2.0 (work in progress)", I-D draft-pfeiffer-cmml-02.txt, March 2005.

[Headers] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

[I18N] Alvestrand, H., "IETF Policy on Character Sets and Languages", RFC 2277, January 1998.

[ISO8601] ISO, TC154., "Data elements and interchange formats -- Information interchange -- Representation of dates and times", ISO 8601, 2000.

[Ogg] Pfeiffer, S., "The Ogg encapsulation format version 0", RFC 3533, May 2003.

[SMPTE] The Society of Motion Picture and Television Engineers, "SMPTE STANDARD for Television, Audio and Film - Time and Control Code", ANSI 12M-1999, September 1999.

[rfc2119] Bradner, S., "Key words for use in RFCs to Indicate Requirements Levels", RFC 2119, BCP 14, March 1997.

[timedURI] Pfeiffer, S., Parker, C., and A. Pang, "Specifying time intervals in URI queries and fragments of time-based Web resources (work in progress)", I-D draft-pfeiffer-temporal-fragments-03.txt, March 2005.
Appendix A. Acknowledgments

The authors gratefully acknowledge the contributions of Christopher Montgomery and Andre Pang in developing this specification.

Authors' Addresses

Silvia Pfeiffer
Annodex Association, Australia
Phone: +61 2 8012 0937
Email: silvia@annodex.net
URI: http://www.annodex.org/

Conrad D. Parker
Annodex Association, Australia
Email: conrad@annodex.net
URI: http://www.annodex.org/

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.