Editor's note: These minutes have not been edited.

Minutes of the Audio/Video Transport Working Group

Reported by Steve Casner

1. Working group status

The primary output of the AVT working group is the Real-time Transport Protocol, which was published in January 1996 as a Proposed Standard, RFC1889, along with the companion RTP profile for audio/video conferencing, RFC1890. Progression to Draft Standard is discussed below. In addition, there are four Internet-Drafts awaiting publication which define the RTP payload formats for H.261, JPEG, MPEG and CellB video encodings. These drafts have just been passed by the IESG and will be sent to the RFC Editor for publication right after the IETF meeting. Work continues on the definition of additional proposed payload formats, one of which was presented at this meeting.

AVT met for two sessions at this IETF. The first session was dedicated to the major topic, header compression for RTP applications running over low-speed lines. A portion of the second session was given over to presentation of a signaling protocol that is more relevant to the MMUSIC area but did not fit in that group's schedule. The miscellaneous RTP topics comprising the remainder of the session are detailed in later sections of this report.

2. Compression of RTP headers

In a presentation to the AVT working group at the March 1996 IETF meeting, Scott Petrack explained the need for compression of RTP headers in order to allow low data rate applications such as Internet telephony over 28.8 kb/s modems to use RTP. He outlined some techniques that could be used between cooperating endpoints to reduce the size of the RTP header. However, at that meeting and in subsequent discussions, some have argued that compression should instead be applied at the endpoints of slow links so that the IP and UDP headers may also be compressed.

2.1. Hop-by-hop compression

At this meeting, Steve Casner presented a proposal for hop-by-hop compression of IP/UDP/RTP headers developed with Van Jacobson and derived from RFC1144 TCP/IP compression. The basic idea comes from the observation that although there are several fields that vary from packet to packet in RTP, the differences are often constant from one packet to the next. For example, audio packets are often of constant duration, so the timestamp changes by a constant amount. For this case, all that must be communicated is an indication that the second-order difference is zero along with a small sequence number to detect packet loss between the compressor and decompressor. Additional bits are used to allow indication that individual fields have changed by an unexpected amount, in which case only the differences for those fields are appended in a compact encoding, rather than requiring the full uncompressed header be transmitted. This scheme compresses the 40 bytes of IP, UDP and RTP headers (20, 8 and 12 bytes respectively) down to 2-4 bytes for most packets.

The proposal was well accepted by the group. One question was whether this scheme is too dependent upon characteristics of audio and video, but the delta coding of sequence and timestamp fields seems generally applicable. Timestamps such as those in MPEG which are not monotonic can be handled because the delta is signed.

Further details were given in a draft, draft-casner-jacobson-crtp-00.txt, which was sent to the working group mailing list (rem-conf@es.net) but was somehow lost and failed to be officially posted. This draft is to be updated to include changes decided since it was sent to the list as well as completion of the protocol details left for finalization during initial implementation. The group agreed that this proposal should be taken as an AVT work item, so the updated draft will be titled draft-ietf-avt-crtp-00.txt.

2.2. Need for an interim solution

The working group agreed that the hop-by-hop compression scheme should be completed and implemented as soon as possible. However, since publication, implementation and deployment of this scheme into the Internet infrastructure will take 12-18 months, vendors of Internet telephones and other applications have asked that AVT define an interim compression scheme that could be implemented right away in the endpoint applications alone. The tradeoff is that the compression gain is marginal (12 bytes compressed to 2 or 3) compared to compressing IP and UDP headers as well. Furthermore, the ability to measure packet loss and accurately reconstruct media timing would be reduced compared to the full RTP.

As a strawman idea, Steve Casner presented a straightforward modification of the hop-by-hop scheme for use in compressing RTP alone end-to-end, but pointed out that the performance was likely to be unacceptable due to the higher loss rate and longer round-trip delay. Instead, Van Jacobson proposed that RTP be sent over TCP to take advantage of the installed base of RFC1144 TCP/IP compression. The RTP header could be compressed to an average of 1 byte if carried over TCP. The problem is that until congestion control algorithms such as Random Early Drop (RED) are deployed in routers, UDP traffic will displace TCP traffic, so vendors may be reluctant to use this TCP solution. Deployment of RED is expected within a few months.

Scott Petrack presented some issues in defining an end-to-end protocol and making the transition from that interim solution to the complete solution. Since the end-to-end delay and loss rate are much higher than on a single link, the 4-bit sequence number of the hop-by-hop scheme would not be sufficient, but adding 8 more bits might be, assuming that the application is willing to proceed even when some packets are lost. An alternative would be to send RTP directly over IP to save the 8 bytes of UDP.
However, this does not provide any means for multiplexing RTP and RTCP unless two IP protocol types were allocated (none have been).

Scott noted that implementation of the IP/UDP/RTP compression scheme is elective for each applicable link and argued that applications would not be willing to transmit uncompressed RTP packets unless they could get a guarantee that compression was available on all slow links along the path. Carsten Bormann noted that an RSVP bandwidth guarantee provides sufficient information given traffic control that considers header compression in determining the available bandwidth. This is part of his ISSLOW proposal in the ISSLL working group. If a more relaxed guarantee of compression availability separate from bandwidth availability is required, that should be defined as an additional type of service to be provided via RSVP rather than having AVT define a new mechanism specific to this problem.

Greg Minshall suggested that applications could measure the performance of the network to decide if sufficient bandwidth was available. Applications might start off using a lower-bandwidth encoding with full RTP for interoperability, but switch to a higher quality encoding if hop-by-hop compression were available or when communicating with another copy of the same program such that a proprietary protocol could be used in place of RTP.

There was substantial discussion centering around practicality and timing of an interim solution. Carsten Bormann claimed that Internet telephony would not really be effective without the latency reduction mechanisms underway in ISSLOW and V.80 modems expected in 1997. Francois Menard noted that even with agreement on the use of RTP (with or without compression), interoperation would not be possible without agreement on voice coding and call control protocols as well, which will take time.
If the purpose of the interim solution is to get vendors to switch from proprietary protocols to RTP, then that goal will not have been achieved by defining a new, reduced version of RTP. Bob Webber felt that this would tend to cause confusion by presenting multiple solutions to vendors. Scott Petrack pointed out that suggesting a compressed form of RTP over compressed TCP could cause the same confusion, and that carrying full RTP on compressed TCP might therefore be preferable as an interim solution.

Considering that the gain of compressing RTP alone would be relatively small and that it could not be standardized in the necessary timeframe, the prevalent position was that AVT should not define an interim solution. The consensus, supported by a straw poll of the meeting participants, was to move as quickly as possible with the complete solution of IP/UDP/RTP compression and to try to give the industry confidence that this solution will be put in place and will solve the problem.

3. Proposed new RTP payload formats

In the second session, Walid Dabbous and Mark Handley presented the redundant audio encoding technique and payload format developed by UCL and INRIA. Walid graphed the results of packet loss studies showing that most packet losses are less than three packets in length, with single-packet losses predominating. Therefore, forward error correction via redundant audio appended to later packets can be effective, as demonstrated by intelligibility tests. The penalty is increased end-to-end delay since the receiver must allow time for the later packets carrying the redundant audio to arrive.

Van Jacobson observed that the results of this study might be biased by the location of the test sites. Many paths on the MBone have shown a predominant loss pattern of 500 ms outages occurring at 30, 60 or 90 second intervals coincident with the routing updates in some routers.
This would require the spacing between the original and redundant audio data to be increased beyond 3 or 4 packets.

Mark Handley described the payload format used for redundant audio as defined in draft-perkins-rtp-redundancy-00.txt. This payload format is to be indicated by a single payload type of its own in the RTP header. Then, in the payload section of the packet, a separate block header is included for each encoding (original data and redundant encodings of earlier data). The block header includes the payload type of the individual block, the length of the block, and the offset of the timestamp for that block relative to the timestamp in the main RTP header. The original data occurs last and its block header includes only the payload type. The length is implied and may be greater than would fit in the 8-bit length field of the block header. No timestamp offset is needed since the RTP timestamp is used directly.

Per the draft, the data for each redundant encoding follows immediately after its block header. Van Jacobson suggested appending the redundant encodings after the original data so that the first part of the packet would be in the same form as a packet without redundant encodings. However, that would still require parsing from the end to determine the length of the original data unless the redundant information was all included within a padding field as suggested by Henning Schulzrinne on the mailing list sometime earlier. Philip Lantz suggested that all the block headers be collected at the beginning of the payload section to simplify parsing; Mark Handley plans to make this modification.
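As a rough illustration of the block structure described above, the following Python sketch builds and parses a payload in which each redundant block header is followed by its data and the original data comes last. The minutes do not give the bit layout, so the field widths here are assumptions made only for the sketch: a 1-bit "more blocks follow" flag with a 7-bit payload type, a 16-bit unsigned timestamp offset that is subtracted from the RTP timestamp, and the 8-bit length field mentioned above. The names Block, build_payload and parse_payload are hypothetical and are not taken from draft-perkins-rtp-redundancy-00.txt.

```python
import struct
from typing import List, NamedTuple, Tuple

# ASSUMED layout, for illustration only (not taken from the draft):
#   redundant block header: 1 byte (flag bit 0x80 set | 7-bit payload type),
#                           2 bytes timestamp offset, 1 byte block length
#   original block header:  1 byte (flag bit clear | 7-bit payload type)
REDUNDANT_HEADER = "!BHB"   # network byte order, 4 bytes, no padding
MORE_BLOCKS_FLAG = 0x80     # hypothetical marker: another block follows


class Block(NamedTuple):
    payload_type: int
    timestamp: int
    data: bytes


def build_payload(redundant: List[Tuple[int, int, bytes]],
                  primary_pt: int, primary_data: bytes) -> bytes:
    """Pack (payload_type, ts_offset, data) redundant blocks, then the original data."""
    out = bytearray()
    for pt, ts_offset, data in redundant:
        if len(data) > 0xFF:
            raise ValueError("redundant block does not fit the 8-bit length field")
        out += struct.pack(REDUNDANT_HEADER, MORE_BLOCKS_FLAG | (pt & 0x7F),
                           ts_offset, len(data))
        out += data
    # The original data comes last; its header carries only the payload type,
    # and its length is implied by the end of the packet.
    out += struct.pack("!B", primary_pt & 0x7F)
    out += primary_data
    return bytes(out)


def parse_payload(rtp_timestamp: int, payload: bytes) -> List[Block]:
    """Walk the block headers front to back and return every block found."""
    blocks = []
    pos = 0
    while pos < len(payload):
        first = payload[pos]
        pt = first & 0x7F
        if not first & MORE_BLOCKS_FLAG:
            # Original data: uses the RTP timestamp directly, runs to the end.
            blocks.append(Block(pt, rtp_timestamp, payload[pos + 1:]))
            return blocks
        _, ts_offset, length = struct.unpack_from(REDUNDANT_HEADER, payload, pos)
        pos += struct.calcsize(REDUNDANT_HEADER)
        if pos + length > len(payload):
            raise ValueError("truncated redundant block")
        # Redundant data describes earlier audio, so the offset is subtracted
        # (an assumption) and wrapped to the 32-bit RTP timestamp range.
        ts = (rtp_timestamp - ts_offset) & 0xFFFFFFFF
        blocks.append(Block(pt, ts, payload[pos:pos + length]))
        pos += length
    raise ValueError("payload ended before the original data block")
```

For example, build_payload([(5, 160, b"lowrate")], 0, b"x" * 300) yields a payload whose original block exceeds 255 bytes without any length field, which is the reason given above for leaving that length implied. Collecting all block headers at the start, as Philip Lantz suggested, would change only the order in which the parser reads headers and data.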
It was also suggested that the payload format could be generalized to allow multiple data types (such as audio and video) in a single packet, but there are two problems with that suggestion: the small length field in the block header depends upon the fact that compact encodings are used for redundant audio, and using a timestamp offset would not work for timestamps that are unrelated (as is the case for most RTP audio and video encodings).

There was no presentation on the draft H.263 video payload format since there have been no significant changes. Presentation of a payload format for G.723 audio was anticipated, but was not ready yet.

Another new submission is a proposal by Neil Harris to develop a profile for using RTP in professional audio and video production. Steve Casner put up an overview slide, but there was no presentation at this meeting since the author could not attend. However, working group participants are encouraged to read the draft, which is available as draft-harris-rtp-pro-av-00.txt.

4. Progressing RTP to Draft Standard

Sufficient time has elapsed so that the RTP spec may now be submitted for elevation from Proposed to Draft Standard status. Steve Casner brought up a few outstanding minor issues that must be addressed as part of this process. A wording change will be made to allow separate destination port numbers to be used for unicast RTP sessions, along with additional "rule changes" proposed by Michael Speer and Steve McCanne to support layered encoding schemes on multiple parallel RTP sessions.

An update to the description of the SSRC loop/collision detection algorithm is needed to remove the restriction that the same source port be shared between the RTP and RTCP packets in a session. This is a simple change. However, the algorithm has not had sufficient operational experience in either its current form or with the proposed change.
Assistance from implementers is solicited in testing the loop/collision detection algorithm in particular, but also in documenting overall interoperation of multiple independent RTP implementations as is required for progression to the Draft Standard stage.

One collision detection issue posed by Karl Auerbach is the "hidden terminal" problem: two colliding sources A and B may not be able to hear each other due to multicast scope limits, but a third host C in between might be able to hear both. The use of network source addresses in the algorithm should allow C to distinguish the two sources and listen to only one. C could also intentionally cause a collision in both directions to induce A and B to change SSRCs.

Since there will be edits to the RTP spec in progressing to Draft Standard, it will be necessary to issue the spec again as an updated Internet-Draft to allow comment on the changes. That draft will then be submitted for elevation.

5. Additions to RTCP

A portion of the session was given over to an MMUSIC topic that did not fit into that group's schedule. Scott Petrack presented his proposal for a new Simple Internet Signaling Protocol (SISP) to set up and control RTP sessions. SISP is based on extensions to RTCP to take advantage of the communication path that RTCP provides and save bandwidth by utilizing the source description (SDES) information already transmitted rather than repeating it via another channel. The extensions to RTCP include a new RDES (receiver description) packet type to identify the intended callee, a new RCAP packet type to negotiate capabilities, and a new CP (call progress) item in the SDES packet to indicate "Ringing", "Busy", etc. The RDES packet would be sent to a well-known port, separate from the RTP streams, to initiate the setup.

There was substantial discussion of the overlap of this proposal with other protocols under development in MMUSIC (SIP, SCIP and SCCP).
Another suggestion was that the subset of Q.931 used in H.323 conferencing would serve the same purpose. Others expressed concern that only providing separate control for each medium misses a user requirement to be able to control some aspects of the multimedia session all at once. Greg Minshall expressed concern that the RCAP function would not be adaptable to new signaling needs that were likely to arise. In short, there was not much support for SISP in its current form.

However, the SISP proposal does point out the need for control functions during the course of a session which SIP, for example, does not address. A similar need arises for VCR-like controls when using RTP for video-on-demand. Steve Casner presented a slide on the use of RTCP for VCR controls based on a suggestion from Larry Rowe. In a later MMUSIC session and an after-hours BOF led by Jeff Smith, a new line of discussion was started to consider control functions during a session for purposes such as call control as contained in SISP as well as recording, playback, and other functions. This discussion will continue on the MMUSIC mailing list (confctrl@isi.edu).

6. Miscellaneous issues / logistics

The AVT meeting ended with only a couple of minutes available to introduce a few miscellaneous issues but not discuss them: should the working group define an API and a MIB for RTP? The MIB may be a requirement for RTP as a standards-track protocol, but there has not been a strong need for it because RTP monitoring is provided via RTCP.

Since there is continuing work to be done, and because the working group has reached the end of the existing charter, the charter must be revised. The chairman will take on this task. New work includes:

- Finishing work on new payload formats
- Possibly adding variable reliability to RTP
- Managing the standards track transitions

The group agreed to address these issues on the mailing list and to meet again at the December IETF in San Jose.