File: png-anim-proposal-20070412

It is proposed to register an animated PNG chunk: anIM.

In this proposal, the new chunk name is shown in its private form (second letter lowercase). Until the proposal is approved, the private form (or the previously proposed name, mpNG) must be used in any test implementations.

It is proposed to document the anIM chunk in the Extensions to the PNG 1.2 Specification, Version 1.3.0, as follows:

A. Change document version to 1.4.0.

B. Add to the chunk ordering table:

      anIM    No    Before IDAT

C. Add paragraph 3.7

3.7. anIM Animation Chunk

3.7.1 Overview

The anIM chunk contains a compressed "play list" which provides instructions to the decoder for constructing animation frames from pieces of the main PNG image. A viewer that does not recognize the anIM chunk will simply display the entire PNG image as a fallback, which will have the appearance of a number of images combined.

3.7.2 Terminology

The "canvas" is the area on the output device on which the images are to be written. The contents of the canvas are not considered to be available to the decoder.

The "frame" is a pixel array with dimensions specified by the "frame_width" and "frame_height" parameters of the anIM chunk. Conceptually, each animation frame is constructed in the frame and then written to the canvas. The contents of the frame are available to the decoder. The corners of the frame are mapped to the corners of the canvas.

The "PNG image" is the image contained in the IDAT chunks of the PNG datastream. It is the source of the tiles used to construct animation frames.

3.7.3 MIME type and file extension

Since a PNG with the anIM chunk meets the PNG file syntax specification, authors can present such files as "image/png" and use the ".png" extension. Authors who wish to distinguish clearly between static images and animations can use MIME type "video/x-png" and file extension ".apng" for animated PNGs.

3.7.4 anIM Chunk structure

The anIM chunk begins with 13 bytes giving the width and height of the frames, the number of times the animation is played, the frame delay denominator, and the compression method, followed by one or more 26-byte tile structures:

   byte
    0   frame_width         (unsigned int)    Width of frames
    4   frame_height        (unsigned int)    Height of frames
    8   num_plays           (unsigned short)  Number of times to play this animation. 0: infinite
   10   ticks_per_second    (unsigned short)  Frame delay denominator
   12   compression_method  (byte)            0: deflate
   13   tile_structure_array[n] (n > 0; n * 26 bytes): tile (compressed) structures

Each frame is a montage of one or more tiles which are combined to form a frame of size (frame_width, frame_height). This frame may then be displayed by an anIM-capable viewer.

Each frame is initialized to a transparent (RGBA(0,0,0,0)) rectangle of dimensions frame_width by frame_height.

The num_plays value tells how many times the animation is to be played, including the first time around. If num_plays is zero, the animation iterates infinitely. Each iteration produces an identical set of frames unless the PNG image has not yet been fully decoded, so viewers can safely cache frames once they have been fully decoded. A viewer should delay display of the frames in the final iteration until they are completely decoded and combined.

The value of ticks_per_second, which is used to calculate frame delays, must not be zero.

The tile structure array is always compressed according to the specified compression method.

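The fixed 13-byte part of the chunk can be read with a short routine such as the following Python sketch. It is illustrative only and rests on assumptions that this proposal does not state: multi-byte integers are taken to be big-endian, as elsewhere in PNG, and compression method 0 is taken to be a zlib datastream, as for IDAT. The names AnimHeader and parse_anim_chunk are hypothetical. Returning None stands for "ignore the whole chunk".

```python
import struct
import zlib
from typing import NamedTuple, Optional, Tuple


class AnimHeader(NamedTuple):
    frame_width: int
    frame_height: int
    num_plays: int           # 0 means the animation iterates infinitely
    ticks_per_second: int    # frame delay denominator, must not be zero
    compression_method: int  # 0: deflate


# Assumption: big-endian (network byte order), as in other PNG chunks.
HEADER = struct.Struct(">IIHHB")  # 4 + 4 + 2 + 2 + 1 = 13 bytes
TILE_SIZE = 26


def parse_anim_chunk(data: bytes) -> Optional[Tuple[AnimHeader, bytes]]:
    """Return (header, uncompressed tile bytes), or None if the chunk is invalid."""
    if len(data) <= HEADER.size:
        return None  # no room for a tile array
    header = AnimHeader(*HEADER.unpack_from(data, 0))
    if header.ticks_per_second == 0 or header.compression_method != 0:
        return None
    try:
        # Assumption: method 0 is a zlib stream, as for IDAT.
        tiles = zlib.decompress(data[HEADER.size:])
    except zlib.error:
        return None
    # The array must hold at least one whole 26-byte tile structure.
    if len(tiles) == 0 or len(tiles) % TILE_SIZE != 0:
        return None
    return header, tiles
```

A caller would pass the chunk data field (without length, type, or CRC) and fall back to showing the plain PNG image when the result is None.
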
A tile is a structure of 26 bytes:

   byte
    0   x            (unsigned int)    Left of tile pixels
    4   y            (unsigned int)    Top of tile pixels
    8   width        (unsigned int)    Width of tile pixels
   12   height       (unsigned int)    Height of tile pixels
   16   x_offset     (signed int)      X offset of tile destination
   20   y_offset     (signed int)      Y offset of tile destination
   24   ticks_delay  (unsigned short)  Frame delay numerator

The tile data identifies a rectangle (x,y,width,height) in the PNG image containing the pixels of the tile. This rectangle is composited on the frame at top left position (x_offset, y_offset). Any part of the tile outside the PNG image must be treated as transparent. Any part of the tile which extends outside the frame must be ignored. The tile width or height may be zero, in which case the tile contains no pixels.

3.7.5 Delays

The first tile with a nonzero ticks_delay marks the end of the frame. The viewer should display the frame after handling this tile and leave the frame displayed for a frame delay equal to (ticks_delay/ticks_per_second) seconds.

If ticks_delay is 65535 (the largest possible value, 0xffff), the frame delay is infinite.

A viewer may make the frame delay interruptible or controllable by the user. When a delay is interrupted, an interactive viewer can give the user a choice of advancing to the next frame, quitting, or other options.

If composition of the current frame takes longer than the frame delay from the previous frame, the viewer should display the current frame immediately. This frame should itself be displayed for the duration given by its own frame delay. A viewer should not try to make up for delays in preceding frames.

If the last tile described in the array has a zero delay, then that tile marks the end of the final frame, and that frame is not shown while the display is iterating. This frame must be displayed after the final iteration, when the animation comes to rest. Decoders should use the final frame for any purpose where a static image is desired instead of an animation.

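The delay rules above amount to splitting the tile array into frames at each tile whose ticks_delay is nonzero. The following Python sketch shows one way to do that. It assumes big-endian fields (not stated in this proposal), and the names Tile, parse_tiles and group_frames are hypothetical. An infinite delay is represented as math.inf, and a trailing run of tiles whose last tile has a zero delay is returned separately as the frame to show when the animation comes to rest.

```python
import math
import struct
from typing import List, NamedTuple, Optional, Tuple


class Tile(NamedTuple):
    x: int
    y: int
    width: int
    height: int
    x_offset: int     # signed
    y_offset: int     # signed
    ticks_delay: int  # frame delay numerator


# Assumption: big-endian fields. 4 * 4 + 2 * 4 + 2 = 26 bytes.
TILE = struct.Struct(">IIIIiiH")
INFINITE_DELAY = 0xFFFF

Frame = Tuple[List[Tile], float]  # (tiles of the frame, delay in seconds)


def parse_tiles(raw: bytes) -> List[Tile]:
    """Decode an uncompressed tile array; len(raw) must be a multiple of 26."""
    return [Tile(*fields) for fields in TILE.iter_unpack(raw)]


def group_frames(tiles: List[Tile],
                 ticks_per_second: int) -> Tuple[List[Frame], Optional[List[Tile]]]:
    """Split tiles into timed frames plus an optional untimed rest frame."""
    frames: List[Frame] = []
    current: List[Tile] = []
    for tile in tiles:
        current.append(tile)
        if tile.ticks_delay != 0:  # a nonzero delay ends the frame
            if tile.ticks_delay == INFINITE_DELAY:
                delay = math.inf
            else:
                delay = tile.ticks_delay / ticks_per_second
            frames.append((current, delay))
            current = []
    # Tiles left over end with a zero delay: shown only after the last iteration.
    rest_frame = current or None
    return frames, rest_frame
```

A viewer built on this sketch would loop over the timed frames num_plays times and then, if a rest frame exists, compose and display it once.
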
If an animation has only one frame, a viewer should display it and stop, ignoring the num_plays value and the frame delay.

Even if all of a frame's tiles are entirely outside the frame, the delay is handled in the same way as if they were inside the frame.

3.7.6 Interaction with other PNG chunks

Composition must be performed by alpha composition, taking account of any pixels in the frame defined by previous tiles in the same frame. Notice that this may result in pixels with partial alpha (neither 0 nor 1); the composition must take this into account and is expected to interpret the color space chunks (in particular gAMA) correctly when doing this. Any gamma or color correction will need to be done during the frame composition step.

Once the frame has been constructed, it is handled in the same manner as a regular PNG image after decoding. It can be composited on the canvas, which is the same physical size as the frame. Information from the pHYs chunk may be used in this step, when the canvas pixels and the frame pixels are not the same size. Note that the pHYs chunk conveys the physical size of a pixel in the PNG image. The physical size of a pixel in the animation can be determined by scaling the pHYs 'pixels per unit' values by frame_height/IHDR_height or frame_width/IHDR_width.

Information from the oFFs chunk may be used to locate the canvas with respect to the outside world.

The color from the bKGD chunk can be used if the viewer needs a background and cannot get one from the canvas.

3.7.7 Error Handling

If the anIM chunk contains an invalid value, the whole chunk must be ignored.

3.7.8 anIM recommendations for encoders

Tall vs wide vs 2D layouts

Typically the best compression is obtained when the PNG image is very wide and not very tall, with similar frames adjacent; however, this layout makes it necessary for the viewer to decode the entire PNG image before it can display even the first frame.

If the PNG image is taller and less wide, and earlier frames appear closer to the top, it becomes easier for the viewer to display frames while it decodes, but the compression is likely to suffer.

A two-dimensional layout of tiles, with about half of the PNG image area filled with a large representative image, has been demonstrated to provide a usable fallback image when scaled down to the desired frame dimensions.

3.7.9 anIM recommendations for decoders

interlacing and handling frames that time out before completed

D. Add to the Appendix: Revision History

    * 15 May 2007 (version 1.4.0):
        * Added the anIM animation chunk.