VCN (Video, Compression, Networking) Glossary

This is a collection of often used and misused technical terms regarding video, compression and networking.

Many sources contributed to this list.

If you wish to contribute, correct any mistakes, or just send your comments and impressions, please contact:

Luigi.Filippini@crs4.it

ATV (Advanced TV)

Although sometimes used interchangeably, advanced and high-definition television (HDTV) are not one and the same. Advanced television (ATV) would distribute wide-screen television signals with resolution substantially better than current systems. It requires changes to current emission regulations, including transmission standards. In addition, ATV would offer at least two-channel, CD-quality audio.

A:B:C notation

The a:b:c notation for sampling ratios, as found in the CCIR-601 specifications, has the following meaning: a is the luma horizontal sampling rate relative to 3.375 MHz (so 4 denotes 13.5 MHz); b is the horizontal sampling rate of Cb and Cr relative to the same base; and c is either equal to b (meaning no vertical chroma subsampling) or 0 (meaning 2:1 vertical chroma subsampling, as in 4:2:0). Not only is this notation not internally consistent, but it is incapable of being extended to represent any unusual sampling ratios, e.g. different ratios for the Cb and Cr channels.

Arithmetic Coding

Perhaps the major drawback to each of the Huffman encoding techniques is their poor performance when processing texts where one symbol has a probability of occurrence approaching unity. Although the entropy associated with such symbols is extremely low, each symbol must still be encoded as a discrete value.

Arithmetic coding removes this restriction by representing messages as intervals of the real numbers between 0 and 1. Initially, the range of values for coding a text is the entire interval [0, 1]. As encoding proceeds, this range narrows while the number of bits required to represent it expands. Frequently occurring characters reduce the range less than characters occurring infrequently, and thus add fewer bits to the length of an encoded message.
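
As a rough illustration, here is a toy Python sketch of the interval-narrowing idea; the three-symbol model is invented for the example, and a practical coder would use incremental, finite-precision integer arithmetic rather than Python floats:

PROBS = {'a': 0.80, 'b': 0.15, '!': 0.05}   # assumed toy model

def cumulative(model):
    ranges, start = {}, 0.0
    for sym, p in model.items():
        ranges[sym] = (start, start + p)
        start += p
    return ranges

def encode(message, model=PROBS):
    ranges = cumulative(model)
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        s_lo, s_hi = ranges[sym]                 # the symbol's sub-interval
        low, high = low + span * s_lo, low + span * s_hi
    return (low + high) / 2   # any number inside [low, high) identifies the message

print(encode("aab!"))   # frequent 'a's barely narrow the interval

Note how a run of high-probability symbols leaves the interval wide, so the final number needs few bits; this is exactly the case where Huffman coding, with its one-bit-per-symbol floor, performs worst.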

ATM

ATM (Asynchronous Transfer Mode) is a switching/transmission technique where data is transmitted in small, fixed-size cells (5 byte header, 48 byte payload). The cells lend themselves both to the time-division multiplexing characteristics of the transmission media, and the packet switching characteristics desired of data networks. At each switching node, the ATM header identifies a virtual path or virtual circuit that the cell contains data for, enabling the switch to forward the cell to the correct next-hop trunk. The virtual path is set up through the involved switches when two endpoints wish to communicate. This type of switching can be implemented in hardware, which is almost essential when trunk speeds range from 45 Mb/s to 1 Gb/s.
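
A hedged Python sketch of that header structure, using the UNI cell header layout (GFC, VPI, VCI, payload type, CLP, HEC); the example byte values are made up:

def parse_uni_header(h):
    # Split a 5-byte ATM UNI cell header into its fields.
    assert len(h) == 5
    gfc = h[0] >> 4                                          # generic flow control
    vpi = ((h[0] & 0x0F) << 4) | (h[1] >> 4)                 # virtual path id
    vci = ((h[1] & 0x0F) << 12) | (h[2] << 4) | (h[3] >> 4)  # virtual circuit id
    pt  = (h[3] >> 1) & 0x07                                 # payload type
    clp = h[3] & 0x01                                        # cell loss priority
    hec = h[4]                                               # header error control (CRC-8)
    return dict(gfc=gfc, vpi=vpi, vci=vci, pt=pt, clp=clp, hec=hec)

print(parse_uni_header(bytes([0x00, 0x12, 0x34, 0x50, 0xAA])))

The fixed offsets are what make hardware switching practical: a switch can extract the VPI/VCI with simple masks and shifts and index directly into its forwarding table.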

B-Y R-Y

The human visual system has much less acuity for spatial variation of colour than for brightness. Rather than conveying RGB, it is advantageous to convey luma in one channel, and colour information that has had luma removed in the two other channels. In an analog system, the two colour channels can have less bandwidth, typically one-third that of luma. In a digital system each of the two colour channels can have considerably less data rate (or data capacity) than luma.

Green dominates the luma channel: about 59% of the luma signal comprises green information. Therefore it is sensible, and advantageous for signal-to-noise reasons, to base the two colour channels on blue and red. The simplest way to remove luma from each of these is to subtract it, forming the difference between a primary colour and luma. Hence, the basic video colour-difference pair is (B-Y), (R-Y) [pronounced "B minus Y, R minus Y"].

The (B-Y) signal reaches its extreme values at blue (R=0, G=0, B=1; Y=0.114; B-Y=+0.886) and at yellow (R=1, G=1, B=0; Y=0.886; B-Y=-0.886). Similarly, the extrema of (R-Y), +-0.701, occur at red and cyan. These are inconvenient values for both digital and analog systems. The colour spaces YPbPr, YCbCr, PhotoYCC and YUV are simply scaled versions of (Y, B-Y, R-Y) that place the extrema of the colour difference channels at more convenient values.
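
These extrema can be checked directly from the luma weights; a quick Python verification:

def luma(r, g, b):
    return 0.299 * r + 0.587 * g + 0.114 * b

for name, (r, g, b) in {"blue": (0, 0, 1), "yellow": (1, 1, 0),
                        "red": (1, 0, 0), "cyan": (0, 1, 1)}.items():
    y = luma(r, g, b)
    print(f"{name:6s} Y={y:.3f}  B-Y={b - y:+.3f}  R-Y={r - y:+.3f}")

# blue gives B-Y = +0.886, yellow -0.886; red gives R-Y = +0.701, cyan -0.701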

Bridge

Bridges are devices that connect similar and dissimilar LANs at the data link layer (OSI layer 2), regardless of the physical layer protocols or media being used. Bridges require that the networks have consistent addressing schemes and packet frame sizes. Current introductions have been termed learning bridges since they are capable of updating node address (tracking) tables as well as overseeing the transmission of data between two Ethernet LANs.

Brouter

Brouters are bridge/router hybrid devices that offer the best capabilities of both devices in one unit. Brouters are actually bridges capable of intelligent routing and therefore are used as generic components to integrate workgroup networks. The bridge function filters information that remains internal to the network and is capable of supporting multiple higher-level protocols at once.

The router component maps out the optimal paths for the movement of data from one point on the network to another. Since the brouter can handle the functions of both bridges and routers, as well as bypass the need for the translation across application protocols with gateways, the device offers significant cost reductions in network development and integration.

CCITT

Comité Consultatif International Télégraphique et Téléphonique. A committee of the International Telecommunication Union responsible for making technical recommendations about telephone and data communication systems for PTTs and suppliers. Plenary sessions are held every four years to adopt new standards.

CD-DA

CD-DA (Compact Disc-Digital Audio) is the format of standard music CDs. CD-DA gave rise to CD-ROM when people realized that you could store a whole bunch of computer data on a 12 cm optical disc (about 650 MB). CD-ROM drives are simply another kind of digital storage media for computers, albeit read-only. They are peripherals just like hard disks and floppy drives. (Incidentally, the convention is that magnetic media are spelled "disk", while optical media like CDs, LaserDiscs, and all the other formats are spelled "disc".)

CD-I

CD-I means Compact Disc Interactive. It is meant to provide a standard platform for mass consumer interactive multimedia applications. It is thus more akin to CD-DA than to plain CD-ROM, in that it is a full specification for both the data/code and the standalone playback hardware: a CD-I player has a CPU, RAM, ROM, an OS, and audio/video/(MPEG) decoders built into it. Portable players add an LCD screen and speakers/phone jacks. It has limited motion video and still image compression capabilities. It was announced in 1986, and was in beta test by Spring 1989.

This is a consumer electronics format that uses the optical disc in combination with a computer to provide a home entertainment system that delivers music, graphics, text, animation, and video in the living room. Unlike a CD-ROM drive, a CD-I player is a standalone system that requires no external computer. It plugs directly into a TV and stereo system and comes with a remote control to allow the user to interact with software programs sold on discs. It looks and feels much like a CD player except that you get images as well as music out of it and you can actively control what happens. In fact, it is a CD-DA player and all of your standard music CDs will play on a CD-I player; there is just no video in that case.

On a CD-I disc, there may be as few as 1 or as many as 99 data tracks. The sector size in the data tracks of a CD-I disc is approximately 2 kbytes. Sectors are randomly accessible, and, in the case of CD-I, sectors can be multiplexed in up to 16 channels for audio and 32 channels for all other data types. For audio these channels are equivalent to having 16 parallel audio data channels instantly accessible during the playing of a disc.

If you want information about Philips CD-I products, you can call these numbers:

US:
  Consumer hotline: 800-845-7301
  Nearest store: 800-223-7772
  Developers hotline: 800-234-5484

UK:
  Philips CD-I hotline: 0800-885-885

Some useful references about CD-I are:

"Discovering CD-I", available for $45 from:
  Microware Systems Corporation
  1900 NW 114th Street
  Des Moines, IA 50325-7077
  1-800-475-9000

Other books, by Philips IMS, published by Addison Wesley:
  "Introducing CD-I", ISBN 0-201-62748-5
  "The CD-I Production Handbook", ISBN 0-201-62750-7
  "The CD-I Design Handbook", ISBN 0-201-62749-3

CD-ROM

CD-ROM means "Compact Disc Read Only Memory". A CD-ROM is physically identical to a Digital Audio Compact Disc used in a CD player, but the bits recorded on it are interpreted as computer data instead of music. You need to buy a CD-ROM Drive and attach it to your computer in order to use CD-ROMs.

A CD-ROM has several advantages over other forms of data storage, and a few disadvantages. A CD-ROM can hold about 650 megabytes of data, the equivalent of thousands of floppy disks. CD-ROMs are not damaged by magnetic fields or the X-rays in airport scanners. The data on a CD-ROM can be accessed much faster than on a tape, but CD-ROMs are 10 to 20 times slower than hard disks.

You cannot write to a CD-ROM. You buy a disc with the data already recorded on it. There are thousands of titles available.

CD-XA

CD-XA is a CD-ROM extension being designed to support digital audio and still images.

Announced in August 1988 by Microsoft, Philips, and Sony, the CD-ROM XA (for Extended Architecture) format incorporates audio from the CD-I format. It is consistent with ISO 9660 (the volume and file structure of CD-ROM), is an application extension of the Yellow Book, and draws on the Green Book.

CD-XA defines another way of formatting sectors on a CD-ROM, including headers in the sectors that describe the type (audio, video, data) and some additional info (markers, resolution in case of a video or audio sector, file numbers, etc).

The data written on a CD-XA can still be in ISO 9660 file system format and therefore be readable by MSCDEX and Unix CD-ROM file system translators. A CD-I player can also read CD-XA discs even if its own `Green Book' file system only resembles ISO 9660 and isn't fully compatible. However, when a disc is inserted in a CD-I player, the player tries to load an executable application from the CD-XA, normally some 68000 application in the /CDI directory. Its name is stored in the disc's primary volume descriptor. CD-XA bridge discs, like Kodak's Photo CDs, do have such an application; ordinary CD-XA discs don't.

A CD-XA drive is a CD-ROM drive with some of the compressed audio capabilities found in a CD-I player (called ADPCM). These allow interleaving of audio and other data so that an XA drive can play audio and display pictures (or other things) simultaneously. There is special hardware in an XA drive controller to handle the audio playback. This format came from a desire to inject some of the features of CD-I back into the professional market.

Cell Compression (from Sun Microsystem Inc.)

Cell is a compression technique developed by SMI. The compression algorithms, the bit-stream definition, and the decompression algorithms are open; that is, Sun will tell anybody who is interested about them. Cell compression is similar to MPEG and H.261 in that there is a lot of room for value-add on the compressor end: getting the highest quality image from a given bit count at a reasonable amount of compute is an art. The bit-stream completely defines the compression format and what the decoder must do, so there is less art in the decoder.

There are two flavors of Cell: the original, called Cell or CellA, and a newer flavor called CellB. CellA is designed for use-many-times video, where one does not mind that the encoder runs at less than real time: for example, CD-ROM playback, distance learning, and video help for applications. CellB is designed for use-once video, where the encoder must run at real-time (interactive) rates: for example, video mail and video conferencing.

Both flavors of Cell use the same basic technique of representing each 4x4 pixel block with a 16-bit bitmask and two 8-bit vector-quantized codebook indices. This produces a compression of 12:1 (or 8:1) since each 16-pixel block is represented by 32 bits (the 16-bit mask and two 8-bit codebook indices). In both flavors, further compression is accomplished by checking current blocks against the spatially equivalent block in the previous frame. If the new block is "close enough" to the old block, the block is coded as a skip code. Consecutive skip codes are run-length encoded for further compression. Modifying the definition of close enough allows one to trade off quality and compression rate. Both versions of Cell typically compress video images down to about 0.75 to 0.5 bits/pixel.

Both flavors have many similar steps in the early part of compression. For each 4x4 block, the compressor calculates the average luma of the 16 pixels. It then partitions the pixels into two groups: those whose luma is above the average and those whose luma is below it. The compressor sets the 16-bit bitmask based on which pixels are in each partition, and then calculates a color to represent each partition.
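
A minimal Python sketch of that common front end for one 4x4 block; details such as the bit ordering of the mask are assumptions here, not part of Sun's bitstream definition:

def cell_front_end(block):
    # block: 16 luma values of one 4x4 block, row-major.
    avg = sum(block) / 16.0
    mask, hi, lo = 0, [], []
    for i, y in enumerate(block):
        if y > avg:
            mask |= 1 << i      # assumed order: pixel 0 in the LSB
            hi.append(y)
        else:
            lo.append(y)
    return mask, hi, lo          # the flavors differ in how hi/lo become codebook indices

mask, hi, lo = cell_front_end([10] * 8 + [200] * 8)
print(f"mask={mask:016b}  hi_avg={sum(hi)/len(hi):.0f}  lo_avg={sum(lo)/len(lo):.0f}")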

In Cell, the compressor calculates an average color for each partition and then does a vector quantization against the Cell codebook (which is just a color-map). The encoded block is the 16-bit mask and the two 8-bit colormap indices. The compressor maintains statistics about how much error each codebook entry is responsible for and how many times each codebook entry is used. It uses these numbers to adaptively refine the codebook on each frame. Changed codebooks are sent in the bitstream.

In CellB, the compressor calculates the average luma for each partition and the average chroma for the entire block. This gives two colors, [Y_lo, Cb_ave, Cr_ave] and [Y_hi, Cb_ave, Cr_ave]. The pair [Y_lo, Y_hi] is vector quantized against the Y/Y codebook and the pair [Cb_ave, Cr_ave] is vector quantized against the Cr/Cb codebook. Here the encoded block is the 16-bit mask and the two 8-bit VQ indices. Both of CellB's codebooks are fixed, which allows both the compressor and decompressor to run at high speed by using table lookups. Both codebooks are designed with the human visual system in mind; they are not just uniform partitions of the Y/Y or Cr/Cb space. Each codebook has fewer than 256 entries.

Cell (or CellA) is supported in XIL 1.0 from SMI. It is part of Solaris 2.2. CellB is supported in XIL 1.1 from SMI. It will be part of Solaris 2.3 when that becomes available. Complete bitstream definitions for both flavors of cell are in the XIL 1.1 programmer's guide. There is some discussion of the CellA bitstream in the XIL 1.0 programmer's guide.

CellB was used for the SMI Scott McNealy holiday broadcast, where he talked to the company in real time over Sun's wide area network. This broadcast reached from Tokyo, Japan to Munich, Germany, with over 3000 known viewers.

CIF

Common Intermediate Format (also expanded as Common Image Format): a standardization of the structure of the samples that represent the picture information of a single frame, independent of frame rate and sync/blank structure. In the H.261 videoconferencing context, CIF is 352 x 288 luma samples per frame, with Cb and Cr subsampled 2:1 both horizontally and vertically.

The uncompressed bit rate for transmitting CIF at 29.97 frames/sec is 36.45 Mbit/sec.
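
That figure is consistent with a 352 x 288 luma raster plus two 176 x 144 chroma components at 8 bits per sample:

bits_per_frame = (352 * 288 + 2 * 176 * 144) * 8   # 2:1 chroma subsampling both ways
print(bits_per_frame * 29.97 / 1e6)                # ~36.46, i.e. the 36.45 Mbit/s above to rounding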

DPCM (Differential Pulse Code Modulation)

Differential pulse code modulation (DPCM) is a source coding scheme that was developed for encoding sources with memory.

The reason for using the DPCM structure is that for most sources of practical interest, the variance of the prediction error is substantially smaller than that of the source.
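
A minimal sketch in Python, using the previous sample as the predictor; a real system would quantize the residual and predict from the reconstructed value rather than the original:

def dpcm_encode(samples):
    pred, residuals = 0, []
    for s in samples:
        residuals.append(s - pred)   # transmit only the prediction error
        pred = s                      # previous-sample predictor
    return residuals

def dpcm_decode(residuals):
    pred, out = 0, []
    for r in residuals:
        pred += r
        out.append(pred)
    return out

x = [100, 102, 101, 105, 110]
print(dpcm_encode(x))                      # [100, 2, -1, 4, 5]: small variance after the first
assert dpcm_decode(dpcm_encode(x)) == x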

DVI (Digital Video Interactive)

Digital Video Interactive (DVI) technology brings television to the microcomputer. DVI's concept is simple: information is digitized and stored on a random-access device such as a hard disk or a CD-ROM, and is accessed by a computer. DVI requires extensive compression and real-time decompression of images. Until recently this capability was missing. DVI enables new applications. For example, a DVI CD-ROM disc on twentieth-century artists might consist of 20 minutes of motion video; 1,000 high-res still images, each with a minute of audio; and 50,000 pages of text. DVI uses the YUV system, which is also used by the European PAL color television system. The Y channel encodes luminance and the U and V channels encode chrominance. For DVI, U and V are each subsampled 4-to-1 both vertically and horizontally, so that each of these components requires only 1/16 the information of the Y component. This provides a compression from the 24-bit RGB space of the original to 9-bit YUV space.

The DVI concept originated in 1983 in the inventive environment of the David Sarnoff Research Center in Princeton, New Jersey, then also known as RCA Laboratories. The ongoing research and development of television since the early days of the Laboratories was extending into the digital domain, with work on digital tuners, and digital image processing algorithms that could be reduced to cost-effective hardware for mass-market consumer television.

EACEM

European Association of Consumer Electronics Manufacturers

EDTV

Extended [or Enhanced] Definition Television. A television system that offers picture quality substantially improved over conventional 525-line or 625-line receivers, by employing techniques at the transmitter and at the receiver that are transparent to (and cause no visible quality degradation to) existing 525-line or 625-line receivers. One example of EDTV is the improved separation of luminance and colour components by pre-combing the signals prior to transmission, using techniques that have been suggested by Faroudja, Central Dynamics, and Dr. William Glenn.

Entropy

Entropy, the average amount of information represented by a symbol in a message, is a function of the model used to produce that message and can be reduced by increasing the complexity of the model so that it better reflects the actual distribution of source symbols in the original message.

Entropy is a measure of the information contained in a message; it is the lower bound for compression.
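
Concretely, for a source with symbol probabilities p(i), entropy is H = -sum over i of p(i) * log2 p(i) bits per symbol. A quick Python check on an empirical distribution:

from collections import Counter
from math import log2

def entropy(text):
    n = len(text)
    return -sum(c / n * log2(c / n) for c in Counter(text).values())

print(entropy("aaaabbc"))   # ~1.38 bits/symbol: no lossless coder can beat this on average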

ESAC

Economics and Statistics Advisory Committee

ESPRIT

European Strategic Programme for Research and Development in Information Technology

ETSI

European Telecommunications Standards Institute

FFT

Fast Fourier Transform

Gateway

Gateways provide functional bridges between networks by receiving protocol transactions on a layer-by-layer basis from one protocol (e.g. SNA) and transforming them into comparable functions for the other protocol (e.g. OSI). In short, the gateway provides a connection with protocol translation between networks that use different protocols. Interestingly enough, gateways, unlike bridges, do not require that the networks have consistent addressing schemes and packet frame sizes. Most proprietary gateways (such as IBM SNA gateways) provide protocol converter functions up through layer six of the OSI model, while OSI gateways perform protocol translations up through OSI layer seven.

H.261

Recognizing the need for providing ubiquitous video services using the Integrated Services Digital Network (ISDN), CCITT (International Telegraph and Telephone Consultative Committee) Study Group XV established a Specialist Group on Coding for Visual Telephony in 1984, with the objective of recommending a video coding standard for transmission at m x 384 kbit/s (m=1,2,..., 5). Later in the study period, after new discoveries in video coding techniques, it became clear that a single standard, p x 64 kbit/s (p = 1,2,..., 30), could cover the entire ISDN channel capacity. After more than five years of intensive deliberation, CCITT Recommendation H.261, Video Codec for Audiovisual Services at p x 64 kbit/s, was completed and approved in December 1990. A slightly modified version of this Recommendation was also adopted for use in North America.

The intended applications of this international standard are for videophone and videoconferencing. Therefore, the recommended video coding algorithm has to be able to operate in real time with minimum delay. For p = 1 or 2, due to severely limited available bit rate, only desktop face-to-face visual communication (often referred to as videophone) is appropriate. For p>=6, due to the additional available bit rate, more complex pictures can be transmitted with better quality. This is, therefore, more suitable for videoconferencing.

HDTV

High-Definition Television. A television system with approximately twice the horizontal and twice the vertical resolution of current 525-line and 625-line systems, component colour coding (e.g. RGB or YCbCr), a picture aspect ratio of 16:9, and a frame rate of at least 24 Hz. Currently there are a number of proposed HDTV standards, including HD-MAC, HiVision and others.

Hybrid Coder

In the archetypal hybrid coder, an estimate of the next frame to be processed is formed from the current frame and the difference is then encoded by some purely intraframe mechanism. In recent years, the most attention has been paid to the motion-compensated DCT coder where the estimate is formed by a two-dimensional warp of the previous frame and the difference is encoded using a block transform (the Discrete Cosine Transform).

This system is the basis for international standards for videotelephony, is used for some HDTV demonstrations, and is the prototype from which MPEG was designed. Its utility has been demonstrated for video sequences: the motion-compensated prediction removes most of the temporal redundancy, and the DCT concentrates the remaining energy into a small number of transform coefficients that can be quantized and compactly represented.

The key feature of this coder is the presence of a complete decoder within it. The difference between the current frame as represented at the receiver and the incoming frame is what gets processed. In the basic design, therefore, the receiver must track the transmitter precisely: the decoder at the receiver and the decoder at the transmitter must match. The system is sensitive to channel errors and does not permit random access. However, it is on the order of three to four times as efficient as one that uses no prediction.
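
A skeletal Python sketch of that loop, at frame level only; the quantizer below stands in for the whole motion-compensated DCT stage, which is where real coders spend their effort:

def quantize(residual, step=8):
    # stand-in for the intraframe coder (DCT + quantization in practice)
    return [step * round(r / step) for r in residual]

def encode_sequence(frames):
    recon = [0] * len(frames[0])       # the decoder's state, mirrored in the encoder
    bitstream = []
    for frame in frames:
        residual = [f - p for f, p in zip(frame, recon)]
        coded = quantize(residual)
        bitstream.append(coded)
        recon = [p + c for p, c in zip(recon, coded)]   # track the receiver exactly
    return bitstream

print(encode_sequence([[100] * 4, [104] * 4, [104] * 4]))  # later frames cost little

Because recon is updated from the coded residual, not the input, encoder and decoder states stay locked; break that and the two drift apart, which is the channel-error sensitivity noted above.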

In practice, this coder is modified to suit specific applications. The standard telephony model uses a forced update of the decoded frame so that channel errors do not propagate. When a participant enters the conversation late or alternates between image sources, residual errors die out and a clear image is obtained after a few frames. Similar techniques are used in versions of this coder being developed for direct satellite television broadcasting.

Huffman Coding

For a given character distribution, by assigning short codes to frequently occurring characters and longer codes to infrequently occurring characters, Huffman's minimum redundancy encoding minimizes the average number of bits required to represent the characters in a text.

Static Huffman encoding uses a fixed set of codes, based on a representative sample of data, for processing texts. Although encoding is achieved in a single pass, the data on which the compression is based may bear little resemblance to the actual text being compressed.

Dynamic Huffman encoding, on the other hand, reads each text twice; once to determine the frequency distribution of the characters in the text and once to encode the data. The codes used for compression are computed on the basis of the statistics gathered during the first pass with compressed texts being prefixed by a copy of the Huffman encoding table for use with the decoding process.

By using a single-pass technique, where each character is encoded on the basis of the preceding characters in a text, Gallager's adaptive Huffman encoding avoids many of the problems associated with either the static or dynamic method.
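
For concreteness, a compact Python sketch of static Huffman code construction; tie-breaking details (and hence the exact codes) vary between implementations:

import heapq
from collections import Counter

def huffman_codes(text):
    # heap entries: (weight, tiebreak, {symbol: code-so-far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # merge the two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_codes("abracadabra"))   # the frequent 'a' gets the shortest code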

IDTV

Improved Definition Television. A television system that offers picture quality substantially improved over conventional receivers, for signals originated in standard 525-line or 625-line format, by processing that involves the use of field store and/or frame store (memory) techniques at the receiver. One example is the use of field or frame memory to implement de-interlacing at the receiver in order to reduce interline twitter compared to that of an interlaced display. IDTV techniques are implemented entirely at the receiver and involve no change to picture origination equipment and no change to emission standards.

IEC

International Electrotechnical Commission. A standardisation body at the same level as ISO.

Interactive videodisc

Interactive videodisc is another video-related technology, using an analog approach. It has been available since the early 1980s, and is supplied in the U.S. primarily by Pioneer, Sony, and IBM.

ISDN

ISDN stands for "Integrated Services Digital Networks", and it's a CCITT term for a relatively new telecommunications service package. ISDN is basically the telephone network turned all-digital end to end, using existing switches and wiring (for the most part) upgraded so that the basic call is a 64 kbps end-to-end channel, with bit-diddling as needed (but not when not needed!). Packet and maybe frame modes are thrown in for good measure, too, in some places. It's offered by local telephone companies, but most readily in Australia, France, Japan, and Singapore, with the UK and Germany somewhat behind, and USA availability rather spotty.

A Basic Rate Interface (BRI) consists of two 64 kbps bearer ("B") channels and a single delta ("D") channel. The B channels are used for voice or data, and the D channel is used for signaling and/or X.25 packet networking. This is the variety most likely to be found in residential service. Another flavor of ISDN is the Primary Rate Interface (PRI). Inside the US, this consists of 24 channels, usually divided into 23 B channels and 1 D channel, and runs over the same physical interface as T1. Outside the US, PRI has 31 user channels, usually divided into 30 B channels and 1 D channel. It is typically used for connections such as those between a PBX and a CO or IXC.

Letter-box

A television system that limits the recording or transmission of useful picture information to about three-quarters of the available vertical picture height of the distribution format (e.g. 525-line) in order to offer program material that has a wide picture aspect ratio.

Luma (Y)

Video originates with linear-light (tristimulus) RGB primary components, conventionally contained in the range 0 (black) to +1 (white). From the RGB triple, three gamma-corrected primary signals are computed; each is essentially the 0.45-power of the corresponding tristimulus value, similar to a square-root function.

In a practical system such as a television camera, however, in order to minimize noise in the dark regions of the picture it is necessary to limit the slope (gain) of the curve near black. It is now standard to limit gain to 4.5 below a tristimulus value of +0.018, and to stretch the remainder of the curve to place the Y-intercept at -0.099 in order to maintain function and tangent continuity at the breakpoint:

Rgamma = (1.099 * pow(R,0.45)) - 0.099
Ggamma = (1.099 * pow(G,0.45)) - 0.099
Bgamma = (1.099 * pow(B,0.45)) - 0.099

Luma is then computed as a weighted sum of the gamma-corrected primaries:

Y = 0.299*Rgamma + 0.587*Ggamma + 0.114*Bgamma

The three coefficients in this equation correspond to the sensitivity of human vision to each of the RGB primaries standardized for video. For example, the low value of the blue coefficient is a consequence of saturated blue colours being perceived as having low brightness.
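
The same transfer function and weighted sum, as a small runnable Python sketch:

def rec601_gamma(t):
    # linear tristimulus value (0..1) -> gamma-corrected signal
    if t < 0.018:
        return 4.5 * t                    # linear segment near black
    return 1.099 * t ** 0.45 - 0.099

def luma(r, g, b):
    return (0.299 * rec601_gamma(r) +
            0.587 * rec601_gamma(g) +
            0.114 * rec601_gamma(b))

print(luma(1.0, 1.0, 1.0))   # 1.0 for reference white
print(luma(0.0, 0.0, 1.0))   # 0.114 for saturated blue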

The luma coefficients are also a function of the white point (or chromaticity of reference white). Computer users commonly have a white point with a colour temperature in the range of 9300 K, which contains twice as much blue as the daylight reference CIE D65 used in television. This is reflected in pictures and monitors that look too blue.

Although television primaries have changed over the years since the adoption of the NTSC standard in 1953, the coefficients of the luma equation for 525 and 625 line video have remained unchanged. For HDTV, the primaries are different and the luma coefficients have been standardized with somewhat different values.

Lempel-Ziv Welch (LZW) Compression

Algorithm used by the Unix compress command to reduce the size of files, e.g. for archival or transmission. The algorithm relies on repetition of byte sequences (strings) in its input. It maintains a table mapping input strings to their associated output codes. The table initially contains mappings for all possible strings of length one. Input is taken one byte at a time to find the longest initial string present in the table. The code for that string is output and then the string is extended with one more input byte, b. A new entry is added to the table mapping the extended string to the next unused code (obtained by incrementing a counter). The process repeats, starting from byte b. The number of bits in an output code, and hence the maximum number of entries in the table, is usually fixed, and once this limit is reached, no more entries are added.
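
A hedged Python sketch of the encoder side (the table is seeded with all single bytes; code-width management and the table-full policy are omitted):

def lzw_encode(data: bytes):
    table = {bytes([i]): i for i in range(256)}   # all strings of length one
    next_code, w, out = 256, b"", []
    for byte in data:
        wb = w + bytes([byte])
        if wb in table:
            w = wb                     # keep extending the current match
        else:
            out.append(table[w])       # emit code for the longest known string
            table[wb] = next_code      # map the extended string to a new code
            next_code += 1
            w = bytes([byte])          # restart from this byte
    if w:
        out.append(table[w])
    return out

print(lzw_encode(b"abababab"))   # [97, 98, 256, 258, 98]: repeats collapse to codes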

Model-Based Coder

Communicating a higher-level model of the image than pixels is an active area of research. The idea is to have the transmitter and receiver agree on the basic model for the image; the transmitter then sends parameters to manipulate this model in lieu of picture elements themselves. Model-based decoders are similar to computer graphics rendering programs.

The model-based coder trades generality for extreme efficiency in its restricted domain. Better rendering and extending of the domain are research themes.

Modem (Modulator/demodulator)

An electronic device for converting between serial data (typically RS-232) from a computer and an audio signal suitable for transmission over telephone lines. The audio signal is usually composed of silence (no data) or one of two frequencies representing 0 and 1. Modems are distinguished primarily by the baud rates they support, which can range from 75 baud up to 19200 and beyond.

Data to the computer is sometimes sent at a lower rate than data from the computer, on the assumption that the user cannot type more than a few characters per second. Various data compression and error-correction algorithms are required to support the highest speeds. Other optional features are auto-dial (auto-call) and auto-answer, which allow the computer to initiate and accept calls without human intervention.

NAB

National Association of Broadcasters

NHK

Nippon Hoso Kyokai, the principal Japanese broadcaster

NTSC (National Television System Committee)

USA video standard with image format 4:3, 525 lines, 60 Hz and 4 MHz video bandwidth with a total 6 MHz of video channel width. NTSC uses YIQ.

OSI

The Open Systems Interconnection Reference Model was formally initiated by the International Organization for Standardization (ISO) in March 1977, in response to the international need for an open set of communications standards. The model is similar in structure to that of SNA. It consists of seven architectural layers: the physical layer, the data link layer, the network layer, the transport layer, the session layer, the presentation layer, and the application layer.

The physical and data link layers provide the same functions as their SNA counterparts (physical control and data link control layers). The network layer selects routing services, segments blocks and messages, and provides error detection, recovery, and notification.

The transport layer controls point-to-point information interchange, data packet size determination and transfer, and the connection/disconnection of session entities.

The session layer serves to organize and synchronize the application process dialog between presentation entities, manage the exchange of data (normal and expedited) during the session, and monitor the establishment/release of transport connections as requested by session entities.

The presentation layer is responsible for the meaningful display of information to application entities.

More specifically, the presentation layer identifies and negotiates the choice of communications transfer syntax and the subsequent data conversion or transformation as required. The application layer affords the interfacing of application processes to system interconnection facilities to assist with information exchange. The application layer is also responsible for the management of application processes including initialization, maintenance and termination of communications, allocation of costs and resources, prevention of deadlocks, and transmission security.

PAL (Phase Alternating Line)

European video standard with image format 4:3, 625 lines, 50 Hz and 4 MHz video bandwidth with a total 8 MHz of video channel width. PAL uses YUV.

QCIF

Quarter Common Intermediate Format (1/4 CIF, i.e. 176*144)

The uncompressed bit rate for transmitting QCIF at 29.97 frames/sec is 9.115 Mbit/s.

Region Coding

Region Coding has received attention because of the ease with which it can be decoded and the fact that a coder of this type is used in Intel's Digital Video Interactive system (DVI), the only commercially available system designed expressly for low-cost, low-bandwidth multimedia video. Its operation is relatively simple. The basic design is due to Kunt.

Envision a decoder that can reproduce certain image primitives well. A typical set might consist of rectangular areas of constant color, smooth shaded patches and some textures. The image is analyzed into regions that can be expressed in terms of these primitives. The analysis is usually performed using a tree-structured decomposition where each part of the image is successively divided into smaller regions until a patch that meets either the bandwidth constraints or the quality desired can be fitted. Only the tree description and the parameters for each leaf need then be transmitted. Since the decoder is optimized for the reconstruction of these primitives, it is relatively simple to build.

To account for image data that does not encode easily using the available primitives, actual image data can also be encoded and transmitted, but this is not as efficient as fitting a patch.

This coder can also be combined with prediction (as it is in DVI), and the predicted difference image can then be region coded. A key element in the encoding operation is a region growing step where adjacent image patches that are distinct leaves of the tree are combined into a single patch. This approach has been considered highly asymmetric in that significantly more processing is required for encoding/analysis than for decoding. It is harder to grow a tree than to climb one.
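
A toy Python sketch of the tree-structured decomposition, with constant-value patches as the only primitive; the flatness threshold is an assumption for the example:

def quadtree(img, x, y, size, threshold=10):
    # Recursively split a square region until a flat patch fits.
    vals = [img[y + j][x + i] for j in range(size) for i in range(size)]
    mean = sum(vals) / len(vals)
    if size == 1 or max(vals) - min(vals) <= threshold:
        return ("leaf", x, y, size, mean)            # one parameter per patch
    h = size // 2
    return ("split", [quadtree(img, x + dx, y + dy, h, threshold)
                      for dy in (0, h) for dx in (0, h)])

img = [[0] * 4 + [200] * 4 for _ in range(8)]   # left half dark, right half bright
print(quadtree(img, 0, 0, 8))   # one split; the four 4x4 quadrants are flat leaves

A decoder only needs to walk this tree and fill patches, which is why region decoding is cheap.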

While hardware implementations of the hybrid DCT coder have been built for extremely low bandwidth teleconferencing and for HDTV, there is no hardware for a region coder. However, such an assessment is deceptive since much of the processing used in DVI compression is in the motion predictor, a function common to both methods. In fact, all compression schemes are asymmetric, the difference is a matter of degree rather than one of essentials.

Repeater

Repeaters are transparent devices used to interconnect segments of an extended network with identical protocols and speeds at the physical layer (OSI layer 1). An example of a repeater connection would be the linkage of two carrier sense multiple access/collision detection (CSMA/CD) segments within a network.

Router

Routers connect networks at OSI layer 3. Routers interpret packet contents according to specified protocol sets, serving to connect networks with the same protocols (DECnet to DECnet, TCP/IP (Transmission Control Protocol/Internet Protocol) to TCP/IP). Routers are protocol-dependent; therefore, one router is needed for each protocol used by the network. Routers are also responsible for the determination of the best path for data packets by routing them around failed segments of the network.

SECAM (Séquentiel Couleur à Mémoire)

European video standard with image format 4:3, 625 lines, 50 Hz and 6 MHz video bandwidth with a total 8 MHz of video channel width.

SMPTE

SMPTE is the Society of Motion Picture and Television Engineers. There is an SMPTE time code standard (hr:min:sec:frame) used to identify video frames.

SNA

Systems Network Architecture entered the market in 1974 as a hierarchical, single-host network structure. Since then, SNA has developed steadily in two directions. The first direction involved tying together mainframes and unintelligent terminals in a master-to-slave relationship. The second direction transformed the SNA architecture to support a cooperative-processing environment, whereby remote terminals link up with mainframes as well as each other in a peer-to-peer relationship (termed Low Entry Networking (LEN) by IBM). LEN depends on the implementation of two protocols: Logical Unit 6.2, also known as APPC, and Physical Unit 2.1, which affords point-to-point connectivity between peer nodes without requiring host computer control.

The SNA model is concerned with both logical and physical units. Logical units (LUs) serve as points of access by which users can utilize the network. LUs can be viewed as terminals that provide users access to application programs and other services on the network. Physical units (PUs), like LUs, are not defined within the SNA architecture as specific devices; instead, they are representations of the devices and communication links of the network.

Standard bodies

Almost every country has a national standards body where experts from industry and universities develop standards for all kinds of engineering problems. Among them are, for instance:

ANSI    American National Standards Institute       USA
DIN     Deutsches Institut fuer Normung             Germany
BSI     British Standards Institution               United Kingdom
AFNOR   Association francaise de normalisation      France
UNI     Ente Nazionale Italiano di Unificazione     Italy
NNI     Nederlands Normalisatie-instituut           Netherlands
SAA     Standards Australia                         Australia
SANZ    Standards Association of New Zealand        New Zealand
NSF     Norges Standardiseringsforbund              Norway
DS      Dansk Standard                              Denmark

and about 80 others.

The International Organization for Standardization, ISO, in Geneva is the head organization of all these national standardization bodies. Together with the International Electrotechnical Commission, IEC, ISO concentrates its efforts on harmonizing national standards all over the world. The results of these activities are published as ISO standards. Among them are, for instance, the metric system of units, international stationery sizes, all kinds of bolts and nuts, rules for technical drawings, electrical connectors, security regulations, computer protocols, file formats, bicycle components, ID cards, programming languages, International Standard Book Numbers (ISBN), ... Over 10,000 ISO standards have been published so far, and you surely come in contact every day with many things that conform to ISO standards you have never heard of. By the way, "ISO" is not an acronym for the organization in any language. It's a wordplay based on the English/French initials and the Greek-derived prefix "iso-" meaning "same".

Within ISO, ISO/IEC Joint Technical Committee 1 (JTC1) deals with information technology.

The International Telecommunication Union, ITU, is the United Nations specialized agency dealing with telecommunications. At present there are 164 member countries. One of its bodies is the International Telegraph and Telephone Consultative Committee, CCITT. A Plenary Assembly of the CCITT, which takes place every few years, draws up a list of 'Questions' about possible improvements in international electronic communication. In Study Groups, experts from different countries develop 'Recommendations' which are published after they have been adopted. Especially relevant to computing are the V series of recommendations on modems (e.g. V.32, V.42), the X series on data networks and OSI (e.g. X.25, X.400), the I and Q series that define ISDN, the Z series that defines specification and programming languages (SDL, CHILL), the T series on text communication (teletex, fax, videotex, ODA) and the H series on digital sound and video encoding.

Since 1961, the European Computer Manufacturers Association, ECMA, has been a forum for data processing experts where agreements have been prepared and submitted for standardization to ISO, CCITT and other standards organizations.

Sub-band Coding

Sub-band coding for images has roots in work done in the 1950s by Bedford, and in the Mixed Highs image compression done by Kretzmer in 1954. Schreiber and Buckley explored general two-channel coding of still pictures, where the low spatial frequency channel was coarsely sampled and finely quantized and the high spatial frequency channel was finely sampled and coarsely quantized. More recently, Karlsson and Vetterli have extended this to multiple subbands. Adelson et al. have shown how a recursive subdivision called a pyramid decomposition can be used both for compression and for other useful image processing tasks.

A pure sub-band coder performs a set of filtering operations on an image to divide it into spectral components. Usually, the result of the analysis phase is a set of sub-images, each of which represents some region in spatial or spatio-temporal frequency space. For example, in a still image, there might be a small sub-image that represents the low-frequency components of the input picture and that is directly viewable as either a minified or blurred copy of the original. To this are added successively higher spectral bands that contain the edge information necessary to reproduce the sharpness of the original at successively larger scales. As with the DCT coder, to which it is related, much of the image energy is concentrated in the lowest frequency band.

For equal visual quality, each band need not be represented with the same signal-to-noise ratio; this is the basis for sub-band coder compression. In many coders, some bands are eliminated entirely, and others are often compressed with a vector or lattice quantizer. Successively higher frequency bands are more coarsely quantized, analogous to the truncation of the high frequency coefficients of the DCT. A sub-band decomposition can be the intraframe coder in a predictive loop, thus minimizing the basic distinctions between DCT-based hybrid coders and their alternatives.
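
As a minimal one-dimensional illustration, a single Haar-style two-band split in Python; an image coder would apply such filter pairs separably in 2-D and recurse on the low band:

def haar_split(signal):
    # One analysis stage: low band (pair averages) and high band (pair differences).
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high

def haar_merge(low, high):
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]            # perfect reconstruction
    return out

sig = [10, 12, 11, 13, 100, 102, 101, 99]
low, high = haar_split(sig)
print(low, high)                          # the energy sits in the low band
assert haar_merge(low, high) == sig

The high band here is small almost everywhere, so it can be coarsely quantized or partly discarded, which is the compression mechanism described above.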

T1Q1.5

The T1Q1.5 Video Teleconferencing/Video Telephony (VTC/VT) ANSI Subworking Group (SWG) was formed to draft a performance standard for digital video. Important questions were asked relating to the digital video performance characteristics of video teleconferencing and video telephony; the VTC/VT Subworking Group's goal is to answer these questions, as a first step in the process of constructing the performance standard.

Trellis Coding

Trellis coding is a source coding technique that has resulted in numerous publications and some very effective source codes. Unfortunately, the computational burden of these codes is tremendous and grows exponentially with the encoding rate.

A trellis is a transition diagram for a finite state machine that takes time into account. Populating a trellis means specifying output symbols for each branch; specifying an initial state then yields a set of allowable output sequences.

A trellis coder is defined as follows: given a trellis populated with symbols from an output alphabet and an input sequence x of length n, a trellis coder outputs the sequence of bits corresponding to the allowable output sequence that maximizes the SNR of the encoding of x.

X.25

A standard networking protocol suite approved by the CCITT and ISO. This protocol suite defines standard physical, link, and networking layers (OSI layers 1 through 3). X.25 networks are in use throughout the world.

X.400

The set of CCITT communications standards covering mail services provided by data networks.

YCC (Kodak PhotoCD [tm])

Kodak's PhotoYCC colour space (for PhotoCD) is similar to YCbCr, except that Y is coded with lots of headroom and no footroom, and the scaling of Cb and Cr is different from that of Rec. 601-1 in order to accommodate a wider colour gamut:

Y_8bit = (255/1.402) * Y
C1_8bit = 156 + 111.40 * (Bgamma - Y)
C2_8bit = 137 + 135.64 * (Rgamma - Y)

The C1 and C2 components are subsequently subsampled by factors of two horizontally and vertically, but that subsampling should be considered a feature of the compression process and not of the colour space.

YCbCr

The international standard CCIR-601-1 specifies eight-bit digital coding for component video, with black at luma code 16 and white at luma code 235, and chroma in eight-bit two's complement form centred on 128 with a peak excursion of 224. This coding has a slightly smaller excursion for luma than for chroma: luma has 219 risers compared to 224 for Cb and Cr. The notation CbCr distinguishes this set from PbPr, where the luma and chroma excursions are identical.

For Rec. 601-1 coding in eight bits per component,

Y_8b = 16 + 219 * Y
Cb_8b = 128 + 224 * (0.5/0.886) * (Bgamma - Y)
Cr_8b = 128 + 224 * (0.5/0.701) * (Rgamma - Y)

Some computer applications place black at luma code 0 and white at luma code 255. In this case, the scaling and offsets above can be changed accordingly, although broadcast-quality video requires the accommodation for headroom and footroom provided in the CCIR-601-1 equations.
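
The same Rec. 601-1 equations as a small Python function; input is gamma-corrected R'G'B' in the range 0..1:

def rgb_to_ycbcr_8bit(rg, gg, bg):
    y = 0.299 * rg + 0.587 * gg + 0.114 * bg
    return (round(16 + 219 * y),
            round(128 + 224 * (0.5 / 0.886) * (bg - y)),
            round(128 + 224 * (0.5 / 0.701) * (rg - y)))

print(rgb_to_ycbcr_8bit(1.0, 1.0, 1.0))   # white -> (235, 128, 128)
print(rgb_to_ycbcr_8bit(0.0, 0.0, 1.0))   # blue  -> (41, 240, 110)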

Rec. CCIR-601-1 calls for two-to-one horizontal subsampling of Cb and Cr, to achieve 2/3 the data rate of RGB with virtually no perceptible penalty. This is denoted 4:2:2. A few digital video systems have utilized horizontal subsampling by a factor of four, denoted 4:1:1. JPEG and MPEG normally subsample Cb and Cr two-to-one horizontally and also two-to-one vertically, to get 1/2 the data rate of RGB. No standard nomenclature has been adopted to describe vertical subsampling. To get good results with subsampling, you should not just drop and replicate pixels but implement proper decimation and interpolation filters.

YCbCr coding is employed by D-1 component digital video equipment.

YPbPr

If three components are to be conveyed in three separate channels with identical unity excursions, then the Pb and Pr colour difference components are used:

Pb = (0.5/0.886) * (Bgamma - Y)
Pr = (0.5/0.701) * (Rgamma - Y)

These scale factors limit the excursion of each colour difference component to -0.5..+0.5 with respect to unity Y excursion: 0.886 is just unity less the luma coefficient of blue. In the analog domain Y is usually 0 mV (black) to 700 mV (white), and Pb and Pr are usually +-350 mV.

YPbPr is part of the CCIR Rec. 709 HDTV standard, although different luma coefficients are used, and it is denoted E'Pb and E'Pr with subscript arrangement too complicated to be written here.

YPbPr is employed by component analog video equipment such as M-II and BetaCam; Pb and Pr bandwidth is half that of luma.

YIQ

The U and V signals above must be carried with equal bandwidth, albeit less than that of luma. However, the human visual system has less spatial acuity for magenta-green transitions than it does for red-cyan. Thus, if signals I and Q are formed from a 123 degree rotation of U and V respectively [sic], the Q signal can be more severely filtered than I (to about 600 kHz, compared to about 1.3 MHz) without being perceptible to a viewer at typical TV viewing distance. YIQ is equivalent to YUV with a 33 degree rotation and an axis flip in the UV plane. The first edition of W.K. Pratt "Digital Image Processing", and presumably other authors that follow that bible, has a matrix that erroneously omits the axis flip; the second edition corrects the error.

Since an analog NTSC decoder has no way of knowing whether the encoder was encoding YUV or YIQ, it cannot detect whether the encoder was running at 0 degree or 33 degree phase. In analog usage the terms YUV and YIQ are often used somewhat interchangeably. YIQ was important in the early days of NTSC but most broadcasting equipment now encodes equiband U and V.

The D-2 composite digital DVTR (and the associated interface standard) conveys NTSC modulated on the YIQ axes in the 525-line version and PAL modulated on the YUV axes in the 625-line version.

YUV

In composite NTSC, PAL or S-video systems, it is necessary to scale (B-Y) and (R-Y) so that the composite NTSC or PAL signal (luma plus modulated chroma) is contained within the range -1/3 to +4/3. These limits reflect the capability of the composite signal recording or transmission channel. The scale factors are obtained by two simultaneous equations involving both B-Y and R-Y, because the limits of the composite excursion are reached at combinations of B-Y and R-Y that are intermediate to primary colours. The scale factors are as follows:

U = 0.493 * (B - Y)
V = 0.877 * (R - Y)

U and V components are typically modulated into a chroma component:

C = U*cos(t) + V*sin(t)

where t represents the phase of the ~3.58 MHz NTSC colour sub-carrier. PAL coding is similar, except that the V component switches Phase on Alternate Lines (+-1), and the sub-carrier is at a different frequency, about 4.43 MHz.

It is conventional for an NTSC luma signal in a composite environment (NTSC or S-video) to have 7.5% setup:

Y_setup = (3/40) + (37/40) * Y

A PAL signal has zero setup.

The two signals Y (or Y_setup) and C can be conveyed separately across an S-video interface, or Y and C can be combined (encoded) into composite NTSC or PAL:

NTSC = Y_setup + C
PAL = Y + C

U and V are only appropriate for composite transmission as 1-wire NTSC or PAL, or 2-wire S-video. The UV scaling (or the IQ set, described above) is incorrect when the signal is conveyed as three separate components. Certain component video equipment has connectors labelled YUV that in fact convey YPbPr signals.


The following is a list of persons whose material contributed to the creation of this list:

Andrew Davidson     andy@aimla.com
Tom Lane            tgl@cs.cmu.edu
Charles A. Poynton  poynton@sun.com
Lee Westover        lee.westover@sun.com

and many, many more ....

You might also want to look at the following documents:

comp.dsp.faq

jpeg.faq

standards.faq