MessageWay BOF (MSGWAY)

Reported by Danny Cohen/Myricom

The MSGWAY BOF, chaired by Danny Cohen, was held on Tuesday, 4 April, at
the 32nd IETF meeting in Danvers, MA.  Sixteen people attended.  Danny
presented the problem and a proposed approach (see slides).  A discussion
of MsgWay and of the working group followed.


The Problem

While the speed of computing circuits increases with time, the speed of
light is unchanged.  As a result, distances shrink.  For example, the
diameter of Ethernets has shrunk from 2 km to 0.2 km as their speed grew
from 10 to 100 Mbps.  Similarly, buses that used to dominate inter-board
communication (e.g., VME) are nowadays useful mainly for intra-board
communication (e.g., PCI).

Modern computing systems in general, and MPPs (Massively Parallel
Processing systems) in particular, use an ``I/O fabric'' (MPP-network)
instead of the traditional I/O buses.

Most MPP-networks are built to handle variable-length packets, are made
of very short point-to-point FDX links of high performance (high data
rates, low latency, and very low BER), have error detection and flow
control, and use cut-thru (aka ``wormhole'') switches with source
routing.

Despite these similarities, each MPP-network is typically an island unto
itself, incapable of interoperating with other MPP-networks.

There is a need to use several heterogeneous MPPs, and clusters (or
networks) of workstations, as a single MPP, without losing the
high-performance communication native to them.


A Proposed Approach

The interoperability between heterogeneous computing systems should be
handled at Level 3, like IP, not just at the lower levels.

IP, which has successfully served the Internet for over 20 years as the
basic tool for interoperability among heterogeneous computers, is not
appropriate for MsgWay because its design made many tradeoffs that
sacrifice high performance for generality and scalability.  In addition,
IP does not address individual processors in MPPs.
(However, IPv6 could have been modified to fix this problem.)

The MsgWay approach is to define a Level 3 protocol that is similar in
philosophy to IP but has implementation details geared toward high
performance, possibly at some cost in generality and wide-area
scalability.

MsgWay will be a Level 3 protocol, like IP, which could support IP (by
encapsulation).  The MsgWay protocol will have both an EEP (end-to-end
protocol, like IP) and an RRP (router-to-router protocol, like GGP).

Among the tradeoffs that made IP general, but proved to be deficiencies
for high performance, are:

   o  Long addresses (32 going on 128 bits)
   o  Addressing ``hosts'' only (not individual processors)
   o  No support of source routing
   o  Need for routers with global knowledge
   o  Hierarchical de-muxing
   o  No flow control
   o  No error detection
   o  No fault recovery
   o  No support of DMA
   o  No support of byte alignment
   o  Fields not sorted by need

MsgWay will alleviate these deficiencies as follows.  Addresses will be
16 bits long and could be dynamically assigned for sessions.

MsgWay will support both source routing (for Level 2 forwarding) and
Level 3 addressing (for Level 3 forwarding).  The use of source routing
would allow the MsgWay switches to operate without having any routing
knowledge loaded into them.

MsgWay will have a format that supports zero-copy operation (i.e.,
direct copy from the network interface into the destination user area).

MsgWay would have flow control based on the flow control of the
participating networks.  Similarly, MsgWay would carry error indication
in trailers, to allow the use of various CRC hardware.  Even though each
participating network may use a different technique for error detection,
MsgWay would have a uniform way to indicate errors.

MsgWay will address the alignment issue, to allow computers with
different chunk sizes (such as the Paragon's 8B-chunks, RACEway's
4B-chunks, and Myrinet's 2B-chunks) to communicate efficiently.
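
As a rough illustration of the alignment issue, the sketch below pads a
payload so that its length is a whole number of chunks on every network
along a path.  The chunk sizes are the ones quoted above.  Everything else
is an assumption made for illustration only: the padding rule (round up to
the least common multiple of the chunk sizes on the path) and the names
CHUNK_BYTES, padding_needed, and pad_for_path are hypothetical and are not
part of any MsgWay format, which has yet to be defined.

```python
from math import lcm

# Chunk sizes in bytes, as quoted in the text (Paragon 8B, RACEway 4B,
# Myrinet 2B).  The dictionary itself is a hypothetical helper.
CHUNK_BYTES = {"Paragon": 8, "RACEway": 4, "Myrinet": 2}


def padding_needed(length: int, chunk: int) -> int:
    """Return the bytes of padding that round `length` up to a multiple of `chunk`."""
    return (-length) % chunk


def pad_for_path(payload: bytes, networks: list[str]) -> bytes:
    """Zero-pad `payload` so every network on the path sees whole chunks.

    Assumed rule: pad to the least common multiple of the chunk sizes of
    all networks named in `networks`.
    """
    chunk = lcm(*(CHUNK_BYTES[name] for name in networks))
    return payload + bytes(padding_needed(len(payload), chunk))
```

Under this assumed rule, a 13-byte payload crossing Myrinet and the
Paragon would be padded to 16 bytes, while the same payload staying on
Myrinet alone would be padded to 14 bytes.
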
In addition, in order to minimize the wormhole latency, the fields in the
MsgWay protocol header will be sorted by their need (e.g., starting with
the destination address that is always needed).

MsgWay would support dynamic mapping and discovery required for automatic
fault recovery.

Like IP, MsgWay does not define performance figures, connectors,
communication media, address assignment, routing and discovery, APIs, and
so on.  Following the IP philosophy, all these issues will be defined
separately.

MsgWay defines a Level 3 protocol for interoperability of heterogeneous
multi-processors at high performance.


Discussion

A discussion about the proposed MsgWay activity followed the above
presentation.  Several questions were raised by the participants in the
BOF.

o  Why should MsgWay be an IETF activity?

   It is proposed to conduct the MsgWay activity as an IETF working group
   because of the firm belief that interoperability should be handled at
   Level 3 (not just at Levels 1 and 2), and because of the recognition
   that MPP-networks are computer communication networks with much in
   common with the networks that the IETF community is dealing with.
   MsgWay is a small computer network, not an extended computer bus.

o  Why not use IP ``as is'' with slight modifications, as needed for high
   performance?

   It is believed that this is the proposed approach.

o  What about transport level issues, like reliability (a la TCP)?

   It is left for higher level protocols, as/if needed (note that this is
   exactly IP's approach).

o  Must MsgWay hosts use source-route?

   No.  MsgWay will support both Level 2 forwarding (by source routes)
   and Level 3 forwarding (by addresses).

o  Must processors in the same host (say a Paragon) use MsgWay among each
   other?

   No.  They may use their native communication system.  For generality
   the API may look the same but there is no need to use MsgWay for
   internal communication within a system.  This is similar to the use of
   IP between hosts on the same LAN.
   (Hosts on the same Ethernet could communicate by raw Ethernet packets,
   without IP - but using IP has some advantages.)

o  How is the Source Route handled?

   It is consumed along the way (not an incremented pointer).  This
   allows each network along the path to be presented with exactly the
   optimal bit pattern for its use.  Note that this requires recomputing
   the checksum.

o  MTU?

   The maximum packet size will be configured for the entire MsgWay
   (probably not exceeding a few KBytes).  It is assumed that each
   participating network can handle large packets.  There is no need to
   legislate that all MsgWays always have the same MTU.  It is expected
   that the mapping process will automatically discover the MTU and
   disseminate it.

o  Interconnection of separate MsgWay-islands?

   MsgWay-islands could be interconnected via IP.  They could be either
   (1) interconnected by using IP as a tunnel encapsulating MsgWay, or
   (2) connected by using IP and having the MsgWay-islands independent of
   each other (treating MsgWay as a LAN).  Once IP is used over WANs, the
   high performance of MsgWay is most likely to be lost.

o  Up to how many stages of source-route make sense (rather than
   addresses)?

   This is a runtime binding.  There is no need to decide at committee
   time.  MsgWay should be able to handle both.


The MSGWAY Working Group

Mailing list information for the MSGWAY group:

   General Discussion:  MsgWay@myri.com
   To Subscribe:        MsgWay-request@myri.com
   Archive:             ftp://ftp.isi.edu/msgway/msgway.mail

Danny will work with Frank Kastenholz, one of the Internet Area
co-Directors, on a draft charter for the proposed working group and will
post it on the mailing list.

The MSGWAY Working Group is expected to conduct its work over e-mail, to
meet at IETF meetings, and possibly to have additional meetings between
IETF meetings.
Danny reported that in addition to those who participated in the 32nd
IETF BOF meeting, there are about 20 other people from academia,
government, and industry (see list below) who expressed interest in
participating in defining MsgWay.  Most of them had already participated
in two meetings discussing MsgWay (January 1995 in Utah, and March 1995
in Florida).  Most of these people expressed interest in participating in
the IETF MSGWAY Working Group.  Given that both Jon Postel and Danny
Cohen were already scheduled to be at this IETF BOF meeting, the others
were advised that their presence at this meeting was not necessary.

Among those who participated in the earlier MsgWay meetings are people
from Intel, Mercury, and Myricom who are committed to implementing and
demonstrating interoperability among Intel's Paragon, Mercury's RACEway,
and Myricom's Myrinet, using the format that will be adopted for MsgWay
by the MSGWAY Working Group.

Academia

Jon Postel       Postel@isi.edu               USC/ISI
Tony Skjellum    tony@aurora.cs.msstate.edu   Mississippi State University
Al Davis         ald@cs.utah.edu              Univ of Utah/CSD
Barney Maccabe   maccabe@cs.unm.edu           UNM/CS + Sandia
Stu Tewksbury    skt@msrc.wvu.edu             West Virginia University
Andy White       abw@lanl.gov                 Los Alamos National Lab

Government

Mike Masters     mmaster@ariel.nswc.navy.mil  Naval Surface Warfare Center
Jose L. Munoz    munoz@arpa.mil               ARPA/CSTO
Bob Parker       rparker@arpa.mil             ARPA/CSTO

Industry

Danny Cohen      Cohen@myri.com               Myricom
Chuck Seitz      Chuck@myri.com               Myricom
Craig Lund       clund@mc.com                 Mercury Computer Systems
Alan L. Pool     alp@mc.com                   Mercury Computer Systems
Bob Graybill     graybill@mmlgrf.mml.mmc.com  Martin Marietta Laboratories
Greg Chesson     greg@sgi.com                 Silicon Graphics
Glenn Ladd       gladd@msmail4.hac.com        Hughes
Lloyd Lewins     llewins@msmail4.hac.com      Hughes
Phil Sementilli  sement@igate1.hac.com        Hughes Missiles
Dave Dunning     ddunning@ssd.intel.com       Intel SSD
Paul Pierce      prp@ssd.intel.com            Intel SSD
Stephen Wheat    srwheat@ssd.intel.com        Intel SSD
Joe Brewer       JoeEBrewer@aol.com           Westinghouse
Bob Means        rwm@hnc.com                  HNC, Inc.
Marc Campbell    campbellm@aol.com            Northrop-Grumman


Schedule

A rough draft of the minimal MsgWay protocol is expected by the 33rd IETF
meeting in mid-July.

It is expected that the first interoperability demonstration will take
place no later than October 1995.  Myrinet, RACEway, and Intel's Paragon
are expected to participate in that interoperability demonstration.


Legalities

Frank Kastenholz of FTP Software brought up legal issues.  It was
suggested that Danny should check with Carl Malamud about related patents
that may be in the way of MsgWay.  (Already done.)

It was reported that Myricom has trademarked both MessageWay and MsgWay,
for free use by this activity.

By including the slides and this text in the proceedings of the IETF, we
are establishing MsgWay prior art dating from at least April 1995.