MAY Version 2.10 ================ A Distributed Processing Package for Easing the Developemnt of Algorithms with Coarse-Grained Parallelism --------------------------------------------------------- Linas Vepstas, 1993 ------------------- Intro ===== MAY is a system with a simple software API that allows programmers to develop coarse-grained parallel algorithms that execute across multiple distributed networked computers. The MAY system provides for the remote execution of programs, and a simple-to-use API that implements an asynchornous message-passing service. Built on top of this simple remote-execution/messaging service, MAY also provides a simple interface to a "processor farm". This allows for the easy development of applications that wish to have roughly equivalent pieces of work automatically dispatched to multiple processors, with the results automatically gathered back together. This processor farm implements load balancing with a variable FIFO-length monitoring algorithm. History ======= This C/UNIX/TCP-IP implementation of MAY was inspired by Larry Henson's U-MAY Distributed Processing Environment [Hen87]. U-MAY was a UNIX / PC Network implementation of Holck's PC-MAY [Hol85], which in turn was based on Bagrodia & Chandy's MAY Simulation Language [Bag84], an implementation of the inter-process communication concepts developed by Chandy & Misra [Cha81]. Many thanks to Larry Henson for introducing me to the MAY primitives and encouraging work on a TCP/IP implementation of MAY. The current implementation of MAY was placed in the public domain in January 1993 The MAY System ============== The MAY system is implemented in C for a networked UNIX environment. Message passing is implemented on top of both TCP/IP and UDP/IP. Remote process creation is supported through the remotely executing maydaemon. Currently, MAY has been ported to the following platforms: -- IBM RS/6000 w/ AIX 3.1.5 & AIX 3.2.x -- IBM RT w/ AIX 2.2.1 -- SGI Personal Iris w/ IRIX 3.1, IRIX 3.2.2 -- SGI Iris Indigo w/ IRIX 4.0.5 -- HP 9000 w/ HPUX 8.0.7 -- SparcStation 1 w/ SUNOS v4.1.1 -- DEC ULTRIX 4.2A Perhaps the easiest and quickest way to understand MAY is to look at the programmer's API. The basic API consists of five routines: -- mayCreate (program name, machine name) This routine starts execution of the process "program name" on the machine "machine name". The caller of this routine is refered to as the "parent", and the process created is called the "child". During process creation, an IP (TCP or UDP) socket is opened up between the parent and child process. The details of establishing the IP link is hidden by the API. -- mayGetParent() This routine returns the address of the parent process to the child. It is typically invoked during the child processes' initialization. -- maySend (process adress, message ID, message) This routine transmits a message to the indicated process. Messages can be assigned ID's to aid out-of-sequence message retreival at the receiving end. A message is no more than a sequence of bytes: thus, a message can be a string, it can be a C struct that has been cast into a flat list of bytes, an array of floating point or integer values, a binary dump, ... any kind of byte sequence. -- mayReceive (process address, message ID, message, timeout) This routine receives a message from the indicated sender. The message ID can be used to retreive a particular message from the collection of unreceived messages. Thus messages can be received in an order that is independent of how they were transmitted. This routine blocks until the indicated message arrives, or until the timeout period has expired. -- mayTerminate() This routine frees all internally used storage, and closes all open connections. With these basic primitives, parent processes can spawn child processes, which in turn can spawn other processes, which can then communicate with one-another. By allowing processes to be created on remote machines, multiple processing units can be used to decrease the overall execution time of of CPU-intensive algorithms. An example of such parallel speedup can be found in the concept of the processor farm. An algorithm that requires roughly similar work to be performed on many rougly equivalent pieces is ripe for parallelization. Each piece can be distributed, "farmed out", to a different processor, where the actual computations are performed. The results of each computation are then to be brought back together again, from which the final result can be assembled. The MAY system provides an interface that allows for the easy implementation of such algorithms. Again, the concepts are easiest to understand by examining the API. -- mayCreateFarm (list of machine names, list of program names) This routine starts up the indicated processes on the indicated machines. It is more than just a mere mayCreate() that allows multiple creation, because it performs vital initialization that will allow the farm to identify idle and busy machines, and parcel out work accordingly. -- mayProcFarm (list of work pieces, (*callback)() ) This routine automatically dispatches the work pieces to idle machines. The work pieces are treated as messages -- byte sequences that are sent with maySend to the "farmhand" processes. As each piece of work is completed (as each "farmhand" responds) the subroutine pointed at by (*callback)() is called, thus allowing the completed work to be gathered back together. The ProcFarm routine hides considerable technology for distributing work evenly -- a concept refered to a "load balancing". This routine monitors the status of the remote processes, determining which are busy and which are (almost) idle, and need to be given more work. It does so by monitoring the amount of queued up work at the remote machines. Each remote machine is assigned a "low water mark" and a "high water mark" -- machines whose queues have gotten below the low water mark are fed more work, and machines whose queues have gotten above the high water mark are left alone. The low and high water marks are kept individually for each machine, and are adjusted dynamically. That is, if a machine keeps repeatedly draining below its low water mark, both marks are raised, so that the possiblity of the machine going idle is minimized. Similarly, the marks are lowered for a machine that is being excessively slow in responding. Thus, heterogeneous machine can be incorporated in the farm -- faster machine simply end up doing more work, while slower machines are not overburdened with impossibly large quantities of work. In addition to high and low water marks, the processor farm also has a "panic mode" -- if it detects that any machine has gone idle, it drops all other activities and feeds that machine more work. The processor farm also has a "backlog mode" -- if it detects that too many completed work items have accumulated for which the callback routine hasn't been called back, it goes ahead and processes these. The "panic" and "backlog" modes are primarily useful when processing small, rapidly completed work requests, where the work requests are completed as fast or faster than the local, "master" process can dispatch and collect back up. Additional discussion of the farm algorithms can be found below, although the interested reader is refered directly to the source code in "pfarm.c", which is heavily commented. Incidentally, the code is instrumented to gather statistics about response times, load distributions and queue lengths. Instrumentation can be disabled by undefining #define TRACE. Guide to the Remainder of this Document ======================================= In the remainder of this document, we provide additional details of the API, including full manual pages, installation and use guidelines, and a simple source code example (additional examples can be found in the may/examples directory). This is preceeded by a discussion of some of the strengths and limitations of network computing, and is followed by a discussion of some of the pitfalls of the current implementation, and a list of interesting potential enhancments that could be made. A bibliography, list of trademarks, and authors blurb can be found at the end. The reader is encouraged to skip directly to the section of greatest interest. The UDP/IP Library ================== MAY is implemented in both a UDP/IP and a TCP/IP form. The UDP routines are meant to used together, and are not meant to be mixed with the TCP routines. This limitation could potentially be removed by adding additional routines to MAY. UDP/IP, the User Datagram Protocol/Ineternet Protocol, has the following characteristics: -- UDP provides a more direct access to the underlying network therefore, it should run more efficently and with lower overhead. -- UDP messages are limited to 1K bytes in length. -- Delivery of UDP datagrams are not gaurenteed -- that is, they can be lost on the network -- discarded at busy nodes, discarded from overflowing device driver buffers. From pratical experience, the percentage of undelivered datagrams can range from very low to astronomical. -- UDP messages are not sequenced -- that is, there is no guarentee that they will arrive in the same order as sent. These characteristics are shared by the UDP MAY library. The following routines compromise the UDP library: -- mayInitUdp -- mayCreate -- mayGetParentUdp -- maySend -- mayReceive -- mayTerminate -- mayCreateFarm -- mayProcFarm Note that only two of the eight routines are unique to the UDP library; the architecture of the remaining routines is such that they apply to TCP MAY as well. The TCP/IP Library ================== TCP/IP has the following characteristics, which are shared by the TCP MAY library: -- TCP provides a reliable, connection-based, sequenced communications medium. Although TCP introduces greater overhead into path lengths and delay times, TCP does deploy sophisticated technology to ease network congestion. -- TCP messages are unlimited in length (although the current implementation of MAY limits messages to a 2048 byte length. This limit can be changed with a #define.) -- TCP messages are guarenteed to be delivered -- TCP messages are guarenteed to be sequenced -- that is, they will arrive in the same order as sent. The following routines implement TCP MAY: -- mayInitTcp -- mayCreate -- mayGetParentTcp -- maySend -- mayReceive -- mayTerminate -- mayCreateFarm -- mayProcFarm As before, use of the two routines mayInitTcp and mayGetParentTcp distinguish a TCP MAY network from a UDP MAY network. Security ======== MAY comes with a large and dangerous security hole built into it. If you do not understand this hole and how to manage it, do NOT use MAY. The author assumes no liability for any damages or loss in connection with the use of MAY. You are hereby forwarned, and assume all risks. Description of Security Hole ---------------------------- The MAY system allows programs to be started and run remotely on any system on which there is a running Maydaemon. In the current design of MAY, the programs are started with, and run with the same priveledges, access rights, and authority as the Maydaemon. Furthermore, in the current design, there is no attempt made to verify the authority or access rights of any remote requests presented to the maydaemon. The maydaemon will attempt to honour any request presented to it. What does this mean? If you run the Maydaemon with root priveledges, and a malicious or careless user asks the Maydaemon to run /bin/sh, and feeds the string "cd /; rm -r *" to it, then it WILL happen: every file in the file system will be erased. If the user requests that "/etc/shutdown" be run, then it will run, and your system WILL shutdown. Furthermore, it takes little or no brains to figure out how to do this, based on the MAY documentation & example programs. So -- be forewarned. Therefore, you do NOT want to run MAY with root priveledges in a hostile environment (such as computer systems accessible to undergraduates, or on machines not protected from the internet). Note that starting the maydaemon from a boot script (such as /etc/rc) or from the inetd (/etc/inetd) will automatically give the maydaemon root priveledges. What To Do About It ------------------- Create a separate user account for the maydaemon, making sure that it has no group priveledges that you wouldn't want a general user to have. This should provide sufficient security for most environments. The author beleives that running MAY in this fashion does not introduce any security holes that are not already present in your machine. Apologies --------- The author regrets this short-fall in today's era of security consiousness. The author welcomes any suggestions for how to improve security, and is interested in donations of code implementing security measures for MAY -- e.g. a Kerberos based MAY. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Notes: To be Done: (This documentation is incomplete, and the following items need to be addressed.) discussion of max length of packets for UDP discussions of network bandwidth discussion of load balancing compare to LINDA discussion of how farm workers should be implemented as polling loops. discussion of how select could be used to gether data from other pipes. discussion of blocking/non-blocking Point out that in current design, there is no way for TCP siblings to communicate, and what could be done to fix this. This means no cyclic dependency graphs. However, UDP sibling can send messages to one-another (provided they obtain the procper addresses of each other ...) This system does not support byte swapping. There is lack of shutdown mechanism in the farm. mayCreate hostname string could also be turned into a class name, so that a "dont care what host" creation a takes effect. Weird Bugs: seems like maySend appears to block in send() -- because TCP send() subroutine blocks if remote tcp queues are full !!?? This might not be a bug, but is normal tcp operation, resulting from TCP windowing & pacing mechanism!!?? Enhancement: maydaemon should mail off error messages and errno's back to the client whenever possible .... Discuss how to implement following: A machanism for transmitting pointers over the wire, by creating a wrapper for malloc (to keep track of relative offsets, & blocksizes) and a translating function that converts pointers from one address space to another. Discuss how to implement following: A mechanism to maintain synchronization of multiple copies of a "global" variable. How to avoid cross-network spin-locks. Talk about following: One problem with the design of BSD sockets is the lack of query to see how full the message buffers are. This prevents ability to implement FIFO pacing & load balancing without having to read & empty the queues. This, and extra copy is introduced into the system, which hurts performance in high-bandwdith/high-rate systems. Is this fixed with OSI or maybe some UNIX variants support this & I don't know it ? Discuss alternate proc farm algorithms: --- attempt to hold rate of message arrival constant Currently, the following have been implemented using MAY: a Mandelbrot Fractal explorer the radiosity algorithm -- (3D PhotoRealistic rendering) rayshade -- a popular, publically available ray tracer Goals ----- The MAY system is intended to serve as a simple, portable and practical API for developing distributed, parrallel algorithms. both as a research vehicle and as a practical API that Have BIGBIGBIG discussion of how may can represent a security hole and should not be used in hostile environments, and how it could be made secure (encrypt, kerberos, other). XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ftp Sites ========= Europe: ftp.ifi.unizh.ch (130.60.48.8) --- pub/may-2.8.tar.Z (CS Dept. University of Zurich, Switzerland -- courtesy of Silvano Maffeis) US: ??? Revision History ================ Version 1.00 -- May 1989 -- basic implementation of UDP-based MAY Version 1.50 -- April 1990 -- addition of TCP interface Version 2.00 -- May 1990 -- addition of processor farm interface Version 2.07 -- January 1993 -- bug fixes, improved doumentation Version 2.10 -- March 1993 -- bug fixes, improved doumentation Trademarks ========== UNIX is a tradmark of Unix Systems Laboratories. IBM is a trademark of International Business Machines. RISC System/6000 is a trademark of International Business Machines. RT-PC is a trademark of International Business Machines. AIX is a trademark of International Business Machines. Irix is a tradmark of Silicon Graphics Computer Systems. SparcStation 1 is a tradmark of Sun Microsystems. ULTRIX is a tradmark of Digital Equipment Corporation. Bibliography ============ [Bag84] Bagrodia, Rajive. "A Micro-kernel for Distributed Applications", XDepartment of Computer Sciences, The University of Texas at Austin. June 1983. [Cha81] Chandy, K. M. and J. Misra. "Proofs of Network Processes", IEEE Transactions on Software Engineering, Vol. 7, No. 4, July 1981. [Hen87] Henson, Larry William. "U-MAY, A Distributed Programming Interface", Master's Thesis, University of Texas at Austin, December 1987. [Hol85] Holck, Timothy M. "A Disributed Programming Language Implementation for Networks of Workstations", Master's Thesis, The University of Texas at Austin, December 1985. The Author ========== This implementation of MAY was developed by Linas Vepstas during 1989-1990. Linas Vepstas holds a PhD in theoretical physics from SUNY at Stony Brook, and is currently employed by IBM Advanced Workstations and Systems Division, developing graphics architectures. Linas' other areas of active interest include chaotic systems and artifical life. Linas can be currently reached at: 1518 Enfield Road Austin, Texas 78703-3424 (Home) 1-(512)-499-8246 (Work) 1-(512)-838-1116 linas@innerdoor.austin.ibm.com linas@austin.ibm.com ~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@~@~%_@