MAY Version 2.10
================

A Distributed Processing Package for Easing the Development
of Algorithms with Coarse-Grained Parallelism
---------------------------------------------------------

Linas Vepstas, 1993
-------------------

Steps To Installing MAY
=======================

1) Find a home directory for MAY.  This directory needs approximately
   1 MB of space for the source, examples, documentation, libraries
   and binaries.  Since executables will have to be transmitted to
   remote machines, it is usually most convenient to do so by means
   of a network-capable file system.  Thus, the home directory should
   be located on a network-mountable mount point.

2) Edit the Makefiles and uncomment the appropriate stanzas for your
   machine.  Be sure to indicate the home directory path correctly.
   Then run make.

3) Edit /etc/services to add the maydaemon service port number (see
   "Installing the Maydaemon" below for details).  You will need root
   privileges to do this.

4) Start the maydaemon.  No command line flags are needed (or
   supported).

5) cd to examples/primer, and explore that as you read the example
   program below.

6) Have Fun!

An Example Program
==================

Below we list an example program that provides the most basic
illustration of how to use MAY.  First, let's take a look at the
"slave program".  It does six things:

1) Initializes MAY
2) Sends a "greeting" back to its "parent" (or "master")
3) Waits to receive a structure of integer and floating point numbers
4) Upon receipt, performs a few simple calculations on these numbers
5) Transmits a reply containing the answers back to the parent
6) Terminates

may/examples/primer/slave.c
===========================

#include <math.h>
#include "may.h"

int main ()
{
   struct mayProcId *my_addr, *rents_addr;
   struct msghead getit;
   int len;

   struct quiz {
      int ia;
      int ib;
      int ians;
      float a;
      float b;
      float sum;
      double da;
      double db;
      double dasum;
   } quizling;

   my_addr = mayInit ();
   rents_addr = mayGetParent ();

   maySend (rents_addr, 685, 1, "greetings from Linas",
            sizeof ("greetings from Linas"));

   /* below, we receive message type 43, which is the "quiz" structure */
   getit.msgptr = (char *) &quizling;
   len = sizeof (struct quiz);
   mayReceive (NULL, 43, 0, 30, &getit, &len);

   /* perform some CPU intensive calculations */
   quizling.ians = quizling.ia + quizling.ib;
   quizling.sum = sqrt (quizling.a);
   quizling.dasum = 2.0 * acos (0.0);

   /* send the results of the calculations back */
   len = sizeof (struct quiz);
   maySend (rents_addr, 43, 22, (char *) &quizling, len);

   mayTerminate ();
   return 0;
}

The Parent
==========

Except for the contortions one needs to go through with C to get stuff
packed into structures just right, the "child" (or "slave") program
should be easy to understand.  Now, let's take a look at the "master",
which controls and makes use of the "slave".  It goes through the
following steps:

 1) Initializes the MAY subsystem
 2) Starts up a local copy of the "slave"
 3) Waits to receive a greeting from the local "slave"
 4) Starts up two different remote "slaves"
 5) Sends work to first remote slave
 6) Waits to receive completed work from first slave
 7) Receives greeting message from first slave
 8) Sends work to second remote slave
 9) Waits to receive completed work from second slave
10) Receives greeting message from second slave
11) Terminates

As with the slave, the C code looks messy because C has no native
support for constructing network-transparent structures.

may/examples/primer/master.c
============================

#include <stdio.h>
#include "may.h"

int main ()
{
   /* these are the addresses of the processes we will send to */
   struct mayProcId *my_addr, *childs_addr, *ibmrt_addr, *sgipi_addr;

   /* this is the message header (required) */
   struct msghead getit;
   char buff [200];
   int len;

   /* quiz is one of the messages we will be sending and receiving */
   struct quiz {
      int ia;
      int ib;
      int ians;
      float a;
      float b;
      float sum;
      double da;
      double db;
      double dasum;
   } quizling;

   my_addr = mayInit ();

   /* first, we create a process on this machine, and wait for it */
   /* to send us something (which we print) */
   childs_addr = mayCreate (NULL, "/src/linas/may/demos/slave");
   len = sizeof (buff);
   getit.msgptr = buff;
   mayReceive (childs_addr, 1, 0, 3, &getit, &len);
   printf (" This is what I heard: %s \n", buff);

   /* Now, we create processes on two remote machines, one being an */
   /* IBM RT and the other an SGI Personal Iris. */
   ibmrt_addr = mayCreate ("thehill", "/u/linas/src/may/demos/slave");
   sgipi_addr = mayCreate ("eldorado", "/usr/people/linas/may/demos/slave");

   /* Here, we build the message that we are going to send */
   quizling.ia = 5;
   quizling.ib = 6;
   quizling.a = 4.0;
   quizling.b = 3.0;
   quizling.da = -2.718281828e-8;
   quizling.db = 3.14159265358979;
   len = sizeof (struct quiz);
   maySend (ibmrt_addr, 43, 22, (char *) &quizling, len);

   /* Here, we receive two messages -- one being the structure "quiz" */
   /* which we (arbitrarily) call message type 43, and another */
   /* message, which is an ASCII string (message type 685) */
   getit.msgptr = (char *) &quizling;
   len = sizeof (struct quiz);
   mayReceive (ibmrt_addr, 43, 0, 3, &getit, &len);

   len = sizeof (buff);
   getit.msgptr = buff;
   mayReceive (ibmrt_addr, 685, 0, 3, &getit, &len);

   printf (" This is what I heard: %s \n", buff);
   printf (" I heard that %d + %d = %d \n",
           quizling.ia, quizling.ib, quizling.ians);
   printf (" and I heard sqrt (%f) = %f \n", quizling.a, quizling.sum);
   printf (" and I heard that %.18g = %.18g \n",
           quizling.db, quizling.dasum);

   /* Here, we build and send and receive basically the same stuff as */
   /* above, but to a different machine (the Personal Iris) */
   quizling.ia = 4;
   quizling.ib = 3;
   quizling.a = 2.0;
   quizling.b = 3.0;
   quizling.da = -2.718281828e-8;
   quizling.db = 3.14159265358979;
   len = sizeof (struct quiz);
   maySend (sgipi_addr, 43, 22, (char *) &quizling, len);

   getit.msgptr = (char *) &quizling;
   len = sizeof (struct quiz);
   mayReceive (sgipi_addr, 43, 0, 3, &getit, &len);

   len = sizeof (buff);
   getit.msgptr = buff;
   mayReceive (sgipi_addr, 685, 0, 3, &getit, &len);

   printf (" This is what I heard: %s \n", buff);
   printf (" I heard that %d + %d = %d \n",
           quizling.ia, quizling.ib, quizling.ians);
   printf (" and I heard sqrt (%f) = %f \n", quizling.a, quizling.sum);
   printf (" and I heard that %.18g = %.18g \n",
           quizling.db, quizling.dasum);

   mayTerminate ();
   return 0;
}

Programming Hints
-----------------

When trying to transmit structures containing double precision data as
above (and sometimes float, long int, int, and even short) between
machines of different architectures, there is one bug that will likely
jump out and bite you.  This is called word alignment.  Some machines
require that double precision numbers start on word boundaries, others
that they start on even numbered addresses, and so on.  Thus, the same
structure, when compiled by different compilers on different machines,
may end up being of different sizes.  When shipping these structures
around with MAY, as above, one may then find that the elements of the
structure are misaligned.  This problem is easy to spot: you transmit
good data, and you receive seeming garbage.  Chances are, it's
misalignment.  Fix this by adding padding bytes, as appropriate, or by
rearranging the structure.

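One way to check for the alignment problem described above is to print
the size of a message structure and the offset of its first double on
each machine, then compare the numbers.  The sketch below is only an
illustration: the two structures and both function names are
hypothetical and are not part of MAY.  It relies only on the standard C
offsetof macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical message layout: three ints followed directly by doubles.
 * A compiler may or may not insert hidden padding before `da`. */
struct quiz_unpadded {
    int ia, ib, ians;
    double da, db, dasum;
};

/* Same members plus one explicit int of padding, so that `da` sits
 * right after four ints with no room left for hidden padding on
 * common machines. */
struct quiz_padded {
    int ia, ib, ians, padding;
    double da, db, dasum;
};

/* Bytes of hidden padding the compiler placed before `da` in the
 * unpadded layout (0 if none). */
size_t hidden_padding_before_da(void)
{
    size_t off = offsetof(struct quiz_unpadded, da);

    assert(off >= 3 * sizeof(int));   /* members never overlap */
    return off - 3 * sizeof(int);
}

/* Print the numbers worth comparing between two machines. */
void print_quiz_layouts(void)
{
    printf("unpadded: size %lu, da at offset %lu, hidden padding %lu\n",
           (unsigned long) sizeof(struct quiz_unpadded),
           (unsigned long) offsetof(struct quiz_unpadded, da),
           (unsigned long) hidden_padding_before_da());
    printf("padded:   size %lu, da at offset %lu\n",
           (unsigned long) sizeof(struct quiz_padded),
           (unsigned long) offsetof(struct quiz_padded, da));
}
```

If two machines report different sizes or offsets for the unpadded
layout, structures sent between them will arrive scrambled.  The padded
layout is the one more likely to report the same numbers everywhere.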
For example, the following is likely to cause problems:

struct quiz {
   int ia;
   int ib;
   int ians;
   double da;
   double db;
   double dasum;
} quizling;

This is easily fixed by adding padding:

struct quiz {
   int ia;
   int ib;
   int ians;
   int padding;
   double da;
   double db;
   double dasum;
} quizling;

These precautions do not need to be taken when communicating between
like machines.

Unaddressed problems: big-endian versus little-endian byte order.

More Hints
----------

Filling up the receive buffer and purging it -- how to.

Common Errors:
==============

But This Piece of Junk Doesn't Work!
====================================

If the example programs don't work out of the box, check for these
common problems:

--) Cryptic message about being unable to connect to remote host

    1) Is the daemon running?  If not, see the section below on how
       to start it.
    2) Are port numbers in /etc/services consistent for all machines?

--) Daemon prints error message:

    maydaemon error: getservbyname: Bad file number

    Ha! You forgot to add entries to your /etc/services file!  See
    below for details.

--) Daemon prints error message:

    maydaemon error: getservbyname: Permission denied
    maydaemon error: getservbyname: Invalid argument

    Ha! Your site is probably running YP, and you failed to get YP to
    understand that changes to /etc/services have occurred.  See below
    for details.

--) Things seem to work, but lots of error messages get printed.

    MAY is very verbose with regard to warnings and errors.  If you
    think everything is working OK, but error messages get printed
    anyway, chances are it's because some remote machine is not in the
    state you think it's in.  Typically, some remote process has
    exited prematurely, and the local process is attempting to write
    to it.

--) primer/slave.tcp program prints the following error message:

    > Hello, world!
    > mayGetId: error: unable to open file /tmp/may17230
    > : A file or directory in the path name does not exist.
    > mayGetParentTcp: error on get socket id: A file or directory in the path
    > name does not exist.
    > maySend error: sendto: A file descriptor does not refer to an open file.

    Ha! You didn't read the instructions!  You should not be trying to
    start the slave program up by hand.  It will be started
    automatically for you, by the daemon, at the request of the
    master.  Just run the master program; everything else should run
    fine.

--) primer/master.tcp program prints the following error message:

    > parent about to remote maycreate
    > mayCreate: Error connecting to daemon stream socket on host voodoo:
    > : A remote host refused an attempted connect operation.
    > mayCreate: Error connecting to daemon stream socket on host eldorado:
    > : A route to the remote host is not available.
    > parent about to exit

    Ha! You didn't read the instructions!  You need to have the mayd
    running on the remote hosts "voodoo" and "eldorado".  Also, be
    sure you have put a copy of slave.tcp where the daemon can find
    it.  If you don't understand how to do this, you should take some
    time and read the source code for master.tcp, and try to
    understand what it does.  If you don't/can't do this, you
    shouldn't be using MAY.

--) The maydaemon prints the following error message:

    > maydaemon: maySpawnTcp: error: execv: No such file or directory

    Ha! Now we are making progress!  This one is simple!  The
    maydaemon is trying to start the slave (child) program
    automatically for you.  But ... it can't find the slave!  Where is
    it looking for the slave?  It's looking where the master (parent)
    told it to look with the mayCreate routine:

    mayCreate(machine, path)

    The pathname "path" has to be the filename of the slave on the
    machine named "machine" (and, of course, a mayd must be running on
    the machine "machine").

The maydaemon
=============

The daemon receives messages from the mayCreate routine to start up a
user process.  The daemon does some house-keeping at that time, but
does not get involved in the communications between the creating and
created process.

What the maydaemon does not do (and maybe should do):

-- Clean up -- if you crash your process and fail to notify your
   children, they will blissfully keep on executing (or waiting for
   input, or whatever) on any machines you may have started them on.
   Other users will get mad at you for glomming up the process table.
   Therefore, user beware: write child processes so that they look
   for the MAY_CLOSED_CONNECTION error returned from mayReceive(),
   and exit when a connection has shut down.  Better yet, make sure
   they exit after some period of inactivity has passed.

This is one of the most serious drawbacks MAY has.  We're working on
it.  In the meantime, the user should use the Timeout option with
mayReceive, and write code that automatically terminates whenever a
message has not been received in a reasonable amount of time.

New and Improved
----------------

-- Ping function: internally will ping other machines for MAY hosts,
   based on /etc/hosts (which should not be too big).
-- A way to designate a "stateless" child.
-- Need a "set max buffer" call and a "purge buffer" call.

Advanced Programming Hints
--------------------------

Writing good programs that make use of a large number of processors,
distribute work efficiently among them, and take into account the fact
that some of the remote machines are down or that messages are getting
lost, is not trivial.  Indeed, people still write PhD theses on such
topics.

A more advanced programming example, which incorporates a simple
algorithm for load balancing, and shows how to be fault tolerant with
respect to message loss, is available from the authors upon request.
(It is the Mandelbrot explorer, which is based on the fact that the
Mandelbrot set takes a large number of cycles to generate, and is also
easily partitioned into a large number of repetitive tasks which are
independent of one another and can be performed in any order.)

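The advice above, that a child should exit when its connection closes
or after a period of inactivity, can be sketched as a small receive
loop.  This is an illustration only and does not call MAY: the status
codes, serve_until_idle and scripted_receive are all hypothetical
names.  In a real child, the callback would wrap mayReceive with a
timeout and translate its result (for instance MAY_CLOSED_CONNECTION)
into one of the three codes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes: stand-ins, NOT the values from may.h. */
enum recv_status { RECV_OK, RECV_TIMEOUT, RECV_CLOSED };

/* One receive attempt with a timeout.  A real child would wrap its
 * mayReceive call here; this signature is an assumption. */
typedef enum recv_status (*recv_fn)(void *ctx);

/* Handle messages until the connection closes or until max_idle
 * receive attempts in a row time out.  Returns the number of
 * messages handled, after which the caller should terminate. */
int serve_until_idle(recv_fn receive, void *ctx, int max_idle)
{
    int handled = 0;
    int idle = 0;

    assert(receive != NULL && max_idle > 0);
    while (idle < max_idle) {
        switch (receive(ctx)) {
        case RECV_OK:       /* got work: do it, reset the idle count */
            handled++;
            idle = 0;
            break;
        case RECV_TIMEOUT:  /* nothing arrived within the timeout */
            idle++;
            break;
        case RECV_CLOSED:   /* parent is gone: stop right away */
            return handled;
        }
    }
    return handled;
}

/* A scripted receiver for trying the loop without a network.  Once
 * the script runs out it reports timeouts forever. */
struct script {
    const enum recv_status *steps;
    int count;
    int pos;
};

enum recv_status scripted_receive(void *ctx)
{
    struct script *s = ctx;

    if (s->pos >= s->count)
        return RECV_TIMEOUT;
    return s->steps[s->pos++];
}
```

Counting consecutive timeouts, instead of treating the first one as
fatal, lets a child survive a briefly busy parent while still
guaranteeing that an orphaned child eventually exits.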
Installing the Maydaemon
========================

To be able to use MAY, you must have the maydaemon running on every
machine on which you intend to use MAY.

MAY uses a "well-known port number" to query and establish connections
between local processes and the remote daemon.  This port number is
obtained by querying the /etc/services file.  Two lines must be added
to this file:

maydaemon       3856/udp        # MAY distributed processing
maydaemon       3856/tcp        # MAY distributed processing

You need system administration privileges to modify this file.  You
should be able to easily convince your system administrator that this
is a harmless modification.  Security problems arise only if you
attempt to run the daemon with root privileges.  Modifying this file
does NOT give the maydaemon root privileges.

If your location runs YellowPages (YP), then you have to make sure
that YP knows about these new service port numbers.  After updating
/etc/services, you should cd /var/yp and run make.  /var/yp should
have a Makefile in it that will get YP to recognize the new port
numbers.  Some machines may have /var/yp in a different location;
consult with your system administrator if you can't find this.  (When
YP is running, the system routine getservbyname() no longer goes to
/etc/services to get port numbers.  Thus, a hallmark of failing to
update the YP tables is a cryptic message involving "getservbyname:
Permission denied" or some such.)

To automatically start up the maydaemon on system reboot, edit your
system's /etc/rc and/or /etc/rc.tcpip files appropriately.  Remember
to put an & at the end of the line.  Alternatively, you may wish to
have the maydaemon lie dormant until needed; this can be done by
having it started by the inetd daemon, by appropriately modifying the
/etc/inetd file.  Either way of starting the maydaemon will give it
(and thus, MAY child processes) root privileges.

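The /etc/services (or YP) setup described above can be checked from a
small C program before starting the daemon.  The sketch below uses the
standard getservbyname() routine, which this document says MAY relies
on.  The function names lookup_port and check_maydaemon_service are
hypothetical, and treating differing udp and tcp ports as a problem is
an assumption based on the two lines shown above.

```c
#include <arpa/inet.h>   /* ntohs */
#include <assert.h>
#include <netdb.h>       /* getservbyname, struct servent */
#include <stdio.h>

/* Return the port registered for service/proto in host byte order,
 * or -1 if the services database has no such entry. */
int lookup_port(const char *service, const char *proto)
{
    struct servent *entry;

    assert(service != NULL && proto != NULL);
    entry = getservbyname(service, proto);
    if (entry == NULL)
        return -1;
    /* s_port is stored in network byte order */
    return (int) ntohs((unsigned short) entry->s_port);
}

/* Report whether both maydaemon entries are visible on this machine.
 * Returns 1 if both exist and name the same port, 0 otherwise. */
int check_maydaemon_service(void)
{
    int udp = lookup_port("maydaemon", "udp");
    int tcp = lookup_port("maydaemon", "tcp");

    if (udp < 0 || tcp < 0) {
        printf("maydaemon: entry missing (udp %d, tcp %d)\n", udp, tcp);
        return 0;
    }
    if (udp != tcp) {
        printf("maydaemon: udp port %d differs from tcp port %d\n",
               udp, tcp);
        return 0;
    }
    printf("maydaemon: port %d registered for udp and tcp\n", udp);
    return 1;
}
```

Running the check on every machine that will host a maydaemon also
shows whether the port numbers agree across machines, and whether YP
has picked up the change.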
System Tuning Considerations
----------------------------

If you are really transmitting lots of messages (thousands) in a brief
period of time (seconds), you may discover that you are jamming up
certain machines with network traffic.  Although one should always
write algorithms that generate the minimum possible amount of network
traffic, some amount is nonetheless unavoidable.  Thus, you may want
to increase your buffer pool size.  In AIX v.2.2.1 you do this with
the "devices" command (or you can edit /etc/ddi/token by hand -- just
make SURE you edit the right stanza).  Go to the token0 stanza and set

   * Number of buffers in buffer pool
   nobibp   1000  (or more)
   * Number of buffers on device ring
   nobodr   150   (or more)
   * Number of buffers on SLIH (Second Level Interrupt Handler) ring
   norbosr  150   (or more)
   * SLIH ring buffer threshold
   srbt     60    (or more)

Security
========

MAY comes with a large and dangerous security hole built into it.  If
you do not understand this hole and how to manage it, do NOT use MAY.
The author assumes no liability for any damages or loss in connection
with the use of MAY.  You are hereby forewarned, and assume all risks.

Description of Security Hole
----------------------------

The MAY system allows programs to be started and run remotely on any
system on which there is a running maydaemon.  In the current design
of MAY, the programs are started with, and run with, the same
privileges, access rights, and authority as the maydaemon.
Furthermore, in the current design, there is no attempt made to verify
the authority or access rights of any remote requests presented to the
maydaemon.  The maydaemon will attempt to honour any request presented
to it.

What does this mean?  If you run the maydaemon with root privileges,
and a malicious or careless user asks the maydaemon to run /bin/sh,
and feeds the string "cd /; rm -r *" to it, then it WILL happen: every
file in the file system will be erased.  If the user requests that
"/etc/shutdown" be run, then it will run, and your system WILL shut
down.  Furthermore, it takes little or no brains to figure out how to
do this, based on the MAY documentation and example programs.  So --
be forewarned.

Therefore, you do NOT want to run MAY with root privileges in a
hostile environment (such as computer systems accessible to
undergraduates, or on machines not protected from the internet).

Note that starting the maydaemon from a boot script (such as /etc/rc)
or from the inetd (/etc/inetd) will automatically give the maydaemon
root privileges.

What To Do About It
-------------------

Create a separate user account for the maydaemon, making sure that it
has no group privileges that you wouldn't want a general user to have.
This should provide sufficient security for most environments.  The
author believes that running MAY in this fashion does not introduce
any security holes that are not already present in your machine.

Apologies
---------

The author regrets this shortfall in today's era of security
consciousness.  The author welcomes any suggestions for how to improve
security, and is interested in donations of code implementing security
measures for MAY -- e.g. a Kerberos based MAY.

----------------------- END OF FILE ---------------------------