MAY Version 2.10 ================ A Distributed Processing Package for Easing the Developemnt of Algorithms with Coarse-Grained Parallelism --------------------------------------------------------- Linas Vepstas, 1993 ------------------- Programmer's Reference ====================== mayCreate ========= Starts a remote process Syntax: ------- struct mayProcId * mayCreate (requested_host_name, requested_program_name requested_connection_type); char *requested_host_name; char *requested_program_name; int requested_connection_type; Arguments: ---------- -- requested_host_name pointer to a NULL terminated character string. It can be a NULL pointer; this indicates that the process is to be started on the local machine. The host name is assumed to be a valid internet hostname. Standard name resolution techniques are used to locate the host (gethostbyname(), which queries nameservers, examines /etc/hosts, etc.) -- requested_program_name pointer to a NULL terminated character string. This must be a valid, full file pathname to the program to be executed, including possible arguments and flags. For example: /u/johniac/myprog -ac or /bin/rm -r *. (execv() is used to run remote program.) -- requested_connection_type takes one of two values: MAY_CREATE_TCP or MAY_CREATE_UDP. Depending on which value is passed, the child proccess is created with the indicated connection type. Returns: -------- Upon return, mayCreate returns a pointer to the "address" of the child process. This address can be used with the maySend command to send messages to the child. The memory area for this address is allocated by MAY and is unique (you do not need to copy it to use it). You should not modify its contents or free it. Description: ------------ The mayCreate routine starts the indicated process on the indicated machine. A maydaemon must be running on the indicated machine. The file path name must indicate a valid file in the remote machine's file system. This file must have execute priviledges (chmod ugo+x ). If a maydaemon on the remote machine is not responding, mayCreate will return with NULL after a twenty second timeout. Error Returns: -------------- NULL is returned if the create failed. An error message is printed on stderr. Ideas for Future Enhancements: ------------------------------ 1) In the current implementation, mayCreate() does not handle dotted-decimal notation IP addresses. Such a capability should be added in case a network nameserver is unable to located a named machine. 2) Error messages are currently printed to stderr. A more useful and elegant error handling policy would use the errno mechanism provided by ANSI C (or a similar service), so that users of MAY could design more robust error recovery proceedures. 3) Currently, mayCreate() only offers a synchronous creation service. That is, mayCreate does not return until the indicated program has started on the remote machine, and an acknowledgement has been returned. This can introduce excessive overhead into the system if multiple mayCreates need to be performed. An asynchronous interface could be implemented by providing mayCreateRequest() and mayCreateReply() routines. This could, in fact, be done quite esily by taking the existing mayCreate() code and splitting it up into two pieces. mayCreate() itself could then be trivially implemented as a mayCreateRequest() followed by a mayCreateReply(). 4) Find, document, and provide an interface that would allow the creation timeout to be set by a MAY user. mayCreateFarm ============= Initializes a processor farm Syntax: ------- void mayCreateFarm (num_machines, machine_names, file_names, connection_type) int num_machines; /* number of available processors */ char *machine_names[]; /* array containing mamachine names */ char *file_names[]; /* array containing file paths */ int connection_type; /* MAY_CREATE_UDP or MAY_CREATE_TCP */ Arguments: ---------- -- num_machines The number of machine and file names in the arrays machine_names, file_names. -- machine_names a pointer to an array of strings naming the machines on which processes are to be started. Machine names must be valid internet machine names. -- file_names a pointer to an array of strings specifying full file pathnames on the indicated machines. The files must be executable. The strings may include command-line flags. -- connection_type takes one of two values: MAY_CREATE_TCP or MAY_CREATE_UDP. Depending on which value is passed, the child proccesses are created with the indicated connection type. Description: ------------ This routine performs initialization for the mayProcFarm routine. Error Returns: -------------- None. Failure to connect to remote daemons is indicated by error messages printed to stderr. Internal Usage: --------------- Although this routine is intended for use with the mayProcFarm routine, it can be used as a convenience for performing multiple mayCreates. This usage is discouraged for the casual user, but can be useful for anyone intending to enhance the MAY system. A linked list ring of the resultant creates are hung off of the global variable current_farm. This variable points to the following structures: extern struct mayFarm *current_farm; struct mayChild { struct mayChild *next; /* pointer to next child */ char *program_name; /* name of child program, incl. arguments */ struct mayProcId *rpaddr; /* network address of child */ int alive; /* TRUE if machine is responding */ }; struct mayFarm { int num_workers; /* number of workers in the ring */ int num_active_workers; /* number of active workers in the ring */ struct mayChild *workers; /* pointer to ring */ }; Ideas for Future Enhancements: ------------------------------ 1) The current implementation of mayCreateFarm() is written on top of mayCreate(). Since mayCreate() does not return until either an acknowledgement of the remote create has been received, or a timeout has occured, a considerable amount of time can be lost in this routine. In particular, timeouts are very expensive. Rather than performing serial creates, a parallel version could be developed that would first send out a whole series of create requests, and then gather up the create replies. Such a change would have to understand and use the internal CreateRequest and CreateReply messages. Perhaps the most elegant way of implementing this would be to add a mayCreateRequest() and a mayCreateReply() routine to the MAY interface, and build mayCreateFarm on top of these two routines. Thus, all users could obtain the benefits of an asynchronous create mechanism. 2) the num_machines argument is not really required. It could be eliminated by having mayCreateFarm() look for a NULL pointer as the last entry in the array machine_names. mayProcFarm =========== Dispatch work requests to remote processes and gather up completed work Syntax: ------- int mayProcFarm (work, num_pieces, piecesize, stride, callback) char * work; /* pointer to array of pieces to be done */ int num_pieces; /* number of pieces to be done */ int piecesize; /* size (in bytes) of each piece */ int stride; /* spacing (in bytes) between subsequent pieces */ int ((*callback)()); /* routine called back every time a piece is done */ Arguments: ---------- -- work pointer to an array of work items to be distributed between processors participating in the processor farm. These will be sent as messages to each member of the farm. -- num_pieces The number of work pieces in the array. -- piecesize The size of each piece of work, in bytes. -- stride The spacing, in bytes, between pieces of work. This allows work pieces ot be embedded in a larger array; thus stride need not be equal to piecesize. -- callback Pointer to a subroutine that will be called back as each pice of work is completed. This routine will be called back with a pointer to the returned data as an argument. Description: ----------- The mayProcFarm() routine implements a processor farm for distributing work for parallel execution across multiple processors. A "work item" consists of two parts: a "work request", which is a binary message that will be transmitted to remote worker processes, and a "work reply", which is the reply message that a worker sends back upon completion of a work request. This routine accepts a pointer to an array of work requests, and automatically dispatches these requests to remote workers. It also automatically gathers up replies, calling a callback routine, allowing each reply to be processed. This routine attempts to minimize the actual, elapsed time to finish processing all work requests. It does so by implementing several algorithms for load balancing, for fault-tolerant distribution of the work, and a fast-feed mode for exceptionally fast processors (or exceptionally small jobs). These algorithms assume that work items can be processed in arbitrary order, that work items are roughly equal in size (within an order of magnitude, or so), and that remote workers are roughly equivalent in turn-around time (to within an order of magnitude, or so). In the attempt to minimize the overall elapsed time, the same work request may be dispatched to several machines, psossibly resulting in work duplication. However, the callback routine will be called only once once for each work item (duplicate work replies will be discarded). These restrictions on equivalence of work items and speed of workers can be loosened when more than several thousand work items are to be processed. This routine assumes that "worker" processes have already been started on remote machines, and that these are waiting for work items to be processed. Use the mayCreateFarm() routine to start up remore workers. This routine runs most efficiently when there are at least five times as many work requests as there are workers. Efficiency also increases as the number of cpu cycles needed to process each work item outweigh the overhead of dispatching/transmitting/receiving work items. Currently, it is suggested that work items take at least one-hundred thousand CPU cycles to process, in order to minimze network overhead. Efficiency is also increased by keeping the ratio of message lengths to work-item processing times low. It can easily happen that a large collection of fast workers can outpace the ability of the master process to collect and process completed work. Care should be taken to minimize the amount of time spent in the callback routine when large farms are being implemented. The mayProcFarm() routine itself introduces relatively small amounts of overhead compared to the inherent overhead of maySend(), mayReceive(), and the underlying IP support. Information on the algorithms employed by this routine can be found elsewhere in this document, and the algotihms themselves are detailed at length in the source code. When using the UDP-based processor farm, work request and work reply message lengths are limited by the underlying transmit/receive mechanism. Please refer to the maySend() and mayReceive() routines to determine the largest supported UDP message length. There are not message length limitations for a TCP-based farm. Ideas for Future Enhancements: ------------------------------ 1) In the current implementation, mayProcFarm() uses mayReceive() to gather back up completed work. mayReceive() returns messages in the order in which they were actually received. Since farm FIFO lengths are buried in the reply messages, this leads to somewhat stale FIFO data being gathered, and thus could lead to idle machines if the work dispatches are extremely small. Providing an alternative mayReceive that returned the last message first would help keep FIFO data most current. However, care would then be required so that earlier messages containing earlier FIFO data would not over-write the current FIFO data with stale data. This could be implemented by adding time-stamp info to the internal reply. Ideas for Future Research: -------------------------- 1) Characterize the network overhead of the processor farm. In particular, measure the number of cycles spent in TCP/UDP/IP code, the number of cycles spent in maySend() and mayReceive(), and the overhead introduced by mayProcFarm(). From this data, construct a theoretical model of the processor farm. Using this model, predict how well the dispatch mechanism responds to large variations in work size and worker speed. Compare theoretical predictions to measured data. Construct a chart showing the region of near-linear speedup for this system. Discuss methods and algorithms that can be used to extend the region of near-linear speedup. mayGetParentTcp =============== Returns the address of the parent Syntax: ------- struct mayProcId * mayGetParentTcp (); Description: ------------ This call returns the address of the parent that spawned this child. This routine should only be called by child processes to find thier parents. The returned address can be used with maySend() to send messages to the parent. This routine should only be used by children whose parent created them with a TCP connection -- i.e. by calling mayCreate(,,MAY_CREATE_TCP) This routine mallocs the space for the returned structure. The returned structure should not fiddled with or freed. Error Returns: -------------- NULL if error occured. Error text printed to stderr. mayGetParentUdp =============== This call returns the address of the parent. This address can be used in the maySend command to send messages to the parent. Syntax: ------- struct mayProcId * mayGetParentUdp (); Description: ------------ This call returns the address of the parent that spawned this child. This routine should only be called by child processes to find thier parents. The returned address can be used with maySend() to send messages to the parent. This routine should only be used by children whose parent created them with a UDP connection -- i.e. by calling mayCreate(,,MAY_CREATE_UDP) This routine mallocs the space for the returned structure. The returned structure should not fiddled with or freed. Error Returns: -------------- NULL if error occured. Error text printed to stderr. mayInitTcp ========== Initialization for a TCP system. Must be called before any other MAY routines. Syntax: ------- void mayInitTcp () Description: ------------ Initializes a process for use in a TCP based communications system. Error Returns: -------------- None. mayInitUdp ========== Initialization for a UDP system. Must be called before any other MAY routines. Syntax: ------- struct mayProcId * mayInitUdp (); Description: ------------ Opens a communications port for this process. Returns a pointer to the "address" of this process; any processes wishing to send mail to this process must send them to this address. This routine mallocs the space for the returned structure. The returned structure should not fiddled with or freed. Error Returns: -------------- NULL if error occured. Error text printed to stderr. mayReceive ========== This routine allows a user to receive a message. Syntax: ------- int mayReceive (sprocid, messtype, subtype, timeout, header, messlen); struct mayProcId *sprocid; long messtype; short subtype; int timeout; struct msghead *header; int *messlen; Arguments: ---------- -- sprocid pointer to structure identifying connection from which to receive message. If NULL is specified, all connections are checked for messages; the first message with a matching messtype and subtype is returned. -- messtype indicates the type of message to search for. If type 0 (zero) is specified, the first message with a matching sprocid and subtype is returned. -- subtype indicates the subtype of message to search for. If subtype 0 (zero) is specified, the first message with a matching sprocid and messtype is returned. -- timeout The maximum number of seconds to wait for a message to arrive, if a message matching the sprocid, messtype and subtype is not already present. -- header Must point to a struct msghead structure. This structure will be filled out with values to indicate the type of message that was actually returned (if any). -- messlen pointer to an int that specifies the size (in bytes) of the data area into which the returned message will be copied. Upon return, this value will be changed to the actual length (in bytes) of the received message. Description: ------------ This routine returns received messages. An attempt is made to find a message that matches the indicated sprocid, messtype and subtype. If any of these fields are NULL or zero, then the respective fields are not used to determine a match. If no matching message is found, then this routine will block (not return), and wait up to a timeout period of seconds to match the message. (See also "Known Bugs" below). If no matching message is received within the timeout peroid, this routine will return with a value of MAY_TIMED_OUT (see "may.h"). If timeout is set to zero, mayReceive() will return immediately either with a message (if there was one) or with MAY_TIMED_OUT. The structure pointed at by *header will be filled out with info about the message. The pointer *header must point at the following structure (see "may.h"): struct msghead { struct mayProcId sproc_id; /* sending process id */ long msgtype; short subtype; long msglen; void *msgptr; long queue_len; }; Note that header->msgptr must point at a data area sufficiently large to contain the received message. The received message will be copied into the data area pointed at by header->msgptr. mayReceive() does NOT allocate this area -- you must do so. You can make this pointer point at any structure or array desired; you can reuse this are from message to message, if desired. The size of the available area must be pointed at by the argument messlen. If this area is overflowed, the message will be truncated, and the truncated portions discarded. The value of messlen will be unchanged, and a value of MAY_TRUNCATED_MSG will be returned to indicate that truncation has occured. The queue_len field will be filled out with the number of cached (pending, unread) messages that the sender process has. This field is useful for implementing FIFO pacing mechanisms. Notes: ------ For the IBM RT, it is suggested that the receiving process be compilesd with the -a option, esp. if many and/or large messages are to be received. UDP messages limited to xxx bytes ..... where xxx is system dependent, but generally less than 1500 bytes. Returns: -------- If no error occured, then the number of pending, unreceived messages (of all types, from all sources) is returned. If the returned value is negative, then one of the following errors has occured: MAY_GENERAL_ERROR unidentified error condition MAY_TRUNCATED_MSG incomplete message recv'd MAY_NO_MEM out of memory error MAY_CLOSE_CONNECTION socket died MAY_TIMED_OUT timeout expired MAY_NO_OPEN_SOCKS all sockets are dead In certain cases, an error message will be printed to stderr. Known Bugs: ----------- In the current implementation, if a message arrives during the timout period, but it does not match the indicated sprocid, messtype and subtype, then the timout period will be reset. Thus, a continuous stream of incoming messages that fail to match the desired fields can cause this routine to wait for an arbitrarily long period of time, and give the appearence of being "hung". Ideas for Future Enhancements: ------------------------------ 1) In the current implementation, mayReceive() performs numerous mallocs of several different sizes of data structures. This could lead to severe memory fragmentation problems. Memory fragmentation and performance could be significantly improved by modifying MAY to use bulk-malloc techniques. 2) In the current implementation, mayReceive() stores pending messages internally in a long linked list. Search time for stored messages could be improved significantly by converting this to a hash table. However, for users who are specifying message ID's of NULL, it would be freindlier if messages were still returned in the order in which they were received. This could be preserved by time stamping messages in the order in which they were received. 3) In some cases, it could be useful to a user of MAY to know when a received message was transmitted (rather than when it was received). In particular, such information could be useful for alternate designs of the processor farm. Thus, time-stamp data could be added to the internal message headers. A simple routine should be provided to pull the time-stamp data out of the message headers. 4) Fix the bug indicated above. This can be done by fetching the system time when the first attempt to timout is made, and fetching it again after the select() routine returns. If the elapsed time is greater than the timeout peroid, then a timeout error should be returned. 5) Design another routine, similar to mayReceive(), that does not require the message header and message body to have been previously allocated by the user. Such a routine is interesting because it could save the CPU cycles/overhead of copying the received data from internal structures into the user sturctures. maySend ======= Sends a message to an address. Syntax: ------- int maySend (process_id, message_type, subtype, message, messlen); struct mayProcId *process_id; long message_type; short subtype; void *message; int messlen; Arguments: ---------- -- process_id location to which a message is to be sent. -- message_type The sent message will be identified as being of this type. -- subtype The sent message will be identified as being of this subtype. -- message pointer to message to be sent. -- messlen length of message to be sent, in bytes. Description: ------------ This routine sends a message of the indicated type and subtype to the indicated process. The message_type and subtype can be set to arbitrary values; they provide a means for the user to help sort out his messages. Note that message_type of zero is reserved. NULL messages can be sent. (at the receiving end, a header is received, and the message field is set to NULL). Currently, UDP messages are limited to be 1800 or fewer bytes long. (This depends on the size of udp datagram packets and is a limitation that aught eventually be removed). There is no limit for TCP messages. Like the udp protocol, messages are not gaurenteed to be reliably delivered or to be non-duplicated. This may be eventually fixed. This routine returns the length of the message actually sent. Error Returns: -------------- This routine returns a negative number if an error has occured. The following values are returned: MAY_GENERAL_ERROR -- error has occured in writing to socket. This error usually occurs when the remote end has disconnected. MAY_TRUNCATED_MSG -- sent message was truncated in the sending. Ideas for Future Enhancements: ------------------------------ 1) In the current implementation, messages are treated as unstructured byte streams. A more useful messaging interface could be provided by supporting an XDR (or similar) type interface and protocol. (XDR is the interface provided to RPC's.) 2) If the process_id is NULL, it is assumed that the message is intended for the parent of this process. (not implemented). mayTerminate ============ Terminates a MAY session Syntax: ------- void mayTerminate (); Description: ------------ This function deallocates internally used storage and closes all connections. In the current implementation, an attempt is made to free most memory, but this routine has not been thoroughly tested/debugged. -------------------------- END OF FILE ------------------------------