[H-GEN] Managing multiple child processes with perl

ben.carlyle at invensys.com ben.carlyle at invensys.com
Mon May 13 03:12:56 EDT 2002



Michael,

We had a similar problem at my place of employment, which we solved using 
a fairly simple yet usually quite reasonable algorithm. We used a simple 
ksh script that took a list of jobs and broke it into n lists, where n is 
the number of cpus. Assuming each job takes a reasonably consistent time 
to complete, each cpu works as hard as any other cpu and load balancing is 
achieved. After spawning off the sub-shells the main shell simply waits 
for all children to return (ie, wait()) and does not contribute to the 
actual calculation at all.

If, instead of giving each subshell one job to do and trying to work out 
which one will return first so that you can hand it the next job, you give 
each one a list of jobs and let them all run at their own pace, you might 
be able to simplify the problem.
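
In perl (which is what your script is in anyway) the same split-and-wait 
idea might look something like the rough sketch below. The job list, file 
names and cpu count are all made up for the sake of the example:

    #!/usr/bin/perl -w
    use strict;

    # Made-up job list: one gzip command per dump file.
    my @jobs = map { "gzip /backup/dump$_.dat" } 1 .. 8;
    my $ncpu = 2;

    # Deal the jobs out round-robin into $ncpu lists.
    my @lists;
    push @{ $lists[ $_ % $ncpu ] }, $jobs[$_] for 0 .. $#jobs;

    # One child per list; each child works through its list serially.
    for my $list (@lists) {
        defined(my $pid = fork) or die "fork: $!";
        if ($pid == 0) {
            system($_) for @$list;
            exit 0;
        }
    }

    # The parent just waits for every child to finish, like the ksh wait.
    1 while wait != -1;

Each child chews through its own sub-list serially and the parent does 
nothing but the final wait, which is all the original ksh script did too.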

This idea only works, however, if the jobs are reasonably consistently 
sized (timewise). If you have a couple of big jobs and a couple of smaller 
jobs then you may have to think more carefully about how you want to break 
them up across processors. We actually ran into this problem as well (the 
jobs in our case were CM databases whose sizes varied depending on the 
kind of data they stored and the rate of change they were managing). We 
started the "big" stream alongside the "little" stream and allocated the 
jobs so that it sort-of came out evenly :)

If you want to do the (slightly) more complicated feat of assigning jobs 
to "worker" processes whenever they finish the last job then I can think 
of a couple of other ways to go about it:

1) Use SIGCHLD to your advantage.
        I don't know whether perl has facilities to catch and notify you 
of SIGCHLD events, but I strongly suspect that it does (hell, sh does ;). 
You may be able to write a handler that catches this signal and lets you 
see individual child deaths regardless of which child dies first (there's 
a rough sketch of this after the list). If you get lost in the perl of 
this you could always fall back to C... though with the varying meanings 
of SIGCHLD across platforms and the quirkiness of signals generally, you 
may wish to steer clear of this option.
2) Use a form of Inter-Process Communication to learn that a child is done 
and assign it another job without killing it.
        You could use something as simple as the creation of a file, or 
pipes, or TCP/IP... :) I don't know any perl, but I'm sure you'll find 
modules to manage all of these forms of communication and more on CPAN, or 
more likely as part of your default install. A pipe-based sketch follows 
the SIGCHLD one below.
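
On option 1: perl does let you install signal handlers through its %SIG 
hash, so a dispatcher that starts the next job whenever a child dies might 
look roughly like the sketch below. The job list and worker count are 
invented, and signals on some perls/platforms can be flaky (as above), so 
treat this as an outline rather than something to paste straight in:

    #!/usr/bin/perl -w
    use strict;
    use POSIX ":sys_wait_h";

    # Made-up job list; two workers for a two-cpu box.
    my @jobs     = map { "gzip /backup/dump$_.dat" } 1 .. 8;
    my $max_kids = 2;
    my $running  = 0;

    # Reap every child that has exited; WNOHANG stops waitpid blocking.
    $SIG{CHLD} = sub {
        $running-- while waitpid(-1, WNOHANG) > 0;
    };

    while (@jobs or $running > 0) {
        if (@jobs and $running < $max_kids) {
            my $cmd = shift @jobs;
            defined(my $pid = fork) or die "fork: $!";
            if ($pid == 0) {
                exec $cmd;
                die "exec $cmd: $!";
            }
            $running++;
        }
        else {
            sleep 1;    # SIGCHLD cuts the sleep short when a child dies
        }
    }

The handler only adjusts the count of running children; all the forking 
happens in the main loop, so nothing complicated goes on inside the 
signal handler itself.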
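
On option 2: one pipe-based arrangement is to give each worker a private 
pipe it reads jobs from, plus a single shared pipe it uses to tell the 
parent it is idle. The job names are again made up and this is only a 
sketch of the idea, but the same shape would work with files or sockets:

    #!/usr/bin/perl -w
    use strict;
    use IO::Handle;

    # Made-up job list and worker count.
    my @jobs    = map { "gzip /backup/table$_.dmp" } 1 .. 8;
    my $nworker = 2;

    # One shared "done" pipe (workers -> parent), one job pipe per worker.
    pipe(my $done_r, my $done_w) or die "pipe: $!";
    my (@job_r, @job_w, @pids);

    for my $i (0 .. $nworker - 1) {
        pipe($job_r[$i], $job_w[$i]) or die "pipe: $!";
        $job_w[$i]->autoflush(1);
        defined(my $pid = fork) or die "fork: $!";
        if ($pid == 0) {                    # worker: loop until told to quit
            close $done_r;
            my $jobs_in = $job_r[$i];
            while (1) {
                syswrite($done_w, "$i\n");  # "I'm idle, send me work"
                my $cmd = <$jobs_in>;
                last if !defined $cmd or $cmd =~ /^QUIT/;
                chomp $cmd;
                system($cmd);
            }
            exit 0;
        }
        push @pids, $pid;
    }

    # Parent: whenever a worker reports in, hand it the next job or QUIT.
    my $active = $nworker;
    while ($active > 0) {
        chomp(my $i = <$done_r>);           # blocks until a worker is idle
        my $fh = $job_w[$i];
        if (@jobs) { print $fh shift(@jobs), "\n"; }
        else       { print $fh "QUIT\n"; $active--; }
    }
    waitpid($_, 0) for @pids;               # reap the workers

Here the workers stay alive for the whole run and only exit when the 
parent sends QUIT, which is the "assign it another job without killing 
it" part.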

Hope this gives you some ideas... :)

Benjamin.





Michael Anthon <michael at anthon.net>
Sent by: Majordomo <majordom at caliburn.humbug.org.au>
13/05/02 16:23
Please respond to general

 
        To:     general at lists.humbug.org.au
        cc: 
        Subject:        [H-GEN] Managing multiple child processes with perl


I have a desire to speed up my nightly backup process on our main database
server.  The process at the moment goes something like this (it's a perl
script that I wrote)

[...snip...]

This means that I am gzipping all 30G or so twice, which seems terribly
inefficient to me.  So.. to get to my questions.  I am wanting to find a way
to run the gzip processes in parallel (it's a dual CPU E250 running
solaris).  The only possible way to do this that I can see is to fork the
gzip processes, getting the PID of each one as it starts, then just watch
for those processes to not be there any more (using ps or something).  This
seems a little... icky and I was hoping someone could advise me on a better
way to manage this.  If I can get the gzip processes running in parallel
then it should shorten the time taken for the compression stage enough that
I can do the compression only once, then use tar to write the compressed
files to tape.




