[H-GEN] Managing multiple child processes with perl
ben.carlyle at invensys.com
Mon May 13 03:12:56 EDT 2002
[ Humbug *General* list - semi-serious discussions about Humbug and ]
[ Unix-related topics. Posts from non-subscribed addresses will vanish. ]
Michael,
We had a similar problem at my place of employment, which we solved using
a fairly simple yet usually quite reasonable algorithm. We used a simple
ksh script that took a list of jobs and broke it into n lists, where
n is the number of CPUs. Assuming each job takes a reasonably consistent
time to complete, each CPU works as hard as any other and load
balancing is achieved. After spawning off the sub-shells, the main shell
simply waits for all children to return (ie, wait()) and does not
contribute to the actual calculation at all.
If, instead of giving each subshell one job and trying to work out
which one will return first so that you can give it the next job, you give
each one a list of jobs and let them all run at their own pace, you might
be able to simplify the problem.
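For concreteness, here is a sketch of that splitting scheme in shell (ours was a ksh script; this hypothetical version assumes bash, and the job names and the `do_one_job` stand-in are invented for illustration):

```shell
#!/usr/bin/env bash
# Split a job list into n interleaved slices, one per CPU, then let each
# subshell grind through its own slice while the main shell just waits.
jobs_list=(db1 db2 db3 db4 db5)   # invented job names
ncpu=2
log=$(mktemp)

do_one_job() {
    # stand-in for the real work, e.g. gzip "$1"
    echo "finished $1" >> "$log"
}

for ((cpu = 0; cpu < ncpu; cpu++)); do
    (
        # each subshell walks its own slice: jobs cpu, cpu+ncpu, cpu+2*ncpu, ...
        for ((i = cpu; i < ${#jobs_list[@]}; i += ncpu)); do
            do_one_job "${jobs_list[i]}"
        done
    ) &
done

wait    # the main shell does no work itself; it just reaps the children
cat "$log"
```

The main shell never touches a job; once every subshell has exited, `wait` returns and the run is done.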
This idea only works, however, if the jobs are reasonably consistently
sized (timewise). If you have a couple of big jobs and a couple of smaller
jobs then you may have to think more carefully about how you want to break
them up across processors. We actually ran into this problem as well (the
jobs in our case were CM databases whose sizes varied depending on the
kind of data they stored and the rate of change they were managing). We
started the "big" stream alongside the "little" stream and allocated the
jobs so that it sort-of came out evenly :)
If you want to do the (slightly) more complicated feat of assigning jobs
to "worker" processes whenever they finish the last job then I can think
of a couple of other ways to go about it:
1) Use SIGCHLD to your advantage.
I don't know whether perl has facilities to catch and notify you
of SIGCHLD events, but I strongly suspect that it does (hell, sh does ;).
You may be able to write a handler to catch this signal and see individual
child deaths regardless of which child dies first. If you get lost in the
perl of this you could always fall back to C... though with the varying
meanings of SIGCHLD across platforms and the quirkiness of signals
generally, you may wish to steer clear of this option.
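If you want the "hand out the next job as soon as any child finishes" behaviour without touching signal handlers at all, newer shells can manage it directly: bash's `wait -n` (4.3 onwards) blocks until any one child exits. A hypothetical pool sketch, with invented job names and a sleep standing in for the real work:

```shell
#!/usr/bin/env bash
# Keep at most $max children running; the moment any one of them exits,
# `wait -n` returns and we launch the next queued job. (Assumes bash >= 4.3.)
set -u
queue=(j1 j2 j3 j4 j5 j6)   # invented job names
max=2                       # e.g. one job per CPU
running=0
log=$(mktemp)

for job in "${queue[@]}"; do
    if (( running >= max )); then
        wait -n             # blocks until ANY single child dies
        (( running-- ))
    fi
    ( sleep 0.1; echo "did $job" >> "$log" ) &   # stand-in for: gzip "$job" &
    (( running++ ))
done
wait                        # reap whatever is still going
cat "$log"
```

This gives you the load-balancing benefit of per-job dispatch without worrying about which platform's SIGCHLD semantics you are getting.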
2) Use a form of Inter-Process Communication to learn that a child is done
and assign it another job without killing it.
You could use something as simple as the creation of a file, or
pipes, or TCP/IP.... :) I don't know any perl, but I'm sure you'll find
modules to manage all of these forms of communication and more on CPAN,
or more likely as part of your default install.
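As a sketch of the pipe variant (everything here is invented for illustration; in real use the sleep would be a gzip): give each long-lived worker its own named pipe for receiving jobs, and have every worker report that it is free on one shared pipe. Short writes to a FIFO are atomic, so the single reader sees whole lines:

```shell
#!/usr/bin/env bash
# Master/worker job dispatch over named pipes: workers announce "I'm free"
# on a shared FIFO, and the master replies with a job (or QUIT) on that
# worker's private FIFO. No child is ever killed and restarted.
set -u
dir=$(mktemp -d)
done_q=$dir/done; mkfifo "$done_q"
log=$dir/log; : > "$log"
nworkers=2

worker() {
    local id=$1 job
    mkfifo "$dir/w$id"
    echo "$id" > "$done_q"                 # announce: ready for work
    while read -r job < "$dir/w$id"; do
        [ "$job" = QUIT ] && return
        sleep 0.05                         # stand-in for: gzip "$job"
        echo "worker $id did $job" >> "$log"
        echo "$id" > "$done_q"             # done, give me another
    done
}

for ((w = 1; w <= nworkers; w++)); do worker "$w" & done

jobs_list=(f1 f2 f3 f4 f5)                 # invented file names
sent=0 retired=0
exec 3< "$done_q"                          # hold the shared read end open
while (( retired < nworkers )); do
    read -r id <&3                         # some worker is now idle
    if (( sent < ${#jobs_list[@]} )); then
        echo "${jobs_list[sent]}" > "$dir/w$id"
        (( sent++ ))
    else
        echo QUIT > "$dir/w$id"            # no work left: retire the worker
        (( retired++ ))
    fi
done
wait
exec 3<&-
cat "$log"
```

A faster worker naturally comes back for more jobs sooner, so the load balances itself even when job sizes vary.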
Hope this gives you some ideas... :)
Benjamin.
Michael Anthon <michael at anthon.net>
Sent by: Majordomo <majordom at caliburn.humbug.org.au>
13/05/02 16:23
Please respond to general
To: general at lists.humbug.org.au
cc:
Subject: [H-GEN] Managing multiple child processes with perl
I have a desire to speed up my nightly backup process on our main database
server. The process at the moment goes something like this (it's a perl
script that I wrote)
[...snip...]
This means that I am gzipping all 30G or so twice, which seems terribly
inefficient to me. So.. to get to my questions. I am wanting to find a way
to run the gzip processes in parallel (it's a dual CPU E250 running
solaris). The only possible way to do this that I can see is to fork the
gzip processes, getting the PID of each one as it starts, then just watch
for those processes to not be there any more (using ps or something). This
seems a little... icky and I was hoping someone could advise me on a better
way to manage this. If I can get the gzip processes running in parallel
then it should shorten the time taken for the compression stage enough that
I can do the compression only once, then use tar to write the compressed
files to tape.
--