www.bundesbrandschatzamt.de
Babblings about Systems Administration.

Perl Fork

One of the reasons why i still stick with Perl are the database modules around DBI. I just like the beauty of the syntax. Many years ago i learned that you can bring processes to speed with process forking. Even though i used the fork function every now and then i wasn’t concerned much about the childs outcome. If something failed the next run of the program would take care of it. The child processes worked independent of each other.

My latest use case was around syncing some tables between 2 databases. If you have to transfer GBs of data time can be an issue. Have i mentioned the cron job gets triggered every 5 minutes? The first incarnation looked like

#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

sub REAPER {
    my $stiff;
    while ( ( $stiff = waitpid( -1, &WNOHANG ) ) > 0 ) {

        # do something with $stiff if you want
    }
    $SIG{CHLD} = \&REAPER;      # install *after* calling waitpid
}

sub main{
    $SIG{CHLD} = \&REAPER;
    my $child_started = 0;
    my @childs;

    while ( $child_started == 0 ) {
        if ( $#childs + 1 < 4 ) {
            my $pid = fork();

            if ( $pid == 0 ) {

                # Child process

            } elsif ( $pid > 0) {
                # master has a new child
                $child_started = 1;
                push @childs, $pid;
            }
            else {
                # master
                die "could not fork.";
            }
        }
        sleep 42;

        for ( my $i = 0 ; $i <= $#childs ; $i++ ) {
            my $return = kill 0, $childs[$i];

            if ( $return == 0 ) {
                splice( @childs, $i, 1 );
            }
        }
    }

    # wait till all childs have finished there work

    while ( @childs > 0 ) {
        for ( my $i = 0 ; $i <= $#childs ; $i++ ) {
            my $return = kill 0, $childs[$i];

            if ( $return == 0 ) {
                splice( @childs, $i, 1 );
            }
        }
        sleep 20;
    }
}

This fire and forget approach works and is similiar to the code examples you can find in different OReilly books related to Perl. It is recommended to close all file handles before forking. But in my experience you can even keep a logfile open in your master process before forking. Thus every child can write information back. On the other hand database handles you have to close in the master process and open again in the child.

my $LOGFILE;
my $logfilename = "/var/log/foo.log";
open( $LOGFILE,">>", $logfilename) || die "could not open $logfilename";
flock( $LOGFILE, LOCK_EX|LOCK_NB) || die "apparently program is already running as logfile is already locked.";
$LOGFILE->autoflush;

# fork

close $LOGFILE;

But sometimes your master process has to know what’s going on. In my case it was a final step executed by the master process itself. If one of the childs fails the master would just produces the same outcome as before. This puts unnecessary burden on the database. Think of table updates, redo logs and maybe synchronizations within your database cluster.

my %kid_status;
my %kid_task;

sub REAPER {
    local ( $!, $? );
    while ( ( my $stiff = waitpid( -1, WNOHANG ) ) > 0 ) {
        # do something with $stiff if you want
        $kid_status{$stiff} = $?;
    }
  $SIG{CHLD} = \&REAPER;    # install *after* calling waitpid
}


sub main{
    $SIG{CHLD} = \&REAPER;
    my @childs;
    my @tasks = [ "foo", "barf"];

    for my $task ( @tasks) {
    my $child_started = 0;
    while ( $child_started == 0 ) {
        if ( $#childs + 1 < 4 ) {
            my $pid = fork();

            if ( $pid == 0 ) {

                # Child process

            } elsif ( $pid > 0) {
                # master has a new child
                $child_started = 1;
                push @childs, $pid;
                $kid_task{$pid} = $task;
            }
            else {
                # master
                die "could not fork.";
            }
        }
        sleep 42;

        for ( my $i = 0 ; $i <= $#childs ; $i++ ) {
            my $return = kill 0, $childs[$i];

            if ( $return == 0 ) {
                splice( @childs, $i, 1 );
            }
        }
    }
}

    # wait till all childs have finished there work

    while ( @childs > 0 ) {
        for ( my $i = 0 ; $i <= $#childs ; $i++ ) {
            my $return = kill 0, $childs[$i];

            if ( $return == 0 ) {
                splice( @childs, $i, 1 );
            }
        }
        sleep 20;
    }

        # check what happened in our childs.

    foreach  my $childpid (keys %kid_task ) {
        if ( defined $kid_status{$childpid} ) {
            my $exitcode = $kid_status{$childpid} >> 8;
            my $signal = $? & 127;

            if ( $exitcode > 0 ) {
                print "WARN $kid_task{$childpid} terminated with $exitcode and signal $signal";
            }
        }
    }

}

Here you can find additional information around this topic: