When TWiki::Sandbox::sysCommand forks a new process to exec the command, the exec may fail. If it fails, the parent (original) process remains blocked while the child continues the execution flow. When TWiki runs as CGI this doesn't lead to serious problems, since the child process will eventually terminate. But under a persistent execution mechanism (I noticed this error while testing the HTTP engine from
TWiki:Codev.TWikiStandAlone) the parent process remains blocked forever (since the child doesn't terminate), and as time passes a chain of blocked processes builds up until memory is completely exhausted.
From TWiki::Sandbox::sysCommand:
if ( $pid ) {    # Parent - read data from process filehandle
    local $/ = undef;    # set to read to EOF
    $data = <$handle>;
    close $handle;
    $exit = ( $? >> 8 );
} else {    # Child - run the command
    open( STDERR, '>' . File::Spec->devnull() )
      || die "Can't kill STDERR: '$!'";
    exec( $path, @args )
      || throw Error::Simple( 'exec failed: ' . $! );
    # can never get here
}
I think the child should return the error to the parent, and the parent should handle it:
my $key = int( rand(255) ) + 1;    # A random key, known to both parent and child
if ( $pid ) {    # Parent - read data from process filehandle
    local $/ = undef;    # set to read to EOF
    $data = <$handle>;
    close $handle;
    $exit = ( $? >> 8 );
    if ( $exit == $key && $data =~ /^$key: (.*)/ ) {
        throw Error::Simple( 'exec failed: ' . $1 );
    }
} else {    # Child - run the command
    open( STDERR, '>' . File::Spec->devnull() )
      || die "Can't kill STDERR: '$!'";
    unless ( exec( $path, @args ) ) {
        # If the command fails, report the error to the parent and terminate
        syswrite( STDOUT, "$key: $!\n" );
        exit($key);
    }
    # Can never get here: if the command succeeds, exec doesn't return,
    # and if it fails we guarantee that the child terminates.
}
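The same parent/child handshake can be sketched outside TWiki in plain shell (a minimal illustration of the idea, not the actual Sandbox code; the command name /no/such/cmd is a made-up nonexistent path):

```shell
#!/bin/bash
key=$(( (RANDOM % 255) + 1 ))    # random sentinel in 1..255

# "Child" side: try to run a command that does not exist; on failure,
# write the sentinel-tagged message and exit with the sentinel status.
out=$( /no/such/cmd 2>/dev/null || { echo "$key: command failed"; exit "$key"; } )
status=$?

# "Parent" side: both the exit status and the tagged output must match
# before we treat the run as an exec failure rather than normal output.
if [ "$status" -eq "$key" ] && [[ $out == "$key: "* ]]; then
    echo "exec failed: ${out#*: }"    # prints "exec failed: command failed"
fi
```

Requiring both the sentinel exit status and the tagged first line is what makes a false positive from a legitimate command so unlikely.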
This way, no matter which execution mechanism is used, no extra processes are kept around. The
$key
is both returned as the exit status and printed with the error message, so we can be sure it really is an exec error (and not a legitimate exit status from the child).
Just waiting for feedback before checking in the fix.
--
TWiki:Main/GilmarSantosJr - 17 Mar 2008
How to see the effect of this bug:
Configure {RCS}{EgrepCmd} and {RCS}{FgrepCmd} to point to a command that doesn't exist. When a search is made, no results are displayed and no error is reported (a log entry is added, but only because the child process, which handles the exception, has access to the log file). With the fix, the error is displayed in the browser, as a normal Error::Simple should be.
Another way to reproduce it is to use TWiki without rcs installed (and configured to use RcsWrap). When trying to save a topic, the error displayed is about failing to create the file history, not an error about being unable to run rcs.
If a plugin needs external commands, something similar happens: the error is silently discarded, leading to unexpected behavior.
--
TWiki:Main.GilmarSantosJr - 30 Mar 2008
Excellent work, Gilmar. This explains an effect I was seeing on a long-running mod_perl installation. But why use a random key? Why not just use the child PID?
--
TWiki:Main.CrawfordCurrie - 30 Mar 2008
Thanks, Crawford.
The exit value must be smaller than 256 (as far as I could test), and the child PID is often greater than that. I used a random key and a specially formatted output to ensure that
exec
really failed. The chance that the output and exit value of a program match exactly like that is very low, and generating a random key is fast.
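The 256 limit is easy to verify from a shell: the exit status a parent sees is only the low 8 bits of the value passed to exit (a standalone sketch, unrelated to the TWiki code):

```shell
# An exit value of 300 is truncated to 300 % 256 = 44 by the kernel,
# so a PID (often well above 255) cannot serve as the sentinel value.
sh -c 'exit 300'
echo $?    # prints 44
```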
--
TWiki:Main.GilmarSantosJr - 30 Mar 2008
Fair enough. There are a few programs around that use non-zero exit statuses incorrectly, but as you observe, they are unlikely to format their output exactly as you have done.
Don't forget to flip this to "Waiting for Release" status when you are done.
--
TWiki:Main.CrawfordCurrie - 31 Mar 2008