LAM/MPI General User's Mailing List Archives

From: Josh Lehan (jlehan_at_[hidden])
Date: 2006-10-11 15:21:30


Hi. I sent this message to the lam-devel list a few days ago, but that
list appears dead. I'm assuming the developers have moved on to
OpenMPI. So, a repost here.

In LAM 7.1.2, I found a segfault in lamd when "lamhalt" is used to
tear down a LAM network.

It happens if the "tkill" executable is not found.

It's in the appropriately named function diediedie() in
otb/sys/haltd/haltd.c

I traced it out, and what's going on is this:

It builds a list of locations to search for "tkill" and passes that list
to sfh_path_findv().

The result is the tkillpath string. It is not checked for NULL before it
is passed to sfh_argv_add() later, when the code builds up the command
line used to execute tkill.

The problem is that sfh_argv_add() can't accept a NULL string: it calls
strlen() on it, and segfaults there.
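
To make the failure mode concrete, here is a minimal sketch of an
argv-appending helper that refuses a NULL string instead of handing it to
strlen(). The name argv_append() and its 0/-1 return convention are
hypothetical, assumed only for illustration; this is not LAM's
sfh_argv_add(), whose real internals may differ, although the
(&argc, &argv, string) call shape mirrors the calls in the patch at the
end of this message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration only; NOT LAM's sfh_argv_add().
 * Appends a copy of `arg` to a NULL-terminated argument vector.
 * Returns 0 on success, -1 if `arg` is NULL or memory runs out. */
static int argv_append(int *argc, char ***argv, const char *arg)
{
    /* Check first: strlen(NULL) is undefined behaviour and typically
     * segfaults, which is the crash described above. */
    if (arg == NULL)
        return -1;

    size_t len = strlen(arg);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, arg, len + 1);         /* copies the trailing '\0' too */

    /* One new slot for the argument plus one for the terminating NULL.
     * realloc(NULL, n) behaves like malloc(n) for the first append. */
    char **grown = realloc(*argv, (size_t)(*argc + 2) * sizeof(char *));
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[*argc] = copy;
    grown[*argc + 1] = NULL;
    *argv = grown;
    (*argc)++;
    return 0;
}
```

With a check like this, a caller can notice that a lookup result such as
tkillpath came back NULL and bail out cleanly, rather than crashing inside
strlen().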

The underlying problem is that $PATH is not being searched correctly.

The sfh_path_findv() function expands environment variables, so $PATH
gets expanded into a single entry such as /bin:/usr/bin:/whatever and
searched as if it were one directory, which is not the correct behaviour
for $PATH. We need to further break up $PATH and iterate through each of
its components.
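
A minimal sketch of the $PATH handling described above: split the
colon-separated string into components and test each directory for an
executable with POSIX access(X_OK). The helper name find_in_path() is
hypothetical; it assumes a POSIX system and only approximates what
sfh_path_env_find() is described as doing here, not its actual
implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper for illustration only; NOT LAM's sfh_path_env_find().
 * Searches each ':'-separated component of `path` for an executable named
 * `name`, instead of treating the whole string as one directory.
 * Returns a malloc'd full path (caller frees), or NULL if none is found. */
static char *find_in_path(const char *name, const char *path)
{
    if (name == NULL || path == NULL)
        return NULL;

    const char *dir = path;
    for (;;) {
        const char *colon = strchr(dir, ':');
        size_t dirlen = colon ? (size_t)(colon - dir) : strlen(dir);

        /* An empty component conventionally means the current directory. */
        const char *d = dir;
        if (dirlen == 0) {
            d = ".";
            dirlen = 1;
        }

        /* Build "<dir>/<name>" for this component. */
        char *candidate = malloc(dirlen + 1 + strlen(name) + 1);
        if (candidate == NULL)
            return NULL;
        memcpy(candidate, d, dirlen);
        candidate[dirlen] = '/';
        strcpy(candidate + dirlen + 1, name);

        if (access(candidate, X_OK) == 0)
            return candidate;           /* first executable match wins */
        free(candidate);

        if (colon == NULL)
            break;                      /* that was the last component */
        dir = colon + 1;
    }
    return NULL;
}
```

In real use the second argument would typically be getenv("PATH"). Note
that access(X_OK) also succeeds for searchable directories, so a stricter
version might stat() each candidate to confirm it is a regular file.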

I have a small patch that does this:

1) If tkillpath isn't found by the initial search, fall back to a
different function, sfh_path_env_find(), which *does* correctly break up
$PATH.

2) If this also fails to find tkillpath, punt by calling exit(1).
There's nothing more that can be done anyway, as executing tkill is
guaranteed to fail if it can't be found. This exit call is what would
be done anyway if tkill's fork/exec failed.

This patch works for me: no more segfault.

Josh Lehan
Scyld

diff -urN OLD/lam-7.1.2/otb/sys/haltd/haltd.c NEW/lam-7.1.2/otb/sys/haltd/haltd.c
--- OLD/lam-7.1.2/otb/sys/haltd/haltd.c 2006-02-23 15:26:55.000000000 -0800
+++ NEW/lam-7.1.2/otb/sys/haltd/haltd.c 2006-10-06 21:05:42.000000000 -0700
@@ -214,8 +217,27 @@
   sfh_argv_add(&pathc, &pathv, "$LAMHOME/bin");
   sfh_argv_add(&pathc, &pathv, LAM_BINDIR);
 
-  tkillpath = sfh_path_findv(fname, pathv, R_OK, environ);
+  tkillpath = sfh_path_findv(fname, pathv, X_OK, environ);
   sfh_argv_free(pathv);
+
+  if (NULL == tkillpath)
+  {
+    tkillpath = sfh_path_env_find(fname, X_OK);
+
+    if (NULL == tkillpath)
+    {
+      exit(1);
+    }
+  }
+
   sfh_argv_add(&argc, &argv, tkillpath);
 
   if (ao_taken(lam_daemon_optd, "d")) {