LAM/MPI Development Mailing List Archives

From: Josh Lehan (jlehan_at_[hidden])
Date: 2006-10-07 00:41:27


Hi. In LAM 7.1.2, I found a segfault in lamd when "lamhalt" is used to
tear down a LAM network.

It happens if the "tkill" executable is not found.

It's in the appropriately named function diediedie() in
otb/sys/haltd/haltd.c

I traced it out, and what's going on is this:

diediedie() builds a list of locations to search for "tkill", and passes
that list to sfh_path_findv().

The result is the tkillpath string. It is not checked for NULL before it
is passed to sfh_argv_add() later, when the command line used to execute
tkill is being built up.

Problem is, sfh_argv_add() can't accept a NULL string, as it attempts to
do strlen() on it, and segfaults there.
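
To illustrate the failure mode, here is a minimal sketch of an argv-append helper that rejects a NULL string instead of handing it to strlen(). The name argv_add_checked() and its return convention are hypothetical; this is not LAM's sfh_argv_add(), only an assumption about what a guarded version could look like.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an argv-append helper (NOT LAM's sfh_argv_add()).
 * Returns 0 on success, -1 if arg is NULL or memory runs out. */
static int argv_add_checked(int *argc, char ***argv, const char *arg)
{
    char **grown;
    char *copy;

    assert(argc != NULL && argv != NULL);   /* caller must supply both */

    if (arg == NULL)
        return -1;          /* refuse NULL rather than call strlen() on it */

    copy = malloc(strlen(arg) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, arg);

    /* Room for the existing entries, the new one, and a NULL terminator. */
    grown = realloc(*argv, (size_t)(*argc + 2) * sizeof(char *));
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[*argc] = copy;
    grown[*argc + 1] = NULL;
    *argv = grown;
    *argc += 1;
    return 0;
}
```

With a guard like this, a missing tkill would surface as an error return that the caller can act on, rather than as a crash inside the helper.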

The underlying problem is that $PATH is not being searched correctly.

The sfh_path_findv() function will expand environment variables, so
$PATH gets expanded to the single string /bin:/usr/bin:/whatever, which
is not the correct behaviour for $PATH. We need to further break up
$PATH, and iterate through each of its components.

I have a small patch that does this (disclaimer: not fully tested yet):

1) If tkillpath isn't found earlier, fall back to a different function,
sfh_path_env_find(), which *does* correctly break up $PATH.

2) If this also fails to find tkillpath, then punt, by doing exit(1).
There's nothing more that can be done anyway, as the execution of tkill
is guaranteed to fail if it can't be found.
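
For illustration, breaking a $PATH-style string into its components and testing each one can be sketched as below. This is not the code of sfh_path_env_find(); the helper name find_in_path_list() and its signature are hypothetical, and it relies only on the POSIX access() call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (NOT LAM's sfh_path_env_find()): search a
 * colon-separated directory list for fname. Returns a malloc'd full
 * path that passes access(path, mode), or NULL if nothing matches. */
static char *find_in_path_list(const char *fname, const char *pathlist,
                               int mode)
{
    const char *start = pathlist;

    assert(fname != NULL);
    if (pathlist == NULL)
        return NULL;

    for (;;) {
        const char *end = strchr(start, ':');
        size_t dirlen = end ? (size_t)(end - start) : strlen(start);
        /* An empty component conventionally means the current directory. */
        const char *dir = dirlen ? start : ".";
        size_t len = dirlen ? dirlen : 1;
        size_t need = len + 1 + strlen(fname) + 1;
        char *candidate = malloc(need);

        if (candidate == NULL)
            return NULL;
        snprintf(candidate, need, "%.*s/%s", (int)len, dir, fname);
        if (access(candidate, mode) == 0)
            return candidate;       /* first match wins; caller frees it */
        free(candidate);

        if (end == NULL)
            return NULL;            /* ran out of components */
        start = end + 1;
    }
}
```

A caller would pass getenv("PATH") as the list and X_OK as the mode, and treat a NULL result as "tkill cannot be found", which is the point where the patch gives up with exit(1).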

Josh Lehan
Scyld

diff -urN OLD/lam-7.1.2/otb/sys/haltd/haltd.c NEW/lam-7.1.2/otb/sys/haltd/haltd.c
--- OLD/lam-7.1.2/otb/sys/haltd/haltd.c 2006-02-23 15:26:55.000000000 -0800
+++ NEW/lam-7.1.2/otb/sys/haltd/haltd.c 2006-10-06 21:05:42.000000000 -0700
@@ -214,8 +217,27 @@
   sfh_argv_add(&pathc, &pathv, "$LAMHOME/bin");
   sfh_argv_add(&pathc, &pathv, LAM_BINDIR);
 
-  tkillpath = sfh_path_findv(fname, pathv, R_OK, environ);
+  tkillpath = sfh_path_findv(fname, pathv, X_OK, environ);
   sfh_argv_free(pathv);
+
+  if (NULL == tkillpath)
+  {
+    tkillpath = sfh_path_env_find(fname, X_OK);
+
+    if (NULL == tkillpath)
+    {
+      exit(1);
+    }
+  }
+
   sfh_argv_add(&argc, &argv, tkillpath);
 
   if (ao_taken(lam_daemon_optd, "d")) {