LAM/MPI Development Mailing List Archives

From: jette1_at_[hidden]
Date: 2007-03-08 19:16:00


I'd like to request the following change to LAM's
interface to SLURM. The code in question parses the
SLURM_NODELIST env var. On most clusters, SLURM_NODELIST
contains something like "tux[1-15]", but if the node
names do not share a common prefix followed by a numeric
suffix, the env var can contain a comma-separated list
such as "foo,bar". This change permits such a value.
The same bug also exists in Open MPI. Does the same
developer handle this code in both systems?

>$ cd share/ssi/boot/slurm/src
>$ diff ssi_boot_slurm_hostlist.c ssi_boot_slurm_hostlist.c.new
>72c72,80
>< lam_arr_append(a, &base);
>---
> > char *begin = base;
> > for (i = 0; i < len; ++i) {
> >   if (base[i] == ',') {
> >     base[i] = '\0';
> >     lam_arr_append(a, &begin);
> >     begin = &base[i] + 1;
> >   }
> > }
> > lam_arr_append(a, &begin);
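
For anyone who wants to try the splitting logic outside the LAM tree, here is a minimal standalone sketch of the same idea. The function name split_nodelist and its out/max parameters are hypothetical and are not part of LAM. A plain pointer array stands in for lam_arr_append, and the sketch assumes the string holds only plain comma-separated names, with no bracket expressions such as "tux[1-15]".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of LAM: split a comma-separated node
 * list in place. Each ',' is overwritten with '\0', and a pointer to
 * the start of each name is stored in out[]. Returns the number of
 * names found; only the first max pointers are stored.
 */
size_t split_nodelist(char *list, char **out, size_t max)
{
    size_t len, i, count = 0;
    char *begin = list;

    assert(list != NULL && out != NULL);
    len = strlen(list);  /* measure before any ',' is overwritten */

    for (i = 0; i < len; ++i) {
        if (list[i] == ',') {
            list[i] = '\0';          /* terminate the current name */
            if (count < max)
                out[count] = begin;
            ++count;
            begin = &list[i] + 1;    /* next name starts after the ',' */
        }
    }

    /* The last (or only) name has no trailing ','. */
    if (count < max)
        out[count] = begin;
    ++count;

    return count;
}
```

Called on a writable copy of "tdev5,tdev6", this stores pointers to "tdev5" and "tdev6" and returns 2. A string with no comma comes back as a single name, which matches the behavior before the change.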

Here is what happens without the change:

>bash-3.00$ export SLURM_NODELIST=tdev5,tdev6
>bash-3.00$ ./lamboot
>
>LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
>
>-----------------------------------------------------------------------------
>Could not resolve the hostname "tdev5,tdev6" that was in the host file.
>
>Things to check:
>
> - is "tdev5,tdev6" in /etc/hosts?
> try "grep tdev5,tdev6 /etc/hosts"
> - is "tdev5,tdev6" resolvable by DNS (or some other naming service)?
> try "ping tdev5,tdev6" or "dig tdev5,tdev6" or "nslookup tdev5,tdev6"
>-----------------------------------------------------------------------------

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Morris "Moe" Jette       jette1_at_[hidden]                 925-423-4856
Integrated Computational Resource Management Group   fax 925-423-6961
Livermore Computing            Lawrence Livermore National Laboratory
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++