I'd like to request the following change to LAM's interface
to Slurm. The logic in question parses the SLURM_NODELIST
env var. On most clusters, SLURM_NODELIST contains something
like "tux[1-15]", but if the node names do not share a common
prefix followed by a numeric suffix, the env var can instead
contain a comma-separated list such as "foo,bar". This change
permits such a value.
The same bug also exists in OpenMPI. Does the same
developer handle this code in both systems?
>$ cd share/ssi/boot/slurm/src
>$ diff ssi_boot_slurm_hostlist.c ssi_boot_slurm_hostlist.c.new
>72c72,80
>< lam_arr_append(a, &base);
>---
> > char *begin = base;
> > for (i = 0; i < len; ++i) {
> >     if (base[i] == ',') {
> >         base[i] = '\0';
> >         lam_arr_append(a, &begin);
> >         begin = &base[i] + 1;
> >     }
> > }
> > lam_arr_append(a, &begin);
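For anyone who wants to try the splitting logic outside of LAM, here is a minimal standalone sketch of the same idea. The function name split_nodelist and its out/max_out parameters are my own for illustration and are not part of LAM; the real code appends to a LAM array with lam_arr_append instead. It assumes the string holds plain comma-separated names and does not expand bracketed ranges such as "tux[1-15]".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of LAM): split a comma-separated node
 * list in place, mirroring the loop in the patch above. Each ',' in
 * base is overwritten with '\0', and a pointer to the start of each
 * name is stored in out. Returns the number of names stored, which is
 * never more than max_out. Bracketed ranges are NOT expanded. */
static size_t split_nodelist(char *base, char **out, size_t max_out)
{
    size_t len;
    size_t count = 0;
    size_t i;
    char *begin;

    assert(base != NULL && out != NULL);

    len = strlen(base);
    begin = base;

    for (i = 0; i < len; ++i) {
        if (base[i] == ',') {
            base[i] = '\0';            /* terminate the current name */
            if (count < max_out)
                out[count++] = begin;
            begin = &base[i] + 1;      /* next name starts after the comma */
        }
    }

    /* The last (or only) name has no trailing comma. */
    if (count < max_out)
        out[count++] = begin;

    return count;
}
```

Called on a writable copy of "tdev5,tdev6", this stores pointers to "tdev5" and "tdev6" and returns 2. A string with no comma comes back as a single name, which matches the old behavior.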
Here is what happens without the change:
>bash-3.00$ export SLURM_NODELIST=tdev5,tdev6
>bash-3.00$ ./lamboot
>
>LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
>
>-----------------------------------------------------------------------------
>Could not resolve the hostname "tdev5,tdev6" that was in the host file.
>
>Things to check:
>
> - is "tdev5,tdev6" in /etc/hosts?
> try "grep tdev5,tdev6 /etc/hosts"
> - is "tdev5,tdev6" resolvable by DNS (or some other naming service)?
> try "ping tdev5,tdev6" or "dig tdev5,tdev6" or "nslookup tdev5,tdev6"
>-----------------------------------------------------------------------------
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Morris "Moe" Jette jette1_at_[hidden] 925-423-4856
Integrated Computational Resource Management Group fax 925-423-6961
Livermore Computing Lawrence Livermore National Laboratory
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++