Thanks for the patch! I'll bet that the LAM guys will apply this,
but I don't know how excited they'll be to do another LAM
release... :-\ (see
http://www.lam-mpi.org/MailArchives/lam-announce/2007/02/0025.php)
We actually fixed this in Open MPI and it's slated for Open MPI
v1.2.1 (it was too late for v1.2). See
https://svn.open-mpi.org/trac/ompi/ticket/835.
It's amazing that we went all these years without it ever coming up
(the code was initially ported from LAM to Open MPI, which is why
both systems had the problem).
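For anyone following along, here is a minimal standalone sketch of the
comma-splitting idea from the patch quoted below. It is not the LAM or
Open MPI code: split_hostlist() and its fixed-size output array are
hypothetical stand-ins for the lam_arr_append() calls, and it assumes
the list has no bracketed ranges (something like "tux[1-3,5]" would be
split at the inner comma, which is wrong).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of LAM or Open MPI): split a
 * comma-separated host list in place.  Each ',' in `list` is overwritten
 * with '\0', and a pointer to the start of each piece is stored in
 * `hosts`.  Returns the number of pointers stored, at most `max_hosts`.
 * Bracketed ranges such as "tux[1-15]" are NOT expanded here.
 */
size_t split_hostlist(char *list, char **hosts, size_t max_hosts)
{
    size_t count = 0;
    char *begin = list;
    char *p;

    assert(list != NULL && hosts != NULL);

    for (p = list; *p != '\0'; ++p) {
        if (*p == ',') {
            *p = '\0';                  /* terminate the current name */
            if (count < max_hosts) {
                hosts[count++] = begin;
            }
            begin = p + 1;              /* next name starts after the comma */
        }
    }

    /* The last (or only) name has no trailing comma. */
    if (count < max_hosts) {
        hosts[count++] = begin;
    }
    return count;
}
```

Called on a writable copy of "tdev5,tdev6", this stores pointers to
"tdev5" and "tdev6" and returns 2. The buffer has to be writable
because the commas are overwritten in place, the same way the patch
writes to base[i].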
On Mar 8, 2007, at 7:16 PM, jette1_at_[hidden] wrote:
> I'd like to request the following change be made to
> LAM's interface to SLURM. The logic is used to parse
> the SLURM_NODELIST env var. On most clusters, SLURM_NODELIST
> contains something like this "tux[1-15]", but if the
> node name format does not contain a common prefix and
> a numeric suffix then the env var can contain something
> like "foo,bar". This change permits such an env var.
> Also, this same bug exists in Open MPI. Does the same
> developer handle this code in both systems?
>
>> $ cd share/ssi/boot/slurm/src
>> $ diff ssi_boot_slurm_hostlist.c ssi_boot_slurm_hostlist.c.new
>> 72c72,80
>> < lam_arr_append(a, &base);
>> ---
>> > char *begin = base;
>> > for (i = 0; i < len; ++i) {
>> >     if (base[i] == ',') {
>> >         base[i] = '\0';
>> >         lam_arr_append(a, &begin);
>> >         begin = &base[i] + 1;
>> >     }
>> > }
>> > lam_arr_append(a, &begin);
>
>
> Here is what happens without the change:
>
>> bash-3.00$ export SLURM_NODELIST=tdev5,tdev6
>> bash-3.00$ ./lamboot
>>
>> LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
>>
>> -----------------------------------------------------------------------------
>> Could not resolve the hostname "tdev5,tdev6" that was in the host file.
>>
>> Things to check:
>>
>> - is "tdev5,tdev6" in /etc/hosts?
>>   try "grep tdev5,tdev6 /etc/hosts"
>> - is "tdev5,tdev6" resolvable by DNS (or some other naming service)?
>>   try "ping tdev5,tdev6" or "dig tdev5,tdev6" or "nslookup tdev5,tdev6"
>> -----------------------------------------------------------------------------
> --
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Morris "Moe" Jette jette1_at_[hidden] 925-423-4856
> Integrated Computational Resource Management Group fax 925-423-6961
> Livermore Computing Lawrence Livermore National Laboratory
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> _______________________________________________
> lam-devel mailing list
> lam-devel_at_[hidden]
> http://www.lam-mpi.org/mailman/listinfo.cgi/lam-devel
--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems