Moe -
Thanks for the patch. I've committed it to our Subversion
repository. I've also made a 7.1.4b2 tarball and posted it on our
web page at:
http://www.lam-mpi.org/beta/
I'm not entirely sure I want to do yet another LAM/MPI release, so
unless this becomes a large issue, I'll probably just leave the patch
in the beta for now. If more bugs crop up, we may do another bug-fix
release.
Thanks again,
Brian
On Mar 8, 2007, at 5:16 PM, jette1_at_[hidden] wrote:
> I'd like to request the following change to LAM's interface
> to SLURM. The logic in question parses the SLURM_NODELIST
> env var. On most clusters, SLURM_NODELIST contains something
> like "tux[1-15]", but if the node names do not share a common
> prefix followed by a numeric suffix, the env var can contain
> something like "foo,bar". This change permits such an env var.
> The same bug also exists in Open MPI. Does the same
> developer handle this code in both systems?
>
>> $ cd share/ssi/boot/slurm/src
>> $ diff ssi_boot_slurm_hostlist.c ssi_boot_slurm_hostlist.c.new
>> 72c72,80
>> <     lam_arr_append(a, &base);
>> ---
>> >     char *begin = base;
>> >     for (i = 0; i < len; ++i) {
>> >         if (base[i] == ',') {
>> >             base[i] = '\0';
>> >             lam_arr_append(a, &begin);
>> >             begin = &base[i] + 1;
>> >         }
>> >     }
>> >     lam_arr_append(a, &begin);
>
>
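For readers without the LAM tree at hand, the comma-splitting logic in the
patch can be sketched as a standalone function. This is only an illustration
under stated assumptions: split_nodelist() is a hypothetical helper, and a
plain array of pointers stands in for the lam_arr_append() calls in the real
code. It does not attempt to expand bracketed ranges such as "tux[1-15]".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of LAM/MPI.
 * Splits `base` in place at each comma (overwriting it with '\0') and
 * stores a pointer to the start of each name in `out`.
 * Returns the number of names stored, which is at most `max`.
 */
static size_t split_nodelist(char *base, char **out, size_t max)
{
    size_t count = 0;
    size_t len;
    size_t i;
    char *begin;

    assert(base != NULL && out != NULL);

    len = strlen(base);
    begin = base;
    for (i = 0; i < len; ++i) {
        if (base[i] == ',') {
            base[i] = '\0';              /* terminate the current name */
            if (count < max) {
                out[count++] = begin;    /* stands in for lam_arr_append() */
            }
            begin = &base[i] + 1;        /* next name starts after the comma */
        }
    }
    if (count < max) {
        out[count++] = begin;            /* final (or only) name */
    }
    return count;
}
```

Called on a writable copy of "tdev5,tdev6", this stores two pointers, one to
"tdev5" and one to "tdev6". A string with no comma yields a single entry,
which matches the behavior before the patch.
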
> Here is what happens without the change:
>
>> bash-3.00$ export SLURM_NODELIST=tdev5,tdev6
>> bash-3.00$ ./lamboot
>>
>> LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
>>
>> -----------------------------------------------------------------------------
>> Could not resolve the hostname "tdev5,tdev6" that was in the host file.
>>
>> Things to check:
>>
>> - is "tdev5,tdev6" in /etc/hosts?
>>   try "grep tdev5,tdev6 /etc/hosts"
>> - is "tdev5,tdev6" resolvable by DNS (or some other naming service)?
>>   try "ping tdev5,tdev6" or "dig tdev5,tdev6" or "nslookup tdev5,tdev6"
>> -----------------------------------------------------------------------------
> --
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Morris "Moe" Jette jette1_at_[hidden] 925-423-4856
> Integrated Computational Resource Management Group fax 925-423-6961
> Livermore Computing Lawrence Livermore National Laboratory
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> _______________________________________________
> lam-devel mailing list
> lam-devel_at_[hidden]
> http://www.lam-mpi.org/mailman/listinfo.cgi/lam-devel