Following up on this issue: the problem turned out to be the order of
the hosts in the lamhosts file that I was using to boot LAM. Instead
of listing the master node first, I had it listed last. I still don't
understand why this is a problem, but it works now.
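
For anyone who hits the same error, here is a minimal sketch of the
ordering that worked. The hostnames are made up (hypothetical stand-ins
for the Xserve master and the G5), and the lamboot call is left
commented out so that the snippet only writes the file:

```shell
# Write a boot schema with the master node listed first.
# Both hostnames are hypothetical placeholders, not my real machines.
cat > lamhosts <<'EOF'
xserve.example.com
g5.example.com
EOF

# Then boot LAM with this file (not run here):
# lamboot -v lamhosts
```

The only change from my failing setup was moving the master's line to
the top of the file.
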
Howard
At 9:57 PM -0500 10/3/05, Howard Butler wrote:
>Dear List,
>
>Please excuse my newbiness. I am attempting to use Rmpi
><http://cran.r-project.org/src/contrib/Descriptions/Rmpi.html>, which
>is a library for the statistical processing software R
><http://www.r-project.org/>. In the past I had been using PVM with
>much (but slow and flaky) success. I am having trouble spawning
>processes to the cluster using the prescribed methods, and I think it
>has something to do with how I boot the cluster with lamboot.
>
>I am working on a dual 2.5 GHz Apple G5 and a dual 2.3 GHz Xserve,
>with the Xserve acting as the master node (both are running Tiger
>10.4.2). I am using the binaries provided on the site, and ssh'ing
>works without passwords. When I issue a lamboot, all appears well
>and recon gives me the w00t.
>
>When I attempt to invoke things from within R, this error message is returned:
>
>>It seems that [at least] one of the child processes that was started
>>by MPI_Comm_spawn* chose a different RPI than the parent MPI
>>application. For example, one (of the) child process(es) that
>>differed from the parent is shown below:
>>
>> Parent application: MPI_Comm_spawn
>> Child MPI_COMM_WORLD rank crtcp (v1.1.0): 0
>
>
>Taking the extra computer out of the cluster and only running on the
>master allows the spawning to complete successfully. I have been
>struggling to research this error (it appears that Google hasn't
>caught up with a recent mailing list archive move -- links to Google's
>results like <http://www.lam-mpi.org//MailArchives/lam/mail20.php>
>are 404).
>
>It is clear to me that either I am not configuring the slave node
>properly when I issue the lamboot, or the rmpi.c code is spawning
>things improperly.
>
>Looking at the rmpi.c code, it appears that it is invoking the spawn
>command with MPI_Comm_spawn:
>> mpi_errhandler(MPI_Comm_spawn(CHAR(STRING_ELT(sexp_slave, 0)),
>>                               argv, nslave, info[infon], root,
>>                               MPI_COMM_SELF, &comm[intercommn],
>>                               slaverrcode));
>
>I also compiled my own LAM/MPI with gcc4 and gfortran and had the
>same results. The lamboot FAQ
><http://www.lam-mpi.org/faq/category4.php3> doesn't appear to have
>any questions related to my problem. If it is just a case of my poor
>Google-fu, please point me to any information that I should look at.
>Any other ideas would be very welcome.
>
>Thanks
>
>Howard
>
>_______________________________________________
>This list is archived at http://www.lam-mpi.org/MailArchives/lam/