I'm wondering if there is a problem with the LAM/MPI installation or
environment setup on one or more of the compute nodes.
What version of LAM/MPI are you using on all of these machines? Is it
the same across all of the nodes?
To help pinpoint the machines that are being troublesome, try the
enclosed extension to your program:
/* ========== */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char hostname[256] = "";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(hostname, &len);

    printf("Hello World! I am %d of %d running on %s\n",
           rank, size, hostname);

    MPI_Finalize();

    printf("After final... %d of %d running on %s\n",
           rank, size, hostname);
    return 0;
}
/* ========== */
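You can compile and run it the same way as your original test
(e.g. mpicc hello.c -lmpi, then mpirun N a.out); the source file
name is just whatever you save it as.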
This will print out the rank and the returned hostname identifier
(which should be the UNIX hostname) for each process running the
application. That should tell you which nodes are working properly.
To find the nodes that are not, compare the output of:
$ lamnodes
and
$ mpirun N a.out
The 'N' will tell LAM/MPI to run on all available nodes, so this will
cover all of the nodes from lamnodes (except those marked
'no_schedule').
Once you know the subset of machines that are not returning any
stdout, try ssh'ing to those machines and running the MPI program as
a 'singleton':
$ ./a.out
This will run the MPI program as if you had a lamnodes containing
just the localhost, with '-np 1' specified on the mpirun command line.
If this doesn't give you anything on stdout, then I would try a non-
MPI program to see if it behaves differently. If it does, then there
may be other problems.
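If it is useful, here is a rough sketch of what such a non-MPI check
might look like (nothing LAM-specific, just a plain hello with an
explicit flush):
/* ========== */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Plain, non-MPI hello: print the host we are running on and
       flush stdout explicitly.  If even this produces nothing when
       run on a node, the problem is with the environment or stdout
       handling on that node rather than with LAM/MPI itself. */
    char hostname[256] = "";
    gethostname(hostname, sizeof(hostname));
    printf("Plain hello from %s\n", hostname);
    fflush(stdout);
    return 0;
}
/* ========== */
Compile it with your ordinary C compiler (no mpicc needed) and run it
directly over ssh on the suspect node.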
Let me know if this helps pinpoint your problem.
Josh
On Feb 27, 2006, at 5:30 AM, Soumya,Teena, Ranjana,Liss,Navya wrote:
> Tried MPI_Barrier(MPI_COMM_WORLD) before MPI_Finalize(). It
> didn't help. Same output as before.
>
> On 2/27/06, Elie Choueiri <elie.choueiri_at_[hidden]> wrote:
> Would an MPI_Barrier before the MPI_Finalize make a difference?
>
>
> On 2/27/06, Soumya,Teena, Ranjana,Liss,Navya <
> bytecode.compression_at_[hidden]> wrote: Thank you for the prompt
> suggestions. But it did not help. The outputs are just the same
> as before except that the message comes on separate lines.
>
> On 2/26/06, Jeff Squyres <jsquyres_at_[hidden] > wrote: I *think*
> that you are seeing the canonical parallel output issue --
> that output from different nodes shows up at seemingly random times
> because, at least in part, all nodes are not exactly synchronized.
>
> However, you should probably still see the "Hello world" messages
> from the MPI processes on node-3 (note that you killed node-1, so you
> won't see anything from node-1). These are silly suggestions, but
> try them anyway:
>
> - Output is not guaranteed to be flushed unless you trail the message
> with a \n -- put a \n in your "After final" message.
> - Put a "sleep(1);" before MPI_Finalize to give the output time to
> flush over to mpirun before closing everything down.
>
> I'm sure your application is running properly and that for some
> reason the stdout is not getting flushed over to mpirun.
>
>
> On Feb 24, 2006, at 4:26 AM, Soumya,Teena, Ranjana,Liss,Navya wrote:
>
> > hi,
> >
> >
> > hello.c program is as follows:
> > #include <stdio.h>
> > #include <mpi.h>
> > int main(int argc, char *argv[]) {
> >     int rank, size;
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >
> >     printf("Hello World %d of %d \n", rank, size);
> >     MPI_Finalize();
> >     printf("After final...");
> >     return 0;
> > }
> >
> >
> > Lam-mpi is configured on 4 systems.
> > The command lamnodes gave:
> > n0 master.mec.ac.in:1:origin,this_node
> > n1 node-1.mec.ac.in:1:
> > n2 node-2.mec.ac.in:1:
> > n3 node-3.mec.ac.in:1:
> >
> > When we run mpirun -np 4 a.out (after compiling successfully with
> > mpicc test.c -lmpi), we get the following output:
> > Hello World 0 of 4
> > Hello World 2 of 4
> > After final...Afterfinal...
> >
> >
> > Then we did mpirun -np 8 a.out and got output as follows
> > Hello World 0 of 8
> > Hello World 4 of 8
> > After final...Hello World 2 of 8
> > Hello World 6 of 8
> > After final...After final...After final...
> >
> > Then we did lamshrink n1 .
> > Then mpirun -np 8 a.out
> > Hello World 3 of 8
> > Hello World 1 of 8
> > Hello World 4 of 8
> > Hello World 7 of 8
> > After final...After final...After final...After final...Hello World
> > 0 of 8
> > After final...Hello World 6 of 8
> >
> > It seems that node-1 and node-3 are not printing any messages,
> > although communication between the nodes by means of message
> > passing is taking place and LAM is properly installed on all 4
> > nodes.
> >
> > What could be the reason? Please suggest a solution.
> >
> > Thank you.
> > --
> > CE 2002-06
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>
> --
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>
>
> --
> CE 2002-06
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/