On running the given hello.c program, the following output is obtained:
Hello world 0 of 4 on ltsp
Hello world 2 of 4 on debian-205
After Final... 0 of 4 running on ltsp
After Final... 2 of 4 running on debian-205
The command lamnodes gave:
n0 master.mec.ac.in:1:origin,this_node
n1 node-1.mec.ac.in:1:
n2 node-2.mec.ac.in:1:
n3 node-3.mec.ac.in:1:
ssh node-1
./a.out
Hello world 0 of 1 on debian-204
After Final... 0 of 1 running on debian-204
So we do get output from node-1 when ./a.out is run on it directly.
The problem is still not pinpointed.
The same version of LAM/MPI is installed on all nodes.
On 2/27/06, Josh Hursey <jjhursey_at_[hidden]> wrote:
>
> I'm wondering if there is a problem with the LAM/MPI installation or
> environment setup on one or more of the compute nodes.
>
> What version of LAM/MPI are you using on all of these machines? Is it
> the same across all of the nodes?
>
> To help pinpoint the machines that are being troublesome, try the
> enclosed extension to your program:
>
> /* ========== */
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[]) {
>     int rank, size, len;
>     char hostname[256] = "";
>
>     MPI_Init(&argc, &argv);
>
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Get_processor_name(hostname, &len);
>
>     printf("Hello World! I am %d of %d running on %s\n",
>            rank, size, hostname);
>
>     MPI_Finalize();
>     printf("After final... %d of %d running on %s\n",
>            rank, size, hostname);
>
>     return 0;
> }
> /* ========== */
>
> This will print out the rank and the returned hostname identifier
> (which should be the UNIX hostname) for each process running the
> application. That should tell you which nodes are working properly.
> To find the nodes that are not working properly, compare the output of:
> $ lamnodes
> and
> $ mpirun N a.out
>
> The 'N' will tell LAM/MPI to run on all available nodes, so this will
> cover all of the nodes from lamnodes (except those marked
> 'no_schedule').
>
> Once you know the subset of machines that are not returning stdout,
> try ssh'ing to those machines and running the MPI program as a
> 'singleton':
> $ ./a.out
> This will run the MPI program as if you had a lamnodes of just the
> localhost with a '-np 1' specified on the command line to mpirun.
>
> If this doesn't give you anything on stdout, then I would try a non-
> MPI program to see if it behaves differently. If it does, then there
> may be other problems.
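>
> If it helps, a trivial non-MPI test could be something as simple as the
> sketch below (nothing LAM-specific about it; it just prints the UNIX
> hostname so you can verify that plain stdout makes it back from a node):
>
> /* ========== */
> #include <stdio.h>
> #include <unistd.h>
>
> int main(void) {
>     char hostname[256] = "";
>
>     /* plain UNIX hostname lookup -- no MPI involved at all */
>     gethostname(hostname, sizeof(hostname));
>     printf("Plain (non-MPI) hello from %s\n", hostname);
>
>     return 0;
> }
> /* ========== */
>
> If that prints fine on a node where the MPI version does not, the
> problem is more likely in the LAM/MPI environment on that node than in
> ssh or stdout itself.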
>
> Let me know if this helps pinpoint your problem.
>
> Josh
>
> On Feb 27, 2006, at 5:30 AM, Soumya,Teena, Ranjana,Liss,Navya wrote:
>
> > Tried MPI_Barrier(MPI_COMM_WORLD) before MPI_Finalize(). It
> > didn't help. Same output as before.
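> >
> > For reference, what we tried was essentially our hello.c with the one
> > extra call right before the finalize, roughly:
> >
> > #include <stdio.h>
> > #include <mpi.h>
> >
> > int main(int argc, char *argv[]) {
> >     int rank, size;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >
> >     printf("Hello World %d of %d \n", rank, size);
> >
> >     /* suggested barrier: all ranks sync here before shutting down MPI */
> >     MPI_Barrier(MPI_COMM_WORLD);
> >
> >     MPI_Finalize();
> >     printf("After final...");
> >     return 0;
> > }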
> >
> > On 2/27/06, Elie Choueiri <elie.choueiri_at_[hidden]> wrote:
> > Would an MPI_Barrier before the MPI_Finalize make a difference?
> >
> >
> > On 2/27/06, Soumya,Teena, Ranjana,Liss,Navya <
> > bytecode.compression_at_[hidden]> wrote: Thank you for the prompt
> > suggestions. But it did not help. The outputs are just the same
> > as before except that the message comes on separate lines.
> >
> > On 2/26/06, Jeff Squyres <jsquyres_at_[hidden] > wrote: I *think*
> > that you are seeing the canonical parallel output issue --
> > that output from different nodes shows up at seemingly random times
> > because, at least in part, all nodes are not exactly synchronized.
> >
> > However, you should probably still see the "Hello world" messages
> > from the MPI processes on node-3 (note that you killed node-1, so you
> > won't see anything from node-1). These are silly suggestions, but
> > try them anyway:
> >
> > - Output is not guaranteed to be flushed unless you trail the message
> > with a \n -- put a \n in your "After final" message.
> > - Put a "sleep(1);" before MPI_Finalize to give the output time to
> > flush over to mpirun before closing everything down.
> >
> > I'm sure your application is running properly and that for some
> > reason the stdout is not getting flushed over to mpirun.
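> >
> > For concreteness, hello.c with both of those tweaks applied would look
> > roughly like this (untested sketch):
> >
> > #include <stdio.h>
> > #include <unistd.h>   /* for sleep() */
> > #include <mpi.h>
> >
> > int main(int argc, char *argv[]) {
> >     int rank, size;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >
> >     printf("Hello World %d of %d \n", rank, size);
> >
> >     sleep(1);                    /* give the output time to flush to mpirun */
> >     MPI_Finalize();
> >     printf("After final...\n");  /* trailing \n so the message gets flushed */
> >     return 0;
> > }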
> >
> >
> > On Feb 24, 2006, at 4:26 AM, Soumya,Teena, Ranjana,Liss,Navya wrote:
> >
> > > hi,
> > >
> > >
> > > hello.c program is as follows:
> > > #include <stdio.h>
> > > #include <mpi.h>
> > >
> > > int main(int argc, char *argv[]) {
> > >     int rank, size;
> > >
> > >     MPI_Init(&argc, &argv);
> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> > >
> > >     printf("Hello World %d of %d \n", rank, size);
> > >     MPI_Finalize();
> > >     printf("After final...");
> > >     return 0;
> > > }
> > >
> > >
> > > LAM/MPI is configured on 4 systems.
> > > The command lamnodes gave:
> > > n0 master.mec.ac.in:1:origin,this_node
> > > n1 node-1.mec.ac.in:1:
> > > n2 node-2.mec.ac.in:1:
> > > n3 node-3.mec.ac.in:1:
> > >
> > > After compiling successfully with mpicc test.c -lmpi, doing
> > > mpirun -np 4 a.out gives the following output:
> > > Hello World 0 of 4
> > > Hello World 2 of 4
> > > After final...After final...
> > >
> > >
> > > Then we did mpirun -np 8 a.out and got the following output:
> > > Hello World 0 of 8
> > > Hello World 4 of 8
> > > After final...Hello World 2 of 8
> > > Hello World 6 of 8
> > > After final...After final...After final...
> > >
> > > Then we did lamshrink n1 and ran mpirun -np 8 a.out again:
> > > Hello World 3 of 8
> > > Hello World 1 of 8
> > > Hello World 4 of 8
> > > Hello World 7 of 8
> > > After final...After final...After final...After final...Hello World
> > > 0 of 8
> > > After final...Hello World 6 of 8
> > >
> > > It seems that node-1 and node-3 are not printing any messages,
> > > although communication between the nodes by means of message
> > > passing is taking place and LAM is properly installed on all
> > > 4 nodes.
> > >
> > > What could be the reason? Please suggest a solution.
> > >
> > > Thank you.
> > > --
> > > CE 2002-06
> >
> >
> > --
> > {+} Jeff Squyres
> > {+} The Open MPI Project
> > {+} http://www.open-mpi.org/
> >
> >
>
> ----
> Josh Hursey
> jjhursey_at_[hidden]
> http://www.lam-mpi.org/
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
CE 2002-06