LAM/MPI General User's Mailing List Archives


From: Soumya,Teena, Ranjana,Liss,Navya (bytecode.compression_at_[hidden])
Date: 2006-03-06 02:03:38


We did:
mpicc hello.c -lmpi
lamshrink n2
ssh node-1
lamshrink n0

lamnodes then gave:
n0 invalid node
n1 node-1.mec.ac.in:1:this_node
n2 invalid node
n3 node-3.mec.ac.in:1:

We did mpirun -np 2 a.out and it gave no output. No error was displayed either.

We did ./a.out and it gave output:
Hello World 0 of 1 on debian-204
After Final...0 of 1 running on debian-204

It seems mpirun is not giving any output on node-1, but ./a.out is giving
output.
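
For reference, a minimal non-MPI check of stdout, along the lines Josh
suggests below, might look like the sketch here (the file name
stdout_test.c and the message text are only illustrative):

/* ========== */
/* stdout_test.c -- no MPI at all; just report the hostname on stdout.
   If this prints nothing when run directly on node-1 (e.g. after
   ssh'ing there), the problem is with stdout or the environment on
   that machine rather than with MPI itself. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char hostname[256] = "";

    gethostname(hostname, sizeof(hostname));
    printf("stdout check from %s\n", hostname);
    fflush(stdout);

    return 0;
}
/* ========== */

Compiled with plain gcc and run over ssh on each node, its output (or
the lack of it) would separate a plain stdout problem from an MPI one.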

> On 2/27/06, Josh Hursey < jjhursey_at_[hidden] > wrote:
> >
> > I'm wondering if there is a problem with the LAM/MPI installation or
> > environment setup on one or more of the compute nodes.
> >
> > What version of LAM/MPI are you using on all of these machines? Is it
> > the same across all of the nodes?
> >
> > To help pinpoint the machines that are being troublesome, try the
> > enclosed extension to your program:
> >
> > /* ========== */
> > #include <stdio.h>
> > #include <mpi.h>
> >
> > int main(int argc, char *argv[]) {
> >     int rank, size, len;
> >     char hostname[256] = "";
> >
> >     MPI_Init(&argc, &argv);
> >
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >     MPI_Get_processor_name(hostname, &len);
> >
> >     printf("Hello World! I am %d of %d running on %s\n",
> >            rank, size, hostname);
> >
> >     MPI_Finalize();
> >     printf("After final... %d of %d running on %s\n",
> >            rank, size, hostname);
> >
> >     return 0;
> > }
> > /* ========== */
> >
> > This will print out the rank and the returned hostname identifier
> > (it should be the UNIX hostname) for each node running the application.
> > This should tell you which nodes are working properly. To find the
> > nodes that are not working properly, use the difference between:
> > $ lamnodes
> > and
> > $ mpirun N a.out
> >
> > The 'N' will tell LAM/MPI to run on all available nodes, so this will
> > cover all of the nodes from lamnodes (except those marked
> > 'no_schedule').
> >
> > Once you know the subset of machines that are not returning stdout,
> > try ssh'ing to those machines and running the MPI program as a
> > 'singleton':
> > $ ./a.out
> > This will run the MPI program as if you had a lamnodes of just the
> > localhost with a '-np 1' specified on the command line to mpirun.
> >
> > If this doesn't give you anything on stdout, then I would try a non-
> > MPI program to see if it behaves differently. If it does, then there
> > may be other problems.
> >
> > Let me know if this helps pinpoint your problem.
> >
> > Josh
> >
> > On Feb 27, 2006, at 5:30 AM, Soumya,Teena, Ranjana,Liss,Navya wrote:
> >
> > > Tried MPI_Barrier(MPI_COMM_WORLD) before MPI_Finalize(). It
> > > didn't help. Same output as before.
> > >
> > > On 2/27/06, Elie Choueiri < elie.choueiri_at_[hidden]> wrote:
> > > Would an MPI_Barrier before the MPI_Finalize make a difference?
> > >
> > >
> > > On 2/27/06, Soumya,Teena, Ranjana,Liss,Navya <
> > > bytecode.compression_at_[hidden] > wrote:
> > > Thank you for the prompt suggestions, but they did not help. The
> > > output is just the same as before, except that the messages come on
> > > separate lines.
> > >
> > > On 2/26/06, Jeff Squyres < jsquyres_at_[hidden] > wrote:
> > > I *think* that you are seeing the canonical parallel output issue --
> > > that output from different nodes shows up at seemingly random times
> > > because, at least in part, all nodes are not exactly synchronized.
> > >
> > > However, you should probably still see the "Hello world" messages
> > > from the MPI processes on node-3 (note that you killed node-1, so you
> > > won't see anything from node-1). These are silly suggestions, but
> > > try them anyway:
> > >
> > > - Output is not guaranteed to be flushed unless you trail the message
> > > with a \n -- put a \n in your "After final" message.
> > > - Put a "sleep(1);" before MPI_Finalize to give the output time to
> > > flush over to mpirun before closing everything down.
> > >
> > > I'm sure your application is running properly and that for some
> > > reason the stdout is not getting flushed over to mpirun.
> > >
> > >
> > > On Feb 24, 2006, at 4:26 AM, Soumya,Teena, Ranjana,Liss,Navya wrote:
> > >
> > > > hi,
> > > >
> > > >
> > > > hello.c program is as follows:
> > > > #include <stdio.h>
> > > > #include <mpi.h>
> > > >
> > > > int main(int argc, char *argv[]) {
> > > >     int rank, size;
> > > >
> > > >     MPI_Init(&argc, &argv);
> > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> > > >
> > > >     printf("Hello World %d of %d \n", rank, size);
> > > >     MPI_Finalize();
> > > >     printf("After final...");
> > > >     return 0;
> > > > }
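
(For comparison, a sketch of this hello.c with Jeff's two suggestions
above applied -- a trailing \n on the "After final" message and a
sleep(1) before MPI_Finalize -- would look roughly like this:)

/* ========== */
#include <stdio.h>
#include <unistd.h>   /* for sleep() */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello World %d of %d\n", rank, size);

    sleep(1);            /* give the output time to flush over to mpirun */
    MPI_Finalize();

    printf("After final... %d of %d\n", rank, size);   /* trailing \n */
    return 0;
}
/* ========== */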
> > > >
> > > >
> > > > LAM/MPI is configured on 4 systems.
> > > > The command lamnodes gave:
> > > > n0 master.mec.ac.in:1:origin,this_node
> > > > n1 node-1.mec.ac.in:1:
> > > > n2 node-2.mec.ac.in:1:
> > > > n3 node-3.mec.ac.in:1:
> > > >
> > > > When we do mpirun -np 4 a.out after compiling successfully with
> > > > mpicc hello.c -lmpi, we are getting the following output:
> > > > Hello World 0 of 4
> > > > Hello World 2 of 4
> > > > After final...Afterfinal...
> > > >
> > > >
> > > > Then we did mpirun -np 8 a.out and got output as follows:
> > > > Hello World 0 of 8
> > > > Hello World 4 of 8
> > > > After final...Hello World 2 of 8
> > > > Hello World 6 of 8
> > > > After final...After final...After final...
> > > >
> > > > Then we did lamshrink n1.
> > > > Then mpirun -np 8 a.out gave:
> > > > Hello World 3 of 8
> > > > Hello World 1 of 8
> > > > Hello World 4 of 8
> > > > Hello World 7 of 8
> > > > After final...After final...After final...After final...Hello World
> > > > 0 of 8
> > > > After final...Hello World 6 of 8
> > > >
> > > > It seems that the nodes node-1 and node-3 are not printing any
> > > > messages, although communication between the nodes by means of
> > > > message passing is taking place and LAM is properly installed on
> > > > all 4 nodes.
> > > >
> > > > What could be the reason? Please suggest a solution.
> > > >
> > > > Thank you.
> > > > --
> > > > CE 2002-06
> > >
> > >
> > > --
> > > {+} Jeff Squyres
> > > {+} The Open MPI Project
> > > {+} http://www.open-mpi.org/
> >
> > ----
> > Josh Hursey
> > jjhursey_at_[hidden]
> > http://www.lam-mpi.org/
> >

--
CE 2002-06