LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-04-05 08:56:52


Sorry for taking so long to reply. This error means that one of the
processes in your job has died. That can happen if a process segfaults
or hits some other kind of fatal error.

Do you get core files, or have any other indication of which process
died?
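
In case it helps with checking for core files, here is a rough sketch in Python (not part of LAM/MPI, and not specific to your model). It only assumes a POSIX system: it reads the core-file size limit of the current shell environment, since a soft limit of 0 means no core file is written when a process segfaults, and it shows how a child killed by a signal appears in a subprocess-style return code. The helper names are made up for illustration.

```python
import resource
import signal


def core_limit_text(limit):
    """Render an RLIMIT_CORE value in a human-readable form."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def core_files_enabled():
    """True if the soft core-file size limit allows core files to be written."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


def describe_exit(returncode):
    """Explain a subprocess-style return code (negative means killed by a signal)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number with no symbolic name on this platform.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core file limit (soft/hard):",
          core_limit_text(soft), "/", core_limit_text(hard))
    print("core files enabled:", core_files_enabled())
    print("a segfaulting child would show up as:",
          describe_exit(-signal.SIGSEGV))
```

If the soft limit turns out to be 0, raising it in the shell that launches the job (for example with `ulimit -c unlimited` in bash) before rerunning is the usual way to get a core file from the process that dies.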

On Mar 25, 2005, at 4:37 PM, Barry J Mcinnes wrote:

> We are trying to get a model operational on a Mac G5 cluster, but
> model code that runs on Lintel with mpich1.2.x gives the following
> errors, running on a local MacOS 10.3.8 box.
> Eventually we want to run under SGE6u3 on a Mac cluster.
>
> fortran code compiled with xlf 8.1 and lam-mpi compiled under MacOS
>
> + mpiexec -boot -np 2 /Volumes/Disk/jsw/gfsdist/cfs6264/cfs.25797/cfs6228
>
>
> Deleted startup output....
>
>      PROGRAM gsm      HAS BEGUN. COMPILED       0.00     ORG: np23
>      STARTING DATE-TIME  MAR 25,2005  12:50:49.379   84  FRI   2453455
>
>
>  &NAM_MRF
>  FHMAX=6.00000000000000000, FHOUT=6.00000000000000000,
> FHRES=6.00000000000000000, FHZER=6.00000000000000000,
> FHSEG=0.000000000000000000E+00, FHROT=0.000000000000000000E+00,
> DELTIM=1200.00000000000000, IGEN=82, FHDFI=3.00000000000000000,
> FHSWR=1.00000000000000000, FHLWR=3.00000000000000000,
> FHCYC=0.000000000000000000E+00, RAS=F, LDIAG3D=F
>  /
>  From compns : iret= 0  nsout= 18  nsswr= 3  nslwr= 9  nszer= 18
> nsres= 18  nsdfi= 9  nscyc= 0  ras= F
>  Reduced grid, nb points= 6536 full= 9024
>  nfile,fhour,idate= 11 0.0000000000E+00 0 10 9 2003  ntozi= 1  ntcwi=
> 2  ncldi= 1  ntraci= 2  tracers= 3.000000000  vtid= 21.00000000
> 1.000000000  xgf= 0.0000000000E+00
>  in fixio nread= 14  HOUR=  0.00   IDATE=    0   10    9 2003
> lonsfc,latsfc,ivssfc=     192      94  200004
> MPI_Recv: process in local group is dead (rank 1, comm 4)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (1, MPI_COMM_WORLD):  - MPI_Scatter()
> Rank (1, MPI_COMM_WORLD):  - main()
>
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/