On Jul 7, 2005, at 10:49 AM, Sumeet Kapur wrote:
> I am trying to run my parallel program "prmpi.x" on 4 dual Intel Xeon
> processor computers with Red Hat 3 EL, WS.
> I use the following command: mpirun -np 8 prmpi.x
> I am able to lamboot successfully, and I am able to run this command
> on all 4 computers individually by lambooting just that particular
> computer. When I try to use all 4 computers, with the disk on one of
> them (the host) shared via NFS, the host computer hangs.
> I looked at the output of "top" on another computer, which showed that
> only one of the instances of prmpi.x is running, while "ps -ef" shows
> that there are 2 instances of prmpi.x running, but one of them shows
> "0" elapsed time.
This sounds like a problem with your machine setup, if I understand
the problem correctly. If you are able to lamboot and run lamnodes
(to verify that the LAM universe is properly running), then you've
done the LAM part right. Machines locking up sounds like a hardware
or OS problem - you might want to find a local Linux guru to help you
with debugging that part.
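
A rough sketch of that sanity check, in case it helps. The boot schema
name "hostfile" and the helper names run and check_lam are made up for
illustration, and the lamexec step is an assumption about what is
installed, so check the man pages on your system before relying on it.
Setting DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of a LAM sanity check (not an official LAM script).
# "hostfile" is a hypothetical boot schema listing the 4 machines.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_lam() {
    hostfile=${1:-hostfile}
    run lamboot -v "$hostfile" || return 1  # start the LAM universe on all hosts
    run lamnodes || return 1                # list the nodes LAM believes are up
    run lamexec N hostname || return 1      # assumed: run a non-MPI command on every node
    run mpirun -np 8 prmpi.x                # then try the real job
}
```

For example, "DRY_RUN=1 check_lam hostfile" prints the four commands in
order. If the first three steps work on all 4 machines and the host
still hangs once prmpi.x starts, that points away from LAM and toward
the machine or NFS setup.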
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/