Hello Michael,
Thanks for your reply.
My master node has two network interfaces, one public and one private, but the compute node has only one, the private one. The LAM configuration is fine: I am able to run other parallel jobs (e.g. a factorial calculation) successfully on both nodes and I get the correct output. The users' home directory is shared via NFS. But when I run LS-Dyna across both nodes, I get the same problem.
FYI - LS-Dyna runs fine if I invoke LAM on just one node and run on that single node only.
Thanks,
Jigar
Michael Arndt <M.Arndt_at_[hidden]> wrote:
Hello Jigar
- Does each cluster node have one or two network interfaces?
Keep in mind that LAM/MPI on both nodes must agree on which
interface to use for the connection.
Translated: the hostnames in the lamhosts file must resolve
to the same network!
Are the names in your hostfile generated from the exec host list of
a batch system like PBS / SGE / LSF?
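A rough way to check the "same network" point is sketched below in Python. It is only an illustration under assumptions: the host names `master` and `node1` and the private network `192.168.1.0/24` are made up, and the script has to be run on each node separately, because /etc/hosts or DNS may give different answers on the master and on the compute node.

```python
import ipaddress
import socket


def parse_lamhosts(text):
    """Return the hostnames from a LAM boot schema (first token of each line)."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            hosts.append(line.split()[0])     # ignore trailing options such as cpu=2
    return hosts


def hosts_outside_network(hosts, network, resolve=socket.gethostbyname):
    """Map every host that resolves outside `network` to the address it got."""
    net = ipaddress.ip_network(network)
    bad = {}
    for host in hosts:
        addr = resolve(host)
        if ipaddress.ip_address(addr) not in net:
            bad[host] = addr
    return bad


if __name__ == "__main__":
    # Hypothetical boot schema and private network; replace with your own.
    schema = "master cpu=2\nnode1 cpu=2\n"
    print(hosts_outside_network(parse_lamhosts(schema), "192.168.1.0/24"))
```

An empty result on both nodes means every name in the hostfile resolves into the private network. If the master's name shows up with its public address, that would match the interface mismatch described above.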
- 2nd trap: in case you do not have an NFS-shared directory
as the common working directory for the calculation,
the following recipe will make it easier to find the real
problem:
mkdir -p /scratch/mydynajob on both nodes!
copy all input files, including the lamhosts file, to both nodes
then start the job
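The staging part of this recipe could be scripted roughly as follows. This is a sketch, not a tested procedure: it assumes passwordless ssh/scp between the nodes, and the host names and the input file name `input.k` are placeholders. It only builds the command lines, so they can be inspected before anything is run.

```python
import shlex


def staging_commands(hosts, files, workdir="/scratch/mydynajob"):
    """Build the ssh/scp command lines that stage `files` into `workdir` on each host."""
    commands = []
    for host in hosts:
        commands.append(["ssh", host, "mkdir", "-p", workdir])   # create the scratch directory
        commands.append(["scp", *files, f"{host}:{workdir}/"])   # copy input and lamhosts file
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and file names; print the commands instead of running them.
    for cmd in staging_commands(["master", "node1"], ["input.k", "lamhosts"]):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or passed to `subprocess.run(cmd, check=True)` once the list looks right.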
dyna + lam7.0.3 works perfectly well and easily ...
so your problem is probably a network / routing problem:
two processes not talking over the same interface.
In case of problems, also verify that a 2-CPU single-node job
runs on each of the two nodes, so that each node on its own is configured OK.
hth
Micha
On Sat, Mar 29, 2008 at 10:17:32AM -0700, Jigar Halani wrote:
> Hello
>
> I have a problem running LSDyna with LAM-MPI 7.0.3. I am using precompiled LSDyna binaries with LAM 7.0.3. When I run the job using just one node, it runs fine. But if I run the job over the network on 2 machines, it fails with the error
>
> "It seems that [at least] one of the processes that was started with mpirun did not invoke MPI_INIT before quitting
> (it is possible that more than one process did not invoke MPI_INIT -- mpirun was only notified of the first one, which was on node n0)"
>
> Can you please let me know what the problem is?
>
> Thanks in advance,
> Regards,
> Jigar
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Florian Geyer,
Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Prof. Dr. Hanns Ruder
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196