Thanks, An. I tried what you suggested, but my program ran fine on the
other 2 nodes, so I will keep looking for the cause. Following Jeff Squyres's
advice, I have also upgraded LAM to 6.3.2, since LAM 6.3.1 has some known
bugs. Do you have any other ideas?
Nguyen Hai Chau
-----Original Message-----
From: lam-admin_at_[hidden] [mailto:lam-admin_at_[hidden]] On Behalf Of Le Dinh An
Sent: Saturday, February 10, 2001 10:55 AM
To: lam_at_[hidden]
Subject: Re: LAM: Help
On Fri, 9 Feb 2001, Nguyen Hai Chau wrote:
> Dear All,
>
> I run my Molecular Application (MD) on LAM 6.3.1. My cluster has 6 nodes
> running Linux RedHat 6.2. We use 100Mbps Ethernet network connection.
> The program runs OK on 2 and 4 nodes but in many cases generates an error
> when running on 6 nodes (Error: NaN in Fortran, which means division by
> zero). I believe that I set up LAM correctly (recon, lamboot and mpirun run
> well). Could you give me some advice?
I think the error comes from some specific nodes in your cluster. When the
program runs ok on 4 nodes, does it run fine on the other 2 nodes?
Have you tried that?
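
One way to narrow this down, as a rough sketch only: have each process
check its own values for NaN after every step and print the host it runs
on, so the first bad node (or the first bad step) shows up in the output.
The example is in Python for brevity rather than Fortran, and the names
first_nan and report are hypothetical, not part of LAM or of the MD program.

```python
import math
import socket


def first_nan(values):
    """Return the index of the first NaN in values, or -1 if there is none."""
    for i, v in enumerate(values):
        if math.isnan(v):
            return i
    return -1


def report(step, values, host=None):
    """Return a one-line diagnostic if values contains a NaN, else None.

    host defaults to the local host name, which identifies the node
    when every process in the job prints its own report.
    """
    if host is None:
        host = socket.gethostname()
    idx = first_nan(values)
    if idx < 0:
        return None
    return "step %d: NaN at index %d on %s" % (step, idx, host)


if __name__ == "__main__":
    # Hypothetical data standing in for forces or positions at one step.
    sample = [0.5, float("nan"), 1.25]
    message = report(12, sample)
    if message is not None:
        print(message)
```

If the NaN always appears first on the same host, that node is suspect; if
it moves around between runs, the problem is more likely in how the work is
divided across 6 processes.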
--
Le Dinh An
Isn't it nice that people who prefer Los Angeles to San Francisco live
there?
-- Herb Caen
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/