LAM/MPI General User's Mailing List Archives

From: Nguyen Hai Chau (hcnguyen_at_[hidden])
Date: 2001-02-10 01:04:34


Thanks, An. I tried what you suggested, but my program ran fine on the
other 2 nodes. I will keep looking for the cause. Also, on Jeff Squyres's
advice, I upgraded LAM to 6.3.2, since LAM 6.3.1 has some known bugs. Do you
have any other ideas?

Nguyen Hai Chau

-----Original Message-----
From: lam-admin_at_[hidden] [mailto:lam-admin_at_[hidden]]On Behalf Of Le
Dinh An
Sent: Saturday, February 10, 2001 10:55 AM
To: lam_at_[hidden]
Subject: Re: LAM: Help

On Fri, 9 Feb 2001, Nguyen Hai Chau wrote:

> Dear All,
>
> I run my molecular dynamics (MD) application on LAM 6.3.1. My cluster has
> 6 nodes running Red Hat Linux 6.2, connected by 100 Mbps Ethernet.
> The program runs OK on 2 and 4 nodes but in many cases generates an error
> when running on 6 nodes (Error: NaN in Fortran, which means a division by
> zero). I believe that I set up LAM correctly (recon, lamboot and mpirun
> all run well). Could you give me some advice?

I think the error comes from one or more specific nodes in your cluster.
If the program runs OK on 4 nodes, does it also run fine on the other 2
nodes by themselves?

Have you tried that?
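
If running on subsets of nodes does not isolate the problem, another way to
narrow it down is to check each process's arrays for non-finite values after
every step and report where the first one appears. Note that in IEEE
arithmetic a NaN typically comes from 0.0/0.0 or a similar invalid operation,
while a nonzero value divided by zero gives infinity, so it is worth checking
for both. The sketch below is in C rather than Fortran, and the function name
first_nonfinite is hypothetical, not part of LAM or MPI. In an MPI program
each rank would presumably call it on its own local data and print its rank
and host name when the result is not -1.

```c
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical helper: scan x[0..n-1] and return the index of the first
 * NaN or infinity, or -1 if every value is finite.
 */
long first_nonfinite(const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(x[i])) {
            return (long)i;
        }
    }
    return -1;
}
```

Calling this right after the force or position update on each process should
show which process, and so which node, produces the bad value first.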

--
Le Dinh An
Isn't it nice that people who prefer Los Angeles to San Francisco live
there?
		-- Herb Caen
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/