LAM/MPI General User's Mailing List Archives

From: Shi Jin (jinzishuai_at_[hidden])
Date: 2004-02-02 16:28:24


Hi there,

We upgraded our ia32 Linux cluster from LAM/MPI
version 6.5.9 to 7.0.4 a while ago. Two problems are
troubling us. We have 16 nodes, with 2 CPUs (4 with
hyperthreading) per node.

1. My code ran well with the old version, but with
the latest version I sometimes get NaN exceptions,
depending on how the code was compiled. I could add a
print statement to make it work, or change an
optimization switch to make it crash. At first I
thought the problem was in my own code. But then I
found that the crashing build works fine when running
1 process per node. So I began to suspect the new SMP
feature introduced in version 7.0.x. I dug into the
manual and found a way to test it. I ran my code with
mpirun -ssi coll lam_basic C <mypro>
and then everything is fine, no matter how many
processes run on each node.
I think this disables the newly introduced SMP
collective communication. Note that we use a lot of
collective communication, such as broadcast and
gather, in our code.
Does this mean there is a problem in LAM?
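
A minimal sketch of one way to narrow down where the NaN first appears, assuming the data passed to each broadcast or gather is an array of doubles. The helper name first_nan_index is hypothetical and is not part of LAM/MPI or of the code discussed here. The idea is to call it on the buffer right before and right after each collective call and report the first index that holds a NaN.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical diagnostic helper (not a LAM/MPI function).
 * Returns the index of the first NaN in buf[0..n-1],
 * or -1 if the buffer contains no NaN. */
long first_nan_index(const double *buf, size_t n)
{
    /* An empty buffer may be passed as NULL. */
    assert(buf != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        if (isnan(buf[i]))
            return (long)i;
    }
    return -1;
}
```

If the buffer is clean before a collective call and contains a NaN afterwards, that points at the communication step rather than the local computation. One caveat: with aggressive floating-point optimization flags (for example GCC's -ffast-math) the compiler is allowed to assume NaNs never occur, so a check like this may be optimized away.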

2. This one seems stranger to me. My colleague has a
code using MPI and FFTW that makes a lot of
all-to-all calls. After the upgrade, he found that
his results are different every time he runs the
code, which should never happen.

I don't know whether these two problems are connected.
I tried my workaround from problem 1 on his code and
it still does not work.
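
A minimal sketch of how the run-to-run difference could be quantified, assuming the results of two runs are available as arrays of doubles of the same length. The helper name max_abs_diff is hypothetical. It only measures the difference and does not diagnose the cause.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute element-wise difference
 * between a[0..n-1] and b[0..n-1]. Returns 0.0 for n == 0.
 * Entries where the difference is NaN are skipped, because the
 * comparison d > worst is false for NaN. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    assert((a != NULL && b != NULL) || n == 0);

    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = fabs(a[i] - b[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A difference near machine precision would suggest a different cause than a difference of the same order as the values themselves, so the number is worth reporting along with the problem description.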

Could somebody help us out?
Thank you very much.

Shi Jin
