LAM/MPI General User's Mailing List Archives

From: Shihab Choudhury (shihab1_at_[hidden])
Date: 2004-02-10 10:21:46


Hi All,

I have a question about a problem that I have not been able
to find any clue about.

The problem is as follows.

I started the LAM daemons with the lamboot command on 4 nodes.
Then I submitted a job to all of those nodes. The master sends
a value to the nodes by broadcast, and every node runs a
calculation routine that depends on that value. After some time
I want to reduce the number of nodes from 4 to 3, so I use the
lamshrink command to remove node 4. After that, my program
hangs and then crashes with an error saying that a receive
failed, etc. My question is: what am I doing wrong? Shouldn't
the lamshrink command allow me to reduce the number of
calculating nodes without any problem?

I even tried issuing the command from within the program
itself, but it made no difference.

Am I doing something wrong?

Thanks for your attention.

Shihab
U. of Windsor