Hi,
I have LAM running on 4 compute nodes plus 1 controller node. Here is
the lamhost file:
controller schedule=no
node1 cpu=8
node2 cpu=8
node3 cpu=8
node4 cpu=8
lamboot starts fine, and if I run lamnodes I can see each node in the list.
I can successfully launch the application I'm running via mpirun like
this:
mpirun -np 5 -wd WORKING_DIR PROGRAM_NAME
This launches, and I can see that 5 processes are loaded on node1 (FYI,
I'm using LAM for FDS).
If I look at top, I can see 5 new processes named PROGRAM_NAME. They
complete and everything works fine.
If I run another job at the same time, for example one that has more
than 4 processes, its processes are all launched on node1 as well. It
looks like LAM is not scheduling processes on the other machines when
node1's CPU count runs out. What am I doing wrong?
Leon Yuhanov