If I understand correctly, you are using only MPI tasks,
so for example you would have
8 MPI tasks on one node,
or 8 MPI tasks on node 1 + 8 MPI tasks on node 2.
Unfortunately, the tasks have to communicate between the nodes, and that communication time is what creates
the delay (elapsed time), as you have already noticed, while the CPU time decreases.
The solution is to use different kinds of parallelization inside the nodes (intranode) and between the nodes
(internode).
Inside a node you basically have a shared-memory situation, and having many MPI tasks there is normally not a good idea. You should rather use threads inside each node and MPI tasks across the different nodes.
Therefore, if you have two nodes with 8 cores each, you should have:
node 1: 1 MPI task, 8 threads
node 2: 1 MPI task, 8 threads
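A rough way to see why this layout matters is to count how many messages have to cross the network. The sketch below is a simplified model, not a measurement: it assumes every MPI task sends one message to every other task in each exchange (an all-to-all pattern), which is only an approximation of what VASP actually does, and the function names are hypothetical.

```python
def internode_messages(nodes: int, ranks_per_node: int) -> int:
    """Messages crossing the network in one all-to-all exchange.

    Simplified model (an assumption): every MPI rank sends exactly one
    message to every other rank.
    """
    total_ranks = nodes * ranks_per_node
    # Each rank sends to every rank that lives on a different node.
    remote_peers_per_rank = total_ranks - ranks_per_node
    return total_ranks * remote_peers_per_rank


def intranode_messages(nodes: int, ranks_per_node: int) -> int:
    """Messages that stay inside a node in the same model.

    Threads within one MPI task share memory, so a single task per node
    sends no in-node messages at all.
    """
    # Each rank sends to the other ranks on its own node.
    return nodes * ranks_per_node * (ranks_per_node - 1)


if __name__ == "__main__":
    layouts = {
        "pure MPI, 8 tasks per node": (2, 8),
        "hybrid, 1 task x 8 threads per node": (2, 1),
    }
    for name, (nodes, rpn) in layouts.items():
        print(f"{name}: {internode_messages(nodes, rpn)} network messages, "
              f"{intranode_messages(nodes, rpn)} in-node messages")
```

In this model, 8 tasks per node send 128 messages over the network per exchange, while one task per node sends only 2. The total amount of data is not reduced, but every network message pays a fixed latency cost, so fewer, larger messages usually help on Gigabit Ethernet.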
LAM/MPI does support being built with thread support, but the best solution would be to use OpenMP, which has extensive and reliable thread support.
Hope this helps
Regards
Roberto Scipioni
ICYS Research Fellow
ICYS-CLUSTER Manager
Dear All,
I have successfully set up an MPI environment on our cluster using the lam-7.1.4 package. After using the "lamboot" command to start LAM on the computing nodes, I run the parallel program "VASP" like this: "mpirun -np 16 ./vasp". When the computation has finished, I always get this error: "bufferd (getroute): invalid node". What could cause this problem?
Another serious problem, which may be related to the LAM/MPI settings: when I use one node (with 8 CPU cores), the computation becomes faster as I increase the number of cores. But when I use more than one node, the computation becomes slower as I increase the number of nodes. This is a real headache.
The hardware and software configuration of each node is listed below:
Intel Xeon E5420 2.5 GHz CPU (2 x 4 cores), 4 GB memory, 146 GB disk space, 1 Gbit/s network card and 1 Gbit/s switch
Suse Linux 10.0
Intel Fortran compiler 10.1.021
Lam-mpi 7.1.4
BLAS: Supplied by Intel MKL 10.1.0.015
LAPACK: Supplied by Intel MKL 10.1.0.015
Here are the timing results of a VASP benchmark run:
                           One node (8 cores)  Two nodes (16 cores)
Total CPU time used (sec)  19.897              15.069
User time (sec)            19.717              11.309
System time (sec)           0.180               3.760
Elapsed time (sec)         19.916              91.696
It is obvious that the total CPU time decreased, but the elapsed time increased considerably. I have checked the utilization of each CPU: when running on one node, it is almost 100%, while on two nodes it is less than 20%.
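The utilization figures can be cross-checked from the timings alone: dividing CPU time by elapsed time gives the fraction of wall-clock time a process spent computing. This sketch assumes, as the nearly equal one-node numbers suggest, that "Total CPU time used" refers to a single process; the class name `VaspTiming` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VaspTiming:
    """Timing figures copied from the benchmark run above."""
    label: str
    cpu_time: float      # "Total CPU time used (sec)"
    elapsed_time: float  # "Elapsed time (sec)"

    def utilization(self) -> float:
        """Fraction of wall-clock time the process spent on the CPU."""
        return self.cpu_time / self.elapsed_time


runs = [
    VaspTiming("one node (8 cores)", 19.897, 19.916),
    VaspTiming("two nodes (16 cores)", 15.069, 91.696),
]

for run in runs:
    print(f"{run.label}: {run.utilization():.1%} busy")

# How many times longer the two-node run took in wall-clock time.
slowdown = runs[1].elapsed_time / runs[0].elapsed_time
print(f"elapsed-time slowdown on two nodes: {slowdown:.2f}x")
```

This prints 99.9% for one node and 16.4% for two nodes, consistent with the "less than 20%" observation, and a slowdown of about 4.60x. The remaining wall-clock time is time the process was not computing, which fits the explanation that it was waiting on inter-node communication.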
Why is computing with two nodes slower than with one node? Can anyone give me a solution?
I really need your help. Thanks in advance!
With best regards
Fanghz
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/