Hello everyone
First of all, thank you for answering. I'd also like to apologize for not
having been able to write earlier, but some family duties kept me away from
all this for a while.
Next, I'd like to say that the problem I asked about in my previous mail was
solved by disabling the firewall, so that was certainly the cause. The thing
is that now I'm having another problem.
After disabling the firewall and managing to set the environment up, I
searched the Internet for a very simple program (actually, a "Hello World")
written with MPI:
---------------------prueba.c ------------------
/* C Example */
#include <stdio.h>
#include <mpi.h>
#include <math.h>

void
main (argc, argv)
     int argc;
     char *argv[];
{
  char *message = "Hello world";
  int rank, size, i, tag, node;
  MPI_Status status;

  MPI_Init (&argc, &argv);                /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */
  tag = 100;

  if (rank == 0)
    {
      for (i = 1; i < size; i++)
        {
          MPI_Send (message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
        }
    }
  else
    {
      MPI_Recv (message, 12, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    }

  printf ("node:%d %s\n", rank, message);
  MPI_Finalize ();
}
--------------------------------------------
I compile it with: mpicc -o prueba.exe prueba.c
(It's a Linux system, so I know that the .exe extension is unnecessary, but I
did it this way to make it clear which file is the executable.)
Then I place a copy of that executable in a folder that is in the PATH on
both computers (specifically, $HOME/bin/).
Next, I start the environment properly (well, properly, I guess):
---------------------------------------------
hector_at_rdp13:~/Pa aprendé/Pruebas MPI> lamboot -v lamhosts
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
n-1<26498> ssi:boot:base:linear: booting n0 (155.210.155.67)
n-1<26498> ssi:boot:base:linear: booting n1 (155.210.155.70)
n-1<26498> ssi:boot:base:linear: finished
----------------------------------------------
But when I try to execute with mpirun, I get the following output:
---------------------------------------------
hector_at_rdp13:~/bin> mpirun -v -np 2 prueba.exe
26535 prueba.exe running on n0 (o)
4861 prueba.exe running on n1
node:0 Hello world
MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - main()
---------------------------------------------
It seems that node 1 (the remote node) is not working: it is reported as
"dead". I searched for this error message on Google, and I understood that
the process is not running on the remote machine. It was also said that this
can happen when MPI_Finalize () is executed too soon. I don't think that is
the case here, because this is an absolutely simple program downloaded from a
web page of examples, so I guess it should work.
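One thing that may be worth checking (this is an assumption, not a confirmed
diagnosis): in prueba.c above, message points to a string literal, and on
rank 1 MPI_Recv writes the received bytes into it. Writing to a string
literal is undefined behavior in C and often crashes the process, which might
explain why rank 1 is reported as dead. The sketch below only illustrates the
buffer issue. fake_recv and receive_into_array are hypothetical names, and
fake_recv is a plain memcpy stand-in for MPI_Recv, not a real MPI call.

```c
#include <string.h>

/* Hypothetical stand-in for MPI_Recv (NOT a real MPI call): it copies
 * `count` bytes of an incoming payload into the caller's buffer, so that
 * buffer has to be writable memory. */
void fake_recv(char *buf, int count, const char *payload)
{
    memcpy(buf, payload, (size_t)count);
}

/* Returns 1 if a writable 12-byte array receives the greeting intact. */
int receive_into_array(void)
{
    /* An array, not a pointer to a literal: 11 characters plus the
     * terminating '\0' give exactly the 12 bytes used in prueba.c. */
    char message[12] = "Hello world";

    fake_recv(message, 12, "Hello world");
    return strcmp(message, "Hello world") == 0;
}
```

In prueba.c, the analogous change would be to declare the buffer as
char message[12] = "Hello world"; instead of char *message = "Hello world";
so that MPI_Recv has writable memory to fill.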
I would also like to mention that on the remote machine, after setting up the
environment with the lamboot command, "ps aux" shows (among many other
things) a lamd daemon running:
-----------------------------------
hector_at_venus2:~/bin> ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 776 304 ? S 17:24 0:00 init [5]
root 2 0.0 0.0 0 0 ? SN 17:24 0:00 [ksoftirqd/0]
[. . .]
hector 3743 0.0 0.0 6484 1148 ? S 17:26 0:00 /usr/bin/lamd -
-----------------------------------
So the environment seems to come up properly. The problem is that it doesn't
execute the program properly.
I imagine that the solution will be quite simple, but I can't see it :(
Thank you very much in advance!!
//Hector
>> 460853_at_[hidden] wrote:
>>> I know there's a firewall in each machine that only opens the SSH (22)
>>> port, so I guess the problem comes from that. So, what ports do I have to
>>> open in order to boot LAM?
>>>
>>> Executing the lamboot with the -d option, I've read (among many other
>>> things) this:
>>>
>>> lamd -H 155.210.155.67 -P 6459 -n 1 -o 0 -d
>>>
>>> So, I guess that this means that the .155.70 machine should be able to
>>> reach the port 6459 in the .155.67 machine. Am I right? So the solution
>>> comes by opening the 6459 port in the .155.67 machine? Should I open this
>>> port also in the .155.70 machine? Otherwise, which ports should I open?
>>> Because I don't know if it will be enough with opening only these ports.
>>
>> All non-system (> 1024) TCP ports are needed to boot and run LAM. In
>> more detail - LAM does not use any specific port numbers, but instead
>> requests any random open port from the OS. Check out FAQs 17 and 18
>> here for some more info:
>>
>> http://www.lam-mpi.org/faq/category4.php3
>>
>> Hope this helps!
>>
>> Andrew
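
As an illustration of the point quoted above, that a program can ask the OS
for any free port instead of using a fixed number, here is a minimal sketch
with plain POSIX sockets. It is not LAM's actual code; the helper name
pick_os_assigned_port is hypothetical, and binding to the loopback address is
only an assumption made to keep the example local.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: binds a TCP socket to port 0, which asks the OS to
 * choose a free port, and returns the port that was assigned. Returns -1 if
 * any step fails. */
int pick_os_assigned_port(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);   /* 0 means "let the OS pick a port" */

    socklen_t len = sizeof addr;
    int port = -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);   /* the port the OS actually chose */

    close(fd);
    return port;
}
```

Calling the helper several times will usually return different port numbers,
which is why a firewall rule that opens one fixed port is unlikely to be
enough.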