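The problem is quite simple: the program you found is incorrect, and
the remote process (rank 1) is segfaulting.
In the original code, 'message' is a char * that points to a string
literal, which is constant. When rank 1 calls MPI_Recv, MPI cannot
write into that buffer and the process crashes. Rank 0 only reads the
buffer in MPI_Send, which is why it still prints its line. One
(somewhat silly) way to fix this is below: the original declaration is
commented out and replaced with a writable array.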
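
Separately from the fix in the quoted program, here is a minimal sketch
of the underlying C issue without any MPI involved. fake_recv and
receive_into_array are hypothetical names made up for this
illustration; fake_recv only stands in for a call that writes into the
caller's buffer and is not part of any MPI implementation.

```c
#include <stdio.h>
#include <string.h>

#define MSG_LEN 12  /* "Hello world" is 11 characters plus the '\0' */

/* Hypothetical stand-in for MPI_Recv: it writes 'count' bytes into
   'buf', which is what any receive call has to do with its buffer. */
static void fake_recv(char *buf, int count)
{
    memset(buf, 0, (size_t)count);
    strncpy(buf, "Hello world", (size_t)count - 1);
}

/* Receives into a writable array and copies the result to 'out'. */
static void receive_into_array(char *out, size_t out_len)
{
    char message[MSG_LEN];          /* writable storage on the stack */
    fake_recv(message, MSG_LEN);    /* fine: the array can be written */
    snprintf(out, out_len, "%s", message);
}

/* By contrast, this would be undefined behaviour (usually a segfault):
 *
 *     char *message = "Hello world";   // points at a string literal
 *     fake_recv(message, MSG_LEN);     // writes into constant storage
 */
```

Calling receive_into_array(out, sizeof out) with a local char out[32]
leaves "Hello world" in out. The same call pattern with the char *
declaration is what kills rank 1.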
Tim
On Nov 20, 2006, at 12:24 PM, 460853_at_[hidden] wrote:
> Hello everyone
>
> First of all, thank you for answering. I'd also like to apologize for not
> having been able to write earlier, but some family duties kept me out of
> all this for a while.
>
> Next, I'd like to say that the trouble I asked about in my previous mail
> was solved by disabling the firewall, so that was certainly the problem.
> The thing is that now I'm having a different problem.
>
> After disabling the firewall and managing to set the environment up, I
> looked on the Internet for a very simple program (actually, a "Hello
> World") done with MPI:
>
>
> ---------------------prueba.c ------------------
> /* C Example */
> #include <stdio.h>
> #include <mpi.h>
> #include <math.h>
#include <string.h> /* declares strncpy, used below */
>
>
> void
> main (argc, argv)
> int argc;
> char *argv[];
> {
> /* char *message = "Hello world"; */
char message[12];
> int rank, size, i, tag, node;
> MPI_Status status;
strncpy(message, "Hello world", 12);
> MPI_Init (&argc, &argv); /* starts MPI */
> MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
> MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
> tag = 100;
>
> if (rank == 0)
> {
> for (i = 1; i < size; i++)
> {
> MPI_Send (message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
> }
> }
> else
> {
> MPI_Recv (message, 12, MPI_CHAR, 0, tag, MPI_COMM_WORLD,
> &status);
> }
>
> printf ("node:%d %s\n", rank, message);
> MPI_Finalize ();
> }
> --------------------------------------------
>
> I compile it with: mpicc -o prueba.exe prueba.c
> (It's a Linux system, so I know the .exe is unnecessary, but I did it
> this way in order to know which file is the executable.)
> Then I place a copy of that executable in a folder which is in the PATH
> on both computers (specifically, $HOME/bin/).
>
> Next, I start the environment properly (ehm... properly "I guess")
> ---------------------------------------------
> hector_at_rdp13:~/Pa aprendé/Pruebas MPI> lamboot -v lamhosts
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> n-1<26498> ssi:boot:base:linear: booting n0 (155.210.155.67)
> n-1<26498> ssi:boot:base:linear: booting n1 (155.210.155.70)
> n-1<26498> ssi:boot:base:linear: finished
> ----------------------------------------------
>
> But when I try to execute with mpirun, I get the following output:
> ---------------------------------------------
> hector_at_rdp13:~/bin> mpirun -v -np 2 prueba.exe
> 26535 prueba.exe running on n0 (o)
> 4861 prueba.exe running on n1
> node:0 Hello world
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - main()
> ---------------------------------------------
>
> It seems that node 1 (the remote node) is not working. It says it's
> "dead". I looked for this error message in Google, and I understood that
> what is happening is that the process is not running on the remote
> machine. It was also said that this can happen because the MPI_Finalize ();
> instruction was executed too soon. I think that can't be it in this case,
> because it is an absolutely simple program downloaded from an example web
> page, so I guess it should work.
>
> I would also like to say that on the remote machine, after setting up the
> environment with the lamboot command, a "ps aux" shows (among many other
> things) a lamd daemon running:
>
> -----------------------------------
> hector_at_venus2:~/bin> ps aux
> USER      PID %CPU %MEM  VSZ  RSS TTY STAT START TIME COMMAND
> root        1  0.0  0.0  776  304 ?   S    17:24 0:00 init [5]
> root        2  0.0  0.0    0    0 ?   SN   17:24 0:00 [ksoftirqd/0]
> [. . .]
> hector   3743  0.0  0.0 6484 1148 ?   S    17:26 0:00 /usr/bin/lamd -
> -----------------------------------
>
> So the environment seems to be up properly... The thing is that it
> doesn't execute the program properly.
>
> I imagine that the solution will be quite simple, but I can't see it :(
>
> Thank you very much in advance!!
> //Hector
>
>>> 460853_at_[hidden] wrote:
>>>> I know there's a firewall on each machine that only opens the SSH (22)
>>>> port, so I guess the problem comes from that. So, which ports do I have
>>>> to open in order to boot LAM?
>>>>
>>>> Executing lamboot with the -d option, I've read (among many other
>>>> things) this:
>>>>
>>>> lamd -H 155.210.155.67 -P 6459 -n 1 -o 0 -d
>>>>
>>>> So, I guess this means that the .155.70 machine should be able to reach
>>>> port 6459 on the .155.67 machine. Am I right? So is the solution to open
>>>> port 6459 on the .155.67 machine? Should I also open this port on the
>>>> .155.70 machine? Otherwise, which ports should I open? I don't know
>>>> whether opening only these ports will be enough.
>>>
>>> All non-system (> 1024) TCP ports are needed to boot and run LAM. In
>>> more detail: LAM does not use any specific port numbers, but instead
>>> requests any random open port from the OS. Check out FAQs 17 and 18
>>> here for some more info:
>>>
>>> http://www.lam-mpi.org/faq/category4.php3
>>>
>>> Hope this helps!
>>>
>>> Andrew
>
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/