
LAM/MPI General User's Mailing List Archives


From: 460853_at_[hidden]
Date: 2006-11-21 12:55:22


Yey!! That was it! It's working now.

Thank you very much!

Quoting Tim Prins <tprins_at_[hidden]>:

> The problem is quite simple: the program you gave is incorrect, and
> the remote process (rank 1) is segfaulting.
>
> You are trying to receive into 'message', which points to a string
> constant, so MPI_Recv cannot write into that buffer and the remote
> process crashes. One (somewhat silly) way to fix this is below.
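>
> To see the difference in isolation (no MPI involved, and the 12-byte
> size is only illustrative), compare the two declarations:
>
> ---------------------------------------------
> #include <stdio.h>
> #include <string.h>
>
> int main (void)
> {
>   /* char *msg = "Hello world";       points at read-only storage; a  */
>   /* strcpy or MPI_Recv into it will typically segfault               */
>
>   char msg[12];                       /* writable array: safe to overwrite */
>   strncpy (msg, "Hello world", 12);   /* e.g. by strncpy or MPI_Recv       */
>   printf ("%s\n", msg);
>   return 0;
> }
> ---------------------------------------------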
>
> Tim
>
> On Nov 20, 2006, at 12:24 PM, 460853_at_[hidden] wrote:
>
>> Hello everyone
>>
>> Well, first of all, thank you for answering. I'd also like to apologize
>> for not having been able to write earlier, but some family duties kept
>> me away from all this for a while.
>>
>> Next, I'd like to say that the trouble I asked about in my previous mail
>> has been solved by disabling the firewall, so that was certainly the
>> problem. The thing is that now I'm having another problem.
>>
>> After disabling the firewall and managing to set the environment up, I
>> looked on the Internet for a very simple program (actually, a "Hello
>> World") written with MPI:
>>
>>
>> ---------------------prueba.c ------------------
>> /* C Example */
>> #include <stdio.h>
>> #include <mpi.h>
>> #include <math.h>
> #include <string.h>   /* needed for the strncpy() added below */
>>
>>
>> int
>> main (int argc, char *argv[])
>> {
>> /* char *message = "Hello world"; */
> char message[12];
>> int rank, size, i, tag, node;
>> MPI_Status status;
> strncpy(message, "Hello world", 12);
>
>> MPI_Init (&argc, &argv); /* starts MPI */
>> MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
>> MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
>> tag = 100;
>>
>> if (rank == 0)
>> {
>> for (i = 1; i < size; i++)
>> {
>> MPI_Send (message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
>> }
>> }
>> else
>> {
>> MPI_Recv (message, 12, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
>> }
>>
>> printf ("node:%d %s\n", rank, message);
>> MPI_Finalize ();
>> return 0;
>> }
>> --------------------------------------------
>>
>> I compile it with: mpicc -o prueba.exe prueba.c
>> (It's a Linux system, so I know the .exe extension is unnecessary, but
>> I did it this way so I can tell which file is the executable.)
>> Then I place a copy of that executable in a folder that is in the PATH
>> on both computers (to be precise, in $HOME/bin/).
>>
>> Next, I start the environment properly (ehm... "properly", I guess):
>> ---------------------------------------------
>> hector_at_rdp13:~/Pa aprendé/Pruebas MPI> lamboot -v lamhosts
>>
>> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>>
>> n-1<26498> ssi:boot:base:linear: booting n0 (155.210.155.67)
>> n-1<26498> ssi:boot:base:linear: booting n1 (155.210.155.70)
>> n-1<26498> ssi:boot:base:linear: finished
>> ----------------------------------------------
>>
>> But when I try to execute it with mpirun, I get the following output:
>> ---------------------------------------------
>> hector_at_rdp13:~/bin> mpirun -v -np 2 prueba.exe
>> 26535 prueba.exe running on n0 (o)
>> 4861 prueba.exe running on n1
>> node:0 Hello world
>> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
>> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
>> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
>> Rank (1, MPI_COMM_WORLD): - main()
>> ---------------------------------------------
>>
>> It seems that node 1 (the remote node) is not working. It says it's
>> "dead". I looked for this error message on Google, and I understood
>> that what is happening is that the process is not running on the
>> remote machine. It was also said that this can happen because the
>> MPI_Finalize() call was executed too soon. I think that can't be it
>> in this case, because it is an absolutely simple program downloaded
>> from an example web page, so I guess it should work.
>>
>> I would also like to say that on the remote machine, after setting up
>> the environment with the lamboot command, a "ps aux" shows (among many
>> other things) a lamd daemon running:
>>
>> -----------------------------------
>> hector_at_venus2:~/bin> ps aux
>> USER       PID %CPU %MEM   VSZ  RSS TTY  STAT START  TIME COMMAND
>> root         1  0.0  0.0   776  304 ?    S    17:24  0:00 init [5]
>> root         2  0.0  0.0     0    0 ?    SN   17:24  0:00 [ksoftirqd/0]
>> [. . .]
>> hector    3743  0.0  0.0  6484 1148 ?    S    17:26  0:00 /usr/bin/lamd -
>> -----------------------------------
>>
>> So the environment seems to be brought up properly... The thing is that
>> it just doesn't execute the program properly.
>>
>> I imagine that the solution will be quite simple, but I can't see it :(
>>
>> Thank you very much in advance!!
>> //Hector
>>
>>>> 460853_at_[hidden] wrote:
>>>>> I know there's a firewall on each machine that only opens the SSH
>>>>> (22) port, so I guess the problem comes from that. So, what ports do
>>>>> I have to open in order to boot LAM?
>>>>>
>>>>> Running lamboot with the -d option, I've seen (among many other
>>>>> things) this:
>>>>>
>>>>> lamd -H 155.210.155.67 -P 6459 -n 1 -o 0 -d
>>>>>
>>>>> So, I guess this means that the .155.70 machine should be able to
>>>>> reach port 6459 on the .155.67 machine. Am I right? So the solution
>>>>> would be to open port 6459 on the .155.67 machine? Should I also open
>>>>> this port on the .155.70 machine? Otherwise, which ports should I
>>>>> open? I don't know whether opening only these ports will be enough.
>>>>
>>>> All non-system (> 1024) TCP ports are needed to boot and run LAM.
>>>> In more detail: LAM does not use any specific port numbers, but
>>>> instead requests any random open port from the OS. Check out FAQs
>>>> 17 and 18 here for some more info:
>>>>
>>>> http://www.lam-mpi.org/faq/category4.php3
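>>>>
>>>> (That "random open port" is just the usual ephemeral-port mechanism:
>>>> bind to port 0 and let the kernel pick a free one. A rough sketch,
>>>> not LAM's actual code:)
>>>>
>>>> -----------------------------------
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include <sys/socket.h>
>>>> #include <netinet/in.h>
>>>> #include <arpa/inet.h>
>>>> #include <unistd.h>
>>>>
>>>> int main (void)
>>>> {
>>>>   struct sockaddr_in addr;
>>>>   socklen_t len = sizeof (addr);
>>>>   int fd = socket (AF_INET, SOCK_STREAM, 0);
>>>>
>>>>   memset (&addr, 0, sizeof (addr));
>>>>   addr.sin_family = AF_INET;
>>>>   addr.sin_addr.s_addr = htonl (INADDR_ANY);
>>>>   addr.sin_port = htons (0);    /* 0 = let the OS choose any free port */
>>>>
>>>>   bind (fd, (struct sockaddr *) &addr, sizeof (addr));
>>>>   getsockname (fd, (struct sockaddr *) &addr, &len);
>>>>   printf ("the OS assigned port %d\n", ntohs (addr.sin_port));
>>>>
>>>>   close (fd);
>>>>   return 0;
>>>> }
>>>> -----------------------------------
>>>>
>>>> This is why you cannot predict the port numbers ahead of time and
>>>> need to leave the high TCP ports open between the nodes.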
>>>>
>>>> Hope this helps!
>>>>
>>>> Andrew

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/