Just to double check -- the MPI_COMM_WORLD rank 0 is running on the
same node as mpirun, right?
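
(For anyone reading this in the archive: the lam_tv_load_type_defs()
function quoted further down is an instance of a common "take the
address so the symbol is not discarded" pattern. Here is a minimal
standalone sketch of that pattern. It is not LAM's actual code; the
demo_* names and the two struct types are made up for illustration.
The point is that a function like this only stores a few addresses, so
it has nothing to block or spin on.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a library's internal types. */
struct demo_req  { int id; };
struct demo_comm { int size; };

/* One instance of each type, so that a debugger can find the
   type definitions in the debug information. */
static struct demo_req  dummy_req;
static struct demo_comm dummy_comm;

/* Store the address of each instance so the compiler and linker
   keep them. No locks, no I/O, no loops: this returns immediately. */
void *
demo_load_type_defs(void)
{
    static void *dummy[2];

    dummy[0] = &dummy_req;
    dummy[1] = &dummy_comm;

    return dummy;
}

#ifdef DEMO_MAIN
/* Optional driver: build with -DDEMO_MAIN to run the check. */
int
main(void)
{
    void **refs = demo_load_type_defs();

    assert(refs[0] == (void *)&dummy_req);
    assert(refs[1] == (void *)&dummy_comm);
    return 0;
}
#endif
```

If a profiler attributes time to a function shaped like this, the
samples are more likely being assigned to the nearest visible symbol
than spent in the function itself, which is one more reason to look at
a real stack trace from a debugger.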
On Feb 4, 2005, at 4:25 PM, Jonathan Herriott wrote:
> Well, the interesting thing is it is stopping on my first read from
> the standard input stream. A file is being redirected through the
> standard input stream, and the file is full of lines of text, and yet
> it is getting stuck.
>
> --
> Jonathan Herriott
> Architecture and Performance Group
> Apple Computer, Inc.
>
> On Feb 3, 2005, at 2:14 PM, Jeff Squyres wrote:
>
>> Sorry -- I should have been more clear.
>>
>> mpirun does not execute this function -- only the MPI processes
>> execute this function. rpwait() is LAM's internal function for
>> "remote process wait" -- it's waiting for the MPI processes to
>> complete.
>>
>> Can you attach gdb to the running MPI processes and see where they
>> are stuck?
>>
>>
>> On Feb 3, 2005, at 5:11 PM, Jonathan Herriott wrote:
>>
>>> When I use gdb, it seems to stop on line 823 of mpirun.c. The
>>> line reads "if (rpwait(&nodeid, &pid, &status))"
>>>
>>> --
>>> Jonathan Herriott
>>> Architecture and Performance Group
>>> Apple Computer, Inc.
>>>
>>> On Feb 3, 2005, at 7:18 AM, Jeff Squyres wrote:
>>>
>>>> Something sounds quite wrong here -- the lam_tv_load_type_defs()
>>>> function is a dummy function that is essentially a no-op, and is
>>>> only included so that the linker pulls in relevant symbols.
>>>> Indeed, here's the code for that function:
>>>>
>>>> -----
>>>> void *
>>>> lam_tv_load_type_defs(void)
>>>> {
>>>>   static void *dummy[11];
>>>>
>>>>   /* Referencing the above variables needed for loading type
>>>>      definitions in TotalView so that compiler does not optimize
>>>>      them out. */
>>>>
>>>>   dummy[0] = &dummy_req;
>>>>   dummy[1] = &dummy_comm;
>>>>   dummy[2] = &dummy_group;
>>>>   dummy[3] = &dummy_proc;
>>>>   dummy[4] = &dummy_gps;
>>>>   dummy[5] = &dummy_ah_desc;
>>>>   dummy[6] = &dummy_al_desc;
>>>>   dummy[7] = &dummy_al_head;
>>>>   dummy[8] = &dummy_msg;
>>>>   dummy[9] = &dummy_cid;
>>>>   dummy[10] = &dummy_envl;
>>>>
>>>>   return dummy;
>>>> }
>>>> -----
>>>>
>>>> All the "dummy" variables are instantiated earlier in the file.
>>>>
>>>> So if a thread is blocking in this function, there is something
>>>> wrong with the installation. Can you attach a debugger to see
>>>> where exactly it is blocking?
>>>>
>>>>
>>>> On Feb 2, 2005, at 3:42 PM, Jonathan Herriott wrote:
>>>>
>>>>> Well, you were right about it being a spinlock issue (95% of the
>>>>> profile) when running two threads. The time is being spent in
>>>>> the function lam_tv_load_type_defs. I'll include the Shark
>>>>> profile. I also tried leaving the program running overnight on
>>>>> two threads; it should finish in around 430s, but after 17
>>>>> hours it was still running. Both processors are being used, but
>>>>> only one thread is active and being passed between the two. The
>>>>> other thread starts up and then doesn't do anything. There was no
>>>>> use in trying to do it with one thread since the thread stays
>>>>> inactive.
>>>>>
>>>>> On another note, which version of LAM/MPI uses the mpirun_ssh
>>>>> command, if any does at all?
>>>>>
>>>>> <LAM_Thr2.mshark>
>>>>>
>>>>> --
>>>>> Jonathan Herriott
>>>>> Architecture and Performance Group
>>>>> Apple Computer, Inc.
>>>>> 408-974-5931
>>>>> _______________________________________________
>>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>>
>>>> --
>>>> {+} Jeff Squyres
>>>> {+} jsquyres_at_[hidden]
>>>> {+} http://www.lam-mpi.org/
>>>>
>>>
>>
>> --
>> {+} Jeff Squyres
>> {+} jsquyres_at_[hidden]
>> {+} http://www.lam-mpi.org/
>>
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/