Sorry -- I should have been more clear.
mpirun does not execute this function -- only the MPI processes do.
rpwait() is LAM's internal function for "remote process wait": mpirun
calls it to wait for the MPI processes to complete.
Can you attach gdb to the running MPI processes and see where they are
stuck?
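For example, if ps shows one of your MPI processes with pid 12345
(hypothetical -- substitute the real pid and path to your executable):

-----
shell$ gdb /path/to/your/app 12345
(gdb) thread apply all bt
-----

Given the threading behavior you described, a backtrace of every thread
in each process should show exactly where they are spinning.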
On Feb 3, 2005, at 5:11 PM, Jonathan Herriott wrote:
> When I use gdb, it seems to stop on line 823 of mpirun.c. The line
> reads "if (rpwait(&nodeid, &pid, &status))"
>
> --
> Jonathan Herriott
> Architecture and Performance Group
> Apple Computer, Inc.
>
> On Feb 3, 2005, at 7:18 AM, Jeff Squyres wrote:
>
>> Something sounds quite wrong here -- the lam_tv_load_type_defs()
>> function is a dummy function that is essentially a no-op, and is only
>> included so that the linker pulls in relevant symbols. Indeed,
>> here's the code for that function:
>>
>> -----
>> void *
>> lam_tv_load_type_defs(void)
>> {
>>     static void *dummy[11];
>>
>>     /* Referencing the above variables needed for loading type
>>        definitions in TotalView so that compiler does not optimize
>>        them out. */
>>
>>     dummy[0]  = &dummy_req;
>>     dummy[1]  = &dummy_comm;
>>     dummy[2]  = &dummy_group;
>>     dummy[3]  = &dummy_proc;
>>     dummy[4]  = &dummy_gps;
>>     dummy[5]  = &dummy_ah_desc;
>>     dummy[6]  = &dummy_al_desc;
>>     dummy[7]  = &dummy_al_head;
>>     dummy[8]  = &dummy_msg;
>>     dummy[9]  = &dummy_cid;
>>     dummy[10] = &dummy_envl;
>>
>>     return dummy;
>> }
>> -----
>>
>> All the "dummy" variables are instantiated earlier in the file.
>>
>> So if a thread is blocking in this function, there is something wrong
>> with the installation. Can you attach a debugger to see where
>> exactly it is blocking?
>>
>>
>> On Feb 2, 2005, at 3:42 PM, Jonathan Herriott wrote:
>>
>>> Well, you were right about it being a spinlock issue (95% of the
>>> profile) when running two threads. The time is being spent in the
>>> function lam_tv_load_type_defs. I'll include the Shark profile.
>>> I also tried leaving the program running overnight on two threads --
>>> it should finish in about 430 seconds -- but after 17 hours it was
>>> still running. Both processors are being used, but only one thread
>>> is active, and it is being passed back and forth between the two.
>>> The other thread starts up and then does nothing, so there was no
>>> point in trying the run with a single thread. On another note,
>>> which version of LAM/MPI uses the mpirun_ssh command, if any?
>>>
>>> <LAM_Thr2.mshark>
>>>
>>> --
>>> Jonathan Herriott
>>> Architecture and Performance Group
>>> Apple Computer, Inc.
>>> 408-974-5931
>>
>> --
>> {+} Jeff Squyres
>> {+} jsquyres_at_[hidden]
>> {+} http://www.lam-mpi.org/
>>
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/