It's a single-node system with 2 processors (an Xserve), so everything is
running on the same machine.
--
Jonathan Herriott
Architecture and Performance Group
Apple Computer, Inc.
On Feb 14, 2005, at 6:24 AM, Jeff Squyres wrote:
> Just to double check -- the MPI_COMM_WORLD rank 0 is running on the
> same node as mpirun, right?
>
> On Feb 4, 2005, at 4:25 PM, Jonathan Herriott wrote:
>
>> Well, the interesting thing is it is stopping on my first read from
>> the standard input stream. A file is being redirected through the
>> standard input stream, and the file is full of lines of text, and yet
>> it is getting stuck.
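
[Editor's note] One way to narrow down a hang like this is to have each process report what its standard input is actually connected to before the first read. The following is a minimal sketch using only POSIX calls (fstat, isatty). The names describe_fd and report_stdin are hypothetical helpers, not part of LAM/MPI, and the rank argument is assumed to come from the caller (for example from MPI_Comm_rank).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Describe what a file descriptor is connected to.
   Hypothetical helper for illustration; not part of LAM/MPI. */
const char *
describe_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return "closed or invalid";
    if (isatty(fd))
        return "terminal";
    if (S_ISREG(st.st_mode))
        return "regular file";
    if (S_ISFIFO(st.st_mode))
        return "pipe";
    if (S_ISSOCK(st.st_mode))
        return "socket";
    if (S_ISCHR(st.st_mode))
        return "character device";
    return "other";
}

/* Print the diagnosis to stderr so it does not mix with program output.
   The rank value is supplied by the caller (an assumption of this sketch). */
void
report_stdin(int rank)
{
    fprintf(stderr, "rank %d: stdin: %s\n", rank, describe_fd(STDIN_FILENO));
}
```

Calling report_stdin(rank) just before the first read shows, per process, whether stdin is the redirected file or something else. It does not explain the hang by itself; it only tells you which process to look at more closely.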
>>
>> --
>> Jonathan Herriott
>> Architecture and Performance Group
>> Apple Computer, Inc.
>>
>> On Feb 3, 2005, at 2:14 PM, Jeff Squyres wrote:
>>
>>> Sorry -- I should have been more clear.
>>>
>>> mpirun does not execute this function -- only the MPI processes
>>> execute this function. rpwait() is LAM's internal function for
>>> "remote process wait" -- it's waiting for the MPI processes to
>>> complete.
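
[Editor's note] As a rough local analogy for why a debugger attached to the launcher shows only a wait call, here is a minimal POSIX sketch in which a parent blocks until its child exits. This is not LAM's implementation of rpwait(); launch_and_wait is a hypothetical name, and fork/waitpid merely stand in for the launcher and the MPI processes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch a child that exits with child_code, then block until it finishes.
   Returns the child's exit code, or -1 on error.
   Hypothetical analogy only; not LAM's rpwait(). */
int
launch_and_wait(int child_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: stands in for a process doing the real work. */
        _exit(child_code);
    }
    /* Parent: stands in for the launcher. A debugger attached here
       sees only this wait, not what the child is doing. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child never exits, the parent stays in the wait indefinitely, which is why the informative backtrace comes from the child and not from the launcher.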
>>>
>>> Can you attach gdb to the running MPI processes and see where they
>>> are stuck?
>>>
>>>
>>> On Feb 3, 2005, at 5:11 PM, Jonathan Herriott wrote:
>>>
>>>> When I use gdb, it seems to stop on line 823 of mpirun.c. The
>>>> line reads "if (rpwait(&nodeid, &pid, &status))".
>>>>
>>>> --
>>>> Jonathan Herriott
>>>> Architecture and Performance Group
>>>> Apple Computer, Inc.
>>>>
>>>> On Feb 3, 2005, at 7:18 AM, Jeff Squyres wrote:
>>>>
>>>>> Something sounds quite wrong here -- the lam_tv_load_type_defs()
>>>>> function is a dummy function that is essentially a no-op, and is
>>>>> only included so that the linker pulls in relevant symbols.
>>>>> Indeed, here's the code for that function:
>>>>>
>>>>> -----
>>>>> void *
>>>>> lam_tv_load_type_defs(void)
>>>>> {
>>>>>   static void *dummy[11];
>>>>>
>>>>>   /* Referencing the above variables needed for loading type
>>>>>      definitions in TotalView so that compiler does not optimize
>>>>>      them out. */
>>>>>
>>>>>   dummy[0] = &dummy_req;
>>>>>   dummy[1] = &dummy_comm;
>>>>>   dummy[2] = &dummy_group;
>>>>>   dummy[3] = &dummy_proc;
>>>>>   dummy[4] = &dummy_gps;
>>>>>   dummy[5] = &dummy_ah_desc;
>>>>>   dummy[6] = &dummy_al_desc;
>>>>>   dummy[7] = &dummy_al_head;
>>>>>   dummy[8] = &dummy_msg;
>>>>>   dummy[9] = &dummy_cid;
>>>>>   dummy[10] = &dummy_envl;
>>>>>
>>>>>   return dummy;
>>>>> }
>>>>> -----
>>>>>
>>>>> All the "dummy" variables are instantiated earlier in the file.
>>>>>
>>>>> So if a thread is blocking in this function, there is something
>>>>> wrong with the installation. Can you attach a debugger to see
>>>>> where exactly it is blocking?
>>>>>
>>>>>
>>>>> On Feb 2, 2005, at 3:42 PM, Jonathan Herriott wrote:
>>>>>
>>>>>> Well, you were right about it being a spinlock issue (95% of the
>>>>>> profile) when running two threads. The time is being spent in
>>>>>> the function lam_tv_load_type_defs. I'll include the Shark
>>>>>> profile. I also tried leaving the program running overnight on
>>>>>> two threads; it should finish in around 430 s, but after 17
>>>>>> hours it was still running. Both processors are being used, but
>>>>>> only one thread is active and being passed between the two. The
>>>>>> other thread starts up and then doesn't do anything. There was
>>>>>> no use in trying it with one thread, since the thread stays
>>>>>> inactive. On another note, which version of LAM/MPI, if any,
>>>>>> uses the mpirun_ssh command?
>>>>>>
>>>>>> <LAM_Thr2.mshark>
>>>>>>
>>>>>> --
>>>>>> Jonathan Herriott
>>>>>> Architecture and Performance Group
>>>>>> Apple Computer, Inc.
>>>>>> 408-974-5931
>>>>>> _______________________________________________
>>>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>>>
>>>>> --
>>>>> {+} Jeff Squyres
>>>>> {+} jsquyres_at_[hidden]
>>>>> {+} http://www.lam-mpi.org/
>>>>>
>>>
>>> --
>>> {+} Jeff Squyres
>>> {+} jsquyres_at_[hidden]
>>> {+} http://www.lam-mpi.org/
>>>
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>