
LAM/MPI General User's Mailing List Archives


From: Jonathan Herriott (jherriott_at_[hidden])
Date: 2005-02-04 16:25:14


Well, the interesting thing is that it is stopping on my first read
from the standard input stream. A file full of lines of text is being
redirected through standard input, yet it is still getting stuck.

--
Jonathan Herriott
Architecture and Performance Group
Apple Computer, Inc.
On Feb 3, 2005, at 2:14 PM, Jeff Squyres wrote:
> Sorry -- I should have been more clear.
>
> mpirun does not execute this function -- only the MPI processes 
> execute this function.  rpwait() is LAM's internal function for 
> "remote process wait" -- it's waiting for the MPI processes to 
> complete.
>
> Can you attach gdb to the running MPI processes and see where they are 
> stuck?
>
>
> On Feb 3, 2005, at 5:11 PM, Jonathan Herriott wrote:
>
>> When I use gdb, it seems to stop up on line 823 of mpirun.c.  The 
>> line reads "if (rpwait(&nodeid, &pid, &status))"
>>
>> --
>> Jonathan Herriott
>> Architecture and Performance Group
>> Apple Computer, Inc.
>>
>> On Feb 3, 2005, at 7:18 AM, Jeff Squyres wrote:
>>
>>> Something sounds quite wrong here -- the lam_tv_load_type_defs() 
>>> function is a dummy function that is essentially a no-op, and is 
>>> only included so that the linker pulls in relevant symbols.  Indeed, 
>>> here's the code for that function:
>>>
>>> -----
>>> void *
>>> lam_tv_load_type_defs(void)
>>> {
>>>   static void *dummy[11];
>>>
>>>   /* Referencing the above variables needed for loading type
>>>      definitions in TotalView so that compiler does not optimize them
>>>      out. */
>>>
>>>   dummy[0] = &dummy_req;
>>>   dummy[1] = &dummy_comm;
>>>   dummy[2] = &dummy_group;
>>>   dummy[3] = &dummy_proc;
>>>   dummy[4] = &dummy_gps;
>>>   dummy[5] = &dummy_ah_desc;
>>>   dummy[6] = &dummy_al_desc;
>>>   dummy[7] = &dummy_al_head;
>>>   dummy[8] = &dummy_msg;
>>>   dummy[9] = &dummy_cid;
>>>   dummy[10] = &dummy_envl;
>>>
>>>   return dummy;
>>> }
>>> -----
>>>
>>> All the "dummy" variables are instantiated earlier in the file.
>>>
>>> So if a thread is blocking in this function, there is something 
>>> wrong with the installation.  Can you attach a debugger to see where 
>>> exactly it is blocking?
>>>
>>>
>>> On Feb 2, 2005, at 3:42 PM, Jonathan Herriott wrote:
>>>
>>>> Well, you were right about it being a spinlock issue (95% of the 
>>>> profile) when running two threads.  The time is being spent in 
>>>> the function lam_tv_load_type_defs.  I'll include the Shark 
>>>> profile.  I also tried leaving the program running overnight on 
>>>> two threads; it should finish in around 430 seconds, but after 17 
>>>> hours it was still running.  Both processors are being used, but 
>>>> only one thread is active, and it is being passed between the two 
>>>> processors.  The other thread starts up and then does nothing.  
>>>> There was no point in trying it with one thread, since that 
>>>> thread stays inactive.  On another note, which version of 
>>>> LAM/MPI, if any, uses the mpirun_ssh command?
>>>>
>>>> <LAM_Thr2.mshark>
>>>>
>>>> --
>>>> Jonathan Herriott
>>>> Architecture and Performance Group
>>>> Apple Computer, Inc.
>>>> 408-974-5931
>>>> _______________________________________________
>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>
>>> -- 
>>> {+} Jeff Squyres
>>> {+} jsquyres_at_[hidden]
>>> {+} http://www.lam-mpi.org/
>>>
>>
>
> -- 
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>