On May 15, 2006, at 3:42 PM, Tom Crockett wrote:
> Jeff Squyres (jsquyres) wrote:
>> MPI_COMM_SELF actually requires the lam_basic coll module; the shmem
>> and smp modules are not capable of being used on COMM_SELF.
>>
>> So you really need to:
>>
>> mpirun -ssi coll lam_basic,shmem ...
>> and mpirun -ssi coll lam_basic,smp ...
>
> Jeff,
>
> Thanks for the clarification. smp works now, but I'm still having
> trouble with shmem. On a single quad-processor node, here's what I
> get from the verbose output (from one of the four processes -- the
> other three are similar):
<snip>
> Is there some option that will show me why shmem is not being
> selected? I tried increasing the coll_verbose level from 1000 to
> 10000, but the output was the same.
>
> A few more details: 4 GB of physical memory on the box, 8 GB of
> virtual memory, maximum shared memory segment size (a tunable
> Solaris parameter) is set to 256 MB, 20 segments available.
It doesn't look like we added any debugging output that explains *why*
a component didn't select itself. In hindsight, that was probably
silly. If you run under a debugger, break on
lam_ssi_coll_shmem_query(), and find the point at which we
"return NULL", that would be most helpful in figuring out what is
going on.

My initial guess is that you are running out of System V semaphores
or shared memory. But then again, it sounds like you bumped those
limits up high enough that something else might be going on. Without
knowing where the component query function failed, I can't really
make a reasonable guess as to what is going on.
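
One way to test the System V guess independently of LAM is a small
probe along the lines below. Treat it as a sketch only: the function
names (probe_shm, probe_sem, probe_status, report_sysv_limits) are
made up for illustration, and it does not reproduce whatever
lam_ssi_coll_shmem_query() actually requests. It simply asks the
kernel for one private shared memory segment and one private
semaphore set, then removes them again.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>

/* Try to create, then immediately remove, a private System V shared
 * memory segment of the given size. Returns 0 on success, or the
 * errno value set by shmget() on failure. */
int probe_shm(size_t bytes)
{
    int id;

    assert(bytes > 0);            /* a zero-byte request is meaningless */
    id = shmget(IPC_PRIVATE, bytes, IPC_CREAT | 0600);
    if (id == -1)
        return errno;
    shmctl(id, IPC_RMID, NULL);   /* do not leak the segment */
    return 0;
}

/* Same idea for a private set of nsems System V semaphores. */
int probe_sem(int nsems)
{
    int id;

    assert(nsems > 0);
    id = semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
    if (id == -1)
        return errno;
    semctl(id, 0, IPC_RMID);      /* do not leak the semaphore set */
    return 0;
}

/* Turn a probe result into a printable string. */
const char *probe_status(int err)
{
    return err == 0 ? "ok" : strerror(err);
}

/* Print one line per resource. The sizes are the caller's guess at
 * what the application needs; they are not taken from LAM itself. */
void report_sysv_limits(size_t shm_bytes, int nsems)
{
    printf("shmget(%lu bytes): %s\n",
           (unsigned long) shm_bytes, probe_status(probe_shm(shm_bytes)));
    printf("semget(%d semaphores): %s\n",
           nsems, probe_status(probe_sem(nsems)));
}
```

Calling report_sysv_limits(256UL * 1024 * 1024, 1) from a small main()
on the node in question would exercise the 256 MB segment limit quoted
above. An error such as ENOSPC or EINVAL would point back at the
kernel limits, while two "ok" lines would suggest the failure is
somewhere else in the query function.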
Thanks,
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/