Are you running all 30 processes on one node?
Note that sysv and usysv consume system resources such as SYSV
semaphores and shared memory; the amount used is directly proportional
to how many processes are running on that node (see the LAM User's
Guide for the specifics). Running so many on one node may cause
problems.
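As a rough way to see how much SYSV headroom a node has, the sketch below reads the kernel's semaphore and shared-memory limits. It assumes a Linux node, which this thread does not confirm, and the function names (parse_sem_limits, read_sysv_limits) are made up for illustration. It does not compute how much LAM itself consumes per process; see the LAM User's Guide for that.

```python
from pathlib import Path

# Field order of /proc/sys/kernel/sem on Linux:
# max semaphores per set, max semaphores system-wide,
# max operations per semop call, max number of semaphore sets.
SEM_FIELDS = ("semmsl", "semmns", "semopm", "semmni")


def parse_sem_limits(text):
    """Parse the four whitespace-separated integers from /proc/sys/kernel/sem."""
    values = [int(v) for v in text.split()]
    if len(values) != len(SEM_FIELDS):
        raise ValueError(f"expected {len(SEM_FIELDS)} fields, got {len(values)}")
    return dict(zip(SEM_FIELDS, values))


def read_sysv_limits(proc_root="/proc/sys/kernel"):
    """Return SysV semaphore and shared-memory limits, or None if unreadable."""
    root = Path(proc_root)
    try:
        limits = parse_sem_limits((root / "sem").read_text())
        limits["shmmax"] = int((root / "shmmax").read_text())  # max segment size, bytes
        limits["shmmni"] = int((root / "shmmni").read_text())  # max number of segments
    except (OSError, ValueError):
        # Assumption: a missing or malformed file means this is not a Linux /proc.
        return None
    return limits


if __name__ == "__main__":
    limits = read_sysv_limits()
    if limits is None:
        print("SysV IPC limits not readable here (non-Linux system?)")
    else:
        for name, value in limits.items():
            print(f"{name}: {value}")
```

Comparing these limits against what is already allocated on the node (for example, as listed by ipcs) gives a first hint as to whether 30 processes on one node are running into a system limit.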
Which test, specifically, is failing? Is it still struct_gatherv?
On Mar 10, 2005, at 1:39 AM, Ilya Lashuk wrote:
> Jeff Squyres wrote:
>
>> We're unfortunately unable to replicate your problem -- we did find a
>> problem with the struct_gatherv test (it wasn't able to handle over
>> 100 processes), but haven't been able to get alltoall2 to fail. :-(
>>
>> Does it fail on the other RPI's, or just usysv?
>
>
> It turned out that I'm unable to reproduce it, too. In all other tests
> tcp, sysv, and usysv ran through smoothly. As for the other
> RPIs, I haven't tested them much. However, when I try to run one
> problem in "hypre" on, say, 30 or more processors using "sysv" or
> "usysv", it almost always crashes with a message like this:
> --------------------------------------------------------------------
> MPI_Waitany: message truncated: Input/output error (rank 4,
> MPI_COMM_WORLD)
> Rank (4, MPI_COMM_WORLD): Call stack within LAM:
> Rank (4, MPI_COMM_WORLD): - MPI_Waitany()
> Rank (4, MPI_COMM_WORLD): - MPI_Waitall()
> Rank (4, MPI_COMM_WORLD): - main
> ................................
> -------------------------------------------------------------------
>
> On the other hand, it runs fine with "tcp".
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/