LAM/MPI General User's Mailing List Archives

From: jess michelsen (jam_at_[hidden])
Date: 2003-10-27 06:36:05


Hi Jeff, I will try to be a little clearer:

On Sun, 2003-10-26 at 13:33, Jeff Squyres wrote:
>
> > To test whether I get the expected latency and bandwidth in my
> > bi-directional isend/irecv communications (Gigabit), I've put together a
> > simple Fortran program, as seen below. For small packet sizes, I get
> > exactly the same timings (2*latency) as seen with NetPIPE. For larger
> > packets (up to 64Kb), I get almost (95%) the same bandwidth as seen with
> > NetPIPE (isn't NetPIPE sending the packets uni-directionally?).
>
> IIRC, NetPIPE's latency and bandwidth measurements are all ping-pong
> divided by two. I'm not going to swear to this :-), but it would make
> sense with the numbers you're seeing.
>
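The "ping-pong divided by two" convention mentioned above can be sketched as a small calculation. This is only an illustration of the arithmetic, not NetPIPE's actual code: the function name `one_way_figures` and the sample numbers are hypothetical, and it assumes the round trip is symmetric so that half of it approximates one direction.

```python
def one_way_figures(round_trip_seconds, message_bytes):
    """Derive one-way latency and bandwidth from a ping-pong round trip.

    Assumes the two directions take equal time (hypothetical helper,
    not part of NetPIPE or LAM/MPI).
    """
    if round_trip_seconds <= 0:
        raise ValueError("round-trip time must be positive")
    # Ping-pong convention: one direction is taken as half the round trip.
    one_way_seconds = round_trip_seconds / 2.0
    # Bandwidth counts the message once, over the one-way time.
    bandwidth_bytes_per_s = message_bytes / one_way_seconds
    return one_way_seconds, bandwidth_bytes_per_s


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    latency, bandwidth = one_way_figures(2.0e-3, 65536)
    print(f"one-way time: {latency:.6f} s, bandwidth: {bandwidth:.0f} bytes/s")
```

Under this convention, a test that times a full exchange in both directions would be expected to report roughly twice the one-way latency for small messages.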
> > However, once in a while during the test, one of the execution nodes
> > hangs. It's even impossible to ssh to the node, so the power button is
> > the only means of communication(!)
>
> When you say that you can't ssh to the node, what exactly happens? Does
> ssh time out, or give "no route to host"?

I simply get "no route to host".
>
> > My question is now: could this be a buffer issue (buffered send with a
> > really big buffer didn't work better - only slower) -
>
> MPI buffered sends are generally not a good idea; they force the MPI to
> use an additional memory copy.
>
> > or could there be a hardware flaw - or should I do the communication in
> > another fashion?
>
> Do you see the same kind of hangups when you run NetPIPE? Have you run
> the TCP and MPI versions of NetPIPE? For example, if this happens even
> during the TCP version of NetPIPE, that could be indicative of a device
> driver and/or hardware issue.

NetPIPE doesn't show the same kind of hangs (could my much higher
number of repeats matter?). I have only worked with the MPI version of
NetPIPE. Can the TCP version be run without recabling (back-to-back with
a crossover cable)?

Best regards, Jess Michelsen.