LAM/MPI General User's Mailing List Archives

From: Ming Wu (wuming_at_[hidden])
Date: 2004-03-31 00:57:27


Hi, thanks a lot for your help.

In your opinion, what factors might cause the idle time? I observed that the machine is idle most of the time while the two processes run on the same machine, which is why the computation time of the two processes increases by a factor of 5 rather than 2. But I am not sure what causes the system to be idle when 2 processes are running. If it were caused by communication, the communication cost should be at least close to the computation cost, if not larger. However, the experiment shows that the communication cost is still a small part of the total application execution time.

----- Original Message -----
From: Mohammed ELKANOUNI <elkanouni_at_[hidden]>
Date: Tuesday, March 30, 2004 4:14 pm
Subject: Re: LAM: is that possible to reduce the communication cost by assigning processes on the same node

> In parallel programming we define the speedup, which is the quotient
> between the sequential execution time (ts) of an algorithm and its
> parallel execution time (tp); we write S = ts/tp.
> The larger S is, the better the parallelism, but S never exceeds p
> (the number of processors).
> tp is composed of communication time (tcomm), idle time (tidle) and
> execution time (texe):
> tp = (tcomm + tidle + texe)/p
> Of course texe = ts.
> Then S = p*ts/(tcomm + tidle + texe) = p*ts/(tcomm + tidle + ts)
> Now, which performs better, the parallel or the sequential algorithm?
> I think we cannot answer immediately; it all depends on how S behaves
> with the number of processors and the problem size. If S > 1 then the
> parallel algorithm outperforms the sequential one; if S < 1 then the
> sequential algorithm outperforms the parallel one (meaning it is
> better to run the sequential algorithm, because tcomm and tidle are
> too large).
>
> Example:
> Take a parallel summation. Suppose that tcomm = 0.01*p (when we want
> to add up the partial sums we use MPI_Reduce, whose execution time is
> proportional to the number of processes), that ts = 0.0001*n (n:
> number of loop iterations), and that tidle = 0. Then
> S = 0.0001*n/(0.01 + 0 + 0.0001*n/p). With a fixed p = 10, we find
> the break-even point at n = 111. If n < 111 the sequential version is
> better; if n > 111 the parallel version is better (though not by
> much); if n >> 111 the parallel version wins clearly.
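The break-even point in the example above can be checked numerically. This is just an illustrative sketch using the constants from the example (tcomm = 0.01*p, ts = 0.0001*n, tidle = 0, p = 10); the numbers are the email's invented model, not measurements of LAM/MPI:

```python
# Speedup model from the example:
#   ts(n)    = 0.0001 * n      (sequential time)
#   tcomm(p) = 0.01 * p        (MPI_Reduce cost, linear in p)
#   tidle    = 0
#   tp       = (tcomm + tidle + ts) / p
#   S        = ts / tp

def speedup(n, p):
    ts = 0.0001 * n
    tcomm = 0.01 * p
    tidle = 0.0
    tp = (tcomm + tidle + ts) / p
    return ts / tp

p = 10
# Smallest n where the parallel version wins (S > 1).
break_even = next(n for n in range(1, 10_000) if speedup(n, p) > 1)
print(break_even)            # 112 (the analytic break-even is n ~ 111.1)
print(speedup(50, p) < 1)    # n < 111: sequential is better
print(speedup(1000, p) > 1)  # n >> 111: parallel is better
```

Solving S = 1 analytically gives 0.0001*n = 0.01 + 0.00001*n, i.e. n ~ 111.1, matching the n = 111 balance point quoted in the email.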
>
>
> ----- Original Message -----
> From: <dburbano_at_[hidden]>
> To: <lam_at_[hidden]>
> Sent: Tuesday, March 30, 2004 7:50 AM
> Subject: Re: LAM: is that possible to reduce the communication cost
> by assigning processes on the same node
>
>
> > When you run multiple processes on one processor and they need to
> > communicate with each other, both the communication and the
> > computation have to go through that same processor, so everything
> > takes longer than it would on many processors.
> >
> > For example, suppose I have 4 processes and only one processor, and
> > they have to do some computation and some communication (reducing
> > the information to process 0). The processes cannot communicate
> > while they are still computing, and the processor cannot execute 2
> > processes at the same time; it can only time-slice between them.
> > For that reason, the last processes in the run queue have to wait
> > until the first processes release the processor.
> >
> > The same happens with the communication: the processes need the
> > processor to send or receive information. If you use MPI_Reduce and
> > process p0 is the last one to get the processor, the other
> > processes will try to send their data to p0 while p0 is not yet
> > ready.
> >
> > Now, what happens if you have four processors and four processes?
> > There is one processor for each process, so the computations run at
> > the same time; the processes do not share processors and never wait
> > for one. When they finish their computation, they start sending
> > their data. With MPI_Reduce, they all send to one process (for
> > example p0) on its own processor. In this case the receiving
> > process is usually ready to accept the data from the others
> > (sometimes it is not, depending on many factors), and the others
> > send their data at the same time (depending on the hardware
> > configuration).
> >
> > This is one reason among many why the communication and computation
> > time with many processes on many processors is less than with many
> > processes on one processor.
> >
> > This is a good link:
> >
> > http://www.cs.rit.edu/~ncs/parallel.html#books
> >
> > thanks
> >
> >
> >
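The time-slicing argument above can be sketched with a toy wall-clock model (this is not LAM/MPI code; all constants are invented for illustration): on one CPU the compute phases serialize, so the reduce can only begin after every process has had its turn, while on P CPUs the compute phases overlap.

```python
# Toy wall-clock model of P processes doing "compute, then reduce to
# a root", comparing one shared processor vs. one processor per
# process. The compute and per-message costs are made-up numbers.

def wall_time(num_procs, num_cpus, compute=1.0, reduce_msg=0.05):
    # Compute phase: with enough CPUs the phases overlap; with one
    # CPU the OS time-slices, so the last process finishes its
    # compute only after all the compute work has run serially.
    waves = -(-num_procs // num_cpus)      # ceiling division
    compute_done = waves * compute
    # Reduce phase: the root can only start receiving once everyone
    # has finished computing; each incoming message costs reduce_msg.
    return compute_done + (num_procs - 1) * reduce_msg

one_cpu = wall_time(num_procs=4, num_cpus=1)    # 4 * 1.0 + 3 * 0.05
four_cpus = wall_time(num_procs=4, num_cpus=4)  # 1 * 1.0 + 3 * 0.05
print(one_cpu, four_cpus)
```

In this crude model, 4 processes on one CPU take roughly 4.15 time units versus 1.15 on four CPUs, echoing the point above: the slowdown comes mostly from serialized computation and the late-starting reduce, not from the messages themselves.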
> > > Hi,
> > >
> > > I run them on a cluster of machines when I say processes assigned
> > > to multiple machines. It is obvious that the computation cost
> > > will increase if multiple processes run on the same host. But why
> > > does the communication cost also increase so much? Could you
> > > please give a detailed explanation?
> > >
> > > thanks
> > >
> > > ----- Original Message -----
> > > From: Roberto Pasianot <pasianot_at_[hidden]>
> > > Date: Monday, March 29, 2004 1:25 pm
> > > Subject: Re: LAM: is that possible to reduce the communication
> > > cost by assigning processes on the same node
> > >
> > >>
> > >> Hi,
> > >>
> > >> This might be a stupid question, but anyway: are you running on
> > >> a multiprocessor? Otherwise what you get is exactly what should
> > >> be expected.
> > >>
> > >> Bye. Roberto.
> > >>
> > >>
> > >> On Mon, 29 Mar 2004, Ming Wu wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I assigned multiple processes to one machine instead of
> > >> > several machines. In this way, I expected the communication
> > >> > cost to be reduced compared with assigning them to several
> > >> > machines. However, the result is weird: both the computation
> > >> > cost and the communication cost increased sharply, especially
> > >> > for MPI_Reduce and MPI_Sendrecv. It seems that the underlying
> > >> > implementation of LAM/MPI doesn't favor multiple processes on
> > >> > the same host.
> > >> >
> > >> > Your help will be greatly appreciated.
> > >> >
> > >> > thanks
> > >> >
> > >> > _______________________________________________
> > >> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> > >>
> > >
> >
> >
> >