LAM/MPI General User's Mailing List Archives

From: Sims, James S. Dr. (james.sims_at_[hidden])
Date: 2009-06-26 18:41:58


Thanks Mac. I think this helps. I am running the 64 bit version,
but here is a detailed comparison of what works and what doesn't.
If I run the following in the Torque/PBS environment:

    qsub -I -l nodes=1:ppn=2
    lamboot
    mpirun -np 2 MPI_li_64

the code dies with

    PID 10261 failed on node n0 (10.2.1.54) due to signal 11.

If, on the other hand, I don't use Torque but run the same example
(mpirun -np 2 MPI_li_64), the job runs. So I think something about the
PBS environment is causing the problem. I've compared the environment
variables in the two cases, and the only differences I see are some PBS
environment variables that aren't set when I'm not using Torque. None of
them is obviously a problem. It sounds as though the PBS environment may
be doing something like loading a 32-bit library where a 64-bit library
is required. What exactly does Torque do with the job that isn't done
otherwise?

Jim
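One way to follow up on the environment comparison described above is to capture the same snapshot in a login shell and inside a qsub -I session, then diff the two. This is a minimal sketch; the function name snapshot_env is hypothetical and not part of LAM or Torque. It records where mpirun and lamboot resolve, PATH, LD_LIBRARY_PATH, the shell's resource limits, and the full sorted environment. Limits are included on the assumption that a batch job may inherit them from the Torque daemon rather than from the login shell; they are not environment variables, so an env dump alone would not show them.

```bash
#!/bin/bash
# Hypothetical helper: write the parts of the environment most likely to
# differ between a login shell and a Torque job to OUTFILE for diffing.
snapshot_env() {
    local out=$1
    {
        # Which installation of the LAM tools the shell would actually run
        echo "mpirun=$(command -v mpirun || echo 'not found')"
        echo "lamboot=$(command -v lamboot || echo 'not found')"
        echo "PATH=$PATH"
        echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}"
        # Resource limits (stack size etc.) are not environment variables,
        # so a plain env dump would miss them.
        echo "--- ulimit -a ---"
        ulimit -a
        # Full environment, sorted so the two snapshots line up in diff.
        echo "--- env ---"
        env | sort
    } > "$out"
}
```

For example, run snapshot_env ~/env.login in a login shell, run snapshot_env ~/env.batch inside the qsub -I session, then compare them with diff ~/env.login ~/env.batch. The PBS_* variables are expected to differ; anything else in the diff, including the ulimit section, is worth a closer look.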

----------------------------------------------------------------------------------------
From: McCalla, Mac

Hi,
It sounds like your login environment finds the 64-bit version of
mpirun, and perhaps the Torque queue setup does not?

Mac McCalla
Houston
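Mac's suggestion can be turned into a check at the top of a batch job script. This is a minimal sketch that assumes the 64-bit LAM installation lives under a single prefix; the function name require_mpi_prefix and any prefix passed to it are hypothetical. It fails if lamboot or mpirun is missing from PATH or resolves to a copy outside PREFIX/bin.

```bash
#!/bin/bash
# Hypothetical guard for a Torque job script: succeed only if lamboot and
# mpirun both resolve to executables under PREFIX/bin on the current PATH.
require_mpi_prefix() {
    local prefix=$1 tool found
    for tool in lamboot mpirun; do
        # command -v prints the path the shell would actually execute
        if ! found=$(command -v "$tool"); then
            echo "$tool is not on PATH" >&2
            return 1
        fi
        case $found in
            "$prefix"/bin/*) ;;  # resolves to the expected installation
            *)
                echo "$tool resolves to $found, not under $prefix/bin" >&2
                return 1
                ;;
        esac
    done
    return 0
}
```

For example, a line such as require_mpi_prefix /opt/lam-64 || exit 1 placed before lamboot in the job script would stop the job early (/opt/lam-64 is only a placeholder path). If the check passes in a login shell but fails in the batch job, the job's PATH is picking up a different installation.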

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf
Of Sims, James S. Dr.
Sent: Friday, June 26, 2009 1:08 PM
To: General LAM/MPI mailing list
Subject: Re: LAM: Problem with 64 bit lam and intel

Oops, sorry, I described the problem completely inaccurately. The problem
is not with LAM per se (or so it appears), but with LAM and Torque. I
can log in to a 2-processor node and mpirun -np 2 ./MPI_li_64 works. But
if I submit the same job in batch, or use an interactive Torque queue
with qsub -I, I get the following when I issue the same mpirun -np 2 command:

PID 10261 failed on node n0 (10.2.1.54) due to signal 11.

Jim
________________________________________
From: Sims, James S. Dr. [james.sims_at_[hidden]]
Sent: Thursday, June 25, 2009 12:32 PM
To: tprince_at_[hidden]; General LAM/MPI mailing list
Subject: Re: LAM: Problem with 64 bit lam and intel

Yes, I have done so.

From: Tim Prince [TimothyPrince_at_[hidden]]
Sent: Thursday, June 25, 2009 12:29 AM
To: General LAM/MPI mailing list
Subject: Re: LAM: Problem with 64 bit lam and intel

Sims, James S. Dr. wrote:
> I have a LAM/MPI program compiled with a version of LAM built using
> the ifort 10.1 compiler. I can compile and run this code with no
> problem, using the 32 bit version of ifort. However, compiling the
> same code to produce a 64 bit executable does not run correctly, but
> gives a segmentation violation in a beginning part of the code that is
> fine with the 32 bit version. So I can't run this code as a 64 bit
> application, which I need to do to get beyond memory problems. Same
> behavior if I switch to OpenMPI. Any help you can give me with this
> will be greatly appreciated.
Did you build copies of LAM and OpenMPI with the corresponding 64-bit
compilers? These must match, and be kept separate from the 32-bit
versions.
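One way to confirm that the pieces match, as suggested here, is to check whether the executable and the MPI tools actually being run are 32-bit or 64-bit. This is a minimal sketch; elf_class is a hypothetical helper, not a standard command. It reads the EI_CLASS byte at offset 4 of the ELF header, where 1 means 32-bit and 2 means 64-bit. The standard file utility reports the same information.

```bash
#!/bin/bash
# Hypothetical helper: print 32 or 64 for an ELF file by reading the
# EI_CLASS byte (offset 4) of the ELF header: 1 = 32-bit, 2 = 64-bit.
elf_class() {
    local f=$1 magic class
    # The first four bytes of any ELF file are 0x7f 'E' 'L' 'F'
    magic=$(od -An -t x1 -N 4 "$f" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    class=$(od -An -t u1 -j 4 -N 1 "$f" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

For example, inside the Torque session both elf_class ./MPI_li_64 and elf_class "$(command -v mpirun)" should print 64. Running it on each shared library that ldd ./MPI_li_64 lists in that same session also checks the libraries the loader resolves there, which can differ from the login shell if LD_LIBRARY_PATH differs.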