Dear Sir,
Greetings. I am Hashem from Jordan University; you may remember me. I am
Dr. Bothina Hamad's student.
About our cluster: we can now ssh from the server to the nodes without a
password, and we have installed MPICH2 and are testing it. The step we are
stuck on is running example parallel jobs, for instance the example
suggested in the installation manual, which is:
---------------------------------------------------------------------------------------------------------------
Now we will run an MPI job, using the mpiexec command as specified
in the MPI-2 standard.
As part of the build process for MPICH2, a simple program to compute
the value of π by numerical integration is created in the
mpich2-1.0.5/examples directory. If the current directory is the top level
MPICH2 build directory,
then you can run this program with
mpiexec -n 5 examples/cpi
The number of processes need not match the number of hosts. The
cpi example will tell you which hosts it is running on. By default,
the processes are launched one after the other on the hosts in the mpd
ring, so it is not necessary to specify hosts when running a job with
mpiexec.
---------------------------------------------------------------------------------------------------------------
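For reference, the computation behind this example can be sketched serially in C. This is only a minimal sketch, not the cpi.c shipped with MPICH2: it assumes the example applies the midpoint rule to 4/(1+x^2) on [0,1], and the names estimate_pi, pi_error and PI_REF are made up here for illustration.

```c
#include <math.h>

/* Reference value of pi, used only to measure the error of the estimate. */
static const double PI_REF = 3.14159265358979323846;

/* Midpoint-rule estimate of the integral of 4 / (1 + x^2) over [0, 1],
 * which equals pi. n is the number of intervals and must be positive. */
double estimate_pi(int n)
{
    double h = 1.0 / (double)n;   /* width of each interval */
    double sum = 0.0;

    for (int i = 1; i <= n; i++) {
        double x = h * ((double)i - 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Absolute error of the estimate against PI_REF. */
double pi_error(int n)
{
    return fabs(estimate_pi(n) - PI_REF);
}
```

In a parallel version, each process would presumably sum only its own share of the intervals and the partial sums would then be combined, which is why the number of processes does not have to match the number of hosts.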
The problem was that it always gave the following message:
[Errno 2] No such file or directory
After that we compiled cpi.c ourselves using the command:
mpicc -o cpi cpi.c
Now the message became:
---------------------------------------------------------------------------------------------------------------
It seems that there is no lamd running on the host Main-server.
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
---------------------------------------------------------------------------------------------------------------
We then installed LAM and tried to start it with the "lamboot" command, but
it did not work as root and gave this message:
---------------------------------------------------------------------------------------------------------------
It is a Very Bad Idea to run this program as root.
LAM was designed to be run by individual users; it was *not* designed
to be run as a root-level service where multiple users use the same
LAM daemons in a client-server fashion.
Especially with today's propensity for hackers to scan for root-owned
network daemons, it could be tragic to run this program as root.
While LAM is known to be quite stable, and LAM does not leave network
sockets open for random connections after the initial setup, several
factors should strike fear into system administrator's hearts if LAM
were to be constantly running for all users to utilize:
1. LAM leaves a Unix domain socket open on each machine in the
/tmp directory. So if someone breaks into root on one
machine, they effectively have root on all machines that
are connected via LAM.
2. Indeed, there must have been a .rhosts (or some other trust
mechanism) for root which must have allowed you to run LAM
on remote nodes. Depending on your local setup, this may
not be safe.
3. LAM has never been checked for buffer overflows and other
malicious input types of errors. We don't *think* that
there are any buffer-overflow types of situations in LAM,
we've never checked explicitly (hence, per Mr. Murphy,
there are certainly some hiding somewhere).
4. LAM programs are not audited or tracked in any way. This
could present a sneaky way to execute binaries without log
trails (especially as root).
Hence, it's a Very Bad Idea to run LAM as root. Please login as a
different user and run LAM again.
---------------------------------------------------------------------------------------------------------------
After that we tried to boot LAM as a regular user instead of root, and it
booted. However, we cannot execute our example as that user; it has to be
run as root. Please take into consideration that all the nodes boot as
regular users, not root, but I do not know whether mpdboot runs as a user or
as root on the nodes.
What do you think we should do?
Thanks a lot.