You can also increase the debug and verbosity levels with the
-d, -v, and -vv flags (at least those were the flags in the 6.5.x series).
-J
On Sunday, Sep 28, 2003, at 23:05 US/Pacific, Jeremy Archuleta wrote:
> I have had similar problems, but one thought comes to mind...
>
> You are running on a cluster and getting files to the proper nodes
> either via NFS (or some other network file system) or by manually
> copying the code to all nodes. If you are copying manually, make sure
> you copy after every compile so that the executables are the same
> everywhere. If you are using NFS, I believe there is a refresh
> interval at which files are synchronized. So if there is an error,
> you fix it, and you try mpirun again all within that interval, the
> remote executables aren't actually the current version... yet. By
> changing the executable name you guarantee that if the code runs, it
> is the most current code.
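
One way to check the "same executables everywhere" point above is to compare content hashes rather than trusting timestamps. The following is only a minimal sketch: the helper names (file_digest, stale_copies, versioned_name) are hypothetical, and it assumes each copy of the executable can be read as an ordinary path from wherever the script runs (for example, by running it on each node in turn).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large executables are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stale_copies(reference, copies):
    """Return the paths in `copies` whose contents differ from `reference`."""
    expected = file_digest(reference)
    return [str(p) for p in copies if file_digest(p) != expected]


def versioned_name(path, length=8):
    """Build a build-specific name by appending part of the digest.

    Hypothetical naming scheme: 'solver' becomes 'solver-<8 hex chars>',
    so a rebuilt binary never shares a name with an older one.
    """
    p = Path(path)
    return f"{p.stem}-{file_digest(p)[:length]}{p.suffix}"
```

If stale_copies returns an empty list, the copies match the freshly built binary. A digest-based name from versioned_name automates the renaming trick described above, since every rebuild gets a name that no node can have cached.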
>
> Just a thought. Whenever I have received those errors, it has usually
> turned out to be 1) different executables than I thought (I forgot to
> copy manually), 2) my own fault, a special case where the code exits
> early, or 3) something crashed on that node and wiped out the
> executable (like a segfault).
>
> Hope that helps.
> If it is indeed something else, with LAM perhaps, ... uh... that could
> be a big problem.
>
> -J
>
>
> On Sunday, Sep 28, 2003, at 21:53 US/Pacific, Andras Balogh wrote:
>
>>
>> I had the following strange problem.
>> I don't know if it is due to Red Hat, LAM, or ssh.
>> Looking through the archive, I have the feeling that some other
>> people had the same problem before me and maybe did not realize what
>> happened.
>>
>> I compile my code on a dual-processor Red Hat system
>> and upload it to a Red Hat cluster in order to run it.
>>
>> I got the error message
>> ``...mpirun did not invoke MPI_INIT before quitting...''
>> due to a programming error.
>>
>> This is no big news, but the message stayed even after recompiling and
>> uploading a previously working version.
>>
>> Only renaming the executable solved the problem.
>>
>> It looks like the OS (or LAM) remembers the name of the
>> incorrect executable and no longer accepts it as correct.
>> This is freaky.
>> I renamed the file back and forth with the same result.
>>
>> --
>> Andras Balogh
>> ---------------------------------------------------------------------
>> Department of Mathematics | phone: (956) 381-2119
>> University of Texas - Pan American | phone: (956) 381-3452
>> Edinburg, TX 78541-2999 | fax: (956) 384-5091
>> http://www.math.panam.edu/abalogh | abalogh_at_[hidden]
>> ---------------------------------------------------------------------
>>
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>