Gareth --
Many thanks for this analysis!
You're exactly right -- there's no reason we should be examining the
(argc, argv) from MPI_INIT -- LAM doesn't use them. LAM only examines
them for obscure hysterical raisins. I've coded around it, and now the
user's (argc, argv) are not examined. This has now been committed to
SVN, and I'll upload 7.1.2b4 later today with this fix.
On Oct 7, 2004, at 11:05 PM, <Gareth.Williams_at_[hidden]> wrote:
>
>
> I believe there is a bug in sfh_argv_dup.
>
> As I interpret the code, the test on the while loop relies on the
> 'final' pointer in argv being 0, which is not necessarily the case.
> If this value happens to be an invalid address the code will not work
> as intended. A potential solution would be to pass both argc and argv
> and use argc to determine the length of the loop. On the other hand,
> since argc and argv are not used in MPI_Init in lam-mpi (from
> MPI_Init(3)) the calls could just be eliminated and the other lam
> executables (which presumably rely on the argument parsing procedures)
> would not be affected.
>
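The count-bounded approach suggested above can be sketched roughly as
follows. This is only an illustration, not the code in LAM: the names
argv_dup_n and argv_free_sketch are hypothetical, and the sketch uses
plain malloc rather than sfh_argv_add. The point is that the loop is
driven by argc, so argv[argc] is never read.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of LAM): free a NULL-terminated copy. */
static void argv_free_sketch(char **v)
{
    char **p;

    if (v == NULL) return;
    for (p = v; *p != NULL; ++p) free(*p);
    free(v);
}

/*
 * Hypothetical count-bounded variant of sfh_argv_dup: copies exactly
 * argc entries and never dereferences argv[argc].
 */
char **argv_dup_n(int argc, char **argv)
{
    char **dupv;
    int i;

    if (argv == NULL || argc < 0) return NULL;

    dupv = malloc(((size_t)argc + 1) * sizeof *dupv);
    if (dupv == NULL) return NULL;
    dupv[0] = NULL;                 /* keep the copy terminated at all times */

    for (i = 0; i < argc; ++i) {
        size_t len = strlen(argv[i]) + 1;

        dupv[i] = malloc(len);
        if (dupv[i] == NULL) {      /* partial copy is still terminated */
            argv_free_sketch(dupv);
            return NULL;
        }
        memcpy(dupv[i], argv[i], len);
        dupv[i + 1] = NULL;
    }

    return dupv;
}
```

The returned copy is always NULL-terminated, whatever follows the last
entry of the caller's array.
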
> The problem turned up for me in rams4.3.0, which has a Fortran main
> routine that parses command-line arguments into C-compatible data
> structures and then calls a C procedure, which in turn calls MPI_Init.
> MPI_Init then fails with a SIGSEGV due to the bug (NB: all of the MPI
> calls in this code are made from C). I have been using the Intel
> compilers on an ia32 Linux system. The problem only occurred for me
> under some sets of compilation options, because whether it turns up
> depends on the value in memory just past the end of argv, which is
> uninitialized and may not be a valid address.
>
> I can work around the problem by explicitly setting the pointer for
> the element of argv after the last argument to NULL. This is a useful
> workaround but not a 'proper' solution.
>
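For anyone building argv by hand (for example from a Fortran main),
the workaround described above amounts to something like the sketch
below. The function name argv_with_terminator is hypothetical. It
assumes the caller already has argc valid string pointers and simply
allocates one extra slot for the NULL terminator; the strings
themselves are not copied.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: return a new pointer array holding the same
 * argc string pointers as args, followed by an explicit NULL slot.
 * The caller frees the array (not the strings) with free().
 */
char **argv_with_terminator(int argc, char **args)
{
    char **argv;
    int i;

    if (argc < 0 || args == NULL) return NULL;

    argv = malloc(((size_t)argc + 1) * sizeof *argv);
    if (argv == NULL) return NULL;

    for (i = 0; i < argc; ++i) argv[i] = args[i];
    argv[argc] = NULL;              /* the slot a NULL-terminated walk stops on */

    return argv;
}
```

An array built this way is safe to hand to any routine that walks argv
until it finds a NULL pointer.
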
> You should still fix the problem as it is likely to occur in other
> situations and is not easy to diagnose.
>
> - Gareth Williams
>
> For reference from 7.1.1, all_argv.c:
>
> /*
>  *  sfh_argv_dup
>  *
>  *  Function:   - create a duplicate copy of an argv
>  *  Accepts:    - argv
>  *  Returns:    - duplicate argv or NULL
>  */
> char **
> sfh_argv_dup(char **argv)
> {
>     char **dupv = 0;    /* duplicate argv */
>     int dupc = 0;       /* dupv count */
>
>     if (argv == 0) return(0);
>
>     while (*argv != 0) {
>
>         if (sfh_argv_add(&dupc, &dupv, *argv)) {
>             sfh_argv_free(dupv);
>             return(0);
>         }
>
>         argv++;
>     }
>
>     return(dupv);
> }
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/