Hello Alexei,
It works. Thanks a lot for your help.
Yiye
On Wed, 17 Aug 2005, Alexei Yakovlev wrote:
> Hi Yiye,
>
> please change the two lines as follows:
>
> mpirun SCM_WD=$SCM_WD SCM_STDIN=$SCM_STDIN -np $NSCM $PROG "$@"
> else
> mpirun SCM_WD=$SCM_WD SCM_STDIN=$SCM_STDIN -machinefile $SCM_MACHINEFILE -np $NSCM $PROG "$@"
>
>
> Alexei
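
For reference, here is a minimal sketch of how the whole launch block might look with the two changed lines in place. It assumes the surrounding if/else from the earlier attempt quoted further down, and it uses a hypothetical helper, build_launch_cmd, that only prints the command instead of running it. The key point is that each variable is written as NAME=value before the mpirun options; a bare $SCM_WD expands to a directory path, which mpirun would presumably treat as the program to start.

```shell
#!/bin/sh
# Sketch only: build_launch_cmd is a hypothetical helper, not part of ADF.
# It prints the mpirun command line that would be used, without running it.
build_launch_cmd() {
    if test -z "$SCM_MACHINEFILE"
    then
        # No machine file: let mpirun choose the nodes.
        echo mpirun "SCM_WD=$SCM_WD" "SCM_STDIN=$SCM_STDIN" \
            -np "$NSCM" "$PROG" "$@"
    else
        # Machine file given: pass it through with -machinefile.
        echo mpirun "SCM_WD=$SCM_WD" "SCM_STDIN=$SCM_STDIN" \
            -machinefile "$SCM_MACHINEFILE" -np "$NSCM" "$PROG" "$@"
    fi
}
```

On a system where the launcher is installed as mpirun.ch_gm, the command name would change accordingly; the position of the NAME=value arguments stays the same.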
>
> Y. Huang wrote:
>
> >Hi Alexei,
> >
> >I modified the file $ADFBIN/start on line 300 and line 302 to
> >
> > if test -z "$SCM_MACHINEFILE"
> > then
> >     mpirun.ch_gm -np $NSCM $SCM_WD $SCM_STDIN $PROG "$@"
> > else
> >     mpirun.ch_gm -machinefile $SCM_MACHINEFILE -np $NSCM $SCM_WD $SCM_STDIN $PROG "$@"
> > fi
> >
> >I deleted the NSCM entry in .bashrc, set this environment variable from
> >the shell, and ran the example,
> >
> >huang_at_k05cn001 e_AIM_HF> export NSCM=2
> >huang_at_k05cn001 e_AIM_HF> ./run
> >...
> >env: /home/huang/software/ADF/adf2004.01/examples/adf/e_AIM_HF: Permission denied
> >env: /home/huang/software/ADF/adf2004.01/examples/adf/e_AIM_HF: Permission denied
> >
> > ************* Input file not found *************
> >
> >KFEXIT
> >Contents of rdt21.res:
> >cat: rdt21.res: No such file or directory
> >Contents of WFN-alpha:
> >cat: WFN-alpha: No such file or directory
> >Contents of WFN-beta:
> >cat: WFN-beta: No such file or directory
> >
> >
> >I got this "env: ... Permission denied" error. Am I missing something?
> >
> >Thanks
> >
> >Yiye
> >
> >
> >On Tue, 16 Aug 2005, Alexei Yakovlev wrote:
> >
> >
> >
> >>By the way, just so you know, you should never set the NSCM variable in
> >>~/.cshrc or ~/.bashrc, and definitely not to "1". If you do, every
> >>parallel slave node will get this value, decide that it is running on
> >>its own in serial mode, and never call MPI_Init. It will then try to
> >>read its input data from stdin.
> >>
> >>Alexei
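
A quick way to check whether a startup file still sets NSCM is sketched below. The helper name find_nscm_settings and the patterns it looks for (sh-style "NSCM=..." and csh-style "setenv NSCM ...") are assumptions for illustration; other ways of setting the variable would not be caught.

```shell
#!/bin/sh
# Sketch only: find_nscm_settings is a hypothetical helper.
# It prints matching lines (file:line:text) and returns 0 if any file
# assigns NSCM, 1 otherwise.
find_nscm_settings() {
    found=1
    for f in "$@"
    do
        # Skip startup files that do not exist.
        test -f "$f" || continue
        # /dev/null is added so grep always prints the file name.
        if grep -n -E \
            '^[[:space:]]*(export[[:space:]]+)?NSCM=|^[[:space:]]*setenv[[:space:]]+NSCM[[:space:]]' \
            "$f" /dev/null
        then
            found=0
        fi
    done
    return $found
}

# Example call:
#   find_nscm_settings "$HOME/.bashrc" "$HOME/.cshrc"
```

If the helper reports a match, the line can be removed from the startup file and NSCM set from the shell for each run instead.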
> >>
> >>Y. Huang wrote:
> >>
> >>
> >>
> >>>Hi Alexei,
> >>>
> >>>What values should I use for SCM_WD and SCM_STDIN? Thanks.
> >>>
> >>>Yiye
> >>>
> >>>On Tue, 16 Aug 2005, Alexei Yakovlev wrote:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>Hello Yiye
> >>>>
> >>>>The most crucial variables are NSCM, SCM_WD and SCM_STDIN. The rest are
> >>>>either optional but useful for performance (SCM_IOBUFFERSIZE,
> >>>>SCM_VECTORLENGTH), should be set in ~/.cshrc or ~/.bashrc (SCM_TMPDIR,
> >>>>SCM_USETMPDIR, SCMLICENSE), or are only needed for debugging (all the
> >>>>others).
> >>>>
> >>>>
> >>>>Alexei
> >>>>
> >>>>
> >>>>Huang wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Hello Alexei,
> >>>>>
> >>>>>I can pass environment variables using the mpirun.ch_gm command. Do you
> >>>>>mean I need to pass all of SCM_VECTORLENGTH, SCM_WD, SCM_IOBUFFERSIZE,
> >>>>>SCM_DIAG_EXCLUDE, SCM_DEBUG, SCM_NEWDIAG, SCM_TMPDIR, SCM_USETMPDIR,
> >>>>>SCM_NOMEMCHECK, SCMLICENSE, and SCM_STDIN? I know that SCM_WD must
> >>>>>be passed because of the "/bin/cp: omitting directory" error. Could you
> >>>>>tell me which variables I need to pass and what values to use?
> >>>>>
> >>>>>Thanks for your help.
> >>>>>
> >>>>>Yiye
> >>>>>
> >>>>>On Tue, 16 Aug 2005, Alexei Yakovlev wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Dear Dr. Huang,
> >>>>>>
> >>>>>>These error messages usually indicate that environment variables are not
> >>>>>>passed from mpirun to the ADF master node. Whether variables are passed
> >>>>>>depends on the MPI implementation and can sometimes be influenced
> >>>>>>by mpirun switches. In my experience, the mpirun command that comes
> >>>>>>with MPICH (ch_p4) only passes the environment if the master node is
> >>>>>>spawned directly by mpirun and not via ssh/rsh. The latter happens when
> >>>>>>one uses -nolocal. As far as I know, ch_gm has a command-line option to
> >>>>>>request passing of certain environment variables from mpirun to the
> >>>>>>executed program. For example, to pass two variables, FOO_1 and FOO_2,
> >>>>>>to a parallel program, use the following command (from
> >>>>>>http://www.myri.com/scs/READMES/README-mpich-gm):
> >>>>>>mpirun FOO_1=<my_value_there> FOO_2=<another_value> -np 2 foo.x
> >>>>>>
> >>>>>>For ADF, the most important environment variables are SCM_VECTORLENGTH,
> >>>>>>SCM_WD, SCM_IOBUFFERSIZE, SCM_DIAG_EXCLUDE, SCM_DEBUG, SCM_NEWDIAG,
> >>>>>>SCM_TMPDIR, SCM_USETMPDIR, SCM_NOMEMCHECK, SCMLICENSE, and SCM_STDIN.
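
When several of these variables have to be forwarded, the NAME=value arguments can be generated rather than typed by hand. The sketch below assumes a hypothetical helper, env_args, that prints one NAME=value line for each listed variable that is set and non-empty in the current shell; it does not call mpirun itself.

```shell
#!/bin/sh
# Sketch only: env_args is a hypothetical helper, not part of ADF or MPICH-GM.
# For each variable name given, print NAME=value if the variable is set
# and non-empty in the current shell.
env_args() {
    for name in "$@"
    do
        # eval gives a portable indirect lookup; only pass trusted
        # identifiers as names.
        eval "value=\${$name:-}"
        if test -n "$value"
        then
            printf '%s=%s\n' "$name" "$value"
        fi
    done
}

# Example (values containing whitespace would need extra care, because the
# unquoted command substitution is split into words):
#   mpirun $(env_args SCM_WD SCM_STDIN SCM_TMPDIR) -np "$NSCM" "$PROG"
```

Variables that are unset are skipped, so the same list of names can be reused whether or not the optional ones are defined.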
> >>>>>>
> >>>>>>We have implemented a permanent solution for the next version. All
> >>>>>>necessary environment variables can be passed to ADF on the command line
> >>>>>>as -DNAME=value. We hope this solution will work in all cases.
> >>>>>>
> >>>>>>Best regards,
> >>>>>>Alexei
> >>>>>>
> >>>>>>
> >>>>>>Y. Huang wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Hello,
> >>>>>>>
> >>>>>>>I am running ADF jobs on a dual-Opteron cluster connected by a Myrinet
> >>>>>>>interconnect. The OS is SuSE. MPICH-GM is installed on the system. I
> >>>>>>>have set the following environment variables in .bashrc:
> >>>>>>>
> >>>>>>>export ADFHOME=/home/huang/software/ADF/adf2004.01
> >>>>>>>export ADFBIN=$ADFHOME/bin
> >>>>>>>export ADFRESOURCES=$ADFHOME/atomicdata
> >>>>>>>export SCMLICENSE=$ADFHOME/license
> >>>>>>>export MPIDIR=/opt/mpigm
> >>>>>>>export SCM_MACHINEFILE=/home/huang/nodes
> >>>>>>>export NSCM=4
> >>>>>>>export SCM_TMPDIR=/tmp
> >>>>>>>export SCM_USETMPDIR=yes
> >>>>>>>export P4_RSHCOMMAND=/usr/bin/ssh
> >>>>>>>
> >>>>>>>The file /home/huang/nodes contains 4 nodes.
> >>>>>>>
> >>>>>>>I got the following errors when I ran the example that ships with the
> >>>>>>>ADF package in the directory examples/adf/e_AIM_HF:
> >>>>>>>
> >>>>>>>...
> >>>>>>>/bin/cp: omitting directory `/dev'
> >>>>>>>/bin/cp: omitting directory `/etc'
> >>>>>>>/bin/cp: omitting directory `/home'
> >>>>>>>/bin/cp: omitting directory `/lib'
> >>>>>>>/bin/cp: omitting directory `/lib64'
> >>>>>>>/bin/cp: omitting directory `/lost+found'
> >>>>>>>/bin/cp: omitting directory `/media'
> >>>>>>>/bin/cp: omitting directory `/mnt'
> >>>>>>>/bin/cp: omitting directory `/opt'
> >>>>>>>/bin/cp: omitting directory `/opt.orig'
> >>>>>>>/bin/cp: omitting directory `/proc'
> >>>>>>>/bin/cp: omitting directory `/root'
> >>>>>>>/bin/cp: omitting directory `/sbin'
> >>>>>>>/bin/cp: omitting directory `/srv'
> >>>>>>>/bin/cp: omitting directory `/sys'
> >>>>>>>/bin/cp: omitting directory `/tmp'
> >>>>>>>/bin/cp: omitting directory `/usr'
> >>>>>>>/bin/cp: omitting directory `/var'
> >>>>>>>...
> >>>>>>><Aug15-2005> <11:56:10> ADF 2004.01 RunTime: Aug15-2005 11:56:10
> >>>>>>><Aug15-2005> <11:56:10> *** (NO TITLE) ***
> >>>>>>><Aug15-2005> <11:56:10> RunType : SINGLE POINT
> >>>>>>><Aug15-2005> <11:56:10> NO ATOMS INPUT
> >>>>>>><Aug15-2005> <11:56:10> ardel: last array to be deleted not found
> >>>>>>><Aug15-2005> <11:56:10> NO ATOMS INPUT
> >>>>>>><Aug15-2005> <11:56:10> END
> >>>>>>>ERROR DETECTED
> >>>>>>>************* Input file not found *************
> >>>>>>>KFEXIT
> >>>>>>>Contents of rdt21.res:
> >>>>>>>cat: rdt21.res: No such file or directory
> >>>>>>>Contents of WFN-alpha:
> >>>>>>>cat: WFN-alpha: No such file or directory
> >>>>>>>Contents of WFN-beta:
> >>>>>>>cat: WFN-beta: No such file or directory
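
The /bin/cp messages list the top-level directories of the filesystem, which is what a pattern such as $SCM_WD/* would expand to if SCM_WD arrived empty on the remote node. That reading is an assumption about the start script, not something the output proves. Under that assumption, a guard like the following sketch would stop early with a clear message; require_vars is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: require_vars is a hypothetical helper, not part of ADF.
# It reports every listed variable that is unset or empty and returns 1
# if at least one is missing, 0 otherwise.
require_vars() {
    missing=0
    for name in "$@"
    do
        # Portable indirect lookup; only pass trusted identifiers as names.
        eval "value=\${$name:-}"
        if test -z "$value"
        then
            echo "error: $name is not set" >&2
            missing=1
        fi
    done
    return $missing
}

# Example: refuse to continue before any file copying happens.
#   require_vars SCM_WD SCM_STDIN || exit 1
```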
> >>>>>>>
> >>>>>>>
> >>>>>>>Any suggestion is appreciated.
> >>>>>>>
> >>>>>>>Yiye
> >>>>>>>
> >>>>>>>**********************************
> >>>>>>>Faculty of Science
> >>>>>>>University of Waterloo
> >>>>>>>Waterloo, Ontario N2L 3G1
> >>>>>>>E-mail: huang_at_uwaterloo.ca
> >>>>>>>Tel: (519) 888-4567 ext.6110
> >>>>>>>**********************************
Received on 2005-08-17 17:41:09