Hello Alexei,
I am testing ADF with MPI on our cluster. Could you tell me which examples
in the directory adf2004.01/examples/{adf,band} are suitable for running in
parallel? I would like to see the performance improvement of parallel MPI
jobs over sequential execution. I have tested some of the examples that
come with the ADF package, but I didn't see much difference between
parallel and sequential execution. If there are no such test cases in
adf2004.01/examples/{adf,band}, do you have any good parallel input files?
Thanks a lot for your help.
Yiye
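
A rough way to quantify the difference is to time the same job at several NSCM values and compare the results. The sketch below is only an illustration: `time_job` and `speedup` are hypothetical helper names, and it assumes the example's `./run` script picks up NSCM from the environment as in the quoted thread. Very short test jobs may be dominated by startup cost, so a small speedup on small examples would not be surprising.

```shell
#!/bin/sh
# time_job N CMD...: run CMD with NSCM=N and print elapsed wall-clock seconds.
# (Hypothetical helper, not part of ADF; resolution is one second.)
time_job() {
    nproc=$1
    shift
    start=$(date +%s)
    env NSCM="$nproc" "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# speedup SERIAL PARALLEL: print the serial/parallel time ratio, two decimals.
speedup() {
    awk -v s="$1" -v p="$2" 'BEGIN {
        if (p <= 0) { print "n/a"; exit }
        printf "%.2f\n", s / p
    }'
}

# Example use (commented out; ./run is the script shipped with each example):
# t1=$(time_job 1 ./run)
# t4=$(time_job 4 ./run)
# echo "speedup on 4 processes: $(speedup "$t1" "$t4")"
```

Comparing wall-clock time rather than CPU time is deliberate here, since CPU time summed over all processes usually does not drop when a job is parallelized.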
>
> On Wed, 17 Aug 2005, Alexei Yakovlev wrote:
>
> > Hi Yiye,
> >
> > Please change the two lines as follows:
> >
> > mpirun SCM_WD=$SCM_WD SCM_STDIN=$SCM_STDIN -np $NSCM $PROG "$@"
> > else
> > mpirun SCM_WD=$SCM_WD SCM_STDIN=$SCM_STDIN -machinefile $SCM_MACHINEFILE -np $NSCM $PROG "$@"
> >
> >
> > Alexei
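
Put together, the two lines belong in the if/else block from $ADFBIN/start quoted further down. The sketch below wraps that logic in a function that only prints the command it would run, which makes it easy to check the argument order before editing the real script. `print_mpirun_cmd` is a hypothetical name; the NAME=value placement follows the two lines above.

```shell
#!/bin/sh
# Print (do not execute) the mpirun command line, with SCM_WD and SCM_STDIN
# passed in NAME=value form ahead of the mpirun options.
# Hypothetical helper for inspection only; it is not part of ADF.
print_mpirun_cmd() {
    if test -z "$SCM_MACHINEFILE"
    then
        echo mpirun "SCM_WD=$SCM_WD" "SCM_STDIN=$SCM_STDIN" \
            -np "$NSCM" "$PROG" "$@"
    else
        echo mpirun "SCM_WD=$SCM_WD" "SCM_STDIN=$SCM_STDIN" \
            -machinefile "$SCM_MACHINEFILE" -np "$NSCM" "$PROG" "$@"
    fi
}
```

Once the printed line looks right, the same arguments can be copied into the start script in place of the `echo`.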
> >
> > Y. Huang wrote:
> >
> > >Hi Alexei,
> > >
> > >I modified the file $ADFBIN/start on line 300 and line 302 to
> > >
> > > if test -z "$SCM_MACHINEFILE"
> > > then
> > > mpirun.ch_gm -np $NSCM $SCM_WD $SCM_STDIN $PROG "$@"
> > > else
> > > mpirun.ch_gm -machinefile $SCM_MACHINEFILE -np $NSCM $SCM_WD $SCM_STDIN $PROG "$@"
> > > fi
> > >
> > >I deleted the NSCM entry in .bashrc, set this environment variable from
> > >the shell, and ran the example,
> > >
> > >huang_at_k05cn001 e_AIM_HF> export NSCM=2
> > >huang_at_k05cn001 e_AIM_HF> ./run
> > >...
> > >env: /home/huang/software/ADF/adf2004.01/examples/adf/e_AIM_HF: Permission denied
> > >env: /home/huang/software/ADF/adf2004.01/examples/adf/e_AIM_HF: Permission denied
> > >
> > > ************* Input file not found *************
> > >
> > >KFEXIT
> > >Contents of rdt21.res:
> > >cat: rdt21.res: No such file or directory
> > >Contents of WFN-alpha:
> > >cat: WFN-alpha: No such file or directory
> > >Contents of WFN-beta:
> > >cat: WFN-beta: No such file or directory
> > >
> > >
> > >I got this "env:...Permission denied" error. Am I missing something?
> > >
> > >Thanks
> > >
> > >Yiye
> > >
> > >
> > >On Tue, 16 Aug 2005, Alexei Yakovlev wrote:
> > >
> > >
> > >
> > >>By the way, just so that you know: you should never set the NSCM variable
> > >>in ~/.cshrc or ~/.bashrc, and definitely not to "1". If you do, every
> > >>parallel slave node will get this value, decide that it is running on
> > >>its own in serial mode, and never call MPI_Init. It will then try to
> > >>read its input data from stdin.
> > >>
> > >>Alexei
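
A quick check for this pitfall is to search the shell startup files for an NSCM assignment. This is a minimal sketch: `find_nscm_in_rc` is a hypothetical helper, and the pattern only catches the common `export NSCM=`, `NSCM=` and `setenv NSCM` forms.

```shell
#!/bin/sh
# find_nscm_in_rc FILE...: print every line that sets NSCM in the given
# startup files. Returns 0 if at least one such line was found, 1 otherwise.
# (Hypothetical helper; files that do not exist are skipped silently.)
find_nscm_in_rc() {
    found=1
    for rc in "$@"
    do
        [ -f "$rc" ] || continue
        if grep -H -n -E \
            '^[[:space:]]*(export[[:space:]]+)?NSCM=|^[[:space:]]*setenv[[:space:]]+NSCM([[:space:]]|$)' \
            "$rc"
        then
            found=0
        fi
    done
    return $found
}

# Example use:
# find_nscm_in_rc "$HOME/.bashrc" "$HOME/.cshrc" \
#     && echo "move the NSCM setting out of the startup files"
```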
> > >>
> > >>Y. Huang wrote:
> > >>
> > >>
> > >>
> > >>>Hi Alexei,
> > >>>
> > >>>What values do I use for SCM_WD and SCM_STDIN ? Thanks.
> > >>>
> > >>>Yiye
> > >>>
> > >>>On Tue, 16 Aug 2005, Alexei Yakovlev wrote:
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>>Hello Yiye
> > >>>>
> > >>>>The most crucial variables are NSCM, SCM_WD and SCM_STDIN. The others are
> > >>>>either not mandatory but useful for performance (SCM_IOBUFFERSIZE,
> > >>>>SCM_VECTORLENGTH), should be set in ~/.cshrc or ~/.bashrc (SCM_TMPDIR,
> > >>>>SCM_USETMPDIR, SCMLICENSE), or are only needed for debugging (all the
> > >>>>remaining ones).
> > >>>>
> > >>>>
> > >>>>Alexei
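
Since NSCM, SCM_WD and SCM_STDIN are named as the crucial variables, a wrapper can refuse to start when any of them is empty instead of failing later with a less obvious message. A minimal sketch, with `require_vars` as a hypothetical helper name:

```shell
#!/bin/sh
# require_vars NAME...: report every listed variable that is unset or empty.
# Returns 0 only if all of them have a value. (Hypothetical helper.)
require_vars() {
    missing=0
    for name in "$@"
    do
        # Indirect lookup of the variable called $name; empty if unset.
        eval "value=\${$name-}"
        if [ -z "$value" ]
        then
            echo "required variable $name is not set" >&2
            missing=1
        fi
    done
    return $missing
}

# Example use before launching a parallel job:
# require_vars NSCM SCM_WD SCM_STDIN || exit 1
```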
> > >>>>
> > >>>>
> > >>>>Huang wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>>Hello Alexei,
> > >>>>>
> > >>>>>I can pass environment variables using the mpirun.ch_gm command. Do you
> > >>>>>mean I need to pass all of SCM_VECTORLENGTH, SCM_WD, SCM_IOBUFFERSIZE,
> > >>>>>SCM_DIAG_EXCLUDE, SCM_DEBUG, SCM_NEWDIAG, SCM_TMPDIR, SCM_USETMPDIR,
> > >>>>>SCM_NOMEMCHECK, SCMLICENSE, and SCM_STDIN? I know the variable SCM_WD must
> > >>>>>be passed because of the "/bin/cp: omitting directory" error. Could you
> > >>>>>tell me which variables I need to pass and what values I should use for
> > >>>>>them?
> > >>>>>
> > >>>>>Thanks for your help.
> > >>>>>
> > >>>>>Yiye
> > >>>>>
> > >>>>>On Tue, 16 Aug 2005, Alexei Yakovlev wrote:
> > >>>>>
> > >>>>>>Dear Dr. Huang,
> > >>>>>>
> > >>>>>>These error messages usually indicate that environment variables are not
> > >>>>>>passed from mpirun to the ADF master node. Whether variables are passed
> > >>>>>>or not depends on the MPI implementation and can sometimes be influenced
> > >>>>>>by mpirun switches. From my experience, the mpirun command that comes
> > >>>>>>with MPICH (ch_p4) only passes the environment if the master node is spawned
> > >>>>>>directly by mpirun and not via ssh/rsh. The latter happens when one is
> > >>>>>>using -nolocal. As far as I know, ch_gm has a command line option to
> > >>>>>>request passing of certain environment variables from mpirun to the
> > >>>>>>executed program. For example, to pass two variables, FOO_1 and FOO_2, to a
> > >>>>>>parallel program, use the following form (from
> > >>>>>>http://www.myri.com/scs/READMES/README-mpich-gm):
> > >>>>>>mpirun FOO_1=<my_value_there> FOO_2=<another_value> -np 2 foo.x
> > >>>>>>
> > >>>>>>For ADF, the most important environment variables are SCM_VECTORLENGTH,
> > >>>>>>SCM_WD, SCM_IOBUFFERSIZE, SCM_DIAG_EXCLUDE, SCM_DEBUG, SCM_NEWDIAG,
> > >>>>>>SCM_TMPDIR, SCM_USETMPDIR, SCM_NOMEMCHECK, SCMLICENSE, and SCM_STDIN.
> > >>>>>>
> > >>>>>>In the next version, we've implemented a permanent solution. All
> > >>>>>>necessary environment variables can be passed to ADF on the command line
> > >>>>>>as -DNAME=value. We hope this solution will work in all cases.
> > >>>>>>
> > >>>>>>Best regards,
> > >>>>>>Alexei
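
To see which variables actually reach the launched processes, one option is to run `env` through the same launcher and filter for the ADF-related names. This is a minimal sketch: `show_passed_vars` is a hypothetical helper, and the launcher line in the usage comment simply mirrors the README form quoted above, so it is an assumption about the local mpirun.

```shell
#!/bin/sh
# show_passed_vars LAUNCHER...: run `env` via the given launcher command and
# print only NSCM and SCM* variables as seen by the launched process(es).
# Duplicate lines from several processes are collapsed by sort -u.
show_passed_vars() {
    "$@" env | grep -E '^(NSCM|SCM[A-Z_]*)=' | sort -u
}

# Example use (requires a working MPI installation):
# show_passed_vars mpirun SCM_WD="$PWD" -np 2
```

If a variable set in the calling shell is missing from the output, it has to be passed explicitly on the launcher command line.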
> > >>>>>>
> > >>>>>>
> > >>>>>>Y. Huang wrote:
> > >>>>>>
> > >>>>>>>Hello,
> > >>>>>>>
> > >>>>>>>I am running ADF jobs on a dual Opteron cluster connected by a Myrinet
> > >>>>>>>interconnect. The OS is SuSE. MPICH-GM has been installed on the system.
> > >>>>>>>I have set the following environment variables in .bashrc:
> > >>>>>>>
> > >>>>>>>export ADFHOME=/home/huang/software/ADF/adf2004.01
> > >>>>>>>export ADFBIN=$ADFHOME/bin
> > >>>>>>>export ADFRESOURCES=$ADFHOME/atomicdata
> > >>>>>>>export SCMLICENSE=$ADFHOME/license
> > >>>>>>>export MPIDIR=/opt/mpigm
> > >>>>>>>export SCM_MACHINEFILE=/home/huang/nodes
> > >>>>>>>export NSCM=4
> > >>>>>>>export SCM_TMPDIR=/tmp
> > >>>>>>>export SCM_USETMPDIR=yes
> > >>>>>>>export P4_RSHCOMMAND=/usr/bin/ssh
> > >>>>>>>
> > >>>>>>>The file /home/huang/nodes contains 4 nodes.
> > >>>>>>>
> > >>>>>>>I got the following errors when I ran the example that comes with the ADF
> > >>>>>>>package in the directory examples/adf/e_AIM_HF:
> > >>>>>>>
> > >>>>>>>...
> > >>>>>>>/bin/cp: omitting directory `/dev'
> > >>>>>>>/bin/cp: omitting directory `/etc'
> > >>>>>>>/bin/cp: omitting directory `/home'
> > >>>>>>>/bin/cp: omitting directory `/lib'
> > >>>>>>>/bin/cp: omitting directory `/lib64'
> > >>>>>>>/bin/cp: omitting directory `/lost+found'
> > >>>>>>>/bin/cp: omitting directory `/media'
> > >>>>>>>/bin/cp: omitting directory `/mnt'
> > >>>>>>>/bin/cp: omitting directory `/opt'
> > >>>>>>>/bin/cp: omitting directory `/opt.orig'
> > >>>>>>>/bin/cp: omitting directory `/proc'
> > >>>>>>>/bin/cp: omitting directory `/root'
> > >>>>>>>/bin/cp: omitting directory `/sbin'
> > >>>>>>>/bin/cp: omitting directory `/srv'
> > >>>>>>>/bin/cp: omitting directory `/sys'
> > >>>>>>>/bin/cp: omitting directory `/tmp'
> > >>>>>>>/bin/cp: omitting directory `/usr'
> > >>>>>>>/bin/cp: omitting directory `/var'
> > >>>>>>>...
> > >>>>>>><Aug15-2005> <11:56:10> ADF 2004.01 RunTime: Aug15-2005 11:56:10
> > >>>>>>><Aug15-2005> <11:56:10> *** (NO TITLE) ***
> > >>>>>>><Aug15-2005> <11:56:10> RunType : SINGLE POINT
> > >>>>>>><Aug15-2005> <11:56:10> NO ATOMS INPUT
> > >>>>>>><Aug15-2005> <11:56:10> ardel: last array to be deleted not found
> > >>>>>>><Aug15-2005> <11:56:10> NO ATOMS INPUT
> > >>>>>>><Aug15-2005> <11:56:10> END
> > >>>>>>>ERROR DETECTED
> > >>>>>>>************* Input file not found *************
> > >>>>>>>KFEXIT
> > >>>>>>>Contents of rdt21.res:
> > >>>>>>>cat: rdt21.res: No such file or directory
> > >>>>>>>Contents of WFN-alpha:
> > >>>>>>>cat: WFN-alpha: No such file or directory
> > >>>>>>>Contents of WFN-beta:
> > >>>>>>>cat: WFN-beta: No such file or directory
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>Any suggestion is appreciated.
> > >>>>>>>
> > >>>>>>>Yiye
> > >>>>>>>
> > >>>>>>>**********************************
> > >>>>>>>Faculty of Science
> > >>>>>>>University of Waterloo
> > >>>>>>>Waterloo, Ontario N2L 3G1
> > >>>>>>>E-mail: huang_at_uwaterloo.ca
> > >>>>>>>Tel: (519) 888-4567 ext.6110
> > >>>>>>>**********************************
> > >>>>>>>http://hpc.uwaterloo.ca
> > >>>>>>>
Received on 2005-08-18 21:00:51