I'm trying to debug a problem with the MPI version of ADF, specifically
when running the GO_H2O test. All but one of the 4 processes exit
through pprdsu and then hang when they reach a barrier. The other process
hangs waiting for an allreduce that never happens.
I have put a write statement in pprdsu.d to display "mykid" and "isaldo"
on entry, and another to display "nchang" and "isaldo" after the two
lines that change the isaldo array. Here's what I get:
mykid, isaldo 0 241 542 -711 -72
nchang, isaldo 241 0 542 -470 -72
nchang, isaldo 470 0 72 0 -72
nchang, isaldo 72 0 0 0 0
mykid, isaldo 2 241 542 0 0
mykid, isaldo 1 241 542 362 0
mykid, isaldo 3 241 542 0 275
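The three "nchang" lines above are consistent with a greedy pairing of surplus (positive) and deficit (negative) entries of isaldo. The sketch below replays that arithmetic. It is only a reconstruction from the printed numbers, not the actual pprdsu code: the function name `replay` and the pairing rule (move min(surplus, deficit) from the first positive entry to the first negative entry) are assumptions.

```python
def replay(isaldo):
    """Replay an assumed greedy rebalancing: repeatedly move
    min(surplus, deficit) from the first positive entry to the
    first negative entry. Returns a list of (nchang, isaldo) steps."""
    saldo = list(isaldo)
    n = len(saldo)
    donor = taker = 0
    steps = []
    while True:
        # advance to the next entry that still has a surplus
        while donor < n and saldo[donor] <= 0:
            donor += 1
        # advance to the next entry that still has a deficit
        while taker < n and saldo[taker] >= 0:
            taker += 1
        if donor == n or taker == n:
            break  # nothing left to give, or nothing left to fill
        nchang = min(saldo[donor], -saldo[taker])
        saldo[donor] -= nchang
        saldo[taker] += nchang
        steps.append((nchang, list(saldo)))
    return steps


# isaldo as printed on entry by each process (mykid -> isaldo)
views = {
    0: [241, 542, -711, -72],
    1: [241, 542, 362, 0],
    2: [241, 542, 0, 0],
    3: [241, 542, 0, 275],
}

for mykid, isaldo in sorted(views.items()):
    print("mykid", mykid, "sum", sum(isaldo), "steps", replay(isaldo))
```

Under this assumed rule, process 0's array reproduces the three printed updates, while the arrays seen by processes 1, 2 and 3 contain no negative entries and so produce no updates at all, which matches the absence of "nchang" lines for those processes. Only process 0's array sums to zero.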
It looks like a race condition: each process sees a different isaldo
array on entry. Does someone have another explanation? Is there an easy
place to put a barrier to keep this from happening? It's not clear to me
how this load balancing is supposed to work.
Thanks in advance for any help.
Tom Spraggins
tas_at_virginia.edu
Received on 1999-02-27 18:09:29