ogs5.7.1_IPQC_ MPI -- PARALLEL on EVE -- ERROR

Dear ogs-users,

I am having problems with parallelized simulations in ogs5.7.1 on the UFZ-EVE-cluster. My reactive transport simulations (using IPQC and MPI) crash when running on more than 8 cores with the following error message:

ORTE has lost communication with its daemon located on node:

  hostname: node033

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.

Has anyone experienced this error already?

It seems to be a problem related to node communication on the cluster; however, the simulations with 4 and 8 cores finished. The benchmark isofrac_2d using 20 cores finishes as well.

Sometimes the model crashed after 5 min, sometimes after 2 h.

I am a little confused, as I do not change any input files between the simulations except *.ddc.

Best,

Johannes

Sounds like a network connection problem.
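
One way to narrow this down, assuming the output of the failed jobs is still available, is to count which node names appear in the ORTE message across runs. If it is always the same node (e.g. node033), a single faulty node seems more likely than a general network issue. A minimal Python sketch follows; the function name and the way the logs are passed in are assumptions for illustration, not part of ogs or Open MPI:

```python
import re
from collections import Counter

# Matches the "hostname: node033" part of the ORTE message quoted in this thread.
HOSTNAME_RE = re.compile(r"hostname:\s*(\S+)")


def count_failed_nodes(log_texts):
    """Count how often each node is named in ORTE daemon-loss messages.

    log_texts is an iterable of strings, one per job log (hypothetical input;
    how the logs are collected depends on the cluster setup).
    """
    counts = Counter()
    for text in log_texts:
        if "ORTE has lost communication" not in text:
            continue  # this log contains no daemon-loss error
        counts.update(HOSTNAME_RE.findall(text))
    return counts


if __name__ == "__main__":
    sample = (
        "ORTE has lost communication with its daemon located on node: "
        "hostname: node033 This is usually due to either a failure of the "
        "TCP network connection to the node"
    )
    print(count_failed_nodes([sample]))
```

If the counts are spread over many different nodes, the problem is less likely to be tied to one machine, and the cluster administrators are probably the right people to ask.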

On 08/07/2018 12:43 PM, 'Johannes Boog' via ogs-users wrote:
