# Computing/Running time

Hello everyone,

I previously posted on the discourse about a problem I had with a prj file with an HT process using temperature-dependent density and viscosity functions.

Now it’s working fine, but I still have an issue with the computing time. Even with PardisoLU it takes, for example, an hour to reach 50 kyrs, and I want to compute at least 500 kyrs or 1 Myr…

So I was wondering: is this normal? Or is there a way to make it faster? I know I’m working with long time series, but maybe it’s possible?

new_data_test_0.zip (511.2 KB)

Rose-Nelly

TL;DR: The behaviour is not unexpected, but it can be improved on the model side, on the code side, and with clever parallelization techniques.

Some observations.

Looking at a time step, one observation is that half of the time is spent in the assembly and half in the linear solver (using OMP_NUM_THREADS=8; these threads are used by PardisoLU):

```
info: [time] Iteration #5 took 0.178429 s.
info: [time] Assembly took 0.0861893 s.
info: [time] Applying Dirichlet BCs took 0.00332492 s.
info: ------------------------------------------------------------------
info: *** Eigen solver computation
info: -> scale
info: -> solve with Eigen direct linear solver PardisoLU
info: ------------------------------------------------------------------
info: [time] Linear solver took 0.0873809 s.
info: Convergence criterion: |dx|=5.6791e-01, |x|=4.3901e+09, |dx|/|x|=1.2936e-10
info: [time] Iteration #6 took 0.177022 s.
info: [time] Solving process #0 took 1.07002 s in time step #11
info: [time] Time step #11 took 1.07006 s.
info: [time] Output of timestep 11 took 0.0723171 s.
```

That is roughly 0.09 s assembly + 0.09 s linear solve per iteration.
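For reference, the thread count mentioned above is set via an environment variable before launching ogs; a minimal sketch (the ogs call is commented out here, and the prj file name is the one from this thread):

```shell
# PardisoLU parallelizes via OpenMP; the thread count is taken from
# the OMP_NUM_THREADS environment variable at startup.
export OMP_NUM_THREADS=8
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ogs test_0_modif.prj
```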

Then there are many non-linear solver iterations due to the Picard implementation; implementing a Newton scheme would significantly improve the convergence and the total solution time, but this requires code modifications.

Reducing the mesh resolution would help, but affects the solution. Probably not possible in this setting.

You could try to parallelize the problem using domain decomposition (four domains are used here; use at most as many partitions as you have physical cores):

```
partmesh -s -i a.vtu
partmesh -m -n 4 -i a.vtu -- a_*.vtu
```

Then add a linear solver snippet to the project file:

```xml
<OpenGeoSysProject>
...
  <linear_solvers>
    <linear_solver>
      <name>general_linear_solver</name>
      <eigen>
      ...
      </eigen>
      <petsc> <!-- the new snippet -->
        <parameters>-ksp_type bcgs -pc_type mg -ksp_rtol 1.e-10 -ksp_max_it 10000</parameters>
      </petsc>
    </linear_solver>
  </linear_solvers>
</OpenGeoSysProject>
```

Compile the “petsc” version of OGS; see the build documentation page “Build configuration for MPI and PETSc”.
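A minimal configure/build sketch, assuming a CMake-based source build (the directory names are illustrative; `OGS_USE_PETSC` is the CMake option that enables the PETSc backend):

```shell
# Use a separate build tree so the existing Eigen-only build stays usable.
cmake -S ogs -B build_petsc -DOGS_USE_PETSC=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build_petsc
```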

Then running

```
mpirun -np 4 ogs test_0_modif.prj
```

should work almost four times faster.

Increasing the initial dt to a larger value would save a few time steps in the beginning. (10e6 s worked for me, but this depends on the linear solver being used.)
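In the project file the initial step size is the `<initial_dt>` of the time-stepping block. A sketch assuming the IterationNumberBasedTimeStepping scheme (all values except `initial_dt` are illustrative placeholders; adapt this to whatever scheme your prj actually uses):

```xml
<time_stepping>
    <type>IterationNumberBasedTimeStepping</type>
    <t_initial>0</t_initial>
    <t_end>6.3e13</t_end>
    <initial_dt>10e6</initial_dt>
    <minimum_dt>1e4</minimum_dt>
    <maximum_dt>1e11</maximum_dt>
    <number_iterations>1 4 10 20</number_iterations>
    <multiplier>1.2 1.0 0.9 0.8</multiplier>
</time_stepping>
```

A `<maximum_dt>` cap also limits how large the steps can grow before the Picard iterations stop converging.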

– d

I think the top layer can be meshed much coarser, since there is almost no flow.

– d

Yes, you’re right. I thought about reducing the size of the top layer, and maybe I’ll do that.

Again, thanks for your answer. Right now I’m trying to configure OGS with PETSc, and I’ll try to run it following all your indications to see how much time it takes. I’ll keep you updated.

I ran the simulation out of curiosity, partitioned on 30 cores (same original mesh), and after 20 hours of runtime the simulation has reached t=9.5e12 s, with dt around 40e9 s. Extrapolating to the end time of 6.3e13 s, it would take another 1400 steps. With per-step runtimes ranging from 8 s to 40 s, it will finish somewhere between 3 h and 15 h from now, so 23 to 35 hours in total, which is not that bad.
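The extrapolation above can be sanity-checked with a little shell arithmetic (1400 remaining steps at 8 s to 40 s wall-clock each):

```shell
# Remaining steps times per-step wall-clock time, in seconds.
steps=1400
low=$((steps * 8))    # best case
high=$((steps * 40))  # worst case
# 11200 s is about 3.1 h and 56000 s about 15.6 h, matching the 3 h to 15 h estimate.
echo "$low $high"
```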

Update: the simulation ran for 196238 s, approx. 55 h. There were 3086 steps, and many (25602) were rejected because the Picard solver didn’t converge; this is a hint to adjust the time stepping scheme to avoid excessively large time steps.

A staggered scheme could give some improvement too; see https://gitlab.opengeosys.org/ogs/ogs/-/blob/master/Tests/Data/Parabolic/HT/ClassicalTransportExample/classical_transport_example_full_upwind_staggered.prj for an example. It also uses a full upwind scheme, which might help as well.