TL;DR: The behaviour is not unexpected, but it can be improved on the model side, on the code side, and with parallelization.

Some observations.

Looking at a single time step, about half of the time is spent in the assembly and half in the linear solver (with OMP_NUM_THREADS=8, which PardisoLU makes use of):

```
info: [time] Iteration #5 took 0.178429 s.
info: [time] Assembly took 0.0861893 s.
info: [time] Applying Dirichlet BCs took 0.00332492 s.
info: ------------------------------------------------------------------
info: *** Eigen solver computation
info: -> scale
info: -> solve with Eigen direct linear solver PardisoLU
info: ------------------------------------------------------------------
info: [time] Linear solver took 0.0873809 s.
info: Convergence criterion: |dx|=5.6791e-01, |x|=4.3901e+09, |dx|/|x|=1.2936e-10
info: [time] Iteration #6 took 0.177022 s.
info: [time] Solving process #0 took 1.07002 s in time step #11
info: [time] Time step #11 took 1.07006 s.
info: [time] Output of timestep 11 took 0.0723171 s.
```

That is roughly 0.09 s for the assembly plus 0.09 s for the linear solve per iteration.
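To check this split on other time steps, the `[time]` lines can be pulled out of the log with a few lines of Python. This is only a sketch: the regular expression assumes the log format shown in the excerpt above, and `parse_timings` is a helper name made up for this example, not part of OGS.

```python
import re

# Excerpt of the log shown above (timings belonging to iteration #6).
LOG = """\
info: [time] Assembly took 0.0861893 s.
info: [time] Applying Dirichlet BCs took 0.00332492 s.
info: [time] Linear solver took 0.0873809 s.
info: [time] Iteration #6 took 0.177022 s.
info: [time] Solving process #0 took 1.07002 s in time step #11
"""

# Matches "[time] <label> took <seconds> s" (format assumed from the excerpt).
TIME_RE = re.compile(r"\[time\] (.+?) took ([0-9.eE+-]+) s")


def parse_timings(log):
    """Return {label: seconds} for every '[time]' line in the log text."""
    return {m.group(1): float(m.group(2)) for m in TIME_RE.finditer(log)}


timings = parse_timings(LOG)
iteration = timings["Iteration #6"]

# Share of one nonlinear iteration spent in assembly and in the linear solver.
for label in ("Assembly", "Linear solver"):
    share = timings[label] / iteration
    print(f"{label}: {timings[label]:.4f} s ({share:.0%} of the iteration)")

# Rough number of nonlinear iterations in this time step.
n_iter = timings["Solving process #0"] / iteration
print(f"about {n_iter:.1f} iterations in this time step")
```

For a full run, the same function could be applied to the complete log text instead of the excerpt; labels that repeat (e.g. `Assembly`) would then need to be collected in lists rather than a single dictionary entry.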

On top of that, there are many nonlinear solver iterations because the process uses a Picard scheme; implementing a Newton scheme would significantly improve the convergence and the total solution time, but this requires code modifications.
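To illustrate why a Newton scheme typically needs far fewer iterations than a Picard (fixed-point) scheme, here is a toy comparison on the scalar equation x = cos(x). It has nothing to do with the actual OGS process; the functions `picard` and `newton` are written just for this example, and the stopping test mimics the |dx|/|x| criterion from the log above.

```python
import math


def picard(g, x0, tol=1e-10, max_it=1000):
    """Fixed-point (Picard) iteration x_{k+1} = g(x_k); returns (x, iterations)."""
    x = x0
    for k in range(1, max_it + 1):
        x_new = g(x)
        # Relative change |dx|/|x| as convergence criterion.
        if abs(x_new - x) <= tol * abs(x_new):
            return x_new, k
        x = x_new
    raise RuntimeError("Picard iteration did not converge")


def newton(f, df, x0, tol=1e-10, max_it=1000):
    """Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k); returns (x, iterations)."""
    x = x0
    for k in range(1, max_it + 1):
        dx = -f(x) / df(x)
        x += dx
        if abs(dx) <= tol * abs(x):
            return x, k
    raise RuntimeError("Newton iteration did not converge")


# Toy problem (not the OGS model): solve x = cos(x), i.e. f(x) = x - cos(x) = 0.
x_p, it_p = picard(math.cos, 1.0)
x_n, it_n = newton(lambda x: x - math.cos(x), lambda x: 1.0 + math.sin(x), 1.0)

print(f"Picard: x = {x_p:.10f} after {it_p} iterations")
print(f"Newton: x = {x_n:.10f} after {it_n} iterations")
```

The Picard iteration converges linearly, so every additional digit costs roughly the same number of iterations, while Newton converges quadratically near the solution. In a PDE setting each Newton iteration also needs the Jacobian, which is the code modification mentioned above.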

Reducing the mesh resolution would help, but it affects the solution, so it is probably not an option in this setting.

You could try to parallelize the problem using domain decomposition (4 domains here; use at most as many partitions as you have physical cores):

```
partmesh -s -i a.vtu
partmesh -m -n 4 -i a.vtu -- a_*.vtu
```

Then add a linear solver snippet to the project file:

```
<OpenGeoSysProject>
    ...
    <linear_solvers>
        <linear_solver>
            <name>general_linear_solver</name>
            <eigen>
                ...
            </eigen>
            <petsc> <!-- the new snippet -->
                <parameters>-ksp_type bcgs -pc_type mg -ksp_rtol 1.e-10 -ksp_max_it 10000</parameters>
            </petsc>
        </linear_solver>
    </linear_solvers>
</OpenGeoSysProject>
```

Compile the PETSc version of OGS; see "Build configuration for MPI and PETSc" in the OGS documentation.

Then running

```
mpirun -np 4 ogs test_0_modif.prj
```

should run almost four times faster.

Increasing the initial dt to a larger value would save a few time steps at the beginning. (10e6 s worked for me, but this depends on the linear solver being used.)

– d