Parameters for MUMPS solver

tcajuhi · June 16, 2021, 12:21pm

Hello everyone,

Does anyone have experience using MUMPS (via PETSc) with OGS?

The parameter set below does not work, i.e. OGS falls back to the default solver.

<linear_solver>
	<name>general_linear_solver</name>
	<petsc>
		<parameters>-mat_type aij -pc_type lu -pc_factor_mat_solver_package mumps -ksp_view</parameters>
	</petsc>
</linear_solver>

Do you have any suggestions on how to correctly activate and parametrize it?

Thanks!

Best,
Tuanny

PS: Please disregard my last post “MUMPS solver - Parameters”.

tcajuhi · June 17, 2021, 11:55am

Dear all,

the simulation runs with

<linear_solver>
	<name>general_linear_solver</name>
	<petsc>
		<prefix>hc</prefix>
		<parameters>-hc_mat_type aij -hc_pc_type lu -hc_pc_factor_mat_solver_type mumps -hc_ksp_view</parameters>
	</petsc>
</linear_solver>

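This seems to be the simplest working parameter combination. Compared with my first attempt, it adds a <prefix> element, puts that prefix (hc_) in front of every option, and uses -pc_factor_mat_solver_type instead of -pc_factor_mat_solver_package.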
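
The pattern in the working block is that the <prefix> value plus an underscore goes in front of every PETSc option name. A minimal sketch in Python of building such a parameter string; the helper name prefixed_petsc_options is made up for illustration and is not part of OGS or PETSc:

```python
def prefixed_petsc_options(prefix, options):
    """Build a PETSc option string with every option name prefixed.

    `options` maps unprefixed option names (without the leading dash)
    to their values; a value of None marks a flag without an argument.
    """
    parts = []
    for name, value in options.items():
        parts.append(f"-{prefix}_{name}")
        if value is not None:
            parts.append(str(value))
    return " ".join(parts)


# Hypothetical usage: reproduce the parameter string from the post above.
params = prefixed_petsc_options(
    "hc",
    {
        "mat_type": "aij",
        "pc_type": "lu",
        "pc_factor_mat_solver_type": "mumps",
        "ksp_view": None,
    },
)
print(params)
```

The resulting string is what goes between the <parameters> tags; the prefix passed to the helper has to match the <prefix> element.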

Output example:

KSP Object: (hc_) 1 MPI processes
  type: gmres
	restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
	happy breakdown tolerance 1e-30
  maximum iterations=10000, nonzero initial guess
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: (hc_) 1 MPI processes
  type: lu
	out-of-place factorization
	tolerance for zero pivot 2.22045e-14
	matrix ordering: nd
	factor fill ratio given 0., needed 0.
	  Factored matrix follows:
		Mat Object: 1 MPI processes
		  type: mumps
		  rows=26874, cols=26874
		  package used to perform factorization: mumps
		  total: nonzeros=5792610, allocated nonzeros=5792610
			MUMPS run parameters:
			  SYM (matrix type):                   0 
			  PAR (host participation):            1 
			  ICNTL(1) (output for error):         6 
			  ICNTL(2) (output of diagnostic msg): 0 
			  ICNTL(3) (output for global info):   0 
			  ICNTL(4) (level of printing):        0 
			  ICNTL(5) (input mat struct):         0 
			  ICNTL(6) (matrix prescaling):        7 
			  ICNTL(7) (sequential matrix ordering):7 
			  ICNTL(8) (scaling strategy):        77 
			  ICNTL(10) (max num of refinements):  0 
			  ICNTL(11) (error analysis):          0 
			  ICNTL(12) (efficiency control):                         1 
			  ICNTL(13) (sequential factorization of the root node):  0 
			  ICNTL(14) (percentage of estimated workspace increase): 20 
			  ICNTL(18) (input mat struct):                           0 
			  ICNTL(19) (Schur complement info):                      0 
			  ICNTL(20) (RHS sparse pattern):                         0 
			  ICNTL(21) (solution struct):                            0 
			  ICNTL(22) (in-core/out-of-core facility):               0 
			  ICNTL(23) (max size of memory can be allocated locally):0 
			  ICNTL(24) (detection of null pivot rows):               0 
			  ICNTL(25) (computation of a null space basis):          0 
			  ICNTL(26) (Schur options for RHS or solution):          0 
			  ICNTL(27) (blocking size for multiple RHS):             -32 
			  ICNTL(28) (use parallel or sequential ordering):        1 
			  ICNTL(29) (parallel ordering):                          0 
			  ICNTL(30) (user-specified set of entries in inv(A)):    0 
			  ICNTL(31) (factors is discarded in the solve phase):    0 
			  ICNTL(33) (compute determinant):                        0 
			  ICNTL(35) (activate BLR based factorization):           0 
			  ICNTL(36) (choice of BLR factorization variant):        0 
			  ICNTL(38) (estimated compression rate of LU factors):   333 
			  CNTL(1) (relative pivoting threshold):      0.01 
			  CNTL(2) (stopping criterion of refinement): 1.49012e-08 
			  CNTL(3) (absolute pivoting threshold):      0. 
			  CNTL(4) (value of static pivoting):         -1. 
			  CNTL(5) (fixation for null pivots):         0. 
			  CNTL(7) (dropping parameter for BLR):       0. 
			  RINFO(1) (local estimated flops for the elimination after analysis): 
				[0] 8.52752e+08 
			  RINFO(2) (local estimated flops for the assembly after factorization): 
				[0]  7.85459e+06 
			  RINFO(3) (local estimated flops for the elimination after factorization): 
				[0]  8.52752e+08 
			  INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization): 
			  [0] 72 
			  INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization): 
				[0] 72 
			  INFO(23) (num of pivots eliminated on this processor after factorization): 
				[0] 26874 
			  RINFOG(1) (global estimated flops for the elimination after analysis): 8.52752e+08 
			  RINFOG(2) (global estimated flops for the assembly after factorization): 7.85459e+06 
			  RINFOG(3) (global estimated flops for the elimination after factorization): 8.52752e+08 
			  (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
			  INFOG(3) (estimated real workspace for factors on all processors after analysis): 5792610 
			  INFOG(4) (estimated integer workspace for factors on all processors after analysis): 261736 
			  INFOG(5) (estimated maximum front size in the complete tree): 399 
			  INFOG(6) (number of nodes in the complete tree): 1394 
			  INFOG(7) (ordering option effectively use after analysis): 5 
			  INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100 
			  INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 5792610 
			  INFOG(10) (total integer space store the matrix factors after factorization): 261736 
			  INFOG(11) (order of largest frontal matrix after factorization): 399 
			  INFOG(12) (number of off-diagonal pivots): 0 
			  INFOG(13) (number of delayed pivots after factorization): 0 
			  INFOG(14) (number of memory compress after factorization): 0 
			  INFOG(15) (number of steps of iterative refinement after solution): 0 
			  INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 72 
			  INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 72 
			  INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 72 
			  INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 72 
			  INFOG(20) (estimated number of entries in the factors): 5792610 
			  INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 63 
			  INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 63 
			  INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0 
			  INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1 
			  INFOG(25) (after factorization: number of pivots modified by static pivoting): 0 
			  INFOG(28) (after factorization: number of null pivots encountered): 0
			  INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 5792610
			  INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 65, 65
			  INFOG(32) (after analysis: type of analysis done): 1
			  INFOG(33) (value used for ICNTL(8)): 7
			  INFOG(34) (exponent of the determinant if determinant is requested): 0
			  INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 5792610
			  INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0 
			  INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0 
			  INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0 
			  INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0 
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
	type: mpiaij
	rows=26874, cols=26874
	total: nonzeros=1054062, allocated nonzeros=1054062
	total number of mallocs used during MatSetValues calls=0
	  using I-node (on process 0) routines: found 11493 nodes, limit used is 5
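
To confirm from such a log that MUMPS was actually used, rather than a silent fallback to another solver, one can look for the "package used to perform factorization" line. A minimal sketch in Python, assuming the -ksp_view output has been captured as text; the function name is hypothetical and the sample is shortened from the output above:

```python
import re


def factorization_package(ksp_view_text):
    """Return the factorization package named in -ksp_view output, or None."""
    match = re.search(
        r"package used to perform factorization:\s*(\S+)", ksp_view_text
    )
    return match.group(1) if match else None


# Shortened excerpt of the output shown above.
sample = """\
PC Object: (hc_) 1 MPI processes
  type: lu
    Mat Object: 1 MPI processes
      type: mumps
      rows=26874, cols=26874
      package used to perform factorization: mumps
"""

print(factorization_package(sample))
```

If the function returns None, the log contains no factorization line at all, which would suggest that the LU/MUMPS options were not picked up.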

Best,
Tuanny

dmitri.naumov · June 17, 2021, 12:34pm

This is great news! Thanks for finding the solution. Now more interesting solvers can be used.

Some clarification is indeed needed on the prefixed PETSc solver settings and on which of them takes priority.

tcajuhi · June 17, 2021, 1:00pm

Hi Dima, indeed there are many options! We need to try a bit more and see where/how we can optimize the calculations. Here is a link with additional parameters:

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERMUMPS.html
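
When experimenting with the MUMPS control parameters listed on that page (the -mat_mumps_icntl_* options), it helps to read the ICNTL values back from the -ksp_view output and check what a run actually used. A minimal sketch in Python; the function name is hypothetical, and how the OGS prefix combines with the -mat_mumps_* options is not covered in this thread, so treat that part as something to verify:

```python
import re

# Matches lines such as
#   "ICNTL(14) (percentage of estimated workspace increase): 20"
ICNTL_LINE = re.compile(r"^\s*ICNTL\((\d+)\)\s*\(.*\):\s*(-?\d+)\s*$")


def parse_icntl(ksp_view_text):
    """Collect the integer MUMPS ICNTL settings from -ksp_view output."""
    values = {}
    for line in ksp_view_text.splitlines():
        match = ICNTL_LINE.match(line)
        if match:
            values[int(match.group(1))] = int(match.group(2))
    return values


# Shortened excerpt of the output posted earlier in this thread.
sample = """\
  ICNTL(14) (percentage of estimated workspace increase): 20
  ICNTL(27) (blocking size for multiple RHS):             -32
  CNTL(1) (relative pivoting threshold):      0.01
"""

settings = parse_icntl(sample)
print(settings)
```

Comparing the dictionaries from two runs shows whether a changed option really reached MUMPS.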

keita.yoshioka · June 17, 2021, 2:28pm

Thanks indeed. Now we can use a direct solver in a PETSc environment. This is great.