11. df — Density fitting

11.1. Introduction

The df module provides the fundamental functions to handle the 3-index tensors required by the density fitting (DF) method or the resolution of identity (RI) approximation. Specifically, it includes the functions to compute the 3-center 2-electron AO integrals, the DF/RI 3-index tensor in the form of Cholesky decomposed integral tensor (\((ij|kl)=V_{ij,x}V_{kl,x}\)), the AO to MO integral transformation of the 3-index tensor, as well as the functions to generate the density fitting basis.

The density_fit() method can utilize the DF method at SCF and MCSCF level:

from pyscf import gto, scf, mcscf
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='def2-tzvp')
mf = scf.RHF(mol).density_fit().run()
mc = mcscf.CASSCF(mf, 8, 10).density_fit().run()

Once the DF method is enabled at the SCF level, all the post-SCF methods will automatically enable the DF method, for example:

from pyscf import gto, dft, tddft
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='def2-tzvp')
mf = dft.RKS(mol).density_fit().run()
td = tddft.TDA(mf).run()

In PySCF, DF is implemented at different level of implementations for different methods. They are summarized in the following table

Methods Fake-ERI Native DF Properties with DF
HF/DFT Yes Yes  
Generlized HF/DFT Yes No  
Relativistic HF Yes Yes  
TDDFT Yes Yes  
RCCSD Yes Yes  
UCCSD Yes No  
RCCSD(T) Yes No  
RMP2 Yes No  
UMP2 Yes No  
PBC HF/DFT Yes Yes  
PBC Gamma-point CCSD Yes Yes  
PBC k-points CCSD Yes No  

Fake-ERI means to mimic the 4-center 2-electron repulsion integrals (ERI) by precontracting the DF 3-index tensor. This is the simplest way to enable DF integrals, although the fake-ERI mechanism may require huge amount of memory also may be slow in performance. It provides the most convenient way to embed the DF integrals in the existing code, thus it is supported by almost every method in PySCF. It is particularly important in the periodic code. Using the fake-ERIs allows us to call all quantum chemistry methods developed at molecular level in the \(\Gamma\)-point calculations without modifying any existing molecular code. See also the pbc.df — PBC denisty fitting module.

Some methods have native DF implementation. This means the performance of the DF technique has been considered in the code. In these methods, DF approximation generally runs faster than the regular scheme without integral approximation and also consumes less memory or disk space.

When density fitting is enabled in a method, a with_df object will be generated and attached to the method object. with_df is the object to hold all DF relevant quantiles, such as the DF basis, the file to save the 3-index tensor, the amount of memory to use etc. You can modify the attributes of with_df to get more control over the DF methods. In the SCF and MCSCF methods, setting with_df to None will switch off the DF approximation. In the periodic code, all two-electron integrals are evaluated by DF approximations. There are four different types of DF schemes (FFTDF, AFTDF, GDF, MDF see pbc.df — PBC denisty fitting), available in the periodic code. By assigning different DF object to with_df, different DF schemes can be applied in the PBC calculations.

11.1.1. DF auxiliary basis

The default auxiliary basis set are generated by function pyscf.df.addons.make_basis() based on the orbital basis specified in the calculation according to the rules defined in pyscf.df.addons.DEFAULT_AUXBASIS. Specifically, the jkfit basis in the first column is used for Hartree-Fock or DFT methods, and the ri basis in the second column is used for correlation calculations. These optimized auxiliary basis sets are obtained from http://www.psicode.org/psi4manual/master/basissets_byfamily.html If optimized auxiliary basis set was not found for the orbital basis set, even-tempered Gaussian functions are generated automatically.

Specifying auxiliary basis is a common requirement in the real applications. For example, the default auxiliary basis set for the pure DFT calculations may be over complete since it is designed to represent both the Coulomb and HF exchange matrix. Coulomb fitting basis such as Weigend-cfit basis or Ahlrichs-cfit basis are often enough to obtain chemical accuracy. To control the fitting basis in DF method, You can change the value of with_df.auxbasis attribute. The input format of auxiliary fitting basis is exactly the same to the input format of orbital basis set. For example:

from pyscf import gto, dft
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='def2-tzvp')
mf = dft.RKS(mol)
mf.xc = 'pbe,pbe'
mf.run()  # -109.432313679876
mf = mf.density_fit()
mf.run()  # -109.432329411505
mf.with_df.auxbasis = 'weigend'
mf.run()  # -109.432334646584

More examples for inputing auxiliary basis in the DF calculation can be found in examples/df/01-auxbasis.py. Even-tempered auxiliary Gaussian basis

The even-tempered auxiliary Gaussian basis is generated by function aug_etb():

from pyscf import gto, df
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='ccpvdz')
print(mol.nao_nr())                            # 28
auxbasis = df.aug_etb(mol)
print(df.make_auxmol(mol, auxbasis).nao_nr())  # 200
auxbasis = df.aug_etb(mol, beta=1.6)
print(df.make_auxmol(mol, auxbasis).nao_nr())  # 338

Here the make_auxmol() function converts the auxbasis to a Mole object which can be used to evaluate the analytical integrals the same way as the regular Mole object. The formula to generate the exponents \(\zeta\) of the even-tempered auxiliary basis are

\[\begin{split}\varphi &= r^l \exp(-\zeta_{il} r^2), \quad i = 0..n \\ \zeta_{il} &= \alpha \times \beta^i :label: etb\end{split}\]

The default value of \(\beta\) is 2.3. \(\alpha\) and the number of auxiliary basis \(n\) is determined based on the orbital basis. Given the orbital basis

\[\chi = r^l \exp(-\alpha_l r^2)\]

the orbital pair on the same center produces a new one-center basis

\[\chi \chi' = r^{l+l'} \exp(-(\alpha_l+\alpha_l') r^2) = r^L \exp(-\alpha_L r^2)\]

The minimal \(\alpha_L\) in all orbital pairs is assigned to \(\alpha\) in (?). Then \(n\) is estimated to make the largest auxiliary exponent \(\zeta\) as close as possible to the maximum \(\alpha_L\). The size of generated even-tempered Gaussian basis is typically 5 - 10 times of the size of the orbital basis, or 2 - 3 times more than the optimized auxiliary basis. (Note the accuracy of this even-tempered auxiliary basis is not fully benchmarked. The error is close to the optimized auxiliary basis in our tests.)

11.1.2. Saving/Loading DF integral tensor

Although it is not expensive to compute DF integral tensor in the molecular calculation, saving/loading the 3-index tensor is still useful since it provides an alternative way, different to the attribute _eri of mean-field object (see Customizing Hamiltonian), to customize the Hamiltonian.

In the DF-SCF method, the 3-index tensor is held in the with_df object. The with_df object (see pyscf.df.df.DF class) provides two attributes _cderi_to_save and _cderi to access the DF 3-index integrals.

If a DF integral tensor is assigned to _cderi, the integrals will be used in the DF calculation. The DF integral tensor can be either a numpy array or an HDF5 file on disk. When the DF integrals are provided in the HDF5 file, the integral needs to be stored under the dataset 'j3c':

import numpy
import h5py
from pyscf import gto, scf, df
mol = gto.M(atom='H 0 0 0; H 1 0 1; H 0 1 1; H 1 1 0', basis='sto3g')
nao = mol.nao_nr()
with h5py.File('df_ints.h5', 'w') as f:
    f['j3c'] = numpy.random.random((10,nao*(nao+1)//2))
mf = scf.RHF(mol).density_fit()
mf.with_df._cderi = 'df_ints.h5'

As shown in the above example, the integral tensor \(V_{x,ij}\) provided in _cderi should be a 2D array in C (row-major) convention. Its first index corresponds to the auxiliary basis and the second combined index ij is the orbital pair index. When load DF integrals, we assumed hermitian symmetry between the two orbital index, ie only the elements \(i\geq j\) are left in the DF integral tensor. Thus the DF integral tensor should be a 2D array, with shape (M,N*(N+1)/2), where M is the number of auxiliary functions, N is the number of orbitals.

If _cderi is not specified, the DF integral tensor will be generated during the calculation and stored to the file that the attribute _cderi_to_save points to. By default, it is a random file and the random file will be deleted if the calculation finishes successfully. You can find the filename in the output log (when with.verbose > 3, for example:

******** <class 'pyscf.df.df.DF'> flags ********
auxbasis = None
max_memory = 20000
_cderi_to_save = /scratch/tmp6rGrSD

If the calculation is terminated problematically with error or any other reasons, you can reuse the DF integrals in the next calculation by assigning the integral file to _cderi. Overwriting _cderi_to_save with a filename will make the program save the DF integrals in the given filename regardless whether the calculation is succeed or failed. See also the example pyscf/examples/df/10-access_df_integrals.py. Precomputing the DF integral tensor

The DF integral tensor can be computed without initialization the with_df object. Functions cholesky_eri() defined in df.incore and df.outcore can generate DF integral tensor in memory or in a HDF5 file:

from pyscf import gto, df
mol = gto.M(atom='N 0 0 0; N 1 1 1', basis='ccpvdz')
cderi = df.incore.cholesky_eri(mol, auxbasis='ccpvdz-jkfit')
df.outcore.cholesky_eri(mol, 'df_ints.h5', auxbasis='ccpvdz-jkfit')

These cderi integrals has the same data structure as the one generated in with_df object. They can be directly used in the DF type calculations:

from pyscf import scf
mf = scf.RHF(mol).density_fit()
mf.with_df._cderi = cderi

mf.with_df._cderi = 'df_ints.h5'

11.1.3. Approximating orbital hessian in SCF and MCSCF

Orbital hessian is required by the second order SCF solver or MCSCF solver. In many systems, approximating the orbital hessian has negligible effects to the convergence and the solutions of the SCF or MCSCF orbital optimization procedure. Using DF method to approximate the orbital hessian can improve the overall performance. For example, the following code enables the DF approximation to the orbital hessian in SCF calculation:

from pyscf import gto, scf
mol = gto.M(atom='N 0 0 0; O 0 0 1.5', spin=1, basis='ccpvdz')
mf = scf.RHF(mol).newton().density_fit().run(verbose=4)  # converged SCF energy = -129.0896469563
mf = scf.RHF(mol).run(verbose=4)                         # converged SCF energy = -129.0896469563

The approximation to orbital hessian does not change the SCF result. In the above example, it produces the same solution to the regular SCF result. Similarly, when the DF approximation is used with CASSCF orbital hessian, the CASSCF result should not change. Continuing the above example, we can use the mcscf.approx_hessian() function to change the orbital hessian of the given CASSCF object:

from pyscf import mcscf
mc = mcscf.approx_hessian(mcscf.CASSCF(mf, 8, 11)).run()  # -129.283077136
mc = mcscf.CASSCF(mf, 8, 11).run()                        # -129.283077136


In the second order SCF solver, the order to apply the density_fit and newton methods affects the character of the resultant SCF object. For example, the statement mf = scf.RHF(mol).density_fit().newton() first produces a DFHF object then enable the second order Newton solver for the DFHF object. The resultant SCF object is a DFHF object. See more examples in examples/scf/23-decorate_scf.py

11.2. Program reference

11.2.1. DF class

class pyscf.df.df.DF(mol, auxbasis=None)[source]

Object to hold 3-index tensor

: str or dict
Same input format as Mole.basis
: Mole object
Read only Mole object to hold the auxiliary basis. auxmol is generated automatically in the initialization step based on the given auxbasis. It is used in the rest part of the code to determine the problem size, the integral batches etc. This object should NOT be modified.
: str
If _cderi_to_save is specified, the DF integral tensor will be saved in this file.
: str or numpy array
If _cderi is specified, the DF integral tensor will be read from this HDF5 file (or numpy array). When the DF integral tensor is provided from the HDF5 file, it has to be stored under the dataset ‘j3c’. The DF integral tensor \(V_{x,ij}\) should be a 2D array in C (row-major) convention, where x corresponds to index of auxiliary basis, and the combined index ij is the orbital pair index. The hermitian symmetry is assumed for the combined ij index, ie the elements of \(V_{x,i,j}\) with \(i\geq j\) are existed in the DF integral tensor. Thus the shape of DF integral tensor is (M,N*(N+1)/2), where M is the number of auxbasis functions and N is the number of basis functions of the orbital basis.
: int
When reading DF integrals from disk the chunk size to load. It is used to improve the IO performance.

11.2.2. df.incore

pyscf.df.incore.aux_e1(mol, auxmol, intor='int3c2e', aosym='s1', comp=None, out=None)[source]

3-center 2-electron AO integrals (L|ij), where L is the auxiliary basis.

Note aux_e1 is basically analogous to aux_e2 function. It can be viewed as the version of transposed aux_e2 tensor: if comp == 1:

aux_e1 = aux_e2().T
aux_e1 = aux_e2().transpose(0,2,1)

The same arguments as function aux_e2 can be input to aux_e1.

pyscf.df.incore.aux_e2(mol, auxmol, intor='int3c2e', aosym='s1', comp=None, out=None, cintopt=None)[source]

3-center AO integrals (ij|L), where L is the auxiliary basis.

: Libcint-3.14 and newer version support to compute int3c2e

without the opt for the 3rd index. It can be precomputed to reduce the overhead of cintopt initialization repeatedly.

cintopt = gto.moleintor.make_cintopt(mol._atm, mol._bas, mol._env, intor)

pyscf.df.incore.cholesky_eri(mol, auxbasis='weigend+etb', auxmol=None, int3c='int3c2e', aosym='s2ij', int2c='int2c2e', comp=1, verbose=0, fauxe2=<function aux_e2>)[source]
2D array of (naux,nao*(nao+1)/2) in C-contiguous
pyscf.df.incore.fill_2c2e(mol, auxmol, intor='int2c2e', comp=None, hermi=1, out=None)[source]

2-center 2-electron AO integrals for auxiliary basis (auxmol)

11.2.3. df.outcore

pyscf.df.outcore.cholesky_eri(mol, erifile, auxbasis='weigend+etb', dataname='j3c', tmpdir=None, int3c='int3c2e', aosym='s2ij', int2c='int2c2e', comp=1, max_memory=2000, ioblk_size=256, auxmol=None, verbose=3)[source]

3-center 2-electron AO integrals

pyscf.df.outcore.cholesky_eri_b(mol, erifile, auxbasis='weigend+etb', dataname='j3c', int3c='int3c2e', aosym='s2ij', int2c='int2c2e', comp=1, ioblk_size=256, auxmol=None, verbose=3)[source]

3-center 2-electron AO integrals

pyscf.df.outcore.general(mol, mo_coeffs, erifile, auxbasis='weigend+etb', dataname='eri_mo', tmpdir=None, int3c='int3c2e', aosym='s2ij', int2c='int2c2e', comp=1, max_memory=2000, ioblk_size=256, verbose=0, compact=True)[source]

Transform ij of (ij|L) to MOs.

11.2.4. df.addons

pyscf.df.addons.aug_etb(mol, beta=2.0)[source]

To generate the even-tempered auxiliary Gaussian basis

pyscf.df.addons.aug_etb_for_dfbasis(mol, dfbasis='weigend', beta=2.0, start_at=36)[source]

augment weigend basis with even-tempered gaussian basis exps = alpha*beta^i for i = 1..N

class pyscf.df.addons.load(eri, dataname='j3c')[source]

load 3c2e integrals from hdf5 file. It can be used in the context manager:

with load(cderifile) as eri:
pyscf.df.addons.make_auxbasis(mol, mp2fit=False)[source]

Depending on the orbital basis, generating even-tempered Gaussians or the optimized auxiliary basis defined in DEFAULT_AUXBASIS

pyscf.df.addons.make_auxmol(mol, auxbasis=None)[source]

Generate a fake Mole object which uses the density fitting auxbasis as the basis sets. If auxbasis is not specified, the optimized auxiliary fitting basis set will be generated according to the rules recorded in pyscf.df.addons.DEFAULT_AUXBASIS. If the optimized auxiliary basis is not available (either not specified in DEFAULT_AUXBASIS or the basis set of the required elements not defined in the optimized auxiliary basis), even-tempered Gaussian basis set will be generated.

See also the paper JCTC, 13, 554 about generating auxiliary fitting basis.