11. df — Density fitting¶
11.1. Introduction¶
The df
module provides the fundamental functions to handle the 3index
tensors required by the density fitting (DF) method or the resolution of
identity (RI) approximation. Specifically, it includes the functions to compute
the 3center 2electron AO integrals, the DF/RI 3index tensor in the form of
Cholesky decomposed integral tensor (\((ijkl)=V_{ij,x}V_{kl,x}\)),
the AO to MO integral transformation of the 3index tensor, as well as the
functions to generate the density fitting basis.
The density_fit()
method can utilize the DF method at SCF and MCSCF
level:
from pyscf import gto, scf, mcscf
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='def2tzvp')
mf = scf.RHF(mol).density_fit().run()
mc = mcscf.CASSCF(mf, 8, 10).density_fit().run()
Once the DF method is enabled at the SCF level, all the postSCF methods will automatically enable the DF method, for example:
from pyscf import gto, dft, tddft
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='def2tzvp')
mf = dft.RKS(mol).density_fit().run()
td = tddft.TDA(mf).run()
print(td.e)
In PySCF, DF is implemented at different level of implementations for different methods. They are summarized in the following table
Methods  FakeERI  Native DF  Properties with DF 
HF/DFT  Yes  Yes  
Generlized HF/DFT  Yes  No  
Relativistic HF  Yes  Yes  
TDDFT  Yes  Yes  
RCCSD  Yes  Yes  
UCCSD  Yes  No  
RCCSD(T)  Yes  No  
EOMCCSD  Yes  No  
RMP2  Yes  No  
UMP2  Yes  No  
PBC HF/DFT  Yes  Yes  
PBC TDDFT  Yes  Yes  
PBC Gammapoint CCSD  Yes  Yes  
PBC kpoints CCSD  Yes  No 
FakeERI means to mimic the 4center 2electron repulsion integrals (ERI) by precontracting the DF 3index tensor. This is the simplest way to enable DF integrals, although the fakeERI mechanism may require huge amount of memory also may be slow in performance. It provides the most convenient way to embed the DF integrals in the existing code, thus it is supported by almost every method in PySCF. It is particularly important in the periodic code. Using the fakeERIs allows us to call all quantum chemistry methods developed at molecular level in the \(\Gamma\)point calculations without modifying any existing molecular code. See also the pbc.df — PBC denisty fitting module.
Some methods have native DF implementation. This means the performance of the DF technique has been considered in the code. In these methods, DF approximation generally runs faster than the regular scheme without integral approximation and also consumes less memory or disk space.
When density fitting is enabled in a method, a with_df
object will be
generated and attached to the method object. with_df
is the object to
hold all DF relevant quantiles, such as the DF basis, the file to save the
3index tensor, the amount of memory to use etc. You can modify the attributes
of with_df
to get more control over the DF methods. In the SCF and
MCSCF methods, setting with_df
to None
will switch off the DF
approximation. In the periodic code, all twoelectron integrals are evaluated
by DF approximations. There are four different types of DF schemes
(FFTDF
, AFTDF
, GDF
, MDF
see pbc.df — PBC denisty fitting),
available in the periodic code. By assigning different DF object to
with_df
, different DF schemes can be applied in the PBC calculations.
11.1.1. DF auxiliary basis¶
The default auxiliary basis set are generated by function
pyscf.df.addons.make_basis()
based on the orbital basis specified in
the calculation according to the rules defined in
pyscf.df.addons.DEFAULT_AUXBASIS
.
Specifically, the jkfit basis in the first column is used for HartreeFock or
DFT methods, and the ri basis in the second column is used for correlation
calculations. These optimized auxiliary basis sets are obtained from
http://www.psicode.org/psi4manual/master/basissets_byfamily.html
If optimized auxiliary basis set was not found for the orbital basis set,
eventempered Gaussian functions are generated automatically.
Specifying auxiliary basis is a common requirement in the real applications.
For example, the default auxiliary basis set for the pure DFT calculations may
be over complete since it is designed to represent both the Coulomb and HF
exchange matrix. Coulomb fitting basis such as Weigendcfit basis or
Ahlrichscfit basis are often enough to obtain chemical accuracy. To control
the fitting basis in DF method, You can change the value of
with_df.auxbasis
attribute. The input format of auxiliary fitting basis
is exactly the same to the input format of orbital basis set. For
example:
from pyscf import gto, dft
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='def2tzvp')
mf = dft.RKS(mol)
mf.xc = 'pbe,pbe'
mf.run() # 109.432313679876
mf = mf.density_fit()
mf.run() # 109.432329411505
mf.with_df.auxbasis = 'weigend'
mf.run() # 109.432334646584
More examples for inputing auxiliary basis in the DF calculation can be found
in examples/df/01auxbasis.py
.
11.1.1.1. Eventempered auxiliary Gaussian basis¶
The eventempered auxiliary Gaussian basis is generated by function aug_etb()
:
from pyscf import gto, df
mol = gto.M(atom='N 0 0 0; N 0 0 1.2', basis='ccpvdz')
print(mol.nao_nr()) # 28
auxbasis = df.aug_etb(mol)
print(df.make_auxmol(mol, auxbasis).nao_nr()) # 200
auxbasis = df.aug_etb(mol, beta=1.6)
print(df.make_auxmol(mol, auxbasis).nao_nr()) # 338
Here the make_auxmol()
function converts the auxbasis to a Mole
object which can be used to evaluate the analytical integrals the same way as
the regular Mole
object. The formula to generate the exponents
\(\zeta\) of the eventempered auxiliary basis are
The default value of \(\beta\) is 2.3. \(\alpha\) and the number of auxiliary basis \(n\) is determined based on the orbital basis. Given the orbital basis
the orbital pair on the same center produces a new onecenter basis
The minimal \(\alpha_L\) in all orbital pairs is assigned to \(\alpha\) in (?). Then \(n\) is estimated to make the largest auxiliary exponent \(\zeta\) as close as possible to the maximum \(\alpha_L\). The size of generated eventempered Gaussian basis is typically 5  10 times of the size of the orbital basis, or 2  3 times more than the optimized auxiliary basis. (Note the accuracy of this eventempered auxiliary basis is not fully benchmarked. The error is close to the optimized auxiliary basis in our tests.)
11.1.2. Saving/Loading DF integral tensor¶
Although it is not expensive to compute DF integral tensor in the molecular
calculation, saving/loading the 3index tensor is still useful since it provides
an alternative way, different to the attribute _eri
of meanfield object
(see Customizing Hamiltonian), to customize the Hamiltonian.
In the DFSCF method, the 3index tensor is held in the with_df
object.
The with_df
object (see pyscf.df.df.DF
class) provides
two attributes _cderi_to_save
and _cderi
to access the DF 3index
integrals.
If a DF integral tensor is assigned to _cderi
, the integrals will be
used in the DF calculation. The DF integral tensor can be either a numpy array
or an HDF5 file on disk. When the DF integrals are provided in the HDF5 file,
the integral needs to be stored under the dataset 'j3c'
:
import numpy
import h5py
from pyscf import gto, scf, df
mol = gto.M(atom='H 0 0 0; H 1 0 1; H 0 1 1; H 1 1 0', basis='sto3g')
nao = mol.nao_nr()
with h5py.File('df_ints.h5', 'w') as f:
f['j3c'] = numpy.random.random((10,nao*(nao+1)//2))
mf = scf.RHF(mol).density_fit()
mf.with_df._cderi = 'df_ints.h5'
mf.kernel()
As shown in the above example, the integral tensor \(V_{x,ij}\) provided in
_cderi
should be a 2D array in C (rowmajor) convention. Its first index
corresponds to the auxiliary basis and the second combined index ij
is the
orbital pair index. When load DF integrals, we assumed hermitian symmetry
between the two orbital index, ie only the elements \(i\geq j\) are left in
the DF integral tensor. Thus the DF integral tensor should be a 2D array, with
shape (M,N*(N+1)/2)
, where M
is the number of auxiliary functions, N is
the number of orbitals.
If _cderi
is not specified, the DF integral tensor will be generated
during the calculation and stored to the file that the attribute
_cderi_to_save
points to. By default, it is a random file and the
random file will be deleted if the calculation finishes successfully. You
can find the filename in the output log (when with.verbose > 3
, for example:
******** <class 'pyscf.df.df.DF'> flags ********
auxbasis = None
max_memory = 20000
_cderi_to_save = /scratch/tmp6rGrSD
If the calculation is terminated problematically with error or any other
reasons, you can reuse the DF integrals in the next calculation by assigning
the integral file to _cderi
.
Overwriting _cderi_to_save
with a filename will make the program save the
DF integrals in the given filename regardless whether the calculation is succeed
or failed. See also the example
pyscf/examples/df/10access_df_integrals.py
.
11.1.2.1. Precomputing the DF integral tensor¶
The DF integral tensor can be computed without initialization the
with_df
object. Functions cholesky_eri()
defined in
df.incore
and df.outcore
can generate DF integral tensor in memory
or in a HDF5 file:
from pyscf import gto, df
mol = gto.M(atom='N 0 0 0; N 1 1 1', basis='ccpvdz')
cderi = df.incore.cholesky_eri(mol, auxbasis='ccpvdzjkfit')
df.outcore.cholesky_eri(mol, 'df_ints.h5', auxbasis='ccpvdzjkfit')
These cderi
integrals has the same data structure as the one generated in
with_df
object. They can be directly used in the DF type calculations:
from pyscf import scf
mf = scf.RHF(mol).density_fit()
mf.with_df._cderi = cderi
mf.kernel()
mf.with_df._cderi = 'df_ints.h5'
mf.kernel()
11.1.3. Approximating orbital hessian in SCF and MCSCF¶
Orbital hessian is required by the second order SCF solver or MCSCF solver. In many systems, approximating the orbital hessian has negligible effects to the convergence and the solutions of the SCF or MCSCF orbital optimization procedure. Using DF method to approximate the orbital hessian can improve the overall performance. For example, the following code enables the DF approximation to the orbital hessian in SCF calculation:
from pyscf import gto, scf
mol = gto.M(atom='N 0 0 0; O 0 0 1.5', spin=1, basis='ccpvdz')
mf = scf.RHF(mol).newton().density_fit().run(verbose=4) # converged SCF energy = 129.0896469563
mf = scf.RHF(mol).run(verbose=4) # converged SCF energy = 129.0896469563
The approximation to orbital hessian does not change the SCF result. In the
above example, it produces the same solution to the regular SCF result.
Similarly, when the DF approximation is used with CASSCF orbital hessian, the
CASSCF result should not change. Continuing the above example, we can use the
mcscf.approx_hessian()
function to change the orbital hessian of the given
CASSCF object:
from pyscf import mcscf
mc = mcscf.approx_hessian(mcscf.CASSCF(mf, 8, 11)).run() # 129.283077136
mc = mcscf.CASSCF(mf, 8, 11).run() # 129.283077136
Note
In the second order SCF solver, the order to apply the density_fit
and newton methods affects the character of the resultant SCF object. For
example, the statement mf = scf.RHF(mol).density_fit().newton()
first
produces a DFHF object then enable the second order Newton solver for the DFHF
object. The resultant SCF object is a DFHF object. See more examples in
examples/scf/23decorate_scf.py
11.2. Program reference¶
11.2.1. DF class¶

class
pyscf.df.df.
DF
(mol, auxbasis=None)[source]¶ Object to hold 3index tensor
 Attributes:
 auxbasis : str or dict
 Same input format as
Mole.basis
 auxmol : Mole object
 Read only Mole object to hold the auxiliary basis. auxmol is generated automatically in the initialization step based on the given auxbasis. It is used in the rest part of the code to determine the problem size, the integral batches etc. This object should NOT be modified.
 _cderi_to_save : str
 If _cderi_to_save is specified, the DF integral tensor will be saved in this file.
 _cderi : str or numpy array
 If _cderi is specified, the DF integral tensor will be read from this HDF5 file (or numpy array). When the DF integral tensor is provided from the HDF5 file, it has to be stored under the dataset ‘j3c’. The DF integral tensor \(V_{x,ij}\) should be a 2D array in C (rowmajor) convention, where x corresponds to index of auxiliary basis, and the combined index ij is the orbital pair index. The hermitian symmetry is assumed for the combined ij index, ie the elements of \(V_{x,i,j}\) with \(i\geq j\) are existed in the DF integral tensor. Thus the shape of DF integral tensor is (M,N*(N+1)/2), where M is the number of auxbasis functions and N is the number of basis functions of the orbital basis.
 blockdim : int
 When reading DF integrals from disk the chunk size to load. It is used to improve the IO performance.
11.2.2. df.incore¶

pyscf.df.incore.
aux_e1
(mol, auxmol, intor='int3c2e', aosym='s1', comp=None, out=None)[source]¶ 3center 2electron AO integrals (Lij), where L is the auxiliary basis.
Note aux_e1 is basically analogous to aux_e2 function. It can be viewed as the version of transposed aux_e2 tensor: if comp == 1:
aux_e1 = aux_e2().T else:
 aux_e1 = aux_e2().transpose(0,2,1)
The same arguments as function aux_e2 can be input to aux_e1.

pyscf.df.incore.
aux_e2
(mol, auxmol, intor='int3c2e', aosym='s1', comp=None, out=None, cintopt=None)[source]¶ 3center AO integrals (ijL), where L is the auxiliary basis.
 Kwargs:
 cintopt : Libcint3.14 and newer version support to compute int3c2e
without the opt for the 3rd index. It can be precomputed to reduce the overhead of cintopt initialization repeatedly.
cintopt = gto.moleintor.make_cintopt(mol._atm, mol._bas, mol._env, intor)
11.2.3. df.outcore¶

pyscf.df.outcore.
cholesky_eri
(mol, erifile, auxbasis='weigend+etb', dataname='j3c', tmpdir=None, int3c='int3c2e', aosym='s2ij', int2c='int2c2e', comp=1, max_memory=2000, ioblk_size=256, auxmol=None, verbose=3)[source]¶ 3center 2electron AO integrals
11.2.4. df.addons¶

pyscf.df.addons.
aug_etb
(mol, beta=2.0)[source]¶ To generate the eventempered auxiliary Gaussian basis

pyscf.df.addons.
aug_etb_for_dfbasis
(mol, dfbasis='weigend', beta=2.0, start_at=36)[source]¶ augment weigend basis with eventempered gaussian basis exps = alpha*beta^i for i = 1..N

class
pyscf.df.addons.
load
(eri, dataname='j3c')[source]¶ load 3c2e integrals from hdf5 file. It can be used in the context manager:
 with load(cderifile) as eri:
 print(eri.shape)

pyscf.df.addons.
make_auxbasis
(mol, mp2fit=False)[source]¶ Depending on the orbital basis, generating eventempered Gaussians or the optimized auxiliary basis defined in DEFAULT_AUXBASIS

pyscf.df.addons.
make_auxmol
(mol, auxbasis=None)[source]¶ Generate a fake Mole object which uses the density fitting auxbasis as the basis sets. If auxbasis is not specified, the optimized auxiliary fitting basis set will be generated according to the rules recorded in pyscf.df.addons.DEFAULT_AUXBASIS. If the optimized auxiliary basis is not available (either not specified in DEFAULT_AUXBASIS or the basis set of the required elements not defined in the optimized auxiliary basis), eventempered Gaussian basis set will be generated.
See also the paper JCTC, 13, 554 about generating auxiliary fitting basis.