SemiCond_DPA_v1_0
Jianchuan Liu
Description

README

Introduction

Description

This data set is generated with DPGEN [1,2], a concurrent learning scheme that generates the DP training dataset iteratively.

The set of data contains structure-to-energy-force labels for 20 semiconductors, namely, Si, Ge, SiC, BAs, BN, AlN, AlP, AlAs, InP, InAs, InSb, GaN, GaP, GaAs, CdTe, InTe-In2Te3, CdSe-CdSe2, InSe-In2Se3, ZnS, CdS-CdS2. The total number of frames is 151,715.

DFT

The DFT-based electronic structure calculations are computed by ABACUS with numerical atomic orbitals and used triple-zeta plus double polarization (TZDP) basis sets. The generalized gradient approximation (GGA) in the form of the Perdew–Burke–Ernzerhof (PBE) was used for the exchange–correlation functional.

The norm-conserving pseudopotentials were employed and the valence configurations were [B]2s2^22p1^1, [C]2s2^22p2^2, [N]2s2^22p3^3, [Al]3s2^23p1^1, [Si]3s2^23p2^2, [P]3s2^23p3^3, [S]3s2^23p4^4, [Zn]3s2^23p6^63d10^{10}4s2^2, [Ga]4s2^24p1^1, [Ge]4s2^24p2^2, [As]4s2^24p3^3, [Se]4s2^24p4^4, [Cd]4s2^24p6^64d10^{10}5s2^2, [In]4d10^{10} 5s2^25p1^1, [Sb]5s2^25p3^3, and [Te]4d10^{10} 5s2^25p4^4, respectively.

The numerical atomic orbitals were 3s3p2d for B, 3s3p2d for C, 3s3p2d for N, 3s3p2d for Al, 3s3p2d for Si, 3s3p2d for P, 3s3p2d for S, 6s3p3d2f for Zn, s3p2d for Ga, 3s3p2d for Ge, 3s3p2d for As, 3s3p2d for Se, 6s3p3d2f for Cd, 3s3p3d2f for In, 3s3p2d for Sb, and 3s3p3d2f for Te, respectively.

The cutoff orbitals are set to 9 a.u.. The energy cutoff is set to 100 Ry. The K-points are set using the Monkhorst–Pack mesh with the spacing 0.08 1/a.u.. We employed the Gaussian smearing method with the smearing widths of 0.002 Ry. The electronic iteration convergence threshold was set to 106^{-6}.

Data generation

se_e2_a model is used in DP-GEN. DP-GEN temperature upto 1 Tm.

Data Format

The directory tree is as follows:

sys_dir (optional) -- train.txt -- valid.txt sys_dir_downstream (optional) -- train.txt -- valid.txt *other_data_dirs

The dataset encompasses upstream and/or downstream splits, which are used for training DPA [3] models. Each system is represented as a single line in *.txt files in the DeePMD [4] format for training/validation. To obtain system paths from unstructured *other_data_dirsother\_data\_dirs, one can refer to these *.txt files.

A pbc unit system in DeePMD format usually has the following substructure:

-- system -- -- type_map.raw -- -- type.raw -- -- set.000/box.npy -- -- set.000/coord.npy -- -- set.000/energy.npy -- -- set.000/force.npy -- -- set.000/virial.npy

Format Description

NamePropertyRaw fileUnitShapeDescription
typeAtom type indexestype.rawNatomsIntegers that start with 0, represent the atomic type corresponding to type_map.raw
type_mapAtom type namestype_map.rawNtypesAtom names that map to atom type, which is unnecessart to be contained in the periodic table
coordAtomic coordinatescoord.npyÅNframes * Natoms * 3The atomic coordinates
boxBoxesbox.npyÅNframes * 3 * 3The box axes in the order XX XY XZ YX YY YZ ZX ZY ZZ
energyFrame energiesenergy.npyeVNframesThe potential energy of snapshot
forceAtomic forcesforce.npyeV/ÅNframes * Natoms * 3The atomic forces
virialFrame virialvirial.npyeVNframes * 9The virial frames are in the order XX XY XZ YX YY YZ ZX ZY ZZ

Check here for more details.

References

[1] Zhang, Linfeng, et al. "Active learning of uniformly accurate interatomic potentials for materials simulation." Physical Review Materials 3.2 (2019): 023804.

[2] Zhang, Yuzhi, et al. "DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models." Computer Physics Communications 253 (2020): 107206.

[3] Zhang, Duo, et al. "DPA-1: Pretraining of Attention-based Deep Potential Model for Molecular Simulation." arXiv preprint arXiv:2208.08236 (2022).

[4] Wang, Han, et al. "DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics." Computer Physics Communications 228 (2018): 178-184.

Files
SemiCond.tar.gz
2024-03-06 11:50:06
758.73 MB
License
CC BY 4.0Learn more
Dataset Information
Publisher:
iProzd
Published:
2023-11-11
Downloads:
56