Carnegie Mellon University

BASChem19

dataset
posted on 2025-10-22, 20:34, authored by Olexandr Isayev
<p dir="ltr">BASChem19 is a comprehensive benchmark dataset that contains the activation barriers for nineteen important radical and nonradical reactions that are often explored in polymer and chemical industry settings. The first seven reactions are neutral closed-shell, polar nucleophilic addition/substitution reactions: the nucleophilic addition of dimethylamine to acetyl chloride (1), methylamine base-catalyzed addition of methanol to acetic acid (2), phenol-catalyzed addition of methanol to acetic anhydride (3), phenol-catalyzed addition of imidazole to methyl acetate (4), water-shuttled addition of N-methylaniline to phenyl isocyanate (5), addition of methanol to acetophenone (6), and the addition of methylamine to benzoic acid (7). The last two of these reactions are intentionally uncatalyzed so that the benchmark series also includes unfavorable transition states (TS), in which the reaction proceeds via a four-membered ring. Reactions 8 and 9 represent the alcohol-initiated anionic ring-opening polymerization of ethylene oxide and propylene oxide, respectively. The following five reactions involve radical species: the radical beta scission of glucose (10), 2-cyanoprop-2-yl radical hydrogen abstraction from ethyl-2-methylbutyrate (11), addition of the terminal carbon-centered methacrylic acid radical to the tertiary carbon of 2,2-dimethylbutane (12), intramolecular hydrogen abstraction by a peroxy radical within acrylic acid endoperoxides (13), and the terminal addition of a peroxy radical to acrylic acid (14). The last five reactions involve larger, flexible species and are therefore representative of more complex reactions. The tertiary amine (TMEDA)-catalyzed urethanization of butanol and phenyl isocyanate is given in reaction 15. Reaction 16 represents the rate-determining step within a complex cascade of consecutive reaction steps converting pyromellitic anhydride and phenyl isocyanate to form an imide. 
In an intramolecular reaction, CO2 is released after the nucleophilic addition of the deprotonated amine to the anhydride unit. The last three reactions describe important steps from the organocatalytic route to endo-vinylene carbonates from carbon dioxide-based exo-vinylene carbonates, including base-catalyzed nucleophilic attack of phenol to vinylene carbonate (17), ring-opening (18), and hydrogen transfer reactions (19).</p><p><br></p><p dir="ltr">All structures for reactant, product, and TS were optimized with TPSS-D3/def2-TZVP in TURBOMOLE:</p><p><br></p><p dir="ltr">#TPSS-D3/def2-TZVP gas-phase grid m4 geometries and energies.</p><p><br></p><p>-------------------------------------------</p><p dir="ltr">DATA & FILE OVERVIEW</p><p>-------------------------------------------</p><p dir="ltr"><br></p><p dir="ltr"><br></p><p dir="ltr">Directory of Files:</p><p dir="ltr"><br></p><p dir="ltr">  A. acid-chloride_N-methylmethanamine/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for the nucleophilic addition of dimethylamine to acetyl chloride.</p><p dir="ltr"><br></p><p dir="ltr">  B. acid_methanol/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for methylamine base-catalyzed addition of methanol to acetic acid.</p><p dir="ltr"><br></p><p dir="ltr">  C. anhydride_methanol/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for phenol catalyzed addition of methanol to acetic anhydride.</p><p dir="ltr"><br></p><p dir="ltr">  D. ester_imidazole/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for phenol catalyzed addition of imidazole to methyl acetate.</p><p dir="ltr"><br></p><p dir="ltr">  E. 
imide_1/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for water shuttled addition of N-methylaniline to phenyl isocyanate.</p><p dir="ltr"><br></p><p dir="ltr">  F. 1-phenylethanone_methanol/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for addition of methanol to acetophenone.</p><p dir="ltr"><br></p><p dir="ltr">  G. amine_benzoicacid/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for the addition of methylamine to benzoic acid.</p><p dir="ltr"><br></p><p dir="ltr">  H. eo_eo/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for alcohol-initiated anionic ring-opening polymerization of ethylene oxide.</p><p dir="ltr"><br></p><p dir="ltr">  I. po_po/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for alcohol-initiated anionic ring-opening polymerization of propylene oxide.</p><p dir="ltr"><br></p><p dir="ltr">  J. glucose_beta/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for radical beta scission of glucose.</p><p dir="ltr"><br></p><p dir="ltr">  K. AIBN-rad_ethylacrylate-bb/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for 2-cyanoprop-2-yl radical hydrogen abstraction from ethyl-2-methylbutyrate.</p><p dir="ltr"><br></p><p dir="ltr">  L. isobuten-bb_methacylicacid-rad-c/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for addition of the terminal carbon centered methacrylic acid radical to tertiary carbon of 2,2-dimethylbutane.</p><p dir="ltr"><br></p><p dir="ltr">  M. 
AA-OO-AA_OO-rad_1_H-shift/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for intramolecular hydrogen abstraction by a peroxy radical within acrylic acid endoperoxides.</p><p dir="ltr"><br></p><p dir="ltr">  N. AA+AA-OO-AA_OO-rad_1_terminal-addition/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for terminal addition of a peroxy radical to acrylic acid.</p><p dir="ltr"><br></p><p dir="ltr">  O. Urethanization/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for the tertiary amine (TMEDA)-catalyzed urethanization of butanol and phenyl isocyanate.</p><p dir="ltr"><br></p><p dir="ltr">  P. imide_3/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for the rate-determining step within the cascade of reaction steps converting pyromellitic anhydride and phenyl isocyanate to form an imide.</p><p dir="ltr"><br></p><p dir="ltr">  Q. philipp_ts1/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for base-catalyzed nucleophilic attack of phenol on vinylene carbonate in the organocatalytic route to endo-vinylene carbonates from carbon dioxide-based exo-vinylene carbonates.</p><p dir="ltr"><br></p><p dir="ltr">  R. philipp_ts2m/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for ring-opening in the organocatalytic route to endo-vinylene carbonates from carbon dioxide-based exo-vinylene carbonates.</p><p dir="ltr"><br></p><p dir="ltr">  S. 
philipp_ts3/</p><p dir="ltr">  Short description: Contains MIN.xyz and TS.xyz with embedded energy values for barrier computation for hydrogen transfer reaction in organocatalytic route to endo-vinylene carbonates from carbon dioxide-based exo-vinylene carbonates.</p><p dir="ltr"><br></p><p dir="ltr">  T. BASChem19_summary_from_xyz.csv</p><p dir="ltr">  Short description: Computed activation barriers in kcal/mol based on energies extracted from TS.xyz and MIN.xyz files.</p><p dir="ltr"><br></p><p dir="ltr">Additional Notes on File Relationships, Context, or Content:</p><p dir="ltr">  In addition to `MIN.xyz` and `TS.xyz`, each directory may include:</p><p dir="ltr">  - `res.txt`: Reaction barrier in kcal/mol.</p><p dir="ltr">  - `gradient`: Cartesian energy gradient file (TURBOMOLE format) used in optimization.</p><p dir="ltr">  - `energy`: Total single-point electronic energy (in Hartree), also embedded in `.xyz`.</p><p dir="ltr">  - `uhf.txt`: Relevant to open-shell systems; contains spin multiplicity information.</p><p dir="ltr"> - `chrg.txt`: Relevant to charged systems; contains charge information.</p><p> </p><p dir="ltr">  This dataset is used to evaluate ML models for reaction barrier prediction. 
Each folder contains structures and energy data for a single reaction from the BASChem19 benchmark.</p><p> </p><p dir="ltr">File Naming Convention:</p><p dir="ltr"> - `MIN.xyz`: Optimized geometry and energy for the reactant.</p><p dir="ltr"> - `TS.xyz`: Optimized geometry and energy for the transition state.</p><p dir="ltr"> - `res.txt`: Reaction barrier in kcal/mol.</p><p dir="ltr"> - `gradient`: Gradient file used for optimization.</p><p dir="ltr"> - `energy`: Redundant single-point energy file.</p><p dir="ltr"> - `uhf.txt`: Optional information for open-shell calculations.</p><p dir="ltr"> - `chrg.txt`: Optional information for charged reactant and transition state.</p><p dir="ltr"><br></p><p dir="ltr"><br></p><p>--------------------------------------------------------</p><p dir="ltr">DATA DESCRIPTION FOR: BASChem19_summary_from_xyz.csv</p><p>--------------------------------------------------------</p><p dir="ltr"><br></p><p dir="ltr">1. Number of variables: 4</p><p dir="ltr"><br></p><p dir="ltr">2. Number of cases/rows: 19</p><p dir="ltr"><br></p><p dir="ltr">3. Missing data codes: </p><p dir="ltr"> N/A</p><p dir="ltr"><br></p><p dir="ltr">4. Variable List </p><p dir="ltr"><br></p><p dir="ltr"> A. Name: Reaction </p><p dir="ltr"> Description: Folder name describing the chemical reaction</p><p dir="ltr"><br></p><p dir="ltr"> B. Name: TS </p><p dir="ltr"> Description: Total electronic energy (in Hartree) from TS.xyz</p><p dir="ltr"><br></p><p dir="ltr"> C. Name: reactant </p><p dir="ltr"> Description: Total electronic energy (in Hartree) from MIN.xyz</p><p dir="ltr"><br></p><p dir="ltr"> D. Name: Activation_Barrier_kcal/mol </p><p dir="ltr"> Description: (TS energy − reactant energy) × 627.509</p><p dir="ltr"><br></p><p dir="ltr"><br></p><p>--------------------------------------</p><p dir="ltr">METHODOLOGICAL INFORMATION</p><p>--------------------------------------</p><p dir="ltr"><br></p><p dir="ltr">1. 
Software-specific information:</p><p dir="ltr"><br></p><p dir="ltr">Name: TURBOMOLE </p><p dir="ltr">Version: 7.7.1 </p><p dir="ltr">System Requirements: Linux-based HPC cluster </p><p dir="ltr">Open Source? (Y/N): N </p><p dir="ltr">Developer: TURBOMOLE GmbH </p><p dir="ltr">Product URL: https://www.turbomole.org </p><p dir="ltr"><br></p><p dir="ltr">All geometries and energies were calculated at the TPSS-D3(BJ)/def2-TZVP level of theory in the gas phase. </p><p dir="ltr">The m4 integration grid was used along with tight convergence settings (energy 8, gcart 4, scfconv 8, denconv 1e-8).</p><p dir="ltr"><br></p><p dir="ltr">Name: xtb (for CREST) </p><p dir="ltr">Version: 6.7 </p><p dir="ltr">System Requirements: Unix-like systems </p><p dir="ltr">Open Source? (Y/N): Y </p><p dir="ltr">Developer: Grimme Group </p><p dir="ltr">Product URL: https://github.com/grimme-lab/xtb </p><p dir="ltr"><br></p><p dir="ltr">Name: CREST </p><p dir="ltr">Version: 2.12 </p><p dir="ltr">System Requirements: Same as xtb </p><p dir="ltr">Open Source? (Y/N): Y </p><p dir="ltr">Developer: Grimme Group </p><p dir="ltr">Product URL: https://github.com/grimme-lab/crest </p><p dir="ltr"><br></p><p dir="ltr">2. Equipment-specific information: </p><p dir="ltr"> N/A — Data is computational</p><p dir="ltr"><br></p><p dir="ltr">3. Date of data collection: </p><p> 20230101 - 20241001</p><p dir="ltr"><br></p>
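<p dir="ltr">The barrier definition given for BASChem19_summary_from_xyz.csv, (TS energy − reactant energy) × 627.509, can be sketched in Python as follows. This is a minimal illustration, not the script used to build the dataset. It assumes that the energy embedded in each `.xyz` file is the first decimal number on the comment (second) line, which is a common convention but is not confirmed by this record. The function names and the sample strings are hypothetical, and the sample energies are made up rather than taken from the dataset.</p>

```python
import re

# Hartree -> kcal/mol factor as quoted in the CSV variable description.
HARTREE_TO_KCAL_PER_MOL = 627.509

# Matches a signed decimal number, optionally with an exponent.
_FLOAT = re.compile(r"[-+]?\d+\.\d+(?:[eE][-+]?\d+)?")


def read_xyz_energy(xyz_text):
    """Return the energy (Hartree) from the comment line of an XYZ file.

    ASSUMPTION: the energy is the first decimal number on line 2.
    Adjust this parser if the files use a different layout.
    """
    lines = xyz_text.splitlines()
    if len(lines) < 2:
        raise ValueError("XYZ text has no comment line")
    match = _FLOAT.search(lines[1])
    if match is None:
        raise ValueError("no decimal number found on the comment line")
    return float(match.group())


def activation_barrier_kcal_per_mol(e_ts, e_reactant):
    """(TS energy - reactant energy) * 627.509, both inputs in Hartree."""
    return (e_ts - e_reactant) * HARTREE_TO_KCAL_PER_MOL


if __name__ == "__main__":
    # Made-up XYZ snippets for illustration only (not dataset values).
    min_text = "1\nenergy: -100.0200\nH 0.0 0.0 0.0\n"
    ts_text = "1\nenergy: -100.0000\nH 0.0 0.0 0.0\n"
    barrier = activation_barrier_kcal_per_mol(
        read_xyz_energy(ts_text), read_xyz_energy(min_text)
    )
    print(f"barrier = {barrier:.2f} kcal/mol")
```

<p dir="ltr">To apply the sketch to a reaction folder, read the contents of `TS.xyz` and `MIN.xyz` as text and pass them to `read_xyz_energy`. Where a folder includes `res.txt`, it offers a cross-check of the computed value.</p>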
