Manual

openabc.forcefields

class CGModel

The general class with general methods that can be inherited by any other CG model classes.

Each child class of CGModel has to set self.bonded_attr_names, which is used to append molecules and set rigid bodies.

add_plumed(plumed_script_path)

Add OpenMM plumed.

Parameters

plumed_script_pathstr: Input plumed script path.

Reference

PLUMED website: https://www.plumed.org/

add_reporters(report_interval, output_dcd='output.dcd', report_dcd=True, report_state_data=True)

Add reporters for OpenMM simulation.

Parameters

report_intervalint: Report dcd and report state interval.
output_dcdstr: Output dcd file path.
report_dcdbool: Whether to output dcd file.
report_state_databool: Whether to output simulation state and data.

append_mol(new_mol, verbose=False)

The method can append new molecules by concatenating atoms and bonded interaction information saved in dataframes.

Parameters

new_mola consistent parser object or CG model object: The object of a new molecule including atom and bonded interaction information.
verbosebool: Whether to report the appended attributes.

atoms_to_pdb(cg_pdb, reset_serial=True)

Save atoms to pdb as topology will be read from the pdb file. PDB file is required for OpenMM simulation.

Parameters

cg_pdbstr: Output path for CG PDB file.
reset_serialbool: Reset serial to 1, 2, …

create_system(top, use_pbc=True, box_a=500, box_b=500, box_c=500, remove_cmmotion=True)

Create OpenMM system for simulation. Need to further add forces to this OpenMM system.

Parameters

topOpenMM Topology: The OpenMM topology.
use_pbcbool: Whether to use periodic boundary condition (PBC).
box_afloat or int: Box length along x-axis.
box_bfloat or int: Box length along y-axis.
box_cfloat or int: Box length along z-axis.
remove_cmmotionbool: Whether to add CMMotionRemover to remove center of mass motions.

move_COM_to_box_center(): Move center of mass (COM) to simulation box center.

save_state(state_xml='state.xml')

Save simulation state to readable xml format.

Parameters

state_xmlstr: Output state xml file path.

save_system(system_xml='system.xml')

Save system to readable xml format.

Parameters

system_xmlstr: Output system xml file path.

set_rigid_bodies(rigid_coord, rigid_bodies, keep_unchanged=[])

Set rigid bodies and remove bonded interactions within the same rigid body.

Directly run createRigidBodies if the user does not want to change any bonded interactions or exclusions.

Parameters

rigid_coordQuantity, shape = (n_atoms, 3): Atom coordinates to build rigid body restrictions.
rigid_bodieslist: A list includes many sublists. Atoms in each sublist are recognized as one rigid body.
keep_unchangedlist: A list including attribute names that user intends to keep unchanged.

set_simulation(integrator, platform_name='CPU', properties={'Precision': 'mixed'}, init_coord=None)

Set OpenMM simulation.

Parameters

integratorOpenMM Integrator: OpenMM integrator.
platform_namestr: OpenMM simulation platform name. It can be one from: Reference or CPU or CUDA or OpenCL.
propertiesdict: OpenMM simulation platform properties.
init_coordNone or array-like: Initial coordinate.

class MOFFMRGModel

Bases: CGModel

A class for MOFF+MRG model that represents a mixture of MOFF proteins and MRG DNA.

add_all_default_forces(): Add all the forces with default settings.

add_contacts(eta=Quantity(value=0.7, unit=/angstrom), r0=Quantity(value=8, unit=angstrom), cutoff=Quantity(value=2.0, unit=nanometer), alpha_protein_dna=0.0016264, alpha_dna_dna=1.678e-05, epsilon_protein_dna=0, epsilon_dna_dna=0, force_group=8)

Add nonbonded contacts for MOFF protein and MRG DNA.

For amino acids, the CA atom type indices are 0-19, and CG nucleotide atom type index is 20.

The potential expression is: alpha/r^12 - 0.5*epsilon*(1 + tanh(eta*(r0 - r)))

Parameters

etaQuantity: Parameter eta.
r0Quantity: Parameter r0.
cutoffQuantity: Cutoff distance
alpha_protein_dnafloat or int: Protein-DNA interaction parameter alpha.
alpha_dna_dnafloat or int: DNA-DNA interaction parameter alpha.
epsilon_protein_dnafloat or int: Protein-DNA interaction parameter epsilon.
epsilon_dna_dnafloat or int: DNA-DNA interaction parameter epsilon.
force_groupint: Force group.

add_dna_angles(force_group=6)

Add DNA angles.

Parameters

force_groupint: Force group.

add_dna_bonds(force_group=5)

Add DNA bonds.

Parameters

force_groupint: Force group.

add_dna_fan_bonds(force_group=7)

Add DNA fan bonds.

Parameters

force_groupint: Force group.

add_elec_switch(salt_conc=Quantity(value=150.0, unit=millimolar), temperature=Quantity(value=300.0, unit=kelvin), cutoff1=Quantity(value=1.2, unit=nanometer), cutoff2=Quantity(value=1.5, unit=nanometer), switch_coeff=[1, 0, 0, -10, 15, -6], add_native_pair_elec=True, force_group=9)

Add electrostatic interaction with switch function.

The switch function switches potential to zero within range cutoff1 < r <= cutoff2.

Parameters

salt_concQuantity: Monovalent salt concentration.
temperatureQuantity: Temperature.
cutoff1Quantity: Cutoff distance 1.
cutoff2Quantity: Cutoff distance 2.
switch_coeffsequence-like: Switch function coefficients.
add_native_pair_elecbool: Whether to add electrostatic interactions between native pairs.
force_groupint: Force group.

add_native_pairs(force_group=4)

Add native pairs.

Parameters

force_groupint: Force group.

add_protein_angles(threshold=2.2689280275926285, clip=False, force_group=2, verbose=True)

Add protein angles.

Note that the angle potential is a harmonic angle potential, which may lead to unstable simulation if harmonic potential center theta0 is too large.

Based on some tests, theta0 <= 130 degrees (130*np.pi/180 radians) can facilitate 10 fs timestep.

Parameters

thresholdfloat or int: The suggested upper limit value of theta0.
clipbool: Whether to change theta0 values larger than threshold to threshold. If True, all the theta0 values larger than threshold will be changed to threshold. If False, do not change theta0 values.
force_groupint: Force group.
verbose: bool: Whether to show warnings if theta0 is larger than the threshold.

add_protein_bonds(force_group=1)

Add protein bonds.

Parameters

force_groupint: Force group.

add_protein_dihedrals(force_group=3)

Add protein dihedrals.

Parameters

force_groupint: Force group.

class HPSModel

Bases: CGModel

The class for HPS model that represents a mixture of HPS model proteins.

This class inherits CGModel class.

add_all_default_forces(): Add all the forces with default settings.

add_contacts(hydropathy_scale='Urry', epsilon=0.8368000000000001, mu=1, delta=0.08, force_group=2)

Add nonbonded contacts.

The raw hydropathy scale is scaled and shifted by: mu*lambda - delta

Parameters

hydropathy_scalestr: Hydropathy scale, can be KR or Urry.
epsilonfloat or int: Contact strength.
mufloat or int: Hydropathy scale factor.
deltafloat or int: Hydropathy shift factor.
force_groupint: Force group.

add_dh_elec(ldby=Quantity(value=1, unit=nanometer), dielectric_water=80.0, cutoff=Quantity(value=3.5, unit=nanometer), force_group=3)

Add Debye-Huckel electrostatic interactions.

Parameters

ldbyQuantity: Debye length.
dielectric_waterfloat or int: Dielectric constant of water.
cutoffQuantity: Cutoff distance.
force_groupint: Force group.

add_protein_bonds(force_group=1)

Add protein bonds.

Parameters

force_groupint: Force group.

class MpipiModel

Bases: CGModel

The class for Mpipi protein and RNA models. This class inherits CGModel class. We follow the parameters provided in the Mpipi LAMMPS input file.

add_contacts(cutoff_to_sigma_ratio=3, force_group=3): Add nonbonded contacts, which is of Wang-Frenkel functional form.

Parameters

add_dh_elec(ldby=Quantity(value=0.7936507936507936, unit=nanometer), dielectric_water=80.0, cutoff=Quantity(value=3.5, unit=nanometer), force_group=4)

Add Debye-Huckel electrostatic interactions.

Parameters

ldbyQuantity: Debye length.
dielectric_waterfloat or int: Dielectric constant of water.
cutoffQuantity: Cutoff distance.
force_groupint: Force group.

add_protein_bonds(force_group=1)

Add protein bonds.

Parameters

force_groupint: Force group.

add_rna_bonds(force_group=2)

Add RNA bonds.

Parameters

force_groupint: Force group.

openabc.forcefields.parsers

class MOFFParser(atomistic_pdb, ca_pdb, default_parse=True)

MOFF protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize a MOFF model protein from atomistic pdb.

Parameters

atomistic_pdbstr: Atomistic pdb file path.
ca_pdbstr: CA pdb file path.
write_TERbool: Whether to write TER between two chains.
default_parsebool: Whether to parse with default settings.

Returns

resultclass instance: A class instance.

parse_exclusions(exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True)

Parse nonbonded exclusions based on bonds, angles, dihedrals, and native pairs.

Parameters

exclude12bool: Whether to exclude nonbonded interactions between 1-2 atom pairs.
exclude13bool: Whether to exclude nonbonded interactions between 1-3 atom pairs.
exclude14bool: Whether to exclude nonbonded interactions between 1-4 atom pairs.
exclude_native_pairsbool: Whether to exclude nonbonded interactions between native pairs.

parse_mol(get_native_pairs=True, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False, exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True, mass_dict={'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'CYS': 103.1, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'VAL': 99.07}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.25, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0}, ss=None, ordered_ss_names=['H', 'E'])

Parse molecule.

Parameters

get_native_pairsbool: Whether to get native pairs from atomistic pdb with shadow algorithm.
frameint: Frame index to compute native structure parameters for both CG and atomistic models.
radiusfloat or int: Shadow algorithm radius.
bonded_radiusfloat or int: Shadow algorithm bonded radius.
cutofffloat or int: Shadow algorithm cutoff.

box : None or np.ndarray Specify box shape. Note setting use_pbc as False is only compatible with orthogonal box. If use_pbc is False, then set box as None. If use_pbc is True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).

use_pbcbool: Whether to use periodic boundary condition (PBC) for searching native pairs.
exclude12bool: Whether to exclude nonbonded interactions between 1-2 atom pairs.
exclude13bool: Whether to exclude nonbonded interactions between 1-3 atom pairs.
exclude14bool: Whether to exclude nonbonded interactions between 1-4 atom pairs.
exclude_native_pairsbool: Whether to exclude nonbonded interactions between native pairs.
mass_dictdict: Mass dictionary.
charge_dictdict: Charge dictionary.
ssNone or sequence-like: Secondary structure type of each residue. If None, no secondary structure information is provided and all the native pairs are kept. If not None, then only native pairs within continuous ordered domains are kept.
ordered_ss_namessequence-like: The names of ordered secondary structures. The default value is [‘H’, ‘E’], where ‘H’ represents alpha-helix, and ‘E’ represents beta strand.

class MRGdsDNAParser(cg_pdb, default_parse=True)

MRG dsDNA parser. Note this parser only works on one dsDNA!

classmethod from_atomistic_pdb(atomistic_pdb, cg_pdb, write_TER=False, default_parse=True)

Initialize an MRG model dsDNA from atomistic pdb.

Parameters

atomistic_pdbstr: Path for the atomistic dsDNA pdb file.
cg_pdbstr: Output path for the CG dsDNA pdb file.
write_TERbool: Whether to write TER between two chains.
default_parsebool: Whether to parse with default settings.

Returns

resultclass instance: A class instance.

parse_exclusions(exclude12=True, exclude13=True)

Parse nonbonded exclusions based on bonds and angles. Note nonbonded interactions are not excluded for atom pairs with fan bonds.

Parameters

exclude12bool: Whether to exclude nonbonded interactions between 1-2 atom pairs.
exclude13bool: Whether to exclude nonbonded interactions between 1-3 atom pairs.

parse_mol(exclude12=True, exclude13=True, bonded_energy_scale=0.9, mass=325, charge=-1)

Parse molecule.

Parameters

exclude12bool: Whether to exclude nonbonded interactions between 1-2 atom pairs.
exclude13bool: Whether to exclude nonbonded interactions between 1-3 atom pairs.
bonded_energy_scalefloat or int: Scale factor for all the bonded energies.
massfloat or int: Mass of CG nucleotide bead.
chargefloat or int: Charge of each CG nucleotide bead.

class HPSParser(ca_pdb, default_parse=True)

HPS protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize an HPS model protein from atomistic pdb.

Parameters

atomistic_pdbstr: Path for the atomistic pdb file.
ca_pdbstr: Output path for the CA pdb file.
write_TERbool: Whether to write TER between two chains.
default_parsebool: Whether to parse with default settings.

Returns

resultclass instance: A class instance.

parse_mol(exclude12=True, mass_dict={'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'CYS': 103.1, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'VAL': 99.07}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.5, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0})

Parse molecule.

Parameters

exclude12bool: Whether to exclude nonbonded interactions between 1-2 atoms.
mass_dictdict: Mass dictionary.
charge_dictdict: Charge dictionary.

class MpipiProteinParser(ca_pdb, default_parse=True)

Mpipi protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize an Mpipi model protein from atomistic pdb.

Parameters

atomistic_pdbstr: Atomistic pdb file path.
ca_pdbstr: Output CA pdb file path.
write_TERbool: Whether to write TER between two chains.
default_parsebool: Whether to parse with default settings.

parse_mol(exclude12=True, mass_dict={'A': 329.2, 'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'C': 305.2, 'CYS': 103.1, 'G': 345.2, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'U': 306.2, 'VAL': 99.07}, charge_dict={'A': -0.75, 'ALA': 0.0, 'ARG': 0.75, 'ASN': 0.0, 'ASP': -0.75, 'C': -0.75, 'CYS': 0.0, 'G': -0.75, 'GLN': 0.0, 'GLU': -0.75, 'GLY': 0.0, 'HIS': 0.375, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 0.75, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'U': -0.75, 'VAL': 0.0})

Parse molecule.

Parameters

exclude12bool: Whether to exclude nonbonded interactions between 1-2 atoms.
mass_dictdict: Mass dictionary.
charge_dictdict: Charge dictionary.

class MpipiRNAParser(cg_pdb, default_parse=True)

Mpipi RNA parser.

classmethod from_sequence(seq, cg_pdb, write_TER=False, default_parse=True)

Initialize an Mpipi model ssRNA from sequence. The initial structure is a straight chain.

Parameters

seqstr or sequence-like: Input RNA sequence.
cg_pdbstr: Output CG RNA pdb file path.
write_TERbool: Whether to write TER between two chains.
default_parsebool: Whether to parse with default settings.

Returns

resultclass instance: A class instance.

parse_mol(exclude12=True, mass_dict={'A': 329.2, 'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'C': 305.2, 'CYS': 103.1, 'G': 345.2, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'U': 306.2, 'VAL': 99.07}, charge_dict={'A': -0.75, 'ALA': 0.0, 'ARG': 0.75, 'ASN': 0.0, 'ASP': -0.75, 'C': -0.75, 'CYS': 0.0, 'G': -0.75, 'GLN': 0.0, 'GLU': -0.75, 'GLY': 0.0, 'HIS': 0.375, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 0.75, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'U': -0.75, 'VAL': 0.0})

Parse molecule.

Parameters

exclude12bool: Whether to exclude nonbonded interactions between 1-2 atom pairs.
mass_dictdict: Mass dictionary.
chargefloat or int: Nucleotide charge.

class SMOGParser(atomistic_pdb, ca_pdb, default_parse=True)

SMOG protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize an HPS model protein from atomistic pdb.

Parameters

atomistic_pdbstr: Path for the input atomistic pdb file.
ca_pdbstr: Path for the output CA pdb file.
write_TERbool: Whether to write TER between two chains.
default_parsebool: Whether to parse with default settings.

parse_exclusions(exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True)

Parse nonbonded exclusions based on bonds, angles, dihedrals, and native pairs.

Parameters

exclude12bool: Whether to exclude nonbonded interactions between 1-2 atom pairs.
exclude13bool: Whether to exclude nonbonded interactions between 1-3 atom pairs.
exclude14bool: Whether to exclude nonbonded interactions between 1-4 atom pairs.
exclude_native_pairsbool: Whether to exclude nonbonded interactions between native pairs.

parse_mol(get_native_pairs=True, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False, exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True, mass_dict={'ALA': 71.0788, 'ARG': 156.1875, 'ASN': 114.1038, 'ASP': 115.0886, 'CYS': 103.1388, 'GLN': 128.1307, 'GLU': 129.1155, 'GLY': 57.0519, 'HIS': 137.1411, 'ILE': 113.1594, 'LEU': 113.1594, 'LYS': 128.1741, 'MET': 131.1926, 'PHE': 147.1766, 'PRO': 97.1167, 'SER': 87.0782, 'THR': 101.1051, 'TRP': 186.2132, 'TYR': 163.176, 'VAL': 99.1326}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.0, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0}, bonded_energy_scale=2.5)

Parse molecule.

Parameters

get_native_pairsbool: Whether to get native pairs from atomistic pdb with shadow algorithm.
frameint: Frame index to compute native configuration parameters for both CG and atomistic models.
radiusfloat or int: Shadow algorithm radius.
bonded_radiusfloat or int: Shadow algorithm bonded radius.
cutofffloat or int: Shadow algorithm cutoff.

box : None or np.ndarray Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).

use_pbcbool: Whether to use periodic boundary condition (PBC) for searching native pairs.
exclude12bool: Whether to exclude nonbonded interactions between 1-2 atom pairs.
exclude13bool: Whether to exclude nonbonded interactions between 1-3 atom pairs.
exclude14bool: Whether to exclude nonbonded interactions between 1-4 atom pairs.
exclude_native_pairsbool: Whether to exclude nonbonded interactions between native pairs.
mass_dictdict: Mass dictionary.
charge_dictdict: Charge dictionary.
bonded_energy_scalefloat or int: Bonded energy additional scale factor.

class DNA3SPN2Parser(dna_type='B_curved')

DNA 3SPN2 parser. For B-curved DNA, the parser works best for a single strand ssDNA or WC-paired dsDNA. Please ensure the parsed DNA has unique chainID for each chain.

static aa_to_cg(aa_atoms, PSB_order=True)

Convert DNA all-atom structure to CG structure. Both input and output structures are saved as pandas dataframe. Notably, this function gives each residue in the output CG model a unique resSeq (index starts from 0). Using unique resSeq can be helpful when comparing with x3dna built template as chainID may be different. This code is long and specific to 3SPN2 model, so we put it here instead of in helper_functions.py.

Parameters

aa_atomspd.DataFrame: Input all-atom structure.
PSB_orderbool: Whether to ensure CG atom order in each nucleotide is P-S-B.

Returns

cg_atomspd.DataFrame: Output CG structure.

build_x3dna_template(temp_name='dna')

Build template DNA structure with x3dna. We do not need to specify whether to use PSB order, since template atom order is aligned with self.atoms. If self.atoms is one dsDNA molecule and the target sequence is W-C paired, x3dna input sequence is the sequence of the first ssDNA. Else, use the full sequence of the DNA as x3dna input sequence.

Parameters

temp_namestr: Name for built template files.

Returns

temp_atomspd.DataFrame: X3DNA built CG DNA template structure.

static change_sequence(cg_atoms, sequence)

Change DNA sequence. Note sequence includes the sequence of all the ssDNA chains.

Parameters

cg_atomspd.DataFrame: DNA CG structure.
sequencestr: New DNA sequence.

Returns

new_cg_atomspd.DataFrame: DNA CG structure with the new sequence.

classmethod from_atomistic_pdb(atomistic_pdb, cg_pdb, PSB_order=True, new_sequence=None, dna_type='B_curved', default_parse=True, temp_from_x3dna=True, temp_name='dna', input_temp=None)

Create object from atomistic pdb file. Ensure each chain in input pdb_file has unique chainID. If a new sequence is provided, then the CG DNA topology is from input pdb file, but using the new sequence.

Parameters

atomistic_pdbstr: Path for the input atomistic pdb file.
cg_pdbstr: Path for the output CG pdb file.
PSB_orderbool: Whether to ensure CG atom order in each nucleotide is P-S-B.
new_sequenceNone or str: The new full DNA sequence. If None, keep the original sequence.
dna_typestr: DNA type.
default_parsebool: Whether to parse molecule with default settings.
temp_from_x3dnabool: Whether to use template built by x3dna. If False, use the input pdb file as the template.
temp_namestr: Template file names.
input_tempNone or pd.DataFrame: Additional input template.

get_sequence()

Get all ssDNA sequence from CG atoms as a string. Note each chain has to possess a unique chainID.

Returns

sequencestr: DNA sequence.

get_sequence_list()

Get all ssDNA sequence from self.atoms as a list. Note each chain has to possess a unique chainID.

Returns

sequence_listlist: A list including multiple strings. Each string is the sequence of one chain.

static get_sequence_list_from_cg_atoms(cg_atoms)

Get all ssDNA sequence from CG atoms as a list. Since we use groupby to group by chainID, each chain has to possess a unique chainID.

Parameters

cg_atomspd.DataFrame: CG DNA atoms.

Returns

sequence_listlist: A list including multiple strings. Each string is the sequence of one chain.

parse_mol(temp_from_x3dna=True, temp_name='dna', input_temp=None)

Parse molecule with given template.

Parameters

temp_from_x3dnabool: Whether to use x3dna template.
temp_namestr: Name for x3dna built template files.
input_tempNone or pd.DataFrame: Additional input template. Use input_temp only if (not ((self.dna_type == ‘B_curved’) and temp_from_x3dna)) and (input_temp is not None).

openabc.utils

align_protein(input_pdbfile, reference_pdbfile, output_pdbfile, reference_atoms='CA')

Align a protein with the reference protein accoding to positions of alpha carbons.

Parameters

input_pdbfilestr: Input pdb file to be aligned.
reference_pdbfilestr: Pdb file that be used as reference. Note ‘input_pdbfile’ and ‘reference_pdbfile’ must represent the same system.
output_pdbfilestr: Aligned pdb file.
reference_atomsstr, default=’CA’: Name of the atoms that used as reference atoms. Currently only support alpha carbons.

References

Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.

combine_proteins(pdbfile_lis, output_pdbflie)

Merge multiple pdb files into a single pdb file.

Parameters

pdbfile_lisstr: List that contains name of pdb files to be merged.
output_pdbfilestr: Merged pdb file.

multiple_chains_CA2AA(input_pdbfile, num_chains, num_residues, REMO_path, debug=False)

Reconstruct all-atom representation of a multiple-chain protein from its alpha carbons. If the name of input file is ‘XXX.pdb’, then output will be saved as ‘XXX_AA.pdb’ in the folder where ‘XXX.pdb’ is located.

Parameters

input_pdbfilestr: Input pdb file.
num_chainslist: Number of protein chains in the system. Each element of the list represent one type of protein.
num_residueslist: Length of protein chains in the system. Each element of the list represent one type of protein. Note num_chains and num_residues share the same length unit.
REMO_pathstr: Path where REMO program locates.
debugbool, default=False: If True, temporary files will not be deleted.

References

Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.

single_chain_CA2AA(input_pdbfile, output_pdbfile, REMO_path, debug=False)

Reconstruct all-atom representation for a single-chain protein from its alpha carbons using REMO.

Parameters

input_pdbfilestr: Input pdb file that contains one protein chain.
output_pdbfilestr: output pdb file.
REMO_pathstr: Path where REMO program locates.
debugbool, default=False: If True, temporary files will not be deleted.

References

Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.

split_protein(input_pdbfile, num_chains, num_residues, output_path)

Split a PDB file containing multiple chains into individual files, each containing one chain.

Parameters

input_pdbfilestr: Input pdb file.
num_chainslist: Number of protein chains in the system. Each element of the list represent one type of protein.
num_residueslist: Length of protein chains in the system. Each element of the list represent one type of protein. Note num_chains and num_residues share the same length unit.
output_pathstr: The folder where user wants to save output pdb files.

atomistic_pdb_to_ca_pdb(atomistic_pdb, ca_pdb, write_TER=False)

Convert atomistic pdb to protein CA pdb.

Parameters

atomistic_pdbstr: Path for the atomistic pdb file.
ca_pdbstr: Output path for the CA pdb file.
write_TERbool: Whether to write TER between two chains.

atomistic_pdb_to_nucleotide_pdb(atomistic_pdb, cg_nucleotide_pdb, write_TER=False)

Convert DNA or RNA atomistic pdb to nucleotide pdb (i.e. one CG bead per nucleotide). The position of each CG nucleotide bead is the geometric center of all the nucleotide atoms in the pdb.

Parameters

atomistic_pdbstr: Path for the atomistic pdb file.
cg_nucleotide_pdbstr: Output path for the CG pdb file.
write_TERbool: Whether to write TER between two chains.

build_straight_CA_chain(sequence, r0=0.38)

Build a straight portein CA atom chain with given sequence.

Parameters

sequencestr: Protein chain sequence (1 letter for each amino acid).
r0: float or int: Distance in unit nm between two neighboring CA atoms.

Returns

df_atomspd.DataFrame: A pandas dataframe includes atom information.

build_straight_chain(n_atoms, chainID, r0)

Build a straight chain. Each atom is viewed as one residue.

Parameters

n_atomsint: The number of atoms along the chain.
chainIDfloar or int or str: Chain ID.
r0float or int: Distance in unit nm between neighboring atoms along the chain.

Returns

df_atomspd.DataFrame: A pandas dataframe includes atom information.

compute_PE(traj, system, platform_name='Reference', properties={'Precision': 'double'}, groups=-1)

Compute the potential energy of samples in the trajectory with the given openmm system.

Parameters

trajmdtraj.Trajectory: The trajectory including all the samples.
systemopenmm.System: The openmm system.
platform_namestr: The platform name for running openmm to evaulate energy.
propertiesdict: Properties for running openmm. Only used when platform_name is CUDA or OpenCL.
groupsint or set: The force groups included when computing energy. See openmm context.getState for details about this parameter.

Returns

energynp.ndarray, shape = (traj.n_frames,): The energy of each sample in the trajectory in unit kJ/mol.

compute_rg(coord, mass)

Compute radius of gyration.

Parameters

coordnp.ndarray, shape = (n_atoms, 3): Input atom coordinates.
massnp.ndarray, shape = (n_atoms,): Input atom mass.

Returns

rgfloat: Radius of gyration.

fix_pdb(pdb_file)

Fix pdb file with pdbfixer. The fixed structure is output as pandas dataframe. This function is adapted from open3SPN2.

Parameters

pdb_filestr: PDB file path.

Returns

atomspd.DataFrame: Output fixed structure.

get_WC_paired_sequence(sequence)

Get WC-paired sequence.

Parameters

sequencestr: Input sequence.

Returns

paired_sequencestr: W-C paired sequence.

make_mol_whole(coord, box_a, box_b, box_c)

Use this function to make the molecule whole. This is useful to recover the whole molecule if it is split by box boundary. The position of the first atom is kept. The other atoms are moved so that each one is in the closest periodic image relative to the previous one.

Parameters

coordnp.ndarray, shape = (n_atoms, 3): Input atom coordinates.
box_afloat or int: Box length along x-axis.
box_bfloat or int: Box length along y-axis.
box_cfloat or int: Box length along z-axis.

Returns

new_coordnp.ndarray, shape = (n_atoms, 3): Output atom coordinates.

move_atoms_to_closest_pbc_image(coord, ref_point, box_a, box_b, box_c)

Move the atom coordinates to the periodic image closest to ref_point.

Parameters

coordnp.ndarray, shape = (n_atoms, 3): Input atom coordinates.
ref_pointnp.ndarray, shape = (3,): The reference point coordinate.
box_afloat or int: Box length along x-axis.
box_bfloat or int: Box length along y-axis.
box_cfloat or int: Box length along z-axis.

Returns

new_coordnp.ndarray, shape = (n_atoms, 3): Output atom coordinates.

parse_pdb(pdb_file)

Load pdb file as pandas dataframe. Note there should be only a single frame in the pdb file.

Parameters

pdb_filestr: Path for the pdb file.

Returns

pdb_atomspd.DataFrame: A pandas dataframe includes atom information.

write_pdb(pdb_atoms, pdb_file, write_TER=False)

Write pandas dataframe to pdb file.

Parameters

pdb_atomspd.DataFrame: A pandas dataframe includes atom information.
pdb_filestr: Output path for the pdb file.
write_TERbool: Whether to write TER between two chains.

class TemperatureReplicaExchange(backend, n_replicas, rank, positions, top, system, temperatures, integrator, platform_name='CUDA', properties={'Precision': 'mixed'})

Temperature replica exchange class for performing temperature replica exchange (TRE) simulations.

Note only positions and scaled velocities are exchanged, while other internal states of the simulations are not exchanged. This means the class may not be properly applied to simulations involving other internal states. For example, Nose-Hoover integrator may not work properly as it includes internal chain states.

The parallelization is achieved with torch.distributed.

An example of setting environment variables in a slurm job script:

export WORLD_SIZE=$((${SLURM_NNODES}*${SLURM_NTASKS_PER_NODE}))

export MASTER_PORT=$(expr 30000 + $(echo -n ${SLURM_JOBID} | tail -c 4))

master_addr=$(scontrol show hostnames “${SLURM_JOB_NODELIST}” | head -n 1)

export MASTER_ADDR=${master_addr}

add_reporters(report_interval, report_state=True, report_dcd=True, output_dcd=None)

Add reporters for OpenMM simulation.

Whether to use PBC is read from self.system information.

Parameters

report_intervalint: Report interval.
report_statebool: Whether to report simulation state.
report_dcdbool: Whether to report trajectory as dcd file.
output_dcdstr or None: The output dcd file path. If None, then the output dcd file path is set as output.{self.rank}.dcd.

run_replica_exchange(n_steps, exchange_interval, verbose=True)

Perform replica exchange simulation.

Exchange atom positions and rescaled velocities. Other state variables are not exchanged.

The temperature of each replica remains unchanged.

Parameters

n_stepsint: Total number of simulation steps for each replica.
exchange_intervalint: Exchange interval.
verbosebool: Whether to report exchange acceptance ratio and simulation speed.

find_ca_pairs_from_atomistic_pdb(atomistic_pdb, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False)

Find protein CA atom pairs whose residues are in contact.

Parameters

atomistic_pdbstr: Path for the atomistic pdb file.
frameint: PDB frame index used for searching pairs.
radiusfloat or int: Shadow radius for candidate pair atoms and blocking atoms if not bonded to any pair atom.
bonded_radiusfloat or int: Shadow radius if the candidate blocking atom is directly bonded to any candidate pair atom.
cutoff: float or int: Cutoff distance for searching neighboring atom pairs.
boxNone or np.ndarray: Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
use_pbcbool: Whether to use periodic boundary condition (PBC).

Returns

df_ca_pairspd.DataFrame: Pandas DataFrame including CA atom pair indices and distances.

find_res_pairs_from_atomistic_pdb(atomistic_pdb, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False)

Find native pairs between residues following the shadow algorithm. They algorithm only searches native pairs between residues that do not have 1-2, 1-3, or 1-4 interactions. If two heavy atoms from different residues are in contact, then the two residues are in contact.

Parameters

atomistic_pdbstr: Path for the atomistic pdb file.
frameint: PDB frame index used for searching pairs.
radiusfloat or int: Shadow radius for candidate pair atoms and blocking atoms if not bonded to any pair atom.
bonded_radiusfloat or int: Shadow radius if the candidate blocking atom is directly bonded to any candidate pair atom.
cutoff: float or int: Cutoff distance for searching neighboring atom pairs.
boxNone or np.ndarray: Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
use_pbcbool: Whether to use periodic boundary condition (PBC).

Returns

res_pairsnp.ndarray, shape = (n_pairs, 2): Residue pairs found by the shadow algorithm.
df_atomspd.DataFrame: Pandas DataFrame including atom information from file atomistic_pdb.

get_bonded_neighbor_dict(atomistic_pdb)

Find atoms that are directly connected by bonds based on MDTraj.

Parameters

atomistic_pdbstr: Path for the atomistic pdb file.

Returns

bonded_neighbor_dictdict: A dictionary that the value is a list including all the atoms bonded to the atom of index key.

References

https://mdtraj.org/1.9.4/api/generated/mdtraj.Topology.html

get_neighbor_pairs_and_distances(coord, cutoff=0.6, box=None, use_pbc=False)

Find spatially neighboring atom pairs within cutoff range based on MDAnalysis.

Parameters

coordnp.ndarray, shape = (n_atoms, 3): Input atom coordinates.
cutofffloat or int (default = 0.6): Cutoff distance for searching neighboring atom pairs. Note coord and cutoff share the same length unit.
boxNone or np.ndarray: Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
use_pbcbool: Whether to use periodic boundary condition (PBC).

Returns

neighbor_pairsnp.ndarray, shape = (n_pairs, 2): Neighboring atom pairs.
neighbor_pair_distances: np.ndarray, shape = (N,): Neighboring atom pair distances.

References

https://docs.mdanalysis.org/1.1.1/documentation_pages/lib/nsgrid.html

light_is_blocked(d12, d13, d23, r2, r3)

Check if the light from atom1 to atom2 is blocked by atom3.

Parameters

d12: float or int, positive: The distance between atom1 and atom2.
d13: float or int, positive: The distance between atom1 and atom3.
d23: float or int, positive: The distance between atom2 and atom3.
r2: float or int, non-negative: The radius of atom2. Require r2 <= d12.
r3: float or int, non-negative: The radius of atom3. Require r3 <= d13.

Returns

flagbool: Whether the light is blocked.

load_ca_pairs_from_gmx_top(top_file, ca_pdb, frame=0, use_pbc=False)

Load native pairs from GROMACS topology file. Note in GROMACS, atom indices start from 1, while in OpenMM, atom indices start from 0.

Parameters

top_filestr: Path for the GROMACS topology file.
ca_pdbstr: Path for the CA model PDB file.
frame: int: PDB frame index used for computing pair distances.
use_pbcbool: Whether to use periodic boundary condition (PBC).

Returns

df_ca_pairspd.DataFrame: Pandas DataFrame including CA atom pair indices and distances.

insert_molecules(new_pdb, output_pdb, n_mol, radius=0.5, existing_pdb=None, max_n_attempts=10000, box=[100, 100, 100], method='FastNS', reset_serial=True)

Insert multiple copies of given molecule into an existing pdb or a new empty box. Currently this function only supports using orthogonal box.

Parameters

new_pdbstr: Path of the new pdb file to be inserted.
output_pdbstr: Path of the output pdb file.
n_molint: Number of copies of the new molecule to be inserted.
radiusfloat or int: Radius for each atom in unit nm. Insertion ensures the distance between any two atoms are larger than 2*radius under periodic boundary condition.
existing_pdbNone or str: Path of an existing pdb file. If None, the molecules will be inserted into a new empty box, otherwise, inserted into a copy of existing_pdb.
max_n_attemptsint: Maximal number of attempts to insert. Ensure this value no smaller than n_mol.
boxlist or numpy array, shape = (3,): New box size in unit nm.
methodstr: Which method to use for detecting if the inserted molecule has contact with existing atoms. Currently only ‘FastNS’ is supported. Any string will lead to method as ‘FastNS’.
reset_serialbool: Whether to reset serial to 0, 1, …, N - 1. If True, the serial in the final pdb is reset as 0, 1, …, N - 1. If False, the serial remains unchanged. If True and atom number > 1,000,000, the serial remains unchanged since the largest atom serial number allowed in pdb is 999,999.

insert_molecules_dataframe(new_atoms, n_mol, radius=0.5, existing_atoms=None, max_n_attempts=10000, box=[100, 100, 100], method='FastNS', reset_serial=True)

Insert multiple copies of given molecule as dataframes into an existing dataframe or a new empty box. Currently this function only supports using orthogonal box.

Parameters

new_atomspd.DataFrame: New atoms of the molecule to be inserted.
n_molint: Number of copies of the new molecule to be inserted.
radiusfloat or int: Radius for each atom in unit nm. Insertion ensures the distance between any two atoms are larger than 2*radius under periodic boundary condition.
existing_atomsNone or pd.DataFrame: Existing atoms. If None, the molecules will be inserted into a new empty box, otherwise, inserted into a copy of existing_atoms.
max_n_attemptsint: Maximal number of attempts to insert. Ensure this value no smaller than n_mol.
boxlist or numpy array, shape = (3,): New box size in unit nm.
methodstr: Which method to use for detecting if the inserted molecule has contact with existing atoms. Currently only ‘FastNS’ is supported. Any string will lead to method as ‘FastNS’.
reset_serialbool: Whether to reset serial to 0, 1, …, N - 1. If True, the serial in the output dataframe is reset as 0, 1, …, N - 1. If False, the serial remains unchanged. If True and atom number > 1,000,000, the serial remains unchanged since the largest atom serial number allowed in pdb is 999,999.

Returns

atomspd.DataFrame: Output atoms.

parse_stride(stride_file)

Parse stride output file.

Parameters

stride_filestr: The stride output file.

Returns

df_datapd.DataFrame: A dataframe with the secondary structure information.

get_chromatin_rigid_bodies(n_nucl, nrl, n_rigid_bp_per_nucl=73)

Get chromatin rigid bodies. The chromatin should possess uniform linker length without additional linkers on both ends.

Parameters

n_nuclint: Nucleosome number.
nrlint: Nucleosome repeat length.
n_flexible_bp_per_nuclint: The number of flexible nucleosomal base pairs for each nucleosome.

Returns

rigid_bodieslist: List of rigid bodies.

remove_histone_tail_dihedrals(df_dihedrals)

Remove histone tail dihedral potentials from a single histone.

A dihedral potential is removed if at least one atom involved is within histone tail.

Parameters

df_dihedralspd.DataFrame: Dihedral potential. This should only include histones. Each histone includes 974 CA atoms, and there should be 974*n CG atoms in all (n is the number of histones).

Returns

new_df_dihedralspd.DataFrame: New dihedral potential.

remove_histone_tail_native_pairs_and_exclusions(df_native_pairs, df_exclusions)

Remove histone tail native pair potentials and corresponding exclusions from a single histone. A native pair potential is removed if at least one atom involved is within histone tail. The corresponding exclusions should also be removed if the native pair is removed. Also remove native pairs between atoms from different histones.

Parameters

df_native_pairspd.DataFrame: Native pair potential. This should only include histones. Each histone includes 974 CA atoms, and there should be 974*n CG atoms in all (n is the number of histones).
df_exclusionspd.DataFrame: Nonbonded exclusions.

Returns

new_df_native_pairspd.DataFrame: New native pair potential.
new_df_exclusionspd.DataFrame: New nonbonded exclusions.