Manual
openabc.forcefields
- class CGModel
The general class with general methods that can be inherited by any other CG model classes.
Each child class of CGModel has to set self.bonded_attr_names, which is used to append molecules and set rigid bodies.
- add_plumed(plumed_script_path)
Add OpenMM plumed.
Parameters
- plumed_script_pathstr
Input plumed script path.
Reference
PLUMED website: https://www.plumed.org/
- add_reporters(report_interval, output_dcd='output.dcd', report_dcd=True, report_state_data=True)
Add reporters for OpenMM simulation.
Parameters
- report_intervalint
Report dcd and report state interval.
- output_dcdstr
Output dcd file path.
- report_dcdbool
Whether to output dcd file.
- report_state_databool
Whether to output simulation state and data.
- append_mol(new_mol, verbose=False)
The method can append new molecules by concatenating atoms and bonded interaction information saved in dataframes.
Parameters
- new_mola consistent parser object or CG model object
The object of a new molecule including atom and bonded interaction information.
- verbosebool
Whether to report the appended attributes.
- atoms_to_pdb(cg_pdb, reset_serial=True)
Save atoms to pdb as topology will be read from the pdb file. PDB file is required for OpenMM simulation.
Parameters
- cg_pdbstr
Output path for CG PDB file.
- reset_serialbool
Reset serial to 1, 2, …
- create_system(top, use_pbc=True, box_a=500, box_b=500, box_c=500, remove_cmmotion=True)
Create OpenMM system for simulation. Need to further add forces to this OpenMM system.
Parameters
- topOpenMM Topology
The OpenMM topology.
- use_pbcbool
Whether to use periodic boundary condition (PBC).
- box_afloat or int
Box length along x-axis.
- box_bfloat or int
Box length along y-axis.
- box_cfloat or int
Box length along z-axis.
- remove_cmmotionbool
Whether to add CMMotionRemover to remove center of mass motions.
- move_COM_to_box_center()
Move center of mass (COM) to simulation box center.
- save_state(state_xml='state.xml')
Save simulation state to readable xml format.
Parameters
- state_xmlstr
Output state xml file path.
- save_system(system_xml='system.xml')
Save system to readable xml format.
Parameters
- system_xmlstr
Output system xml file path.
- set_rigid_bodies(rigid_coord, rigid_bodies, keep_unchanged=[])
Set rigid bodies and remove bonded interactions within the same rigid body.
Directly run createRigidBodies if the user does not want to change any bonded interactions or exclusions.
Parameters
- rigid_coordQuantity, shape = (n_atoms, 3)
Atom coordinates to build rigid body restrictions.
- rigid_bodieslist
A list includes many sublists. Atoms in each sublist are recognized as one rigid body.
- keep_unchangedlist
A list including attribute names that user intends to keep unchanged.
- set_simulation(integrator, platform_name='CPU', properties={'Precision': 'mixed'}, init_coord=None)
Set OpenMM simulation.
Parameters
- integratorOpenMM Integrator
OpenMM integrator.
- platform_namestr
OpenMM simulation platform name. It can be one from: Reference or CPU or CUDA or OpenCL.
- propertiesdict
OpenMM simulation platform properties.
- init_coordNone or array-like
Initial coordinate.
- class MOFFMRGModel
Bases:
CGModel
A class for MOFF+MRG model that represents a mixture of MOFF proteins and MRG DNA.
- add_all_default_forces()
Add all the forces with default settings.
- add_contacts(eta=Quantity(value=0.7, unit=/angstrom), r0=Quantity(value=8, unit=angstrom), cutoff=Quantity(value=2.0, unit=nanometer), alpha_protein_dna=0.0016264, alpha_dna_dna=1.678e-05, epsilon_protein_dna=0, epsilon_dna_dna=0, force_group=8)
Add nonbonded contacts for MOFF protein and MRG DNA.
For amino acids, the CA atom type indices are 0-19, and CG nucleotide atom type index is 20.
The potential expression is: alpha/r^12 - 0.5*epsilon*(1 + tanh(eta*(r0 - r)))
Parameters
- etaQuantity
Parameter eta.
- r0Quantity
Parameter r0.
- cutoffQuantity
Cutoff distance
- alpha_protein_dnafloat or int
Protein-DNA interaction parameter alpha.
- alpha_dna_dnafloat or int
DNA-DNA interaction parameter alpha.
- epsilon_protein_dnafloat or int
Protein-DNA interaction parameter epsilon.
- epsilon_dna_dnafloat or int
DNA-DNA interaction parameter epsilon.
- force_groupint
Force group.
- add_elec_switch(salt_conc=Quantity(value=150.0, unit=millimolar), temperature=Quantity(value=300.0, unit=kelvin), cutoff1=Quantity(value=1.2, unit=nanometer), cutoff2=Quantity(value=1.5, unit=nanometer), switch_coeff=[1, 0, 0, -10, 15, -6], add_native_pair_elec=True, force_group=9)
Add electrostatic interaction with switch function.
The switch function switches potential to zero within range cutoff1 < r <= cutoff2.
Parameters
- salt_concQuantity
Monovalent salt concentration.
- temperatureQuantity
Temperature.
- cutoff1Quantity
Cutoff distance 1.
- cutoff2Quantity
Cutoff distance 2.
- switch_coeffsequence-like
Switch function coefficients.
- add_native_pair_elecbool
Whether to add electrostatic interactions between native pairs.
- force_groupint
Force group.
- add_protein_angles(threshold=2.2689280275926285, clip=False, force_group=2, verbose=True)
Add protein angles.
Note that the angle potential is a harmonic angle potential, which may lead to unstable simulation if harmonic potential center theta0 is too large.
Based on some tests, theta0 <= 130 degrees (130*np.pi/180 radians) can facilitate 10 fs timestep.
Parameters
- thresholdfloat or int
The suggested upper limit value of theta0.
- clipbool
Whether to change theta0 values larger than threshold to threshold. If True, all the theta0 values larger than threshold will be changed to threshold. If False, do not change theta0 values.
- force_groupint
Force group.
- verbose: bool
Whether to show warnings if theta0 is larger than the threshold.
- class HPSModel
Bases:
CGModel
The class for HPS model that represents a mixture of HPS model proteins.
This class inherits CGModel class.
- add_all_default_forces()
Add all the forces with default settings.
- add_contacts(hydropathy_scale='Urry', epsilon=0.8368000000000001, mu=1, delta=0.08, force_group=2)
Add nonbonded contacts.
The raw hydropathy scale is scaled and shifted by: mu*lambda - delta
Parameters
- hydropathy_scalestr
Hydropathy scale, can be KR or Urry.
- epsilonfloat or int
Contact strength.
- mufloat or int
Hydropathy scale factor.
- deltafloat or int
Hydropathy shift factor.
- force_groupint
Force group.
- add_dh_elec(ldby=Quantity(value=1, unit=nanometer), dielectric_water=80.0, cutoff=Quantity(value=3.5, unit=nanometer), force_group=3)
Add Debye-Huckel electrostatic interactions.
Parameters
- ldbyQuantity
Debye length.
- dielectric_waterfloat or int
Dielectric constant of water.
- cutoffQuantity
Cutoff distance.
- force_groupint
Force group.
- class MpipiModel
Bases:
CGModel
The class for Mpipi protein and RNA models. This class inherits CGModel class. We follow the parameters provided in the Mpipi LAMMPS input file.
- add_contacts(cutoff_to_sigma_ratio=3, force_group=3)
Add nonbonded contacts, which is of Wang-Frenkel functional form.
Parameters
- add_dh_elec(ldby=Quantity(value=0.7936507936507936, unit=nanometer), dielectric_water=80.0, cutoff=Quantity(value=3.5, unit=nanometer), force_group=4)
Add Debye-Huckel electrostatic interactions.
Parameters
- ldbyQuantity
Debye length.
- dielectric_waterfloat or int
Dielectric constant of water.
- cutoffQuantity
Cutoff distance.
- force_groupint
Force group.
openabc.forcefields.parsers
- class MOFFParser(atomistic_pdb, ca_pdb, default_parse=True)
MOFF protein parser.
- classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)
Initialize a MOFF model protein from atomistic pdb.
Parameters
- atomistic_pdbstr
Atomistic pdb file path.
- ca_pdbstr
CA pdb file path.
- write_TERbool
Whether to write TER between two chains.
- default_parsebool
Whether to parse with default settings.
Returns
- resultclass instance
A class instance.
- parse_exclusions(exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True)
Parse nonbonded exclusions based on bonds, angles, dihedrals, and native pairs.
Parameters
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atom pairs.
- exclude13bool
Whether to exclude nonbonded interactions between 1-3 atom pairs.
- exclude14bool
Whether to exclude nonbonded interactions between 1-4 atom pairs.
- exclude_native_pairsbool
Whether to exclude nonbonded interactions between native pairs.
- parse_mol(get_native_pairs=True, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False, exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True, mass_dict={'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'CYS': 103.1, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'VAL': 99.07}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.25, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0}, ss=None, ordered_ss_names=['H', 'E'])
Parse molecule.
Parameters
- get_native_pairsbool
Whether to get native pairs from atomistic pdb with shadow algorithm.
- frameint
Frame index to compute native structure parameters for both CG and atomistic models.
- radiusfloat or int
Shadow algorithm radius.
- bonded_radiusfloat or int
Shadow algorithm bonded radius.
- cutofffloat or int
Shadow algorithm cutoff.
box : None or np.ndarray Specify box shape. Note setting use_pbc as False is only compatible with orthogonal box. If use_pbc is False, then set box as None. If use_pbc is True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
- use_pbcbool
Whether to use periodic boundary condition (PBC) for searching native pairs.
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atom pairs.
- exclude13bool
Whether to exclude nonbonded interactions between 1-3 atom pairs.
- exclude14bool
Whether to exclude nonbonded interactions between 1-4 atom pairs.
- exclude_native_pairsbool
Whether to exclude nonbonded interactions between native pairs.
- mass_dictdict
Mass dictionary.
- charge_dictdict
Charge dictionary.
- ssNone or sequence-like
Secondary structure type of each residue. If None, no secondary structure information is provided and all the native pairs are kept. If not None, then only native pairs within continuous ordered domains are kept.
- ordered_ss_namessequence-like
The names of ordered secondary structures. The default value is [‘H’, ‘E’], where ‘H’ represents alpha-helix, and ‘E’ represents beta strand.
- class MRGdsDNAParser(cg_pdb, default_parse=True)
MRG dsDNA parser. Note this parser only works on one dsDNA!
- classmethod from_atomistic_pdb(atomistic_pdb, cg_pdb, write_TER=False, default_parse=True)
Initialize an MRG model dsDNA from atomistic pdb.
Parameters
- atomistic_pdbstr
Path for the atomistic dsDNA pdb file.
- cg_pdbstr
Output path for the CG dsDNA pdb file.
- write_TERbool
Whether to write TER between two chains.
- default_parsebool
Whether to parse with default settings.
Returns
- resultclass instance
A class instance.
- parse_exclusions(exclude12=True, exclude13=True)
Parse nonbonded exclusions based on bonds and angles. Note nonbonded interactions are not excluded for atom pairs with fan bonds.
Parameters
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atom pairs.
- exclude13bool
Whether to exclude nonbonded interactions between 1-3 atom pairs.
- parse_mol(exclude12=True, exclude13=True, bonded_energy_scale=0.9, mass=325, charge=-1)
Parse molecule.
Parameters
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atom pairs.
- exclude13bool
Whether to exclude nonbonded interactions between 1-3 atom pairs.
- bonded_energy_scalefloat or int
Scale factor for all the bonded energies.
- massfloat or int
Mass of CG nucleotide bead.
- chargefloat or int
Charge of each CG nucleotide bead.
- class HPSParser(ca_pdb, default_parse=True)
HPS protein parser.
- classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)
Initialize an HPS model protein from atomistic pdb.
Parameters
- atomistic_pdbstr
Path for the atomistic pdb file.
- ca_pdbstr
Output path for the CA pdb file.
- write_TERbool
Whether to write TER between two chains.
- default_parsebool
Whether to parse with default settings.
Returns
- resultclass instance
A class instance.
- parse_mol(exclude12=True, mass_dict={'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'CYS': 103.1, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'VAL': 99.07}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.5, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0})
Parse molecule.
Parameters
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atoms.
- mass_dictdict
Mass dictionary.
- charge_dictdict
Charge dictionary.
- class MpipiProteinParser(ca_pdb, default_parse=True)
Mpipi protein parser.
- classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)
Initialize an Mpipi model protein from atomistic pdb.
Parameters
- atomistic_pdbstr
Atomistic pdb file path.
- ca_pdbstr
Output CA pdb file path.
- write_TERbool
Whether to write TER between two chains.
- default_parsebool
Whether to parse with default settings.
- parse_mol(exclude12=True, mass_dict={'A': 329.2, 'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'C': 305.2, 'CYS': 103.1, 'G': 345.2, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'U': 306.2, 'VAL': 99.07}, charge_dict={'A': -0.75, 'ALA': 0.0, 'ARG': 0.75, 'ASN': 0.0, 'ASP': -0.75, 'C': -0.75, 'CYS': 0.0, 'G': -0.75, 'GLN': 0.0, 'GLU': -0.75, 'GLY': 0.0, 'HIS': 0.375, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 0.75, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'U': -0.75, 'VAL': 0.0})
Parse molecule.
Parameters
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atoms.
- mass_dictdict
Mass dictionary.
- charge_dictdict
Charge dictionary.
- class MpipiRNAParser(cg_pdb, default_parse=True)
Mpipi RNA parser.
- classmethod from_sequence(seq, cg_pdb, write_TER=False, default_parse=True)
Initialize an Mpipi model ssRNA from sequence. The initial structure is a straight chain.
Parameters
- seqstr or sequence-like
Input RNA sequence.
- cg_pdbstr
Output CG RNA pdb file path.
- write_TERbool
Whether to write TER between two chains.
- default_parsebool
Whether to parse with default settings.
Returns
- resultclass instance
A class instance.
- parse_mol(exclude12=True, mass_dict={'A': 329.2, 'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'C': 305.2, 'CYS': 103.1, 'G': 345.2, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'U': 306.2, 'VAL': 99.07}, charge_dict={'A': -0.75, 'ALA': 0.0, 'ARG': 0.75, 'ASN': 0.0, 'ASP': -0.75, 'C': -0.75, 'CYS': 0.0, 'G': -0.75, 'GLN': 0.0, 'GLU': -0.75, 'GLY': 0.0, 'HIS': 0.375, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 0.75, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'U': -0.75, 'VAL': 0.0})
Parse molecule.
Parameters
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atom pairs.
- mass_dictdict
Mass dictionary.
- chargefloat or int
Nucleotide charge.
- class SMOGParser(atomistic_pdb, ca_pdb, default_parse=True)
SMOG protein parser.
- classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)
Initialize an HPS model protein from atomistic pdb.
Parameters
- atomistic_pdbstr
Path for the input atomistic pdb file.
- ca_pdbstr
Path for the output CA pdb file.
- write_TERbool
Whether to write TER between two chains.
- default_parsebool
Whether to parse with default settings.
- parse_exclusions(exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True)
Parse nonbonded exclusions based on bonds, angles, dihedrals, and native pairs.
Parameters
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atom pairs.
- exclude13bool
Whether to exclude nonbonded interactions between 1-3 atom pairs.
- exclude14bool
Whether to exclude nonbonded interactions between 1-4 atom pairs.
- exclude_native_pairsbool
Whether to exclude nonbonded interactions between native pairs.
- parse_mol(get_native_pairs=True, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False, exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True, mass_dict={'ALA': 71.0788, 'ARG': 156.1875, 'ASN': 114.1038, 'ASP': 115.0886, 'CYS': 103.1388, 'GLN': 128.1307, 'GLU': 129.1155, 'GLY': 57.0519, 'HIS': 137.1411, 'ILE': 113.1594, 'LEU': 113.1594, 'LYS': 128.1741, 'MET': 131.1926, 'PHE': 147.1766, 'PRO': 97.1167, 'SER': 87.0782, 'THR': 101.1051, 'TRP': 186.2132, 'TYR': 163.176, 'VAL': 99.1326}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.0, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0}, bonded_energy_scale=2.5)
Parse molecule.
Parameters
- get_native_pairsbool
Whether to get native pairs from atomistic pdb with shadow algorithm.
- frameint
Frame index to compute native configuration parameters for both CG and atomistic models.
- radiusfloat or int
Shadow algorithm radius.
- bonded_radiusfloat or int
Shadow algorithm bonded radius.
- cutofffloat or int
Shadow algorithm cutoff.
box : None or np.ndarray Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
- use_pbcbool
Whether to use periodic boundary condition (PBC) for searching native pairs.
- exclude12bool
Whether to exclude nonbonded interactions between 1-2 atom pairs.
- exclude13bool
Whether to exclude nonbonded interactions between 1-3 atom pairs.
- exclude14bool
Whether to exclude nonbonded interactions between 1-4 atom pairs.
- exclude_native_pairsbool
Whether to exclude nonbonded interactions between native pairs.
- mass_dictdict
Mass dictionary.
- charge_dictdict
Charge dictionary.
- bonded_energy_scalefloat or int
Bonded energy additional scale factor.
- class DNA3SPN2Parser(dna_type='B_curved')
DNA 3SPN2 parser. For B-curved DNA, the parser works best for a single strand ssDNA or WC-paired dsDNA. Please ensure the parsed DNA has unique chainID for each chain.
- static aa_to_cg(aa_atoms, PSB_order=True)
Convert DNA all-atom structure to CG structure. Both input and output structures are saved as pandas dataframe. Notably, this function gives each residue in the output CG model a unique resSeq (index starts from 0). Using unique resSeq can be helpful when comparing with x3dna built template as chainID may be different. This code is long and specific to 3SPN2 model, so we put it here instead of in helper_functions.py.
Parameters
- aa_atomspd.DataFrame
Input all-atom structure.
- PSB_orderbool
Whether to ensure CG atom order in each nucleotide is P-S-B.
Returns
- cg_atomspd.DataFrame
Output CG structure.
- build_x3dna_template(temp_name='dna')
Build template DNA structure with x3dna. We do not need to specify whether to use PSB order, since template atom order is aligned with self.atoms. If self.atoms is one dsDNA molecule and the target sequence is W-C paired, x3dna input sequence is the sequence of the first ssDNA. Else, use the full sequence of the DNA as x3dna input sequence.
Parameters
- temp_namestr
Name for built template files.
Returns
- temp_atomspd.DataFrame
X3DNA built CG DNA template structure.
- static change_sequence(cg_atoms, sequence)
Change DNA sequence. Note sequence includes the sequence of all the ssDNA chains.
Parameters
- cg_atomspd.DataFrame
DNA CG structure.
- sequencestr
New DNA sequence.
Returns
- new_cg_atomspd.DataFrame
DNA CG structure with the new sequence.
- classmethod from_atomistic_pdb(atomistic_pdb, cg_pdb, PSB_order=True, new_sequence=None, dna_type='B_curved', default_parse=True, temp_from_x3dna=True, temp_name='dna', input_temp=None)
Create object from atomistic pdb file. Ensure each chain in input pdb_file has unique chainID. If a new sequence is provided, then the CG DNA topology is from input pdb file, but using the new sequence.
Parameters
- atomistic_pdbstr
Path for the input atomistic pdb file.
- cg_pdbstr
Path for the output CG pdb file.
- PSB_orderbool
Whether to ensure CG atom order in each nucleotide is P-S-B.
- new_sequenceNone or str
The new full DNA sequence. If None, keep the original sequence.
- dna_typestr
DNA type.
- default_parsebool
Whether to parse molecule with default settings.
- temp_from_x3dnabool
Whether to use template built by x3dna. If False, use the input pdb file as the template.
- temp_namestr
Template file names.
- input_tempNone or pd.DataFrame
Additional input template.
- get_sequence()
Get all ssDNA sequence from CG atoms as a string. Note each chain has to possess a unique chainID.
Returns
- sequencestr
DNA sequence.
- get_sequence_list()
Get all ssDNA sequence from self.atoms as a list. Note each chain has to possess a unique chainID.
Returns
- sequence_listlist
A list including multiple strings. Each string is the sequence of one chain.
- static get_sequence_list_from_cg_atoms(cg_atoms)
Get all ssDNA sequence from CG atoms as a list. Since we use groupby to group by chainID, each chain has to possess a unique chainID.
Parameters
- cg_atomspd.DataFrame
CG DNA atoms.
Returns
- sequence_listlist
A list including multiple strings. Each string is the sequence of one chain.
- parse_mol(temp_from_x3dna=True, temp_name='dna', input_temp=None)
Parse molecule with given template.
Parameters
- temp_from_x3dnabool
Whether to use x3dna template.
- temp_namestr
Name for x3dna built template files.
- input_tempNone or pd.DataFrame
Additional input template. Use input_temp only if (not ((self.dna_type == ‘B_curved’) and temp_from_x3dna)) and (input_temp is not None).
openabc.utils
- align_protein(input_pdbfile, reference_pdbfile, output_pdbfile, reference_atoms='CA')
Align a protein with the reference protein accoding to positions of alpha carbons.
Parameters
- input_pdbfilestr
Input pdb file to be aligned.
- reference_pdbfilestr
Pdb file that be used as reference. Note ‘input_pdbfile’ and ‘reference_pdbfile’ must represent the same system.
- output_pdbfilestr
Aligned pdb file.
- reference_atomsstr, default=’CA’
Name of the atoms that used as reference atoms. Currently only support alpha carbons.
References
Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.
- combine_proteins(pdbfile_lis, output_pdbflie)
Merge multiple pdb files into a single pdb file.
Parameters
- pdbfile_lisstr
List that contains name of pdb files to be merged.
- output_pdbfilestr
Merged pdb file.
- multiple_chains_CA2AA(input_pdbfile, num_chains, num_residues, REMO_path, debug=False)
Reconstruct all-atom representation of a multiple-chain protein from its alpha carbons. If the name of input file is ‘XXX.pdb’, then output will be saved as ‘XXX_AA.pdb’ in the folder where ‘XXX.pdb’ is located.
Parameters
- input_pdbfilestr
Input pdb file.
- num_chainslist
Number of protein chains in the system. Each element of the list represent one type of protein.
- num_residueslist
Length of protein chains in the system. Each element of the list represent one type of protein. Note num_chains and num_residues share the same length unit.
- REMO_pathstr
Path where REMO program locates.
- debugbool, default=False
If True, temporary files will not be deleted.
References
Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.
- single_chain_CA2AA(input_pdbfile, output_pdbfile, REMO_path, debug=False)
Reconstruct all-atom representation for a single-chain protein from its alpha carbons using REMO.
Parameters
- input_pdbfilestr
Input pdb file that contains one protein chain.
- output_pdbfilestr
output pdb file.
- REMO_pathstr
Path where REMO program locates.
- debugbool, default=False
If True, temporary files will not be deleted.
References
Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.
- split_protein(input_pdbfile, num_chains, num_residues, output_path)
Split a PDB file containing multiple chains into individual files, each containing one chain.
Parameters
- input_pdbfilestr
Input pdb file.
- num_chainslist
Number of protein chains in the system. Each element of the list represent one type of protein.
- num_residueslist
Length of protein chains in the system. Each element of the list represent one type of protein. Note num_chains and num_residues share the same length unit.
- output_pathstr
The folder where user wants to save output pdb files.
- atomistic_pdb_to_ca_pdb(atomistic_pdb, ca_pdb, write_TER=False)
Convert atomistic pdb to protein CA pdb.
Parameters
- atomistic_pdbstr
Path for the atomistic pdb file.
- ca_pdbstr
Output path for the CA pdb file.
- write_TERbool
Whether to write TER between two chains.
- atomistic_pdb_to_nucleotide_pdb(atomistic_pdb, cg_nucleotide_pdb, write_TER=False)
Convert DNA or RNA atomistic pdb to nucleotide pdb (i.e. one CG bead per nucleotide). The position of each CG nucleotide bead is the geometric center of all the nucleotide atoms in the pdb.
Parameters
- atomistic_pdbstr
Path for the atomistic pdb file.
- cg_nucleotide_pdbstr
Output path for the CG pdb file.
- write_TERbool
Whether to write TER between two chains.
- build_straight_CA_chain(sequence, r0=0.38)
Build a straight portein CA atom chain with given sequence.
Parameters
- sequencestr
Protein chain sequence (1 letter for each amino acid).
- r0: float or int
Distance in unit nm between two neighboring CA atoms.
Returns
- df_atomspd.DataFrame
A pandas dataframe includes atom information.
- build_straight_chain(n_atoms, chainID, r0)
Build a straight chain. Each atom is viewed as one residue.
Parameters
- n_atomsint
The number of atoms along the chain.
- chainIDfloar or int or str
Chain ID.
- r0float or int
Distance in unit nm between neighboring atoms along the chain.
Returns
- df_atomspd.DataFrame
A pandas dataframe includes atom information.
- compute_PE(traj, system, platform_name='Reference', properties={'Precision': 'double'}, groups=-1)
Compute the potential energy of samples in the trajectory with the given openmm system.
Parameters
- trajmdtraj.Trajectory
The trajectory including all the samples.
- systemopenmm.System
The openmm system.
- platform_namestr
The platform name for running openmm to evaulate energy.
- propertiesdict
Properties for running openmm. Only used when platform_name is CUDA or OpenCL.
- groupsint or set
The force groups included when computing energy. See openmm context.getState for details about this parameter.
Returns
- energynp.ndarray, shape = (traj.n_frames,)
The energy of each sample in the trajectory in unit kJ/mol.
- compute_rg(coord, mass)
Compute radius of gyration.
Parameters
- coordnp.ndarray, shape = (n_atoms, 3)
Input atom coordinates.
- massnp.ndarray, shape = (n_atoms,)
Input atom mass.
Returns
- rgfloat
Radius of gyration.
- fix_pdb(pdb_file)
Fix pdb file with pdbfixer. The fixed structure is output as pandas dataframe. This function is adapted from open3SPN2.
Parameters
- pdb_filestr
PDB file path.
Returns
- atomspd.DataFrame
Output fixed structure.
- get_WC_paired_sequence(sequence)
Get WC-paired sequence.
Parameters
- sequencestr
Input sequence.
Returns
- paired_sequencestr
W-C paired sequence.
- make_mol_whole(coord, box_a, box_b, box_c)
Use this function to make the molecule whole. This is useful to recover the whole molecule if it is split by box boundary. The position of the first atom is kept. The other atoms are moved so that each one is in the closest periodic image relative to the previous one.
Parameters
- coordnp.ndarray, shape = (n_atoms, 3)
Input atom coordinates.
- box_afloat or int
Box length along x-axis.
- box_bfloat or int
Box length along y-axis.
- box_cfloat or int
Box length along z-axis.
Returns
- new_coordnp.ndarray, shape = (n_atoms, 3)
Output atom coordinates.
- move_atoms_to_closest_pbc_image(coord, ref_point, box_a, box_b, box_c)
Move the atom coordinates to the periodic image closest to ref_point.
Parameters
- coordnp.ndarray, shape = (n_atoms, 3)
Input atom coordinates.
- ref_pointnp.ndarray, shape = (3,)
The reference point coordinate.
- box_afloat or int
Box length along x-axis.
- box_bfloat or int
Box length along y-axis.
- box_cfloat or int
Box length along z-axis.
Returns
- new_coordnp.ndarray, shape = (n_atoms, 3)
Output atom coordinates.
- parse_pdb(pdb_file)
Load pdb file as pandas dataframe. Note there should be only a single frame in the pdb file.
Parameters
- pdb_filestr
Path for the pdb file.
Returns
- pdb_atomspd.DataFrame
A pandas dataframe includes atom information.
- write_pdb(pdb_atoms, pdb_file, write_TER=False)
Write pandas dataframe to pdb file.
Parameters
- pdb_atomspd.DataFrame
A pandas dataframe includes atom information.
- pdb_filestr
Output path for the pdb file.
- write_TERbool
Whether to write TER between two chains.
- class TemperatureReplicaExchange(backend, n_replicas, rank, positions, top, system, temperatures, integrator, platform_name='CUDA', properties={'Precision': 'mixed'})
Temperature replica exchange class for performing temperature replica exchange (TRE) simulations.
Note only positions and scaled velocities are exchanged, while other internal states of the simulations are not exchanged. This means the class may not be properly applied to simulations involving other internal states. For example, Nose-Hoover integrator may not work properly as it includes internal chain states.
The parallelization is achieved with torch.distributed.
An example of setting environment variables in a slurm job script:
export WORLD_SIZE=$((${SLURM_NNODES}*${SLURM_NTASKS_PER_NODE}))
export MASTER_PORT=$(expr 30000 + $(echo -n ${SLURM_JOBID} | tail -c 4))
master_addr=$(scontrol show hostnames “${SLURM_JOB_NODELIST}” | head -n 1)
export MASTER_ADDR=${master_addr}
- add_reporters(report_interval, report_state=True, report_dcd=True, output_dcd=None)
Add reporters for OpenMM simulation.
Whether to use PBC is read from self.system information.
Parameters
- report_intervalint
Report interval.
- report_statebool
Whether to report simulation state.
- report_dcdbool
Whether to report trajectory as dcd file.
- output_dcdstr or None
The output dcd file path. If None, then the output dcd file path is set as output.{self.rank}.dcd.
- run_replica_exchange(n_steps, exchange_interval, verbose=True)
Perform replica exchange simulation.
Exchange atom positions and rescaled velocities. Other state variables are not exchanged.
The temperature of each replica remains unchanged.
Parameters
- n_stepsint
Total number of simulation steps for each replica.
- exchange_intervalint
Exchange interval.
- verbosebool
Whether to report exchange acceptance ratio and simulation speed.
- find_ca_pairs_from_atomistic_pdb(atomistic_pdb, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False)
Find protein CA atom pairs whose residues are in contact.
Parameters
- atomistic_pdbstr
Path for the atomistic pdb file.
- frameint
PDB frame index used for searching pairs.
- radiusfloat or int
Shadow radius for candidate pair atoms and blocking atoms if not bonded to any pair atom.
- bonded_radiusfloat or int
Shadow radius if the candidate blocking atom is directly bonded to any candidate pair atom.
- cutoff: float or int
Cutoff distance for searching neighboring atom pairs.
- boxNone or np.ndarray
Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
- use_pbcbool
Whether to use periodic boundary condition (PBC).
Returns
- df_ca_pairspd.DataFrame
Pandas DataFrame including CA atom pair indices and distances.
- find_res_pairs_from_atomistic_pdb(atomistic_pdb, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False)
Find native pairs between residues following the shadow algorithm. They algorithm only searches native pairs between residues that do not have 1-2, 1-3, or 1-4 interactions. If two heavy atoms from different residues are in contact, then the two residues are in contact.
Parameters
- atomistic_pdbstr
Path for the atomistic pdb file.
- frameint
PDB frame index used for searching pairs.
- radiusfloat or int
Shadow radius for candidate pair atoms and blocking atoms if not bonded to any pair atom.
- bonded_radiusfloat or int
Shadow radius if the candidate blocking atom is directly bonded to any candidate pair atom.
- cutoff: float or int
Cutoff distance for searching neighboring atom pairs.
- boxNone or np.ndarray
Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
- use_pbcbool
Whether to use periodic boundary condition (PBC).
Returns
- res_pairsnp.ndarray, shape = (n_pairs, 2)
Residue pairs found by the shadow algorithm.
- df_atomspd.DataFrame
Pandas DataFrame including atom information from file atomistic_pdb.
- get_bonded_neighbor_dict(atomistic_pdb)
Find atoms that are directly connected by bonds based on MDTraj.
Parameters
- atomistic_pdbstr
Path for the atomistic pdb file.
Returns
- bonded_neighbor_dictdict
A dictionary that the value is a list including all the atoms bonded to the atom of index key.
References
- get_neighbor_pairs_and_distances(coord, cutoff=0.6, box=None, use_pbc=False)
Find spatially neighboring atom pairs within cutoff range based on MDAnalysis.
Parameters
- coordnp.ndarray, shape = (n_atoms, 3)
Input atom coordinates.
- cutofffloat or int (default = 0.6)
Cutoff distance for searching neighboring atom pairs. Note coord and cutoff share the same length unit.
- boxNone or np.ndarray
Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).
- use_pbcbool
Whether to use periodic boundary condition (PBC).
Returns
- neighbor_pairsnp.ndarray, shape = (n_pairs, 2)
Neighboring atom pairs.
- neighbor_pair_distances: np.ndarray, shape = (N,)
Neighboring atom pair distances.
References
https://docs.mdanalysis.org/1.1.1/documentation_pages/lib/nsgrid.html
- light_is_blocked(d12, d13, d23, r2, r3)
Check if the light from atom1 to atom2 is blocked by atom3.
Parameters
- d12: float or int, positive
The distance between atom1 and atom2.
- d13: float or int, positive
The distance between atom1 and atom3.
- d23: float or int, positive
The distance between atom2 and atom3.
- r2: float or int, non-negative
The radius of atom2. Require r2 <= d12.
- r3: float or int, non-negative
The radius of atom3. Require r3 <= d13.
Returns
- flagbool
Whether the light is blocked.
- load_ca_pairs_from_gmx_top(top_file, ca_pdb, frame=0, use_pbc=False)
Load native pairs from GROMACS topology file. Note in GROMACS, atom indices start from 1, while in OpenMM, atom indices start from 0.
Parameters
- top_filestr
Path for the GROMACS topology file.
- ca_pdbstr
Path for the CA model PDB file.
- frame: int
PDB frame index used for computing pair distances.
- use_pbcbool
Whether to use periodic boundary condition (PBC).
Returns
- df_ca_pairspd.DataFrame
Pandas DataFrame including CA atom pair indices and distances.
- insert_molecules(new_pdb, output_pdb, n_mol, radius=0.5, existing_pdb=None, max_n_attempts=10000, box=[100, 100, 100], method='FastNS', reset_serial=True)
Insert multiple copies of given molecule into an existing pdb or a new empty box. Currently this function only supports using orthogonal box.
Parameters
- new_pdbstr
Path of the new pdb file to be inserted.
- output_pdbstr
Path of the output pdb file.
- n_molint
Number of copies of the new molecule to be inserted.
- radiusfloat or int
Radius for each atom in unit nm. Insertion ensures the distance between any two atoms are larger than 2*radius under periodic boundary condition.
- existing_pdbNone or str
Path of an existing pdb file. If None, the molecules will be inserted into a new empty box, otherwise, inserted into a copy of existing_pdb.
- max_n_attemptsint
Maximal number of attempts to insert. Ensure this value no smaller than n_mol.
- boxlist or numpy array, shape = (3,)
New box size in unit nm.
- methodstr
Which method to use for detecting if the inserted molecule has contact with existing atoms. Currently only ‘FastNS’ is supported. Any string will lead to method as ‘FastNS’.
- reset_serialbool
Whether to reset serial to 0, 1, …, N - 1. If True, the serial in the final pdb is reset as 0, 1, …, N - 1. If False, the serial remains unchanged. If True and atom number > 1,000,000, the serial remains unchanged since the largest atom serial number allowed in pdb is 999,999.
- insert_molecules_dataframe(new_atoms, n_mol, radius=0.5, existing_atoms=None, max_n_attempts=10000, box=[100, 100, 100], method='FastNS', reset_serial=True)
Insert multiple copies of given molecule as dataframes into an existing dataframe or a new empty box. Currently this function only supports using orthogonal box.
Parameters
- new_atomspd.DataFrame
New atoms of the molecule to be inserted.
- n_molint
Number of copies of the new molecule to be inserted.
- radiusfloat or int
Radius for each atom in unit nm. Insertion ensures the distance between any two atoms are larger than 2*radius under periodic boundary condition.
- existing_atomsNone or pd.DataFrame
Existing atoms. If None, the molecules will be inserted into a new empty box, otherwise, inserted into a copy of existing_atoms.
- max_n_attemptsint
Maximal number of attempts to insert. Ensure this value no smaller than n_mol.
- boxlist or numpy array, shape = (3,)
New box size in unit nm.
- methodstr
Which method to use for detecting if the inserted molecule has contact with existing atoms. Currently only ‘FastNS’ is supported. Any string will lead to method as ‘FastNS’.
- reset_serialbool
Whether to reset serial to 0, 1, …, N - 1. If True, the serial in the output dataframe is reset as 0, 1, …, N - 1. If False, the serial remains unchanged. If True and atom number > 1,000,000, the serial remains unchanged since the largest atom serial number allowed in pdb is 999,999.
Returns
- atomspd.DataFrame
Output atoms.
- parse_stride(stride_file)
Parse stride output file.
Parameters
- stride_filestr
The stride output file.
Returns
- df_datapd.DataFrame
A dataframe with the secondary structure information.
- get_chromatin_rigid_bodies(n_nucl, nrl, n_rigid_bp_per_nucl=73)
Get chromatin rigid bodies. The chromatin should possess uniform linker length without additional linkers on both ends.
Parameters
- n_nuclint
Nucleosome number.
- nrlint
Nucleosome repeat length.
- n_flexible_bp_per_nuclint
The number of flexible nucleosomal base pairs for each nucleosome.
Returns
- rigid_bodieslist
List of rigid bodies.
- remove_histone_tail_dihedrals(df_dihedrals)
Remove histone tail dihedral potentials from a single histone.
A dihedral potential is removed if at least one atom involved is within histone tail.
Parameters
- df_dihedralspd.DataFrame
Dihedral potential. This should only include histones. Each histone includes 974 CA atoms, and there should be 974*n CG atoms in all (n is the number of histones).
Returns
- new_df_dihedralspd.DataFrame
New dihedral potential.
- remove_histone_tail_native_pairs_and_exclusions(df_native_pairs, df_exclusions)
Remove histone tail native pair potentials and corresponding exclusions from a single histone. A native pair potential is removed if at least one atom involved is within histone tail. The corresponding exclusions should also be removed if the native pair is removed. Also remove native pairs between atoms from different histones.
Parameters
- df_native_pairspd.DataFrame
Native pair potential. This should only include histones. Each histone includes 974 CA atoms, and there should be 974*n CG atoms in all (n is the number of histones).
- df_exclusionspd.DataFrame
Nonbonded exclusions.
Returns
- new_df_native_pairspd.DataFrame
New native pair potential.
- new_df_exclusionspd.DataFrame
New nonbonded exclusions.