neurosnap package

Submodules

neurosnap.api module

class neurosnap.api.NeurosnapAPI(api_key)

Bases: object

BASE_URL = 'https://neurosnap.ai/api'
delete_job_share(job_id)

Disables the sharing feature of a job and makes the job private.

Parameters:

job_id (str) – The ID of the job to be made private.

Return type:

None

get_job_file(job_id, file_type, file_name, save_path, share_id=None)

Fetches a specific file from a completed Neurosnap job and saves it to the specified path.

Parameters:
  • job_id (str) – The ID of the job.

  • file_type (str) – The type of file to fetch.

  • file_name (str) – The name of the specific file to fetch.

  • save_path (str) – The path where the file content will be saved.

  • share_id (str) – The share ID, if any.

Return type:

Tuple[str, bool]

Returns:

Tuple of the form (save_path, download_succeeded)

  • save_path: The path where the file is saved.

  • download_succeeded: True if the file was downloaded successfully, False otherwise.

Raises:

HTTPError – If the API request fails.

get_job_files(job_id, file_type, share_id=None, format_type=None)

Fetches all files from a completed Neurosnap job and optionally prints them.

Parameters:
  • job_id (str) – The ID of the job.

  • file_type (str) – The type of files to fetch.

  • share_id (str) – The share ID, if any.

  • format_type (Optional[str]) –

    • “table”: Prints the files in a tabular format.

    • “json”: Prints the files in formatted JSON.

    • None (default): No printing.

Return type:

List[str]

Returns:

A list of file names from the job.

Raises:

HTTPError – If the API request fails.

get_job_status(job_id)

Fetches the status of a specified job.

Parameters:

job_id (str) – The ID of the job.

Return type:

Dict

Returns:

The status of the job.

Raises:

HTTPError – If the API request fails.

get_jobs(format_type=None)

Fetches and returns a list of submitted jobs. Optionally prints the jobs.

Parameters:

format_type (Optional[str]) –

  • “table”: Prints jobs in tabular format.

  • “json”: Prints jobs as formatted JSON.

  • None (default): No printing.

Return type:

List[Dict]

Returns:

Submitted jobs as a list of dictionaries.

Raises:

HTTPError – If the API request fails.

get_services(format_type=None)

Fetches and returns a list of available Neurosnap services. Optionally prints the services.

Parameters:

format_type (Optional[str]) –

  • “table”: Prints services in a tabular format with key fields.

  • “json”: Prints services as formatted JSON.

  • None (default): No printing.

Return type:

List[Dict]

Returns:

A list of dictionaries representing available services.

Raises:

HTTPError – If the API request fails.

get_team_info(format_type=None)

Fetches your team’s information if you are part of a Neurosnap Team.

Parameters:

format_type (Optional[str]) – The format to print the response: ‘table’, ‘json’, or None for no output.

Return type:

Dict

Returns:

The team information.

Raises:

HTTPError – If the API request fails.

get_team_jobs(format_type=None)

Fetches all the jobs submitted by all members of your Neurosnap Team.

Parameters:

format_type (Optional[str]) – The format to print the response: ‘table’, ‘json’, or None for no output.

Return type:

List[Dict]

Returns:

A list of jobs submitted by the team members.

Raises:

HTTPError – If the API request fails.

set_job_note(job_id, note)

Set a note for a submitted job.

Parameters:
  • job_id (str) – The ID of the job for which the note will be set.

  • note (str) – The note to be associated with the job.

Return type:

None

set_job_share(job_id)

Enables the sharing feature of a job and makes it public.

Parameters:

job_id (str) – The ID of the job to be made public.

Return type:

Dict

Returns:

The JSON response containing the share ID.

submit_job(service_name, files, data)

Submit a Neurosnap job.

Parameters:
  • service_name (str) – The name of the service to run.

  • files (Dict[str, str]) – A dictionary mapping file names to file paths.

  • data (Dict[str, str]) – A dictionary of additional data to be passed to the service.

Return type:

Dict

Returns:

The job ID of the submitted job.

Raises:

HTTPError – If the API request fails.

neurosnap.chemicals module

Provides functions and classes related to processing chemical data.

neurosnap.chemicals.fetch_ccd(ccd_code, fpath)

Fetches the ideal SDF (Structure Data File) for a given CCD (Chemical Component Dictionary) code and saves it to the specified file path.

This function retrieves the idealized structure of a chemical component from the RCSB Protein Data Bank (PDB) by downloading the corresponding SDF file. The downloaded file is then saved to the specified location.

Parameters:
  • ccd_code (str) – The three-letter CCD code representing the chemical component (e.g., “ATP”).

  • fpath (str) – The file path where the downloaded SDF file will be saved.

Raises:
  • HTTPError – If the request to fetch the SDF file fails (e.g., 404 or connection error).

  • IOError – If there is an issue saving the SDF file to the specified file path.

Example

>>> fetch_ccd("ATP", "ATP_ideal.sdf")

This fetches the ideal SDF file for the ATP molecule and saves it as "ATP_ideal.sdf".
External Resources:
neurosnap.chemicals.get_ccds(fpath='~/.cache/ccd_codes.json')

Retrieves a set of all CCD (Chemical Component Dictionary) codes from the PDB.

This function checks for a locally cached JSON file with the CCD codes:

  • If the file exists, it reads and returns the set of codes from the cache.

  • If the file does not exist, it downloads the full Chemical Component Dictionary (in mmCIF format) from the Protein Data Bank (PDB), extracts the CCD codes, and caches them in a JSON file for future use.

Parameters:

fpath (str) – Path of the JSON file used to cache the CCD codes. Default is “~/.cache/ccd_codes.json”.

Returns:

A set of all CCD codes (three-letter codes representing small molecules, ligands, and post-translational modifications).

Return type:

set

Raises:
  • HTTPError – If the request to the PDB server fails.

  • JSONDecodeError – If the cached JSON file is corrupted.

File Cache:
  • Cached file path: “~/.cache/ccd_codes.json” by default (configurable via fpath).

  • The cache is created automatically if it does not exist.
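The cache-or-download behavior described above can be sketched in plain Python. This is a minimal sketch, not the package's implementation: load_cached_codes is a hypothetical name, and download is a hypothetical callable standing in for the step that fetches the mmCIF dictionary from the PDB and extracts its codes.

```python
import json
import os
from typing import Callable, Set


def load_cached_codes(fpath: str, download: Callable[[], Set[str]]) -> Set[str]:
    """Return CCD codes from a JSON cache, creating the cache on a miss."""
    fpath = os.path.expanduser(fpath)  # resolve "~" as in the default path
    if os.path.exists(fpath):
        # Cache hit: a corrupted file raises json.JSONDecodeError here
        with open(fpath) as f:
            return set(json.load(f))
    # Cache miss: call the (hypothetical) download step, then cache the result
    codes = download()
    parent = os.path.dirname(fpath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(fpath, "w") as f:
        json.dump(sorted(codes), f)  # JSON has no set type, so store a list
    return codes
```

On a second call with the same path, the codes come from the JSON file and the download step is skipped.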

External Resources:
neurosnap.chemicals.sdf_to_smiles(fpath)

Converts molecules in an SDF file to SMILES strings.

Reads an input SDF file and extracts SMILES strings from its molecules. Invalid or unreadable molecules are skipped, with warnings logged.

Parameters:

fpath (str) – Path to the input SDF file.

Returns:

A list of SMILES strings corresponding to valid molecules in the SDF file.

Return type:

List[str]

Raises:
neurosnap.chemicals.smiles_to_sdf(smiles, output_path)

Converts a SMILES string to an SDF file. Overwrites any existing file at output_path.

NOTE: This function does the bare minimum in terms of generating the SDF molecule. The neurosnap.conformers module should be used in most cases.

Parameters:
  • smiles (str) – SMILES string to parse and convert

  • output_path (str) – Path to output SDF file, should end with .sdf

Return type:

None

neurosnap.chemicals.validate_smiles(smiles)

Validates a SMILES (Simplified Molecular Input Line Entry System) string.

Parameters:

smiles (str) – The SMILES string to validate.

Returns:

True if the SMILES string is valid, False otherwise.

Return type:

bool

Raises:

Exception – Logs any exception encountered during validation.

neurosnap.conformers module

Provides functions and classes related to processing and generating conformers.

neurosnap.conformers.find_LCS(mol)

Finds the largest common substructure (LCS) between a set of conformers and aligns all conformers to the LCS.

Parameters:

mol (Mol) – Input RDKit molecule object; must already contain conformers

Return type:

Mol

Returns:

Resultant molecule object with all conformers aligned to the LCS

Raises:

Exception – if no LCS is detected

neurosnap.conformers.generate(input_mol, output_name='unique_conformers', write_multi=False, num_confs=1000, min_method='auto', max_atoms=500)

Generate conformers for an input molecule.

Performs the following actions in order:

  1. Generate conformers using the ETKDG method.

  2. Minimize the energy of all conformers and remove those that fail a dynamic energy threshold.

  3. Align all conformers and create an RMSD matrix.

  4. Cluster using the Butina method to remove structurally redundant conformers.

  5. Return the most energetically favorable conformer in each cluster.

Parameters:
  • input_mol (Any) – Input molecule can be a path to a molecule file, a SMILES string, or an instance of rdkit.Chem.rdchem.Mol

  • output_name (str) – Output path for the SDF files of passing conformers

  • write_multi (bool) – If True, writes all unique conformers to a single SDF file; if False, writes each unique conformer to a separate SDF file in output_name

  • num_confs (int) – Number of conformers to generate

  • min_method (Optional[str]) – Method for minimization, can be either “auto”, “UFF”, “MMFF94”, “MMFF94s”, or None for no minimization

  • max_atoms (int) – Maximum number of atoms allowed for the input molecule

Return type:

DataFrame

Returns:

A dataframe with all conformer statistics. Note that if energy minimization is disabled or fails, the energy column will contain None values.
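Steps 4 and 5 of generate() can be illustrated with a standalone Taylor-Butina clustering over a precomputed RMSD matrix. This is a hedged sketch, assuming the RMSD matrix and per-conformer energies are already available; butina_cluster, lowest_energy_per_cluster, and the cutoff used in the test are hypothetical and do not reflect the distance threshold generate() actually uses. RDKit ships its own implementation (rdkit.ML.Cluster.Butina), which the package may rely on instead.

```python
from typing import Dict, List, Sequence


def butina_cluster(dist: Sequence[Sequence[float]], cutoff: float) -> List[List[int]]:
    """Taylor-Butina clustering on a symmetric distance (e.g. RMSD) matrix."""
    n = len(dist)
    # Neighbor list of every item (itself included) within the cutoff
    neighbors = [[j for j in range(n) if dist[i][j] <= cutoff] for i in range(n)]
    # Visit items with the most neighbors first; they become cluster centroids
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # Centroid first, then its neighbors that are not yet in a cluster
        members = [i] + [j for j in neighbors[i] if j != i and j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


def lowest_energy_per_cluster(clusters: List[List[int]], energies: Dict[int, float]) -> List[int]:
    """Keep the most energetically favorable conformer ID from each cluster."""
    return [min(cluster, key=lambda cid: energies[cid]) for cluster in clusters]
```

Visiting densely populated items first is the defining choice of the Butina method: each cluster grows around the conformer with the most structural near-duplicates.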

neurosnap.conformers.minimize(mol, method='MMFF94', percentile=100.0)

Minimize conformer energy (kcal/mol) using RDKit and filter out conformers based on energy percentile.

Parameters:
  • mol (Mol) – RDKit molecule object (rdkit.Chem.rdchem.Mol) containing the conformers to minimize.

  • method (str) – Force field to use: “UFF”, “MMFF94”, or “MMFF94s”.

  • percentile (float) – Filters out conformers above a given energy percentile (0 to 100). For example, 10.0 retains only the conformers within the lowest 10% of energies.

Return type:

Tuple[Mol, Dict[int, float]]

Returns:

A tuple of the form (mol_filtered, energies)

  • mol_filtered: Molecule object with the filtered conformers.

  • energies: Dictionary where keys are conformer IDs and values are calculated energies in kcal/mol.
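The percentile filter in minimize() can be sketched as follows, assuming energies have already been computed per conformer ID. The helper names are hypothetical, and both the interpolation convention (linear, as in NumPy's default percentile method) and the inclusive comparison against the cutoff are assumptions about how the package applies the threshold.

```python
from typing import Dict, List


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100] (NumPy's default method)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def filter_by_energy(energies: Dict[int, float], pct: float) -> Dict[int, float]:
    """Keep conformers whose energy (kcal/mol) is at or below the pct percentile."""
    cutoff = percentile(list(energies.values()), pct)
    return {cid: e for cid, e in energies.items() if e <= cutoff}
```

With the default percentile=100.0 the cutoff is the maximum energy, so no conformer is removed.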

neurosnap.log module

class neurosnap.log.CustomLogger(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)

Bases: Formatter

Custom logger with specialized formatting.

Note

[+] logging.DEBUG: Used for all general info

[*] logging.INFO: Used for more important key info that isn’t negative

[-] logging.WARNING: Used for non-severe info that is negative

[!] logging.ERROR: Used for errors that require attention but are not critical

[!] logging.CRITICAL: Used for very severe errors that require immediate attention and are concerning

format(record)

Format the specified record as text.

The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.

log_format_basic = '%(message)s'
log_format_detailed = '\x1b[90m%(asctime)s\x1b[0m %(message)s \x1b[38;5;204m(%(filename)s:%(lineno)d)\x1b[0m'
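A formatter that reproduces the level prefixes listed in the Note above could look like the following. PrefixFormatter is a hypothetical stand-in for CustomLogger: it omits the ANSI colors and the detailed format string, and only shows how a logging.Formatter subclass can override format() to prepend a per-level prefix.

```python
import logging

# Level prefixes from the Note above (ERROR and CRITICAL share "[!]")
PREFIXES = {
    logging.DEBUG: "[+]",
    logging.INFO: "[*]",
    logging.WARNING: "[-]",
    logging.ERROR: "[!]",
    logging.CRITICAL: "[!]",
}


class PrefixFormatter(logging.Formatter):
    """Hypothetical formatter that prepends a level-specific prefix."""

    def format(self, record: logging.LogRecord) -> str:
        prefix = PREFIXES.get(record.levelno, "[?]")
        return f"{prefix} {super().format(record)}"


# Usage: attach the formatter to a handler (the logger name is arbitrary)
logger = logging.getLogger("example")
handler = logging.StreamHandler()
handler.setFormatter(PrefixFormatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.info("Structure loaded")  # prints "[*] Structure loaded" to stderr
```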
class neurosnap.log.c

Bases: object

Terminal colors class

b = '\x1b[38;5;295m'
br = '\x1b[31;1m'
c = '\x1b[38;5;299m'
g = '\x1b[38;5;47m'
grey = '\x1b[90m'
o = '\x1b[38;5;208m'
p = '\x1b[38;5;204m'
r = '\x1b[38;5;1m'
y = '\x1b[38;5;226m'

neurosnap.msa module

Provides functions and classes related to processing protein sequence data.

neurosnap.msa.align_mafft(seqs, ep=0.0, op=1.53)

Generates an alignment using mafft.

Parameters:
  • seqs (Union[str, List[str], Dict[str, str]]) –

    Can be:

    • fasta file path,

    • list of sequences, or

    • dictionary where values are AA sequences and keys are their corresponding names/IDs

  • ep (float) – MAFFT ep (offset) value, default is 0.0

  • op (float) – MAFFT op (gap opening penalty) value, default is 1.53

Return type:

Tuple[List[str], List[str]]

Returns:

A tuple of the form (out_names, out_seqs)

  • out_names: list of aligned protein names

  • out_seqs: list of corresponding protein sequences

neurosnap.msa.get_seqid(seq1, seq2)

Calculate the pairwise sequence identity of two sequences or aligned sequences of the same length. Does not perform any alignment.

Parameters:
  • seq1 (str) – The 1st sequence / aligned sequence.

  • seq2 (str) – The 2nd sequence / aligned sequence.

Return type:

float

Returns:

The pairwise sequence identity, 0 means no matches found, 100 means sequences were identical.
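Because get_seqid() performs no alignment, its calculation reduces to a position-by-position comparison. The sketch below uses the hypothetical name seq_identity and assumes every column counts toward the denominator, including gap columns; the package may treat gaps differently.

```python
def seq_identity(seq1: str, seq2: str) -> float:
    """Percent identity of two equal-length (already aligned) sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        return 0.0
    # Count positions where both sequences carry the same character
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)
```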

neurosnap.msa.pad_seqs(seqs, char='-', truncate=False)

Pads all sequences on the right side to the length of the longest sequence using a padding character.

Parameters:
  • seqs (List[str]) – List of sequences to pad

  • char (str) – The character to pad with, default is “-”

  • truncate (Union[bool, int]) – If True, truncates all sequences to the length of the first sequence; if set to an integer, truncates all sequences to that length

Return type:

List[str]

Returns:

The padded sequences
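The padding and truncation rules above can be sketched with str.ljust. pad_right is a hypothetical name, and the order of operations (truncate first, then pad to the longest remaining sequence) is an assumption.

```python
from typing import List, Union


def pad_right(seqs: List[str], char: str = "-", truncate: Union[bool, int] = False) -> List[str]:
    """Right-pad sequences to a common length, optionally truncating first."""
    if truncate is True:
        # Truncate every sequence to the length of the first one
        seqs = [s[: len(seqs[0])] for s in seqs]
    elif truncate is not False:
        # An integer truncates every sequence to that length
        seqs = [s[:truncate] for s in seqs]
    width = max(len(s) for s in seqs)
    return [s.ljust(width, char) for s in seqs]
```

The identity checks (is True / is not False) matter because bool is a subclass of int in Python.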

neurosnap.msa.read_msa(input_fasta, size=inf, allow_chars='', drop_chars='', remove_chars='*', uppercase=True)

Reads an MSA, a3m, or fasta file and returns lists of names and sequences. Returned headers consist of all characters up to the first space, with the “|” character replaced by an underscore.

Parameters:
  • input_fasta (Union[str, TextIOBase]) – Path to an input a3m/fasta file, a fasta file as a raw string, or a file-handle-like object to read from

  • size (float) – Maximum number of sequences to read (the default, inf, reads all)

  • allow_chars (str) – Additional characters to allow; sequences that contain characters not included within STANDARD_AAs + allow_chars will raise an exception

  • drop_chars (str) – Drop sequences that contain these characters. For example, "-X"

  • remove_chars (str) – Removes these characters from sequences. For example, "*-X"

  • uppercase (bool) – Converts all amino acid chars to uppercase when True

Return type:

Tuple[List[str], List[str]]

Returns:

A tuple of the form (names, seqs)

  • names: list of protein names from the a3m file

  • seqs: list of protein sequences from the a3m file, including gaps
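The header normalization and character filtering described for read_msa() can be illustrated with a minimal reader over a FASTA/a3m string. parse_fasta is hypothetical: it skips lines starting with “#” (as found in some a3m files), uppercases before the drop check, and leaves out the size limit and allow_chars validation. These are assumptions rather than the package's exact behavior.

```python
from typing import List, Tuple


def parse_fasta(text: str, drop_chars: str = "", remove_chars: str = "*",
                uppercase: bool = True) -> Tuple[List[str], List[str]]:
    """Minimal FASTA/a3m reader following the header and filter rules above."""
    names: List[str] = []
    seqs: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and a3m comment lines
        if line.startswith(">"):
            # Header: characters up to the first space, "|" replaced by "_"
            names.append(line[1:].split(" ", 1)[0].replace("|", "_"))
            seqs.append("")
        elif names:
            seqs[-1] += line  # a sequence may span several lines
    out_names: List[str] = []
    out_seqs: List[str] = []
    for name, seq in zip(names, seqs):
        if uppercase:
            seq = seq.upper()
        if any(c in seq for c in drop_chars):
            continue  # drop the whole record
        seq = "".join(c for c in seq if c not in remove_chars)
        out_names.append(name)
        out_seqs.append(seq)
    return out_names, out_seqs
```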

neurosnap.msa.run_mmseqs2(seqs, output, database='mmseqs2_uniref_env', use_filter=True, use_templates=False, pairing=None, print_citations=True)

Generate an a3m MSA using the ColabFold API. Will write all results to the output directory including templates, MSAs, and accompanying files.

Code originally adapted from: https://github.com/sokrypton/ColabFold/

Parameters:
  • seqs (str) – Amino acid sequences for protein to generate an MSA of

  • output (str) – Output directory path, will overwrite existing results

  • database (str) – Choose the database to use, must be either “mmseqs2_uniref_env” or “mmseqs2_uniref”

  • use_filter (bool) – Enables the diversity and MSA filtering steps that ensure the MSA does not become enormously large (described in the methods section of the ColabFold paper)

  • use_templates (bool) – Download templates as well using the mmseqs2 results

  • pairing (Optional[str]) – Can be set to either “greedy”, “complete”, or None for no pairing

  • print_citations (bool) – Prints citations

Returns:

  • a3m_lines: list of a3m lines

  • template_paths: list of template paths

neurosnap.msa.run_mmseqs2_modes(seq, output, cov=50, id=90, max_msa=2048, mode='unpaired_paired', print_citations=True)

Generate a multiple sequence alignment (MSA) for the given sequence(s) using ColabFold’s API. The key difference between this function and run_mmseqs2 is that this function supports different modes. The final and most useful a3m file is written to “output/final.a3m”. Code originally adapted from: https://github.com/sokrypton/ColabFold/

Parameters:
  • seq (Union[str, List[str]]) – Sequence(s) to generate the MSA for. If a list of sequences is provided, they will be considered as a single protein for the MSA.

  • output (str) – Output directory path, will overwrite existing results.

  • cov (int) – Coverage of the MSA

  • id (int) – Identity threshold for the MSA

  • max_msa (int) – Maximum number of sequences in the MSA

  • mode (str) – Mode to run the MSA generation in. Must be in ["unpaired", "paired", "unpaired_paired"]

  • print_citations (bool) – Whether to print the citations in the output.

neurosnap.msa.run_phmmer(query, database, evalue=10.0, cpu=2)

Run phmmer with a query sequence against a database and return all sequences that are considered hits. Shamelessly stolen and adapted from https://github.com/seanrjohnson/protein_gibbs_sampler/blob/a5de349d5f6a474407fc0f19cecf39a0447a20a6/src/pgen/utils.py#L263

Parameters:
  • query (str) – Amino acid sequence of the protein you want to find hits for

  • database (str) – Path to the reference database of sequences to search for hits and create an alignment with; must be a protein fasta file

  • evalue (float) – The threshold E value for the phmmer hit to be reported

  • cpu (int) – The number of CPU cores to be used to run phmmer

Return type:

List[str]

Returns:

List of hits, ranked from best to worst

neurosnap.msa.run_phmmer_mafft(query, ref_db_path, size=inf, in_name='input_sequence')

Generate MSA using phmmer and mafft from reference sequences.

Parameters:
  • query (str) – Amino acid sequence of the protein whose MSA you want to create

  • ref_db_path (str) – Path to the reference database of sequences to search for hits and create an alignment with

  • size (int) – Number of top-ranked sequences to keep

  • in_name (str) – Optional name for input sequence to put in the output

Return type:

Tuple[List[str], List[str]]

Returns:

A tuple of the form (out_names, out_seqs)

  • out_names: list of aligned protein names

  • out_seqs: list of corresponding protein sequences

neurosnap.msa.write_msa(output_path, names, seqs)

Writes an MSA, a3m, or fasta to a file. Makes no assumptions about the validity of names or sequences. Raises an exception if len(names) != len(seqs).

Parameters:
  • output_path (str) – Path to output file to write, will overwrite existing files

  • names (List[str]) – List of protein names to write

  • seqs (List[str]) – List of protein sequences to write
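Writing an MSA or fasta file amounts to emitting one header line and one sequence line per record. The sketch below uses the hypothetical name write_fasta and assumes unwrapped sequence lines; it mirrors the documented length check by raising ValueError.

```python
from typing import List


def write_fasta(output_path: str, names: List[str], seqs: List[str]) -> None:
    """Write records as FASTA, overwriting output_path if it exists."""
    if len(names) != len(seqs):
        raise ValueError("names and seqs must have the same length")
    with open(output_path, "w") as f:
        for name, seq in zip(names, seqs):
            # One header line and one unwrapped sequence line per record
            f.write(f">{name}\n{seq}\n")
```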

neurosnap.protein module

Provides functions and classes related to processing protein data as well as a feature rich wrapper around protein structures using BioPython.

class neurosnap.protein.Protein(pdb, format='auto')

Bases: object

__call__(model=None, chain=None, res_type=None)

Returns a selection of a copy of the internal dataframe that matches the provided query. If no queries are provided, will return a copy of the internal dataframe.

Parameters:
  • model (Optional[int]) – If provided, returned atoms must match this model

  • chain (Optional[int]) – If provided, returned atoms must match this chain

  • res_type (Optional[int]) – If provided, returned atoms must match this res_type

Return type:

DataFrame

Returns:

Copy of the internal dataframe that matches the input query

__init__(pdb, format='auto')

Class that wraps around a protein structure.

Uses a Biopython structure under the hood. Atoms that are not part of a chain are automatically added to a new chain that does not overlap with any existing chains.

Parameters:
  • pdb (Union[str, IOBase]) – Can be either a file handle, PDB or mmCIF filepath, PDB ID, or UniProt ID

  • format (str) – File format of the input (“pdb”, “mmcif”, or “auto” to infer format from extension)

__sub__(other_protein)

Automatically calculate the RMSD of two proteins. The models used will naively be the first models that have identical backbone shapes. Essentially a wrapper around self.calculate_rmsd().

Parameters:

other_protein (Protein) – Another Protein object to compare against

Return type:

float

Returns:

The RMSD between the two proteins in angstroms (Å)

align(other_protein, model1=0, model2=0)

Align another Protein object’s structure to the self.structure of the current object. The other Protein will be transformed and aligned. Only compares backbone atoms (N, CA, C).

Parameters:
  • other_protein (Protein) – Another Protein object to compare against

  • model1 (int) – Model ID of reference protein to align to

  • model2 (int) – Model ID of other protein to transform and align to reference

calculate_center_of_mass(model=None, chain=None)

Calculate the center of mass of the protein. Considers only atoms with defined masses.

Parameters:
  • model (Optional[int]) – Model ID to calculate for, if not provided calculates for all models

  • chain (Optional[str]) – Chain ID to calculate for, if not provided calculates for all chains

Returns:

A numpy array of shape (3,) representing the center of mass

Return type:

ndarray

calculate_distance_matrix(model=None, chain=None)

Calculate the distance matrix for all alpha-carbon (CA) atoms in the chain. Useful for creating contact maps or proximity analyses.

Parameters:
  • model (Optional[int]) – The model ID to calculate the distance matrix for, if not provided will use first model found

  • chain (Optional[str]) – The chain ID to calculate, if not provided calculates for all chains

Return type:

ndarray

Returns:

A 2D numpy array representing the distance matrix

calculate_hydrogen_bonds(model=None, chain=None, chain_other=None, donor_acceptor_cutoff=3.5, angle_cutoff=120.0)

Calculate the number of hydrogen bonds in the protein structure. Hydrogen atoms must be explicitly defined within the structure, as implicit hydrogens are not computed. We recommend using a tool like Reduce to add missing hydrogens.

Hydrogen bonds are detected based on distance and angle criteria:

  • The distance between donor and acceptor must be less than donor_acceptor_cutoff.

  • The angle formed by donor-hydrogen-acceptor must be greater than angle_cutoff.

If model is set to None, hydrogen bonds are calculated only for the first model in the structure.

If chain_other is None:
  • Hydrogen bonds are calculated for the specified chain or all chains if chain is also None.

If chain_other is set to a specific chain:
  • Hydrogen bonds are calculated only between atoms of chain and chain_other.

If chain_other is specified but chain is not, an exception is raised.

Parameters:
  • model (Optional[int]) – Model ID to calculate for. If None, only the first model is considered.

  • chain (Optional[str]) – Chain ID to calculate for. If None, all chains in the selected model are considered.

  • chain_other (Optional[str]) – Secondary chain ID for inter-chain hydrogen bonds. If None, intra-chain bonds are calculated.

  • donor_acceptor_cutoff (float) – Maximum distance between donor and acceptor (in Å). Default is 3.5 Å.

  • angle_cutoff (float) – Minimum angle for a hydrogen bond (in degrees). Default is 120°.

Return type:

int

Returns:

The total number of hydrogen bonds in the structure.

Raises:

ValueError – If chain_other is specified but chain is not.
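The distance and angle criteria above can be expressed for a single donor-hydrogen-acceptor triple as follows. angle_deg and is_hbond are hypothetical helpers working on plain coordinate tuples; they show the geometric test only, not how the package enumerates candidate donors and acceptors.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos = dot / (math.hypot(*ba) * math.hypot(*bc))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def is_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec,
             donor_acceptor_cutoff: float = 3.5, angle_cutoff: float = 120.0) -> bool:
    """Apply the distance and donor-H-acceptor angle criteria to one triple."""
    close_enough = math.dist(donor, acceptor) < donor_acceptor_cutoff
    linear_enough = angle_deg(donor, hydrogen, acceptor) > angle_cutoff
    return close_enough and linear_enough
```

A perfectly linear donor-H-acceptor arrangement gives 180°, so the angle test rejects geometries bent by more than 60° from linear at the default cutoff.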

calculate_protein_volume(model=0, chain=None)

Compute an estimate of the protein volume using the van der Waals radii. Uses the sum of atom radii to compute the volume.

Parameters:
  • model (int) – Model ID to compute volume for, defaults to 0

  • chain (Optional[str]) – Chain ID to compute, if not provided computes for all chains

Return type:

float

Returns:

Estimated volume in ų

calculate_rmsd(other_protein, model1=0, model2=0, chain1=None, chain2=None, align=True)

Calculate RMSD between the current structure and another protein. Only compares backbone atoms (N, CA, C). RMSD is in angstroms (Å).

Parameters:
  • other_protein (Protein) – Another Protein object to compare against

  • model1 (int) – Model ID of original protein to compare

  • model2 (int) – Model ID of other protein to compare

  • chain1 (Optional[str]) – Chain ID of original protein, if not provided compares all chains

  • chain2 (Optional[str]) – Chain ID of other protein, if not provided compares all chains

  • align (bool) – Whether to align the structures first using Superimposer

Return type:

float

Returns:

The root-mean-square deviation between the two structures
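For matched backbone atoms, the RMSD is the square root of the mean squared distance between corresponding coordinates. The sketch below corresponds to the align=False case only; with align=True the structures are first superimposed (the documentation names Superimposer for this step). rmsd is a hypothetical helper that takes coordinate lists already paired atom by atom.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords1: Sequence[Coord], coords2: Sequence[Coord]) -> float:
    """RMSD of atom-by-atom paired coordinates, without superposition."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords1, coords2))
    return math.sqrt(squared / len(coords1))
```

Without superposition, a rigid translation alone produces a nonzero RMSD, which is why aligning first is the default.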

calculate_surface_area(model=0, level='R')

Calculate the solvent-accessible surface area (SASA) of the protein. Utilizes Biopython’s SASA module.

Parameters:
  • model (int) – The model ID to calculate SASA for, defaults to 0.

  • level (str) – The level at which ASA values are assigned, which can be one of “A” (Atom), “R” (Residue), “C” (Chain), “M” (Model), or “S” (Structure). The ASA value of an entity is the sum of all ASA values of its children.

Return type:

float

Returns:

Solvent-accessible surface area in Ų

chains(model=0)

Returns a list of all the chain names/IDs.

Parameters:

model (int) – The ID of the model you want to fetch the chains of, defaults to 0

Return type:

List[str]

Returns:

Chain names/IDs found within the PDB file

distances_from_com(model=None, chain=None)

Calculate the distances of all atoms from the center of mass (COM) of the protein.

This method computes the Euclidean distance between the coordinates of each atom and the center of mass of the structure. The center of mass is calculated for the specified model and chain, or for all models and chains if none are provided.

Parameters:
  • model (Optional[int]) – The model ID to calculate for. If not provided, calculates for all models.

  • chain (Optional[str]) – The chain ID to calculate for. If not provided, calculates for all chains.

Returns:

A 1D NumPy array containing the distances (in Ångströms) between each atom and the center of mass.
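The mass-weighted center of mass and the per-atom distances from it can be sketched with plain tuples. center_of_mass and atom_distances_from_com here are hypothetical standalone functions; atoms are assumed to be (mass, (x, y, z)) pairs, and atoms without a defined mass (None) are excluded from the center of mass, following the note in calculate_center_of_mass(), but still receive a distance.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (mass, (x, y, z)); mass is None when undefined
Atom = Tuple[Optional[float], Tuple[float, float, float]]


def center_of_mass(atoms: Sequence[Atom]) -> Tuple[float, ...]:
    """Mass-weighted mean position, skipping atoms without a defined mass."""
    weighted = [(m, xyz) for m, xyz in atoms if m]
    total = sum(m for m, _ in weighted)
    return tuple(sum(m * xyz[i] for m, xyz in weighted) / total for i in range(3))


def atom_distances_from_com(atoms: Sequence[Atom]) -> List[float]:
    """Euclidean distance (same units as the coordinates) of each atom from the COM."""
    com = center_of_mass(atoms)
    return [math.dist(xyz, com) for _, xyz in atoms]
```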

find_disulfide_bonds(threshold=2.05)

Find disulfide bonds between cysteine residues in the structure. Looks for SG-SG bonds within a threshold distance.

Parameters:

threshold (float) – Maximum distance to consider a bond between SG atoms, in angstroms. Default is 2.05 Å.

Return type:

List[Tuple]

Returns:

List of tuples of residue pairs forming disulfide bonds
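The SG-SG distance search can be sketched over a mapping from residue labels to SG coordinates. disulfide_pairs and the residue labels are hypothetical, and the inclusive comparison against threshold is an assumption.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def disulfide_pairs(sg_atoms: Dict[str, Coord], threshold: float = 2.05) -> List[Tuple[str, str]]:
    """Residue pairs whose cysteine SG atoms are within threshold (Å)."""
    return [
        (res1, res2)
        for (res1, xyz1), (res2, xyz2) in combinations(sg_atoms.items(), 2)
        if math.dist(xyz1, xyz2) <= threshold
    ]
```

Checking every unordered pair is quadratic in the number of cysteines, which is cheap because cysteines are usually few.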

find_hydrophobic_residues(model=None, chain=None)

Identify hydrophobic residues in the structure.

Parameters:
  • model (Optional[int]) – Model ID to extract from. If None, all models are checked.

  • chain (Optional[str]) – Chain ID to extract from. If None, all chains are checked.

Return type:

List[Tuple]

Returns:

List of tuples (model_id, chain_id, residue) for hydrophobic residues

find_missing_residues(chain=None)

Identify missing residues in the structure based on residue numbering. Useful for identifying gaps in the structure.

Parameters:

chain (Optional[str]) – Chain ID to inspect. If None, all chains are inspected.

Returns:

List of missing residue positions

Return type:

list
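Gap detection from residue numbering alone can be sketched as follows. missing_positions is hypothetical; it only finds gaps between the first and last observed residue numbers and ignores insertion codes, so missing termini are not reported.

```python
from typing import Iterable, List


def missing_positions(resnums: Iterable[int]) -> List[int]:
    """Residue numbers absent between the first and last observed residue."""
    present = sorted(set(resnums))
    if not present:
        return []
    observed = set(present)
    return [n for n in range(present[0], present[-1] + 1) if n not in observed]
```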

find_salt_bridges(model=None, chain=None, cutoff=4.0)

Identify salt bridges between oppositely charged residues. A salt bridge is defined as an interaction between a positively charged residue (Lys, Arg) and a negatively charged residue (Asp, Glu) within a given cutoff distance.

Parameters:
  • model (Optional[int]) – Model ID to search. If None, all models are searched.

  • chain (Optional[str]) – Chain ID to search. If None, all chains are searched.

  • cutoff (float) – Maximum distance for a salt bridge, in Å

Return type:

List[Tuple]

Returns:

List of residue pairs forming salt bridges

generate_df()

Generate the biopandas-like dataframe and update the value of self.df to the new dataframe. This method should be called whenever the internal protein structure is modified or has a transformation applied to it.

Inspired by: https://biopandas.github.io/biopandas

get_aas(model, chain)

Returns the amino acid sequence of a target chain. Ligands, small molecules, and nucleotides are ignored.

Parameters:
  • model (int) – The ID of the model containing the target chain

  • chain (str) – The ID of the chain you want to fetch the AA sequence of

Return type:

str

Returns:

The amino acid sequence of the found chain

get_backbone(model=None, chain=None)

Extract backbone atoms (N, CA, C) from the structure. If model or chain is not provided, extracts from all models/chains.

Parameters:
  • model (Optional[int]) – Model ID to extract from. If None, all models are included.

  • chain (Optional[str]) – Chain ID to extract from. If None, all chains are included.

Return type:

ndarray

Returns:

A numpy array of backbone coordinates (Nx3)

models()

Returns a list of all the model names/IDs.

Returns:

Model names/IDs found within the PDB file

Return type:

list

remove(model, chain=None, resi_start=None, resi_end=None)

Completely removes all parts of a selection from self.structure. If a residue range is provided, all residues between resi_start and resi_end (inclusive) are removed from the structure. If no residue range is provided, all residues in the selected chain are removed.

Parameters:
  • model (int) – ID of model to remove from

  • chain (Optional[str]) – ID of chain to remove from, if not provided will remove all chains in the model

  • resi_start (Optional[int]) – Index of first residue in the range you want to remove

  • resi_end (Optional[int]) – Index of the last residue in the range you want to remove

remove_non_biopolymers(model=None, chain=None)

Removes all ligands, heteroatoms, and non-biopolymer residues from the selected structure. Non-biopolymer residues are considered to be any residues that are not standard amino acids or standard nucleotides (DNA/RNA). If no model or chain is provided, it will remove from the entire structure.

Parameters:
  • model (Optional[int]) – The model ID to process. If None, will use all models.

  • chain (Optional[str]) – The chain ID to process. If None, will use all chains.

remove_nucleotides(model=None, chain=None)

Removes all nucleotides (DNA and RNA) from the structure. If no model or chain is provided, it will remove nucleotides from the entire structure.

Parameters:
  • model (Optional[int]) – The model ID to process. If None, will use all models.

  • chain (Optional[str]) – The chain ID to process. If None, will use all chains.

remove_waters()

Removes all water molecules (residues named ‘WAT’ or ‘HOH’) from the structure. It is suggested to call renumber() afterwards as well.

renumber(model=None, chain=None, start=1)

Renumbers all selected residues. If the selection does not exist, this function does nothing.

Parameters:
  • model (Optional[int]) – The model ID to renumber. If None, will use all models.

  • chain (Optional[int]) – The chain ID to renumber. If None, will use all chains.

  • start (int) – Starting value to increment from, defaults to 1.

save(fpath, format='auto')

Save the structure as a PDB or mmCIF file. Will overwrite any existing files.

Parameters:
  • fpath (str) – File path where you want to save the structure

  • format (str) – File format to save in, either ‘pdb’ or ‘mmcif’, set to ‘auto’ to infer format from extension.

select_residues(selectors, model=None)

Select residues from a protein structure using a string selector.

This method allows for flexible selection of residues in a protein structure based on a string query. The query must be a comma-delimited list of selectors following these patterns:

  • “C”: Select all residues in chain C.

  • “B1”: Select residue with identifier 1 in chain B only.

  • “A10-20”: Select residues with identifiers 10 to 20 (inclusive) in chain A.

  • “A15,A20-23,B”: Select residues 15, 20, 21, 22, and 23 in chain A, and all residues in chain B.

If any selector does not match residues in the structure, an exception is raised.

Parameters:
  • selectors (str) – A string specifying the residue selection query.

  • model (Optional[int]) – The ID of the model to select from. If None, the first model is used.

Returns:

A dictionary where keys are chain IDs and values are sorted lists of residue sequence numbers that match the query.

Return type:

dict

Raises:

ValueError – If a specified chain or residue in the selector does not exist in the structure.
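The selector grammar above can be parsed with a single regular expression per comma-delimited item. This is a sketch, not neurosnap's parser: the structure is a hypothetical {chain_id: [residue numbers]} dict, chain IDs are assumed to be single letters, and a range is assumed to raise only when it matches no residues at all.

```python
import re

# One selector: a chain letter, optionally followed by N or N-M.
SELECTOR = re.compile(r"([A-Za-z])(?:(\d+)(?:-(\d+))?)?")


def select_residues(selectors, structure):
    """Resolve a selector string such as "A15,A20-23,B" against `structure`.

    Returns {chain_id: sorted residue numbers}. Raises ValueError when a chain
    is missing or a selector matches no residues.
    """
    selection = {}
    for sele in selectors.split(","):
        sele = sele.strip()
        m = SELECTOR.fullmatch(sele)
        if m is None:
            raise ValueError(f"Invalid selector: {sele!r}")
        chain, first, last = m.groups()
        if chain not in structure:
            raise ValueError(f"Chain {chain!r} does not exist")
        available = set(structure[chain])
        if first is None:
            matched = available  # bare chain ID selects the whole chain
        else:
            lo = int(first)
            hi = int(last) if last is not None else lo
            matched = available & set(range(lo, hi + 1))  # inclusive range
        if not matched:
            raise ValueError(f"Selector {sele!r} matches no residues")
        selection.setdefault(chain, set()).update(matched)
    return {chain: sorted(nums) for chain, nums in selection.items()}


structure = {"A": list(range(10, 31)), "B": [1, 2, 3]}
print(select_residues("A15,A20-23,B", structure))
# {'A': [15, 20, 21, 22, 23], 'B': [1, 2, 3]}
```

Collecting matches into sets before sorting means overlapping selectors such as "A10-15,A12" do not produce duplicate residue numbers.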

to_sdf(fpath)[source]

Save the current protein structure as an SDF file. Will export all models and chains. Use the remove() method to get rid of undesired regions.

Parameters:

fpath (str) – Path to the output SDF file

neurosnap.protein.animate_pseudo_3D(fig, ax, frames, titles='Protein Animation', interval=200, repeat_delay=0, repeat=True)[source]

Animate multiple Pseudo 3D LineCollection objects.

Parameters:
  • fig (Figure) – Matplotlib figure that contains all the frames

  • ax (Axes) – Matplotlib axes for the figure that contains all the frames

  • frames (LineCollection) – List of LineCollection objects

  • titles (Union[str, List[str]]) – Single title or list of titles corresponding to each frame

  • interval (int) – Delay between frames in milliseconds

  • repeat_delay (int) – The delay in milliseconds between consecutive animation runs, if repeat is True

  • repeat (bool) – Whether the animation repeats when the sequence of frames is completed

Return type:

ArtistAnimation

Returns:

Animation of all the different frames

neurosnap.protein.calc_lDDT(ref_pdb, sample_pdb)[source]

Calculates the lDDT (Local Distance Difference Test) between two proteins.

Parameters:
  • ref_pdb (str) – Filepath for reference protein

  • sample_pdb (str) – Filepath for sample protein

Return type:

float

Returns:

The lDDT score of the two proteins, which ranges from 0 to 1
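To show what the score measures, here is a simplified, superposition-free lDDT over one atom per residue (for example C-alpha), using the commonly cited 15 Å inclusion radius and 0.5, 1, 2, and 4 Å tolerance thresholds. It is an illustrative sketch on coordinate lists, not the calculation calc_lDDT performs on PDB files, which may use all atoms and per-residue averaging.

```python
import math
from itertools import combinations


def lddt_ca(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT over one atom per residue.

    ref, model: equal-length lists of (x, y, z) tuples in residue order.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must have the same number of residues")
    preserved = 0
    total = 0
    for i, j in combinations(range(len(ref)), 2):
        d_ref = math.dist(ref[i], ref[j])
        if d_ref >= cutoff:
            continue  # only pairs within the inclusion radius in the reference count
        diff = abs(d_ref - math.dist(model[i], model[j]))
        preserved += sum(diff < t for t in thresholds)  # one check per threshold
        total += len(thresholds)
    return preserved / total if total else 0.0


ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in ref]
print(lddt_ca(ref, shifted))  # 1.0
```

Because lDDT compares internal distances rather than absolute positions, a rigidly translated copy scores 1.0 without any superposition step.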

neurosnap.protein.extract_non_biopolymers(pdb_file, output_dir, min_atoms=0)[source]

Extracts all non-biopolymer molecules (ligands, heteroatoms, etc.) from the specified PDB file and writes them to SDF files. Each molecule is saved as a separate SDF file in the output directory. Automatically adds hydrogens to molecules. Attempts to sanitize the molecule if possible; logs a warning if sanitization fails.

Parameters:
  • pdb_file (str) – Path to the input PDB file.

  • output_dir (str) – Directory where the SDF files will be saved. Will overwrite existing directory.

  • min_atoms (int) – Minimum number of atoms a molecule must have to be saved. Molecules with fewer atoms are skipped.

Perform a protein structure search using the Foldseek API.

Parameters:
  • protein (Union[Protein, str]) – Either a Protein object or a path to a PDB file.

  • mode (str) – Search mode. Must be one of “3diaa” or “tm-align”.

  • databases (List[str]) – List of databases to search. Defaults to a predefined list if not provided.

  • max_retries (int) – Maximum number of retries to check the job status.

  • retry_interval (int) – Time in seconds between retries for checking job status.

  • output_format (str) – Format of the output, either “json” or “dataframe”.

Return type:

Union[str, DataFrame]

Returns:

Search results in the specified format (JSON string or pandas DataFrame).

Raises:

neurosnap.protein.getAA(query)[source]

Efficiently look up any amino acid by its 1-letter code, 3-letter abbreviation, or full name. See AAs_FULL_TABLE for a list of all supported amino acids and codes.

Parameters:

query (str) – Amino acid code, abbreviation, or name

Return type:

Tuple[str, str, str]

Returns:

A triple of the form (code, abr, name).

  • code is the amino acid 1 letter abbreviation / code

  • abr is the amino acid 3 letter abbreviation / code

  • name is the amino acid full name
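One way to get constant-time lookups by any of the three representations is to index every code, abbreviation, and name into a single dictionary. The sketch below is hypothetical: it lists only four amino acids (the real AAs_FULL_TABLE covers all supported residues), uses uppercase names, and raises ValueError for unknown queries, none of which is confirmed for getAA itself.

```python
# Illustrative subset only; not the contents of AAs_FULL_TABLE.
AA_TABLE = [
    ("A", "ALA", "ALANINE"),
    ("C", "CYS", "CYSTEINE"),
    ("G", "GLY", "GLYCINE"),
    ("W", "TRP", "TRYPTOPHAN"),
]

# Map every representation to the full (code, abr, name) triple.
_LOOKUP = {}
for code, abr, name in AA_TABLE:
    for key in (code, abr, name):
        _LOOKUP[key] = (code, abr, name)


def get_aa(query):
    """Return (code, abr, name) for a 1-letter code, 3-letter code, or name."""
    key = query.strip().upper()  # case-insensitive matching
    if key not in _LOOKUP:
        raise ValueError(f"Unknown amino acid: {query!r}")
    return _LOOKUP[key]


print(get_aa("trp"))  # ('W', 'TRP', 'TRYPTOPHAN')
```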

neurosnap.protein.plot_pseudo_3D(xyz, c=None, ax=None, chainbreak=5, Ls=None, cmap='gist_rainbow', line_w=2.0, cmin=None, cmax=None, zmin=None, zmax=None, shadow=0.95)[source]

Plot the famous Pseudo 3D projection of a protein.

Algorithm originally written by Dr. Sergey Ovchinnikov. Adapted from https://github.com/sokrypton/ColabDesign/blob/16e03c23f2a30a3dcb1775ac25e107424f9f7352/colabdesign/shared/plot.py

Parameters:
  • xyz (Union[ndarray, DataFrame]) – XYZ coordinates of the protein

  • c (ndarray) – 1D array of all the values to use to color the protein, defaults to residue index

  • ax (Axes) – Matplotlib axes object to add the figure to

  • chainbreak (int) – Minimum distance in angstroms between consecutive chains / segments before it is considered a chain break

  • Ls (Optional[List]) – Allows handling multiple chains or segments by providing the lengths of each chain, ensuring that chains are visualized separately without unwanted connections

  • cmap (str) – Matplotlib color map to use for coloring the protein

  • line_w (float) – Line width

  • cmin (Optional[float]) – Minimum value for coloring, automatically calculated if None

  • cmax (Optional[float]) – Maximum value for coloring, automatically calculated if None

  • zmin (Optional[float]) – Minimum z coordinate value, automatically calculated if None

  • zmax (Optional[float]) – Maximum z coordinate value, automatically calculated if None

  • shadow (float) – Shadow intensity between 0 and 1 inclusive; lower values produce darker, more intense shadows

Return type:

LineCollection

Returns:

LineCollection object of what’s been drawn
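The core idea behind a pseudo-3D trace can be sketched without matplotlib: connect consecutive residues with 2D line segments, skip any link longer than chainbreak, and sort segments by depth so nearer ones are drawn on top (a painter's algorithm). This is a simplified illustration under stated assumptions (larger z is closer to the viewer; coloring and shading omitted), not the ColabDesign code that plot_pseudo_3D adapts.

```python
import math


def pseudo_3d_segments(xyz, chainbreak=5.0):
    """Build 2D segments for a pseudo-3D trace, ordered back-to-front.

    xyz: list of (x, y, z) coordinates in residue order.
    Returns ((x0, y0), (x1, y1), z_mid) tuples sorted by z_mid, assuming
    larger z means closer to the viewer, so far segments come first.
    """
    segments = []
    for p, q in zip(xyz, xyz[1:]):
        if math.dist(p, q) > chainbreak:
            continue  # long jump between residues: treat as a chain break
        z_mid = (p[2] + q[2]) / 2
        segments.append(((p[0], p[1]), (q[0], q[1]), z_mid))
    segments.sort(key=lambda seg: seg[2])  # draw the farthest segments first
    return segments


coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 1.0), (3.8, 3.8, -1.0), (20.0, 20.0, 0.0)]
for start, end, depth in pseudo_3d_segments(coords):
    print(start, end, depth)
```

The (start, end) point pairs, in this order, are the kind of segment list matplotlib's LineCollection accepts; depth-dependent shading would then be applied per segment.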

neurosnap.protein.run_blast(sequence, email, matrix='BLOSUM62', alignments=250, scores=250, evalue=10.0, filter=False, gapalign=True, database='uniprotkb_refprotswissprot', output_format=None, output_path=None, return_df=True)[source]

Submits a BLAST job to the EBI NCBI BLAST web service, checks the status periodically, and retrieves the result. The result can be saved either as an XML or FASTA file. Optionally, a DataFrame with alignment details can be returned.

Parameters:
  • sequence (Union[str, Protein]) –

    The input amino acid sequence as a string or a Protein object.

    If a Protein object is provided with multiple chains, an error will be raised, and the user will be prompted to provide a single chain sequence using the Protein.get_aas method.

  • email (str) – The email address to use for communication if there is a problem.

  • matrix (str) –

    The scoring matrix to use, default is "BLOSUM62".

    Must be one of:

    ["BLOSUM45", "BLOSUM62", "BLOSUM80", "PAM30", "PAM70"].
    

  • alignments (int) –

    The number of alignments to display in the result (default is 250). Must be one of:

    [50, 100, 250, 500, 750, 1000]
    

  • scores (int) – The number of scores to display in the result, default is 250.

  • evalue (float) –

    The E-value threshold for alignments (default is 10.0). Must be one of:

    [0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
    

  • filter (bool) – Whether to filter low complexity regions (default is False).

  • gapalign (bool) – Whether to allow gap alignments (default is True).

  • database (str) –

    The database to search in, default is "uniprotkb_refprotswissprot".

    Must be one of:

    ["uniprotkb_refprotswissprot", "uniprotkb_pdb", "uniprotkb", "afdb", "uniprotkb_reference_proteomes", "uniprotkb_swissprot", "uniref100", "uniref90", "uniref50", "uniparc"]
    

  • output_format (Optional[str]) – The format in which to save the result, either "xml" or "fasta". If None, which is the default, no file will be saved.

  • output_path (Optional[str]) – The file path to save the output. This is required if output_format is specified.

  • return_df (bool) – Whether to return a DataFrame with alignment details, default is True.

Return type:

Optional[DataFrame]

Returns:

A pandas DataFrame with BLAST hit and alignment information if return_df is True; otherwise None.

The DataFrame contains the following columns:

  • “Hit ID”: The identifier of the hit sequence.

  • “Accession”: The accession number of the hit sequence.

  • “Description”: The description of the hit sequence.

  • “Length”: The length of the hit sequence.

  • “Score”: The score of the alignment.

  • “Bits”: The bit score of the alignment.

  • “Expectation”: The E-value of the alignment.

  • “Identity (%)”: The percentage identity of the alignment.

  • “Gaps”: The number of gaps in the alignment.

  • “Query Sequence”: The query sequence in the alignment.

  • “Match Sequence”: The matched sequence in the alignment.

Raises:

AssertionError – If sequence is provided as a Protein object with multiple chains.
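Since several run_blast parameters only accept the fixed values listed above, a caller might check options locally before submitting a job. The helper below is a hypothetical pre-check, not a function neurosnap provides; the allowed values are copied from the parameter descriptions, and the library itself may validate differently (for example with assertions).

```python
# Allowed values copied from the run_blast parameter descriptions.
ALLOWED = {
    "matrix": {"BLOSUM45", "BLOSUM62", "BLOSUM80", "PAM30", "PAM70"},
    "alignments": {50, 100, 250, 500, 750, 1000},
    "evalue": {0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0},
    "database": {
        "uniprotkb_refprotswissprot", "uniprotkb_pdb", "uniprotkb", "afdb",
        "uniprotkb_reference_proteomes", "uniprotkb_swissprot",
        "uniref100", "uniref90", "uniref50", "uniparc",
    },
}


def validate_blast_options(matrix="BLOSUM62", alignments=250, evalue=10.0,
                           database="uniprotkb_refprotswissprot",
                           output_format=None, output_path=None):
    """Raise ValueError if an option is outside the documented choices."""
    for name, value in (("matrix", matrix), ("alignments", alignments),
                        ("evalue", evalue), ("database", database)):
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} must be one of {sorted(ALLOWED[name])}")
    if output_format is not None:
        if output_format not in ("xml", "fasta"):
            raise ValueError("output_format must be 'xml' or 'fasta'")
        if output_path is None:
            raise ValueError("output_path is required when output_format is set")


validate_blast_options(matrix="PAM30", evalue=0.001, database="afdb")  # passes silently
```

Failing fast on the client avoids waiting on a remote job that the web service would reject.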

Module contents