neurosnap package
Submodules
neurosnap.api module
- class neurosnap.api.NeurosnapAPI(api_key)[source]
Bases:
object
- BASE_URL = 'https://neurosnap.ai/api'
Disables the sharing feature of a job and makes the job private.
- get_job_file(job_id, file_type, file_name, save_path, share_id=None)[source]
Fetches a specific file from a completed Neurosnap job and saves it to the specified path.
- Parameters:
- Return type:
- Returns:
Tuple of the form (save_path, download_succeeded):
save_path: The path where the file is saved.
download_succeeded: True if the file was downloaded successfully, False otherwise.
- Raises:
HTTPError – If the API request fails.
- get_job_files(job_id, file_type, share_id=None, format_type=None)[source]
Fetches all files from a completed Neurosnap job and optionally prints them.
- Parameters:
- Return type:
- Returns:
A list of file names from the job.
- Raises:
HTTPError – If the API request fails.
- get_jobs(format_type=None)[source]
Fetches and returns a list of submitted jobs. Optionally prints the jobs.
- get_services(format_type=None)[source]
Fetches and returns a list of available Neurosnap services. Optionally prints the services.
- Parameters:
“table”: Prints services in a tabular format with key fields.
“json”: Prints services as formatted JSON.
None (default): No printing.
- Return type:
- Returns:
A list of dictionaries representing available services.
- Raises:
HTTPError – If the API request fails.
- get_team_info(format_type=None)[source]
Fetches your team’s information if you are part of a Neurosnap Team.
- get_team_jobs(format_type=None)[source]
Fetches all the jobs submitted by all members of your Neurosnap Team.
Enables the sharing feature of a job and makes it public.
neurosnap.chemicals module
Provides functions and classes related to processing chemical data.
- neurosnap.chemicals.fetch_ccd(ccd_code, fpath)[source]
Fetches the ideal SDF (Structure Data File) for a given CCD (Chemical Component Dictionary) code and saves it to the specified file path.
This function retrieves the idealized structure of a chemical component from the RCSB Protein Data Bank (PDB) by downloading the corresponding SDF file. The downloaded file is then saved to the specified location.
- Parameters:
- Raises:
HTTPError – If the request to fetch the SDF file fails (e.g., 404 or connection error).
IOError – If there is an issue saving the SDF file to the specified file path.
Example
>>> fetch_ccd("ATP", "ATP_ideal.sdf")
Fetches the ideal SDF file for the ATP molecule and saves it as "ATP_ideal.sdf".
- External Resources:
CCD Information: https://www.wwpdb.org/data/ccd
SDF File Download: https://files.rcsb.org/ligands/download/{CCD_CODE}_ideal.sdf
- neurosnap.chemicals.get_ccds(fpath='~/.cache/ccd_codes.json')[source]
Retrieves a set of all CCD (Chemical Component Dictionary) codes from the PDB.
This function checks for a locally cached JSON file with the CCD codes.
- If the file exists, it reads and returns the set of codes from the cache.
- If the file does not exist, it downloads the full Chemical Component Dictionary (in mmCIF format) from the Protein Data Bank (PDB), extracts the CCD codes, and caches them in a JSON file for future use.
- Parameters:
fpath (str) – Path of the JSON file used to cache the CCD codes. Default is “~/.cache/ccd_codes.json”.
- Returns:
A set of all CCD codes (three-letter codes representing small molecules, ligands, and post-translational modifications).
- Return type:
- Raises:
HTTPError – If the request to the PDB server fails.
JSONDecodeError – If the cached JSON file is corrupted.
- File Cache:
Cached file path: “~/.cache/ccd_codes.json” (the default value of fpath)
The cache is created automatically if it does not exist.
- External Resources:
CCD information: https://www.wwpdb.org/data/ccd
CCD data download: https://files.wwpdb.org/pub/pdb/data/monomers/components.cif
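The cache-or-download behaviour described for get_ccds can be illustrated with a minimal sketch. This is not the library's implementation: the helper name load_or_build_cache and the build callable (standing in for the download and mmCIF parsing step) are hypothetical, and the demo writes to a temporary directory rather than to ~/.cache.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, Set


def load_or_build_cache(fpath: str, build: Callable[[], Iterable[str]]) -> Set[str]:
    """Return the cached codes, or build them and write the cache on a miss."""
    fpath = os.path.expanduser(fpath)  # resolve a leading "~"
    if os.path.exists(fpath):
        with open(fpath) as handle:
            # json.JSONDecodeError propagates if the cache file is corrupted
            return set(json.load(handle))

    codes = set(build())  # cache miss: hypothetical stand-in for download + parsing
    os.makedirs(os.path.dirname(fpath) or ".", exist_ok=True)
    with open(fpath, "w") as handle:
        json.dump(sorted(codes), handle)  # JSON has no set type, so store a list
    return codes


# Demo with a stand-in builder; no network access is involved.
build_calls = []


def fake_build():
    build_calls.append(1)
    return ["ATP", "HEM", "NAG"]


cache_path = os.path.join(tempfile.mkdtemp(), "ccd_codes.json")
first = load_or_build_cache(cache_path, fake_build)   # builds and writes the cache
second = load_or_build_cache(cache_path, fake_build)  # served from the cache file
```

Because the builder runs only on a cache miss, the second call never invokes it. Deleting the cache file forces a rebuild.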
- neurosnap.chemicals.sdf_to_smiles(fpath)[source]
Converts molecules in an SDF file to SMILES strings.
Reads an input SDF file and extracts SMILES strings from its molecules. Invalid or unreadable molecules are skipped, with warnings logged.
- Parameters:
fpath (str) – Path to the input SDF file.
- Returns:
A list of SMILES strings corresponding to valid molecules in the SDF file.
- Return type:
List[str]
- Raises:
FileNotFoundError – If the SDF file cannot be found.
IOError – If the file cannot be read.
- neurosnap.chemicals.smiles_to_sdf(smiles, output_path)[source]
Converts a SMILES string to an SDF file. Will overwrite existing results.
NOTE: This function does the bare minimum in terms of generating the SDF molecule. The neurosnap.conformers module should be used in most cases.
neurosnap.conformers module
Provides functions and classes related to processing and generating conformers.
- neurosnap.conformers.find_LCS(mol)[source]
Finds the largest common substructure (LCS) across a set of conformers and aligns all conformers to the LCS.
- neurosnap.conformers.generate(input_mol, output_name='unique_conformers', write_multi=False, num_confs=1000, min_method='auto', max_atoms=500)[source]
Generate conformers for an input molecule.
Performs the following actions in order:
1. Generate conformers using the ETKDG method
2. Minimize energy of all conformers and remove those below a dynamic threshold
3. Align & create RMSD matrix of all conformers
4. Cluster using the Butina method to remove structurally redundant conformers
5. Return most energetically favorable conformers in each cluster
- Parameters:
input_mol (Any) – Input molecule can be a path to a molecule file, a SMILES string, or an instance of rdkit.Chem.rdchem.Mol
output_name (str) – Output to write SDF files of passing conformers
write_multi (bool) – If True will write all unique conformers to a single SDF file, if False will write all unique conformers in separate SDF files in output_name
num_confs (int) – Number of conformers to generate
min_method (Optional[str]) – Method for minimization, can be either “auto”, “UFF”, “MMFF94”, “MMFF94s”, or None for no minimization
max_atoms (int) – Maximum number of atoms allowed for the input molecule
- Return type:
- Returns:
A dataframe with all conformer statistics. Note if energy minimization is disabled or fails then energy column will consist of None values.
- neurosnap.conformers.minimize(mol, method='MMFF94', percentile=100.0)[source]
Minimize conformer energy (kcal/mol) using RDKit and filter out conformers based on energy percentile.
- Parameters:
mol (Mol) – RDKit mol object (rdkit.Chem.rdchem.Mol) containing the conformers you want to minimize.
method (str) – Can be either UFF, MMFF94, or MMFF94s
percentile (float) – Filters out conformers above a given energy percentile (0 to 100). For example, 10.0 will retain conformers within the lowest 10% energy.
- Return type:
- Returns:
A tuple of the form (mol_filtered, energies):
mol_filtered: Molecule object with filtered conformers.
energies: Dictionary where keys are conformer IDs and values are calculated energies in kcal/mol.
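The percentile filter can be sketched in plain Python on an energies dictionary of the form returned above. This is an illustration only: the function name is hypothetical, and the nearest-rank percentile used here is an assumption, so the library may compute its cutoff differently (for example with interpolation).

```python
import math
from typing import Dict


def filter_by_energy_percentile(energies: Dict[int, float], percentile: float) -> Dict[int, float]:
    """Keep conformers whose energy is at or below the given percentile cutoff."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    if not energies:
        return {}
    ranked = sorted(energies.values())
    # Nearest-rank percentile (assumption): take the value at rank ceil(p * n / 100),
    # with a minimum rank of 1 so that at least one conformer always survives.
    rank = max(1, math.ceil(percentile * len(ranked) / 100.0))
    cutoff = ranked[rank - 1]
    return {conf_id: e for conf_id, e in energies.items() if e <= cutoff}


# Conformer ID -> energy in kcal/mol (made-up values for illustration)
demo_energies = {0: 12.5, 1: 10.1, 2: 30.0, 3: 11.0, 4: 25.2}
kept_40 = filter_by_energy_percentile(demo_energies, 40.0)    # two lowest-energy conformers
kept_all = filter_by_energy_percentile(demo_energies, 100.0)  # default: nothing is removed
```

With the default percentile of 100.0 every conformer is retained, which matches the documented default of the minimize function.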
neurosnap.log module
- class neurosnap.log.CustomLogger(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)[source]
Bases:
Formatter
Custom logger with specialized formatting.
Note
[+] logging.DEBUG: Used for all general info
[*] logging.INFO: Used for more important key info that isn’t negative
[-] logging.WARNING: Used for non-severe info that is negative
[!] logging.ERROR: Used for errors that require attention but aren’t super concerning
[!] logging.CRITICAL: Used for very severe errors that require immediate attention and are concerning
- format(record)[source]
Format the specified record as text.
The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
- log_format_basic = '%(message)s'
- log_format_detailed = '\x1b[90m%(asctime)s\x1b[0m %(message)s \x1b[38;5;204m(%(filename)s:%(lineno)d)\x1b[0m'
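A formatter that prepends a level-specific marker, in the spirit of the markers listed in the note above, might look like the following sketch. The class name PrefixFormatter and the PREFIXES table are hypothetical and are not taken from the neurosnap source. Only the standard logging API is used.

```python
import logging

# Marker per level, mirroring the note above (the "[?]" fallback is an assumption)
PREFIXES = {
    logging.DEBUG: "[+]",
    logging.INFO: "[*]",
    logging.WARNING: "[-]",
    logging.ERROR: "[!]",
    logging.CRITICAL: "[!]",
}


class PrefixFormatter(logging.Formatter):
    """Prepends a level-specific marker to every formatted record."""

    def format(self, record: logging.LogRecord) -> str:
        prefix = PREFIXES.get(record.levelno, "[?]")
        # Delegate the actual formatting to logging.Formatter
        return f"{prefix} {super().format(record)}"


formatter = PrefixFormatter("%(message)s")
demo_record = logging.LogRecord(
    name="demo", level=logging.WARNING, pathname="demo.py", lineno=1,
    msg="disk almost full", args=None, exc_info=None,
)
formatted = formatter.format(demo_record)
```

To use such a formatter, attach it to a handler with handler.setFormatter(formatter) and add the handler to a logger.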
neurosnap.msa module
Provides functions and classes related to processing protein sequence data.
- neurosnap.msa.align_mafft(seqs, ep=0.0, op=1.53)[source]
Generates an alignment using mafft.
- Parameters:
- Return type:
- Returns:
A tuple of the form (out_names, out_seqs):
out_names: list of aligned protein names
out_seqs: list of corresponding protein sequences
- neurosnap.msa.get_seqid(seq1, seq2)[source]
Calculate the pairwise sequence identity of two same length sequences or alignments. Will not perform any alignment steps.
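As a rough illustration of identity without alignment, the sketch below compares two equal-length strings position by position. The function name is hypothetical. Returning a percentage and counting gap characters like any other character are both assumptions, since the documentation does not specify either.

```python
def sequence_identity(seq1: str, seq2: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        return 0.0
    # Position-wise comparison; no alignment is performed
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)


identity = sequence_identity("MKTAYIAK", "MKTAHIAK")  # 7 of 8 positions match
```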
- neurosnap.msa.pad_seqs(seqs, char='-', truncate=False)[source]
Pads all sequences on the right side with a character so that they match the length of the longest sequence.
- Parameters:
- Return type:
- Returns:
The padded sequences
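Right-padding to a common length can be sketched with str.ljust. The function name pad_right is hypothetical, and the truncate option is deliberately left out because its behaviour is not described here.

```python
from typing import List


def pad_right(seqs: List[str], char: str = "-") -> List[str]:
    """Pad every sequence on the right to the length of the longest one."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, char) for s in seqs]


padded = pad_right(["MKT", "MKTAY", "M"])
```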
- neurosnap.msa.read_msa(input_fasta, size=inf, allow_chars='', drop_chars='', remove_chars='*', uppercase=True)[source]
Reads an MSA, a3m, or fasta file and returns an array of names and seqs. Returned headers will consist of all characters up until the first space with the “|” character replaced with an underscore.
- Parameters:
input_fasta (Union[str, TextIOBase]) – Path to the input a3m file, fasta as a raw string, or a file-handle-like object to read
size (float) – Number of rows to read
allow_chars (str) – Sequences that contain characters not included within STANDARD_AAs+allow_chars will throw an exception
drop_chars (str) – Drop sequences that contain these characters. For example, "-X"
remove_chars (str) – Removes these characters from sequences. For example, "*-X"
uppercase (bool) – Converts all amino acid chars to uppercase when True
- Return type:
- Returns:
A tuple of the form (names, seqs):
names: list of protein names from the a3m file, including gaps
seqs: list of protein sequences from the a3m file, including gaps
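The header rule described for read_msa (keep everything up to the first space and replace “|” with an underscore) can be sketched with a small FASTA text parser. This is a simplified illustration, not the library's parser: the function name is hypothetical, only raw text input is handled, and the allow_chars, drop_chars, and size options are omitted.

```python
from typing import List, Tuple


def parse_fasta_text(text: str, remove_chars: str = "*", uppercase: bool = True) -> Tuple[List[str], List[str]]:
    """Parse FASTA/a3m text into parallel lists of names and sequences."""
    names: List[str] = []
    seqs: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Name = header up to the first space, with "|" replaced by "_"
            header = line[1:].split(" ", 1)[0]
            names.append(header.replace("|", "_"))
            seqs.append("")
        elif names:
            # Sequence lines may wrap, so append to the current record
            cleaned = "".join(ch for ch in line if ch not in remove_chars)
            seqs[-1] += cleaned.upper() if uppercase else cleaned
    return names, seqs


demo_text = ">sp|P12345|TEST_HUMAN Example protein\nmkta*\nYIAK\n>seq2\nMK-TA\n"
demo_names, demo_seqs = parse_fasta_text(demo_text)
```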
- neurosnap.msa.run_mmseqs2(seqs, output, database='mmseqs2_uniref_env', use_filter=True, use_templates=False, pairing=None, print_citations=True)[source]
Generate an a3m MSA using the ColabFold API. Will write all results to the output directory including templates, MSAs, and accompanying files.
Code originally adapted from: https://github.com/sokrypton/ColabFold/
- Parameters:
seqs (str) – Amino acid sequences for protein to generate an MSA of
output (str) – Output directory path, will overwrite existing results
database (str) – Choose the database to use, must be either “mmseqs2_uniref_env” or “mmseqs2_uniref”
use_filter (bool) – Enables the diversity and MSA filtering steps that ensure the MSA will not become enormously large (described in the methods section of the ColabFold paper)
use_templates (bool) – Download templates as well using the mmseqs2 results
pairing (Optional[str]) – Can be set to either “greedy”, “complete”, or None for no pairing
print_citations (bool) – Prints citations
- Returns:
a3m_lines: list of a3m lines
template_paths: list of template paths
- neurosnap.msa.run_mmseqs2_modes(seq, output, cov=50, id=90, max_msa=2048, mode='unpaired_paired', print_citations=True)[source]
Generate a multiple sequence alignment (MSA) for the given sequence(s) using ColabFold’s API. Key difference between this function and run_mmseqs2 is that this function supports different modes. The final a3m and most useful a3m file will be written as “output/final.a3m”.
Code originally adapted from: https://github.com/sokrypton/ColabFold/
- Parameters:
seq (Union[str, List[str]]) – Sequence(s) to generate the MSA for. If a list of sequences is provided, they will be considered as a single protein for the MSA.
output (str) – Output directory path, will overwrite existing results.
cov (int) – Coverage of the MSA
id (int) – Identity threshold for the MSA
max_msa (int) – Maximum number of sequences in the MSA
mode (str) – Mode to run the MSA generation in. Must be in ["unpaired", "paired", "unpaired_paired"]
print_citations (bool) – Whether to print the citations in the output.
- neurosnap.msa.run_phmmer(query, database, evalue=10.0, cpu=2)[source]
Run phmmer using a query sequence against a database and return all the sequences that are considered as hits. Shamelessly stolen and adapted from https://github.com/seanrjohnson/protein_gibbs_sampler/blob/a5de349d5f6a474407fc0f19cecf39a0447a20a6/src/pgen/utils.py#L263
- Parameters:
query (str) – Amino acid sequence of the protein you want to find hits for
database (str) – Path to reference database of sequences you want to search for hits and create an alignment with, must be a protein fasta file
evalue (float) – The threshold E value for the phmmer hit to be reported
cpu (int) – The number of CPU cores to be used to run phmmer
- Return type:
- Returns:
List of hits ranked by how good the hits are
- neurosnap.msa.run_phmmer_mafft(query, ref_db_path, size=inf, in_name='input_sequence')[source]
Generate MSA using phmmer and mafft from reference sequences.
- Parameters:
query (str) – Amino acid sequence of the protein whose MSA you want to create
ref_db_path (str) – Path to reference database of sequences with which you want to search for hits and create an alignment
size (int) – Top n number of sequences to keep
in_name (str) – Optional name for input sequence to put in the output
- Return type:
- Returns:
A tuple of the form (out_names, out_seqs):
out_names: list of aligned protein names
out_seqs: list of corresponding protein sequences
neurosnap.protein module
Provides functions and classes related to processing protein data as well as a feature-rich wrapper around protein structures using BioPython.
- class neurosnap.protein.Protein(pdb, format='auto')[source]
Bases:
object
- __call__(model=None, chain=None, res_type=None)[source]
Returns a selection of a copy of the internal dataframe that matches the provided query. If no queries are provided, will return a copy of the internal dataframe.
- Parameters:
- Return type:
- Returns:
Copy of the internal dataframe that matches the input query
- __init__(pdb, format='auto')[source]
Class that wraps around a protein structure.
Utilizes the biopython protein structure under the hood. Atoms that are not part of a chain will automatically be added to a new chain that does not overlap with any existing chains.
- __sub__(other_protein)[source]
Automatically calculate the RMSD of two proteins. Model used will naively be the first models that have identical backbone shapes. Essentially just wraps around self.calculate_rmsd().
- align(other_protein, model1=0, model2=0)[source]
Align another Protein object’s structure to the self.structure of the current object. The other Protein will be transformed and aligned. Only compares backbone atoms (N, CA, C).
- calculate_center_of_mass(model=None, chain=None)[source]
Calculate the center of mass of the protein. Considers only atoms with defined masses.
- calculate_distance_matrix(model=None, chain=None)[source]
Calculate the distance matrix for all alpha-carbon (CA) atoms in the chain. Useful for creating contact maps or proximity analyses.
- Parameters:
- Return type:
- Returns:
A 2D numpy array representing the distance matrix
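A pairwise distance matrix and a contact map derived from it can be sketched in plain Python. The library returns a NumPy array, whereas this illustration uses nested lists to stay dependency-free. The function name, the made-up coordinates, and the 8 Å contact threshold are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def distance_matrix(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of Euclidean distances between all coordinate pairs."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Made-up CA coordinates in Å for three residues
ca_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dmat = distance_matrix(ca_coords)

# Contact map: residue pairs closer than a chosen threshold (8 Å is an assumption)
contacts = [[d <= 8.0 for d in row] for row in dmat]
```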
- calculate_hydrogen_bonds(model=None, chain=None, chain_other=None, donor_acceptor_cutoff=3.5, angle_cutoff=120.0)[source]
Calculate the number of hydrogen bonds in the protein structure. Hydrogen atoms must be explicitly defined within the structure, as implicit hydrogens will not be computed. We recommend using a tool like reduce to add missing hydrogens.
Hydrogen bonds are detected based on distance and angle criteria:
- Distance between donor and acceptor must be less than donor_acceptor_cutoff.
- The angle formed by donor-hydrogen-acceptor must be greater than angle_cutoff.
If model is set to None, hydrogen bonds are calculated only for the first model in the structure.
- If chain_other is None:
Hydrogen bonds are calculated for the specified chain or all chains if chain is also None.
- If chain_other is set to a specific chain:
Hydrogen bonds are calculated only between atoms of chain and chain_other.
If chain_other is specified but chain is not, an exception is raised.
- Parameters:
model (Optional[int]) – Model ID to calculate for. If None, only the first model is considered.
chain (Optional[str]) – Chain ID to calculate for. If None, all chains in the selected model are considered.
chain_other (Optional[str]) – Secondary chain ID for inter-chain hydrogen bonds. If None, intra-chain bonds are calculated.
donor_acceptor_cutoff (float) – Maximum distance between donor and acceptor (in Å). Default is 3.5 Å.
angle_cutoff (float) – Minimum angle for a hydrogen bond (in degrees). Default is 120°.
- Return type:
- Returns:
The total number of hydrogen bonds in the structure.
- Raises:
ValueError – If chain_other is specified but chain is not.
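The two geometric criteria can be checked for a single donor, hydrogen, and acceptor triple as in the sketch below. It illustrates the distance and angle test only. The function names are hypothetical, the coordinates are made up, and the library's own selection of donors and acceptors is not reproduced.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def angle_deg(a: Vec, vertex: Vec, b: Vec) -> float:
    """Angle a-vertex-b in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        raise ValueError("angle is undefined for coincident points")
    dot = sum(x * y for x, y in zip(v1, v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hydrogen_bond(donor: Vec, hydrogen: Vec, acceptor: Vec,
                     donor_acceptor_cutoff: float = 3.5,
                     angle_cutoff: float = 120.0) -> bool:
    """True if both the distance and the angle criterion are satisfied."""
    close_enough = math.dist(donor, acceptor) < donor_acceptor_cutoff
    straight_enough = angle_deg(donor, hydrogen, acceptor) > angle_cutoff
    return close_enough and straight_enough


donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
linear = is_hydrogen_bond(donor, hydrogen, (2.9, 0.0, 0.0))    # 2.9 Å apart, 180° angle
bent = is_hydrogen_bond(donor, hydrogen, (1.0, 1.9, 0.0))      # close, but only a 90° angle
too_far = is_hydrogen_bond(donor, hydrogen, (5.0, 0.0, 0.0))   # 180° angle, but 5 Å apart
```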
- calculate_protein_volume(model=0, chain=None)[source]
Compute an estimate of the protein volume using the van der Waals radii. Uses the sum of atom radii to compute the volume.
- calculate_rmsd(other_protein, model1=0, model2=0, chain1=None, chain2=None, align=True)[source]
Calculate RMSD between the current structure and another protein. Only compares backbone atoms (N, CA, C). RMSD is in angstroms (Å).
- Parameters:
other_protein (Protein) – Another Protein object to compare against
model1 (int) – Model ID of original protein to compare
model2 (int) – Model ID of other protein to compare
chain1 (Optional[str]) – Chain ID of original protein, if not provided compares all chains
chain2 (Optional[str]) – Chain ID of other protein, if not provided compares all chains
align (bool) – Whether to align the structures first using Superimposer
- Return type:
- Returns:
The root-mean-square deviation between the two structures
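For reference, the RMSD of two already-superimposed coordinate sets is the square root of the mean squared distance between paired atoms. The sketch below shows only that calculation. It performs no superposition (roughly the align=False case), the function name is hypothetical, and it assumes the two lists contain the same atoms in the same order.

```python
import math
from typing import Sequence


def rmsd(coords_a: Sequence[Sequence[float]], coords_b: Sequence[Sequence[float]]) -> float:
    """RMSD between paired coordinates, without any prior superposition."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


backbone_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
backbone_b = [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0), (2.0, 2.0, 0.0)]  # same shape, shifted 2 Å
shifted_rmsd = rmsd(backbone_a, backbone_b)
```

A rigid shift yields a non-zero RMSD here precisely because no alignment is applied. Superimposing the structures first would remove it.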
- calculate_surface_area(model=0, level='R')[source]
Calculate the solvent-accessible surface area (SASA) of the protein. Utilizes Biopython’s SASA module.
- Parameters:
- Return type:
- Returns:
Solvent-accessible surface area in Ų
- distances_from_com(model=None, chain=None)[source]
Calculate the distances of all atoms from the center of mass (COM) of the protein.
This method computes the Euclidean distance between the coordinates of each atom and the center of mass of the structure. The center of mass is calculated for the specified model and chain, or for all models and chains if none are provided.
- Parameters:
- Returns:
A 1D NumPy array containing the distances (in Ångströms) between each atom and the center of mass.
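The calculation amounts to a mass-weighted mean position followed by a per-atom Euclidean distance. A dependency-free sketch follows. The function names and the toy coordinates and masses are assumptions, and the library returns a NumPy array rather than a list.

```python
import math
from typing import List, Sequence, Tuple


def center_of_mass(coords: Sequence[Sequence[float]], masses: Sequence[float]) -> Tuple[float, ...]:
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    if total <= 0:
        raise ValueError("total mass must be positive")
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )


def distances_from_center(coords: Sequence[Sequence[float]], masses: Sequence[float]) -> List[float]:
    """Distance of every atom from the center of mass."""
    com = center_of_mass(coords, masses)
    return [math.dist(c, com) for c in coords]


atom_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
atom_masses = [1.0, 3.0]  # the heavier atom pulls the center of mass toward itself
com = center_of_mass(atom_coords, atom_masses)
com_distances = distances_from_center(atom_coords, atom_masses)
```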
- find_disulfide_bonds(threshold=2.05)[source]
Find disulfide bonds between Cysteine residues in the structure. Looks for SG-SG bonds within a threshold distance.
- find_hydrophobic_residues(model=None, chain=None)[source]
Identify hydrophobic residues in the structure.
- find_missing_residues(chain=None)[source]
Identify missing residues in the structure based on residue numbering. Useful for identifying gaps in the structure.
- find_salt_bridges(model=None, chain=None, cutoff=4.0)[source]
Identify salt bridges between oppositely charged residues. A salt bridge is defined as an interaction between a positively charged residue (Lys, Arg) and a negatively charged residue (Asp, Glu) within a given cutoff distance.
- Parameters:
- Return type:
- Returns:
List of residue pairs forming salt bridges
- generate_df()[source]
Generate the biopandas-like dataframe and update the value of self.df to the new dataframe. This method should be called whenever the internal protein structure is modified or has a transformation applied to it.
Inspired by: https://biopandas.github.io/biopandas
- get_aas(model, chain)[source]
Returns the amino acid sequence of a target chain. Ligands, small molecules, and nucleotides are ignored.
- get_backbone(model=None, chain=None)[source]
Extract backbone atoms (N, CA, C) from the structure. If model or chain is not provided, extracts from all models/chains.
- models()[source]
Returns a list of all the model names/IDs.
- Returns:
models: Model names/IDs found within the PDB file
- remove(model, chain=None, resi_start=None, resi_end=None)[source]
Completely removes all parts of a selection from self.structure. If a residue range is provided then all residues between resi_start and resi_end will be removed from the structure (inclusively). If a residue range is not provided then all residues in a chain will be removed.
- Parameters:
model (int) – ID of model to remove from
chain (Optional[str]) – ID of chain to remove from, if not provided will remove all chains in the model
resi_start (Optional[int]) – Index of first residue in the range you want to remove
resi_end (Optional[int]) – Index of last residue in the range you want to remove
- remove_non_biopolymers(model=None, chain=None)[source]
Removes all ligands, heteroatoms, and non-biopolymer residues from the selected structure. Non-biopolymer residues are considered to be any residues that are not standard amino acids or standard nucleotides (DNA/RNA). If no model or chain is provided, it will remove from the entire structure.
- remove_nucleotides(model=None, chain=None)[source]
Removes all nucleotides (DNA and RNA) from the structure. If no model or chain is provided, it will remove nucleotides from the entire structure.
- remove_waters()[source]
Removes all water molecules (residues named ‘WAT’ or ‘HOH’) from the structure. It is suggested to call renumber() afterwards as well.
- renumber(model=None, chain=None, start=1)[source]
Renumbers all selected residues. If selection does not exist this function will do absolutely nothing.
- save(fpath, format='auto')[source]
Save the structure as a PDB or mmCIF file. Will overwrite any existing files.
- select_residues(selectors, model=None)[source]
Select residues from a protein structure using a string selector.
This method allows for flexible selection of residues in a protein structure based on a string query. The query must be a comma-delimited list of selectors following these patterns:
“C”: Select all residues in chain C.
“B1”: Select residue with identifier 1 in chain B only.
“A10-20”: Select residues with identifiers 10 to 20 (inclusive) in chain A.
“A15,A20-23,B”: Select residues 15, 20, 21, 22, 23, and all residues in chain B.
If any selector does not match residues in the structure, an exception is raised.
- Parameters:
- Returns:
- A dictionary where keys are chain IDs and values are sorted
lists of residue sequence numbers that match the query.
- Return type:
- Raises:
ValueError – If a specified chain or residue in the selector does not exist in the structure.
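The selector grammar above can be illustrated with a small parser that resolves a query against a toy structure, represented here as a dictionary mapping chain IDs to residue numbers. This is a sketch under stated assumptions, not the method's implementation: the function name parse_selectors is hypothetical, chain IDs are assumed to be single letters, and residue identifiers are assumed to be non-negative integers.

```python
import re
from typing import Dict, List, Set

# chain letter, optionally followed by a residue number or an inclusive range
SELECTOR = re.compile(r"^([A-Za-z])(?:(\d+)(?:-(\d+))?)?$")


def parse_selectors(selectors: str, structure: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Resolve a comma-delimited selector string against chain -> residue numbers."""
    chosen: Dict[str, Set[int]] = {}
    for token in selectors.split(","):
        token = token.strip()
        match = SELECTOR.match(token)
        if match is None:
            raise ValueError(f"invalid selector: {token!r}")
        chain, start, end = match.groups()
        if chain not in structure:
            raise ValueError(f"chain {chain!r} does not exist")
        if start is None:
            wanted = list(structure[chain])  # bare chain ID selects the whole chain
        else:
            first = int(start)
            last = int(end) if end is not None else first
            wanted = list(range(first, last + 1))  # inclusive range
            missing = [r for r in wanted if r not in structure[chain]]
            if missing:
                raise ValueError(f"residues {missing} not found in chain {chain!r}")
        chosen.setdefault(chain, set()).update(wanted)
    return {chain: sorted(residues) for chain, residues in chosen.items()}


demo_structure = {"A": list(range(1, 31)), "B": [1, 2, 3]}
selection = parse_selectors("A15,A20-23,B", demo_structure)
```

Collecting residues in a set before sorting means overlapping selectors such as "A10-20,A15" do not produce duplicates.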
- neurosnap.protein.animate_pseudo_3D(fig, ax, frames, titles='Protein Animation', interval=200, repeat_delay=0, repeat=True)[source]
Animate multiple Pseudo 3D LineCollection objects.
- Parameters:
fig (Figure) – Matplotlib figure that contains all the frames
ax (Axes) – Matplotlib axes for the figure that contains all the frames
frames (LineCollection) – List of LineCollection objects
titles (Union[str, List[str]]) – Single title or list of titles corresponding to each frame
interval (int) – Delay between frames in milliseconds
repeat_delay (int) – The delay in milliseconds between consecutive animation runs, if repeat is True
repeat (bool) – Whether the animation repeats when the sequence of frames is completed
- Return type:
- Returns:
Animation of all the different frames
- neurosnap.protein.calc_lDDT(ref_pdb, sample_pdb)[source]
Calculates the lDDT (Local Distance Difference Test) between two proteins.
- neurosnap.protein.extract_non_biopolymers(pdb_file, output_dir, min_atoms=0)[source]
Extracts all non-biopolymer molecules (ligands, heteroatoms, etc.) from the specified PDB file and writes them to SDF files. Each molecule is saved as a separate SDF file in the output directory. Automatically adds hydrogens to molecules. Attempts to sanitize the molecule if possible; logs a warning if sanitization fails.
- neurosnap.protein.foldseek_search(protein, mode='3diaa', databases=None, max_retries=10, retry_interval=5, output_format='json')[source]
Perform a protein structure search using the Foldseek API.
- Parameters:
protein (Union[Protein, str]) – Either a Protein object or a path to a PDB file.
mode (str) – Search mode. Must be one of “3diaa” or “tm-align”.
databases (List[str]) – List of databases to search. Defaults to a predefined list if not provided.
max_retries (int) – Maximum number of retries to check the job status.
retry_interval (int) – Time in seconds between retries for checking job status.
output_format (str) – Format of the output, either “json” or “dataframe”.
- Return type:
- Returns:
Search results in the specified format (JSON string or pandas DataFrame).
- Raises:
RuntimeError – If the job fails.
TimeoutError – If the job does not complete within the allotted retries.
ValueError – If an invalid output_format is specified.
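The retry behaviour implied by max_retries and retry_interval can be sketched as a generic polling loop. The function name, the check_status callable, and the status strings ("COMPLETE", "ERROR") are hypothetical placeholders. The actual Foldseek API responses are not reproduced here.

```python
import time
from typing import Callable, List


def wait_for_job(check_status: Callable[[], str], max_retries: int = 10, retry_interval: float = 5.0) -> None:
    """Poll a job until it completes, fails, or the retries run out."""
    for _ in range(max_retries):
        status = check_status()
        if status == "COMPLETE":  # hypothetical status string
            return
        if status == "ERROR":     # hypothetical status string
            raise RuntimeError("job failed")
        time.sleep(retry_interval)  # wait before asking again
    raise TimeoutError(f"job did not complete within {max_retries} retries")


# Demo with a fake status source; no network access is involved.
fake_statuses = iter(["PENDING", "RUNNING", "COMPLETE"])
polled: List[str] = []


def fake_check() -> str:
    status = next(fake_statuses)
    polled.append(status)
    return status


wait_for_job(fake_check, max_retries=5, retry_interval=0.0)
```

With the documented defaults (10 retries, 5 seconds apart) a loop like this gives up after roughly 50 seconds, so long-running searches may need larger values.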
- neurosnap.protein.getAA(query)[source]
Efficiently get any amino acid using either their 1 letter code, 3 letter abbreviation, or full name. See AAs_FULL_TABLE for a list of all supported amino acids and codes.
- neurosnap.protein.plot_pseudo_3D(xyz, c=None, ax=None, chainbreak=5, Ls=None, cmap='gist_rainbow', line_w=2.0, cmin=None, cmax=None, zmin=None, zmax=None, shadow=0.95)[source]
Plot the famous Pseudo 3D projection of a protein.
Algorithm originally written by Dr. Sergey Ovchinnikov. Adapted from https://github.com/sokrypton/ColabDesign/blob/16e03c23f2a30a3dcb1775ac25e107424f9f7352/colabdesign/shared/plot.py
- Parameters:
xyz (Union[ndarray, DataFrame]) – XYZ coordinates of the protein
c (ndarray) – 1D array of all the values to use to color the protein, defaults to residue index
ax (Axes) – Matplotlib axes object to add the figure to
chainbreak (int) – Minimum distance in angstroms between chains / segments before being considered a chain break
Ls (Optional[List]) – Allows handling multiple chains or segments by providing the lengths of each chain, ensuring that chains are visualized separately without unwanted connections
cmap (str) – Matplotlib color map to use for coloring the protein
line_w (float) – Line width
cmin (Optional[float]) – Minimum value for coloring, automatically calculated if None
cmax (Optional[float]) – Maximum value for coloring, automatically calculated if None
zmin (Optional[float]) – Minimum z coordinate values, automatically calculated if None
zmax (Optional[float]) – Maximum z coordinate values, automatically calculated if None
shadow (float) – Shadow intensity between 0 and 1 inclusive, lower numbers mean darker more intense shadows
- Return type:
- Returns:
LineCollection object of what’s been drawn
- neurosnap.protein.run_blast(sequence, email, matrix='BLOSUM62', alignments=250, scores=250, evalue=10.0, filter=False, gapalign=True, database='uniprotkb_refprotswissprot', output_format=None, output_path=None, return_df=True)[source]
Submits a BLAST job to the EBI NCBI BLAST web service, checks the status periodically, and retrieves the result. The result can be saved either as an XML or FASTA file. Optionally, a DataFrame with alignment details can be returned.
- Parameters:
sequence (Union[str, Protein]) – The input amino acid sequence as a string or a Protein object. If a Protein object is provided with multiple chains, an error will be raised, and the user will be prompted to provide a single chain sequence using the Protein.get_aas method.
email (str) – The email address to use for communication if there is a problem.
matrix (str) – The scoring matrix to use, default is "BLOSUM62". Must be one of: ["BLOSUM45", "BLOSUM62", "BLOSUM80", "PAM30", "PAM70"].
alignments (int) – The number of alignments to display in the result (default is 250). Must be one of: [50, 100, 250, 500, 750, 1000].
scores (int) – The number of scores to display in the result, default is 250.
evalue (float) – The E threshold for alignments (default is 10.0). Must be one of: [0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0].
filter (bool) – Whether to filter low complexity regions (default is False).
gapalign (bool) – Whether to allow gap alignments (default is True).
database (str) – The database to search in, default is "uniprotkb_refprotswissprot". Must be one of: ["uniprotkb_refprotswissprot", "uniprotkb_pdb", "uniprotkb", "afdb", "uniprotkb_reference_proteomes", "uniprotkb_swissprot", "uniref100", "uniref90", "uniref50", "uniparc"].
output_format (Optional[str]) – The format in which to save the result, either "xml" or "fasta". If None, which is the default, no file will be saved.
output_path (Optional[str]) – The file path to save the output. This is required if output_format is specified.
return_df (bool) – Whether to return a DataFrame with alignment details, default is True.
- Return type:
- Returns:
A pandas DataFrame with BLAST hit and alignment information, if return_df is True.
The DataFrame contains the following columns:
- “Hit ID”: The identifier of the hit sequence.
- “Accession”: The accession number of the hit sequence.
- “Description”: The description of the hit sequence.
- “Length”: The length of the hit sequence.
- “Score”: The score of the alignment.
- “Bits”: The bit score of the alignment.
- “Expectation”: The E-value of the alignment.
- “Identity (%)”: The percentage identity of the alignment.
- “Gaps”: The number of gaps in the alignment.
- “Query Sequence”: The query sequence in the alignment.
- “Match Sequence”: The matched sequence in the alignment.
- Raises:
AssertionError – If sequence is provided as a Protein object with multiple chains.