singlecellmultiomics.features package

Submodules

singlecellmultiomics.features.exonGTFtoIntronGTF module

singlecellmultiomics.features.exonGTFtoIntronGTF.decodeKvPairs(kv)
singlecellmultiomics.features.exonGTFtoIntronGTF.exonGTF_to_intronGTF(exon_path, id)
singlecellmultiomics.features.exonGTFtoIntronGTF.generate_introns(geneToExonRanges, id, id_to_features)
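Conceptually, deriving introns from exons means taking, for each gene, the gaps between its exons. The sketch below shows that idea in plain Python. The helper `introns_from_exons`, the dictionary input layout and the half-open `(start, end)` convention are assumptions for illustration; this is not the implementation of `generate_introns`.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open; an assumption for this sketch


def introns_from_exons(gene_to_exons: Dict[str, List[Interval]]) -> Dict[str, List[Interval]]:
    """Hypothetical helper: return the gaps between the exons of each gene."""
    gene_to_introns: Dict[str, List[Interval]] = {}
    for gene, exons in gene_to_exons.items():
        # Merge overlapping or touching exons (e.g. from alternative transcripts)
        # so that a gap is only reported where no exon covers the region.
        merged: List[List[int]] = []
        for start, end in sorted(exons):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        # Each intron runs from the end of one merged exon to the start of the next.
        gene_to_introns[gene] = [(left[1], right[0]) for left, right in zip(merged, merged[1:])]
    return gene_to_introns


print(introns_from_exons({"geneA": [(100, 200), (150, 250), (400, 500), (700, 800)]}))
# {'geneA': [(250, 400), (500, 700)]}
```

Merging first is a design choice: without it, overlapping exons from different transcripts would produce negative-length or spurious "introns".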

singlecellmultiomics.features.features module

class singlecellmultiomics.features.features.FeatureAnnotatedObject(features, stranded, capture_locations, auto_set_intron_exon_features)

Bases: object

get_hit_df()

Obtain a dataframe with the hits.

Returns: pd.DataFrame

set_intron_exon_features()
set_spliced(is_spliced)

Set whether the transcript is spliced. False has priority over True.
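One reading of "False has priority over True" is a sticky flag: once any evidence marks the transcript as unspliced, a later True does not override it. A minimal sketch of that rule follows; the class `SplicedState` is hypothetical and not part of the library, and the library may combine the values differently.

```python
from typing import Optional


class SplicedState:
    """Hypothetical illustration of a spliced flag where False wins over True."""

    def __init__(self) -> None:
        self.is_spliced: Optional[bool] = None  # None: no evidence recorded yet

    def set_spliced(self, is_spliced: bool) -> None:
        if is_spliced is False:
            # False always takes effect and cannot be undone by a later True.
            self.is_spliced = False
        elif self.is_spliced is None:
            # True only takes effect if nothing has been recorded yet.
            self.is_spliced = True


state = SplicedState()
state.set_spliced(True)
state.set_spliced(False)
state.set_spliced(True)
print(state.is_spliced)  # False
```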

write_tags()
class singlecellmultiomics.features.features.FeatureContainer(verbose=False)

Bases: singlecellmultiomics.utils.prefetch.Prefetcher

addFeature(chromosome, start, end, name, strand=None, data=None)
addVariant(chromosome, start, value=None, name='SNP', variantType='SNP', end=None)
annotateUTRs(utrs=['three_prime_utr', 'five_prime_utr'])

Flag the exons that contain a UTR.
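A sketch of the flagging step, assuming exons and UTRs are half-open `(start, end)` intervals on the same contig and that "contain" means the UTR lies fully inside the exon. The helper `flag_utr_exons` is hypothetical, and the library may test for overlap rather than containment.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open


def flag_utr_exons(exons: List[Interval], utrs: List[Interval]) -> List[bool]:
    """Hypothetical helper: True for every exon that fully contains at least one UTR."""
    return [
        any(exon_start <= utr_start and utr_end <= exon_end for utr_start, utr_end in utrs)
        for exon_start, exon_end in exons
    ]


exons = [(100, 200), (300, 400), (500, 600)]
utrs = [(100, 130), (560, 600)]  # e.g. a five_prime_utr and a three_prime_utr
print(flag_utr_exons(exons, utrs))  # [True, False, True]
```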

debugMsg(msg)
findFeaturesAt
findFeaturesAtPysamAlign(pysamRead, strand=None, method=1)

Obtain all features overlapping the pysam aligned segment.
  • method 0: query every base of the read
  • method 1: query each aligned block of the read (pysam aligned segment .get_blocks())
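The difference between the two methods can be illustrated in plain Python. The sketch assumes features stored as half-open `(start, end, name)` tuples on one contig and takes blocks shaped like the `(start, end)` pairs returned by pysam's `AlignedSegment.get_blocks()`; both helper functions are hypothetical. In this sketch both strategies return the same features, but the block-based one does one lookup per block instead of one per base.

```python
from typing import List, Set, Tuple

Feature = Tuple[int, int, str]  # (start, end, name), half-open; assumed layout
Block = Tuple[int, int]  # like the pairs from AlignedSegment.get_blocks()


def features_per_base(features: List[Feature], blocks: List[Block]) -> Set[str]:
    """Method 0 style: query every aligned base individually."""
    hits: Set[str] = set()
    for block_start, block_end in blocks:
        for pos in range(block_start, block_end):
            hits.update(name for start, end, name in features if start <= pos < end)
    return hits


def features_per_block(features: List[Feature], blocks: List[Block]) -> Set[str]:
    """Method 1 style: query each aligned block as one interval."""
    hits: Set[str] = set()
    for block_start, block_end in blocks:
        hits.update(name for start, end, name in features
                    if start < block_end and block_start < end)  # interval overlap
    return hits


features = [(0, 50, "exon1"), (80, 120, "intron1"), (150, 300, "exon2")]
blocks = [(40, 60), (160, 200)]  # a spliced read whose gap skips 60..160
print(sorted(features_per_block(features, blocks)))  # ['exon1', 'exon2']
```

Note that neither strategy reports `intron1`: bases skipped by a splice gap are not part of any block.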

findFeaturesBetween(chromosome, sampleStart, sampleEnd, strand=None)
findFeaturesBetweenBRK(chromosome, lookupCoordinateStart, lookupCoordinateEnd, strand=None)

Obtain all features between the start and end coordinates.
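Assuming "between" means overlapping a half-open query window and that `strand=None` disables strand filtering (both assumptions), the lookup reduces to a filter like the hypothetical `features_between` below. A linear scan is used here only for clarity; it is not how the container performs its lookups.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    chromosome: str
    start: int  # half-open [start, end); an assumption for this sketch
    end: int
    name: str
    strand: Optional[str] = None


def features_between(features: List[Feature], chromosome: str, start: int, end: int,
                     strand: Optional[str] = None) -> List[Feature]:
    """Hypothetical helper: features on `chromosome` overlapping [start, end)."""
    return [
        f for f in features
        if f.chromosome == chromosome
        and f.start < end and start < f.end  # intervals overlap
        and (strand is None or f.strand == strand)  # None means any strand
    ]


features = [
    Feature("chr1", 100, 200, "geneA", "+"),
    Feature("chr1", 150, 400, "geneB", "-"),
    Feature("chr2", 100, 200, "geneC", "+"),
]
print([f.name for f in features_between(features, "chr1", 180, 250)])  # ['geneA', 'geneB']
print([f.name for f in features_between(features, "chr1", 180, 250, strand="+")])  # ['geneA']
```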

findNearestFeature
findNearestLeftFeature(chromosome, lookupCoordinate, strand=None)
findNearestRightFeature(chromosome, lookupCoordinate, strand=None)
getCentroids()
getReferenceList()
get_gene_to_location_dict(meta_key='gene_name', with_strand=False)

Generate a dictionary of gene locations: {gene_name: contig, start, end}.

Parameters: meta_key (str) – key of the meta information to use as the primary key of the returned gene_locations
Returns: gene_locations (dict)
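A sketch of the kind of dictionary this method describes, built from hypothetical feature records that carry a metadata dict. The record layout, the tuple form `(contig, start, end)` (with the strand appended when `with_strand=True`) and taking the minimum start and maximum end over all records of a gene are assumptions, not confirmed library behaviour.

```python
from typing import Dict, List


def gene_to_location(records: List[dict], meta_key: str = "gene_name",
                     with_strand: bool = False) -> Dict[str, tuple]:
    """Hypothetical helper: map each gene to the span covered by its records."""
    locations: Dict[str, tuple] = {}
    for rec in records:
        gene = rec["meta"].get(meta_key)
        if gene is None:
            continue  # record lacks the requested meta key
        if gene in locations:
            contig, start, end = locations[gene][:3]
            start, end = min(start, rec["start"]), max(end, rec["end"])
        else:
            contig, start, end = rec["contig"], rec["start"], rec["end"]
        locations[gene] = (contig, start, end, rec["strand"]) if with_strand else (contig, start, end)
    return locations


records = [
    {"contig": "chr1", "start": 100, "end": 200, "strand": "+", "meta": {"gene_name": "GENE1", "gene_id": "ID1"}},
    {"contig": "chr1", "start": 300, "end": 450, "strand": "+", "meta": {"gene_name": "GENE1", "gene_id": "ID1"}},
    {"contig": "chr2", "start": 10, "end": 90, "strand": "-", "meta": {"gene_name": "GENE2", "gene_id": "ID2"}},
]
print(gene_to_location(records))  # {'GENE1': ('chr1', 100, 450), 'GENE2': ('chr2', 10, 90)}
```

Switching `meta_key` to, for example, `"gene_id"` keys the same locations by identifier instead of name.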
instance(arg_update)
loadBED(path, ignChr=False, parseBlocks=True)

Load a UCSC-style BED table.
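A sketch of parsing one BED line, including the block (exon) columns that `parseBlocks` presumably refers to. The helper `parse_bed_line` and its returned dict are hypothetical. BED coordinates are 0-based and half-open, and in BED12 the comma-separated `blockStarts` are relative to `chromStart`.

```python
from typing import Dict, List, Tuple


def parse_bed_line(line: str, parse_blocks: bool = True) -> Dict[str, object]:
    """Hypothetical helper: parse a BED (up to BED12) line into a dict."""
    cols = line.rstrip("\n").split("\t")
    start, end = int(cols[1]), int(cols[2])  # 0-based start, exclusive end
    record: Dict[str, object] = {
        "chrom": cols[0],
        "start": start,
        "end": end,
        "name": cols[3] if len(cols) > 3 else None,
        "strand": cols[5] if len(cols) > 5 else None,
    }
    if parse_blocks and len(cols) >= 12:
        # blockSizes (col 11) and blockStarts (col 12) often end with a trailing comma.
        sizes = [int(x) for x in cols[10].split(",") if x]
        rel_starts = [int(x) for x in cols[11].split(",") if x]
        blocks: List[Tuple[int, int]] = [
            (start + rel, start + rel + size) for rel, size in zip(rel_starts, sizes)
        ]
        record["blocks"] = blocks
    return record


line = "chr1\t1000\t5000\ttx1\t0\t+\t1000\t5000\t0\t2\t500,1000,\t0,3000,"
print(parse_bed_line(line)["blocks"])  # [(1000, 1500), (4000, 5000)]
```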

loadGTF(path, thirdOnly=None, identifierFields=['gene_id'], ignChr=False, select_feature_type=None, exon_select=None, head=None, store_all=False, contig=None, offset=-1, region_start=None, region_end=None)

Load annotations from a GTF file.

ignChr: ignore the "chr" prefix of the annotation chromosome names.
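A sketch of parsing a single GTF line. GTF coordinates are 1-based and inclusive; the sketch converts them to 0-based, half-open by subtracting one from the start. Whether the `offset=-1` parameter does exactly this is an assumption. The helper `parse_gtf_line` is hypothetical, and its attribute regex only handles quoted values.

```python
import re
from typing import Dict, Optional


def parse_gtf_line(line: str, ign_chr: bool = False) -> Optional[Dict[str, object]]:
    """Hypothetical helper: parse one GTF line; returns None for comment lines."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    contig = cols[0]
    if ign_chr and contig.startswith("chr"):
        contig = contig[3:]  # 'chr1' -> '1'
    # Attribute column: key "value"; key "value"; ... (quoted values only)
    attributes = dict(re.findall(r'(\S+)\s+"([^"]*)"', cols[8]))
    return {
        "contig": contig,
        "feature_type": cols[2],
        "start": int(cols[3]) - 1,  # 1-based inclusive start -> 0-based
        "end": int(cols[4]),  # 1-based inclusive end == 0-based exclusive end
        "strand": cols[6],
        "attributes": attributes,
    }


line = 'chr1\ttoy\texon\t101\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE1";'
record = parse_gtf_line(line, ign_chr=True)
print(record["contig"], record["start"], record["end"])  # 1 100 200
```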

loadSNPSFromVcf(vcfFilePath, locations=None)
prefetch(contig, start, end)
preload_GTF(**kwargs)
sort()

Build a coordinate-sorted data structure to enable fast lookups.
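Why a coordinate-sorted structure speeds up lookups can be shown with `bisect`: with features sorted by start and the maximum feature length known, an overlap query only has to inspect features whose start lies in `(query_start - max_len, query_end)`, and both bounds are found in logarithmic time. The class `SortedFeatures` is a hypothetical sketch; the library's internal structure may differ.

```python
import bisect
from typing import List, Tuple

Feature = Tuple[int, int, str]  # (start, end, name), half-open


class SortedFeatures:
    """Hypothetical coordinate-sorted index for overlap queries on one contig."""

    def __init__(self, features: List[Feature]) -> None:
        self.features = sorted(features)
        self.starts = [f[0] for f in self.features]
        self.max_len = max((end - start for start, end, _ in self.features), default=0)

    def overlapping(self, query_start: int, query_end: int) -> List[str]:
        # An overlapping feature needs start < query_end and end > query_start;
        # since end <= start + max_len, that implies start > query_start - max_len.
        lo = bisect.bisect_right(self.starts, query_start - self.max_len)
        hi = bisect.bisect_left(self.starts, query_end)
        return [name for start, end, name in self.features[lo:hi] if end > query_start]


index = SortedFeatures([(100, 200, "a"), (150, 400, "b"), (500, 600, "c"), (50, 80, "d")])
print(index.overlapping(180, 250))  # ['a', 'b']
```

The trade-off of this simple scheme is that one very long feature enlarges `max_len` and therefore the candidate window for every query; interval trees avoid that at the cost of more complexity.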

singlecellmultiomics.features.features.get_gene_id_to_gene_name_conversion_table(annotation_path_exons, featureTypes=['gene_name'])
Create a dictionary converting a gene id to other gene features, such as gene_name or gene_biotype.
Parameters:
  • annotation_path_exons (str) – path to the GTF file (can be gzipped)
  • featureTypes (list) – list of features to convert to, for example ['gene_name', 'gene_biotype']
Returns:

{ gene_id : 'firstFeature_secondFeature' }

Return type:

conversion_dict (dict)
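A sketch of building such a table from GTF lines, joining the requested attributes with an underscore to match the return format above. The helpers `open_annotation` and `gene_id_conversion_table`, the regex-based attribute parsing, "first record per gene wins" and turning missing attributes into empty strings are all assumptions of this sketch.

```python
import gzip
import re
from typing import Dict, Iterable, Sequence


def open_annotation(path: str):
    """Hypothetical helper: open a plain or gzipped GTF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def gene_id_conversion_table(lines: Iterable[str],
                             feature_types: Sequence[str] = ("gene_name",)) -> Dict[str, str]:
    """Hypothetical helper: {gene_id: 'firstFeature_secondFeature'} from GTF lines."""
    table: Dict[str, str] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue  # skip header/comment and malformed lines
        attrs = dict(re.findall(r'(\S+)\s+"([^"]*)"', cols[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None or gene_id in table:
            continue  # in this sketch the first record of each gene wins
        # Missing attributes become empty strings in the joined value.
        table[gene_id] = "_".join(attrs.get(key, "") for key in feature_types)
    return table


gtf = [
    "#!genome-build toy",
    'chr1\ttoy\texon\t101\t200\t.\t+\t.\tgene_id "G1"; gene_name "GENE1"; gene_biotype "protein_coding";',
    'chr1\ttoy\texon\t301\t400\t.\t+\t.\tgene_id "G1"; gene_name "GENE1"; gene_biotype "protein_coding";',
    'chr2\ttoy\texon\t51\t90\t.\t-\t.\tgene_id "G2"; gene_name "GENE2"; gene_biotype "lncRNA";',
]
print(gene_id_conversion_table(gtf, ["gene_name", "gene_biotype"]))
# {'G1': 'GENE1_protein_coding', 'G2': 'GENE2_lncRNA'}
```

For a real file, the lines can come from the handle returned by `open_annotation`, for example `with open_annotation("annotation.gtf.gz") as handle: table = gene_id_conversion_table(handle)`, where the path is a placeholder.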

singlecellmultiomics.features.features.massIdConvert(baseIds, pathToIdMapping='/media/sf_data/references/human/HUMAN_9606_idmapping_selected.tab.gz', targetCol=1)

Convert gene identifiers into another format. A conversion table can be obtained from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/
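A mass conversion of this kind can be sketched as one streaming pass over the tab-separated mapping table, keeping only rows whose identifier was requested. The helper `mass_id_convert_sketch` and its `key_col` parameter are hypothetical; in particular, which column `massIdConvert` matches `baseIds` against is an assumption here (column 0).

```python
import io
from typing import Dict, Iterable, TextIO


def mass_id_convert_sketch(base_ids: Iterable[str], handle: TextIO,
                           key_col: int = 0, target_col: int = 1) -> Dict[str, str]:
    """Hypothetical helper: look up ids in `key_col` of a tab-separated table."""
    wanted = set(base_ids)
    converted: Dict[str, str] = {}
    for line in handle:
        row = line.rstrip("\n").split("\t")
        # Skip rows that are too short; record the target column for requested ids.
        if len(row) > max(key_col, target_col) and row[key_col] in wanted:
            converted[row[key_col]] = row[target_col]
    return converted


# Toy two-row table standing in for the real mapping file.
table = io.StringIO("P1\tPROT1_HUMAN\t100\nP2\tPROT2_HUMAN\t200\n")
print(mass_id_convert_sketch(["P2", "P9"], table))  # {'P2': 'PROT2_HUMAN'}
```

For a gzipped mapping file, a text-mode handle from `gzip.open(path, "rt")` can be passed instead of the in-memory table. Streaming keeps memory proportional to the requested identifiers rather than to the full mapping file.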

Module contents