singlecellmultiomics.features package¶

Submodules¶

singlecellmultiomics.features.exonGTFtoIntronGTF module¶

singlecellmultiomics.features.exonGTFtoIntronGTF.decodeKvPairs(kv)[source]¶

singlecellmultiomics.features.exonGTFtoIntronGTF.exonGTF_to_intronGTF(exon_path, id)[source]¶

singlecellmultiomics.features.exonGTFtoIntronGTF.generate_introns(geneToExonRanges, id, id_to_features)[source]¶

singlecellmultiomics.features.features module¶

class singlecellmultiomics.features.features.FeatureAnnotatedObject(features, stranded, capture_locations, auto_set_intron_exon_features)[source]¶

Bases: object

get_hit_df()[source]¶: Obtain a dataframe with hits. Returns: pd.DataFrame

set_intron_exon_features()[source]¶

set_spliced(is_spliced)[source]¶: Set whether the transcript is spliced; False has priority over True.
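
The priority rule can be illustrated with a minimal sketch. The SplicedState class below is hypothetical and is not the library's implementation; it only assumes that once False has been recorded, a later True does not override it.

```python
class SplicedState:
    """Hypothetical stand-in for the spliced flag; not the library's class."""

    def __init__(self):
        self.is_spliced = None  # nothing recorded yet

    def set_spliced(self, is_spliced):
        # Assumption: False wins. Accept the first value seen, and afterwards
        # only allow a change towards False.
        if self.is_spliced is None or not is_spliced:
            self.is_spliced = bool(is_spliced)


state = SplicedState()
for observation in (True, False, True):
    state.set_spliced(observation)
print(state.is_spliced)  # False
```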

write_tags()[source]¶

class singlecellmultiomics.features.features.FeatureContainer(verbose=False)[source]¶

Bases: singlecellmultiomics.utils.prefetch.Prefetcher

addFeature(chromosome, start, end, name, strand=None, data=None)[source]¶

addVariant(chromosome, start, value=None, name='SNP', variantType='SNP', end=None)[source]¶

annotateUTRs(utrs=['three_prime_utr', 'five_prime_utr'])[source]¶: Flag the exons that contain a UTR.

debugMsg(msg)[source]¶

findFeaturesAt[source]¶

findFeaturesAtPysamAlign(pysamRead, strand=None, method=1)[source]¶: Obtain all features mapping to the pysam aligned segment. method 0: query every base. method 1: query each successive aligned block of the read (as returned by the pysam aligned segment's get_blocks).
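
The difference between the two methods can be sketched without pysam. In the example below the blocks are plain (start, end) tuples, the shape that pysam's AlignedSegment.get_blocks() returns; the feature list, the helper names and the 0-based half-open coordinate convention are assumptions made for illustration, not the library's internals.

```python
def features_at(features, position):
    """Names of features covering a single position (half-open intervals)."""
    return {name for start, end, name in features if start <= position < end}


def features_between(features, query_start, query_end):
    """Names of features overlapping the interval [query_start, query_end)."""
    return {name for start, end, name in features
            if start < query_end and query_start < end}


def hits_per_base(features, blocks):
    """Method 0 style: one lookup for every aligned base."""
    hits = set()
    for block_start, block_end in blocks:
        for position in range(block_start, block_end):
            hits |= features_at(features, position)
    return hits


def hits_per_block(features, blocks):
    """Method 1 style: one interval lookup for every aligned block."""
    hits = set()
    for block_start, block_end in blocks:
        hits |= features_between(features, block_start, block_end)
    return hits


# Made-up annotation and a spliced read aligned as two blocks.
features = [(100, 200, "exon1"), (220, 280, "repeat"), (300, 400, "exon2")]
blocks = [(150, 200), (300, 350)]

print(sorted(hits_per_base(features, blocks)))   # ['exon1', 'exon2']
print(sorted(hits_per_block(features, blocks)))  # ['exon1', 'exon2']
```

Both strategies report the same features here; the per-block variant simply issues two lookups instead of one hundred, and neither reports the feature that lies in the gap between the blocks.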

findFeaturesBetween(chromosome, sampleStart, sampleEnd, strand=None)[source]¶

findFeaturesBetweenBRK(chromosome, lookupCoordinateStart, lookupCoordinateEnd, strand=None)[source]¶: Obtain all features between the start and end coordinates.

findNearestFeature[source]¶

findNearestLeftFeature(chromosome, lookupCoordinate, strand=None)[source]¶

findNearestRightFeature(chromosome, lookupCoordinate, strand=None)[source]¶

getCentroids()[source]¶

getReferenceList()[source]¶

get_gene_to_location_dict(meta_key='gene_name', with_strand=False)[source]¶

Generate a dictionary of the form {gene_name: contig,start,end}.

Parameters:	meta_key (str) – key of the meta information to use as the primary key for the returned gene_locations
Returns:	gene_locations(dict)
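
A minimal sketch of the kind of dictionary described above, assuming each feature record carries a contig, start, end, strand and a dictionary of meta information. The record layout, the gene_to_location helper and the choice to merge several records of one gene into a single spanning interval are assumptions for illustration; the library may aggregate differently.

```python
def gene_to_location(records, meta_key="gene_name", with_strand=False):
    """Map meta[meta_key] to the interval spanning all records of that gene."""
    locations = {}
    for contig, start, end, strand, meta in records:
        key = meta.get(meta_key)
        if key is None:
            continue  # record lacks the requested meta key
        if key in locations:
            # Widen the stored interval so it spans every record seen so far.
            start = min(start, locations[key][1])
            end = max(end, locations[key][2])
        if with_strand:
            locations[key] = (contig, start, end, strand)
        else:
            locations[key] = (contig, start, end)
    return locations


# Made-up records: two exons of GeneA and one of GeneB.
records = [
    ("chr1", 100, 200, "+", {"gene_name": "GeneA"}),
    ("chr1", 300, 400, "+", {"gene_name": "GeneA"}),
    ("chr2", 50, 80, "-", {"gene_name": "GeneB"}),
]
gene_locations = gene_to_location(records)
print(gene_locations["GeneA"])  # ('chr1', 100, 400)
```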

instance(arg_update)[source]¶

loadBED(path, ignChr=False, parseBlocks=True)[source]¶: Load a UCSC-based table.

loadGTF(path, thirdOnly=None, identifierFields=['gene_id'], ignChr=False, select_feature_type=None, exon_select=None, head=None, store_all=False, contig=None, offset=-1, region_start=None, region_end=None)[source]¶: Load annotations from a GTF file. ignChr: ignore the "chr" prefix of the annotation's chromosome name.
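
To make the effect of ignChr concrete, the sketch below parses a single GTF record (nine tab-separated columns, with key "value"; pairs in the last column) and optionally strips a leading "chr" from the chromosome name. parse_gtf_line and its ign_chr argument are hypothetical names used only for this illustration; this is not the parser used by loadGTF.

```python
def parse_gtf_line(line, ign_chr=False):
    """Split one GTF record into a dict of typed fields and attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF record has 9 tab-separated columns")
    (chromosome, source, feature_type, start, end,
     score, strand, frame, attributes) = fields

    if ign_chr and chromosome.startswith("chr"):
        chromosome = chromosome[3:]  # 'chr1' -> '1'

    # Simplification: assumes attribute values contain no ';' characters.
    meta = {}
    for pair in attributes.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition(" ")
        meta[key] = value.strip().strip('"')

    return {
        "chromosome": chromosome,
        "type": feature_type,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "meta": meta,
    }


# Example record written for this sketch.
line = ('chr1\thavana\texon\t11869\t12227\t.\t+\t.\t'
        'gene_id "ENSG00000223972"; gene_name "DDX11L1";')
record = parse_gtf_line(line, ign_chr=True)
print(record["chromosome"], record["meta"]["gene_name"])  # 1 DDX11L1
```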

loadSNPSFromVcf(vcfFilePath, locations=None)[source]¶

prefetch(contig, start, end)[source]¶

preload_GTF(**kwargs)[source]¶

sort()[source]¶: Build a coordinate-sorted data structure to perform fast lookups.
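
Why sorting enables fast lookups can be shown with Python's bisect module: once the start coordinates are sorted, the neighbouring feature is found by binary search instead of a full scan. The SortedFeatures class is a hypothetical sketch for a single chromosome, and defining "nearest" by start coordinate is an assumption; the container's actual data structure is not documented here.

```python
import bisect


class SortedFeatures:
    """Hypothetical coordinate-sorted feature list for one chromosome."""

    def __init__(self, features):
        # features: iterable of (start, end, name) tuples
        self.features = sorted(features)
        self.starts = [feature[0] for feature in self.features]

    def nearest_left(self, coordinate):
        """Feature with the largest start <= coordinate, or None."""
        index = bisect.bisect_right(self.starts, coordinate) - 1
        return self.features[index] if index >= 0 else None

    def nearest_right(self, coordinate):
        """Feature with the smallest start > coordinate, or None."""
        index = bisect.bisect_right(self.starts, coordinate)
        return self.features[index] if index < len(self.features) else None


# Made-up features, deliberately given out of order.
container = SortedFeatures([(500, 600, "c"), (100, 200, "a"), (300, 400, "b")])
print(container.nearest_left(350))   # (300, 400, 'b')
print(container.nearest_right(350))  # (500, 600, 'c')
```

Each lookup costs O(log n) after a one-time O(n log n) sort, which pays off when many reads are annotated against the same feature set.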

singlecellmultiomics.features.features.get_gene_id_to_gene_name_conversion_table(annotation_path_exons, featureTypes=['gene_name'])[source]¶

Create a dictionary converting a gene id to other gene features, such as gene_name, gene_biotype, etc.

Parameters:	annotation_path_exons (str) – path to GTF file (can be gzipped)
	featureTypes (list) – list of features to convert to, for example ['gene_name', 'gene_biotype']
Returns:	conversion_dict, of the form {gene_id: 'firstFeature_secondFeature'}
Return type:	dict
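
The shape of the returned dictionary can be sketched as follows. The conversion_table helper works on an iterable of GTF lines rather than a path, uses made-up gene ids, and substitutes an empty string for a missing attribute; all three are assumptions for illustration and not the library's behaviour.

```python
import re

# Matches one attribute pair in the 9th GTF column, e.g. gene_id "X"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def conversion_table(gtf_lines, feature_types=("gene_name",)):
    """Map each gene_id to its requested attributes joined by '_'."""
    table = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GTF record
        attributes = dict(ATTRIBUTE.findall(columns[8]))
        gene_id = attributes.get("gene_id")
        if gene_id is None:
            continue
        table[gene_id] = "_".join(
            attributes.get(feature_type, "") for feature_type in feature_types
        )
    return table


# Two made-up exon records.
gtf_lines = [
    "#!example header\n",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\t'
    'gene_id "ENSG_A"; gene_name "GeneA"; gene_biotype "protein_coding";\n',
    'chr2\tsrc\texon\t50\t80\t.\t-\t.\t'
    'gene_id "ENSG_B"; gene_name "GeneB"; gene_biotype "lncRNA";\n',
]
table = conversion_table(gtf_lines, ["gene_name", "gene_biotype"])
print(table["ENSG_A"])  # GeneA_protein_coding
```

For a gzipped annotation, the same function could be fed the file object returned by gzip.open(path, "rt") from the standard library.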

singlecellmultiomics.features.features.massIdConvert(baseIds, pathToIdMapping='/media/sf_data/references/human/HUMAN_9606_idmapping_selected.tab.gz', targetCol=1)[source]¶: Convert gene identifiers into another format. A conversion table can be obtained from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/

Module contents¶