singlecellmultiomics.molecule package

Submodules

singlecellmultiomics.molecule.chic module

class singlecellmultiomics.molecule.chic.AnnotatedCHICMolecule(fragment, features, **kwargs)[source]

Bases: singlecellmultiomics.molecule.chic.CHICMolecule, singlecellmultiomics.molecule.featureannotatedmolecule.FeatureAnnotatedMolecule

Chic Molecule which is annotated with features (genes/exons/introns, .. )

Parameters:
  • fragments (singlecellmultiomics.fragment.Fragment) – Fragments to associate to the molecule
  • features (singlecellmultiomics.features.FeatureContainer) – container to use to obtain features from
  • **kwargs – extra args
write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
class singlecellmultiomics.molecule.chic.CHICMolecule(fragment, **kwargs)[source]

Bases: singlecellmultiomics.molecule.molecule.Molecule

CHIC Molecule class

Parameters:
  • fragments (singlecellmultiomics.fragment.Fragment) – Fragments to associate to the molecule
  • **kwargs – extra args
get_cut_site()[source]

For restriction based protocol data, obtain genomic location of cut site

Returns:None if site is not available

chromosome (str) position (int) strand (bool)

get_fragment_span_sequence(reference=None)[source]

Obtain the sequence between the start and end of the molecule

Parameters:reference (pysam.FastaFile) – reference to use. If not specified self.reference is used
Returns:sequence (str)
is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

update_ligation_motif()[source]

Extract lh tag from associated reads and set most common one to the RZ tag

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
class singlecellmultiomics.molecule.chic.CHICNLAMolecule(fragment, reference, **kwargs)[source]

Bases: singlecellmultiomics.molecule.molecule.Molecule

CHIC NLA Molecule class

Parameters:
  • fragments (singlecellmultiomics.fragment.Fragment) – Fragments to associate to the molecule
  • **kwargs – extra args
get_upstream_site(scan_extra_bp=3)[source]
write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)

singlecellmultiomics.molecule.consensus module

singlecellmultiomics.molecule.consensus.base_calling_matrix_to_df(x, ref_info=None, NUC_RADIUS=1, USE_RT=True)[source]

Convert numpy base calling feature matrix to pandas dataframe with annotated columns

Parameters:
  • x (np.array) – feature matrix
  • ref_info (list) – reference position annotations (will be used as index)
  • NUC_RADIUS (int) – generate kmer features target nucleotide
  • USE_RT (bool) – use RT reaction features
Returns:

df (pd.DataFrame)

singlecellmultiomics.molecule.consensus.calculate_consensus(molecule, consensus_model, molecular_identifier, out, **model_kwargs)[source]

Create consensus read for molecule

Parameters:
  • molecule (singlecellmultiomics.molecule.Molecule) –
  • consensus_model
  • molecular_identifier (str) – identier for this molecule, will be suffixed to the reference_id
  • out (pysam.AlingmentFile) – target bam file
  • **model_kwargs – arguments passed to the consensus model
singlecellmultiomics.molecule.consensus.get_consensus_training_data(molecule_iterator, mask_variants=None, n_train=100000, skip_already_covered_bases=True, **feature_matrix_args)[source]

Create a tensor/matrix containing alignment and base calling information, which can be used for consensus calling. This function also creates a vector containing the corresponding reference bases, which can be used for training a consensus model.

Parameters:
  • molecule_iterator – generator which generates molecules from which base calling feature matrices are extracted
  • mask_variants (pysam.VariantFile) – variant locations which should be excluded from the matrix
  • n_train (int) – amount of rows in the matrix
  • skip_already_covered_bases (bool) – when True every reference position is at most a single row in the output matrix, this prevents overfitting
  • **feature_matrix_args – Arguments to pass to the feature matrix function of the molecules.
singlecellmultiomics.molecule.consensus.train_consensus_model(molecule_iterator, mask_variants=None, classifier=None, n_train=100000, skip_already_covered_bases=True, **feature_matrix_args)[source]

singlecellmultiomics.molecule.featureannotatedmolecule module

class singlecellmultiomics.molecule.featureannotatedmolecule.FeatureAnnotatedMolecule(fragment, features, stranded=None, auto_set_intron_exon_features=False, capture_locations=False, **kwargs)[source]

Bases: singlecellmultiomics.molecule.molecule.Molecule

Molecule which is annotated with features (genes/exons/introns, .. )

annotate(method=0)[source]
Parameters:method (int) – 0, obtain blocks and then obtain features. 1, try to obtain features for every aligned base
get_hit_df()[source]

Obtain dataframe with hits :returns: pd.DataFrame

set_intron_exon_features()[source]
set_spliced(is_spliced)[source]

Set wether the transcript is spliced, False has priority over True

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads, call_super=True)[source]

Write molecule information to the supplied reads as BAM tags

class singlecellmultiomics.molecule.featureannotatedmolecule.TranscriptMolecule(fragment, **kwargs)[source]

Bases: singlecellmultiomics.molecule.molecule.Molecule

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)

singlecellmultiomics.molecule.filter module

singlecellmultiomics.molecule.filter.molecule_iterator_filter(molecule_iterator, min_fragments=None, max_fragments=None, min_mapping_qual=None, max_mapping_qual=None, min_ivt_duplicates=None, max_ivt_duplicates=None, both_pairs_mapped=None, min_span=None, max_span=None)[source]

Filter iterable with molecules

molecule_iterator (iterable) : molecules to filter from min_fragments (int) : minimum amount of fragments associated to molecule max_fragments (int) : maximum amount of fragments associated to molecule min_mapping_qual(int) : minimum maximum mapping quality for a single associated fragment max_mapping_qual(int) : maximum maximum mapping quality for a single associated fragment min_ivt_duplicates(int) : minimum amount of in vitro transcription copies max_ivt_duplicates(int) : maximum amount of in vitro transcription copies both_pairs_mapped(bool) : molecule should have at least one fragment with both pairs mapped min_span(int) : minimum amount of bases aligned with reference max_span : maximum amount of bases aligned with reference

singlecellmultiomics.molecule.fourthiouridine module

class singlecellmultiomics.molecule.fourthiouridine.FourThiouridine(fragments=None, classifier=None, **kwargs)[source]

Bases: singlecellmultiomics.molecule.molecule.Molecule

is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

obtain_conversions(classifier=None)[source]

This methods obtains the amount of converted bases and stores them to self.converted_bases and the 4U tag :param classifier: classifier used for consensus determination

Returns:None

singlecellmultiomics.molecule.iterator module

class singlecellmultiomics.molecule.iterator.MoleculeIterator(alignments, molecule_class=<class 'singlecellmultiomics.molecule.molecule.Molecule'>, fragment_class=<class 'singlecellmultiomics.fragment.fragment.Fragment'>, check_eject_every=10000, molecule_class_args={}, fragment_class_args={}, perform_qflag=True, pooling_method=1, yield_invalid=False, yield_overflow=True, query_name_flagger=None, ignore_collisions=True, every_fragment_as_molecule=False, yield_secondary=False, yield_supplementary=False, max_buffer_size=None, iterator_class=<class 'pysamiterators.iterators.iterators.MatePairIterator'>, skip_contigs=None, progress_callback_function=None, min_mapping_qual=None, perform_allele_clustering=False, **pysamArgs)[source]

Bases: object

Iterate over molecules in pysam.AlignmentFile or reads from a generator or list

Example

>>> !wget https://github.com/BuysDB/SingleCellMultiOmics/blob/master/data/mini_nla_test.bam?raw=true -O mini_nla_test.bam
>>> !wget https://github.com/BuysDB/SingleCellMultiOmics/blob/master/data/mini_nla_test.bam.bai?raw=true -O mini_nla_test.bam.bai
>>> import pysam
>>> from singlecellmultiomics.molecule import NlaIIIMolecule, MoleculeIterator
>>> from singlecellmultiomics.fragment import NlaIIIFragment
>>> import pysamiterators
>>> alignments = pysam.AlignmentFile('mini_nla_test.bam')
>>> for molecule in MoleculeIterator(
>>>             alignments=alignments,
>>>             molecule_class=singlecellmultiomics.molecule.NlaIIIMolecule,
>>>             fragment_class=singlecellmultiomics.fragment.NlaIIIFragment,
>>>         ):
>>>     break
>>> molecule
NlaIIIMolecule
with 1 assinged fragments
Allele :No allele assigned
    Fragment:
    sample:APKS1P25-NLAP2L2_57
    umi:CCG
    span:chr1 164834728-164834868
    strand:+
    has R1: yes
    has R2: no
    randomer trimmed: no
    DS:164834865
RS:0
RZ:CAT
Restriction site:('chr1', 164834865)

It is also possible to supply and iterator instead of a SAM/BAM file handle .. rubric:: Example

>>> from singlecellmultiomics.molecule import MoleculeIterator
>>> from singlecellmultiomics.fragment import Fragment
>>> import pysam
>>> # Create SAM file to write some example reads to:
>>> test_sam = pysam.AlignmentFile('test.sam','w',reference_names=['chr1','chr2'],reference_lengths=[1000,1000])
>>> read_A = pysam.AlignedSegment(test_sam.header)
>>> read_A.set_tag('SM','CELL_1')
>>> read_A.set_tag('RX','CAT')
>>> read_A.reference_name = 'chr1'
>>> read_A.reference_start = 100
>>> read_A.query_sequence = 'ATCGGG'
>>> read_A.cigarstring = '6M'
>>> read_A.mapping_quality = 60
>>> # Create a second read which is a duplicate of the previous
>>> read_B = pysam.AlignedSegment(test_sam.header)
>>> read_B.set_tag('SM','CELL_1')
>>> read_B.set_tag('RX','CAT')
>>> read_B.reference_name = 'chr1'
>>> read_B.reference_start = 100
>>> read_B.query_sequence = 'ATCGG'
>>> read_B.cigarstring = '5M'
>>> read_B.mapping_quality = 60
>>> # Create a thids read which is belonging to another cell
>>> read_C = pysam.AlignedSegment(test_sam.header)
>>> read_C.set_tag('SM','CELL_2')
>>> read_C.set_tag('RX','CAT')
>>> read_C.reference_name = 'chr1'
>>> read_C.reference_start = 100
>>> read_C.query_sequence = 'ATCGG'
>>> read_C.cigarstring = '5M'
>>> read_C.mapping_quality = 60
>>> # Set up an iterable containing the reads:
>>> reads = [  read_A,read_B,read_C ]
>>> molecules = []
>>> for molecule in MoleculeIterator( reads ):
>>>     print(molecule)
Molecule
    with 2 assinged fragments
    Allele :No allele assigned
        Fragment:
        sample:CELL_1
        umi:CAT
        span:chr1 100-106
        strand:+
        has R1: yes
        has R2: no
        randomer trimmed: no
    Fragment:
        sample:CELL_1
        umi:CAT
        span:chr1 100-105
        strand:+
        has R1: yes
        has R2: no
        randomer trimmed: no
Molecule
        with 1 assinged fragments
        Allele :No allele assigned
            Fragment:
            sample:CELL_2
            umi:CAT
            span:chr1 100-105
            strand:+
            has R1: yes
            has R2: no
            randomer trimmed: no

In the next example the molecules overlapping with a single location on chromosome ‘1’ position 420000 are extracted Don’t forget to supply check_eject_every = None, this allows non-sorted data to be passed to the MoleculeIterator.

Example

>>> from singlecellmultiomics.bamProcessing import mate_pileup
>>> from singlecellmultiomics.molecule import MoleculeIterator
>>> with pysam.AlignmentFile('example.bam') as alignments:
>>>     for molecule in MoleculeIterator(
>>>         mate_pileup(alignments, contig='1', position=420000, check_eject_every=None)
>>>     ):
>>>         pass

Warning

Make sure the reads being supplied to the MoleculeIterator sorted by genomic coordinate! If the reads are not sorted set check_eject_every=None

get_molecule_cache_size()[source]
yield_func(molecule_to_be_emitted)[source]
class singlecellmultiomics.molecule.iterator.ReadIterator(handle, performProperPairCheck=True, ignore_collisions=False, **kwargs)[source]

Bases: pysamiterators.iterators.iterators.MatePairIterator

singlecellmultiomics.molecule.molecule module

class singlecellmultiomics.molecule.molecule.Molecule(fragments: Optional[Iterable[T_co]] = None, cache_size: int = 10000, reference: Union[pysam.libcfaidx.FastaFile, pysamiterators.iterators.iterators.CachedFasta] = None, min_max_mapping_quality: Optional[int] = None, mapability_reader: Optional[singlecellmultiomics.bamProcessing.bamFunctions.MapabilityReader] = None, allele_resolver: Optional[singlecellmultiomics.alleleTools.alleleTools.AlleleResolver] = None, max_associated_fragments=None, allele_assingment_method=1, **kwargs)[source]

Bases: object

Molecule class, contains one or more associated fragments

fragments

associated fragments

Type:list
spanStart

starting coordinate of molecule, None if not available

Type:int
spanEnd

ending coordinate of molecule, None if not available

Type:int
chromosome

mapping chromosome of molecule, None if not available

Type:str
cache_size

radius of molecule assignment cache

Type:int
reference

reference file, used to obtain base contexts and correct aligned_pairs iteration when the MD tag is not correct

Type:pysam.FastaFile
strand

mapping strand. True when strand is REVERSE False when strand is FORWARD None when strand is not determined

Type:bool
__finalise__()[source]

This function is called when all associated fragments have been gathered

__getitem__(index)[source]

Obtain a fragment belonging to this molecule.

Parameters:index (int) – index of the fragment [0 ,1 , 2 ..]
Returns:fragment (singlecellmultiomics.fragment.Fragment)
__iter__()[source]

Iterate over all associated fragments

Yields:singlecellmultiomics.fragment.Fragment
__len__()[source]

Obtain the amount of fragments associated to the molecule

add_fragment(fragment, use_hash=True)[source]

Associate a fragment with this Molecule

Parameters:fragment (singlecellmultiomics.fragment.Fragment) – Fragment to associate
Returns:Returns False when the fragments which have already been associated to the molecule refuse the fragment
Return type:has_been_added (bool)
Raises:OverflowError – when too many fragments have been associated already control this with .max_associated_fragments attribute
add_molecule(other)[source]

Merge other molecule into this molecule. Merges by assigning all fragments in other to this molecule.

aligned_length
allele
allele_confidence

Returns confidence(int) : a phred scalled confidence value for the allele assignment, returns zero when no allele is associated to the molecule

allele_informative_base_probabilities
allele_likelihoods

Per allele likelihood

Returns:{allele_name : likelihood}
Return type:likelihoods (dict)
allele_probabilities

Per allele probability

Returns:{allele_name : prob}
Return type:likelihoods (dict)
base_confidences
base_likelihoods
base_probabilities
calculate_allele_likelihoods()[source]
calculate_consensus(consensus_model, molecular_identifier, out, **model_kwargs)[source]

Create consensus read for molecule

Parameters:
  • consensus_model
  • molecular_identifier (str) – identier for this molecule, will be suffixed to the reference_id
  • out (pysam.AlingmentFile) – target bam file
  • **model_kwargs – arguments passed to the consensus model
can_be_split_into_allele_molecules
can_be_yielded(chromosome, position)[source]

Check if the molecule is far enough away from the supplied location to be ejected from a buffer.

Parameters:
  • chromosome (str) – chromosome / contig of location to test
  • position (int) – genomic location of location to test
Returns:

True when the molecule is far enough away from the supplied location to be ejected from a buffer.

Return type:

can_be_yielded (bool)

check_variants(variants, exclude_other_calls=True)[source]

Verify variants in molecule

Parameters:variants (pysam.VariantFile) – Variant file handle to extract variants from
Returns:{ (chrom,pos) : ( call (str) ): observations (int) }
Return type:dict (defaultdict( Counter ))
contains_valid_fragment()[source]

Check if an associated fragment exists which returns True for is_valid()

Returns:True when any associated fragment is_valid()
Return type:contains_valid_fragment (bool)
deduplicate_majority(target_bam, read_name, max_N_span=None)[source]
deduplicate_to_single(target_bam, read_name, classifier, reference=None)[source]

Deduplicate all reads associated to this molecule to a single pseudoread

Parameters:
  • target_bam (pysam.AlignmentFile) – file to associate the read with
  • read_name (str) – name of the pseudoread
  • classifier (sklearn classifier) – classifier for consensus prediction
Returns:

Pseudo-read containing aggregated information

Return type:

read (pysam.AlignedSegment)

deduplicate_to_single_CIGAR_spaced(target_bam, read_name, classifier=None, max_N_span=300, reference=None, **feature_matrix_args)[source]

Deduplicate all associated reads to a single pseudoread, when the span is larger than max_N_span the read is split up in multi-segments. Uncovered locations are spaced using N’s in the CIGAR.

Parameters:
  • target_bam (pysam.AlignmentFile) – file to associate the read with
  • read_name (str) – name of the pseudoread
  • classifier (sklearn classifier) – classifier for consensus prediction
Returns:

reads( list [ pysam.AlignedSegment ] )

deduplicate_to_single_CIGAR_spaced_from_dict(target_bam, read_name, base_call_dict, max_N_span=300)[source]

Deduplicate all associated reads to a single pseudoread, when the span is larger than max_N_span the read is split up in multi-segments. Uncovered locations are spaced using N’s in the CIGAR.

Parameters:
  • target_bam (pysam.AlignmentFile) – file to associate the read with
  • read_name (str) – name of the pseudoread
  • classifier (sklearn classifier) – classifier for consensus prediction
Returns:

reads( list [ pysam.AlignedSegment ] )

estimated_max_length

Obtain the estimated size of the fragment, returns None when estimation is not possible Takes into account removed bases (R2) Assumes inwards sequencing orientation

extract_stretch_from_dict(base_call_dict, alignment_start, alignment_end)[source]
generate_partial_reads(obs, max_N_span=None)[source]
get_CIGAR(reference=None)[source]

Get alignment of all associated reads

Returns:reference bases CIGAR : alignment of feature matrix to reference tuples (operation, count) reference(pysam.FastaFile) : reference to fetch reference bases from, if not supplied the MD tag is used
Return type:y
get_a_reference_id()[source]

Obtain a reference id for a random associated mapped read

get_aligned_blocks()[source]

get all consecutive blocks of aligned reference positions

Returns:[ (start, end), (start, end) ]
Return type:sorted list of aligned blocks (list)
get_aligned_reference_bases_dict()[source]

Get dictionary containing all reference bases to which this molecule aligns :returns: { (chrom,pos) : ‘A’, (chrom,pos):’T’, .. } :rtype: aligned_reference_positions (dict)

get_alignment_stats()[source]

Get dictionary containing mean amount clip/insert/deletion/matches per base

Returns:
dictionary {
clips_per_bp(int), deletions_per_bp(int), matches_per_bp(int), inserts_per_bp(int), alternative_hits_per_read(int),

}

Return type:cigar_stats(dict)
get_alignment_tensor(max_reads, window_radius=20, centroid=None, mask_centroid=False, refence_backed=False, skip_missing_reads=False)[source]

Obtain a tensor representation of the molecule alignment around the given centroid

Parameters:
  • max_reads (int) – maximum amount of reads returned in the tensor, this will be the amount of rows/4 of the returned feature matrix
  • window_radius (int) – radius of bp around centroid
  • centroid (int) – center of extracted window, when not specified the cut location of the molecule is used
  • mask_centroid (bool) – when True, mask reference base at centroid with N
  • refence_backed (bool) – when True the molecules reference is used to emit reference bases instead of the MD tag
Returns:

(4*window_radius*2*max_reads) dimensional feature matrix

Return type:

tensor_repr(np.array)

get_allele(allele_resolver=None, return_allele_informative_base_dict=False)[source]

Obtain the allele(s) this molecule maps to

Parameters:
  • allele_resolver (singlecellmultiomics.alleleTools.AlleleResolver) – resolver used
  • return_allele_informative_base_dict (bool) – return dictionary containing the bases used for allele determination
  • defaultdict(list,
  • {'allele1' – [(‘chr18’, 410937, ‘T’), (‘chr18’, 410943, ‘G’), (‘chr18’, 410996, ‘G’), (‘chr18’, 411068, ‘A’)]})
Returns:

Set of strings containing associated alleles

Return type:

alleles(set( str ))

get_allele_likelihoods()[source]

Obtain the allele(s) this molecule maps to

Parameters:
  • allele_resolver (singlecellmultiomics.alleleTools.AlleleResolver) – resolver used
  • return_allele_informative_base_dict (bool) – return dictionary containing the bases used for allele determination
  • defaultdict(list,
  • {'allele1'

    [ (‘chr18’, 410937, ‘T’), (‘chr18’, 410943, ‘G’), (‘chr18’, 410996, ‘G’), (‘chr18’, 411068, ‘A’)]})

Returns:

likelihood, ‘allele_b’:likelihood }

Return type:

{ ‘allele_a’

get_barcode_sequences()[source]

Obtain (Cell) barcode sequences associated to molecule

Returns:barcode sequence(s)
Return type:barcode sequences (set)
get_base_calling_feature_matrix(return_ref_info=False, start=None, end=None, reference=None, NUC_RADIUS=1, USE_RT=True, select_read_groups=None)[source]

Obtain feature matrix for base calling

Parameters:
  • return_ref_info (bool) – return both X and array with feature information
  • start (int) – start of range, genomic position
  • end (int) – end of range (inclusive), genomic position
  • reference (pysam.FastaFile) – reference to fetch reference bases from, if not supplied the MD tag is used
  • NUC_RADIUS (int) – generate kmer features target nucleotide
  • USE_RT (bool) – use RT reaction features
  • select_read_groups (set) – only use reads from these read groups to generate features
get_base_calling_feature_matrix_spaced[source]

Obtain a base-calling feature matrix for all reference aligned bases.

Returns:feature matrix y : reference bases CIGAR : alignment of feature matrix to reference tuples (operation, count) reference(pysam.FastaFile) : reference to fetch reference bases from, if not supplied the MD tag is used
Return type:X
get_base_calling_training_data(mask_variants=None, might_be_variant_function=None, reference=None, **feature_matrix_args)[source]
get_base_confidence_dict()[source]

Get dictionary containing base calls per position and the corresponding confidences

Returns:(contig (str), position (int) ) : base (str) : prob correct (list)
Return type:obs (dict)
get_base_observation_dict(return_refbases=False, allow_N=False, allow_unsafe=True, one_call_per_frag=False, min_cycle_r1=None, max_cycle_r1=None, min_cycle_r2=None, max_cycle_r2=None, use_cache=True, min_bq=None)[source]

Obtain observed bases at reference aligned locations

Parameters:
  • return_refbases (bool) – return both observed bases and reference bases
  • allow_N (bool) – Keep N base calls in observations
  • min_cycle_r1 (int) – Exclude read 1 base calls with a cycle smaller than this value (excludes bases which are trimmed before mapping)
  • max_cycle_r1 (int) – Exclude read 1 base calls with a cycle larger than this value (excludes bases which are trimmed before mapping)
  • min_cycle_r2 (int) – Exclude read 2 base calls with a cycle smaller than this value (excludes bases which are trimmed before mapping)
  • max_cycle_r2 (int) – Exclude read 2 base calls with a cycle larger than this value (excludes bases which are trimmed before mapping)
Returns:

base (string) : obs (int) } and { genome_location (tuple) : base (string) if return_refbases is True }

Return type:

{ genome_location (tuple)

get_base_observation_dict_NOREF(allow_N=False)[source]

identical to get_base_observation_dict but does not obtain reference bases, has to be used when no MD tag is present :param return_refbases: return both observed bases and reference bases :type return_refbases: bool

Returns:base (string) : obs (int) } and { genome_location (tuple) : base (string) if return_refbases is True }
Return type:{ genome_location (tuple)
get_consensus(dove_safe: bool = False, only_include_refbase: str = None, allow_N=False, with_probs_and_obs=False, **get_consensus_dictionaries_kwargs)[source]

Obtain consensus base-calls for the molecule

get_consensus_base(contig, position, classifier=None)[source]

Obtain base call at single position of the molecule

Parameters:
  • contig (str) – contig to extract base call from
  • position (int) – position to extract base call from (zero based)
  • classifier (obj) – base calling classifier
Returns:

base call, or None when no base call could be made

Return type:

base_call (str)

get_consensus_base_frequencies(allow_N=False)[source]

Obtain the frequency of bases in the molecule consensus sequence

Returns:Counter containing base frequecies, for example: { ‘A’:10,’T’:3, C:4 }
Return type:base_frequencies (Counter)
get_consensus_gc_ratio()[source]

Obtain the GC ratio of the molecule consensus sequence

Returns:GC ratio
Return type:gc_ratio(float)
get_consensus_old(base_obs=None, classifier=None, store_consensus=True, reuse_cached_consensus=True, allow_unsafe=False, allow_N=False)[source]

Get dictionary containing consensus calls in respect to reference. By default mayority voting is used to determine the consensus base. If a classifier is supplied the classifier is used to determine the consensus base.

Parameters:
  • base_obs (defaultdict(Counter)) – { genome_location (tuple) : base (string) : obs (int) }
  • classifier – fitted classifier to use for consensus calling. When no classifier is provided the consensus is determined by majority voting
  • store_consensus (bool) – Store the generated consensus for re-use
Returns:

{location : base}

Return type:

consensus (dict)

get_consensus_read(target_file, read_name, consensus=None, phred_scores=None, cigarstring=None, mdstring=None, start=None, supplementary=False)[source]

get pysam.AlignedSegment containing aggregated molecule information

Parameters:
  • target_file (pysam.AlignmentFile) – File to create the read for
  • read_name (str) – name of the read to write
Returns:

read(pysam.AlignedSegment)

get_cut_site()[source]

For restriction based protocol data, obtain genomic location of cut site

Returns:None if site is not available

chromosome (str) position (int) strand (bool)

get_dedup_reads(read_name, target_bam, obs, max_N_span=None)[source]
get_empty_clone(fragments=None)[source]
get_feature_vector()[source]

Obtain a feature vector representation of the molecule

Returns:feature_vector(np.array)
get_fragment_span_sequence(reference=None)[source]
get_html(reference=None, consensus=None, show_reference_sequence=True, show_consensus_sequence=True, reference_bases=None)[source]

Get html representation of the molecule :returns: Html representation of the molecule :rtype: html_rep(str)

get_match_mismatch_frequency(ignore_locations=None)[source]
Get amount of base-calls matching and mismatching the reference sequence,
mismatches in every read are counted
Parameters:ignore_locations (iterable(tuple([chrom(str),pos(int)]))) – Locations not to take into account for the match and mismatch frequency
Returns:matches(int), mismatches(int)
get_max_mapping_qual()[source]

Get max mapping quality of the molecule :returns: max_mapping_qual (float)

get_mean_base_quality(chromosome, position, base=None, not_base=None)[source]

Get the mean phred score at the supplied coordinate and base-call

Parameters:
  • chromosome (str) –
  • position (int) –
  • base (str) – select only reads with this base
  • not_base (str) – select only reads without this base
Returns:

mean_phred_score (float)

get_mean_cycle(chromosome, position, base=None, not_base=None)[source]

Get the mean cycle at the supplied coordinate and base-call

Parameters:
  • chromosome (str) –
  • position (int) –
  • base (str) – select only reads with this base
  • not_base (str) – select only reads without this base
Returns:

mean cycle for R1 and R2

Return type:

mean_cycles (tuple)

get_mean_mapping_qual()[source]

Get mean mapping quality of the molecule

Returns:mean_mapping_qual (float)
get_mean_rt_fragment_size()[source]

Obtain the mean RT reaction fragment size

Returns:mean_rt_size (float)
get_methylated_count(context=3)[source]

Get the total amount of methylated bases

Parameters:context (int) – 3 or 4 base context
Returns:sum of methylated bases in contexts
Return type:r (Counter)
get_methylation_dict()[source]

Obtain methylation dictionary

Returns:(read.reference_name, rpos) : times seen methylated
methylated_state (dict):
{(read.reference_name, rpos) : 1/0/-1 } 1 for methylated 0 for unmethylated -1 for unknown
Return type:methylated_positions (Counter)
get_min_mapping_qual() → float[source]

Get min mapping quality of the molecule :returns: min_mapping_qual (float)

get_raw_barcode_sequences()[source]

Obtain (Cell) barcode sequences associated to molecule, not hamming corrected

Returns:barcode sequence(s)
Return type:barcode sequences (set)
get_rt_reaction_fragment_sizes(max_N_distance=1)[source]

Obtain all RT reaction fragment sizes :returns: rt_sizes (list of ints)

get_rt_reactions() → dict[source]

Obtain RT reaction dictionary

Returns:{(primer,pos) : [fragment, fragment..] }
Return type:rt_dict (dict)
get_safely_aligned_length()[source]

Get the amount of safely aligned bases (excludes primers) :returns:

Amount of safely aligned bases
or None when this cannot be determined because both mates are not mapped
Return type:aligned_bases (int)
get_sample()[source]

Obtain sample

Returns:Sample associated with the molecule. Usually extracted from SM tag. Calls fragment.get_sample() to obtain the sample
Return type:sample (str)
get_span_sequence(reference=None)[source]

Obtain the sequence between the start and end of the molecule :param reference: reference to use.

If not specified self.reference is used
Returns:sequence (str)
get_strand()[source]

Obtain mapping strand of molecule

Returns:
True,False,None
True when strand is REVERSE False when strand is FORWARD None when strand is not determined
Return type:strand
get_strand_repr(unknown='?')[source]

Get string representation of mapping strand

Parameters:unknown (str) – set what character/string to return when the strand is not available
Returns:
  • forward, - reverse, ? unknown
Return type:strand_repr (str)
get_tag_counter()[source]

Obtain a dictionary with tag -> value -> frequency

Returns:{ tag(str) : { value(int/str): frequency:(int) }
Return type:tag_obs (defaultdict(Counter))
get_umi()[source]

Obtain umi of molecule

Returns:return main umi associated with this molecule
Return type:umi (str)
get_umi_error_rate()[source]

Obtain fraction of fragments that are associated to the molecule with a exact matching UMI vs total amount of associated fragments :returns: exact_matching_fraction (float)

has_valid_span()[source]

Check if the span of the molecule is determined

Returns:has_valid_span (bool)
is_completely_matching

checks if all cigar operations are M, Returns True when all cigar operations are M, False otherwise

Type:Checks if all associated reads are completely mapped
is_multimapped()[source]

Check if the molecule is multimapping

Returns:True when multimapping
Return type:is_multimapped (bool)
is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

iter_reads()[source]

Iterate over all associated reads :returns: generator (pysam.AlignedSegment)

library

Associated library

Returns:Library associated with the first read of this molecule
Return type:library (str)
set_meta(tag, value)[source]

Set meta information to all fragments

Parameters:
  • tag (str) – 2 letter tag
  • value – value to set
set_methylation_call_tags(call_dict, bismark_call_tag='XM', total_methylated_tag='MC', total_unmethylated_tag='uC', total_methylated_CPG_tag='sZ', total_unmethylated_CPG_tag='sz', total_methylated_CHH_tag='sH', total_unmethylated_CHH_tag='sh', total_methylated_CHG_tag='sX', total_unmethylated_CHG_tag='sx', reads=None)[source]

Set methylation call tags given a methylation dictionary

This method sets multiple tags in every read associated to the molecule. The tags being set are the bismark_call_tag, every aligned base is annotated with a zZxXhH or “.”, and a tag for both the total methylated C’s and unmethylated C’s

Parameters:
  • call_dict (dict) – Dictionary containing bismark calls (chrom,pos) : {‘context’:letter,’reference_base’: letter , ‘consensus’: letter, optiona: ‘qual’: pred_score (int) }
  • bismark_call_tag (str) – tag to write bismark call string
  • total_methylated_tag (str) – tag to write total methylated bases
  • total_unmethylated_tag (str) – tag to write total unmethylated bases
  • reads (iterable) – reads to write the tags to, when not supplied, the tags are written to all associated reads
Returns:

can_be_yielded (bool)

set_rejection_reason(reason, set_qcfail=False)[source]

Add rejection reason to all fragments associated to this molecule

Parameters:
  • reason (str) – rejection reason to set
  • set_qcfail (bool) – set qcfail bit to True for all associated reads
span_len
split_into_allele_molecules()[source]

Split this molecule into multiple molecules, associated to multiple alleles :returns: list :rtype: list_of_molecules

update_mapability(set_mq_zero=False)[source]

Update mapability of this molecule. mapping qualities are set to 0 if the mapability_reader returns False for site_is_mapable

The mapability_reader can be set when initiating the molecule, or added later.

Parameters:
  • set_mq_zero (bool) – set mapping quality of associated reads to 0 when the
  • reader returns a bad verdict (mappability) –

Tip

Use createMapabilityIndex.py to create an index to feed to the mapability_reader

update_umi()[source]

Set UMI sets self.umi (str) sets the most common umi associated to the molecule

write_allele_phasing_information_tag(allele_resolver=None, tag='ap', reads=None)[source]

Write allele phasing information to ap tag

For every associated read a tag wil be written containing: chromosome,postion,base,allele_name|chromosome,postion,base,allele_name|… for all variants found by the AlleleResolver

write_pysam(target_file, consensus=False, no_source_reads=False, consensus_name=None, consensus_read_callback=None, consensus_read_callback_kwargs=None)[source]

Write all associated reads to the target file

Parameters:
  • target_file (pysam.AlignmentFile) – Target file
  • consensus (bool) – write consensus
  • no_source_reads (bool) – while in consensus mode, don’t write original reads
  • consensus_read_callback (function) – this function is called with every consensus read as an arguments
  • consensus_read_callback_kwargs (dict) – arguments to pass to the callback function
write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads)[source]

Write molecule information to the supplied reads as BAM tags

singlecellmultiomics.molecule.molecule.consensii_default_vector()[source]

a numpy vector with 5 elements initialsed as zeros

singlecellmultiomics.molecule.molecule.detect_alleles(molecules, contig, position, min_cell_obs=3, base_confidence_threshold=None, classifier=None)[source]

Detect the alleles (variable bases) present at the selected location

Parameters:
  • molecules – generator to extract molecules from
  • variant_location (tuple) – (contig, position) zero based location of the location to test
  • min_cell_obs (int) – minimum amount of cells containing the allele to be emitted
  • confidence_threshold (float) – minimum confidence of concensus base-call to be taken into account
  • classifier (obj) – classifier used for consensus call, when no classifier is supplied a mayority vote is used
singlecellmultiomics.molecule.molecule.get_variant_phase(molecules, contig, position, variant_base, allele_resolver, phasing_ratio_threshold=None)[source]
singlecellmultiomics.molecule.molecule.might_be_variant[source]

Returns True if a variant exists at the given coordinate

singlecellmultiomics.molecule.molecule.molecule_to_random_primer_dict(molecule, primer_length=6, primer_read=2, max_N_distance=0)[source]

singlecellmultiomics.molecule.nlaIII module

class singlecellmultiomics.molecule.nlaIII.AnnotatedNLAIIIMolecule(fragment, features, **kwargs)[source]

Bases: singlecellmultiomics.molecule.featureannotatedmolecule.FeatureAnnotatedMolecule, singlecellmultiomics.molecule.nlaIII.NlaIIIMolecule

Nla III based Molecule which is annotated with features (genes/exons/introns, .. )

Parameters:
  • fragments (singlecellmultiomics.fragment.Fragment) – Fragments to associate to the molecule
  • features (singlecellmultiomics.features.FeatureContainer) – container to use to obtain features from
  • **kwargs – extra args
write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads)[source]

Write molecule information to the supplied reads as BAM tags

class singlecellmultiomics.molecule.nlaIII.NlaIIIMolecule(fragment, site_has_to_be_mapped=False, **kwargs)[source]

Bases: singlecellmultiomics.molecule.molecule.Molecule

Nla III based Molecule class

Parameters:
  • fragments (singlecellmultiomics.fragment.Fragment) – Fragments to associate to the molecule
  • **kwargs – extra args
get_fragment_span_sequence(reference=None)[source]

Obtain the sequence between the start and end of the molecule :param reference: reference to use.

If not specified self.reference is used
Returns:sequence (str)
get_undigested_site_count(reference=None)[source]

Obtain the amount of undigested sites in the span of the molecule

Parameters:reference (pysam.FastaFile) – reference handle
Returns:int amount of undigested cut sites in the mapping span of the molecule
Return type:undigested_site_count
Raises:ValueError – when the span of the molecule is not properly defined
is_valid(set_rejection_reasons=False, reference=None)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads, call_super=True)[source]

Write molecule information to the supplied reads as BAM tags

singlecellmultiomics.molecule.rna module

class singlecellmultiomics.molecule.rna.VASA(fragment, features, **kwargs)[source]

Bases: singlecellmultiomics.molecule.featureannotatedmolecule.FeatureAnnotatedMolecule

VASA seq molecule

Parameters:
  • fragments (singlecellmultiomics.fragment.Fragment) – Fragments to associate to the molecule
  • features (singlecellmultiomics.features.FeatureContainer) – container to use to obtain features from
  • **kwargs – extra args

singlecellmultiomics.molecule.taps module

class singlecellmultiomics.molecule.taps.AnnotatedTAPSCHICMolecule(fragments=None, features=None, taps_strand='R', taps=None, **kwargs)[source]

Bases: singlecellmultiomics.molecule.chic.AnnotatedCHICMolecule, singlecellmultiomics.molecule.taps.TAPSMolecule

Molecule class for combined TAPS, CHIC and transcriptome

is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads)[source]

Write molecule information to the supplied reads as BAM tags

class singlecellmultiomics.molecule.taps.AnnotatedTAPSNlaIIIMolecule(fragments=None, features=None, taps=None, **kwargs)[source]

Bases: singlecellmultiomics.molecule.nlaIII.AnnotatedNLAIIIMolecule, singlecellmultiomics.molecule.taps.TAPSMolecule

Molecule class for combined TAPS, NLAIII and transcriptome

is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads)[source]

Write molecule information to the supplied reads as BAM tags

class singlecellmultiomics.molecule.taps.TAPS(reference=None, reference_variants=None, taps_strand='F', **kwargs)[source]

Bases: object

molecule_to_context_call_dict(molecule)[source]

Extract bismark call_string dictionary from a molecule

Parameters:molecule (singlecellmultiomics.molecule.Molecule) – molecule to extract calls from
Returns:{(chrom,pos):bismark call letter}
Return type:context_call_dict(dict)
overlap_tag = None

z unmethylated C in CpG context (CG) Z methylated C in CpG context (CG) x unmethylated C in CHG context ( C[ACT]G ) X methylated C in CHG context ( C[ACT]G ) h unmethylated C in CHH context ( C[ACT][ACT] ) H methylated C in CHH context ( C[ACT][ACT] )

position_to_context(chromosome, position, ref_base, observed_base='N', strand=None, reference=None)[source]

Extract bismark call letter from a chromosomal location given the observed base

Parameters:
  • chromosome (str) – chromosome / contig of location to test
  • position (int) – genomic location of location to test
  • observed_base (str) – base observed in the read
  • strand (int) – mapping strand, False: forward, True: Reverse
Returns:

3 basepair context bismark_letter(str) : bismark call

Return type:

context(str)

class singlecellmultiomics.molecule.taps.TAPSCHICMolecule(fragments=None, taps=None, taps_strand='R', **kwargs)[source]

Bases: singlecellmultiomics.molecule.chic.CHICMolecule, singlecellmultiomics.molecule.taps.TAPSMolecule

Molecule class for combined TAPS and CHIC

is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads)[source]

Write molecule information to the supplied reads as BAM tags

class singlecellmultiomics.molecule.taps.TAPSMolecule(fragments=None, taps=None, classifier=None, taps_strand='F', allow_unsafe_base_calls=False, methylation_consensus_kwargs=None, **kwargs)[source]

Bases: singlecellmultiomics.molecule.molecule.Molecule

add_cpg_color_tag_to_read(read)[source]
is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

obtain_methylation_calls(**get_consensus_dictionaries_kwargs)[source]

This methods returns a methylation call dictionary

Returns:(chrom,pos) : {‘consensus’: consensusBase, ‘reference’: referenceBase, ‘call’: call}
Return type:mcalls(dict)
write_tags_to_psuedoreads(reads, call_super=True)[source]

Write molecule information to the supplied reads as BAM tags

class singlecellmultiomics.molecule.taps.TAPSNlaIIIMolecule(fragments=None, taps=None, taps_strand='R', **kwargs)[source]

Bases: singlecellmultiomics.molecule.nlaIII.NlaIIIMolecule, singlecellmultiomics.molecule.taps.TAPSMolecule

Molecule class for combined TAPS and NLAIII

is_valid(set_rejection_reasons=False)[source]

Check if the molecule is valid All of the following requirements should be met: - no multimapping - no low mapping mapping_quality (Change molecule.min_max_mapping_quality to set the threshold) - molecule is associated with at least one valid fragment

Parameters:
  • set_rejection_reasons (bool) – When set to True, all reads get a
  • reason (rejection) –
Returns:

True when all requirements are met, False otherwise

Return type:

is_valid (bool)

write_tags()[source]

Write BAM tags to all reads associated to this molecule

This function sets the following tags:
  • mI : most common umi
  • DA : allele
  • af : amount of associated fragments
  • rt : rt_reaction_index
  • rd : rt_duplicate_index
  • TR : Total RT reactions
  • ap : phasing information (if allele_resolver is set)
  • TF : total fragments
  • ms : size of the molecule (largest fragment)
write_tags_to_psuedoreads(reads)[source]

Write molecule information to the supplied reads as BAM tags

class singlecellmultiomics.molecule.taps.TAPSPTaggedMolecule(fragments=None, features=None, taps=None, **kwargs)[source]

Bases: singlecellmultiomics.molecule.taps.AnnotatedTAPSNlaIIIMolecule

write_tags_to_psuedoreads(reads)[source]

Write molecule information to the supplied reads as BAM tags

singlecellmultiomics.molecule.taps.strip_array_tags(molecule)[source]

Module contents