singlecellmultiomics.universalBamTagger package

Submodules

singlecellmultiomics.universalBamTagger.4SUtagger module

singlecellmultiomics.universalBamTagger.4SUtagger.substitution_plot_stranded(pattern_counts: dict, figsize: tuple = (12, 4), conversion_colors: tuple = ('b', 'k', 'r', 'grey', 'g', 'pink', 'b', 'k', 'r', 'k', 'w', 'g'), ylabel: str = '# conversions per molecule', add_main_group_labels: bool = True, ax=None, fig=None, **plot_args)[source]

Create 3bp substitution plot

Parameters:
  • pattern_counts (OrderedDict) –

    Dictionary containing the substitutions to plot. Use variants.vcf_to_variant_contexts to create it. Format: OrderedDict([(('ACA', 'A'), 0), (('ACC', 'A'), 1), (('ACG', 'A'), 0), … (('TTG', 'G'), 0), (('TTT', 'G'), 0)])
  • figsize (tuple) – size of the figure to create
  • conversion_colors (tuple) – colors to use for the conversion groups
  • ylabel (str) – y axis label
  • add_main_group_labels (bool) – Add conversion group labels to top of plot
  • **plot_args – Additional arguments to pass to .plot()

Returns:
  • fig – handle to the figure
  • ax – handle to the axis

Example

>>> from singlecellmultiomics.variants import vcf_to_variant_contexts, substitution_plot
>>> import matplotlib.pyplot as plt
>>> pobs = vcf_to_variant_contexts('variants.vcf.gz', 'reference.fasta')
>>> for sample, conversions in pobs.items():
...     fig, ax = substitution_plot(conversions)
...     ax.set_title(sample)
...     plt.show()
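
When no VCF file is at hand, a dictionary in the pattern_counts format described above can be built manually. The following is a minimal sketch, assuming the pyrimidine-centred layout implied by the Format example (central base C or T, 96 keys in total). The helper name empty_pattern_counts is hypothetical and not part of the package. The 12 default conversion_colors suggest that the stranded plot may expect additional groups, which this sketch does not cover.

```python
from collections import OrderedDict

BASES = "ACGT"


def empty_pattern_counts():
    """Return an OrderedDict with one zero-initialised entry per
    (3bp context, substituted base) pair.

    Hypothetical helper: assumes pyrimidine-centred contexts, ordered
    as in the documented Format example.
    """
    counts = OrderedDict()
    for ref in "CT":  # central (reference) base
        for alt in BASES:  # base the central base is converted to
            if alt == ref:
                continue
            for five_prime in BASES:
                for three_prime in BASES:
                    context = five_prime + ref + three_prime
                    counts[(context, alt)] = 0
    return counts


pattern_counts = empty_pattern_counts()
# Record one C>A conversion observed in the ACC context
pattern_counts[("ACC", "A")] += 1
```

The resulting dictionary starts at ('ACA', 'A') and ends at ('TTT', 'G'), matching the order shown in the Format example.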

singlecellmultiomics.universalBamTagger.bamtagmultiome module

singlecellmultiomics.universalBamTagger.bamtagmultiome.run_multiome_tagging(args)[source]

Run multiome tagging, which adds molecule information

Parameters:
  • bamin (str) – bam file to process, sam files can also be supplied but will be converted
  • o (str) – path to output bam file
  • method (str) – Protocol to tag, select from: nla, qflag, chic, nla_transcriptome, vasa, cs, nla_taps, chic_taps, nla_no_overhang, scartrace
  • qflagger (str) – Query flagging algorithm to use, this algorithm extracts UMI and sample information from your reads. When no query flagging algorithm is specified, the singlecellmultiomics.universalBamTagger.universalBamTagger.QueryNameFlagger is used
  • method – Method name: the kind of molecules to extract. Select from:
      • nla (data digested by the Nla III enzyme)
      • qflag (only add basic tags like sample and UMI, no molecule assignment)
      • chic (data digested using an MNase fusion)
      • nla_transcriptome (data with transcriptome and genome digested by Nla III)
      • vasa (VASA transcriptomic data)
      • cs (CELseq data, 1 and 2)
      • cs_feature_counts (single end; deduplicate using a bam file tagged using featurecounts, deduplicates a UMI per gene)
      • fl_feature_counts (deduplicate using a bam file tagged using featurecounts, deduplicates based on fragment position)
      • nla_taps (data digested by the Nla III enzyme and methylation converted by TAPS)
      • chic_taps (data digested by the MNase enzyme and methylation converted by TAPS)
      • chic_taps_transcriptome (same as chic_taps, but includes annotations)
      • chic_nla
      • scartrace (lineage tracing protocol)
  • custom_flags (str) – Arguments passed to the query name flagger, comma separated “MI,RX,bi,SM”
  • ref (str) – Path to reference fasta file, autodetected from bam header when not supplied
  • umi_hamming_distance (int) – Max Hamming distance on UMIs
  • head (int) – Number of molecules to process
  • contig (str) – only process this contig
  • region_start (int) – Zero-based start coordinate of the single region to process
  • region_end (int) – Zero-based end coordinate of the single region to process. When None: all contigs are processed when contig is not set, the complete contig when contig is set.
  • alleles (str) – path to allele VCF
  • allele_samples (str) – Comma separated samples to extract from the VCF file. For example B6,SPRET
  • unphased_alleles (str) – Path to VCF containing unphased germline SNPs
  • mapfile (str) – Path to *.safe.bgzf file, used to decide if molecules are uniquely mappable, generate one using createMapabilityIndex.py
  • annotmethod (int) – Annotation resolving method. 0: molecule consensus aligned blocks. 1: per read per aligned base
  • cluster (bool) – Run contigs in separate cluster jobs
  • resolve_unproperly_paired_reads (bool) – When enabled, bamtagmultiome searches the complete bam file for the mate; the two mates will always end up in one molecule if both are present in the bam file. This also works when the is_proper_pair bit is not set. Use this option when you want to find the breakpoints of genomic rearrangements.
  • no_rejects (bool) – Do not write rejected reads
  • mem (int) – Number of gigabytes to request for cluster jobs
  • time (int) – Number of wall clock hours to request for cluster jobs
  • exons (str) – Path to exon annotation GTF file
  • introns (str) – Path to intron annotation GTF file
  • consensus (bool) – Calculate molecule consensus read, this feature is _VERY_ experimental
  • consensus_model (str) – Path to consensus calling model. When none is specified, the model is learned from the supplied bam file, ignoring sites supplied by -consensus_mask_variants
  • consensus_mask_variants (str) – Path to a VCF file of variants to mask when training the consensus caller
  • consensus_n_train (int) – Number of bases used for training the consensus model
  • no_source_reads (bool) – Do not write original reads, only consensus
  • scartrace_r1_primers (str) – comma separated list of R1 primers used in scartrace protocol
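
The parameters above are read as attributes of the single args argument. The following is a minimal sketch of such an object, assuming that args behaves like an argparse.Namespace. All file paths and values are placeholders, only a subset of the documented attributes is shown, and the real function may require every attribute that the command-line parser defines.

```python
from argparse import Namespace

# Placeholder values; the attribute names follow the parameter list above.
args = Namespace(
    bamin="input.bam",           # bam file to process
    o="tagged.bam",              # path to output bam file
    method="nla",                # protocol to tag
    umi_hamming_distance=1,      # max Hamming distance on UMIs
    contig="chr1",               # only process this contig
    region_start=0,              # zero-based start coordinate
    region_end=1_000_000,        # zero-based end coordinate
    custom_flags="MI,RX,bi,SM",  # comma-separated query name flagger arguments
    no_rejects=False,            # keep writing rejected reads
)

# custom_flags is documented as comma separated, so it splits into a list
flags = args.custom_flags.split(",")
```

When starting from a command line string, run_multiome_tagging_cmd(commandline) is listed below as an alternative entry point.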
singlecellmultiomics.universalBamTagger.bamtagmultiome.run_multiome_tagging_cmd(commandline)[source]
singlecellmultiomics.universalBamTagger.bamtagmultiome.tag_multiome_multi_processing(input_bam_path: str, out_bam_path: str, molecule_iterator: Generator[T_co, T_contra, V_co] = None, molecule_iterator_args: dict = None, ignore_bam_issues: bool = False, head: int = None, no_source_reads: bool = False, fragment_size: int = None, blacklist_path: str = None, bp_per_job: int = None, bp_per_segment: int = None, temp_folder_root: str = '/tmp/scmo', max_time_per_segment: int = None, use_pool: bool = True, one_contig_per_process: bool = False, additional_args: dict = None, n_threads=None, job_bed_file: str = None)[source]
singlecellmultiomics.universalBamTagger.bamtagmultiome.tag_multiome_single_thread(input_bam_path, out_bam_path, molecule_iterator=None, molecule_iterator_args=None, consensus_model=None, consensus_model_args={}, ignore_bam_issues=False, head=None, no_source_reads=False)[source]
singlecellmultiomics.universalBamTagger.bamtagmultiome.write_status(output_path, message)[source]

singlecellmultiomics.universalBamTagger.customreads module

class singlecellmultiomics.universalBamTagger.customreads.BulkFlagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

digest(reads)[source]
class singlecellmultiomics.universalBamTagger.customreads.CustomAssingmentQueryNameFlagger(block_assignments, **kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

This query name flagger converts values between colons “:” to tags

digest(reads)[source]
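
The idea of converting the values between colons into tags can be sketched as follows. This illustrates the concept only: the helper name query_name_to_tags, the example query name, and the convention of using None to skip a block are assumptions, and the way the class actually interprets block_assignments is not documented here.

```python
def query_name_to_tags(query_name, block_assignments):
    """Map each colon-separated block of a query name to a tag name.

    Hypothetical helper: block_assignments holds one tag name per block,
    or None for blocks that should be ignored.
    """
    blocks = query_name.split(":")
    return {
        tag: value
        for tag, value in zip(block_assignments, blocks)
        if tag is not None
    }


# Sample name, UMI and an ignored third block
tags = query_name_to_tags("cell_12:ACGTAC:run7", ["SM", "RX", None])
```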
class singlecellmultiomics.universalBamTagger.customreads.VaninsbergheQueryNameFlagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

digest(reads)[source]

singlecellmultiomics.universalBamTagger.digest module

class singlecellmultiomics.universalBamTagger.digest.DigestFlagger(reference=None, alleleResolver=None, moleculeRadius=0, verbose=False, **kwargs)[source]

Bases: object

addAlleleInfo(reads)[source]
appendTag(read, tag, value)[source]
getTotalConsumption()[source]
increaseAndRecordOversequencing(sample, chrom, pos, siteInfo=())[source]
setAllele(read, allele)[source]
setFragmentSize(read, size)[source]
setFragmentTrust(read, start, end)[source]
setLigationSite(read, site)[source]
setRandomPrimer(R1, R2, hstart, hseq)[source]
setRecognizedSequence(read, sequence)[source]
setRejectionReason(read, reason)[source]
setSiteCoordinate(read, coordinate)[source]
setSiteOversequencing(read, moleculeIndex=1)[source]
setSource(read, source)[source]
setStrand(read, strand)[source]
class singlecellmultiomics.universalBamTagger.digest.RangeCache(maxRange=1200)[source]

Bases: object

get(chrom, pos)[source]
getWithinRange(chrom, center, radius)[source]
purge(chrom, pos)[source]
set(chrom, pos, data)[source]

singlecellmultiomics.universalBamTagger.mspjI module

class singlecellmultiomics.universalBamTagger.mspjI.MSPJIFlagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

addSite(reads, strand, context, restrictionChrom, restrictionPos, ligationSite)[source]
digest(reads)[source]

singlecellmultiomics.universalBamTagger.nlaIII module

class singlecellmultiomics.universalBamTagger.nlaIII.NlaIIIFlagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

addSite(reads, strand, restrictionChrom, restrictionPos)[source]
digest(reads)[source]

singlecellmultiomics.universalBamTagger.rna module

class singlecellmultiomics.universalBamTagger.rna.RNA_Flagger(reference=None, alleleResolver=None, moleculeRadius=0, verbose=False, exon_gtf=None, intron_gtf=None, **kwargs)[source]

Bases: object

digest(reads)[source]

singlecellmultiomics.universalBamTagger.scar module

class singlecellmultiomics.universalBamTagger.scar.ScarFlagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

addSite(reads, scarChromosome, scarPrimerStart)[source]
digest(reads)[source]

singlecellmultiomics.universalBamTagger.scchic module

class singlecellmultiomics.universalBamTagger.scchic.ChicSeqFlagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

addSite(reads, strand, restrictionChrom, restrictionPos, is_trimmed=False)[source]
digest(reads)[source]

singlecellmultiomics.universalBamTagger.tag module

class singlecellmultiomics.universalBamTagger.tag.TagFlagger(tag=None, **kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

addSite(reads, strand, siteDef)[source]
digest(reads)[source]

singlecellmultiomics.universalBamTagger.taps module

class singlecellmultiomics.universalBamTagger.taps.TAPSFlagger(reference, **kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

digest(reads)[source]

singlecellmultiomics.universalBamTagger.tapsTabulator module

singlecellmultiomics.universalBamTagger.tapsTabulator.finish_bam(output, args, temp_out)[source]

singlecellmultiomics.universalBamTagger.tapsTagger module

class singlecellmultiomics.universalBamTagger.tapsTagger.Fraction[source]

Bases: object

singlecellmultiomics.universalBamTagger.universalBamTagger module

class singlecellmultiomics.universalBamTagger.universalBamTagger.AlleleTagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

digest(reads)[source]
class singlecellmultiomics.universalBamTagger.universalBamTagger.MoleculeIterator_OLD(alignmentfile, look_around_radius=100000, umi_hamming_distance=0, sample_select=None, **pysam_kwargs)[source]

Bases: object

Iterate over molecules in a bam file

Parameters:
  • alignmentfile (pysam.AlignmentFile) – file to read the molecules from
  • look_around_radius (int) – buffer to accumulate molecules in. All fragments belonging to one molecule should fit this radius
  • umi_hamming_distance (int) – Maximum Hamming distance between UMIs. 0: only exact matches, 1: a single base difference
  • sample_select (iterable) – Iterable of samples to only select molecules from
Yields:
  • list of molecules (list [ pysam.AlignedSegment ])
  • [ (R1,R2), (R1,R2) … ]
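
The effect of umi_hamming_distance can be illustrated with a small comparison function. This is a sketch under the assumption that two UMIs are considered equal when the number of mismatching positions does not exceed the threshold. The function names are hypothetical, and the way the iterator compares UMIs internally is not shown here.

```python
def hamming_distance(umi_a, umi_b):
    """Count the positions at which two equally long UMIs differ."""
    if len(umi_a) != len(umi_b):
        raise ValueError("UMIs must have the same length")
    return sum(a != b for a, b in zip(umi_a, umi_b))


def umis_match(umi_a, umi_b, umi_hamming_distance=0):
    """Return True when the UMIs differ at no more than the allowed
    number of positions (0 means exact matches only)."""
    return hamming_distance(umi_a, umi_b) <= umi_hamming_distance


exact_only = umis_match("ACGT", "ACGA", umi_hamming_distance=0)
one_mismatch = umis_match("ACGT", "ACGA", umi_hamming_distance=1)
```

With a threshold of 0 the two UMIs above are treated as different, while a threshold of 1 treats them as the same UMI.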
assignment_function(fragment)[source]
eq_function(assignment_a, assignment_b)[source]
get_cached_fragment_count()[source]
get_fragment_chromosome(fragment)[source]
localisation_function(fragment)[source]
sample_assignment_function(fragment)[source]
class singlecellmultiomics.universalBamTagger.universalBamTagger.QueryNameFlagger(**kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.digest.DigestFlagger

digest(reads)[source]
class singlecellmultiomics.universalBamTagger.universalBamTagger.TranscriptIterator(look_around_radius=100, informative_read=2, assignment_radius=10, **kwargs)[source]

Bases: singlecellmultiomics.universalBamTagger.universalBamTagger.MoleculeIterator_OLD

assignment_function(fragment)[source]
localisation_function(fragment)[source]
singlecellmultiomics.universalBamTagger.universalBamTagger.molecule_to_random_primer_dict(molecule, primer_length=6, primer_read=2)[source]

Module contents