singlecellmultiomics.universalBamTagger package¶
Submodules¶
singlecellmultiomics.universalBamTagger.4SUtagger module¶
-
singlecellmultiomics.universalBamTagger.4SUtagger.
substitution_plot_stranded
(pattern_counts: dict, figsize: tuple = (12, 4), conversion_colors: tuple = ('b', 'k', 'r', 'grey', 'g', 'pink', 'b', 'k', 'r', 'k', 'w', 'g'), ylabel: str = '# conversions per molecule', add_main_group_labels: bool = True, ax=None, fig=None, **plot_args)[source]¶ Create 3bp substitution plot
Parameters: - pattern_counts (OrderedDict) – Dictionary containing the substitutions to plot. Use variants.vcf_to_variant_contexts to create it. Format: OrderedDict([(('ACA', 'A'), 0), (('ACC', 'A'), 1), (('ACG', 'A'), 0), … (('TTG', 'G'), 0), (('TTT', 'G'), 0)])
- figsize (tuple) – size of the figure to create
- conversion_colors (tuple) – colors to use for the conversion groups
- ylabel (str) – y axis label
- add_main_group_labels (bool) – Add conversion group labels to the top of the plot
- **plot_args – Additional arguments to pass to .plot()
Returns: - fig – handle to the figure
- ax – handle to the axis
Example
>>> from singlecellmultiomics.variants import vcf_to_variant_contexts, substitution_plot
>>> import matplotlib.pyplot as plt
>>> pobs = vcf_to_variant_contexts('variants.vcf.gz', 'reference.fasta')
>>> for sample, conversions in pobs.items():
...     fig, ax = substitution_plot(conversions)
...     ax.set_title(sample)
...     plt.show()
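The pattern_counts format described above can be sketched without the library. The helper below is hypothetical (it is not part of singlecellmultiomics) and only reproduces the key layout shown in the Format example: a 3bp context with C or T in the centre, paired with the substituted base. Whether the stranded plot expects additional keys (its default conversion_colors has 12 entries) is not stated here, so treat the key set as an assumption.

```python
from collections import OrderedDict
from itertools import product


def empty_pattern_counts():
    """Build an all-zero dict keyed by (3bp context, alternative base).

    Hypothetical helper: the key order follows the Format example in the
    parameter description, ('ACA', 'A') first and ('TTT', 'G') last.
    """
    counts = OrderedDict()
    # Central reference base and the three bases it can be converted to.
    for ref, alts in (('C', 'AGT'), ('T', 'ACG')):
        for alt in alts:
            # All 16 combinations of flanking bases.
            for left, right in product('ACGT', repeat=2):
                counts[(left + ref + right, alt)] = 0
    return counts


pattern_counts = empty_pattern_counts()
# Register one observed conversion: C to A in the context ACC.
pattern_counts[('ACC', 'A')] += 1
```

A dictionary built this way has the same shape as the one produced by vcf_to_variant_contexts for a single sample, which makes it convenient for testing plotting code on toy data.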
singlecellmultiomics.universalBamTagger.bamtagmultiome module¶
-
singlecellmultiomics.universalBamTagger.bamtagmultiome.
run_multiome_tagging
(args)[source]¶ Run multiome tagging, which adds molecule information
Parameters: - bamin (str) – bam file to process, sam files can also be supplied but will be converted
- o (str) – path to output bam file
- method (str) – Protocol to tag, select from: nla, qflag, chic, nla_transcriptome, vasa, cs, nla_taps, chic_taps, nla_no_overhang, scartrace
- qflagger (str) – Query flagging algorithm to use. This algorithm extracts UMI and sample information from your reads. When no query flagging algorithm is specified, singlecellmultiomics.universalBamTagger.universalBamTagger.QueryNameFlagger is used
- method – Method name: what kind of molecules need to be extracted. Select from:
  nla (data digested by the NlaIII enzyme)
  qflag (only add basic tags like sample and UMI, no molecule assignment)
  chic (data digested using MNase fusion)
  nla_transcriptome (data with transcriptome and genome digested by NlaIII)
  vasa (VASA transcriptomic data)
  cs (CELseq data, 1 and 2)
  cs_feature_counts (single end, deduplicate using a bam file tagged using featurecounts, deduplicates a UMI per gene)
  fl_feature_counts (deduplicate using a bam file tagged using featurecounts, deduplicates based on fragment position)
  nla_taps (data digested by the NlaIII enzyme and methylation converted by TAPS)
  chic_taps (data digested by the MNase enzyme and methylation converted by TAPS)
  chic_taps_transcriptome (same as chic_taps, but includes annotations)
  chic_nla
  scartrace (lineage tracing protocol)
- custom_flags (str) – Arguments passed to the query name flagger, comma separated “MI,RX,bi,SM”
- ref (str) – Path to reference fasta file, autodetected from bam header when not supplied
- umi_hamming_distance (int) – Max Hamming distance between UMIs
- head (int) – Number of molecules to process
- contig (str) – only process this contig
- region_start (int) – Zero based start coordinate of single region to process
- region_end (int) – Zero based end coordinate of single region to process, None: all contigs when contig is not set, complete contig when contig is set.
- alleles (str) – path to allele VCF
- allele_samples (str) – Comma separated samples to extract from the VCF file. For example B6,SPRET
- unphased_alleles (str) – Path to VCF containing unphased germline SNPs
- mapfile (str) – Path to *.safe.bgzf file, used to decide if molecules are uniquely mappable. Generate one using createMapabilityIndex.py
- annotmethod (int) – Annotation resolving method. 0: molecule consensus aligned blocks. 1: per read per aligned base
- cluster (bool) – Run contigs in separate cluster jobs
- resolve_unproperly_paired_reads (bool) – When enabled, bamtagmultiome searches the complete bam file for the mate of each read; the two mates always end up in one molecule if both are present in the bam file. This also works when the is_proper_pair bit is not set. Use this option when you want to find the breakpoints of genomic rearrangements.
- no_rejects (bool) – Do not write rejected reads
- mem (int) – Number of gigabytes of memory to request for cluster jobs
- time (int) – Number of wall-clock hours to request for cluster jobs
- exons (str) – Path to exon annotation GTF file
- introns (str) – Path to intron annotation GTF file
- consensus (bool) – Calculate molecule consensus read, this feature is _VERY_ experimental
- consensus_model (str) – Path to consensus calling model. When none is specified, the model is learned from the supplied bam file, ignoring sites supplied by -consensus_mask_variants
- consensus_mask_variants (str) – Path to VCF file with variants to mask when training the consensus caller
- consensus_n_train (int) – Number of bases used for training the consensus model
- no_source_reads (bool) – Do not write original reads, only consensus
- scartrace_r1_primers (str) – comma separated list of R1 primers used in scartrace protocol
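Since run_multiome_tagging takes a single args object, one way to prepare it from Python is an argparse.Namespace carrying the parameter names listed above. The sketch below is an illustration under assumptions: build_tagging_args is a hypothetical helper, every default value is a placeholder chosen for this example rather than the library's actual default, and the real function may expect attributes that are not documented here. The file names are hypothetical as well, and the call into the library is left commented out.

```python
from argparse import Namespace

# Names come from the parameter list above. All default values are
# assumptions made for this sketch, not the library's real defaults.
DOCUMENTED_PARAMETERS = {
    'bamin': None, 'o': None, 'method': None, 'qflagger': None,
    'custom_flags': 'MI,RX,bi,SM', 'ref': None, 'umi_hamming_distance': 0,
    'head': None, 'contig': None, 'region_start': None, 'region_end': None,
    'alleles': None, 'allele_samples': None, 'unphased_alleles': None,
    'mapfile': None, 'annotmethod': 0, 'cluster': False,
    'resolve_unproperly_paired_reads': False, 'no_rejects': False,
    'mem': None, 'time': None, 'exons': None, 'introns': None,
    'consensus': False, 'consensus_model': None,
    'consensus_mask_variants': None, 'consensus_n_train': None,
    'no_source_reads': False, 'scartrace_r1_primers': None,
}


def build_tagging_args(**overrides):
    """Return a Namespace with the documented names, rejecting unknown ones."""
    unknown = sorted(set(overrides) - set(DOCUMENTED_PARAMETERS))
    if unknown:
        raise KeyError('Undocumented parameter(s): ' + ', '.join(unknown))
    return Namespace(**{**DOCUMENTED_PARAMETERS, **overrides})


# Hypothetical input and output paths.
args = build_tagging_args(bamin='input.bam', o='tagged.bam',
                          method='nla', umi_hamming_distance=1)

# With the package installed, the call would then look like this:
# from singlecellmultiomics.universalBamTagger.bamtagmultiome import run_multiome_tagging
# run_multiome_tagging(args)
```

Rejecting unknown names early catches typos such as a misspelled parameter before a long tagging run starts.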
-
singlecellmultiomics.universalBamTagger.bamtagmultiome.
run_multiome_tagging_cmd
(commandline)[source]¶
-
singlecellmultiomics.universalBamTagger.bamtagmultiome.
tag_multiome_multi_processing
(input_bam_path: str, out_bam_path: str, molecule_iterator: Generator[T_co, T_contra, V_co] = None, molecule_iterator_args: dict = None, ignore_bam_issues: bool = False, head: int = None, no_source_reads: bool = False, fragment_size: int = None, blacklist_path: str = None, bp_per_job: int = None, bp_per_segment: int = None, temp_folder_root: str = '/tmp/scmo', max_time_per_segment: int = None, use_pool: bool = True, one_contig_per_process: bool = False, additional_args: dict = None, n_threads=None, job_bed_file: str = None)[source]¶
singlecellmultiomics.universalBamTagger.customreads module¶
-
class
singlecellmultiomics.universalBamTagger.customreads.
BulkFlagger
(**kwargs)[source]¶ Bases:
singlecellmultiomics.universalBamTagger.digest.DigestFlagger
-
class
singlecellmultiomics.universalBamTagger.customreads.
CustomAssingmentQueryNameFlagger
(block_assignments, **kwargs)[source]¶ Bases:
singlecellmultiomics.universalBamTagger.digest.DigestFlagger
This query name flagger converts values between colons “:” to tags
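The idea of converting colon-separated values to tags can be illustrated in a few lines. This is a minimal sketch, not the class's implementation: query_name_to_tags is a hypothetical function, the tag names RX and SM are borrowed from the custom_flags example on this page, and using None to skip a block is an assumption of this sketch.

```python
def query_name_to_tags(query_name, block_assignments):
    """Map each colon-separated block of a query name to a tag name.

    Hypothetical illustration: block_assignments holds one tag name per
    block, in order; None means the block is skipped (an assumption of
    this sketch, not documented behaviour of the library).
    """
    blocks = query_name.split(':')
    return {
        tag: value
        for tag, value in zip(block_assignments, blocks)
        if tag is not None
    }


# Hypothetical read name with three blocks: read id, UMI, sample.
tags = query_name_to_tags('read_1:ACGT:cell_7', [None, 'RX', 'SM'])
```

Here the first block is ignored and the remaining two are stored under RX and SM respectively.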
-
class
singlecellmultiomics.universalBamTagger.customreads.
VaninsbergheQueryNameFlagger
(**kwargs)[source]¶ Bases:
singlecellmultiomics.universalBamTagger.digest.DigestFlagger
singlecellmultiomics.universalBamTagger.digest module¶
singlecellmultiomics.universalBamTagger.mspjI module¶
singlecellmultiomics.universalBamTagger.nlaIII module¶
singlecellmultiomics.universalBamTagger.rna module¶
singlecellmultiomics.universalBamTagger.scar module¶
singlecellmultiomics.universalBamTagger.scchic module¶
singlecellmultiomics.universalBamTagger.tag module¶
singlecellmultiomics.universalBamTagger.taps module¶
-
class
singlecellmultiomics.universalBamTagger.taps.
TAPSFlagger
(reference, **kwargs)[source]¶ Bases:
singlecellmultiomics.universalBamTagger.digest.DigestFlagger
singlecellmultiomics.universalBamTagger.tapsTabulator module¶
singlecellmultiomics.universalBamTagger.tapsTagger module¶
singlecellmultiomics.universalBamTagger.universalBamTagger module¶
-
class
singlecellmultiomics.universalBamTagger.universalBamTagger.
AlleleTagger
(**kwargs)[source]¶ Bases:
singlecellmultiomics.universalBamTagger.digest.DigestFlagger
-
class
singlecellmultiomics.universalBamTagger.universalBamTagger.
MoleculeIterator_OLD
(alignmentfile, look_around_radius=100000, umi_hamming_distance=0, sample_select=None, **pysam_kwargs)[source]¶ Bases:
object
Iterate over molecules in a bam file
Parameters: - alignmentfile (pysam.AlignmentFile) – file to read the molecules from
- look_around_radius (int) – buffer to accumulate molecules in. All fragments belonging to one molecule should fit this radius
- umi_hamming_distance (int) – Maximum Hamming distance allowed between UMIs, 0: only exact match, 1: single base distance
- sample_select (iterable) – Iterable of samples to only select molecules from
Yields: - list of molecules (list [ pysam.AlignedSegment ])
- [ (R1,R2), (R1,R2) … ]
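The umi_hamming_distance parameter above can be read as a threshold on the number of mismatching positions between two UMIs. The following sketch shows that comparison in isolation; both function names are hypothetical, and how the iterator actually groups fragments is not described on this page.

```python
def hamming_distance(umi_a, umi_b):
    """Count positions at which two equal-length UMIs differ."""
    if len(umi_a) != len(umi_b):
        raise ValueError('UMIs must have the same length')
    return sum(a != b for a, b in zip(umi_a, umi_b))


def umis_match(umi_a, umi_b, umi_hamming_distance=0):
    """Return True when the UMIs differ in at most the allowed positions.

    Hypothetical helper: 0 requires an exact match, 1 tolerates a single
    differing base, mirroring the parameter description above.
    """
    return hamming_distance(umi_a, umi_b) <= umi_hamming_distance


exact_only = umis_match('ACGT', 'ACGA', umi_hamming_distance=0)
one_mismatch = umis_match('ACGT', 'ACGA', umi_hamming_distance=1)
```

With a threshold of 0 the two example UMIs are treated as different; with a threshold of 1 they are treated as the same.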
-
class
singlecellmultiomics.universalBamTagger.universalBamTagger.
QueryNameFlagger
(**kwargs)[source]¶ Bases:
singlecellmultiomics.universalBamTagger.digest.DigestFlagger
-
class
singlecellmultiomics.universalBamTagger.universalBamTagger.
TranscriptIterator
(look_around_radius=100, informative_read=2, assignment_radius=10, **kwargs)[source]¶ Bases:
singlecellmultiomics.universalBamTagger.universalBamTagger.MoleculeIterator_OLD