singlecellmultiomics.fragment package
Submodules
singlecellmultiomics.fragment.chic module
singlecellmultiomics.fragment.fragment module
-
class singlecellmultiomics.fragment.fragment.FeatureCountsFullLengthFragment(reads, R1_primer_length=4, R2_primer_length=6, assignment_radius=10000, umi_hamming_distance=1, invert_strand=False, **kwargs)
Bases: singlecellmultiomics.fragment.fragment.FeatureCountsSingleEndFragment
Class for fragments annotated with featureCounts, with multiple reads covering a gene.
Extracts the annotated gene from the XT tag. Deduplicates using the XT tag and UMI. Reads without an XT tag are flagged as invalid.
-
class singlecellmultiomics.fragment.fragment.FeatureCountsSingleEndFragment(reads, R1_primer_length=4, R2_primer_length=6, assignment_radius=100000, umi_hamming_distance=1, invert_strand=False, **kwargs)
Bases: singlecellmultiomics.fragment.fragment.Fragment
Class for fragments annotated with featureCounts.
Extracts the annotated gene from the XT tag. Deduplicates using the XT tag and UMI. Reads without an XT tag are flagged as invalid.
-
class singlecellmultiomics.fragment.fragment.Fragment(reads, assignment_radius: int = 0, umi_hamming_distance: int = 1, R1_primer_length: int = 0, R2_primer_length: int = 6, tag_definitions: list = None, max_fragment_size: int = None, mapping_dir=(False, True), max_NUC_stretch: int = None, read_group_format: int = 0, library_name: str = None, single_end: bool = False)
Bases: object
This class holds one or more reads which are derived from the same cluster.
Example
Generate a Fragment with a single associated read:
>>> from singlecellmultiomics.molecule import Molecule
>>> from singlecellmultiomics.fragment import Fragment
>>> import pysam
>>> read = pysam.AlignedSegment()
>>> read.reference_start = 30
>>> read.query_name = 'R1'
>>> read.mapping_quality = 30
>>> read.set_tag('SM','CELL_1') # The sample to which the read belongs is extracted from the SM tag
>>> read.set_tag('RX','CAT') # The UMI is extracted from the RX tag
>>> read.query_sequence = "CATGTATCCGGGCTTAA"
>>> read.query_qualities = [30] * len(read.query_sequence)
>>> read.cigarstring = f'{len(read.query_sequence)}M'
>>> Fragment([read])
Fragment:
    sample:CELL_1
    umi:CAT
    span:None 30-47
    strand:+
    has R1: yes
    has R2: no
    randomer trimmed: no
Warning
Make sure the RX and SM tags of the read are set! If these are encoded in the read name, use singlecellmultiomics.universalBamTagger.customreads for conversion.
-
R1
-
R2
-
__eq__(other)
Check equivalence between two Fragments, or between a Fragment and a Molecule.
Parameters: other (Fragment or Molecule) – object to compare against
Returns: is_eq (bool) – True when the other object is (likely) derived from the same molecule
Example
>>> from singlecellmultiomics.molecule import Molecule
>>> from singlecellmultiomics.fragment import Fragment
>>> import pysam
>>> # Create sam file to write some reads to:
>>> test_sam = pysam.AlignmentFile('test.sam','w',reference_names=['chr1','chr2'],reference_lengths=[1000,1000])
>>> read_A = pysam.AlignedSegment(test_sam.header)
>>> read_A.set_tag('SM','CELL_1') # The sample to which the read belongs is extracted from the SM tag
>>> read_A.set_tag('RX','CAT') # The UMI is extracted from the RX tag
>>> # By default the molecule assignment is done based on the mapping location of read 1:
>>> read_A.reference_name = 'chr1'
>>> read_A.reference_start = 100
>>> read_A.query_sequence = 'ATCGGG'
>>> read_A.cigarstring = '6M'
>>> read_B = pysam.AlignedSegment(test_sam.header)
>>> read_B.set_tag('SM','CELL_1')
>>> read_B.set_tag('RX','CAT')
>>> read_B.reference_start = 100
>>> read_B.query_sequence = 'ATCGG'
>>> read_B.cigarstring = '5M'
>>> frag_A = Fragment([read_A],umi_hamming_distance=0)
>>> frag_A
Fragment:
    sample:CELL_1
    umi:CAT
    span:chr1 100-106
    strand:+
    has R1: yes
    has R2: no
    randomer trimmed: no
>>> frag_B = Fragment([read_B],umi_hamming_distance=0)
>>> frag_A == frag_B
True
# Fragment A and fragment B belong to the same molecule,
# the UMI is identical, the starting position of R1 is identical and
# the sample name matches
When we move one of the reads, the Fragments are no longer equivalent:
Example
>>> read_B.reference_start = 150
>>> frag_B = Fragment([read_B],umi_hamming_distance=0)
>>> frag_A == frag_B
False
The Fragments are still equivalent when the difference in start position is <= the assignment_radius:
Example
>>> read_B.reference_start = 150
>>> read_A.reference_start = 100
>>> frag_B = Fragment([read_B],assignment_radius=300)
>>> frag_A = Fragment([read_A],assignment_radius=300)
>>> frag_A == frag_B
True
When the UMIs are too far apart, the eq function returns False:
Example
>>> read_B.reference_start = 100
>>> read_A.reference_start = 100
>>> read_A.set_tag('RX','GGG')
>>> frag_B = Fragment([read_B])
>>> frag_A = Fragment([read_A])
>>> frag_A == frag_B
False
When the samples of the Fragments are not identical, the eq function returns False:
Example
>>> read_B.reference_start = 100
>>> read_A.reference_start = 100
>>> read_A.set_tag('RX','AAA')
>>> read_B.set_tag('RX','AAA')
>>> read_B.set_tag('SM', 'CELL_2' )
>>> frag_B = Fragment([read_B])
>>> frag_A = Fragment([read_A])
>>> frag_A == frag_B
False
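The examples above suggest three conditions for equivalence: the samples match, the UMIs differ by at most umi_hamming_distance, and the start positions differ by at most assignment_radius. The block below is a minimal pure-Python sketch of that rule only. `FragmentKey` and `likely_same_molecule` are hypothetical names introduced for illustration and are not part of singlecellmultiomics; the sketch assumes both fragments lie on the same contig, and the library's actual comparison may apply further checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentKey:
    """Hypothetical stand-in for the attributes compared between fragments."""
    sample: str
    umi: str
    start: int  # start position of read 1; same contig assumed


def likely_same_molecule(a: FragmentKey, b: FragmentKey,
                         assignment_radius: int = 0,
                         umi_hamming_distance: int = 1) -> bool:
    """Sketch of the rule shown in the examples; not the library's code."""
    if a.sample != b.sample:
        return False
    # Hamming distance: number of mismatching positions (equal-length UMIs assumed)
    umi_distance = sum(x != y for x, y in zip(a.umi, b.umi))
    if umi_distance > umi_hamming_distance:
        return False
    return abs(a.start - b.start) <= assignment_radius


frag_a = FragmentKey(sample="CELL_1", umi="CAT", start=100)
frag_b = FragmentKey(sample="CELL_1", umi="CAT", start=150)

# 50 bases apart: rejected with radius 0, accepted with radius 300
same_strict = likely_same_molecule(frag_a, frag_b, umi_hamming_distance=0)
same_within_radius = likely_same_molecule(frag_a, frag_b, assignment_radius=300)
```

The checks are ordered from cheapest to most specific, so a sample mismatch short-circuits before any UMI or position comparison.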
-
__getitem__(index)
Get a read from the fragment.
Parameters: index (int) – 0: Read 1, 1: Read 2, ..
-
__len__()
Obtain the number of reads associated with the fragment.
Returns: associated reads (int)
-
aligned_length
-
estimated_length
Obtain the estimated size of the fragment; returns None when estimation is not possible. Takes into account removed bases (R2_primer_length attribute). Assumes inwards sequencing orientation, except when self.single_end is set.
-
get_R1()
Obtain the AlignedSegment of read 1 of the fragment.
Returns: Read 1 of the fragment, or None when R1 is not mapped
Return type: R1 (pysam.AlignedSegment)
-
get_R2()
Obtain the AlignedSegment of read 2 of the fragment.
Returns: Read 2 of the fragment, or None when R2 is not mapped
Return type: R2 (pysam.AlignedSegment)
-
get_consensus(only_include_refbase: str = None, dove_safe: bool = False, **get_consensus_dictionaries_kwargs) → dict
Obtain a dictionary of (reference_pos) : (qbase, quality, reference_base) tuples.
Parameters: - only_include_refbase (str) – Only report bases aligned to this reference base, uppercase only
- dove_safe (bool) – Only report bases supported within R1 and R2 start and end coordinates
Returns: {reference_position: (qbase, quality)}
Return type: consensus(dict)
-
get_html(chromosome=None, span_start=None, span_end=None, show_read1=None, show_read2=None)
Get HTML representation of the fragment
Parameters: - chromosome (str) – chromosome to view
- span_start (int) – first base to show
- span_end (int) – last base to show
- show_read1 (bool) – show read1
- show_read2 (bool) – show read2
Returns: html representation of the fragment
Return type: html(string)
-
get_random_primer_hash()
Obtain a hash describing the random primer. This assumes the random primer is at the end of R2 and has a length of self.R2_primer_length. When the rS tag is set, the value of this tag is used as the random primer sequence. Returns None, None when the random primer cannot be described.
Returns: - reference_name (str) or None
- reference_start (int) or None
- sequence (str) or None
-
get_sample() → str
Obtain the sample name associated with the fragment. The sample name is extracted from the SM tag of any of the associated reads.
Returns: sample name (str)
-
get_strand()
Obtain strand
Returns: False for Forward, True for reverse
Return type: strand (bool)
-
set_duplicate(is_duplicate)
Define this fragment as duplicate; sets the corresponding BAM bit flag.
Parameters: is_duplicate (bool) – duplicate status to set
-
set_sample(sample: str = None, library_name: str = None)
Force sample name or obtain sample name from associated reads
-
set_strand(strand: bool)
Set mapping strand
Parameters: strand (bool) – False for Forward, True for reverse
-
umi_eq(other)
Hamming distance measurement to another Fragment or Molecule.
Returns: is_close (bool) – True when the Hamming distance between the two objects <= umi_hamming_distance
Example
>>> from singlecellmultiomics.molecule import Molecule
>>> from singlecellmultiomics.fragment import Fragment
>>> import pysam
>>> # Create reads (read_A, and read_B), they both belong to the same
>>> # cell and have 1 hamming distance between their UMI's
>>> read_A = pysam.AlignedSegment()
>>> read_A.set_tag('SM','CELL_1') # The sample to which the read belongs is extracted from the SM tag
>>> read_A.set_tag('RX','CAT') # The UMI is extracted from the RX tag
>>> read_B = pysam.AlignedSegment()
>>> read_B.set_tag('SM','CELL_1') # The sample to which the read belongs is extracted from the SM tag
>>> read_B.set_tag('RX','CAG') # The UMI is extracted from the RX tag
>>> # Create fragment objects for read_A and B:
>>> frag_A = Fragment([read_A],umi_hamming_distance=0)
>>> frag_B = Fragment([read_B],umi_hamming_distance=0)
>>> frag_A.umi_eq(frag_B) # This returns False, the distance is 1, which is higher than 0 (umi_hamming_distance)
False
>>> frag_A = Fragment([read_A],umi_hamming_distance=1)
>>> frag_B = Fragment([read_B],umi_hamming_distance=1)
>>> frag_A.umi_eq(frag_B) # This returns True, the distance is 1, which is equal to umi_hamming_distance
True
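For readers unfamiliar with the measure, the Hamming distance between two equal-length UMIs is the number of positions at which they differ. The following standalone sketch reproduces the CAT/CAG comparison from the example without pysam. `hamming_distance` and `umi_is_close` are hypothetical helper names, not functions of singlecellmultiomics, and raising on unequal lengths is an assumption of this sketch.

```python
def hamming_distance(umi_a: str, umi_b: str) -> int:
    """Count the positions at which two equal-length UMIs differ."""
    if len(umi_a) != len(umi_b):
        # Assumption of this sketch: UMIs of unequal length are an error
        raise ValueError("UMIs must have the same length")
    return sum(a != b for a, b in zip(umi_a, umi_b))


def umi_is_close(umi_a: str, umi_b: str, umi_hamming_distance: int = 1) -> bool:
    """True when the distance does not exceed the allowed threshold."""
    return hamming_distance(umi_a, umi_b) <= umi_hamming_distance


distance = hamming_distance("CAT", "CAG")  # only the last base differs
close_at_0 = umi_is_close("CAT", "CAG", umi_hamming_distance=0)
close_at_1 = umi_is_close("CAT", "CAG", umi_hamming_distance=1)
```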
-
update_span()
Update the span (the location the fragment maps to) stored in Fragment.
The span is a single stretch of coordinates on a single contig. The contig is determined by the reference_name associated with the first mapped read in self.reads.
This calculation assumes the reads are sequenced inwards and dove-tails of the molecule cannot be trusted.
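As a rough illustration of the contig rule described above, the sketch below takes the contig from the first mapped read and spans the smallest start to the largest end among reads on that contig. This is a simplification under stated assumptions: `MappedRead` and `sketch_span` are hypothetical names, a missing or unmapped read is represented by None, and the dove-tail handling mentioned above is not modeled at all.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class MappedRead(NamedTuple):
    """Hypothetical minimal read record used only in this sketch."""
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def sketch_span(reads: Sequence[Optional[MappedRead]]) -> Optional[Tuple[str, int, int]]:
    """Return (contig, start, end), or None when no read is mapped."""
    mapped = [read for read in reads if read is not None]
    if not mapped:
        return None
    # The contig is taken from the first mapped read
    contig = mapped[0].contig
    on_contig = [read for read in mapped if read.contig == contig]
    return contig, min(r.start for r in on_contig), max(r.end for r in on_contig)


span = sketch_span([MappedRead("chr1", 100, 150), MappedRead("chr1", 220, 270)])
```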
-
write_pysam(pysam_handle)
Write all associated reads to the target file.
Parameters: pysam_handle (pysam.AlignmentFile) – Target file
-
write_tensor(chromosome=None, span_start=None, span_end=None, height=30, index_start=0, base_content_table=None, base_mismatches_table=None, base_indel_table=None, base_qual_table=None, base_clip_table=None, mask_reference_bases=None, reference=None, skip_missing_reads=False)
Write tensor representation of the fragment to supplied 2d arrays
Parameters: - chromosome (str) – chromosome to view
- span_start (int) – first base to show
- span_end (int) – last base to show
- height (int) – height of the tensor (reads)
- index_start (int) – start writing tensor at this row
- base_content_table (np.array) – 2d array to write base contents to
- base_indel_table (np.array) – 2d array to write indel information to
- mask_reference_bases (set) – mask reference bases in this set with a N set( (chrom,pos), … )
- reference (pysam.FastaFile) – Handle to reference file to use instead of MD tag. If None: MD tag is used.
- skip_missing_reads (bool) – when enabled only existing (non None) reads are added to the tensor. Use this option when mapping single-end
Returns: None
-
-
class singlecellmultiomics.fragment.fragment.FragmentStartPosition(reads, assignment_radius: int = 0, umi_hamming_distance: int = 1, R1_primer_length: int = 0, R2_primer_length: int = 6, tag_definitions: list = None, max_fragment_size: int = None, mapping_dir=(False, True), max_NUC_stretch: int = None, read_group_format: int = 0, library_name: str = None, single_end: bool = False)
Bases: singlecellmultiomics.fragment.fragment.Fragment
Fragment without a specific location on a contig
-
class singlecellmultiomics.fragment.fragment.FragmentWithoutPosition(reads, assignment_radius: int = 0, umi_hamming_distance: int = 1, R1_primer_length: int = 0, R2_primer_length: int = 6, tag_definitions: list = None, max_fragment_size: int = None, mapping_dir=(False, True), max_NUC_stretch: int = None, read_group_format: int = 0, library_name: str = None, single_end: bool = False)
Bases: singlecellmultiomics.fragment.fragment.Fragment
Fragment without a specific location on a contig
-
class singlecellmultiomics.fragment.fragment.FragmentWithoutUMI(reads, **kwargs)
Bases: singlecellmultiomics.fragment.fragment.Fragment
Use this class when no UMI information is available
-
class singlecellmultiomics.fragment.fragment.SingleEndTranscriptFragment(reads, features, assignment_radius=0, stranded=None, capture_locations=False, auto_set_intron_exon_features=True, **kwargs)
Bases: singlecellmultiomics.fragment.fragment.Fragment, singlecellmultiomics.features.features.FeatureAnnotatedObject
singlecellmultiomics.fragment.nlaIII module
-
class singlecellmultiomics.fragment.nlaIII.NlaIIIFragment(reads, R1_primer_length=4, R2_primer_length=6, assignment_radius=1000, umi_hamming_distance=1, invert_strand=False, check_motif=True, no_overhang=False, cut_location_offset=-4, reference=None, allow_cycle_shift=False, use_allele_tag=False, no_umi_cigar_processing=False, **kwargs)
Bases: singlecellmultiomics.fragment.fragment.Fragment
-
get_undigested_site_count(reference_handle)
Obtain the number of undigested sites in the span of the fragment.
Parameters: reference_handle (pysam.FastaFile or similar)
Returns: undigested_site_count (int) – number of undigested cut sites in the mapping span of the fragment
Raises: ValueError – when the span of the molecule is not properly defined
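NlaIII recognizes the motif CATG, so one way to picture a site count is to scan the reference sequence of a span for that motif. The sketch below does this on a plain string. `count_sites` is a hypothetical helper, not the library's implementation; in particular, whether the library excludes the cut site at the fragment boundary from its count is not stated here, and this sketch simply counts every occurrence.

```python
def count_sites(sequence: str, motif: str = "CATG") -> int:
    """Count occurrences of motif in sequence, case-insensitively."""
    sequence = sequence.upper()
    count = 0
    position = sequence.find(motif)
    while position != -1:
        count += 1
        # Continue searching one base past the previous hit
        position = sequence.find(motif, position + 1)
    return count


# Hypothetical reference sequence for a fragment span (soft-masked bases in lowercase)
span_sequence = "ttCATGaacccCATGg"
n_sites = count_sites(span_sequence)
```

In practice the sequence would come from a reference handle for the fragment's contig and span coordinates, which is why an undefined span has to be treated as an error.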
-