singlecellmultiomics.utils package¶
Submodules¶
singlecellmultiomics.utils.binning module¶
-
singlecellmultiomics.utils.binning.
bp_chunked
(job_generator, bp_per_job)[source]¶ Chunk an iterator of coordinate-sorted tasks into chunks with a total size of roughly bp_per_job
Parameters: - job_generator – iterable of commands, format (contig, start, end, *task)
- bp_per_job (int) – Amount of bp per chunk of jobs/tasks
Yields: chunk(list) – [(contig, start, end, *task),(contig, start, end, *task),..]
@todo: contig is not used, this function expects that only bins on a single contig are supplied
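The chunking described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the name chunk_by_bp is hypothetical, and closing a chunk as soon as its accumulated span reaches bp_per_job is an assumption. Like the documented function, the sketch ignores the contig field.

```python
from typing import Iterable, Iterator, List, Tuple


def chunk_by_bp(jobs: Iterable[Tuple], bp_per_job: int) -> Iterator[List[Tuple]]:
    """Group (contig, start, end, *task) tuples into chunks of roughly bp_per_job bp.

    Hypothetical stand-in for illustration; the contig field is ignored.
    """
    chunk: List[Tuple] = []
    chunk_bp = 0
    for job in jobs:
        _contig, start, end = job[:3]
        chunk.append(job)
        chunk_bp += end - start
        # Assumption: close the chunk once the accumulated span reaches the target
        if chunk_bp >= bp_per_job:
            yield chunk
            chunk, chunk_bp = [], 0
    if chunk:
        # Emit the remaining tasks as a final, possibly smaller chunk
        yield chunk


jobs = [("chr1", 0, 100), ("chr1", 100, 200), ("chr1", 200, 300)]
chunks = list(chunk_by_bp(jobs, bp_per_job=200))
```

With the three 100 bp tasks above and bp_per_job=200, the first two tasks share a chunk and the third ends up in a chunk of its own.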
-
singlecellmultiomics.utils.binning.
coordinate_to_bins
(point, bin_size, sliding_increment)[source]¶ Convert a single value to a list of overlapping bins
Parameters: - point (int) – coordinate to look up
- bin_size (int) – bin size
- sliding_increment (int) – sliding window offset, this is the increment between bins
Returns: [(bin_start, bin_end), ...]
Return type: list
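As an illustration of how a point maps to overlapping sliding bins, here is a minimal sketch. It assumes bins start at non-negative multiples of sliding_increment and are half-open, [bin_start, bin_end); the library's exact conventions are not stated here and may differ. The name overlapping_bins is hypothetical.

```python
def overlapping_bins(point, bin_size, sliding_increment):
    """Return all (bin_start, bin_end) bins that contain point.

    Assumptions (not confirmed by the documentation): bins start at
    k * sliding_increment for k >= 0 and cover [bin_start, bin_start + bin_size).
    """
    # Smallest k for which k * sliding_increment > point - bin_size
    first = max(0, (point - bin_size) // sliding_increment + 1)
    # Largest k for which k * sliding_increment <= point
    last = point // sliding_increment
    return [
        (k * sliding_increment, k * sliding_increment + bin_size)
        for k in range(first, last + 1)
    ]


bins = overlapping_bins(25, bin_size=20, sliding_increment=10)
```

Under these assumptions, the point 25 falls in the bins (10, 30) and (20, 40), but not in (0, 20).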
-
singlecellmultiomics.utils.binning.
coordinate_to_sliding_bin_locations
(dp, bin_size, sliding_increment)[source]¶ Convert a single value to a list of overlapping bins
Parameters: - dp (int) – coordinate to look up
- bin_size (int) – bin size
- sliding_increment (int) – sliding window offset, this is the increment between bins
Returns: - start (int) – the start coordinate of the first overlapping bin
- end (int) – the end of the last overlapping bin
- start_id (int) – the index of the first overlapping bin
- end_id (int) – the index of the last overlapping bin
singlecellmultiomics.utils.blockzip module¶
-
class
singlecellmultiomics.utils.blockzip.
BlockZip
(path, mode='r', read_all=False)[source]¶ Bases:
object
-
__getitem__
(contig_position_strand)[source]¶ Obtain data at the supplied contig, position and strand
Parameters: contig_position_strand – tuple of (contig (str), position (int), strand (bool))
Returns: data stored for the genomic location, returns None when no data is available
Return type: result (str)
-
write
(contig, position, strand, data)[source]¶ Write information for location contig/position/strand. Warning: write the data contig by contig, random mixing of contigs will result in a corrupted file
Parameters: - contig (str) – contig of the location
- position (int) – position of the location
- strand (bool) – strand of the location
- data (str) – data to store for the location
-
singlecellmultiomics.utils.html module¶
-
singlecellmultiomics.utils.html.
style_str
(s, color='black', weight=300)[source]¶ Style the supplied string with HTML tags
Parameters: - s (str) – string to format
- color (str) – color to show the string in
- weight (int) – how thick the string will be displayed
Returns: html representation of the string
Return type: html(string)
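A function of this kind could look like the sketch below. The choice of a span element with an inline style attribute and the HTML escaping of the input are assumptions; the markup the library actually emits is not shown here. The name style_span is hypothetical.

```python
import html


def style_span(s, color="black", weight=300):
    """Wrap s in a span with inline CSS for colour and font weight.

    Hypothetical stand-in: the tag and the escaping are assumptions.
    """
    return f'<span style="color:{color};font-weight:{weight}">{html.escape(s)}</span>'


styled = style_span("a<b", color="red", weight=700)
```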
singlecellmultiomics.utils.iteration module¶
singlecellmultiomics.utils.sequtils module¶
-
class
singlecellmultiomics.utils.sequtils.
Reference
[source]¶ Bases:
singlecellmultiomics.utils.prefetch.Prefetcher
This is a picklable wrapper to pass reference handles
-
singlecellmultiomics.utils.sequtils.
complement
(seq)[source]¶ Obtain complement of seq
Returns: complement (str)
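Complementing a sequence can be sketched with a translation table. This is an illustration only; which characters the library maps (for example IUPAC ambiguity codes) is not stated here, so the table below covers only A, C, G, T and N in both cases. The name complement_sketch is hypothetical.

```python
# Assumption: only A/C/G/T/N are mapped; other characters pass through unchanged
_COMPLEMENT_TABLE = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement_sketch(seq):
    """Return the complement of seq without reversing it."""
    return seq.translate(_COMPLEMENT_TABLE)


result = complement_sketch("ACGTN")
```

Note that this is the plain complement; a reverse complement would additionally reverse the string, for example with complement_sketch(seq)[::-1].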
-
singlecellmultiomics.utils.sequtils.
create_MD_tag
(reference_seq, query_seq)[source]¶ Create MD tag
Parameters: - reference_seq (str) – reference sequence of alignment
- query_seq (str) – query bases of alignment
Returns: MD description of the alignment
Return type: md_tag (str)
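To illustrate what an MD tag encodes, the sketch below builds one for an ungapped alignment of two equal-length sequences: runs of matches are written as counts, and each mismatch is written as the reference base. How create_MD_tag treats insertions, deletions or case is not stated here, so those are left out; the name md_tag_ungapped is hypothetical.

```python
def md_tag_ungapped(reference_seq, query_seq):
    """Build an MD string for two aligned sequences of equal length.

    Hypothetical stand-in: gaps are not supported and case is ignored.
    """
    if len(reference_seq) != len(query_seq):
        raise ValueError("sequences must have the same length")
    parts = []
    matches = 0
    for ref_base, query_base in zip(reference_seq.upper(), query_seq.upper()):
        if ref_base == query_base:
            matches += 1
        else:
            # Emit the match run (possibly 0), then the mismatched reference base
            parts.append(str(matches))
            parts.append(ref_base)
            matches = 0
    # The MD string always ends with a match count
    parts.append(str(matches))
    return "".join(parts)


md = md_tag_ungapped("ACGT", "ACTT")
```

Here the third base differs, so the tag reads 2G1: two matches, reference base G, one match.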
-
singlecellmultiomics.utils.sequtils.
create_fasta_dict_file
(refpath: str, skip_if_exists=True)[source]¶ Create index dict file for the reference fasta at refpath
Parameters: - refpath – path to fasta file
- skip_if_exists – do not generate the index if it exists
Returns: path to the dict index file
Return type: dpath (str)
-
singlecellmultiomics.utils.sequtils.
get_chromosome_number
(chrom: str) → int[source]¶ Get chromosome number (index) of the supplied chromosome: ‘1’ -> 1, chr1 -> 1, chrM -> -1; returns -1 when not available
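The mapping described above can be sketched as follows. The handling of the chr prefix is an assumption based on the examples given; the name chromosome_number is hypothetical.

```python
def chromosome_number(chrom):
    """Return the numeric index of a chromosome name, or -1 if it has none.

    Hypothetical stand-in: a leading 'chr' is stripped case-insensitively.
    """
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return int(name) if name.isdecimal() else -1


numbers = [chromosome_number(c) for c in ("1", "chr1", "chrM", "chrX")]
```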
-
singlecellmultiomics.utils.sequtils.
get_consensus_dictionaries
(R1, R2, only_include_refbase=None, dove_safe=False, min_phred_score=None, skip_first_n_cycles_R1=None, skip_last_n_cycles_R1=None, skip_first_n_cycles_R2=None, skip_last_n_cycles_R2=None, dove_R2_distance=0, dove_R1_distance=0)[source]¶
-
singlecellmultiomics.utils.sequtils.
get_context
(contig: str, position: int, reference: pysam.libcfaidx.FastaFile, ibase: str = None, k_rad: int = 1)[source]¶ Parameters: - contig – contig of the location to extract context
- position – zero based position
- reference – pysam.FastaFile handle or similar object which supports .fetch()
- ibase – single base to inject into the middle of the context
- k_rad – radius to extract
Returns: extracted context with length k_rad*2 + 1
Return type: context(str)
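The context extraction can be sketched against any object that supports .fetch(contig, start, end) with zero-based, half-open coordinates, as pysam.FastaFile does. The DictReference class below is a hypothetical stand-in so the example runs without a fasta file, and context_sketch is not the library's code. Positions closer than k_rad to a contig start are rejected here; how the library treats contig edges is not stated.

```python
class DictReference:
    """Minimal stand-in for a fasta handle; fetch() is zero-based, half-open."""

    def __init__(self, sequences):
        self.sequences = sequences

    def fetch(self, contig, start, end):
        return self.sequences[contig][start:end]


def context_sketch(contig, position, reference, ibase=None, k_rad=1):
    """Return the k_rad*2 + 1 bases centred on a zero-based position."""
    if position < k_rad:
        raise ValueError("position is too close to the contig start")
    context = reference.fetch(contig, position - k_rad, position + k_rad + 1)
    if ibase is not None:
        # Replace the middle base with the injected base
        context = context[:k_rad] + ibase + context[k_rad + 1:]
    return context


reference = DictReference({"chr1": "ACGTACGT"})
plain = context_sketch("chr1", 2, reference)
injected = context_sketch("chr1", 2, reference, ibase="A")
```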
-
singlecellmultiomics.utils.sequtils.
get_contig_lengths_from_resource
(resource) → dict[source]¶ Extract contig lengths from the supplied resource (FASTA file or BAM/CRAM/SAM)
Returns: lengths (dict)
-
singlecellmultiomics.utils.sequtils.
get_contig_list_from_fasta
(fasta_path: str, with_length: bool = False) → list[source]¶ Obtain the list of contigs from a fasta file; all alternative contigs are pooled into the string MISC_ALT_CONTIGS_SCMO
Parameters: - fasta_path (str or pysam.FastaFile) – Path or handle to fasta file
- with_length (bool) – return list of lengths
Returns: List of contigs + [‘MISC_ALT_CONTIGS_SCMO’] if any alt contig is present in the fasta file
Return type: contig_list (list )
-
singlecellmultiomics.utils.sequtils.
get_file_type
(s: str)[source]¶ Guess the file type of the input string, returns None when the file type cannot be determined
-
singlecellmultiomics.utils.sequtils.
is_autosome
(chrom: str) → bool[source]¶ Returns True when the chromosome is an autosomal chromosome, not an alternative allele, mitochondrial or sex chromosome
Parameters: chrom (str) – chromosome name
Returns: True when the chromosome is an autosome
Return type: is_main(bool)
-
singlecellmultiomics.utils.sequtils.
is_main_chromosome
(chrom: str, exclude_mt=False) → bool[source]¶ Returns True when the chromosome is a main chromosome, not an alternative locus, scaffold, decoy or spike-in
Parameters: chrom (str) – chromosome name
Returns: True when the chromosome is a main chromosome
Return type: is_main(bool)
-
singlecellmultiomics.utils.sequtils.
phred_to_prob
(phred)[source]¶ Convert a phred score (ASCII) or integer to a numeric probability
Parameters: phred (str/int) – score to convert
Returns: probability (float)
-
singlecellmultiomics.utils.sequtils.
phredscores_to_base_call
(probs: dict)[source]¶ Perform base calling on an observation dictionary. Returns N when there are multiple options with the same likelihood
Parameters: probs – dictionary with confidence scores, for example probs = { ‘A’:[0.95,0.99,0.9], ‘T’:[0.1] }
Returns: - base (str) – called base
- phred (float) – probability of the call to be correct
-
singlecellmultiomics.utils.sequtils.
pick_best_base_call
(*calls) → tuple[source]¶ Pick the best base-call from a list of base calls
Example
>>> pick_best_base_call( ('A',32), ('C',22) )
('A', 32)
>>> pick_best_base_call( ('A',32), ('C',32) )
None
Parameters: calls (generator) – generator/list containing tuples
Returns: tuple (best_base, best_q) or (‘N’,0) when there is a tie
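The selection logic can be sketched as below. The sketch follows the Returns description, returning (‘N’, 0) on a tie; the doctest above shows None for a tie, and this page does not settle which of the two the library returns. The name best_call_sketch is hypothetical.

```python
def best_call_sketch(*calls):
    """Pick the (base, quality) tuple with the highest quality.

    Hypothetical stand-in: a tie between different bases yields ('N', 0),
    as described in the Returns line.
    """
    best_base, best_q, tie = None, -1, False
    for base, q in calls:
        if q > best_q:
            best_base, best_q, tie = base, q, False
        elif q == best_q and base != best_base:
            # Another base reached the same top quality
            tie = True
    return ("N", 0) if tie else (best_base, best_q)


winner = best_call_sketch(("A", 32), ("C", 22))
tied = best_call_sketch(("A", 32), ("C", 32))
```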
-
singlecellmultiomics.utils.sequtils.
prob_to_phred
(prob: float)[source]¶ Convert the probability of a base call being correct into a phred score. Values are clipped to stay within the 0 to 60 phred range
Parameters: prob (float) – probability of base call being correct
Returns: phred_score (byte)
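The relationship between phred scores and probabilities can be sketched with the standard definition Q = -10 * log10(p_error). The sketch below is an illustration under assumptions: ASCII scores are taken to use the common offset of 33, the first function returns the error probability (whether phred_to_prob returns the error probability or the probability of a correct call is not stated here), and the result is rounded to an integer. Both function names are hypothetical.

```python
import math


def phred_to_error_prob(phred):
    """Convert a phred score (int, or one ASCII character) to an error probability."""
    if isinstance(phred, str):
        # Assumption: Phred+33 encoding
        phred = ord(phred) - 33
    return 10 ** (-phred / 10)


def correct_prob_to_phred(prob, max_phred=60):
    """Convert the probability of a correct call to a phred score clipped to 0..max_phred."""
    error = 1.0 - prob
    if error <= 0:
        # A certain call would have infinite phred; clip it
        return max_phred
    return int(min(max_phred, max(0, round(-10 * math.log10(error)))))


error_prob = phred_to_error_prob(20)
score = correct_prob_to_phred(0.99)
```

A phred score of 20 corresponds to an error probability of 0.01, and a 0.99 probability of being correct maps back to 20.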
-
singlecellmultiomics.utils.sequtils.
read_to_consensus_dict
(read, start: int = None, end: int = None, only_include_refbase: str = None, skip_first_n_cycles: int = None, skip_last_n_cycles: int = None, min_phred_score: int = None)[source]¶ Obtain consensus calls for read, between start and end
singlecellmultiomics.utils.submission module¶
-
singlecellmultiomics.utils.submission.
create_job_file_paths
(target_directory, job_alias=None, prefix=None, job_file_name=None)[source]¶
-
singlecellmultiomics.utils.submission.
generate_job_script
(scheduler, jobfile, stderr, stdout, job_name, memory_gb, working_directory, time_h, threads_n, email, mail_when_finished=False, copy_env=True, slurm_scratch_space_size=None)[source]¶
-
singlecellmultiomics.utils.submission.
generate_submission_command
(jobfile, hold, scheduler='sge')[source]¶
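A submission command for the schedulers named in this module (sge, slurm, local) could be assembled as in the sketch below. This is not the library's implementation: the function name is hypothetical, the use of qsub -hold_jid and sbatch --dependency=afterany for job dependencies is an assumption about which scheduler options are used, and running local jobs through sh is likewise assumed.

```python
def submission_command_sketch(jobfile, hold=None, scheduler="sge"):
    """Return the argument list that would submit jobfile.

    Hypothetical stand-in: the dependency flags chosen here are assumptions.
    """
    hold = [str(job_id) for job_id in (hold or [])]
    if scheduler == "sge":
        command = ["qsub"]
        if hold:
            # SGE takes a comma-separated list of job ids to wait for
            command += ["-hold_jid", ",".join(hold)]
    elif scheduler == "slurm":
        command = ["sbatch"]
        if hold:
            # Slurm takes colon-separated job ids after the dependency type
            command.append("--dependency=afterany:" + ":".join(hold))
    elif scheduler == "local":
        # Assumption: local jobs are run directly and dependencies are ignored
        command = ["sh"]
    else:
        raise ValueError(f"unknown scheduler: {scheduler}")
    return command + [jobfile]


sge_command = submission_command_sketch("job.sh", hold=["12", "13"], scheduler="sge")
slurm_command = submission_command_sketch("job.sh", hold=["12", "13"], scheduler="slurm")
```

Returning a list of arguments rather than a single string keeps the command safe to pass to subprocess.run without shell quoting.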
-
singlecellmultiomics.utils.submission.
submit_job
(command, target_directory, working_directory, threads_n=1, memory_gb=8, time_h=8, scheduler='sge', copy_env=True, email=None, job_alias=None, mail_when_finished=False, hold=None, submit=True, prefix=None, job_file_name=None, job_name=None, silent=False, slurm_scratch_space_size=None)[source]¶ Submit a job
Parameters: - threads_n (int) – number of requested threads
- memory_gb (int) – amount of requested memory
- scheduler (str) – sge/slurm/local
- hold (list) – list of job dependencies
- submit (bool) – perform the actual submission, when set to False only the submission script is written
Returns: id of submitted job
Return type: job_id(str)