API library interface

sequenza.izip

class sequenza.izip.zip_coordinates(item1, item2)

Merge two objects that carry chromosome/position coordinates. Each object must provide tuples of the form (coordinates, data), where coordinates is a tuple of (chromosome, position_start, position_end) and data is a tuple holding the data. The data of the two objects are merged for matching lines. For the first object, only the start coordinate is taken into account.
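The matching described above can be illustrated with a small standalone sketch. It does not use the library: `merge_matching` and its `chrom_order` argument are hypothetical names, and treating the second object's intervals as half-open (start included, end excluded) is an assumption, not something the documentation states.

```python
def merge_matching(points, intervals, chrom_order):
    """Yield (coordinates, data1 + data2) when a point's start falls inside an interval.

    Both inputs are iterables of ((chrom, start, end), data) tuples, sorted by
    position and sharing the chromosome order given in chrom_order.
    """
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    intervals = iter(intervals)
    current = next(intervals, None)
    for (chrom, start, end), data1 in points:
        # Skip intervals that end at or before the start of this point.
        while current is not None:
            (chrom2, _start2, end2), _data2 = current
            if (rank[chrom2], end2) <= (rank[chrom], start):
                current = next(intervals, None)
            else:
                break
        if current is None:
            return
        (chrom2, start2, end2), data2 = current
        # Only the start coordinate of the first object is compared.
        if chrom2 == chrom and start2 <= start < end2:
            yield (chrom, start, end), data1 + data2


points = [(("chr1", 5, 6), ("a",)), (("chr1", 15, 16), ("b",)), (("chr2", 3, 4), ("c",))]
intervals = [(("chr1", 0, 10), (0.4,)), (("chr1", 20, 30), (0.5,)), (("chr2", 0, 10), (0.6,))]
merged = list(merge_matching(points, intervals, ["chr1", "chr2"]))
```

In this example the point at chr1:15 has no covering interval, so it is dropped and only the two matching lines are returned.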

sequenza.izip.zip_fast(item1, item2)

Use the native implementation of the heapq algorithm to sort and merge files ordered by chromosome and coordinate. It assumes that both files are ordered by position and share the same chromosome order. Unlike zip_coordinates, it returns all the positions present in the two files and groups together the lines present in both.
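A heap-based merge of this kind can be sketched with the standard library alone. This is not the library's code: `merge_all_positions` and `chrom_order` are hypothetical names, and the sketch assumes each input yields ((chrom, start, end), data) tuples already sorted in the shared chromosome order.

```python
import heapq
from itertools import groupby


def merge_all_positions(items1, items2, chrom_order):
    """Merge two sorted streams, keeping every position and grouping shared ones."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}

    def sort_key(item):
        (chrom, start, _end), _data = item
        return (rank[chrom], start)

    # heapq.merge consumes the inputs lazily and keeps the combined stream sorted.
    merged = heapq.merge(items1, items2, key=sort_key)
    for _key, group in groupby(merged, key=sort_key):
        group = list(group)
        coordinates = group[0][0]
        # Concatenate the data tuples of all lines found at this position.
        data = tuple(value for _coords, values in group for value in values)
        yield coordinates, data


a = [(("chr1", 10, 11), ("A",)), (("chr1", 20, 21), ("C",)), (("chr2", 5, 6), ("G",))]
b = [(("chr1", 10, 11), ("x",)), (("chr2", 5, 6), ("y",)), (("chr2", 7, 8), ("z",))]
result = list(merge_all_positions(a, b, ["chr1", "chr2"]))
```

Positions found in only one input are still emitted, which is the difference from the matching-only merge.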

sequenza.wig

class sequenza.wig.Wiggle(wig)

Read/write wiggle files as iterable objects.
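To show what iterating over a wiggle file can look like, here is a minimal standalone reader for fixedStep blocks. It is only a sketch and does not reflect how the Wiggle class is implemented: `iter_fixed_step` is a hypothetical helper, variableStep blocks are not handled, and reporting the end as start + span is an assumption about the desired output.

```python
def iter_fixed_step(lines):
    """Yield (chrom, start, end, value) for each data line of fixedStep blocks."""
    chrom = start = step = span = None
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and track/browser definitions.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            fields = dict(field.split("=", 1) for field in line.split()[1:])
            chrom = fields["chrom"]
            start = int(fields["start"])
            step = int(fields["step"])
            span = int(fields.get("span", 1))
            continue
        if chrom is None:
            raise ValueError("data line found before a fixedStep declaration")
        yield chrom, start, start + span, float(line)
        start += step


wig_lines = [
    "track type=wiggle_0",
    "fixedStep chrom=chr1 start=1 step=50 span=50",
    "0.4",
    "0.5",
]
records = list(iter_fixed_step(wig_lines))
```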

exception sequenza.wig.WiggleError(message)

sequenza.fasta

class sequenza.fasta.Fasta(file, n=60)

Create an iterable with genomic coordinates from a FASTA file.
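The idea of walking a FASTA file by genomic coordinate can be sketched as follows. This is a standalone illustration, not the Fasta class itself: `iter_fasta_positions` is a hypothetical name, positions are assumed to be 1-based, and the sketch yields one base at a time, which may differ from what the class returns.

```python
def iter_fasta_positions(lines):
    """Yield (chrom, position, base) for every base in a FASTA stream."""
    chrom = None
    position = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new sequence: keep the first word as its name.
            chrom = line[1:].split()[0]
            position = 0
            continue
        for base in line:
            position += 1
            yield chrom, position, base


fasta_lines = [">chr1 example", "AC", "G", ">chr2", "T"]
bases = list(iter_fasta_positions(fasta_lines))
```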

sequenza.pileup

sequenza.pileup.acgt(pileup, quality, depth, reference, qlimit=53, noend=False, nostart=False)

Parse the mpileup format and return the occurrence of each nucleotide at the given positions.
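For readers unfamiliar with the mpileup base string, the following simplified counter shows the main parsing steps. It is a sketch, not the library's parser: `count_bases` is a hypothetical name, strand information and the depth, noend and nostart arguments are ignored, and comparing qlimit against the raw ASCII code of each quality character is an assumption.

```python
import re


def count_bases(pileup, quality, reference, qlimit=53):
    """Count A, C, G and T in one mpileup base string, filtering by base quality."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    # Remove read-start markers ("^" plus a mapping-quality character),
    # then read-end markers ("$").
    pileup = re.sub(r"\^.", "", pileup).replace("$", "")
    i = 0  # index in the base string
    q = 0  # index in the quality string
    while i < len(pileup):
        char = pileup[i]
        if char in "+-":
            # Indels look like +2AG or -1T: skip the sign, the length and the bases.
            length = re.match(r"\d+", pileup[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        # "." and "," mean a match to the reference on either strand.
        base = reference.upper() if char in ".," else char.upper()
        if base in counts and ord(quality[q]) >= qlimit:
            counts[base] += 1
        q += 1
        i += 1
    return counts


counts = count_bases("^].,$A+2GGa*", "II5!I", "C")
```

Here the lowercase `a` is discarded because its quality character falls below the threshold, and the `*` placeholder is not counted as a nucleotide.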

sequenza.pileup.pileup_acgt(pileup, quality, depth, reference, qlimit=53, noend=False, nostart=False)

Yet another version of the pileup parser, used as a template for the C implementation. The old function still runs slightly faster, to my surprise…

sequenza.samtools

class sequenza.samtools.bam_mpileup(bam, fasta, q=20, Q=20, samtools_bin='samtools', regions=[])

Use samtools via subprocess and return an iterable object.
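The general pattern of wrapping a command-line tool in an iterable can be sketched as below. The helper names `build_mpileup_cmd` and `iter_command_lines` are hypothetical, and mapping q and Q to the `-q` and `-Q` options of `samtools mpileup` is an assumption about how the class uses them; the sketch also accepts a single region only.

```python
import subprocess


def build_mpileup_cmd(bam, fasta, q=20, Q=20, samtools_bin="samtools", region=None):
    """Build a `samtools mpileup` argument list (assumed flag mapping)."""
    cmd = [samtools_bin, "mpileup", "-f", fasta, "-q", str(q), "-Q", str(Q)]
    if region:
        cmd += ["-r", region]
    cmd.append(bam)
    return cmd


def iter_command_lines(cmd):
    """Run a command and yield its standard output line by line."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


cmd = build_mpileup_cmd("tumor.bam", "genome.fa", region="chr1:1-1000")
# for line in iter_command_lines(cmd):  # requires samtools and the input files
#     print(line)
```

Reading from the pipe lazily avoids holding the whole pileup in memory, which matters for genome-sized inputs.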

class sequenza.samtools.indexed_pileup(pileup, tabix_bin='tabix', regions=[])

Use tabix via subprocess to slice the pileup data and return an iterable object.

sequenza.samtools.program_version(program)

Parse the tabix or samtools help message in an attempt to retrieve the software version. Return format: [major, minor, *].
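One way to pull a version out of a help message is sketched below. It is not the library's implementation: `parse_version` is a hypothetical helper, it assumes the help text contains a "Version: ..." line, and returning None when no such line is found is an illustrative choice.

```python
import re


def parse_version(help_text):
    """Return the version found in a help message as a list, or None."""
    match = re.search(r"Version:\s*(\S+)", help_text)
    if match is None:
        return None
    # Split on dots and dashes; keep non-numeric parts (e.g. a commit hash) as text.
    parts = re.split(r"[.\-]", match.group(1))
    return [int(part) if part.isdigit() else part for part in parts]


new_style = parse_version("Program: samtools\nVersion: 1.9 (using htslib 1.9)\n")
old_style = parse_version("Program: samtools\nVersion: 0.1.19-44428cd\n")
```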

sequenza.samtools.tabix_seqz(file_name, tabix_bin='tabix', seq=1, begin=2, end=2, skip=1)

Index a seqz file with tabix.
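The default arguments suggest a tabix call along the following lines. This is a sketch under the assumption that seq, begin, end and skip correspond to the tabix options `-s`, `-b`, `-e` and `-S`; `build_tabix_index_cmd` and `index_with_tabix` are hypothetical names. Note that tabix expects a bgzip-compressed input file.

```python
import subprocess


def build_tabix_index_cmd(file_name, tabix_bin="tabix", seq=1, begin=2, end=2, skip=1):
    """Build a tabix indexing command (assumed mapping of arguments to flags)."""
    return [
        tabix_bin,
        "-s", str(seq),    # column holding the sequence name
        "-b", str(begin),  # column holding the start position
        "-e", str(end),    # column holding the end position
        "-S", str(skip),   # number of header lines to skip
        file_name,
    ]


def index_with_tabix(file_name, **options):
    """Run tabix and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(build_tabix_index_cmd(file_name, **options))


index_cmd = build_tabix_index_cmd("sample.seqz.gz")
# index_with_tabix("sample.seqz.gz")  # requires tabix and a bgzipped file
```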

sequenza.seqz

sequenza.seqz.acgt_genotype(acgt_dict, freq_list, strand_list, hom_t, het_t, het_f, bases_list)

Return the alleles in the genotype.

sequenza.seqz.unpack_data(data)

Unpack normal, tumor and GC information from the specific tuple structure and remove redundant information.

sequenza.vcf

sequenza.vcf.vcf_headline_content(line)

Try to get the string enclosed by “< … >” in the VCF header.
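Extracting the bracketed part of a VCF meta-information line can be sketched in a few lines. `headline_content` is a hypothetical stand-in, and returning None when no brackets are found is an assumption about the fallback behaviour.

```python
def headline_content(line):
    """Return the text between the first "<" and the last ">" of a line, or None."""
    start = line.find("<")
    end = line.rfind(">")
    if start == -1 or end <= start:
        return None
    return line[start + 1:end]


content = headline_content('##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">')
```

Using the last ">" rather than the first keeps descriptions that themselves contain a ">" character intact.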

sequenza.vcf.vcf_parse(vcf_file, sample_order='n/t', field='FORMAT', depth=['DP', 'DP'], alleles=['AD', 'AD'], preset=None)

Parse the specified tags of a vcf file to retrieve total and per-allele depth information.
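How depth tags are read from the FORMAT field can be illustrated on a single VCF data line. This is a standalone sketch, not the library's parser: `sample_depths` is a hypothetical name, and it assumes the first two sample columns are normal then tumor (matching the default sample_order of 'n/t') with the same tags used for both.

```python
def sample_depths(line, depth_tag="DP", allele_tag="AD"):
    """Return [(total_depth, per_allele_depths), ...] for the first two samples."""
    fields = line.rstrip("\n").split("\t")
    # Column 9 (index 8) lists the tag names; the sample columns follow it.
    keys = fields[8].split(":")
    depths = []
    for sample in fields[9:11]:
        values = dict(zip(keys, sample.split(":")))
        total = int(values[depth_tag])
        per_allele = [int(count) for count in values[allele_tag].split(",")]
        depths.append((total, per_allele))
    return depths


vcf_line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/0:30,0:30\t0/1:20,15:35"
normal, tumor = sample_depths(vcf_line)
```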