variant_extractor API

class variant_extractor.VariantExtractor(vcf_file: str, pass_only=False, ensure_pairs=True, fasta_ref: str | None = None)

Bases: object

Reads and extracts variants from VCF files. This class is designed to be used in a pipeline, where the variants are ingested from VCF files and then used in downstream analysis.

__init__(vcf_file: str, pass_only=False, ensure_pairs=True, fasta_ref: str | None = None)

Parameters:

vcf_file (str) – Path to a VCF-formatted file. The file is opened automatically.
pass_only (bool, optional) – If True, only records whose filter is PASS are considered.
ensure_pairs (bool, optional) – If True, raises an exception if a breakend is missing its pair when all the others were paired successfully.
fasta_ref (str, optional) – Path to a FASTA file with the reference genome. The file must be indexed.

close(): Closes the VCF file.

static empty_dataframe(): Returns an empty pandas DataFrame with the columns used by this class.

to_dataframe()
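
A minimal usage sketch of the methods above, assuming the package is installed and that to_dataframe() returns a pandas DataFrame with the same columns as empty_dataframe() (this page does not document the return value). The helper name load_pass_variants and its argument are hypothetical; only the constructor arguments, to_dataframe() and close() come from this reference.

```python
def load_pass_variants(vcf_path: str):
    """Hypothetical helper: read a VCF and return its PASS variants as a table."""
    # Deferred import, so this sketch can be defined without the package installed.
    from variant_extractor import VariantExtractor

    # pass_only=True keeps only PASS records; ensure_pairs=True is the documented default.
    extractor = VariantExtractor(vcf_path, pass_only=True, ensure_pairs=True)
    try:
        # Assumption: the result is a pandas DataFrame (return type is not documented here).
        return extractor.to_dataframe()
    finally:
        # Always release the underlying VCF file handle.
        extractor.close()
```

The try/finally block closes the file even if extraction fails, for example when ensure_pairs raises on an unpaired breakend.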

class variant_extractor.variants.BreakendSVRecord(prefix: str | None, bracket: str, contig: str, pos: int, suffix: str | None)

Bases: NamedTuple

NamedTuple with the information of an SV record in breakend notation

bracket: str: Bracket of the SV record with breakend notation. For example, for G]17:198982] the bracket will be ]

contig: str: Contig of the SV record with breakend notation. For example, for G]17:198982] the contig will be 17

pos: int: Position of the SV record with breakend notation. For example, for G]17:198982] the position will be 198982

prefix: str | None: Prefix of the SV record with breakend notation. For example, for G]17:198982] the prefix will be G

suffix: str | None: Suffix of the SV record with breakend notation. For example, for G]17:198982] the suffix will be None

class variant_extractor.variants.ShorthandSVRecord(type: str, extra: List[str] | None)

Bases: NamedTuple

NamedTuple with the information of a shorthand SV record

extra: List[str] | None: Extra information of the SV. For example, for <DUP:TANDEM:AA> the extra will be ['TANDEM', 'AA']

type: str: One of the following, 'DEL', 'INS', 'DUP', 'INV' or 'CNV'

class variant_extractor.variants.VariantRecord(rec: VariantRecord, contig: str, pos: int, end: int, length: int, id: str | None, ref: str, alt: str, variant_type: VariantType, alt_sv_breakend: BreakendSVRecord | None = None, alt_sv_shorthand: ShorthandSVRecord | None = None)

Bases: object

Class with the information of a variant record

__init__(rec: VariantRecord, contig: str, pos: int, end: int, length: int, id: str | None, ref: str, alt: str, variant_type: VariantType, alt_sv_breakend: BreakendSVRecord | None = None, alt_sv_shorthand: ShorthandSVRecord | None = None)

alt: str: Alternative sequence

alt_sv_breakend: BreakendSVRecord | None: Breakend SV info, present only for SVs with breakend notation. For example, G]17:198982]

alt_sv_shorthand: ShorthandSVRecord | None: Shorthand SV info, present only for SVs with shorthand notation. For example, <DUP:TANDEM>

contig: str: Contig name

end: int: End position of the variant in the contig (same as pos for TRA and SNV)

filter: List[str | int]: Filter status. PASS if this position has passed all filters. Otherwise, it contains the filters that failed

property format: Specifies data types and order of the genotype information

id: str | None: Record identifier

property info: Additional information

length: int: Length of the variant

pos: int: Position of the variant in the contig

qual: float | None: Quality score for the assertion made in ALT

ref: str: Reference sequence

property samples: Genotype information for each sample

variant_type: VariantType: Variant type

class variant_extractor.variants.VariantType(value)

Bases: Enum

Enumeration with the different types of variations

CNV = 6

DEL = 2

DUP = 4

INS = 3

INV = 5

SGL = 8

SNV = 1

TRA = 7
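
Because VariantType is a standard Enum, members can be looked up by name or by value. The sketch below uses a local copy of the enumeration with the values listed above, so it runs without the package installed; count_by_type is a hypothetical helper added for illustration only.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class VariantType(Enum):
    """Local copy of the documented members, ordered by value."""
    SNV = 1
    DEL = 2
    INS = 3
    DUP = 4
    INV = 5
    CNV = 6
    TRA = 7
    SGL = 8


def count_by_type(types: Iterable[VariantType]) -> Dict[str, int]:
    """Hypothetical helper: tally variant types by member name."""
    return dict(Counter(t.name for t in types))


by_name = VariantType["DEL"]   # lookup by member name
by_value = VariantType(7)      # lookup by numeric value
counts = count_by_type([VariantType.SNV, VariantType.DEL, VariantType.SNV])
print(by_name, by_value, counts)
```

Lookup by name is convenient when mapping strings such as the shorthand type 'DEL' to a member; whether the library performs the mapping this way is not stated in this reference.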