variant_extractor API

class variant_extractor.VariantExtractor(vcf_file: str, pass_only=False, ensure_pairs=True, fasta_ref: str | None = None)

Bases: object

Reads and extracts variants from VCF files. This class is designed to be used in a pipeline, where the variants are ingested from VCF files and then used in downstream analysis.

__init__(vcf_file: str, pass_only=False, ensure_pairs=True, fasta_ref: str | None = None)
Parameters:
  • vcf_file (str) – Path to a VCF-formatted file. The file is opened automatically.

  • pass_only (bool, optional) – If True, only records with PASS filter will be considered.

  • ensure_pairs (bool, optional) – If True, raises an exception if a breakend is missing its pair when all the others were paired successfully.

  • fasta_ref (str, optional) – A FASTA file with the reference genome. Must be indexed.

close()

Closes the VCF file.
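
Because the class exposes close(), the standard library's contextlib.closing can ensure the file is released even if processing fails. The sketch below is only an illustration: StubExtractor and process are hypothetical names standing in for the real class so that the snippet runs without the library or a VCF file, and whether VariantExtractor offers its own context-manager support is not stated here.

```python
from contextlib import closing


class StubExtractor:
    """Hypothetical stand-in that exposes only close(), like VariantExtractor."""

    def __init__(self, vcf_file: str):
        self.vcf_file = vcf_file
        self.closed = False

    def close(self) -> None:
        self.closed = True


def process(extractor_factory, vcf_file: str):
    """Create an extractor, use it, and return it after it has been closed."""
    with closing(extractor_factory(vcf_file)) as extractor:
        # With the real class, this is where the variants would be consumed,
        # for example via extractor.to_dataframe().
        pass
    return extractor


stub = process(StubExtractor, "example.vcf")
print(stub.closed)
```

With the real library, passing VariantExtractor as extractor_factory should behave the same way, since closing() only requires a close() method.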

static empty_dataframe()

Returns an empty pandas DataFrame with the columns used by this class.

to_dataframe()

class variant_extractor.variants.BreakendSVRecord(prefix: str | None, bracket: str, contig: str, pos: int, suffix: str | None)

Bases: NamedTuple

NamedTuple with the information of an SV record in breakend notation

bracket: str

Bracket of the SV record with breakend notation. For example, for G]17:198982] the bracket will be ]

contig: str

Contig of the SV record with breakend notation. For example, for G]17:198982] the contig will be 17

pos: int

Position of the SV record with breakend notation. For example, for G]17:198982] the position will be 198982

prefix: str | None

Prefix of the SV record with breakend notation. For example, for G]17:198982] the prefix will be G

suffix: str | None

Suffix of the SV record with breakend notation. For example, for G]17:198982] the suffix will be None
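
The field descriptions above can be illustrated by splitting an ALT string in breakend notation into the same five parts. This is a minimal sketch and not the library's own parser: Breakend, BREAKEND_RE and parse_breakend are hypothetical names, and the regular expression is an assumption about the notation that requires the opening and closing brackets to match.

```python
import re
from typing import NamedTuple, Optional


class Breakend(NamedTuple):
    """Stand-in with the same fields as the documented BreakendSVRecord."""
    prefix: Optional[str]
    bracket: str
    contig: str
    pos: int
    suffix: Optional[str]


# prefix, opening bracket, contig:pos, the same bracket again, suffix
BREAKEND_RE = re.compile(
    r"^(?P<prefix>[^\[\]]*)"
    r"(?P<bracket>[\[\]])"
    r"(?P<contig>.+):(?P<pos>\d+)"
    r"(?P=bracket)"
    r"(?P<suffix>[^\[\]]*)$"
)


def parse_breakend(alt: str) -> Optional[Breakend]:
    """Return the parsed parts, or None if alt is not in breakend notation."""
    m = BREAKEND_RE.match(alt)
    if m is None:
        return None
    return Breakend(
        prefix=m["prefix"] or None,   # empty string becomes None
        bracket=m["bracket"],
        contig=m["contig"],
        pos=int(m["pos"]),
        suffix=m["suffix"] or None,   # empty string becomes None
    )


print(parse_breakend("G]17:198982]"))
```

For G]17:198982] this yields prefix G, bracket ], contig 17, position 198982 and suffix None, matching the examples given for each field.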

class variant_extractor.variants.ShorthandSVRecord(type: str, extra: List[str] | None)

Bases: NamedTuple

NamedTuple with the information of a shorthand SV record

extra: List[str] | None

Extra information of the SV. For example, for <DUP:TANDEM:AA> the extra will be ['TANDEM', 'AA']

type: str

One of the following: 'DEL', 'INS', 'DUP', 'INV' or 'CNV'
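
The two fields can be illustrated by splitting a shorthand ALT string on colons. This is a sketch under stated assumptions, not the library's parser: Shorthand, KNOWN_TYPES and parse_shorthand are hypothetical names, and returning None for extra when no sub-types are present is an assumption based on the List[str] | None annotation.

```python
from typing import List, NamedTuple, Optional


class Shorthand(NamedTuple):
    """Stand-in with the same fields as the documented ShorthandSVRecord."""
    type: str
    extra: Optional[List[str]]


# The types listed in the documentation for ShorthandSVRecord.type
KNOWN_TYPES = {"DEL", "INS", "DUP", "INV", "CNV"}


def parse_shorthand(alt: str) -> Optional[Shorthand]:
    """Return the parsed parts, or None if alt is not a known shorthand."""
    if not (alt.startswith("<") and alt.endswith(">")):
        return None
    # "<DUP:TANDEM:AA>" -> sv_type "DUP", extra ["TANDEM", "AA"]
    sv_type, *extra = alt[1:-1].split(":")
    if sv_type not in KNOWN_TYPES:
        return None
    # Assumption: an empty list of sub-types is represented as None
    return Shorthand(type=sv_type, extra=extra or None)


print(parse_shorthand("<DUP:TANDEM:AA>"))
```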

class variant_extractor.variants.VariantRecord(rec: VariantRecord, contig: str, pos: int, end: int, length: int, id: str | None, ref: str, alt: str, variant_type: VariantType, alt_sv_breakend: BreakendSVRecord | None = None, alt_sv_shorthand: ShorthandSVRecord | None = None)

Bases: object

Class with the information of a variant record

__init__(rec: VariantRecord, contig: str, pos: int, end: int, length: int, id: str | None, ref: str, alt: str, variant_type: VariantType, alt_sv_breakend: BreakendSVRecord | None = None, alt_sv_shorthand: ShorthandSVRecord | None = None)

alt: str

Alternative sequence

alt_sv_breakend: BreakendSVRecord | None

Breakend SV info, present only for SVs with breakend notation. For example, G]17:198982]

alt_sv_shorthand: ShorthandSVRecord | None

Shorthand SV info, present only for SVs with shorthand notation. For example, <DUP:TANDEM>

contig: str

Contig name

end: int

End position of the variant in the contig (same as pos for TRA and SNV)

filter: List[str | int]

Filter status. PASS if this position has passed all filters. Otherwise, it contains the filters that failed

property format

Specifies data types and order of the genotype information

id: str | None

Record identifier

property info

Additional information

length: int

Length of the variant

pos: int

Position of the variant in the contig

qual: float | None

Quality score for the assertion made in ALT

ref: str

Reference sequence

property samples

Genotype information for each sample

variant_type: VariantType

Variant type

class variant_extractor.variants.VariantType(value)

Bases: Enum

Enumeration with the different types of variations

CNV = 6
DEL = 2
DUP = 4
INS = 3
INV = 5
SGL = 8
SNV = 1
TRA = 7
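
Since VariantType is a standard Enum, members can be looked up by name or by value. The snippet below redefines the enumeration locally with the values listed above so that it runs without the library installed; with the library, the class would be imported from variant_extractor.variants instead.

```python
from enum import Enum


class VariantType(Enum):
    """Local copy of the documented members, for illustration only."""
    SNV = 1
    DEL = 2
    INS = 3
    DUP = 4
    INV = 5
    CNV = 6
    TRA = 7
    SGL = 8


# Lookup by name, e.g. from the type of a shorthand record such as <DEL>
by_name = VariantType["DEL"]

# Lookup by value
by_value = VariantType(7)

# Member names in definition order, which here is also value order
names = [t.name for t in VariantType]

print(by_name, by_value, names)
```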