variant_extractor API
- class variant_extractor.VariantExtractor(vcf_file: str, pass_only=False, ensure_pairs=True, fasta_ref: str | None = None)
Bases: object
Reads and extracts variants from VCF files. This class is designed to be used in a pipeline, where the variants are ingested from VCF files and then used in downstream analysis.
- __init__(vcf_file: str, pass_only=False, ensure_pairs=True, fasta_ref: str | None = None)
- Parameters:
vcf_file (str) – A VCF formatted file. The file is automatically opened.
pass_only (bool, optional) – If True, only records with PASS filter will be considered.
ensure_pairs (bool, optional) – If True, throws an exception if a breakend is missing a pair when all others were paired successfully.
fasta_ref (str, optional) – A FASTA file with the reference genome. Must be indexed.
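A minimal usage sketch of the constructor documented above. The helper name `load_variants` and its arguments are hypothetical, and iterating over the extractor to obtain records is an assumption that this section does not state; check it against the package before relying on it.

```python
from typing import List, Optional


def load_variants(vcf_path: str, fasta_path: Optional[str] = None) -> List[object]:
    """Collect the records of a VCF file using the documented constructor arguments."""
    # Imported here so the helper can be defined even where the package is absent.
    from variant_extractor import VariantExtractor

    extractor = VariantExtractor(
        vcf_path,
        pass_only=True,        # only records with PASS filter are considered
        ensure_pairs=True,     # raise if a breakend is left without its pair
        fasta_ref=fasta_path,  # optional FASTA reference; must be indexed
    )
    # ASSUMPTION: the extractor is iterable and yields VariantRecord objects.
    return list(extractor)
```

Passing `ensure_pairs=False` may be preferable for VCF files that are known to contain single, unpaired breakends.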
- class variant_extractor.variants.BreakendSVRecord(prefix: str | None, bracket: str, contig: str, pos: int, suffix: str | None)
Bases: NamedTuple
NamedTuple with the information of a breakend notated SV record
- bracket: str
Bracket of the SV record with breakend notation. For example, for G]17:198982] the bracket will be ]
- contig: str
Contig of the SV record with breakend notation. For example, for G]17:198982] the contig will be 17
- pos: int
Position of the SV record with breakend notation. For example, for G]17:198982] the position will be 198982
- prefix: str | None
Prefix of the SV record with breakend notation. For example, for G]17:198982] the prefix will be G
- suffix: str | None
Suffix of the SV record with breakend notation. For example, for G]17:198982] the suffix will be None
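The field descriptions above can be illustrated with a small parser. This is a sketch under stated assumptions, not the library's own implementation: the `BreakendSVRecord` class below is a local stand-in with the same fields, and `parse_breakend` and its regular expression are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple, Optional


class BreakendSVRecord(NamedTuple):
    """Local stand-in with the same fields as the documented NamedTuple."""
    prefix: Optional[str]
    bracket: str
    contig: str
    pos: int
    suffix: Optional[str]


# Hypothetical pattern: bases, a bracket, contig:pos, the same bracket, bases.
_BREAKEND_RE = re.compile(
    r"^(?P<prefix>[^\[\]]*)(?P<bracket>[\[\]])"
    r"(?P<contig>[^\[\]]+):(?P<pos>\d+)(?P=bracket)(?P<suffix>[^\[\]]*)$"
)


def parse_breakend(alt: str) -> Optional[BreakendSVRecord]:
    """Split a breakend ALT string into its parts, or return None if it does not match."""
    match = _BREAKEND_RE.match(alt)
    if match is None:
        return None
    return BreakendSVRecord(
        prefix=match["prefix"] or None,  # empty string becomes None
        bracket=match["bracket"],
        contig=match["contig"],
        pos=int(match["pos"]),
        suffix=match["suffix"] or None,  # empty string becomes None
    )
```

With this sketch, `parse_breakend("G]17:198982]")` returns `BreakendSVRecord(prefix='G', bracket=']', contig='17', pos=198982, suffix=None)`, matching the examples given for each field.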
- class variant_extractor.variants.ShorthandSVRecord(type: str, extra: List[str] | None)
Bases: NamedTuple
NamedTuple with the information of a shorthand SV record
- extra: List[str] | None
Extra information of the SV. For example, for <DUP:TANDEM:AA> the extra will be ['TANDEM', 'AA']
- type: str
One of the following: 'DEL', 'INS', 'DUP', 'INV' or 'CNV'
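As an illustration of the two fields above, the following sketch splits a shorthand ALT string into `type` and `extra`. The class is a local stand-in, `parse_shorthand` is a hypothetical helper, and returning `None` for `extra` when no subtypes follow the type is an assumption based on the `List[str] | None` annotation.

```python
from typing import List, NamedTuple, Optional

# The five types listed in the documentation above.
_SV_TYPES = {"DEL", "INS", "DUP", "INV", "CNV"}


class ShorthandSVRecord(NamedTuple):
    """Local stand-in with the same fields as the documented NamedTuple."""
    type: str
    extra: Optional[List[str]]


def parse_shorthand(alt: str) -> Optional[ShorthandSVRecord]:
    """Parse '<TYPE:SUB1:SUB2>' into a record, or return None if it does not match."""
    if not (alt.startswith("<") and alt.endswith(">")):
        return None
    parts = alt[1:-1].split(":")
    sv_type, extra = parts[0], parts[1:]
    if sv_type not in _SV_TYPES:
        return None
    # ASSUMPTION: no subtypes is represented as None rather than an empty list.
    return ShorthandSVRecord(type=sv_type, extra=extra or None)
```

Here `parse_shorthand("<DUP:TANDEM:AA>")` returns `ShorthandSVRecord(type='DUP', extra=['TANDEM', 'AA'])`.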
- class variant_extractor.variants.VariantRecord(rec: VariantRecord, contig: str, pos: int, end: int, length: int, id: str | None, ref: str, alt: str, variant_type: VariantType, alt_sv_breakend: BreakendSVRecord | None = None, alt_sv_shorthand: ShorthandSVRecord | None = None)
Bases: object
Object with the information of a variant record
- __init__(rec: VariantRecord, contig: str, pos: int, end: int, length: int, id: str | None, ref: str, alt: str, variant_type: VariantType, alt_sv_breakend: BreakendSVRecord | None = None, alt_sv_shorthand: ShorthandSVRecord | None = None)
- alt: str
Alternative sequence
- alt_sv_breakend: BreakendSVRecord | None
Breakend SV info, present only for SVs with breakend notation. For example, G]17:198982]
- alt_sv_shorthand: ShorthandSVRecord | None
Shorthand SV info, present only for SVs with shorthand notation. For example, <DUP:TANDEM>
- contig: str
Contig name
- end: int
End position of the variant in the contig (same as pos for TRA and SNV)
- filter: List[str | int]
Filter status. PASS if this position has passed all filters. Otherwise, it contains the filters that failed
- property format
Specifies data types and order of the genotype information
- id: str | None
Record identifier
- property info
Additional information
- length: int
Length of the variant
- pos: int
Position of the variant in the contig
- qual: float | None
Quality score for the assertion made in ALT
- ref: str
Reference sequence
- property samples
Genotype information for each sample
- variant_type: VariantType
Variant type
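The `filter` attribute described above follows the VCF convention in which the FILTER column holds either `PASS` or a semicolon-separated list of failed filter codes. The sketch below shows that convention on a raw column value; both helper names are hypothetical, and mapping a missing value (`.`) to an empty list is an assumption, not documented library behavior.

```python
from typing import List


def parse_filter_field(raw: str) -> List[str]:
    """Split a raw VCF FILTER column value into a list of filter names."""
    if raw in ("", "."):
        # ASSUMPTION: a missing value means no filters were applied.
        return []
    return raw.split(";")


def passed_all_filters(filters: List[str]) -> bool:
    """Return True only when the record carries the single PASS filter."""
    return filters == ["PASS"]
```

For example, `parse_filter_field("q10;s50")` returns `['q10', 's50']`, for which `passed_all_filters` is `False`.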