Why pre-generate constructs?
The construct list keeps boundary generation and evidence fields consistent across reports. When enough information is available, protein reports contain candidate boundaries derived from topology, curated annotations, AlphaFold confidence, PDB evidence, PTM/processing features, cysteine context, and ortholog mappings.
Use the list to inspect candidate boundaries alongside the evidence behind them. Before ordering, review target biology, literature, partner requirements, assay goals, and expression constraints.
Strict and lenient construct algorithm
This section summarizes the AlphaFold-derived portion of the construct generator, which runs when a compatible AlphaFold model and PAE matrix are available. Strict constructs use conservative pLDDT seeds followed by recursive PAE boundary scoring; lenient constructs use a lower pLDDT threshold to preserve larger structured regions.
The two AlphaFold confidence signals are used for different decisions. pLDDT is a one-dimensional per-residue signal, so it first defines candidate structured intervals. The lenient track stops after this step and keeps larger intervals using a lower pLDDT threshold. The strict track takes only higher-confidence pLDDT seed intervals and then uses the two-dimensional PAE matrix to decide whether each seed should be split into smaller domain-like units.
For strict splitting, every candidate boundary that leaves both resulting fragments at least 40 amino acids long is tested. A boundary is eligible when the predicted error between the two sides is high relative to the predicted error within each side. Low linker pLDDT near the boundary adds support for a split because it suggests a flexible connector between structured blocks.
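The eligibility test for a single cut can be sketched as follows. This is a minimal illustration, not the pipeline's implementation. The thresholds (40 residues, 12 Å, 4 Å) come from this page. The function names, the 0-based half-open indexing, the inclusion of diagonal entries in the intrablock mean, and the averaging of both off-diagonal blocks are assumptions.

```python
from statistics import mean

MIN_FRAGMENT = 40       # minimum residues on each side of a cut (from the text)
MIN_INTERBLOCK = 12.0   # minimum interblock PAE in angstroms (from the text)
MIN_SEPARATION = 4.0    # minimum separation in angstroms (from the text)


def block_mean(pae, rows, cols):
    """Mean PAE over the rows x cols rectangle of a square PAE matrix."""
    return mean(pae[i][j] for i in rows for j in cols)


def evaluate_cut(pae, start, end, cut):
    """Evaluate splitting the half-open region [start, end) at index `cut`.

    Returns None when either side would be shorter than MIN_FRAGMENT;
    otherwise a dict with interblock PAE, separation, and eligibility.
    """
    if cut - start < MIN_FRAGMENT or end - cut < MIN_FRAGMENT:
        return None
    left, right = range(start, cut), range(cut, end)
    # Mean intrablock PAE: average of the two sides' within-block means.
    intra = mean([block_mean(pae, left, left), block_mean(pae, right, right)])
    # Assumption: average both off-diagonal blocks, since PAE is not symmetric.
    inter = mean([block_mean(pae, left, right), block_mean(pae, right, left)])
    separation = inter - intra
    return {
        "interblock": inter,
        "separation": separation,
        "eligible": inter >= MIN_INTERBLOCK and separation >= MIN_SEPARATION,
    }
```

On a synthetic matrix with two 50-residue blocks, low error within each block and high error between them, the midpoint cut is eligible, while a cut that leaves a 30-residue fragment is rejected before any PAE is computed.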
Step-by-step strict and lenient generation
- Define the eligible design region. The pipeline first determines the region that can be used for soluble construct design. For secreted proteins this is usually the mature secreted chain; for single-pass proteins it is usually the extracellular region; for multipass proteins, large extracellular regions can be treated as mixed soluble/membrane cases, while membrane-expression constructs are handled separately. Strict and lenient soluble constructs stay inside this region.
- Retrieve canonical AlphaFold confidence data when available. The canonical UniProt sequence is matched to the AlphaFold structure and PAE matrix. pLDDT is used as a per-residue local-confidence signal. PAE is used as a pairwise confidence signal that indicates whether two parts of the model have a confident relative orientation.
- Generate lenient pLDDT regions. The lenient track marks residues with pLDDT at least 60, allows low-confidence gaps up to 12 residues, and discards resulting regions shorter than 25 residues. This track preserves larger structured units and can include more than one domain when the intervening linker is short or only moderately uncertain.
- Generate strict pLDDT seed regions. The strict track marks residues with pLDDT at least 70, allows low-confidence gaps up to 8 residues, and discards seed regions shorter than 25 residues. These high-confidence intervals then feed PAE-based domain splitting.
- Evaluate PAE split candidates inside each strict seed. For every possible cut, both resulting sides must remain at least 40 amino acids. The pipeline computes intrablock PAE within each side, interblock PAE between the two sides, and separation, defined as interblock PAE minus mean intrablock PAE.
- Accept only convincing PAE boundaries. A cut is eligible only when interblock PAE is at least 12 Å and separation is at least 4 Å. This means the two sides are internally more coherent than they are with respect to each other, which is the signal expected at a domain boundary or flexible hinge.
- Score eligible strict split boundaries. Candidate boundaries are ranked by split score, where split score = separation + 0.35 × max(0, boundary PAE − intrablock PAE) + linker bonus. The linker bonus increases when local pLDDT near the cut is low, because low-confidence linker residues support splitting adjacent structured blocks.
- Recursively split strict regions. The highest-scoring eligible cut is applied, then the same PAE split search is repeated on the left and right child regions. Recursion stops when no valid cut remains, child regions would become too small, or the configured maximum recursion depth is reached.
- Apply final construct filters. Calculated soluble constructs must stay inside the design region and must be at least 50 amino acids. Exact duplicate boundaries from different calculated or annotated sources are collapsed to keep reports concise; PDB-backed constructs are kept individually so no experimental structure is dropped.
- Annotate each construct for review. Construct cards list sequence, boundaries, exports, included UniProt/InterPro annotations, PTMs, furin-like motifs, cysteine warnings, homolog-equivalent regions, and construct-specific conservation. Structural metrics, PAE split diagnostics, ligand-interaction annotations, and images appear when available.
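The two pLDDT steps in the list above (lenient regions and strict seeds) differ only in their parameters, so one helper can illustrate both. This is a sketch under assumptions: the function names and the half-open 0-based intervals are illustrative, and the order of operations (merge short gaps first, then apply the length filter) is assumed rather than stated on this page.

```python
def plddt_regions(plddt, threshold, max_gap, min_length=25):
    """Return half-open (start, end) intervals of confident residues.

    Residues with pLDDT >= threshold are marked, marked runs separated by at
    most `max_gap` unmarked residues are merged, and regions shorter than
    `min_length` are discarded.
    """
    regions = []
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last residue.
    for i, value in enumerate(list(plddt) + [float("-inf")]):
        if value >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if regions and run_start - regions[-1][1] <= max_gap:
                regions[-1] = (regions[-1][0], i)  # bridge the short gap
            else:
                regions.append((run_start, i))
            run_start = None
    return [(s, e) for s, e in regions if e - s >= min_length]


def lenient_regions(plddt):
    """Lenient track: pLDDT >= 60, gaps up to 12 residues."""
    return plddt_regions(plddt, threshold=60, max_gap=12)


def strict_seeds(plddt):
    """Strict track: pLDDT >= 70, gaps up to 8 residues."""
    return plddt_regions(plddt, threshold=70, max_gap=8)
```

For a profile with two confident 30-residue blocks separated by a 10-residue low-confidence gap, the strict track keeps two separate seeds because the gap exceeds 8 residues, while the lenient track merges them into one region because the gap is within 12.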
Construct classes
Constructs are displayed in a fixed order so reports can be compared target-to-target. The order starts with broad biological context, then moves toward narrower or more computationally inferred regions.
The default minimum construct length is 50 amino acids; shorter candidates are filtered from the recommendation set.
Design scope comes first
OpenAntigens first determines the design scope: a mature secreted sequence, a GPI-anchored or single-pass extracellular region, a large multipass extracellular region, or a full-length membrane-protein expression region. Soluble constructs stay inside that inferred scope.
For secreted proteins, signal peptide and propeptide annotations help define mature sequence. For single-pass proteins, transmembrane and topology annotations define extracellular boundaries. For multipass proteins, large extracellular regions are treated as mixed cases; short loops usually route to membrane-expression target review.
Boundary filtering
Recommended soluble constructs must fit inside the design scope. InterPro or UniProt annotations that extend outside the mature secreted or extracellular region remain descriptive context.
PDB-derived boundaries that are fully outside the relevant scope are ignored. PDB entries that overlap the scope are each retained as a separate construct, even when their boundaries duplicate another construct, so every experimental structure is captured as evidence.
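The filtering rules can be sketched as a single pass over candidate constructs. The `Construct` class, the `source` labels, and the 1-based inclusive coordinates are hypothetical. The sketch also makes two assumptions that this page does not settle: the 50-residue minimum is applied to every candidate, and overlapping PDB entries are kept with their boundaries unchanged rather than trimmed to the scope.

```python
from dataclasses import dataclass

MIN_CONSTRUCT_LENGTH = 50  # default minimum construct length (from the text)


@dataclass(frozen=True)
class Construct:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # hypothetical labels: "strict", "lenient", "domain", "pdb"


def filter_constructs(candidates, scope_start, scope_end):
    """Apply scope, length, and duplicate filters to candidate constructs."""
    kept, seen = [], set()
    for c in candidates:
        if c.end - c.start + 1 < MIN_CONSTRUCT_LENGTH:
            continue  # assumption: the length minimum applies to all sources
        if c.source == "pdb":
            # PDB boundaries are ignored only when fully outside the scope;
            # overlapping entries are kept even if they duplicate another one.
            if c.end < scope_start or c.start > scope_end:
                continue
        else:
            if c.start < scope_start or c.end > scope_end:
                continue  # calculated/annotated constructs must fit the scope
            if (c.start, c.end) in seen:
                continue  # collapse exact duplicates across non-PDB sources
            seen.add((c.start, c.end))
        kept.append(c)
    return kept
```

Tracking duplicates only among non-PDB sources means a PDB entry never causes a calculated construct to be dropped, and a calculated construct never hides a PDB entry.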
Strict versus lenient
Strict calculated constructs favor smaller regions with stronger local confidence and cleaner structural cohesion. They are useful when the goal is a compact antigen domain with fewer flexible residues.
Lenient calculated constructs tolerate broader regions and can preserve multi-domain surfaces. They are useful when the antibody-discovery goal may require a larger conformational epitope or when domains are structurally adjacent.
Interpretable evidence
Construct cards emphasize interpretable evidence: boundaries, sequence, class, structural metrics, PTMs, cysteines, homolog equivalents, PDB evidence, and warnings.
These fields keep each boundary traceable. A lower-confidence construct may still be appropriate if it preserves an important ligand interface or known epitope.
AlphaFold-derived structural heuristics
When a compatible AlphaFold model and PAE matrix are available, OpenAntigens uses AlphaFold confidence in two complementary ways: pLDDT estimates local residue confidence, and PAE estimates confidence in relative placement between residue pairs or regions. The construct generator uses both signals to identify coherent structural blocks and plausible split points.
The strict workflow is designed to find compact, domain-like units. It starts with residues whose pLDDT is at least 70, merges short low-confidence gaps up to 8 residues, discards structured seeds shorter than 25 residues, and then evaluates PAE-based split points. A split is considered only if each side would remain at least 40 residues. The split must have interblock PAE of at least 12 Å and PAE separation of at least 4 Å, where separation is interblock PAE minus the mean intrablock PAE of the two sides.
Accepted split candidates are scored by combining PAE separation, local boundary PAE enrichment, and linker pLDDT. Boundaries are favored when the candidate blocks are internally coherent, the two sides have uncertain relative placement, and the linker around the boundary has lower confidence. The splitter recurses up to the configured maximum depth, producing strict calculated constructs.
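A compact sketch of the scoring and recursion follows. The score formula and its 0.35 weight are taken from this page. Boundary PAE and the linker bonus are not defined in detail here, so the sketch accepts them as precomputed inputs. `find_cuts` is a hypothetical callback that returns only eligible cuts with their scores, and the depth limit is a caller-supplied value because the configured maximum is not stated.

```python
def split_score(separation, boundary_pae, intrablock_pae, linker_bonus):
    """Split score as given in the text; all inputs are precomputed values."""
    return separation + 0.35 * max(0.0, boundary_pae - intrablock_pae) + linker_bonus


def recursive_split(start, end, find_cuts, max_depth, depth=0):
    """Recursively split the half-open region [start, end).

    `find_cuts(start, end)` is a hypothetical callback returning a list of
    (cut_index, score) pairs for eligible cuts only. Recursion stops when the
    depth limit is reached or no eligible cut remains.
    """
    if depth >= max_depth:
        return [(start, end)]
    cuts = find_cuts(start, end)
    if not cuts:
        return [(start, end)]
    cut, _ = max(cuts, key=lambda pair: pair[1])  # highest-scoring eligible cut
    return (recursive_split(start, cut, find_cuts, max_depth, depth + 1)
            + recursive_split(cut, end, find_cuts, max_depth, depth + 1))
```

Because each child region is searched again from scratch, a boundary that was outranked at the parent level can still be applied one level down if it remains eligible inside the smaller region.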
The lenient workflow is intentionally less conservative. It uses a pLDDT threshold of 60 and merges gaps up to 12 residues, which often preserves larger structured regions that strict splitting would divide. Lenient constructs are therefore useful when preserving a multi-domain surface may be more important than isolating a compact single domain.
Interpret these diagnostics with domain annotations, PDB evidence, PTMs, ligand sites, cysteines, and the intended antibody-discovery strategy.
What each construct card includes
Each construct card lists the primary human sequence, boundaries, and export-ready TSV/FASTA. When available, it also lists ortholog-equivalent rows and construct-specific evidence, diagnostics, and warnings.
Homolog transfer
Human construct boundaries are transferred to mouse and cynomolgus monkey by aligning the full design scope to each ortholog sequence. The alignment projects human boundary positions onto ortholog sequences, so residue numbers can differ across species.
Unreliable or missing ortholog alignments are marked unavailable.
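Boundary projection through a pairwise alignment can be illustrated as below. The alignment itself is taken as input (two equal-length gapped rows); how the alignment is computed and how reliability is judged are not described on this page, so the function name and the rule of returning `None` when no ortholog residue aligns inside the human range are assumptions.

```python
def project_boundaries(human_aln, ortho_aln, start, end):
    """Project 1-based inclusive human boundaries onto an ortholog sequence.

    `human_aln` and `ortho_aln` are equal-length rows of a pairwise alignment
    with '-' for gaps. Returns 1-based inclusive ortholog boundaries, or None
    when no ortholog residue aligns inside the human range.
    """
    if len(human_aln) != len(ortho_aln):
        raise ValueError("alignment rows must have equal length")
    h_pos = o_pos = 0
    mapped = []
    for h_char, o_char in zip(human_aln, ortho_aln):
        if h_char != "-":
            h_pos += 1
        if o_char != "-":
            o_pos += 1
        # Record ortholog positions only at columns where both residues align.
        if h_char != "-" and o_char != "-" and start <= h_pos <= end:
            mapped.append(o_pos)
    return (mapped[0], mapped[-1]) if mapped else None
```

With the toy rows `"MKT-AYIAKQR"` (human) and `"MKTLAY--KQR"` (ortholog), human residues 4 to 9 map to ortholog residues 5 to 8, which shows how an insertion or deletion shifts residue numbers between species.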
Cysteine edits
Pre-generated construct cards report unpaired cysteine warnings. In the Interactive Construct Builder, checkboxes can apply Cys-to-Ser edits to selected unpaired cysteines; names, TSV, and FASTA update immediately.
Cys-to-Ser edits are optional substitutions for flagged unpaired cysteines. Review known disulfides, structure, and functional biology before ordering.
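A Cys-to-Ser edit is a simple point substitution, sketched here for illustration only. The function name and the convention of passing full-length 1-based residue positions are assumptions. The labels use standard substitution notation such as C106S, while the naming scheme used by the Interactive Construct Builder is not specified on this page.

```python
def apply_cys_to_ser(sequence, construct_start, positions):
    """Replace selected cysteines with serine in a construct sequence.

    `construct_start` is the 1-based full-length position of the first
    construct residue; `positions` are 1-based full-length positions.
    Returns the edited sequence and substitution labels such as "C106S".
    """
    residues = list(sequence)
    labels = []
    for pos in sorted(positions):
        index = pos - construct_start
        if not 0 <= index < len(residues):
            raise ValueError(f"position {pos} is outside the construct")
        if residues[index] != "C":
            raise ValueError(f"residue {pos} is {residues[index]}, not cysteine")
        residues[index] = "S"
        labels.append(f"C{pos}S")
    return "".join(residues), labels
```

Checking that the target residue is actually a cysteine guards against off-by-one errors when positions are given in full-length numbering but the sequence is a sub-range.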
PTM-aware review
Construct cards list overlapping PTMs and processing features for boundary review.
Cleavage, propeptide, signal peptide, and chain annotations deserve special attention because they can define mature protein boundaries.
Multipass proteins
Large extracellular regions on multipass proteins are handled as mixed cases with soluble extracellular-region constructs and full-length membrane-protein context. Proteins with only short extracellular loops are treated primarily as membrane-expression targets.
Membrane-expression suggestions are intentionally separate from soluble extracellular-region constructs because they answer a different experimental question.
How to choose among pre-generated constructs
- Start with the full design region to understand the complete mature secreted or extracellular context.
- Check PDB-backed constructs for experimentally observed boundary precedent.
- Use domain-annotated constructs when the goal is a named biological domain.
- Use strict calculated constructs when compact soluble expression is the priority.
- Use lenient calculated constructs when a larger conformational epitope or multi-domain surface may matter.
- Review PTMs, cysteines, furin motifs, ligand regions, and assembly/partner requirements before finalizing.
- Inspect homolog-equivalent sequences if mouse or cynomolgus monkey screening, immunization, or reagent validation is planned.
- Open the Interactive Construct Builder to refine boundaries and export the exact final sequence.
Limitations
Pre-generated constructs are computational recommendations. They can miss literature-specific constructs, expression-system constraints, epitope-specific requirements, glycan-dependent biology, partner-dependent folding, and context-dependent cleavage or processing. AlphaFold confidence supports structural reasoning; expression, secretion, folding, and antibody accessibility require experimental validation.
Use these constructs as a reproducible starting set. Final designs should be reviewed by a scientist familiar with the target biology and validated experimentally.