Aegis

Submit a sequence

The Analyze page is the main submission surface. This section walks through service choice, FASTA input rules, optional geography, and what happens when you hit Analyze.

Analyze page — service selector, FASTA input, optional geography.

Choose a service

Aegis has two analysis engines, the pipeline and the AI engine, and can run them separately or together. You pick one or more services per submission; each selected service produces its own result, and Aegis merges them into a single view.

Pipeline
Snakemake-driven: BLAST → taxonomy → AMR → virulence → chimera detection → risk scoring. Authoritative for threat classification.
AI engine
Multi-scale ML probes (150 bp / 2 kb / 8 kb) plus ORF detection and FAISS gene matching. Surfaces anomalies that alignment-based tools miss. Expect ~60–75% scores on reference viral genomes — that's training-data bias, not a danger signal.
Both
Runs pipeline and AI engine concurrently. The UI merges the two into a single job view. Recommended when you're investigating an unknown sequence rather than confirming a known one.
BLAST alone is deprecated
BLAST is no longer exposed as a standalone service; it runs inside the pipeline. If you want taxonomy without the rest of the analysis, the pipeline is still the right choice: on short inputs it is only seconds slower.

FASTA input rules

  • Format — standard FASTA. Each record starts with >header on its own line, followed by sequence lines. Mixed case is fine; ambiguity codes (N) are allowed; whitespace inside sequences is stripped.
  • Characters — nucleotide sequences only: A, T, C, G, N. Protein FASTAs and non-nucleotide characters are rejected at validation.
  • Per-submission limits — up to 20 sequences, 20 million nucleotides total per submission. Trivially repeating input (e.g. homopolymer stretches beyond a threshold) is rejected.
  • File type — paste into the text area or upload a .fasta, .fa, or .txt file. Other types are rejected client-side.
  • Low-complexity masking — the pipeline applies DUST filtering during BLAST. Low-complexity regions are not scored as matches, even if they align; they appear as a separate track in the genome browser.
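
The rules above can be approximated locally before you submit. The following is a minimal Python sketch, not Aegis's own validator: the names parse_fasta and validate_submission are hypothetical, rejecting empty records is an assumption, and the repeat (homopolymer) check is left out because its threshold is not stated here.

```python
from dataclasses import dataclass

ALLOWED = set("ATCGN")        # nucleotide characters accepted per the rules above
MAX_SEQUENCES = 20            # per-submission record limit
MAX_TOTAL_NT = 20_000_000     # per-submission nucleotide limit


@dataclass
class Record:
    header: str
    sequence: str


def parse_fasta(text: str) -> list[Record]:
    """Split FASTA text into records, upper-casing and stripping whitespace."""
    records: list[Record] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(Record(header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            # Whitespace inside sequences is stripped; mixed case is fine.
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append(Record(header, "".join(chunks)))
    return records


def validate_submission(text: str) -> list[Record]:
    """Return parsed records, or raise ValueError on the first rule violated."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}")
    total = 0
    for rec in records:
        if not rec.sequence:
            # Assumption: a header with no sequence is treated as invalid.
            raise ValueError(f"record '{rec.header}' has no sequence")
        bad = set(rec.sequence) - ALLOWED
        if bad:
            raise ValueError(f"record '{rec.header}' has invalid characters: {sorted(bad)}")
        total += len(rec.sequence)
    if total > MAX_TOTAL_NT:
        raise ValueError(f"{total} nucleotides exceeds the limit of {MAX_TOTAL_NT}")
    return records
```

Calling validate_submission on a protein FASTA raises ValueError at the character check, which mirrors the rejection described in the Characters rule.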

Optional: geography

If you know where the sample was collected, attach a country (and optionally finer geography) to the submission. It's used by the global surveillance panel to compare your sample to Nextstrain and NCBI records near that location. Leaving geography blank is always fine — it only improves the surveillance view, not the risk score.

Caching

You may see a “Cached” banner
If you previously submitted an identical sequence with the same service and parameters under the same account, Aegis returns the prior result immediately and shows a Cached badge. A Partially cached banner appears when one of several selected services hits the cache and the other runs fresh. The cache is per-user; entries expire after 24 hours or when an admin forces invalidation.
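
One way to picture the behaviour described above is a cache keyed on user, sequence, service, and parameters, with a 24-hour lifetime. The Python sketch below is illustrative only: cache_key, ResultCache, and cache_banner are hypothetical names, and normalising case and whitespace before hashing is an assumption about what counts as an "identical" sequence.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # entries expire after 24 hours


def cache_key(user_id: str, sequence: str, service: str, params: dict) -> str:
    """Derive a per-user key from the sequence, service, and parameters."""
    # Assumption: case and whitespace differences do not make a new sequence.
    normalized = "".join(sequence.split()).upper()
    payload = json.dumps(
        {"user": user_id, "seq": normalized, "service": service, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory store with time-based expiry and explicit invalidation."""

    def __init__(self, now=time.time):
        self._entries = {}
        self._now = now

    def put(self, key, result):
        self._entries[key] = (self._now(), result)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._now() - stored_at >= CACHE_TTL_SECONDS:
            del self._entries[key]  # expired
            return None
        return result

    def invalidate(self, key):
        """Forced invalidation, e.g. triggered by an admin."""
        self._entries.pop(key, None)


def cache_banner(selected_services, cached_services):
    """Return 'Cached', 'Partially cached', or None for a submission."""
    hits = [s for s in selected_services if s in cached_services]
    if not hits:
        return None
    return "Cached" if len(hits) == len(selected_services) else "Partially cached"
```

Because the user ID is part of the key, two users submitting the same sequence never share a cached result in this sketch, which matches the per-user behaviour described above.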

Submitting

  1. Pick the service(s).
  2. Paste or upload FASTA.
  3. (Optional) set geography.
  4. Press Ctrl+Enter or click Analyze.

Synchronous submissions block until results arrive. For long jobs (>30 s) the UI switches to the progress view automatically — see Watching progress.
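
The hand-off from blocking to the progress view can be thought of as a wait with a deadline. The Python sketch below shows that pattern only; it is not how the Aegis UI is implemented. The function submit_and_wait and its run_job argument are hypothetical, and the 30-second default is taken from the threshold mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

PROGRESS_SWITCH_SECONDS = 30.0  # threshold after which the progress view takes over


def submit_and_wait(run_job, timeout=PROGRESS_SWITCH_SECONDS):
    """Block for up to `timeout` seconds, then hand back a handle to the running job.

    Returns ("result", value) if the job finished in time, otherwise
    ("progress", future) so the caller can keep watching the job.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(run_job)
    try:
        return ("result", future.result(timeout=timeout))
    except FutureTimeout:
        return ("progress", future)
    finally:
        # Do not wait here: a timed-out job keeps running in the background.
        executor.shutdown(wait=False)
```

A caller that receives "progress" would poll or await the returned future, which plays the role of the progress view in this analogy.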