You need a CLIMB account to upload or access data. If you haven’t got one, see instructions on registering. You will need to SSH into the CLIMB-COVID server.
/cephfs/covid/bham/artifacts/published/elan.latest.consensus.matched.fasta. The daily consensus FASTA is indexed to make sequence extraction quick. A simple script that extracts sequences using a list (or file) of either
central_sample_id, run_name pairs is available from the CLIMB-COVID/utilities repository.
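To illustrate why an index makes extraction quick, here is a minimal sketch of indexed lookup in Python. It assumes the index is a samtools-style `.fai` file sitting next to the FASTA, which this page does not confirm. The function names `load_fai` and `fetch_sequence` are hypothetical and are not taken from the CLIMB-COVID/utilities script, which remains the recommended route.

```python
from pathlib import Path


def load_fai(fai_path):
    """Parse a samtools-style .fai index (assumed format) into a dict.

    Maps record name -> (length, offset, linebases, linewidth).
    """
    index = {}
    for line in Path(fai_path).read_text().splitlines():
        if not line.strip():
            continue
        name, length, offset, linebases, linewidth = line.split("\t")[:5]
        index[name] = (int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch_sequence(fasta_path, index, name):
    """Return one record's sequence by seeking straight to its byte offset."""
    length, offset, linebases, linewidth = index[name]
    if length == 0:
        return ""
    # Bytes to read = bases plus the line terminators between full lines.
    n_breaks = (length - 1) // linebases
    n_bytes = length + n_breaks * (linewidth - linebases)
    with open(fasta_path, "rb") as handle:
        handle.seek(offset)
        raw = handle.read(n_bytes)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

Only the index is read in full. Each sequence is then fetched with a single seek, so the cost does not grow with the size of the consensus FASTA.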
bam / their associated
bam.bai locations must be resolved via the lookup table (
/cephfs/covid/artifacts/elan/latest/majora.pag_lookup.tsv) which will be updated by Elan daily.
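As a rough sketch of resolving locations through the lookup table, the Python below scans a tab-separated file for rows that mention a given identifier. The column layout of the table is not described here, so the code deliberately avoids assuming column names. The helpers `find_lookup_rows` and `paths_in_row` are hypothetical names used for illustration only.

```python
import csv


def find_lookup_rows(lookup_path, identifier):
    """Return every row of a tab-separated table with a field equal to identifier."""
    matches = []
    with open(lookup_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if identifier in row:
                matches.append(row)
    return matches


def paths_in_row(row, suffix):
    """Pick out the fields of a row that end with the given suffix, e.g. '.bam'."""
    return [field for field in row if field.endswith(suffix)]
```

Because the table is regenerated daily, it is safer to resolve paths at the time of use than to cache them between runs.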
/cephfs/covid/bham/results/msa/latest/alignments/ which consists of:
cog_<date>_all.fa : all unaligned sequences after deduplication
cog_<date>_all_alignment.fa : all aligned sequences after deduplication
cog_<date>_all_metadata.csv : all corresponding metadata
cog_<date>_alignment.fa : filtered, trimmed alignment with sequences matching those in the corresponding metadata
cog_<date>_metadata.csv : corresponding metadata for filtered, trimmed alignment
/cephfs/covid/bham/results/variants/YYYYMMDD/ (only the latest run is kept; this may be the current date or yesterday's date, depending on the time of day). This contains:
naive_variants_table.csv: A long-format table containing all variant information (SNPs / indels) on every COG ID within
naive_msa.fasta: The MSA used to generate
best_refs.paired.ls: A summary file containing the filename for the source fasta file included in the MSA and its original fasta header
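Since the dated directory may be named for today or for yesterday, a script can simply try both. The following is a minimal Python sketch of that check. The function name `latest_variants_dir` is hypothetical, and the `today` parameter exists only to make the behaviour easy to test.

```python
from datetime import date, timedelta
from pathlib import Path


def latest_variants_dir(base, today=None):
    """Return the YYYYMMDD directory for today or yesterday, or None if neither exists."""
    today = today or date.today()
    for day in (today, today - timedelta(days=1)):
        candidate = Path(base) / day.strftime("%Y%m%d")
        if candidate.is_dir():
            return candidate
    return None
```

For example, `latest_variants_dir("/cephfs/covid/bham/results/variants")` returns today's directory if the run has finished and otherwise falls back to yesterday's.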
The FASTA consensus and metadata table are perfectly paired. The sequence records in the FASTA and metadata rows in the table are in the same order.
Additionally, the table contains a
fasta_header column that can be used to map the records in the FASTA file. Note that the order is not guaranteed between different runs of the pipeline (i.e., the FASTA will not be in the same order each time the inbound pipeline finishes).
Note also that the merged consensus FASTA includes resequenced samples. That is, a biosample may have more than one genome in the consensus FASTA.
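Because record order can change between runs, matching on the fasta_header column is more robust than relying on position. The Python sketch below checks that every FASTA record has a metadata row. It assumes the metadata table is comma-separated and that fasta_header holds the full header text after the leading `>`. Both are assumptions, and the function names are hypothetical.

```python
import csv


def read_fasta_headers(fasta_path):
    """Yield each header line of a FASTA file without the leading '>'."""
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].rstrip("\r\n")


def metadata_by_header(metadata_path):
    """Index metadata rows by their fasta_header value (assumed unique per genome)."""
    with open(metadata_path, newline="") as handle:
        return {row["fasta_header"]: row for row in csv.DictReader(handle)}


def unmatched_headers(fasta_path, metadata_path):
    """Return the FASTA headers that have no corresponding metadata row."""
    rows = metadata_by_header(metadata_path)
    return [h for h in read_fasta_headers(fasta_path) if h not in rows]
```

An empty list from `unmatched_headers` indicates that every sequence can be joined to its metadata, regardless of the order in which the pipeline wrote the records.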
See accessing dataviews.