You need a CLIMB account to upload or access data. If you haven’t got one, see instructions on registering. You will need to SSH into the CLIMB-COVID server.
Consensus FASTA: /cephfs/covid/bham/artifacts/published/elan.latest.consensus.matched.fasta
Metadata table: /cephfs/covid/bham/artifacts/published/majora.latest.metadata.matched.tsv

The daily consensus FASTA is indexed so that individual sequences can be extracted quickly. A simple script that does this from a list (or file) of central_sample_id, run_name or pag_name values, or of central_sample_id, run_name pairs, is available from the CLIMB-COVID/utilities repository.

The locations of .bam files and their associated .bam.bai indexes must be resolved via the lookup table (/cephfs/covid/artifacts/elan/latest/majora.pag_lookup.tsv), which Elan updates daily.
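Because the lookup table is a plain TSV file, the BAM locations for a given PAG can be pulled out with the Python standard library. This is a minimal sketch under stated assumptions: it assumes majora.pag_lookup.tsv is tab-separated with a header row and that PAG names sit in a column called pag_name, which you should confirm against the real header. lookup_pag_rows and bam_paths are hypothetical helpers, not part of CLIMB-COVID/utilities. Rather than guessing which column holds the path, bam_paths collects any cell ending in .bam or .bam.bai.

```python
import csv
from typing import Dict, Iterable, List


def lookup_pag_rows(
    lookup_lines: Iterable[str],
    pag_names: Iterable[str],
    key_column: str = "pag_name",  # ASSUMED column name; check the real header
) -> Dict[str, List[dict]]:
    """Return every lookup-table row whose key column matches a requested PAG name."""
    found: Dict[str, List[dict]] = {name: [] for name in set(pag_names)}
    reader = csv.DictReader(lookup_lines, delimiter="\t")
    # Fail loudly if the assumed key column is not in the header.
    if key_column not in (reader.fieldnames or []):
        raise KeyError(f"{key_column!r} not in header {reader.fieldnames}")
    for row in reader:
        if row[key_column] in found:
            found[row[key_column]].append(row)
    return found


def bam_paths(rows: Iterable[dict]) -> List[str]:
    """Collect any cell values that look like .bam or .bam.bai paths."""
    paths: List[str] = []
    for row in rows:
        for value in row.values():
            # DictReader stores surplus cells as a list under the None key,
            # so only plain strings are checked.
            if isinstance(value, str) and value.endswith((".bam", ".bam.bai")):
                paths.append(value)
    return paths
```

On the server, open /cephfs/covid/artifacts/elan/latest/majora.pag_lookup.tsv with newline="" and pass the handle as lookup_lines, then call bam_paths on the rows returned for each PAG name. Since Elan rewrites the table daily, re-read it rather than caching old results.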
Accessions table: /cephfs/covid/bham/artifacts/published/latest.accessions.tsv
Alignments: /cephfs/covid/bham/results/msa/latest/alignments/
This directory contains:
- cog_<date>_all.fa: all unaligned sequences after deduplication
- cog_<date>_all_alignment.fa: all aligned sequences after deduplication
- cog_<date>_all_metadata.csv: all corresponding metadata
- cog_<date>_alignment.fa: filtered, trimmed alignment with sequences matching those in the corresponding metadata
- cog_<date>_metadata.csv: corresponding metadata for the filtered, trimmed alignment

Variants: /cephfs/covid/bham/results/variants/YYYYMMDD/
Only the latest run is kept, so YYYYMMDD may be today's or yesterday's date depending on the time of day. This directory contains:
- naive_variants_table.csv: a long-format table containing all variant information (SNPs / indels) for every COG ID in best_refs.paired.ls
- naive_msa.fasta: the MSA used to generate naive_variants_table.csv
- best_refs.paired.ls: a summary file listing the filename of each source FASTA file included in the MSA, together with its original FASTA header

The consensus FASTA and the metadata table are perfectly paired: the sequence records in the FASTA and the rows in the metadata table are in the same order. Additionally, the table contains a fasta_header column that can be used to map each row to its record in the FASTA file. Note that the order is not guaranteed to be the same between different runs of the pipeline (i.e., the FASTA will not be in the same order each time the inbound pipeline finishes).
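Because every metadata row carries a fasta_header value, consensus records can also be selected by sample without relying on record order. The sketch below is a minimal illustration, not the CLIMB-COVID/utilities script: it assumes the metadata TSV has a central_sample_id column and that fasta_header holds the FASTA header text without the leading ">". The function names read_fasta, headers_for_samples and extract_consensus are hypothetical. It scans the whole FASTA linearly, so it will be slower than the indexed route.

```python
import csv
from typing import Iterable, Iterator, List, Optional, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)  # multi-line sequences are joined
    if header is not None:
        yield header, "".join(chunks)


def headers_for_samples(
    metadata_lines: Iterable[str],
    sample_ids: Iterable[str],
    id_column: str = "central_sample_id",  # ASSUMED column name
) -> Set[str]:
    """Collect the fasta_header of every metadata row for the requested samples."""
    wanted = set(sample_ids)
    reader = csv.DictReader(metadata_lines, delimiter="\t")
    return {row["fasta_header"] for row in reader if row[id_column] in wanted}


def extract_consensus(
    fasta_lines: Iterable[str], headers: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Yield only the FASTA records whose header is in headers (full linear scan)."""
    for header, sequence in read_fasta(fasta_lines):
        if header in headers:
            yield header, sequence
```

On CLIMB-COVID the inputs would be open handles to majora.latest.metadata.matched.tsv and elan.latest.consensus.matched.fasta. Matching on the header rather than on position keeps the selection correct even though record order changes between pipeline runs, and one central_sample_id can map to several headers, which is why headers_for_samples returns a set.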
Note also that the merged consensus FASTA includes resequenced samples: a biosample may have more than one genome in the consensus FASTA.
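Since metadata rows and FASTA records are paired one to one, counting metadata rows per biosample is one way to find samples with more than one genome. A minimal sketch, assuming the biosample identifier is stored in a central_sample_id column of the tab-separated metadata table; resequenced_samples is a hypothetical name.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def resequenced_samples(
    metadata_lines: Iterable[str],
    id_column: str = "central_sample_id",  # ASSUMED column name
) -> Dict[str, int]:
    """Map each biosample ID that appears on more than one row to its row count."""
    reader = csv.DictReader(metadata_lines, delimiter="\t")
    counts = Counter(row[id_column] for row in reader)
    return {sample_id: n for sample_id, n in counts.items() if n > 1}
```

Pass an open handle to majora.latest.metadata.matched.tsv; each returned entry is a biosample together with the number of genomes it has in the consensus FASTA.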
See accessing dataviews.