central_sample_id.
A biosample source refers to the physical patient or environment from which a sample was taken.
It is usually not possible (for various reasons) for a patient to be identified with a single, unique, shareable identifier when a sample is made available to the consortium. It is our intention that, through work between the health agencies, the NHS and HDR-UK, such identifiers can be made available to the project in the near future. If there is an appropriate unique identifier for a patient, we call this the biosample_source_id.
H20. We call this the root_sample_id.
sender_sample_id.
Where possible, the local lab ID should be an identifier that can be mapped quickly and securely. Sites should not be encouraged to hold this linkage information themselves.
central_sample_id) taken from that patient as the biosample_source_id as a means to link them together in the interim.
Where a biosample is split into multiple aliquots or shared with other sequencing centres, that biosample should not be relabelled by the receiving site. This ambiguity can be resolved later by providing a library and sequencing run name that corresponds to your site.
A library should be given a library_name that is unique across the whole project. There are no limitations on what this name should look like, but it should be at least five characters long. Consider using the date or site name if you don’t have a generated identifier to use. “BIRM-001” would be reasonable; “1” would not.
A sequencing run should be given a run_name that is unique across the whole project. Ideally this will be the name your sequencing software generates for the run. For example, an Illumina run: <date>_<machine_id>_<run_no>_<some_zeros>_<flowcell> or a Nanopore run: <date>_<time>_<position>_<flowcell>_<id>.
If for some reason you don’t have these, a string of at least five characters that helps you identify this run is fine. Consider using your site and the date if you don’t have another identifier. “BIRM-RUN-20200402-1” is fine; “RUN1”, “birm run” or “nanopore” are not.
If a library is sequenced multiple times, each run should be considered a separate and distinct sequencing run.
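The naming guidance above can be sketched as a small pre-submission check. This is a minimal illustration, not part of any CLIMB or Majora tooling: the function names `is_acceptable_name` and `fallback_run_name` are hypothetical, and the fallback format simply mirrors the “BIRM-RUN-20200402-1” example. The length check is only a necessary condition; names such as “nanopore” pass it but are still not acceptable, because they do not identify a particular run.

```python
from datetime import date

# "At least five characters", per the guidance above.
MIN_NAME_LENGTH = 5


def is_acceptable_name(name: str) -> bool:
    """Return True if the name has at least five characters after trimming whitespace.

    This does not check uniqueness across the project, which only the
    central database can confirm.
    """
    return len(name.strip()) >= MIN_NAME_LENGTH


def fallback_run_name(site: str, run_date: date, index: int) -> str:
    """Build a name from a site code, the run date and a counter (hypothetical format)."""
    return f"{site.upper()}-RUN-{run_date:%Y%m%d}-{index}"


if __name__ == "__main__":
    print(is_acceptable_name("BIRM-001"))
    print(is_acceptable_name("1"))
    print(fallback_run_name("birm", date(2020, 4, 2), 1))
```

Including the date and a counter in the fallback makes it easier to keep names distinct when a library is sequenced more than once.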
Be aware that once you have provided metadata, many fields can still be changed by resubmitting the data. However, the primary keys (central_sample_id, library_name and run_name) cannot be changed. Ensure they are correct before uploading your data to avoid complications.
More generally, an object such as a biosample, sequencing library or data file may be referred to as an artifact. Something that causes a change or creates a new object (such as library pooling, or sequencing) is referred to as a process.
For a sample to be accepted for upload to CLIMB, and used in downstream analyses, you must provide the following:
received_date (YYYY-MM-DD). Do not attempt to impute the collection date. You must provide the day of the month.
You cannot change the COGUK ID once it has been submitted.
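The date requirement above (a full YYYY-MM-DD value, including the day of the month) can be checked locally before upload. The following is a minimal sketch using only the Python standard library; the function name `parse_full_date` is hypothetical and is not part of the uploader or the API.

```python
from datetime import date, datetime


def parse_full_date(value: str) -> date:
    """Parse a YYYY-MM-DD string, raising ValueError if it is not a complete date."""
    text = value.strip()
    # strptime tolerates unpadded fields such as 2020-4-2, so also require
    # the full ten-character form.
    if len(text) != 10:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return datetime.strptime(text, "%Y-%m-%d").date()


if __name__ == "__main__":
    print(parse_full_date("2020-04-02"))
    try:
        parse_full_date("2020-04")  # no day of the month
    except ValueError as error:
        print("rejected:", error)
```

Rejecting partial dates at this stage, rather than filling in a guessed day, is consistent with the instruction not to impute dates.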
You should aim to provide as much information as possible about a sample to the consortium. See a full list of fields that have been deemed acceptable for the consortium to collect.
Be sure that you have the governance and permission in place to share metadata before it is uploaded to CLIMB. Uploaded information may quickly be incorporated into downstream artifacts and shared around the consortium. Do not simply follow what other sites are doing; rules will differ between countries and health boards in the UK.
For a library to be listed on CLIMB, you must provide the following:
You cannot change the library_name once it has been submitted.
library_name as your run_name (or vice versa) if they have a 1:1 mapping (i.e. a single library was sequenced on a single run). If you provide sample information row-by-row using the uploader, each biosample that is pooled into the same library should have the same library_name. Each library that is on the same run should have the same run_name.
You should also provide information pertinent to performing QC and filtering on your downstream analyses here, such as the version of the ARTIC protocol and primer pools (if used).
See more information on library metadata.
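The row-by-row rule described above (biosamples pooled into the same library share a library_name) can be sanity-checked before upload by grouping rows. This is a minimal sketch under stated assumptions: the rows are plain dictionaries keyed by the field names used in this document, the sample identifiers are made-up placeholders, and `samples_by_library` is a hypothetical helper, not part of the uploader.

```python
from collections import defaultdict

# Placeholder rows; real rows would come from your CSV/TSV sheet.
rows = [
    {"central_sample_id": "SAMPLE-A", "library_name": "BIRM-LIB-20200402", "run_name": "BIRM-RUN-20200402-1"},
    {"central_sample_id": "SAMPLE-B", "library_name": "BIRM-LIB-20200402", "run_name": "BIRM-RUN-20200402-1"},
    {"central_sample_id": "SAMPLE-C", "library_name": "BIRM-LIB-20200403", "run_name": "BIRM-RUN-20200403-1"},
]


def samples_by_library(rows):
    """Map each library_name to the list of central_sample_id values pooled into it."""
    pooled = defaultdict(list)
    for row in rows:
        pooled[row["library_name"]].append(row["central_sample_id"])
    return dict(pooled)


if __name__ == "__main__":
    for library, samples in samples_by_library(rows).items():
        print(library, samples)
```

Reviewing the grouped output makes it easier to spot a biosample that was given a mistyped library_name and therefore ended up in a library of its own.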
You cannot change the run_name once it has been submitted.
library_name as your run_name (or vice versa) if they have a 1:1 mapping (i.e. a single library was sequenced on a single run). If you provide sample information row-by-row using the uploader, each biosample that is pooled into the same library should have the same library_name. Each library that is on the same run should have the same run_name.
The API will allow you to tag your sequencing runs with arbitrary key-value pairs if you wish to provide more information about your sequencing run, or the downstream bioinformatics.
See more information on run metadata.
Metadata is stored by Majora. You can log into Majora with your CLIMB unified account. New users should register and wait to have their account approved by the system administrators.
Metadata can be submitted in three ways. No matter which option you use, you will need to know your unified CLIMB username and get an API token. Your username should have been emailed to you once your account was approved. You can see your token on your user profile.
Via our ocarina command line tool, supported by Sam Nicholls (UoB)
We anticipate that sites will find using the CSV/TSV interface for uploading biosamples the most straightforward, and the ocarina command line tool for providing library and sequencing information. Advanced users are welcome to communicate with the API directly.
#metadata
#inbound-distribution.
#metadata-apis.