The following sections describe the variables used in the different metadata tables. All tables should be machine-readable. They should have one or more key variables that are shared across all tables, so that we can link them (or create a fully joined table containing all information).
Metadata table
Content
Data Storage Information
Server address(es) and disks. One row per bundle of data packages or datasets stored on different disks (containing data from multiple mice).
Mice Information
Mice features. One row per mouse.
Scan Information
List of folders/files available for each mouse.
🔑 SubjectID (the mouse ID) is the key variable used to combine the Mice Information and Scan Information tables.
🔑 Datapackage ID is the key variable used to combine the Data Storage Information table with the Mice Information or Scan Information table.
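The linking described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the column spelling `Datapackage_ID`, the columns `Sex` and `Folder`, and all values are hypothetical placeholders, and the sketch assumes that each mouse row carries the ID of the data package it belongs to and that key values are unique in the Mice and Data Storage tables.

```python
# Toy rows. Apart from SubjectID and Source_server_type, the column names
# and all values are hypothetical placeholders for illustration only.
storage = [
    {"Datapackage_ID": "DP01", "Source_server_type": "server"},
]
mice = [
    {"SubjectID": "CA001", "Datapackage_ID": "DP01", "Sex": "F"},
    {"SubjectID": "CA002", "Datapackage_ID": "DP01", "Sex": "M"},
]
scans = [
    {"SubjectID": "CA001", "Folder": "rec_01"},
    {"SubjectID": "CA001", "Folder": "rec_02"},
    {"SubjectID": "CA002", "Folder": "rec_01"},
]


def index_by(rows, key):
    """Map each key value to its row; fail loudly if a key value repeats."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate {key}: {row[key]}")
        index[row[key]] = row
    return index


def join_all(storage, mice, scans):
    """Return one row per scan, enriched with mouse and storage columns."""
    mice_by_id = index_by(mice, "SubjectID")
    storage_by_id = index_by(storage, "Datapackage_ID")
    joined = []
    for scan in scans:
        mouse = mice_by_id[scan["SubjectID"]]             # Scan -> Mice
        package = storage_by_id[mouse["Datapackage_ID"]]  # Mice -> Storage
        joined.append({**package, **mouse, **scan})
    return joined


full_table = join_all(storage, mice, scans)
```

A missing key raises a `KeyError` here, which is one way to surface mice or scans that cannot be linked; a real pipeline might prefer to collect such rows and report them.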
Describe the origin of the data: where, when, and by whom they were collected, and whether the data package is complete.
University
University affiliation of the group who collected the data (e.g., UZH; see glossary)
Research_group
Abbreviation describing the group who collected the data (e.g., TIG; see glossary)
Year
Year of data curation
Data_type
Abbreviation describing the type of data collected (e.g., SRµCT; see glossary)
Dataproject_ID
(optional) Identifier of the project from which the data package derives, e.g., In Vivo CSF
Facility_proposal_ID
(optional) Number of the proposal for facility usage, e.g., the beamline numbers
Facility_1
Name or acronym of the main facility (e.g., CLS, SPring-8)
Facility_2
(optional) Additional facility information (e.g., beamline)
Facility_Country
Country where the facility is located
Start_date_acquisition
Date on which data acquisition started, in DD/MM/YYYY format
Contact_researcher
Main researcher responsible for data collection, ideally someone who can be contacted if there are issues with the data package or dataset
Status
Indicate whether the data package is ‘complete’ or ‘in progress’
Status_comment
(optional) Additional comments on the status of these data
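Because these columns are meant to be machine-readable, a row can be checked automatically. The following is a minimal sketch, assuming each row is available as a dictionary keyed by the variable names above; the function name `check_origin_row` is hypothetical.

```python
from datetime import datetime

# The two status values named in the Status description.
ALLOWED_STATUS = {"complete", "in progress"}


def check_origin_row(row):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    try:
        # strptime rejects both wrong layouts and impossible dates.
        datetime.strptime(row["Start_date_acquisition"], "%d/%m/%Y")
    except ValueError:
        problems.append("Start_date_acquisition is not a valid DD/MM/YYYY date")
    if row["Status"] not in ALLOWED_STATUS:
        problems.append("Status must be 'complete' or 'in progress'")
    return problems
```

For example, `check_origin_row({"Start_date_acquisition": "2022-12-31", "Status": "complete"})` reports the date problem only.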
File descriptors
Describe the files in more detail, such as the type of images they contain and how much space they take up on disk.
Files_type
(If applicable) More detailed description of the files, depending on the data type. E.g., if Data_type is synchrotron: projections, reconstructions, or both.
Data_size
The total volume of the data stored
File_subjectIDs
List which samples (mice) are stored in this location. Be precise: e.g., CA001-CA030 should be used only if there really are 30 subjects with IDs 001 to 030.
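To make the precision requirement for File_subjectIDs concrete, a range entry can be expanded into explicit IDs and compared with the subjects actually present. This sketch assumes that IDs consist of a letter prefix followed by a zero-padded number and that a range is written with a single hyphen, as in the example; the function name is hypothetical.

```python
import re

# Letter prefix + digits, a hyphen, then the same pattern again.
RANGE = re.compile(r"^([A-Za-z]+)(\d+)-([A-Za-z]+)(\d+)$")


def expand_subject_range(text):
    """Expand an entry like 'CA001-CA030' into the full list of subject IDs."""
    match = RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a subject ID range: {text!r}")
    prefix_a, start, prefix_b, end = match.groups()
    if prefix_a != prefix_b or len(start) != len(end) or int(start) > int(end):
        raise ValueError(f"inconsistent subject ID range: {text!r}")
    width = len(start)  # keep the zero padding used in the entry
    return [f"{prefix_a}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


ids = expand_subject_range("CA001-CA030")
```

Comparing `set(ids)` with the SubjectID values found on disk would then show whether the range claims subjects that are missing.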
Storage locations
Show the paths to the data. The columns should provide enough information to find the data and access it (whenever the right permissions are in place). There can be several locations accessible offline or online.
Online locations
If available, indicate the address on the servers maintained by the research group that collected the data. E.g., if synchrotron data were collected by TIG, the server is expected to be maintained at the TIG facilities.
Caution
Make sure paths are machine-readable: avoid entries like “123.54.666.10\data\folder (additional info)”, where the information in brackets has to be removed before the path can be used. Also, beware of any whitespace at the end of the path.
Source_server_path
Full path to the data. Provide the server's IP (Internet Protocol) address, e.g., ‘\\123.45.679.01\data\synchrotron\brains’, and not the arbitrary letter used to map the network drive, like ‘O:\data\synchrotron\brains’.
Source_server_type
Specify whether the path refers to a disk, tape, server, or online repository
Source_server_access
Public or private; specify any special access rules
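The cautions about Source_server_path lend themselves to an automatic check. The sketch below is one possible approach rather than a complete validator: it only flags the three problems named above (surrounding whitespace, a bracketed note after the path, and a mapped drive letter), and the function name `path_problems` is hypothetical.

```python
import re

# A mapped network drive such as 'O:\...' instead of a server address.
DRIVE_LETTER = re.compile(r"^[A-Za-z]:[\\/]")
# A remark in brackets appended to the path, e.g. '... (additional info)'.
BRACKETED_NOTE = re.compile(r"\(.*\)\s*$")


def path_problems(path):
    """Return a list of machine-readability problems for one path entry."""
    problems = []
    if path != path.strip():
        problems.append("leading or trailing whitespace")
    if BRACKETED_NOTE.search(path):
        problems.append("bracketed note after the path")
    if DRIVE_LETTER.match(path.strip()):
        problems.append("mapped drive letter instead of server address")
    return problems
```

An empty list means none of these three problems were detected; it does not confirm that the path exists or is reachable.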
Physical devices and copies
Disk_<research_group> columns indicate copies of the disk content held at the facilities of the different research groups. Groups can be TIG, BMC, or TKI (see glossary).
Disk_<research_group>_status
‘Copied’ if there is a copy at the facilities of this group
Disk_<research_group>_ID
Unique identifier of the hard disk, e.g., serial number and model