Metadata: making JSON files

In this use case we read the JSON fields from a table and create JSON files with generic and file-specific information

Code
R
Intermediate
Author
Affiliation

G.Fraga Gonzalez

Center for Reproducible Science, UZH

Published

March 13, 2024

Create a JSON file with metadata from tables and data files

In this example we create simple human and machine-readable metadata files in JavaScript Object Notation (JSON). They consists of fields of key-value pairs. These are sidecar metadata files, that is, they accompany a separate source data file (for this example we use dummy images as data files). In this use case, researchers can edit a table specifying the fields in the JSON file. This script creates a JSON with these fields. Some of the values in the JSON fields are filled for each of the data files based on the filename and an additional table with metadata (subject information).

Important

There are good editors with a graphical interface available online to read and write JSON. We recommend the following website: https://jsoneditoronline.org/

In general, we recommend creating JSON files with a script and not manually, to prevent data entry errors. JSON metadata files are part of essential machine-readable metadata, and additional to structured metadata (tables) which may contain human-only readable columns (e.g., ‘comment’ variables wtih free text notes). JSON files can have more details than the metadata tables.

This use case is based on:

inputs

  • A csv table specifying the JSON keys (e.g., “SynchrotronName”,“SpeciesName” ) and their values (e.g., “mouse”) when they apply to all files. If a value is blank it will be filled by information in the filename or table with subject information (see below)

  • A collection of data files encoding subject ID in their name.

  • An Excel table with subjects information (e.g.,sex, body weight) that has to be added to the JSON file

We will use the R package jsonlite to write the JSON string

outputs

A JSON file per data file with the fields from the table is created for each file with the same filename and the subject ID added in the corresponding json field. In this case the data files are images from mice.

Note on this example

This demo uses a dummy data set, i.e., the JSON fields and metadata have no real-life meaning.Metadata tables can be prepared in a spreadsheet and saved in different file formats for example csv and Excel. In actual projects the csv file format is preferred over Excel, for the sake of interoperability.

Read table with JSON fields and table with subjects information

Code
library(knitr)
library(dplyr)
library(kableExtra)

# Enter input directories 
dirinput <- 'dummy_data'
diroutput <- '../../_data/'

# Read table with JSON fields 
jsonFields <- read.csv(file.path(dirinput,'Dummy_JSON_fields.csv'))

# Read table with subject information
subj_info <- openxlsx::read.xlsx(file.path(dirinput,'DummyData1_20241234_subjects.xlsx'), sheet = 'subject_info')

Read data filenames and separate subject ID

Here we read some .jpg pictures with dummy images. The first filename part contains the subject ID.

Code
# Create table filenames 
files <- dir(file.path(dirinput,'Images')) # Find all files in our images folder

# Use pattern to take only subject images
fname <- files[grepl(paste0('^DS.*.jpg$'),files)] 
tbl_files <- as.data.frame(fname) 

# First filename part contains subject ID. Create new column with that info. 
tbl_files$subject <- sapply(strsplit(fname,'_'),'[[',1) 

Check input tables

In this table we created the column Data.name that indicates the key and the column Permissible.values which will indicate the value in the JSON file. The other columns are not necessary for this example, but can help the users when specifying the content of their JSON files. The entries in Permissible.values will be filled with file-specific information.

Important

The names of these columns are arbitrary and you can define any other name. It is important to note the format of the values (e.g., numeric, alphanumeric or strings). The values can also be arrays [1,2,3]. A JSON file can have a more hierarchical structure with keys and subkeys. In this example we use a simple structure.

Code
# check 
knitr::kable(jsonFields) %>%
  scroll_box(height= "300px")
JSON_section Data.name Data.Description Data.type Permissible.values
Info SynchrotronName Name of the synchrotron facility string Dummy-synchrotron
Info SynchrotronAcronym Acronym of the synchrotron facility string Sync1234
Info IntendedFor Name of the source file this metadata describes string
Info BeamlineAcronym Institution defined name of the machine. Corresponds to DICOM Tag DICOM tag 0008,1010. string BL20B2
Info BeamtimeSessionStartDate YYYY-MM-DD date 2024-01-13
Info BeamtimeSessionEndDate YYYY-MM-DD date 2024-01-13
Info SpeciesName string mouse (mus musculus)
Info SpeciesStrain alphanumeric C123456
Info SubjectID alphanumeric
Info Sex alphanumeric
Info BodyWeight_gr number
Info BodyPart Name of the organ / body region scanned. Corresponds to DICOM Tag DICOM tag 0018,0015. string brain
Acquisition DetectorArrayWidth number 123
Acquisition DetectorArrayHeight number 456
Acquisition DetectorBitDepth number 16
Acquisition DetectorPixelSize in µm number 6
Acquisition RotationAngle number 180
Reconstruction CenterOfRotationMethod string manual
Reconstruction CenterOfRotationValue In pixels" number 3
Reconstruction ReconMatrixSize  Size of the reconstruction matrix in pixels array of three numbers [1,1,2]
Reconstruction ReconMethodNameMethodName Reconstruction method or algorithm. Corresponds to DICOM Tag 0054,1103.  string dummy-method
Reconstruction MethodImplementationVersion  Identification for the software used, such as name and version (optional)  string in-house-bash-script-v02
Reconstruction FilterType  Type of post-recon smoothing  string gaussian
Code
# check 
knitr::kable(subj_info) %>%
  scroll_box(height= "300px")
subjID Sex Body.weight.(g) length.(mm) Dilut.(x) Con.(mg.Ba/ml) Int.1.rate.(uL/min) Int.2.vol.(uL) time.(HH:MM)
DS01 F 27.50 2.3 1.50 123 0.25 5.0 0.8666667
DS02 F 26.60 2.3 1.50 123 0.50 2.5 0.4680556
DS03 F 27.50 2.3 1.50 123 0.10 2.5 0.4173611
DS04 F 20.20 2.3 1.50 123 0.50 5.0 0.9777778
DS05 M 20.20 2.2 1.50 123 0.50 2.5 0.7993056
DS06 F 20.30 2.2 1.50 123 0.50 5.0 0.7006944
DS07 F 26.40 2.3 1.50 123 0.10 2.5 0.6284722
DS08 F 24.60 2.3 1.50 123 0.25 5.0 NA
DS09 F 26.80 2.3 1.50 123 0.25 5.0 0.8402778
DS10 M 25.70 2.3 1.50 123 0.50 2.5 0.7069444
DS11 F 24.40 2.3 1.50 123 0.50 5.0 0.6895833
DS12 F 25.30 2.3 1.00 111 0.50 2.5 0.5312500
DS13 M 24.70 2.2 1.00 111 0.10 2.5 0.7083333
DS14 M 27.00 2.2 1.00 111 0.10 2.5 0.9569444
DS15 M 24.00 2.2 1.00 111 0.25 5.0 0.8638889
DS16 F 25.80 2.2 1.00 111 0.25 5.0 0.4118056
DS17 M 26.40 2.2 1.00 111 0.25 5.0 NA
DS18 F 26.60 2.2 0.75 321 0.25 5.0 0.5805556
DS19 M 26.90 2.2 0.75 321 0.50 5.0 0.4722222
DS20 M 26.20 2.2 0.75 321 0.25 5.0 0.6520833
DS21 M 21.00 2.2 0.75 321 0.50 5.0 0.5486111
DS22 F 19.50 2.2 0.75 321 0.10 2.5 0.7812500
DS23 F 19.60 2.2 0.75 321 0.10 2.5 0.7611111
DS24 F 19.56 2.3 0.75 321 0.50 2.5 0.8715278
DS25 M 19.63 2.3 0.75 321 0.25 5.0 0.6125000
DS26 M 19.71 2.3 0.75 321 0.50 2.5 0.5375000
DS27 M 19.20 2.2 0.75 321 0.25 5.0 0.9298611
DS28 M 20.20 2.2 0.75 321 0.50 2.5 0.7451389
DS29 M 18.60 2.2 0.75 321 0.10 2.5 0.4687500
DS30 M 19.03 2.2 0.75 321 0.25 5.0 0.6833333
DS31 F 20.50 2.2 0.75 321 0.25 5.0 13.3800000
DS32 F 20.20 2.2 0.75 321 0.10 2.5 0.6256944
DS33 M 21.20 2.2 0.75 321 0.25 5.0 0.8361111
DS34 M 19.21 2.2 0.75 321 0.10 2.5 0.7090278
Code
# check 
knitr::kable(tbl_files) %>%
  scroll_box(height= "300px")
fname subject
DS01_statB_1scan_slc1234.jpg DS01
DS02_statB_1scan-v3_slc1234.jpg DS02
DS03_statB_1scan_slc1234.jpg DS03
DS04_statB_1scan_v2_slc1234.jpg DS04
DS04_statB_1scan_v3_slc1234.jpg DS04
DS05_statB_1scan_slc1234.jpg DS05
DS06_statB_1scan_slc1234.jpg DS06
DS07_statB_1scan_slc1234.jpg DS07
DS08_statB_1scan_slc1234.jpg DS08
DS09_statA_1scan-head_slc1234.jpg DS09
DS09_statA_1scan-neck_slc1234.jpg DS09
DS09_statA_1scan-nose-v2_slc1234.jpg DS09
DS09_statA_1scan-nose_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_001_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_002_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_003_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_004_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_005_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_006_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_007_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_008_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_009_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_010_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_011_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_012_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_013_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_014_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_015_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_016_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_017_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_018_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_019_slc1234.jpg DS09
DS09_statA_20scan-ph1-Int_020_slc1234.jpg DS09

Create the JSON files

Prepare table

Prepare the human-readable table specifying the JSON fields into a suitable format to read with jsonlite::toJSON()

Code
# Preserve the sorting of the field names as in the table
sorted_fieldnames <- factor(jsonFields$Data.name, levels = jsonFields$Data.name) 

# Transform table into a list. Each element is a fieldname with its values
json_data <- lapply(split(jsonFields$Permissible.values,sorted_fieldnames),as.character)

Add file-specific information

Loop through the image files and fill the file-specific values in the JSON file with information from the table with subject info.

Code
library(jsonlite)

# Join tables with filenames and subject information by subjID
metadat <- dplyr::full_join(x=tbl_files, y=subj_info, by=join_by('subject'=='subjID'),keep=FALSE)  

# write JSON files 
saveoutput <- 0
for (i in 1:nrow(tbl_files)) {
  
  # Complete Fields with info From table 
  json_data$IntendedFor = metadat$fname[i]
  json_data$SubjectID = metadat$subject[i]
  json_data$Sex = metadat$Sex[i]
  json_data$BodyWeight_gr = metadat$`Body.weight.(g)`[i]
  
  
  # Convert the list to a JSON string
  json_string <- toJSON(json_data, pretty = TRUE, auto_unbox = TRUE)
  
  # Save the JSON string to a file
  if (saveoutput == 1) {
    ouputfilename <- gsub('.jpg','.json',tbl_files$fname[i]) # rename input file
    write(json_string, file.path(diroutput, ouputfilename))  
    print(paste0("Wrote ", file.path(diroutput, ouputfilename)))
  }
  
  # clean json 
  rm (json_string)
}

Output Examples

The r package jsonlite offers several formatting options. Depending on your needs you may have some preferences.

Code
print(toJSON(json_data, pretty = TRUE, auto_unbox = TRUE))
{
  "SynchrotronName": "Dummy-synchrotron",
  "SynchrotronAcronym": "Sync1234",
  "IntendedFor": "DS09_statA_20scan-ph1-Int_020_slc1234.jpg",
  "BeamlineAcronym": "BL20B2",
  "BeamtimeSessionStartDate": "2024-01-13",
  "BeamtimeSessionEndDate": "2024-01-13",
  "SpeciesName": "mouse (mus musculus)",
  "SpeciesStrain": "C123456 ",
  "SubjectID": "DS09",
  "Sex": "F",
  "BodyWeight_gr": 26.8,
  "BodyPart": "brain",
  "DetectorArrayWidth": "123",
  "DetectorArrayHeight": "456",
  "DetectorBitDepth": "16",
  "DetectorPixelSize": "6",
  "RotationAngle": "180",
  "CenterOfRotationMethod": "manual",
  "CenterOfRotationValue": "3",
  "ReconMatrixSize ": "[1,1,2]",
  "ReconMethodNameMethodName": "dummy-method",
  "MethodImplementationVersion ": "in-house-bash-script-v02",
  "FilterType ": "gaussian"
} 
Code
print(toJSON(json_data, pretty = TRUE, auto_unbox = FALSE))
{
  "SynchrotronName": ["Dummy-synchrotron"],
  "SynchrotronAcronym": ["Sync1234"],
  "IntendedFor": ["DS09_statA_20scan-ph1-Int_020_slc1234.jpg"],
  "BeamlineAcronym": ["BL20B2"],
  "BeamtimeSessionStartDate": ["2024-01-13"],
  "BeamtimeSessionEndDate": ["2024-01-13"],
  "SpeciesName": ["mouse (mus musculus)"],
  "SpeciesStrain": ["C123456 "],
  "SubjectID": ["DS09"],
  "Sex": ["F"],
  "BodyWeight_gr": [26.8],
  "BodyPart": ["brain"],
  "DetectorArrayWidth": ["123"],
  "DetectorArrayHeight": ["456"],
  "DetectorBitDepth": ["16"],
  "DetectorPixelSize": ["6"],
  "RotationAngle": ["180"],
  "CenterOfRotationMethod": ["manual"],
  "CenterOfRotationValue": ["3"],
  "ReconMatrixSize ": ["[1,1,2]"],
  "ReconMethodNameMethodName": ["dummy-method"],
  "MethodImplementationVersion ": ["in-house-bash-script-v02"],
  "FilterType ": ["gaussian"]
} 
Code
library(knitr)
library(dplyr)
library(kableExtra)

# Enter input directories 
dirinput <- 'dummy_data'
diroutput <- '../../_data/'

# Read table with JSON fields 
jsonFields <- read.csv(file.path(dirinput,'Dummy_JSON_fields.csv'))

# Read table with subject information
subj_info <- openxlsx::read.xlsx(file.path(dirinput,'DummyData1_20241234_subjects.xlsx'), sheet = 'subject_info')

# Create table filenames 
files <- dir(file.path(dirinput,'Images')) # Find all files in our images folder

# Use pattern to take only subject images
fname <- files[grepl(paste0('^DS.*.jpg$'),files)] 
tbl_files <- as.data.frame(fname) 

# First filename part contains subject ID. Create new column with that info. 
tbl_files$subject <- sapply(strsplit(fname,'_'),'[[',1) 
# Preserve the sorting of the field names as in the table
sorted_fieldnames <- factor(jsonFields$Data.name, levels = jsonFields$Data.name) 

# Transform table into a list. Each element is a fieldname with its values
json_data <- lapply(split(jsonFields$Permissible.values,sorted_fieldnames),as.character)

library(jsonlite)

# Join tables with filenames and subject information by subjID
metadat <- dplyr::full_join(x=tbl_files, y=subj_info, by=join_by('subject'=='subjID'),keep=FALSE)  

# write JSON files 
saveoutput <- 0
for (i in 1:nrow(tbl_files)) {
  
  # Complete Fields with info From table 
  json_data$IntendedFor = metadat$fname[i]
  json_data$SubjectID = metadat$subject[i]
  json_data$Sex = metadat$Sex[i]
  json_data$BodyWeight_gr = metadat$`Body.weight.(g)`[i]
  
  
  # Convert the list to a JSON string
  json_string <- toJSON(json_data, pretty = TRUE, auto_unbox = TRUE)
  
  # Save the JSON string to a file
  if (saveoutput == 1) {
    ouputfilename <- gsub('.jpg','.json',tbl_files$fname[i]) # rename input file
    write(json_string, file.path(diroutput, ouputfilename))  
    print(paste0("Wrote ", file.path(diroutput, ouputfilename)))
  }
  
  # clean json 
  rm (json_string)
}
Back to top