diff --git a/README.md b/README.md
index 5fec5535f9702980dca173e7bbdaab99eb421cec..ec6e78ae0bb796e3b17a10ac830c213ac01ea51a 100644
--- a/README.md
+++ b/README.md
@@ -21,18 +21,17 @@ To calculate the pseudo-perplexities, any huggingface model from the BERT family
 ### Getting Started
 To calcualte the pseudo-perplexity per word, run:
 
-```
+```bash
 # Installing the dependencies
->>> pip install transformers, tqdm
-
->>> python3 compute_pppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory --window-size 11
+pip install transformers, tqdm
+python3 compute_pppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory --window-size 11
 ```
 
 
 As input, the script expects a json file with the following structure:
 
 
-```
+```json
 [
     {
         "page_id": "ocr_27812752_p1.json",
@@ -54,7 +53,7 @@ As input, the script expects a json file with the following structure:
 
 The output are json files containing the pseudo-perplexity scores:
 
-```
+```json
 [
     {
         "page_id": "ocr_27812752_p1.json",
@@ -85,18 +84,18 @@ To calculate the pseudo-perplexity per sentence, we use the [Language Model Perp
 
 Install the Language Model Perplexity (LM-PPL) repository:
 
-```
->>> pip install lmppl
+```bash
+pip install lmppl
 ```
 
 To use the repositroy, run:
-```
->>> python3 run_lmppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory
+```bash
+python3 run_lmppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory
 ```
 
 As input, the script expects a json file with the following structure:
 
-```
+```json
 [
     {
         "sent_id": "ocr_26843985_p4_6",
@@ -117,7 +116,7 @@ As input, the script expects a json file with the following structure:
 
 The output are json files containing the pseudo-perplexity scores:
 
-```
+```json
 [
     {
         "sent_id": "ocr_26843985_p4_6",