Commit d6c4697b authored by Ubuntu

:art: add syntax highlighting to README

parent c4d98a8e
````diff
@@ -21,18 +21,17 @@ To calculate the pseudo-perplexities, any huggingface model from the BERT family
 ### Getting Started
 To calcualte the pseudo-perplexity per word, run:
-```
+```bash
 # Installing the dependencies
->>> pip install transformers, tqdm
->>> python3 compute_pppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory --window-size 11
+pip install transformers, tqdm
+python3 compute_pppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory --window-size 11
 ```
 As input, the script expects a json file with the following structure:
-```
+```json
 [
 {
 "page_id": "ocr_27812752_p1.json",
````
````diff
@@ -54,7 +53,7 @@ As input, the script expects a json file with the following structure:
 The output are json files containing the pseudo-perplexity scores:
-```
+```json
 [
 {
 "page_id": "ocr_27812752_p1.json",
````
````diff
@@ -85,18 +84,18 @@ To calculate the pseudo-perplexity per sentence, we use the [Language Model Perp
 Install the Language Model Perplexity (LM-PPL) repository:
-```
->>> pip install lmppl
+```bash
+pip install lmppl
 ```
 To use the repositroy, run:
-```
->>> python3 run_lmppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory
+```bash
+python3 run_lmppl.py -m your-model-name -i path/to/your/data -o path/to/output/directory
 ```
 As input, the script expects a json file with the following structure:
-```
+```json
 [
 {
 "sent_id": "ocr_26843985_p4_6",
````
````diff
@@ -117,7 +116,7 @@ As input, the script expects a json file with the following structure:
 The output are json files containing the pseudo-perplexity scores:
-```
+```json
 [
 {
 "sent_id": "ocr_26843985_p4_6",
````
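Both scripts in this README report pseudo-perplexity, the masked-language-model analogue of perplexity, which is why the per-word script requires a BERT-family model. A common definition (Salazar et al., 2020, "Masked Language Model Scoring") masks each token in turn, takes the log-probability the model assigns to the original token, and exponentiates the negated mean. The sketch below only performs that final aggregation step from log-probabilities you already have. It illustrates that definition under stated assumptions. It is not the code in `compute_pppl.py` or `lmppl`, and the function name is hypothetical.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def pseudo_perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log P(w_i | rest))) for a sequence of tokens.

    Hypothetical helper: each entry is assumed to be the natural-log
    probability a masked LM gives to token i when only that token is masked.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood over all scored tokens.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Two tokens, each recovered with probability 0.5 when masked.
    print(pseudo_perplexity([math.log(0.5), math.log(0.5)]))  # ≈ 2.0
```

Lower values mean the model finds the text more predictable. A single word scored on its own reduces to 1 / P(word | context).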
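The README offers scores at two granularities: per word (`compute_pppl.py`) and per sentence (`run_lmppl.py`). This diff does not confirm the following, so treat it as an assumption. If each per-word score equals exp of that word's negative log-probability, and each word counts as one scored unit, the sentence-level value is the geometric mean of the word scores rather than their arithmetic mean. The hypothetical helper below makes that difference concrete. Real outputs of the two tools will generally not line up exactly, because they may tokenize and window the text differently.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def combine_word_scores(word_pppls: Sequence[float]) -> float:
    """Combine per-word pseudo-perplexities into one sentence-level value.

    Hypothetical helper: assumes each score equals exp(-log P(word | context)),
    so exp(mean negative log-probability) is the geometric mean of the scores.
    """
    if not word_pppls:
        raise ValueError("need at least one word score")
    # log(score) recovers each word's negative log-probability.
    mean_nll = sum(math.log(score) for score in word_pppls) / len(word_pppls)
    return math.exp(mean_nll)


if __name__ == "__main__":
    scores = [2.0, 8.0]
    print(combine_word_scores(scores))  # ≈ 4.0 (geometric mean)
    print(sum(scores) / len(scores))  # 5.0 (arithmetic mean overstates it)
```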
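Both input formats shown in this diff are JSON arrays of objects keyed by an identifier: `page_id` for the per-word script and `sent_id` for the per-sentence one. The remaining fields are cut off here. The sketch below is a hypothetical pre-flight check, not part of the repository, and it validates only that visible top-level shape before a file is passed to either script.

```python
from __future__ import annotations

import json
from typing import Any


def check_records(raw: str, id_key: str = "page_id") -> list[dict[str, Any]]:
    """Parse a JSON string and check it is a list of objects carrying `id_key`.

    Hypothetical helper: only the top-level shape (an array of objects with an
    ID field) comes from the README; other fields are not validated.
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    for index, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"item {index} is not a JSON object")
        if id_key not in record:
            raise ValueError(f"item {index} has no {id_key!r} field")
    return data


if __name__ == "__main__":
    sample = '[{"page_id": "ocr_27812752_p1.json"}]'
    records = check_records(sample)
    print(records[0]["page_id"])  # ocr_27812752_p1.json
```

For the sentence-level input, call it with `id_key="sent_id"`. Other fields are left unvalidated on purpose, because the diff does not show them.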