Wednesday, November 22, 2023

jq example: parse JSON

Try to extract only the text field from a JSON Lines file called example-english-corpus.jsonl:

Sample record from the Dataset Card for MIRACL Corpus:

{
    "docid": "39#0",
    "title": "Albedo",
    "text": "Albedo (meaning 'whiteness') is the measure of the diffuse reflection of solar radiation out of the total solar radiation received by an astronomical body (e.g. a planet like Earth). It is dimensionless and measured on a scale from 0 (corresponding to a black body that absorbs all incident radiation) to 1 (corresponding to a body that reflects all incident radiation)."
}

 

$ jq -r '.text' example-english-corpus.jsonl
Albedo (meaning 'whiteness') is the measure of the diffuse reflection of solar radiation out of the total solar radiation received by an astronomical body (e.g. a planet like Earth). It is dimensionless and measured on a scale from 0 (corresponding to a black body that absorbs all incident radiation) to 1 (corresponding to a body that reflects all incident radiation).

 

So, the question of the day is: save the output as txt or as jsonl? NLTK, spaCy, Gensim, etc. generally assume plain-text (txt) input.
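Both answers are one jq call away, so here is a minimal sketch of each. The file names (sample.jsonl, sample.txt, sample-text.jsonl) and the two toy records are made up for illustration; only the -r and -c options and the .text filter are real jq.

```shell
# Toy two-record JSON Lines file (made-up records, only to keep the sketch runnable).
printf '%s\n' \
  '{"docid": "1#0", "title": "A", "text": "First passage."}' \
  '{"docid": "2#0", "title": "B", "text": "Second passage."}' \
  > sample.jsonl

# Option 1: plain text, one passage per line.
# -r prints raw strings, without the surrounding JSON quotes.
jq -r '.text' sample.jsonl > sample.txt

# Option 2: still JSON Lines, but keeping only the text field.
# -c prints each result as one compact object per line.
jq -c '{text: .text}' sample.jsonl > sample-text.jsonl
```

One caveat with the txt route: -r also turns escaped newlines inside a text value into real line breaks, so "one line per passage" only holds if the passages contain no newlines. The jsonl route keeps every passage on a single line, at the cost of parsing JSON again later.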
