Skip to content

📂 Preparing Your Data for Extractinator

To run extraction tasks smoothly, your data needs to follow a simple, structured format.


✅ Supported Formats

Your dataset must be in CSV or JSON format.


🧾 Dataset Requirements

Each record should include a single column or key containing the text you want the model to extract from.

Requirements:

  • A column/key with raw text
  • This must contain only strings
  • You’ll reference this column name in your Task file under the Input_Field field

Any other columns/keys can be included but are not processed by the model. They will be passed through as-is in the output.


🧪 Adding Examples (Optional)

Few-shot examples help guide the model’s output.

Create a separate CSV or JSON file with the following columns:

Column Description
input The example input text
output The expected output for that input

These examples will be shown to the model during extraction if the num_examples parameter is set to a non-zero value. Beware that adding examples increases the prompt size which can significantly impact the inference speed.


🧰 Output Parser

You must define a Pydantic class to describe the structure of the expected output.

  • This is called the OutputParser
  • You’ll import it in your Task file via the Parser_Format field

For details on building one visually or by code, see parser.md.