📂 Preparing Your Data for Extractinator
To run extraction tasks smoothly, your data needs to follow a simple, structured format.
✅ Supported Formats
Your dataset must be in CSV or JSON format.
🧾 Dataset Requirements
Each record should include a single column or key containing the text you want the model to extract from.
Requirements:
- A column/key with raw text
- The column/key must contain only strings
- You’ll reference this column name in your Task file under the Input_Field field
Any other columns/keys can be included but are not processed by the model. They will be passed through as-is in the output.
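As an illustration of the requirements above, here is a minimal sketch that builds a small dataset in both formats and checks that the text column holds only strings. The column names (report_text, patient_id), the helper check_text_column, and the list-of-records JSON layout are assumptions made for this example, not names or layouts defined by Extractinator.

```python
import csv
import io
import json

# Hypothetical dataset: "report_text" is the column holding the raw text;
# "patient_id" is an extra column that would be passed through as-is.
records = [
    {"patient_id": "A001", "report_text": "Patient presents with mild fever."},
    {"patient_id": "A002", "report_text": "No abnormalities detected."},
]


def check_text_column(rows, input_field):
    """Return indices of rows whose input field is missing or not a string."""
    return [
        i for i, row in enumerate(rows)
        if not isinstance(row.get(input_field), str)
    ]


# JSON version (assumed layout: a list of records).
json_payload = json.dumps(records, indent=2)

# CSV version with a header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["patient_id", "report_text"])
writer.writeheader()
writer.writerows(records)
csv_payload = buffer.getvalue()

# An empty list means every row has a string in the text column.
bad_rows = check_text_column(records, "report_text")
```

In this sketch, report_text is the value you would then enter under Input_Field in the Task file.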
🧪 Adding Examples (Optional)
Few-shot examples help guide the model’s output.
Create a separate CSV or JSON file with the following columns:
Column | Description |
---|---|
input | The example input text |
output | The expected output for that input |
These examples will be shown to the model during extraction if the num_examples parameter is set to a non-zero value. Note that adding examples increases the prompt size, which can significantly slow down inference.
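A minimal sketch of an examples file with the two columns described above, written as CSV and read back to confirm the header. The example texts are invented, and storing each expected output as a JSON string is an assumption for illustration; use whatever form your output structure calls for.

```python
import csv
import io
import json

# Hypothetical few-shot examples; the output is serialized as a JSON string
# here (an assumption, not a documented requirement).
examples = [
    {
        "input": "Patient presents with mild fever and was given paracetamol.",
        "output": json.dumps({"diagnosis": "mild fever", "medications": ["paracetamol"]}),
    },
    {
        "input": "No abnormalities detected.",
        "output": json.dumps({"diagnosis": "none", "medications": []}),
    },
]

# Write the examples with exactly the two expected columns.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["input", "output"])
writer.writeheader()
writer.writerows(examples)

# Read the file back to verify the header and the row count.
buffer.seek(0)
reader = csv.DictReader(buffer)
loaded = list(reader)
columns = reader.fieldnames
```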
🧰 Output Parser
You must define a Pydantic class to describe the structure of the expected output.
- This is called the OutputParser
- You’ll import it in your Task file via the Parser_Format field
For details on building one visually or by code, see parser.md.
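As a rough idea of what such a class can look like, here is a minimal sketch of an OutputParser built on Pydantic's BaseModel. The field names (diagnosis, medications, follow_up_required) are hypothetical and chosen only for illustration; parser.md remains the reference for what Extractinator actually expects.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class OutputParser(BaseModel):
    """Hypothetical output structure; the fields are illustrative only."""

    diagnosis: str = Field(description="Main diagnosis mentioned in the text")
    medications: List[str] = Field(
        default_factory=list,
        description="Medications mentioned in the text",
    )
    follow_up_required: Optional[bool] = Field(
        default=None,
        description="Whether a follow-up is requested, if stated",
    )


# A well-formed result validates, and optional fields fall back to defaults.
parsed = OutputParser(diagnosis="mild fever", medications=["paracetamol"])

# A result missing the required "diagnosis" field is rejected.
try:
    OutputParser(medications=["paracetamol"])
    rejected = False
except ValidationError:
    rejected = True
```

Required fields without defaults make incomplete outputs fail validation, while optional fields let the model leave out information that is absent from the text.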