Iterate: Expand ragged records into multiple rows

class txf.Iterate(pipeline, input, tags, values[, layout='csv'[, **config]])

The Iterate transform is a cross between Fold and Unnest. It takes a nested value in some layout and expands it, but it assumes the value is ragged (e.g., a variable-length array or a record with a variant schema).

To adapt the ragged structure to a fixed schema, it produces two fields: the tags and the values. Each input row then generates one output row per entry in the nested value. Because the schema is variable, both fields are strings, and later transforms can sort out the data typing. To avoid losing data, an empty record produces one row with empty strings for both outputs.

For arrays (JSON arrays, Markdown rows or CSV rows), the tags are the numeric indices; for structured records, the tags are the record keys.
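The expansion rule described above can be sketched in plain Python. This is only an illustration of the documented semantics for the json layout, not the library's implementation: the helper name iterate_json and the use of plain dicts as rows are assumptions made for this example.

```python
import json

def iterate_json(rows, field, tags, values):
    """Expand the JSON text stored in `field` into one row per entry.

    Hypothetical helper: rows are modelled as plain dicts, which is an
    assumption of this sketch rather than the library's row type.
    """
    out = []
    for row in rows:
        if tags in row or values in row:
            raise ValueError("output fields cannot overwrite existing fields")
        # The input field is dropped from the output.
        rest = {k: v for k, v in row.items() if k != field}
        nested = json.loads(row[field])
        if isinstance(nested, dict):
            entries = list(nested.items())       # tags are the record keys
        elif isinstance(nested, list):
            entries = list(enumerate(nested))    # tags are 0-based indices
        else:
            raise TypeError("expected a JSON object or array")
        if not entries:
            entries = [("", "")]                 # empty record: one row of empty strings
        for tag, value in entries:
            # Both output fields are strings; typing is left to later steps.
            text = value if isinstance(value, str) else json.dumps(value)
            out.append({**rest, tags: str(tag), values: text})
    return out

rows = [
    {"id": 1, "Ragged": "[10, 20]"},
    {"id": 2, "Ragged": '{"a": "x"}'},
    {"id": 3, "Ragged": "[]"},
]
result = iterate_json(rows, "Ragged", "Tag", "Value")
```

Here the first row expands into two rows tagged "0" and "1", the second into one row tagged "a", and the empty array still yields a single row whose Tag and Value are empty strings.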

pipeline: Transform

The input pipeline (required).

input: tuple(str)

The list of fields containing the nested values to be expanded. They will be dropped from the output, so use Copy to preserve them.

tags: str

The output field receiving the record keys or the (0-based) array indices. It cannot overwrite existing fields, so use Drop to remove unwanted fields.

values: str

The output field receiving the record values or the array entries. It cannot overwrite existing fields, so use Drop to remove unwanted fields.

layout: str

The layout of the nested record or array. Supported nesting layouts are:

  • csv: Comma-separated values treated as an array.

  • json, jsonl: JavaScript Object Notation records or arrays ({...} or [...]).

  • md: GitHub Markdown row values treated as an array.

  • text: Single text field treated as an array with one text value.
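As a rough illustration of how each layout could map a nested string onto an array or record, here is a minimal sketch using only the Python standard library. The function read_nested is hypothetical and the parsing rules (in particular the naive Markdown split, which ignores escaped pipes) are assumptions; the library's actual layout readers may behave differently.

```python
import csv
import io
import json

def read_nested(text, layout="csv"):
    """Parse one nested value according to the named layout (hypothetical)."""
    if layout == "csv":
        # A single CSV line becomes an array of cells; quoting is honoured.
        return next(csv.reader(io.StringIO(text)), [])
    if layout in ("json", "jsonl"):
        # An object ({...}) becomes a dict, an array ([...]) a list.
        return json.loads(text)
    if layout == "md":
        # Naive Markdown row split: drop the outer pipes, split on the rest.
        cells = text.strip().strip("|").split("|")
        return [cell.strip() for cell in cells]
    if layout == "text":
        # The whole field is a one-element array.
        return [text]
    raise ValueError(f"unsupported layout: {layout}")
```

For example, read_nested('"x,y",z') gives a two-element array because the quoted comma is not a separator, while read_nested("hello, world", "text") keeps the whole string as a single entry.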

config: kwargs

Configuration parameters that will be passed to the unnesting layout reader.
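The pass-through pattern can be sketched with Python's own csv module standing in for the layout reader. The helper read_csv_array is hypothetical, and the delimiter option shown belongs to csv.reader; which keys the library's readers actually accept is not stated here and is an assumption of the example.

```python
import csv
import io

def read_csv_array(text, **config):
    """Parse one CSV line, forwarding config untouched to the reader."""
    # Stand-in reader: csv.reader accepts format options such as delimiter.
    return next(csv.reader(io.StringIO(text), **config), [])

cells = read_csv_array("a;b;c", delimiter=";")
```

With the default configuration the same string would be read as a single cell, since it contains no commas.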

Usage

Iterate(p, 'Ragged', 'Index', 'Value', 'csv')
Iterate(p, 'Variant', 'Key', 'Value', 'jsonl')