Iterate: Expand ragged records into multiple rows
- class txf.Iterate(pipeline, input, tags, values[, layout='csv'[, **config]])
The Iterate transform is a cross between Fold and Unnest. It takes a nested value in some layout and expands it, but it assumes the value is ragged (e.g., a variable-length array or a record with variant schemas).
To adapt the ragged structure to a fixed schema, it produces two fields: the tags and the values. Each input row then generates one output row per entry in the nested value. Because the schema is variable, both fields are strings, and later transforms can sort out the data typing. To avoid losing data, an empty record produces one row with empty strings for both outputs.
For arrays (JSON arrays, Markdown rows or CSV rows), the tags are the numeric indices; for structured records, the tags are the record keys.
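The expansion described above can be sketched in plain Python. This is only an illustration and does not use or reproduce the txf implementation: the helper name `expand_ragged` is hypothetical, only the csv and json layouts are covered, and rendering non-string JSON values with `json.dumps` is an assumption about how values become strings.

```python
import csv
import io
import json


def expand_ragged(text, layout="csv"):
    """Return a list of (tag, value) string pairs for one nested value.

    Hypothetical helper for illustration; not part of txf.
    """
    if layout == "csv":
        # A CSV value is treated as an array: tags are 0-based indices.
        rows = list(csv.reader(io.StringIO(text)))
        entries = rows[0] if rows else []
        pairs = [(str(i), v) for i, v in enumerate(entries)]
    elif layout == "json":
        parsed = json.loads(text) if text.strip() else []
        if isinstance(parsed, dict):
            # Structured record: tags are the record keys.
            items = list(parsed.items())
        elif isinstance(parsed, list):
            # Array: tags are 0-based indices.
            items = list(enumerate(parsed))
        else:
            # Assumption: a bare scalar is treated as a one-entry array.
            items = [(0, parsed)]
        # Assumption: non-string values are rendered as JSON text.
        pairs = [
            (str(k), v if isinstance(v, str) else json.dumps(v))
            for k, v in items
        ]
    else:
        raise ValueError(f"layout not covered by this sketch: {layout!r}")
    # An empty record still yields one row with empty strings.
    return pairs or [("", "")]


print(expand_ragged("a,b,c"))
print(expand_ragged('{"x": 1, "y": "two"}', layout="json"))
print(expand_ragged(""))
```

Both outputs are strings in every case, so a later transform would be responsible for casting the values to concrete types.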
- inputs: tuple(str)
The list of fields to be expanded. They will be dropped from the output, so use Copy to preserve them.
- tags: str
The output field receiving the record keys or the (0-based) array indices. It cannot overwrite existing fields, so use Drop to remove unwanted fields.
- values: str
The output field receiving the record values or the array entries. It cannot overwrite existing fields, so use Drop to remove unwanted fields.
- layout: str
The layout of the nested record or array. Supported nesting layouts are:
  - csv: Comma-separated values treated as an array.
  - json, jsonl: JavaScript Object Notation records or arrays ({...} or [...]).
  - md: GitHub Markdown row values treated as an array.
  - text: Single text field treated as an array with one text value.
- config: kwargs
Configuration parameters that will be passed to the unnesting layout reader.
Usage
Iterate(p, 'Ragged', 'Index', 'Value', 'csv')
Iterate(p, 'Variant', 'Key', 'Value', 'jsonl')
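To make the row-level effect of the first call concrete, here is a minimal stand-alone sketch of the documented behavior for the csv layout. It does not call txf: the function `iterate_rows` and the representation of rows as dicts are assumptions made for illustration, and the sample field `Id` is invented.

```python
import csv
import io


def iterate_rows(rows, input_field, tags, values):
    """Expand each dict row into one row per CSV entry of `input_field`.

    Hypothetical stand-in for illustration; not the txf implementation.
    """
    out = []
    for row in rows:
        # The output fields cannot overwrite existing fields.
        for name in (tags, values):
            if name in row:
                raise ValueError(f"output field {name!r} already exists")
        # The input field is dropped from the output.
        rest = {k: v for k, v in row.items() if k != input_field}
        parsed = list(csv.reader(io.StringIO(row[input_field])))
        entries = parsed[0] if parsed else []
        # Empty records still produce one row with empty strings.
        pairs = [(str(i), v) for i, v in enumerate(entries)] or [("", "")]
        for tag, value in pairs:
            out.append({**rest, tags: tag, values: value})
    return out


rows = [
    {"Id": "1", "Ragged": "a,b"},
    {"Id": "2", "Ragged": ""},
]
result = iterate_rows(rows, "Ragged", "Index", "Value")
for r in result:
    print(r)
```

The first input row yields two output rows and the second, being empty, yields one row whose `Index` and `Value` are empty strings, so no input row disappears from the output.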