Unnest: Parse a string as a record using a layout

class txf.Unnest(pipeline, input, outputs, layout='csv', **config)

The Unnest transform parses a string field that holds a nested record in a given layout and extracts one level of that record into separate fields. Unnest is the logical inverse of Nest.

pipeline: Transform

The input pipeline (required).

input: str

The field to unnest. It will be dropped from the schema, so use Copy to preserve it.

outputs: tuple(str)

The names of the fields to extract. Only the listed fields are extracted, and they cannot overwrite existing fields. Use Drop to remove unwanted fields.
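The input and outputs rules above can be pictured on a single record. The following is a minimal sketch in plain Python, not the library's implementation: the helper name unnest_record and the dict-based record are hypothetical, and two behaviors are assumptions the text does not spell out (a name clash raises an error, and a listed output missing from the parsed data becomes None).

```python
import json


def unnest_record(record, input_field, outputs, parsed):
    """Illustrative only: apply Unnest-like field rules to one dict record."""
    # The input field is dropped from the result, which is why the text
    # suggests Copy when the original string should be kept.
    result = {k: v for k, v in record.items() if k != input_field}
    for name in outputs:
        # Assumption: an output that clashes with a remaining field is an error.
        if name in result:
            raise ValueError(f"output field {name!r} already exists")
        # Assumption: a listed output absent from the parsed data becomes None.
        result[name] = parsed.get(name)
    return result


row = {"id": 7, "payload": '{"a": 1, "b": 2, "c": 3}'}
out = unnest_record(row, "payload", ("a", "b"), json.loads(row["payload"]))
print(out)
```

Here only a and b are carried over, c is ignored because it is not listed in outputs, and payload no longer appears in the result.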

layout: str

The layout of the input string. Supported unnesting layouts are:

  • csv: Comma-separated values. The outputs will be used to provide the field names.

  • json, jsonl: JavaScript Object Notation records ({..}). Only keys from outputs will be returned.

  • md: GitHub Markdown rows. The outputs will be used to provide the field names.

  • text: Treats the field as an array with one text value tagged with the first output name.
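To make the layouts above concrete, here is a rough sketch of how each one might turn a string into named values, using only the Python standard library. The function parse_layout is hypothetical and is not part of txf. The real reader may differ, for example in how it handles escaped pipes in Markdown rows, type conversion, or the config options.

```python
import csv
import io
import json


def parse_layout(value, outputs, layout="csv"):
    """Illustrative only: map one string to {output name: value}."""
    if layout == "csv":
        # The outputs supply the field names, in positional order.
        row = next(csv.reader(io.StringIO(value)))
        return dict(zip(outputs, row))
    if layout in ("json", "jsonl"):
        # Only keys listed in outputs are kept.
        obj = json.loads(value)
        return {k: obj[k] for k in outputs if k in obj}
    if layout == "md":
        # Simplified Markdown row: strip the outer pipes, split on the rest.
        cells = [c.strip() for c in value.strip().strip("|").split("|")]
        return dict(zip(outputs, cells))
    if layout == "text":
        # The whole string becomes one value named by the first output.
        return {outputs[0]: value}
    raise ValueError(f"unsupported layout: {layout!r}")


print(parse_layout('1992,"Smith, J",42', ("Year", "Name", "Sales")))
print(parse_layout('{"a": 1, "b": 2}', ("a",), "json"))
print(parse_layout("| 10 | 20 |", ("p", "q"), "md"))
print(parse_layout("hello", ("t",), "text"))
```

Note that in this sketch csv and md values stay as strings, while json values keep their parsed types.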

config: kwargs

Configuration parameters that will be passed to the unnesting reader.

Usage

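# Parse the CSV string in field 'CSV' into fields 'F1' and 'F2'.
Unnest(p, 'CSV', ('F1', 'F2',), 'csv')
# Extract 'Sales 1992' and 'Sales 1993' from the record in field 'Dict' using the 'py' layout.
Unnest(p, 'Dict', ('Sales 1992', 'Sales 1993',), 'py')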