Aqueduct
This is a generated JSONSchema reference for the Aqueducts configuration.
Title: Aqueduct
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Definition for an `Aqueduct` data pipeline
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + sources | No | array | Definition of the data sources for this pipeline |
| + stages | No | array of array | A sequential list of transformations to execute within the context of this pipeline. Nested stages are executed in parallel |
| - destination | No | Combination | Destination for the final step of the `Aqueduct`; takes the last stage as input for the write operation |
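To make the shape concrete, here is a minimal sketch of a pipeline definition exercising all three top-level properties, written as YAML (the schema itself is format-agnostic JSON Schema); every name, path, and query below is illustrative, not taken from the schema:

```yaml
# Hypothetical pipeline definition matching the Aqueduct schema
sources:
  - type: File
    name: readings                   # registered table name in the SQL context
    location: ./data/readings.csv    # relative local paths are supported
    file_type:
      type: Csv
      options:
        has_header: true
stages:
  # stages is an array of arrays: the outer list runs sequentially,
  # each inner list runs in parallel
  - - name: aggregated
      query: SELECT location_id, avg(temperature) AS avg_temp FROM readings GROUP BY location_id
destination:
  type: Delta
  name: aggregated_readings
  location: ./output/aggregated_readings
  table_properties: {}
  write_mode:
    operation: Append
  partition_cols: []
```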
Field: sources
Description: Definition of the data sources for this pipeline
| Each item of this array must be | Description |
| ------------------------------- | ----------- |
| Source | A data source that can be either a delta table (`delta`), a `file`, a `directory` or an `odbc` connection |
Field: Source
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
| Defined in | #/definitions/Source |
Description: A data source that can be either a delta table (`delta`), a `file`, a `directory` or an `odbc` connection
Field: item 0
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: An in-memory source
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + name | No | string | Name of the in-memory table, existence will be checked at runtime |
Field: type
Description: No description...
Must be one of:
* "InMemory"
Field: name
Description: Name of the in-memory table, existence will be checked at runtime
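As a sketch, an `InMemory` source entry might look like this in a YAML pipeline definition (the table name is hypothetical and must already be registered in the runtime's SQL context):

```yaml
sources:
  - type: InMemory
    name: preloaded_events   # existence of this table is checked at runtime
```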
Field: item 1
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: A delta table source
Field: type
Description: No description...
Must be one of:
* "Delta"
Field: name
Description: Name of the delta source, will be the registered table name in the SQL context
Field: location
Description: A URL or path to the location of the delta table. Supports relative local paths
Field: version_ts
|  |  |
| --- | --- |
| Type | string or null |
| Format | date-time |
Description: An RFC3339-compliant timestamp to load the delta table state at a specific point in time. Used for delta's time travel feature
Field: storage_options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Default | {} |
Description: Storage options for the delta table. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs); additionally, reference the `object_store` docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - | No | string | - |
Field: additionalProperties
Description: No description...
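A hedged sketch of a `Delta` source using the fields above; the bucket, table path, and the `AWS_REGION` key are illustrative (valid keys come from delta-rs and `object_store`, see the links above):

```yaml
sources:
  - type: Delta
    name: sales
    location: s3://my-bucket/tables/sales   # URL or relative local path
    version_ts: "2024-01-01T00:00:00Z"      # RFC3339 timestamp for time travel
    storage_options:
      AWS_REGION: eu-central-1              # example delta-rs/object_store key
```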
Field: item 2
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: A file source
Field: type
Description: No description...
Must be one of:
* "File"
Field: name
Description: Name of the file source, will be the registered table name in the SQL context
Field: file_type
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
| Defined in |  |
Description: File type of the file to be ingested. Supports `Parquet` for parquet files, `Csv` for CSV files and `Json` for JSON files
Field: item 0
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Parquet source options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | - |
Field: type
Description: No description...
Must be one of:
* "Parquet"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/ParquetSourceOptions |
Description: No description...
Field: item 1
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Csv source options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | - |
Field: type
Description: No description...
Must be one of:
* "Csv"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/CsvSourceOptions |
Description: No description...
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - has_header | No | boolean or null | set to `true` to treat the first row of the CSV as the header; column names will be inferred from the header. If there is no header, the column names are `column_1, column_2, ... column_x` |
| - delimiter | No | string or null | set a delimiter character to read this CSV with |
Field: has_header
Description: set to `true` to treat the first row of the CSV as the header; column names will be inferred from the header. If there is no header, the column names are `column_1, column_2, ... column_x`
Field: delimiter
Description: set a delimiter character to read this CSV with
| Restrictions |  |
| ------------ | --- |
| Min length | 1 |
| Max length | 1 |
Field: item 2
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Json source options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | - |
Field: type
Description: No description...
Must be one of:
* "Json"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/JsonSourceOptions |
Description: No description...
Field: location
Description: A URL or path to the location of the file. Supports relative local paths
Field: storage_options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Default | {} |
Description: Storage options for the object store. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs); additionally, reference the `object_store` docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - | No | string | - |
Field: additionalProperties
Description: No description...
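For illustration, a `File` source reading a semicolon-delimited CSV could be declared like this (path and names are hypothetical):

```yaml
sources:
  - type: File
    name: raw_readings
    location: ./data/readings.csv   # relative local path
    file_type:
      type: Csv
      options:
        has_header: true
        delimiter: ";"              # exactly one character, per the restriction above
```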
Field: item 3
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: A directory source
Field: type
Description: No description...
Must be one of:
* "Directory"
Field: name
Description: Name of the directory source, will be the registered table name in the SQL context
Field: file_type
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
| Defined in |  |
Description: File type of the file to be ingested. Supports `Parquet` for parquet files, `Csv` for CSV files and `Json` for JSON files
Field: item 0
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Parquet source options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | - |
Field: type
Description: No description...
Must be one of:
* "Parquet"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/ParquetSourceOptions |
Description: No description...
Field: item 1
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Csv source options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | - |
Field: type
Description: No description...
Must be one of:
* "Csv"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/CsvSourceOptions |
Description: No description...
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - has_header | No | boolean or null | set to `true` to treat the first row of the CSV as the header; column names will be inferred from the header. If there is no header, the column names are `column_1, column_2, ... column_x` |
| - delimiter | No | string or null | set a delimiter character to read this CSV with |
Field: has_header
Description: set to `true` to treat the first row of the CSV as the header; column names will be inferred from the header. If there is no header, the column names are `column_1, column_2, ... column_x`
Field: delimiter
Description: set a delimiter character to read this CSV with
| Restrictions |  |
| ------------ | --- |
| Min length | 1 |
| Max length | 1 |
Field: item 2
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Json source options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | - |
Field: type
Description: No description...
Must be one of:
* "Json"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/JsonSourceOptions |
Description: No description...
Field: location
Description: A URL or path to the location of the directory. Supports relative local paths
Field: storage_options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Default | {} |
Description: Storage options for the object store. Please reference the delta-rs GitHub repo for more information on available keys (e.g. https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/storage.rs); additionally, reference the `object_store` docs (e.g. https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - | No | string | - |
Field: additionalProperties
Description: No description...
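A sketch of a `Directory` source ingesting parquet files; the S3 prefix is illustrative:

```yaml
sources:
  - type: Directory
    name: events
    location: s3://my-bucket/events/   # directory (or prefix) containing the files
    file_type:
      type: Parquet
      options: {}
```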
Field: item 4
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: An ODBC source
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + name | No | string | Name of the ODBC source, will be the registered table name in the SQL context |
| + query | No | string | Query to execute when fetching data from the ODBC connection. This query will execute eagerly, before the data is processed by the pipeline. The size of the data returned from the query cannot exceed the working memory |
| + connection_string | No | string | ODBC connection string. Please reference the respective database's connection string syntax (e.g. https://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/) |
Field: type
Description: No description...
Must be one of:
* "Odbc"
Field: name
Description: Name of the ODBC source, will be the registered table name in the SQL context
Field: query
Description: Query to execute when fetching data from the ODBC connection. This query will execute eagerly, before the data is processed by the pipeline. The size of the data returned from the query cannot exceed the working memory
Field: connection_string
Description: ODBC connection string. Please reference the respective database's connection string syntax (e.g. https://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/)
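As a sketch, an `Odbc` source might be declared as follows (the connection string and query are illustrative; note the eager execution and working-memory limit described above):

```yaml
sources:
  - type: Odbc
    name: customers
    query: SELECT id, name, created_at FROM customers   # executed eagerly; result must fit in memory
    connection_string: Driver={PostgreSQL Unicode};Server=localhost;Port=5432;Database=mydb;Uid=user;Pwd=secret;
```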
Field: stages
Description: A sequential list of transformations to execute within the context of this pipeline. Nested stages are executed in parallel
Field: stages items
Description: No description...
| Each item of this array must be | Description |
| ------------------------------- | ----------- |
| Stage | Definition for a processing stage in an Aqueduct Pipeline |
Field: Stage
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/Stage |
Description: Definition for a processing stage in an Aqueduct Pipeline
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + name | No | string | Name of the stage, used as the table name for the result of this stage |
| + query | No | string | SQL query that is executed against a DataFusion context. Check the DataFusion SQL reference for more information: https://datafusion.apache.org/user-guide/sql/index.html |
| - show | No | integer or null | When set to a value of up to `usize`, will print the result of this stage to stdout, limited to that number of rows. Set the value to 0 to not limit the output |
| - explain | No | boolean | When set to 'true' the stage will output the query execution plan |
| - explain_analyze | No | boolean | When set to 'true' the stage will output the query execution plan with added execution metrics |
| - print_schema | No | boolean | When set to 'true' the stage will pretty print the output schema of the executed query |
Field: name
Description: Name of the stage, used as the table name for the result of this stage
Field: query
Description: SQL query that is executed against a DataFusion context. Check the DataFusion SQL reference for more information: https://datafusion.apache.org/user-guide/sql/index.html
Field: show
|  |  |
| --- | --- |
| Type | integer or null |
| Format | uint |
Description: When set to a value of up to `usize`, will print the result of this stage to stdout, limited to that number of rows. Set the value to 0 to not limit the output
Field: explain
|  |  |
| --- | --- |
| Type | boolean |
| Default | false |
Description: When set to 'true' the stage will output the query execution plan
Field: explain_analyze
|  |  |
| --- | --- |
| Type | boolean |
| Default | false |
Description: When set to 'true' the stage will output the query execution plan with added execution metrics
Field: print_schema
|  |  |
| --- | --- |
| Type | boolean |
| Default | false |
Description: When set to 'true' the stage will pretty print the output schema of the executed query
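Putting the stage fields together, a single stage might be declared as follows (table and column names are hypothetical; `show`, `explain`, and `print_schema` are the debugging toggles described above):

```yaml
stages:
  - - name: daily_totals          # result is queryable as `daily_totals` downstream
      query: SELECT day, count(*) AS orders FROM customers GROUP BY day
      show: 20                    # print up to 20 result rows to stdout
      explain: false
      print_schema: true          # pretty print the output schema
```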
Field: destination
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
Description: Destination for the final step of the `Aqueduct`; takes the last stage as input for the write operation
Field: Destination
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
| Defined in | #/definitions/Destination |
Description: Target output for the Aqueduct table
Field: item 0
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: An in-memory destination
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + name | No | string | Name to register the table with in the provided `SessionContext` |
Field: type
Description: No description...
Must be one of:
* "InMemory"
Field: name
Description: Name to register the table with in the provided `SessionContext`
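A minimal sketch of an `InMemory` destination (the name is hypothetical):

```yaml
destination:
  type: InMemory
  name: pipeline_result   # registered under this name in the provided SessionContext
```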
Field: item 1
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: A delta table destination
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + name | No | string | Name of the table |
| + location | No | string | Location of the table as a URL, e.g. file:///tmp/delta_table/, s3://bucket_name/delta_table |
| - storage_options | No | object | DeltaTable storage options |
| + table_properties | No | object | DeltaTable table properties: https://docs.delta.io/latest/table-properties.html |
| + write_mode | No | object | Columns that will be used to determine uniqueness during a merge operation. Supported types: all primitive types and lists of primitive types |
| + partition_cols | No | array of string | Columns to partition table by |
Field: type
Description: No description...
Must be one of:
* "Delta"
Field: name
Description: Name of the table
Field: location
Description: Location of the table as a URL e.g. file:///tmp/delta_table/, s3://bucket_name/delta_table
Field: storage_options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Default | {} |
Description: DeltaTable storage options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - | No | string | - |
Field: additionalProperties
Description: No description...
Field: table_properties
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: DeltaTable table properties: https://docs.delta.io/latest/table-properties.html
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - | No | string or null | - |
Field: additionalProperties
Description: No description...
Field: write_mode
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
| Defined in |  |
Description: Columns that will be used to determine uniqueness during a merge operation. Supported types: all primitive types and lists of primitive types
Field: item 0
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: `Append`: appends data to the `Destination`
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + operation | No | enum (of string) | - |
Field: operation
Description: No description...
Must be one of:
* "Append"
Field: item 1
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: `Upsert`: upserts data to the `Destination` using the specified merge columns
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + operation | No | enum (of string) | - |
| + params | No | array of string | - |
Field: operation
Description: No description...
Must be one of:
* "Upsert"
Field: params
Description: No description...
Field: params items
Description: No description...
Field: item 2
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: `Replace`: replaces data in the `Destination` using the specified `ReplaceCondition`s
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + operation | No | enum (of string) | - |
| + params | No | array | - |
Field: operation
Description: No description...
Must be one of:
* "Replace"
Field: params
Description: No description...
| Each item of this array must be | Description |
| ------------------------------- | ----------- |
| ReplaceCondition | Condition used to build a predicate by which data should be replaced in a `Destination`. The expression built checks equality for the given `value` of a field with `field_name` |
Field: ReplaceCondition
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/ReplaceCondition |
Description: Condition used to build a predicate by which data should be replaced in a `Destination`. The expression built checks equality for the given `value` of a field with `field_name`
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + column | No | string | - |
| + value | No | string | - |
Field: column
Description: No description...
Field: value
Description: No description...
Field: partition_cols
Description: Columns to partition table by
Field: partition_cols items
Description: No description...
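Combining the fields above, a hedged sketch of a `Delta` destination performing an upsert; the location, table property, and column names are illustrative:

```yaml
destination:
  type: Delta
  name: daily_totals
  location: s3://my-bucket/tables/daily_totals
  storage_options: {}
  table_properties:
    delta.appendOnly: "false"   # see the Delta table properties reference above
  write_mode:
    operation: Upsert
    params: [day]               # merge columns determining row uniqueness
  partition_cols: [day]
```

A `Replace` write mode would instead carry `ReplaceCondition` objects, e.g. `params: [{ column: day, value: "2024-01-01" }]`.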
Field: item 2
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: A file output destination
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + name | No | string | Name of the file to write |
| + location | No | string | Location of the file as a URL, e.g. file:///tmp/output.csv, s3://bucket_name/prefix/output.parquet, s3://bucket_name/prefix |
| + file_type | No | object | File type, supported types are Parquet and CSV |
| - single_file | No | boolean | Describes whether to write a single file (can be used to overwrite the destination file) |
| - partition_cols | No | array of string | Columns to partition table by |
| - storage_options | No | object | Object store storage options |
Field: type
Description: No description...
Must be one of:
* "File"
Field: name
Description: Name of the file to write
Field: location
Description: Location of the file as a URL, e.g. file:///tmp/output.csv, s3://bucket_name/prefix/output.parquet, s3://bucket_name/prefix
Field: file_type
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
| Defined in |  |
Description: File type, supported types are Parquet and CSV
Field: item 0
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Parquet options map, please refer to https://docs.rs/datafusion-common/latest/datafusion_common/config/struct.TableParquetOptions.html for possible options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | - |
Field: type
Description: No description...
Must be one of:
* "Parquet"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: No description...
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - | No | string | - |
Field: additionalProperties
Description: No description...
Field: item 1
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: CSV options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
| + options | No | object | Csv options |
Field: type
Description: No description...
Must be one of:
* "Csv"
Field: options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/CsvDestinationOptions |
Description: Csv options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - has_header | No | boolean or null | Defaults to true, sets a header for the CSV file |
| - delimiter | No | string or null | Defaults to `,`, sets the delimiter char for the CSV file |
Field: has_header
Description: Defaults to true, sets a header for the CSV file
Field: delimiter
Description: Defaults to `,`, sets the delimiter char for the CSV file
| Restrictions |  |
| ------------ | --- |
| Min length | 1 |
| Max length | 1 |
Field: item 2
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: Json destination, no supported options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + type | No | enum (of string) | - |
Field: type
Description: No description...
Must be one of:
* "Json"
Field: single_file
|  |  |
| --- | --- |
| Type | boolean |
| Default | false |
Description: Describes whether to write a single file (can be used to overwrite destination file)
Field: partition_cols
|  |  |
| --- | --- |
| Type | array of string |
| Default | [] |
Description: Columns to partition table by
Field: partition_cols items
Description: No description...
Field: storage_options
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Default | {} |
Description: Object store storage options
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - | No | string | - |
Field: additionalProperties
Description: No description...
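For illustration, a `File` destination writing a single CSV file (path and options are hypothetical):

```yaml
destination:
  type: File
  name: report
  location: ./output/report.csv
  file_type:
    type: Csv
    options:
      has_header: true
      delimiter: ","
  single_file: true      # write one file, overwriting the destination file
  partition_cols: []
  storage_options: {}
```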
Field: item 3
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: An ODBC insert query to write to a DB table
Field: type
Description: No description...
Must be one of:
* "Odbc"
Field: name
Description: Name of the ODBC destination
Field: connection_string
Description: ODBC connection string. Please reference the respective database's connection string syntax (e.g. https://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/)
Field: write_mode
|  |  |
| --- | --- |
| Type | combining |
| Additional properties |  |
| Defined in |  |
Description: Strategy for performing ODBC write operation
Field: item 0
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: `Append`: appends data to the `Destination`
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + operation | No | enum (of string) | - |
Field: operation
Description: No description...
Must be one of:
* "Append"
Field: item 1
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
Description: `Custom`: inserts data with a prepared statement, with the option to perform any number of (non-insert) preliminary statements
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| + operation | No | enum (of string) | - |
| + transaction | No | object | SQL statements for `Custom` write mode |
Field: operation
Description: No description...
Must be one of:
* "Custom"
Field: transaction
|  |  |
| --- | --- |
| Type | object |
| Additional properties |  |
| Defined in | #/definitions/CustomStatements |
Description: SQL statements for `Custom` write mode.
| Property | Pattern | Type | Title/Description |
| -------- | ------- | ---- | ----------------- |
| - pre_insert | No | string or null | Optional (non-insert) preliminary statement |
| + insert | No | string | Insert prepared statement |
Field: pre_insert
Description: Optional (non-insert) preliminary statement
Field: insert
Description: Insert prepared statement
Field: batch_size
Description: Batch size (rows) to use when inserting data
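A hedged sketch of an `Odbc` destination using the `Custom` write mode; the connection string, statements, and batch size are illustrative:

```yaml
destination:
  type: Odbc
  name: daily_totals
  connection_string: Driver={PostgreSQL Unicode};Server=localhost;Port=5432;Database=mydb;Uid=user;Pwd=secret;
  write_mode:
    operation: Custom
    transaction:
      pre_insert: DELETE FROM daily_totals WHERE day = CURRENT_DATE   # optional preliminary statement
      insert: INSERT INTO daily_totals (day, orders) VALUES (?, ?)    # insert prepared statement
  batch_size: 1000
```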
Generated using json-schema-for-humans