Create Table Component
Create or replace a table.
Matillion ETL for Azure Synapse: when using the FLOAT data type, users can enter any value that is greater than or equal to 1 and less than or equal to 2,147,483,647. If a FLOAT with a size less than or equal to 7 is used, a column of type REAL will be produced in the database. Note that this column will still appear as a FLOAT type within Matillion ETL, simply with a smaller size. Conversely, if a FLOAT with a size greater than or equal to 8 is used, the traditional FLOAT type is used for the column both in the database and in Matillion ETL. For more information, please refer to float and real (Transact-SQL).
Matillion ETL for Delta Lake on Databricks: users may experience casting errors if using binary values.
Warning: Depending on the chosen settings, this component is destructive. Take care when running this component as it may remove existing data.
Properties
Snowflake Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Create/Replace | Select | Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the job has run, the table matches the schema defined in this component; however, any existing data in the table will be lost. Note: since other database objects may depend upon this table, drop ... cascade is used, which may remove many other database objects. |
Database | Select | Choose the database that the newly created table will belong to. |
Schema | Select | Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support. |
New Table Name | Text | The name of the table to create or replace. Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. To change this behaviour, please consult the Snowflake documentation on case sensitivity. |
Table Type | Select | The type of Snowflake table to create: Permanent: a table that holds data indefinitely and that can be restored using Snowflake's Time Travel. Temporary: a table that is automatically destroyed at the end of the Snowflake session. Transient: a table that holds data indefinitely, but that cannot be restored. |
Columns | Column Name | The name of each new column. |
Data Type | VARCHAR: this type can hold any kind of data, subject to a maximum size. More... NUMBER: this type is suitable for whole-number types (no decimals). More... Note: You can enter AUTOINCREMENT or IDENTITY as the default value for any numeric-type column and the column will automatically give an incremented value on each new row starting from 1. This feature is useful when creating a unique key on the table. FLOAT: this type is suitable for numeric types, with or without decimals. More... BOOLEAN: this type is suitable for data that is either 'true' or 'false'. More... DATE: this type is suitable for dates without times. More... TIMESTAMP: this type is suitable for timestamps. More... TIME: this type is suitable for times. More... VARIANT: a flexible type that can be used for any purpose. More... | |
Size | For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. Generally speaking, in all data stored using UTF-8, any non-ASCII character will count as 2 or more bytes. | |
Precision | Relevant only for Numeric data; this is the maximum number of digits that may appear to the right of the decimal point. | |
Default Value | The default value under this column for any row. | |
Not Null | True if this column does not accept null values. | |
Unique | Mark this column as unique, declaring that all values in it are unique with respect to one another. You may have multiple unique columns per table. | |
Comment | A location to store descriptive VARCHAR comments against columns. Used to contextualize the content being stored in tables in your database. | |
Default DDL Collation | String | Set the default DDL collation. Setting this parameter forces all subsequently created columns in the affected table to have the specified collation as the default, unless the collation for the column is explicitly defined in the DDL. For more information, refer to the Snowflake documentation. |
Primary Keys | Select | Declare one column to be a primary key. Primary keys are, by default, unique. Users can only have one primary key per table. |
Clustering Keys | Select | Specify clustering key(s) to use. Defining clustering keys can allow for more efficient sorting and scanning of large tables. More... |
Data Retention Time in Days | Integer | Set a number of days for which data is retained after deletion. More... |
Comment | Text | Attach a comment to the table. More... |
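Although Matillion ETL generates the Snowflake DDL internally, the mapping from the properties above to a CREATE TABLE statement can be sketched in Python. This is an illustrative sketch only, not Matillion's actual code generation; the function name and defaults are assumptions:

```python
# Sketch: assemble a Snowflake CREATE TABLE statement from settings similar
# to the properties above. Illustrative only; not Matillion's implementation.

def snowflake_create_table(name, columns, table_type="PERMANENT",
                           create_method="CREATE", cluster_keys=None,
                           comment=None):
    """Build a CREATE TABLE DDL string.

    columns: list of (column_name, data_type) tuples.
    table_type: "PERMANENT", "TEMPORARY", or "TRANSIENT".
    create_method: "CREATE", "CREATE IF NOT EXISTS", or "REPLACE".
    """
    prefix = {"CREATE": "CREATE",
              "CREATE IF NOT EXISTS": "CREATE",
              "REPLACE": "CREATE OR REPLACE"}[create_method]
    # PERMANENT is Snowflake's default, so it needs no keyword.
    type_kw = "" if table_type == "PERMANENT" else table_type + " "
    if_not_exists = ("IF NOT EXISTS "
                     if create_method == "CREATE IF NOT EXISTS" else "")
    # Quoted identifiers, matching Matillion's case-sensitive behaviour.
    cols = ", ".join(f'"{c}" {t}' for c, t in columns)
    ddl = f'{prefix} {type_kw}TABLE {if_not_exists}"{name}" ({cols})'
    if cluster_keys:
        ddl += " CLUSTER BY (" + ", ".join(f'"{k}"' for k in cluster_keys) + ")"
    if comment:
        ddl += f" COMMENT = '{comment}'"
    return ddl

print(snowflake_create_table(
    "orders",
    [("id", "NUMBER"), ("placed_at", "TIMESTAMP")],
    table_type="TRANSIENT",
    cluster_keys=["placed_at"],
))
```

Note how the "Create if not exists" option maps to an `IF NOT EXISTS` clause rather than a drop-and-recreate, which is why it never destroys existing data.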
Redshift Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Schema | Select | Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support. |
New Table Name | Text | The name of the table to create or replace. |
Create/Replace | Select | Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the job has run, the table matches the schema defined in this component; however, any existing data in the table will be lost. |
Table Metadata | Column Name | The name of the new column. |
Data Type | Select the data type. For more information, read Data types. Text: this type can hold any kind of data, subject to a maximum size. More... Integer: this type is suitable for whole-number types (no decimals). Selecting the integer type with Size 1-4 activates the Redshift type SMALLINT. Size 5-9 activates the Redshift type INTEGER. Size 10 or greater activates the Redshift type BIGINT. More... Numeric: this type is suitable for numeric types, with or without decimals. More... Real: this type is suitable for data of a single precision floating-point number. More... Double Precision: this type is suitable for data of a double precision floating-point number. More... Boolean: this type is suitable for data that is either 'true' or 'false'. More... Date: this type is suitable for dates without times. More... DateTime: this type is suitable for dates, times, or timestamps (both date and time). More... SUPER: use the SUPER data type to store semi-structured data or documents as values. More... | |
Size | For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. With Redshift, since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes. For Numeric types, this is the total number of digits allowed, whether before or after the decimal point. | |
Decimal Places | Relevant only for 'Numeric' data, it is the maximum number of digits that may appear to the right of the decimal point. | |
Encoding Type | The Redshift compression encoding. See the Redshift documentation for details of the various available encodings. Note: Although users can manually specify the encoding type here, it is advised to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output component, provided you are truncating the table. | |
Allow Nullable | When 'True', Matillion ETL specifies that the column accepts null values. When 'False', Matillion ETL specifies that the column is not allowed to contain null values. Default is 'True'. | |
Comment | A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database. | |
Distribution Style | Select | All: copy rows to all nodes in the Redshift Cluster. Auto: Redshift assigns an optimal distribution style based on the size of the table data. Even: the leader node distributes the rows across the slices in a round-robin fashion, regardless of the values in any particular column. Key: the rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node collocates the rows on the slices according to the values in the joining columns, so that matching values from the common columns are physically stored together. For more information, please refer to the Redshift documentation. |
Distribution Key | Select | This property is only activated if the Distribution Style is set to 'Key'. Users must select which column is used to determine which cluster node the row is stored on. |
Sort Key | Select | This setting is optional, and allows users to specify the columns from the input that should be set as the table's sort key. Sort keys are critical for efficient performance. For more information, we advise users refer to the Redshift documentation. |
Sort Key Options | Select | Choose whether the sort key is of a 'compound' or 'interleaved' variety. For more information, please refer to the Redshift documentation. |
Primary Key | Select | This setting is optional, and specifies the column from the input that should be set as the table's primary key. |
Identity Columns | Column Name | The name of the column. It must match the name of a column defined in the Table Metadata property, which will be set as an identity column. An identity column contains automatically generated values. The specified column must be of Data Type 'Integer' or 'Numeric' with zero (0) decimal places. |
Seed | Specifies the starting value. | |
Step | Specifies the increment between values. | |
Backup Table | Select | Specify whether the created table is to be included in automated and manual cluster snapshots. 'No' has no effect on automatic replication of data to other nodes within the cluster, meaning that tables set to 'No' in this property are still restored in the event of a node failure. The default setting is 'Yes'. |
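The Redshift-specific properties (distribution style, sort keys, backup) translate into clauses on the generated CREATE TABLE. The following Python sketch shows one plausible mapping; the function name and argument defaults are assumptions, not Matillion's actual code:

```python
# Sketch: a Redshift CREATE TABLE with the distribution, sort-key, and
# backup clauses configured above. Illustrative only.

def redshift_create_table(name, columns, dist_style="EVEN", dist_key=None,
                          sort_keys=None, sort_style="COMPOUND",
                          backup=True):
    """columns: list of (column_name, data_type) tuples."""
    cols = ", ".join(f'"{c}" {t}' for c, t in columns)
    ddl = f'CREATE TABLE "{name}" ({cols})'
    if not backup:
        ddl += " BACKUP NO"  # exclude the table from cluster snapshots
    ddl += f" DISTSTYLE {dist_style}"
    if dist_style == "KEY" and dist_key:
        # DISTKEY is only meaningful when the style is KEY.
        ddl += f' DISTKEY ("{dist_key}")'
    if sort_keys:
        ddl += (f" {sort_style} SORTKEY ("
                + ", ".join(f'"{k}"' for k in sort_keys) + ")")
    return ddl

print(redshift_create_table(
    "events",
    [("user_id", "INTEGER"), ("occurred_at", "TIMESTAMP")],
    dist_style="KEY", dist_key="user_id",
    sort_keys=["occurred_at"],
))
```

The design point worth noting is that DISTKEY and SORTKEY are fixed at table creation time, which is why choosing them here matters for downstream query performance.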
BigQuery Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Project | Select | Select the project that the newly created table will belong to. |
Dataset | Select | Select the table dataset. For more information on using datasets, please refer to the BigQuery documentation. |
New Table Name | Text | The name of the table to create or replace. |
Table Metadata | Field Name | The name of the new field. |
Data Type | For more information on available BigQuery data types please refer to the GCP documentation. String: this type can hold any kind of data, subject to a maximum size. Integer: this type is suitable for whole-number types (no decimals). Float: this type is suitable for numeric types, with or without decimals. Numeric: this data type is suitable for data of an exact numeric value, allowing 38 digits of precision and 9 decimal digits of scale. Boolean: this type is suitable for data that is either 'true' or 'false'. Date: a formatted date object without time. See the GCP documentation. Time: a formatted time object without date. See the GCP documentation. DateTime: a formatted timestamp containing both date and time that is easily readable by the user. See the GCP documentation. Timestamp: this type is a timestamp left unformatted (exists as Unix/Epoch Time). | |
Mode | The field mode. Default is 'NULLABLE'. NULLABLE: Field allows null values. REQUIRED: Field does not accept null values. REPEATED: Field can accept multiple values. | |
Comment | A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database. | |
Define Nested Metadata | When the Define Nested Metadata checkbox is ticked inside the 'Table Metadata' property, a tree-structure can be defined for metadata. | |
Create/Replace | Select | Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the job has run, the table matches the schema defined in this component; however, any existing data in the table will be lost. |
Partitioning | Select | Choose whether to create a date-partitioned table or not. Column: Partitions the table's data according to the specification in the Partitioning Field property, activated when this option is selected. Day: Selecting Day will create the table as a partition table, which will load data into separate date-partitions and can be queried with the '_PARTITIONTIME' pseudo-column. Select Day when your data is spread out over a wide range of dates or if your data is continually added over time. This allows large datasets to be split into multiple parts with each part corresponding to a single day of data. Hour: Select Hour if your tables have a high volume of data spanning a short date range (typically less than six months of timestamp values). Hourly partitioning allows the addressing of data at hour-level granularity, such as when appending, truncating, or deleting data from a particular partition. Month: Select Month if your tables have a relatively small amount of data for each day, but span a wide date range. None: No partitioning. Year: Select Year if your tables have a relatively small amount of data for each day, but span a wide date range. Both Year and Month are recommended options if your workflow requires frequently updating or adding rows that span a wide date range (for example, more than a few hundred dates). For more information, we recommend section "Daily partitioning versus hourly, monthly, or yearly partitioning" from Introduction to partitioned tables. Enabling partitioning on a table will activate additional properties in the Table Input component, which allows the partition column to be included in the input. |
Partitioning Field | Select | This parameter appears when 'Column' is selected in the Partitioning property. Partitioning Field has a drop-down, which should be populated only by Date or Timestamp column types defined in the Table Metadata. For more information, see here. |
Partition Expiration Time | Text | Specify the number of milliseconds before the partition expires (field can be left blank for no expiration). |
KMS Encryption | Select | Support for customer-managed encryption. By default, this is set to 'No'. Your encryption keys are stored within Cloud KMS. For more information, see here. |
Location | Select | This parameter appears when KMS Encryption is set to 'Yes'. Choose a location that matches the location of the BigQuery dataset. |
Key Ring | Select | This parameter appears when KMS Encryption is set to 'Yes'. Choose from a group of keys. On the GCP console in KMS, a Key Ring or multiple Key Rings will have been created. |
Key | Select | This parameter appears when KMS Encryption is set to 'Yes'. Choose the Encryption Key from the Key Ring. |
Cluster Columns | Select | Creates a clustered table. Allows the optional selection of up to four columns. Columns of types Integer, String, Boolean, Date, and Timestamp are supported. For more information, see here. |
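For BigQuery, the partitioning, clustering, and expiration properties correspond to `PARTITION BY`, `CLUSTER BY`, and table `OPTIONS` clauses in the generated DDL. The sketch below is an assumption about how such a mapping could look (including the milliseconds-to-days conversion for partition expiration, since BigQuery DDL expresses it in days); it is not Matillion's actual code:

```python
# Sketch: a BigQuery CREATE TABLE with partitioning and clustering clauses.
# Illustrative only; names and defaults are assumptions.

def bigquery_create_table(dataset, name, fields, partition_by=None,
                          cluster_by=None, partition_expiration_ms=None):
    """fields: list of (field_name, data_type, mode) tuples,
    where mode is 'NULLABLE', 'REQUIRED', or 'REPEATED'."""
    cols = ", ".join(
        f"{f} {t}" + (" NOT NULL" if mode == "REQUIRED" else "")
        for f, t, mode in fields)
    ddl = f"CREATE TABLE `{dataset}.{name}` ({cols})"
    if partition_by:
        # e.g. "DATE(created_at)" for column partitioning,
        # or "_PARTITIONDATE" for ingestion-time partitioning.
        ddl += f" PARTITION BY {partition_by}"
    if cluster_by:
        ddl += " CLUSTER BY " + ", ".join(cluster_by)  # up to four columns
    if partition_expiration_ms is not None:
        days = partition_expiration_ms / 86_400_000  # ms per day
        ddl += f" OPTIONS (partition_expiration_days = {days})"
    return ddl

print(bigquery_create_table(
    "analytics", "sessions",
    [("session_id", "STRING", "REQUIRED"), ("started", "TIMESTAMP", "NULLABLE")],
    partition_by="DATE(started)",
    cluster_by=["session_id"],
))
```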
Synapse Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Create/Replace | Select | Select the function of the component from the following options: Create: (default setting) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create If Not Exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost. |
Schema | Select | Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on schemas, please see the Azure Synapse documentation. |
New Table Name | String | Specify the name of the table to create or replace. Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. For more information, please refer to the Azure Synapse documentation. |
Table Metadata | Column Name | Provide the name of each new column. |
Data Type | Select the data type. The available data types are: DATE: this data type is suitable for dates without times. More... DATETIME: this data type is suitable for timestamps. More... TIME: this data type is suitable for times. More... INTEGER: this data type is suitable for whole number types (no decimals). More... NUMERIC: this data type is suitable for numeric types, with or without decimals. More... TEXT: this data type is suitable for text types. More... FLOAT: this data type is suitable for approximate number data types for use with floating point numeric data. More... BOOLEAN: this data type is suitable for data whose values are either "true" or "false". More... | |
Size | The Size column sets the size for all data types except BOOLEAN, DATE, DATETIME, and TIME. For more information about DATE, read date (Transact-SQL). For DATETIME, the Size column sets the precision of the datetime offset; accepted values range from 0 to 7, representing fractional seconds from 0 to 9999999. If the DATETIME size is set to 0, the returned datetime will be 26 positions (YYYY-MM-DD hh:mm:ss {+|-}hh:mm). If set to 1, the returned datetime will be 28 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.n {+|-}hh:mm). If set to 7, the returned datetime will be 34 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.nnnnnnn {+|-}hh:mm). For more information, read datetime (Transact-SQL). For TIME, the Size column sets the precision of the time offset; accepted values range from 0 to 7, representing fractional seconds from 0 to 9999999. If the TIME size is set to 0, the returned time will be 8 positions (hh:mm:ss). If set to 1, the returned time will be 10 positions (8 + precision + 1) (hh:mm:ss.n). If set to 7, the returned time will be 16 positions (8 + precision + 1) (hh:mm:ss.nnnnnnn). | |
Scale | Define the scale. More... | |
Allow Nullable | Select whether to allow nullable values. More... | |
Distribution Style | Select | Select the distribution style Hash: This setting assigns each row to one distribution by hashing the value stored in the distribution_column_name. The algorithm is deterministic, meaning it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL, because all rows that have NULL are assigned to the same distribution. Replicate: This setting stores one copy of the table on each Compute node. For SQL Data Warehouse, the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in an SQL Server filegroup that spans the Compute node. This behavior is the default for Parallel Data Warehouse. Round Robin: Distributes the rows evenly in a round-robin fashion. This is the default behaviour. For more information, please read this article. |
Distribution Column | Select | Select the column to act as the distribution column. This property is only available when the Distribution Style property is set to "Hash". |
Index Type | Select | Select the table indexing type. Options include: Clustered: A clustered index may outperform a clustered columnstore table when a single row needs to be retrieved quickly. The disadvantage to using a clustered index is that only queries that benefit are the ones that use a highly selective filter on the clustered index column. Choosing this option prompts the Index Column Grid property. Clustered Column Store: This is the default setting. Clustered columnstore tables offer both the highest level of data compression and the best overall query performance, especially for large tables. Choosing this option prompts the Index Column Order property. Heap: Users may find that using a heap table is faster for temporarily landing data in Synapse SQL pool. This is because loads to heaps are faster than to index tables, and in some cases, the subsequent read can be done from cache. When a user is loading data only to stage it before running additional transformations, loading the table to a heap table is much faster than loading the data to a clustered columnstore table. For more information, please consult the Azure Synapse documentation. |
Index Column Grid | Name | The name of each column. |
Sort | Assign a sort orientation of either ascending (Asc) or descending (Desc). | |
Index Column Order | Multiple Select | Select the columns in the order to be indexed. |
Partition Key | Select | Select the table's partition key. Table partitions determine how rows are grouped and stored within a distribution. For more information on table partitions, please refer to this article. |
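In Synapse DDL, the distribution style and index type both land in the `WITH (...)` clause of CREATE TABLE. The following Python sketch illustrates that shape; the function and its defaults are assumptions, not Matillion's implementation:

```python
# Sketch: a Synapse dedicated SQL pool CREATE TABLE with distribution and
# index options in the WITH clause. Illustrative only.

def synapse_create_table(schema, name, columns, distribution="ROUND_ROBIN",
                         dist_column=None,
                         index="CLUSTERED COLUMNSTORE INDEX"):
    """columns: list of (column_name, data_type) tuples.
    distribution: "ROUND_ROBIN", "HASH", or "REPLICATE".
    index: e.g. "CLUSTERED COLUMNSTORE INDEX" or "HEAP"."""
    cols = ", ".join(f"[{c}] {t}" for c, t in columns)
    # HASH distribution needs the column it hashes on.
    dist = (f"HASH([{dist_column}])" if distribution == "HASH"
            else distribution)
    return (f"CREATE TABLE [{schema}].[{name}] ({cols}) "
            f"WITH (DISTRIBUTION = {dist}, {index})")

print(synapse_create_table(
    "dbo", "fact_sales",
    [("sale_id", "INT"), ("amount", "NUMERIC(18,2)")],
    distribution="HASH", dist_column="sale_id",
))
```

A heap variant for fast staging, as the Index Type description suggests, would simply pass `index="HEAP"`.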
Delta Lake Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Create Method | Select | Select the function of the component from the following options: Create: (default setting) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create If Not Exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost. This method is restricted to tables of file type Delta. |
Catalog | Select | Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter. |
Database | Select | Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup. |
Table Name | String | The name of the table to create or replace. |
Table Metadata | Column Name | The column name from the input flow. Add as many rows to the editor as you need, one per input column. |
Data Type | Select from INTEGER, NUMBER, FLOAT, TEXT, TIMESTAMP, DATE, BOOLEAN, BINARY as the data type for this column. | |
Size | Set the data type size. | |
Scale | Set the data type scale. | |
Comment | A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database. | |
Table Type | Select | Select whether the new table will be of type Managed or Unmanaged. The default setting is Managed. Read Managed table for more information. |
Using | Select | Select the file type. Available types: CSV, Delta, JSON, ORC, Parquet. This property is only available when Table Type is set to Unmanaged. |
Location | S3 Bucket | (AWS only) Specify the S3 bucket location. This property is only available when Table Type is set to Unmanaged. |
Storage Account | Select | (Azure only) Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview. This property is only available when Table Type is set to Unmanaged. |
Blob Container | Select | (Azure only) A Blob Storage location. The available blob containers will depend on the selected storage account. This property is only available when Table Type is set to Unmanaged. |
Partition Keys | Column Select | Specify any columns to be used as partition keys. |
Table Properties | Key | Specify any table metadata using key=value pairs. Specify the key. |
Value | Specify the key's value. | |
Comment | String | Use this property to attach a descriptive comment to the table. |
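For Delta Lake on Databricks, the managed/unmanaged distinction, file type, partition keys, and table properties map onto `USING`, `PARTITIONED BY`, `LOCATION`, and `TBLPROPERTIES` clauses. This Python sketch shows one plausible rendering; it is an assumption for illustration, not Matillion's code generation:

```python
# Sketch: a Databricks CREATE TABLE for Delta Lake, covering managed and
# unmanaged (LOCATION-backed) tables. Illustrative only.

def delta_create_table(database, name, columns, using="DELTA",
                       location=None, partition_keys=None,
                       table_properties=None, comment=None):
    """columns: list of (column_name, data_type) tuples.
    location: storage path for an unmanaged table, else None (managed)."""
    cols = ", ".join(f"`{c}` {t}" for c, t in columns)
    ddl = f"CREATE TABLE `{database}`.`{name}` ({cols}) USING {using}"
    if partition_keys:
        ddl += (" PARTITIONED BY ("
                + ", ".join(f"`{k}`" for k in partition_keys) + ")")
    if location:
        ddl += f" LOCATION '{location}'"  # makes the table unmanaged
    if comment:
        ddl += f" COMMENT '{comment}'"
    if table_properties:
        ddl += " TBLPROPERTIES (" + ", ".join(
            f"'{k}' = '{v}'" for k, v in table_properties.items()) + ")"
    return ddl

print(delta_create_table(
    "sales", "orders",
    [("order_id", "INTEGER"), ("order_date", "DATE")],
    partition_keys=["order_date"],
    table_properties={"delta.appendOnly": "true"},
))
```

Dropping a managed table deletes its underlying data files, while dropping an unmanaged one leaves the files at `LOCATION` intact, which is why the Table Type choice above matters.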
Variable Exports
This component makes the following values available to export into variables:
Source | Description |
---|---|
Table Recreated | Whether or not the table was (re)created. This is useful when "Create/replace" is set to 'Create if not exists' so users can tell whether the table needed to be created or not. |
Strategy
Generates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.
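The strategy above can be sketched as a small decision function: depending on the create method and whether the table already exists, the component either emits a preliminary DROP, skips creation entirely, or just creates the table. The names here are illustrative, not Matillion's internals:

```python
# Sketch of the strategy: emit any pre-DDL for an existing table of the
# same name, then the CREATE TABLE statement. Illustrative only.

def build_ddl(method, table_exists, create_stmt, drop_stmt):
    """method: "CREATE", "CREATE IF NOT EXISTS", or "REPLACE".
    Returns the list of statements to run, in order."""
    stmts = []
    if method == "REPLACE" and table_exists:
        stmts.append(drop_stmt)  # destructive: existing data is lost
    if method == "CREATE IF NOT EXISTS" and table_exists:
        return stmts  # nothing to do; the existing table is kept as-is
    stmts.append(create_stmt)
    return stmts

# The "Table Recreated" export corresponds to whether create_stmt was run:
ran = build_ddl("CREATE IF NOT EXISTS", True, "CREATE ...", "DROP ...")
print(len(ran))  # no statements: the table already existed
```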