Create Table Component
Create or replace a table.
Matillion ETL for Azure Synapse: when using the FLOAT data type, users can enter any value that is greater than or equal to 1 and less than or equal to 2,147,483,647. If a FLOAT with a size less than or equal to 7 is used, a column of type REAL will be produced in the database. Note that this column will still appear as a FLOAT type within Matillion ETL, simply with a smaller size. Conversely, if a FLOAT with a size greater than or equal to 8 is used, the traditional FLOAT type is used for the column both in the database and in Matillion ETL. For more information, please refer to float and real (Transact-SQL).
Matillion ETL for Delta Lake on Databricks: users may experience casting errors if using binary values.
Warning: Depending on the chosen settings, this component is destructive. Take care when running this component as it may remove existing data.
Properties
Snowflake Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Create/Replace | Select | Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the job has run, the table matches the schema defined in this component; however, any existing data in the table will be lost. Note: since other database objects may depend upon this table, drop ... cascade is used, which may remove many other database objects. |
Database | Select | Choose the database that the newly created table will belong to. |
Schema | Select | Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support. |
New Table Name | Text | The name of the table to create or replace. Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. To change this behaviour, please consult the Snowflake documentation on case sensitivity. |
Table Type | Select | The type of Snowflake table to create: Permanent: a table that holds data indefinitely and that can be restored using Snowflake's Time Travel. Temporary: a table that is automatically destroyed at the end of the Snowflake session. Transient: a table that holds data indefinitely, but that cannot be restored. |
Columns | Column Name | The name of each new column. |
Data Type | VARCHAR: this type can hold any kind of data, subject to a maximum size. More... NUMBER: this type is suitable for whole-number types (no decimals). More... Note: You can enter AUTOINCREMENT or IDENTITY as the default value for any numeric-type column and the column will automatically give an incremented value on each new row starting from 1. This feature is useful when creating a unique key on the table. FLOAT: this type is suitable for numeric types, with or without decimals. More... BOOLEAN: this type is suitable for data that is either 'true' or 'false'. More... DATE: this type is suitable for dates without times. More... TIMESTAMP: this type is suitable for timestamps. More... TIME: this type is suitable for times. More... VARIANT: a flexible type that can be used for any purpose. More... | |
Size | For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. Generally speaking, in all data stored using UTF-8, any non-ASCII character will count as 2 or more bytes. | |
Precision | Relevant only for Numeric data; this is the maximum number of digits that may appear to the right of the decimal point. | |
Default Value | The default value under this column for any row. | |
Not Null | True if this column does not accept null values. | |
Unique | Mark this column as unique, declaring that all values in it are unique with respect to one another. You may have multiple unique columns per table. | |
Comment | A location to store descriptive VARCHAR comments against columns. Used to contextualize the content being stored in tables in your database. | |
Default DDL Collation | String | Set the default DDL collation. Setting this parameter forces all subsequently created columns in the affected table to have the specified collation as the default, unless the collation for the column is explicitly defined in the DDL. For more information, refer to the Snowflake documentation. |
Primary Keys | Select | Declare one column to be a primary key. Primary keys are, by default, unique. Users can only have one primary key per table. |
Clustering Keys | Select | Specify clustering key(s) to use. Defining clustering keys can allow for more efficient sorting and scanning of large tables. More... |
Data Retention Time in Days | Integer | Set a number of days for which data is retained after deletion. More... |
Comment | Text | Attach a comment to the table. More... |
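Although Matillion ETL generates the Snowflake DDL internally, the mapping from the properties above to a CREATE TABLE statement can be sketched in Python. This is an illustrative sketch only, not Matillion's actual code generation; the function name and defaults are assumptions:

```python
# Sketch: assemble a Snowflake CREATE TABLE statement from settings similar
# to the properties above. Illustrative only; not Matillion's implementation.

def snowflake_create_table(name, columns, table_type="PERMANENT",
                           create_method="CREATE", cluster_keys=None,
                           comment=None):
    """Build a CREATE TABLE DDL string.

    columns: list of (column_name, data_type) tuples.
    table_type: "PERMANENT", "TEMPORARY", or "TRANSIENT".
    create_method: "CREATE", "CREATE IF NOT EXISTS", or "REPLACE".
    """
    prefix = {"CREATE": "CREATE",
              "CREATE IF NOT EXISTS": "CREATE",
              "REPLACE": "CREATE OR REPLACE"}[create_method]
    # PERMANENT is Snowflake's default, so it needs no keyword.
    type_kw = "" if table_type == "PERMANENT" else table_type + " "
    if_not_exists = ("IF NOT EXISTS "
                     if create_method == "CREATE IF NOT EXISTS" else "")
    # Quoted identifiers, matching Matillion's case-sensitive behaviour.
    cols = ", ".join(f'"{c}" {t}' for c, t in columns)
    ddl = f'{prefix} {type_kw}TABLE {if_not_exists}"{name}" ({cols})'
    if cluster_keys:
        ddl += " CLUSTER BY (" + ", ".join(f'"{k}"' for k in cluster_keys) + ")"
    if comment:
        ddl += f" COMMENT = '{comment}'"
    return ddl

print(snowflake_create_table(
    "orders",
    [("id", "NUMBER"), ("placed_at", "TIMESTAMP")],
    table_type="TRANSIENT",
    cluster_keys=["placed_at"],
))
```

Note how the "Create if not exists" option maps to an `IF NOT EXISTS` clause rather than a drop-and-recreate, which is why it never destroys existing data.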
Redshift Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Schema | Select | Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support. |
New Table Name | Text | The name of the table to create or replace. |
Create/Replace | Select | Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the job has run, the table matches the schema defined in this component; however, any existing data in the table will be lost. |
Table Metadata | Column Name | The name of the new column. |
Data Type | Select the data type. For more information, read Data types. Text: this type can hold any kind of data, subject to a maximum size. More... Integer: this type is suitable for whole-number types (no decimals). Selecting the integer type with Size 1-4 activates the Redshift type SMALLINT. Size 5-9 activates the Redshift type INTEGER. Size 10 or greater activates the Redshift type BIGINT. More... Numeric: this type is suitable for numeric types, with or without decimals. More... Real: this type is suitable for data of a single precision floating-point number. More... Double Precision: this type is suitable for data of a double precision floating-point number. More... Boolean: this type is suitable for data that is either 'true' or 'false'. More... Date: this type is suitable for dates without times. More... DateTime: this type is suitable for dates, times, or timestamps (both date and time). More... SUPER: use the SUPER data type to store semi-structured data or documents as values. More... | |
Size | For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. With Redshift, since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes. For Numeric types, this is the total number of digits allowed, whether before or after the decimal point. | |
Decimal Places | Relevant only for 'Numeric' data, it is the maximum number of digits that may appear to the right of the decimal point. | |
Encoding Type | The Redshift compression encoding. See the Redshift documentation for details of the various available encodings. Note: Although users can manually specify the encoding type here, it is advised to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output component, provided you are truncating the table. | |
Allow Nullable | When 'True', Matillion ETL specifies that the column accepts null values. When 'False', Matillion ETL specifies that the column is not allowed to contain null values. Default is 'True'. | |
Comment | A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database. | |
Distribution Style | Select | All: copy rows to all nodes in the Redshift Cluster. Auto: Redshift assigns an optimal distribution style based on the size of the table data. Even: the leader node distributes the rows across the slices in a round-robin fashion, regardless of the values in any particular column. Key: the rows are distributed according to the values in one column. The leader node places matching values on the same node slice. If you distribute a pair of tables on the joining keys, the leader node collocates the rows on the slices according to the values in the joining columns, so that matching values from the common columns are physically stored together. For more information, please refer to the Redshift documentation. |
Distribution Key | Select | This property is only activated if the Distribution Style is set to 'Key'. Users must select which column is used to determine which cluster node the row is stored on. |
Sort Key | Select | This setting is optional, and allows users to specify the columns from the input that should be set as the table's sort key. Sort keys are critical for efficient performance. For more information, we advise users refer to the Redshift documentation. |
Sort Key Options | Select | Choose whether the sort key is of a 'compound' or 'interleaved' variety. For more information, please refer to the Redshift documentation. |
Primary Key | Select | This setting is optional, and specifies the column from the input that should be set as the table's primary key. |
Identity Columns | Column Name | The name of the column. It must match the name of a column defined in the Table Metadata property, which will be set as an identity column. An identity column contains automatically generated values. The specified column must be of Data Type 'Integer' or 'Numeric' with zero (0) decimal places. |
Seed | Specifies the starting value. | |
Step | Specifies the increment between values. | |
Backup Table | Select | Specify whether the created table is to be included in automated and manual cluster snapshots. 'No' has no effect on automatic replication of data to other nodes within the cluster, meaning that tables set to 'No' in this property are still restored in the event of a node failure. The default setting is 'Yes'. |
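The Redshift-specific properties (distribution style, sort keys, backup) translate into clauses on the generated CREATE TABLE. The following Python sketch shows one plausible mapping; the function name and argument defaults are assumptions, not Matillion's actual code:

```python
# Sketch: a Redshift CREATE TABLE with the distribution, sort-key, and
# backup clauses configured above. Illustrative only.

def redshift_create_table(name, columns, dist_style="EVEN", dist_key=None,
                          sort_keys=None, sort_style="COMPOUND",
                          backup=True):
    """columns: list of (column_name, data_type) tuples."""
    cols = ", ".join(f'"{c}" {t}' for c, t in columns)
    ddl = f'CREATE TABLE "{name}" ({cols})'
    if not backup:
        ddl += " BACKUP NO"  # exclude the table from cluster snapshots
    ddl += f" DISTSTYLE {dist_style}"
    if dist_style == "KEY" and dist_key:
        # DISTKEY is only meaningful when the style is KEY.
        ddl += f' DISTKEY ("{dist_key}")'
    if sort_keys:
        ddl += (f" {sort_style} SORTKEY ("
                + ", ".join(f'"{k}"' for k in sort_keys) + ")")
    return ddl

print(redshift_create_table(
    "events",
    [("user_id", "INTEGER"), ("occurred_at", "TIMESTAMP")],
    dist_style="KEY", dist_key="user_id",
    sort_keys=["occurred_at"],
))
```

The design point worth noting is that DISTKEY and SORTKEY are fixed at table creation time, which is why choosing them here matters for downstream query performance.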
BigQuery Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Project | Select | Select the project that the newly created table will belong to. |
Dataset | Select | Select the table dataset. For more information on using datasets, please refer to the BigQuery documentation. |
New Table Name | Text | The name of the table to create or replace. |
Table Metadata | Field Name | The name of the new field. |
Data Type | For more information on available BigQuery data types please refer to the GCP documentation. String: this type can hold any kind of data, subject to a maximum size. Integer: this type is suitable for whole-number types (no decimals). Float: this type is suitable for numeric types, with or without decimals. Numeric: this data type is suitable for data of an exact numeric value, allowing 38 digits of precision and 9 decimal digits of scale. Boolean: this type is suitable for data that is either 'true' or 'false'. Date: a formatted date object without time. See the GCP documentation. Time: a formatted time object without date. See the GCP documentation. DateTime: a formatted timestamp containing both date and time that is easily readable by the user. See the GCP documentation. Timestamp: this type is a timestamp left unformatted (exists as Unix/Epoch Time). | |
Mode | The field mode. Default is 'NULLABLE'. NULLABLE: Field allows null values. REQUIRED: Field does not accept null values. REPEATED: Field can accept multiple values. | |
Comment | A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database. | |
Define Nested Metadata | When the Define Nested Metadata checkbox is ticked inside the 'Table Metadata' property, a tree-structure can be defined for metadata. | |
Create/Replace | Select | Create: (default option for this property) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create if not exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct the schema, which could lead to errors later on in the ETL job. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the job has run, the table matches the schema defined in this component; however, any existing data in the table will be lost. |
Partitioning | Select | Choose whether to create a date-partitioned table or not. Column: Partitions the table's data according to the specification in the Partitioning Field property, activated when this option is selected. Day: Selecting Day will create the table as a partition table, which will load data into separate date-partitions and can be queried with the '_PARTITIONTIME' pseudo-column. Select Day when your data is spread out over a wide range of dates or if your data is continually added over time. This allows large datasets to be split into multiple parts with each part corresponding to a single day of data. Hour: Select Hour if your tables have a high volume of data spanning a short date range (typically less than six months of timestamp values). Hourly partitioning allows the addressing of data at hour-level granularity, such as when appending, truncating, or deleting data from a particular partition. Month: Select Month if your tables have a relatively small amount of data for each day, but span a wide date range. None: No partitioning. Year: Select Year if your tables have a relatively small amount of data for each day, but span a wide date range. Both Year and Month are recommended options if your workflow requires frequently updating or adding rows that span a wide date range (for example, more than a few hundred dates). For more information, we recommend section "Daily partitioning versus hourly, monthly, or yearly partitioning" from Introduction to partitioned tables. Enabling partitioning on a table will activate additional properties in the Table Input component, which allows the partition column to be included in the input. |
Partitioning Field | Select | This parameter appears when 'Column' is selected in the Partitioning property. Partitioning Field has a drop-down, which should be populated only by Date or Timestamp column types defined in the Table Metadata. For more information, see here. |
Partition Expiration Time | Text | Specify the number of milliseconds before the partition expires (field can be left blank for no expiration). |
KMS Encryption | Select | Support for customer-managed encryption. By default, this is set to 'No'. Your encryption keys are stored within Cloud KMS. For more information, see here. |
Location | Select | This parameter appears when KMS Encryption is set to 'Yes'. Choose a location that matches the location of the BigQuery dataset. |
Key Ring | Select | This parameter appears when KMS Encryption is set to 'Yes'. Choose from a group of keys. On the GCP console in KMS, a Key Ring or multiple Key Rings will have been created. |
Key | Select | This parameter appears when KMS Encryption is set to 'Yes'. Choose the Encryption Key from the Key Ring. |
Cluster Columns | Select | Creates a clustered table. Allows the optional selection of up to four columns. Columns of types Integer, String, Boolean, Date, and Timestamp are supported. For more information, see here. |
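For BigQuery, the partitioning, clustering, and expiration properties correspond to `PARTITION BY`, `CLUSTER BY`, and table `OPTIONS` clauses in the generated DDL. The sketch below is an assumption about how such a mapping could look (including the milliseconds-to-days conversion for partition expiration, since BigQuery DDL expresses it in days); it is not Matillion's actual code:

```python
# Sketch: a BigQuery CREATE TABLE with partitioning and clustering clauses.
# Illustrative only; names and defaults are assumptions.

def bigquery_create_table(dataset, name, fields, partition_by=None,
                          cluster_by=None, partition_expiration_ms=None):
    """fields: list of (field_name, data_type, mode) tuples,
    where mode is 'NULLABLE', 'REQUIRED', or 'REPEATED'."""
    cols = ", ".join(
        f"{f} {t}" + (" NOT NULL" if mode == "REQUIRED" else "")
        for f, t, mode in fields)
    ddl = f"CREATE TABLE `{dataset}.{name}` ({cols})"
    if partition_by:
        # e.g. "DATE(created_at)" for column partitioning,
        # or "_PARTITIONDATE" for ingestion-time partitioning.
        ddl += f" PARTITION BY {partition_by}"
    if cluster_by:
        ddl += " CLUSTER BY " + ", ".join(cluster_by)  # up to four columns
    if partition_expiration_ms is not None:
        days = partition_expiration_ms / 86_400_000  # ms per day
        ddl += f" OPTIONS (partition_expiration_days = {days})"
    return ddl

print(bigquery_create_table(
    "analytics", "sessions",
    [("session_id", "STRING", "REQUIRED"), ("started", "TIMESTAMP", "NULLABLE")],
    partition_by="DATE(started)",
    cluster_by=["session_id"],
))
```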
Synapse Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Create/Replace | Select | Select the function of the component from the following options: Create: (default setting) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create If Not Exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost. |
Schema | Select | Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on schemas, please see the Azure Synapse documentation. |
New Table Name | String | Specify the name of the table to create or replace. Note: This field is case-sensitive by default, since Matillion ETL uses quoted identifiers. For more information, please refer to the Azure Synapse documentation. |
Table Metadata | Column Name | Provide the name of each new column. |
Data Type | Select the data type. The available data types are: DATE: this data type is suitable for dates without times. More... DATETIME: this data type is suitable for timestamps. More... TIME: this data type is suitable for times. More... INTEGER: this data type is suitable for whole number types (no decimals). More... NUMERIC: this data type is suitable for numeric types, with or without decimals. More... TEXT: this data type is suitable for text types. More... FLOAT: this data type is suitable for approximate number data types for use with floating point numeric data. More... BOOLEAN: this data type is suitable for data whose values are either "true" or "false". More... | |
Size | The Size column sets the size for all data types except BOOLEAN, DATE, DATETIME, and TIME. For more information about DATE, read date (Transact-SQL). For DATETIME, the Size column sets the precision of the datetime offset; accepted values range from 0 to 7, representing fractional seconds from 0 to 9999999. If the DATETIME size is set to 0, the returned datetime will be 26 positions (YYYY-MM-DD hh:mm:ss {+|-}hh:mm). If set to 1, the returned datetime will be 28 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.n {+|-}hh:mm). If set to 7, the returned datetime will be 34 positions (26 + precision + 1) (YYYY-MM-DD hh:mm:ss.nnnnnnn {+|-}hh:mm). For more information, read datetime (Transact-SQL). For TIME, the Size column sets the precision of the time offset; accepted values range from 0 to 7, representing fractional seconds from 0 to 9999999. If the TIME size is set to 0, the returned time will be 8 positions (hh:mm:ss). If set to 1, the returned time will be 10 positions (8 + precision + 1) (hh:mm:ss.n). If set to 7, the returned time will be 16 positions (8 + precision + 1) (hh:mm:ss.nnnnnnn). | |
Scale | Define the scale. More... | |
Allow Nullable | Select whether to allow nullable values. More... | |
Distribution Style | Select | Select the distribution style Hash: This setting assigns each row to one distribution by hashing the value stored in the distribution_column_name. The algorithm is deterministic, meaning it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL, because all rows that have NULL are assigned to the same distribution. Replicate: This setting stores one copy of the table on each Compute node. For SQL Data Warehouse, the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in an SQL Server filegroup that spans the Compute node. This behavior is the default for Parallel Data Warehouse. Round Robin: Distributes the rows evenly in a round-robin fashion. This is the default behaviour. For more information, please read this article. |
Distribution Column | Select | Select the column to act as the distribution column. This property is only available when the Distribution Style property is set to "Hash". |
Index Type | Select | Select the table indexing type. Options include: Clustered: A clustered index may outperform a clustered columnstore table when a single row needs to be retrieved quickly. The disadvantage to using a clustered index is that only queries that benefit are the ones that use a highly selective filter on the clustered index column. Choosing this option prompts the Index Column Grid property. Clustered Column Store: This is the default setting. Clustered columnstore tables offer both the highest level of data compression and the best overall query performance, especially for large tables. Choosing this option prompts the Index Column Order property. Heap: Users may find that using a heap table is faster for temporarily landing data in Synapse SQL pool. This is because loads to heaps are faster than to index tables, and in some cases, the subsequent read can be done from cache. When a user is loading data only to stage it before running additional transformations, loading the table to a heap table is much faster than loading the data to a clustered columnstore table. For more information, please consult the Azure Synapse documentation. |
Index Column Grid | Name | The name of each column. |
Sort | Assign a sort orientation of either ascending (Asc) or descending (Desc). | |
Index Column Order | Multiple Select | Select the columns in the order to be indexed. |
Partition Key | Select | Select the table's partition key. Table partitions determine how rows are grouped and stored within a distribution. For more information on table partitions, please refer to this article. |
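In Synapse DDL, the distribution style and index type both land in the `WITH (...)` clause of CREATE TABLE. The following Python sketch illustrates that shape; the function and its defaults are assumptions, not Matillion's implementation:

```python
# Sketch: a Synapse dedicated SQL pool CREATE TABLE with distribution and
# index options in the WITH clause. Illustrative only.

def synapse_create_table(schema, name, columns, distribution="ROUND_ROBIN",
                         dist_column=None,
                         index="CLUSTERED COLUMNSTORE INDEX"):
    """columns: list of (column_name, data_type) tuples.
    distribution: "ROUND_ROBIN", "HASH", or "REPLICATE".
    index: e.g. "CLUSTERED COLUMNSTORE INDEX" or "HEAP"."""
    cols = ", ".join(f"[{c}] {t}" for c, t in columns)
    # HASH distribution needs the column it hashes on.
    dist = (f"HASH([{dist_column}])" if distribution == "HASH"
            else distribution)
    return (f"CREATE TABLE [{schema}].[{name}] ({cols}) "
            f"WITH (DISTRIBUTION = {dist}, {index})")

print(synapse_create_table(
    "dbo", "fact_sales",
    [("sale_id", "INT"), ("amount", "NUMERIC(18,2)")],
    distribution="HASH", dist_column="sale_id",
))
```

A heap variant for fast staging, as the Index Type description suggests, would simply pass `index="HEAP"`.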
Delta Lake Properties | ||
---|---|---|
Property | Setting | Description |
Name | String | A human-readable name for the component. |
Create Method | Select | Select the function of the component from the following options: Create: (default setting) this option creates a new table. Creating a new table will generate an error if a table with the same name already exists. However, existing data will not be destroyed. Create If Not Exists: this option will only create a new table when a table of the same name does not already exist. This option will not destroy existing data. Replace: this option drops any existing table of the same name and then creates a new table. Selecting this option guarantees that after the Orchestration Job has run, the table matches the schema defined in this component. However, any data from the existing table (that is, the table to be replaced upon the new job run) will be lost. This method is restricted to tables of file type Delta. |
Catalog | Select | Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter. |
Database | Select | Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup. |
Table Name | String | The name of the table to create or replace. |
Table Metadata | Column Name | The column name from the input flow. Add as many rows to the editor as you need, one per input column. |
Data Type | Select from INTEGER, NUMBER, FLOAT, TEXT, TIMESTAMP, DATE, BOOLEAN, BINARY as the data type for this column. | |
Size | Set the data type size. | |
Scale | Set the data type scale. | |
Comment | A location to store descriptive text comments against columns. Used to contextualize the content being stored in tables in your database. | |
Table Type | Select | Select whether the new table will be of type Managed or Unmanaged. The default setting is Managed. Read Managed table for more information. |
Using | Select | Select the file type. Available types: CSV, Delta, JSON, ORC, Parquet. This property is only available when Table Type is set to Unmanaged. |
Location | S3 Bucket | (AWS only) Specify the S3 bucket location. This property is only available when Table Type is set to Unmanaged. |
Storage Account | Select | (Azure only) Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview. This property is only available when Table Type is set to Unmanaged. |
Blob Container | Select | (Azure only) A Blob Storage location. The available blob containers will depend on the selected storage account. This property is only available when Table Type is set to Unmanaged. |
Partition Keys | Column Select | Specify any columns to be used as partition keys. |
Table Properties | Key | Specify any table metadata using key=value pairs. Specify the key. |
Value | Specify the key's value. | |
Comment | String | Use this property to attach a descriptive comment to the table. |
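For Delta Lake on Databricks, the managed/unmanaged distinction, file type, partition keys, and table properties map onto `USING`, `PARTITIONED BY`, `LOCATION`, and `TBLPROPERTIES` clauses. This Python sketch shows one plausible rendering; it is an assumption for illustration, not Matillion's code generation:

```python
# Sketch: a Databricks CREATE TABLE for Delta Lake, covering managed and
# unmanaged (LOCATION-backed) tables. Illustrative only.

def delta_create_table(database, name, columns, using="DELTA",
                       location=None, partition_keys=None,
                       table_properties=None, comment=None):
    """columns: list of (column_name, data_type) tuples.
    location: storage path for an unmanaged table, else None (managed)."""
    cols = ", ".join(f"`{c}` {t}" for c, t in columns)
    ddl = f"CREATE TABLE `{database}`.`{name}` ({cols}) USING {using}"
    if partition_keys:
        ddl += (" PARTITIONED BY ("
                + ", ".join(f"`{k}`" for k in partition_keys) + ")")
    if location:
        ddl += f" LOCATION '{location}'"  # makes the table unmanaged
    if comment:
        ddl += f" COMMENT '{comment}'"
    if table_properties:
        ddl += " TBLPROPERTIES (" + ", ".join(
            f"'{k}' = '{v}'" for k, v in table_properties.items()) + ")"
    return ddl

print(delta_create_table(
    "sales", "orders",
    [("order_id", "INTEGER"), ("order_date", "DATE")],
    partition_keys=["order_date"],
    table_properties={"delta.appendOnly": "true"},
))
```

Dropping a managed table deletes its underlying data files, while dropping an unmanaged one leaves the files at `LOCATION` intact, which is why the Table Type choice above matters.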
Variable Exports
This component makes the following values available to export into variables:
Source | Description |
---|---|
Table Recreated | Whether or not the table was (re)created. This is useful when "Create/replace" is set to 'Create if not exists' so users can tell whether the table needed to be created or not. |
Strategy
Generates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.
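The strategy above can be sketched as a small decision function: depending on the create method and whether the table already exists, the component either emits a preliminary DROP, skips creation entirely, or just creates the table. The names here are illustrative, not Matillion's internals:

```python
# Sketch of the strategy: emit any pre-DDL for an existing table of the
# same name, then the CREATE TABLE statement. Illustrative only.

def build_ddl(method, table_exists, create_stmt, drop_stmt):
    """method: "CREATE", "CREATE IF NOT EXISTS", or "REPLACE".
    Returns the list of statements to run, in order."""
    stmts = []
    if method == "REPLACE" and table_exists:
        stmts.append(drop_stmt)  # destructive: existing data is lost
    if method == "CREATE IF NOT EXISTS" and table_exists:
        return stmts  # nothing to do; the existing table is kept as-is
    stmts.append(create_stmt)
    return stmts

# The "Table Recreated" export corresponds to whether create_stmt was run:
ran = build_ddl("CREATE IF NOT EXISTS", True, "CREATE ...", "DROP ...")
print(len(ran))  # no statements: the table already existed
```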