pyspark.sql.Catalog.createTable
- Catalog.createTable(tableName, path=None, source=None, schema=None, description=None, **options)
Creates a table based on the dataset in a data source.
New in version 2.2.0.
- Parameters
- tableName : str
  name of the table to create.
  Changed in version 3.4.0: Allow tableName to be qualified with catalog name.
- path : str, optional
  the path in which the data for this table exists. When path is specified, an external table is created from the data at the given path. Otherwise a managed table is created.
- source : str, optional
  the source of this table, such as 'parquet', 'orc', etc. If source is not specified, the default data source configured by spark.sql.sources.default will be used.
- schema : StructType, optional
  the schema for this table.
- description : str, optional
  the description of this table.
  Changed in version 3.1.0: Added the description parameter.
- **options : dict, optional
  extra options to specify in the table.
- Returns
DataFrame
The DataFrame associated with the table.
Examples
Creating a managed table.
>>> _ = spark.catalog.createTable("tbl1", schema=spark.range(1).schema, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl1")
Creating an external table.
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="createTable") as d:
...     _ = spark.catalog.createTable(
...         "tbl2", schema=spark.range(1).schema, path=d, source='parquet')
>>> _ = spark.sql("DROP TABLE tbl2")