
I want to find out the datatype of each column of a table.

For example, let's say my table was created using this:

create table X
(
col1 string,
col2 int,
col3 int
)

I want a command that will output something like this:

column datatype
col1  string
col2  int

Is there a command for this? Preferably in Spark SQL, but if not, how can I get this data another way? I'm using Spark SQL to query Hive tables. Perhaps through the metadata in Hive? Thank you.

3 Answers


You can read the Hive table as a DataFrame and use the printSchema() function.

In the pyspark REPL:

from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
table = hive_context.table("database_name.table_name")
table.printSchema()

And similarly in the spark-shell REPL (Scala):

import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
val table=hiveContext.table("database_name.table_name")
table.printSchema

5 Comments

Cool, thank you. Is there a way to do this using just SQL syntax? We have some internal tools where the Spark objects are not exposed to the user, only SQL commands. Thank you.
We need a HiveContext to read tables from Hive. I am not sure whether the same is possible with SQLContext.
The pyspark example needs the table function: the line reading the table should be table = hive_context.table("database_name.table_name"); without .table you will get an error.
df.dtypes will give the DataFrame column names and their respective data types.
All varchar columns become 'string'. Is there a way to get dtypes with the varchar length? For example, I want to differentiate varchar(1), varchar(10), and so on.
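Following up on that last comment: df.dtypes reports Hive varchar columns as plain string, but the DESCRIBE output keeps the declared type. A minimal sketch of collecting the declared types, with the rows hardcoded here in place of a real spark.sql("desc ...").collect() call (the column names and varchar types below are made-up illustration values):

```python
# Stand-in for spark.sql("desc some_table").collect(); in a live session each
# row is (col_name, data_type, comment). DESCRIBE keeps the declared varchar
# length, whereas df.dtypes would report c1 and c2 below as just 'string'.
desc_rows = [
    ("c1", "varchar(1)", None),
    ("c2", "varchar(10)", None),
    ("c3", "int", None),
]

# Map each column name to its declared type
declared = {name: data_type for name, data_type, _ in desc_rows}
print(declared["c1"])
print(declared["c2"])
```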

In Scala: create a DataFrame for your table and try the following:

df.dtypes

Your result:

Array((PS_PROD_DESC,StringType), (PS_OPRTNG_UNIT_ID,StringType),...)
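Since dtypes is just a sequence of (name, type) pairs, rendering it into the exact column/datatype listing the question asks for is plain Python. A minimal sketch, with a hardcoded list standing in for the real df.dtypes (a live call needs an actual DataFrame):

```python
# Stand-in for df.dtypes on the example table X from the question
dtypes = [("col1", "string"), ("col2", "int"), ("col3", "int")]

def format_schema(pairs):
    """Render (name, type) pairs as a 'column datatype' listing."""
    lines = ["column datatype"]
    for name, dtype in pairs:
        lines.append(f"{name}  {dtype}")
    return "\n".join(lines)

print(format_schema(dtypes))
```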

Comments


You can use desc <db_name>.<tab_name> or spark.catalog.listColumns("<db>.<tab_name>")

Example:

spark.sql("create table X(col1 string,col2 int,col3 int)")

Using desc to get column_name and datatype:

spark.sql("desc default.x").select("col_name","data_type").show()

//+--------+---------+
//|col_name|data_type|
//+--------+---------+
//|    col1|   string|
//|    col2|      int|
//|    col3|      int|
//+--------+---------+

Using spark.catalog to get column_name and data_type:

spark.catalog.listColumns("default.x").select("name","dataType").show()

//+----+--------+
//|name|dataType|
//+----+--------+
//|col1|  string|
//|col2|     int|
//|col3|     int|
//+----+--------+
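Beyond name and dataType, the Column objects returned by spark.catalog.listColumns also expose nullable and isPartition flags, so you can filter the schema in plain Python. A minimal sketch using namedtuples as stand-ins for the real pyspark Column objects (the field names match the pyspark Catalog API; the values are made up for illustration):

```python
from collections import namedtuple

# Stand-in for the pyspark.sql.catalog.Column objects returned by
# spark.catalog.listColumns("default.x"); field names mirror that API.
Column = namedtuple("Column", ["name", "dataType", "nullable", "isPartition"])

cols = [
    Column("col1", "string", True, False),
    Column("col2", "int", True, False),
    Column("col3", "int", True, True),
]

# Keep only the non-partition columns, as a {name: dataType} mapping
schema = {c.name: c.dataType for c in cols if not c.isPartition}
print(schema)
```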

Comments
