Convert schema of a Spark DataFrame to another DataFrame

Question

I have a Spark DataFrame in PySpark and I want to store its schema in another Spark DataFrame.

For example, I have a sample DataFrame df that looks like this:

+---+-------------------+
| id|                  v|
+---+-------------------+
|  0| 0.4707538108432022|
|  0|0.39170676690905415|
|  0| 0.8249512619546295|
|  0| 0.3366111661094958|
|  0| 0.8974360488327017|
+---+-------------------+

I can look at the schema of df by doing:

df.printSchema()

root
 |-- id: integer (nullable = true)
 |-- v: double (nullable = false)

What I need is a DataFrame that shows the above information about df in two columns, col_name and dtype.

Expected Output:

+---------+-------------------+
| col_name|              dtype|
+---------+-------------------+
|       id|            integer|
|        v|             double|
+---------+-------------------+

How do I achieve this? I couldn't find anything about it. Thanks.

I got the desired result with spark.createDataFrame(df.dtypes, ["col_name", "dtypes"]). Thanks. What do you mean by parallelize?
– K. K., commented Oct 23, 2019 at 16:54

pault · Accepted Answer · 2019-10-23 16:53:58Z


The simplest thing would be to create a DataFrame from df.dtypes:

spark.createDataFrame(df.dtypes, ["col_name", "dtype"]).show()
#+--------+------+
#|col_name| dtype|
#+--------+------+
#|      id|   int|
#|       v|double|
#+--------+------+

But if you want the dtype column to use the same names as printSchema (integer rather than int), you can build it from df.schema:

spark.createDataFrame(
    [(d['name'], d['type']) for d in df.schema.jsonValue()['fields']],
    ["col_name", "dtype"]
).show()
#+--------+-------+
#|col_name|  dtype|
#+--------+-------+
#|      id|integer|
#|       v| double|
#+--------+-------+

