
I am running a SQL notebook on Databricks. I would like to analyze a table with half a billion records in it. I can run simple SQL queries on the data. However, I need to change the date column's type from string to date.

Unfortunately, UPDATE/ALTER statements do not seem to be supported by Spark SQL, so it seems I cannot modify the data in the table.

What would be the one line of code that would allow me to convert the SQL table to a Python data structure (in PySpark) in the next cell? Then I could modify the data and return it to SQL.


3 Answers

dataFrame = sqlContext.sql('select * from myTable')
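In Spark 2.x and later (including current Databricks runtimes), the SparkSession entry point spark supersedes sqlContext, so the equivalent one-liner is:

dataFrame = spark.sql('select * from myTable')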

2 Comments

Thanks! And how would I return it back to SQL so I can go back to querying it in SQL in the next cell? Probably also one line. Is it something like dataFrame.to_sql? (I have no clue, I just made that up to give you an idea of what I mean.)
@Semihcan, you want the registerTempTable function spark.apache.org/docs/latest/…
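A minimal sketch of the round trip that comment describes (note that registerTempTable was deprecated in Spark 2.0 in favor of createOrReplaceTempView; the view name here is a placeholder):

# Make the modified DataFrame queryable from SQL cells again
dataFrame.registerTempTable("myTableFixed")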
df = sqlContext.sql("select * from table")

To convert the DataFrame back to a SQL view:

df.createOrReplaceTempView("myview")
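Tying this back to the original question, a sketch of the full round trip including the string-to-date conversion (the column name date_col and the format pattern are assumptions; adjust them to match the actual data):

from pyspark.sql.functions import to_date

df = spark.sql("select * from myTable")
# Replace the string column with a DateType column; the format pattern
# must match how the dates are stored in the source table.
df = df.withColumn("date_col", to_date(df["date_col"], "yyyy-MM-dd"))
# Expose the converted DataFrame to the next SQL cell under a new name
df.createOrReplaceTempView("myview")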



# Read from SQL table
df = spark.read.table("your_database.source_table")

# Transform: filter age > 25
df_filtered = df.filter(df.age > 25).select("name", "age")

# Write to new SQL table
df_filtered.write.mode("overwrite").saveAsTable("your_database.filtered_table")
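Unlike createOrReplaceTempView, saveAsTable persists the result as a table in the metastore, so it survives beyond the notebook session; for a half-billion-row table the full rewrite is heavier, but it only has to run once.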

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
