
I'm trying to read a CSV file stored on Azure Data Lake Storage Gen2; Python runs in Databricks. Here are two lines of code: the first one works, the second one fails. Do I really have to mount the ADLS for Pandas to be able to access it?

data1 = spark.read.option("header",False).format("csv").load("abfss://[email protected]/belgium/dessel/c3/kiln/temp/Auto202012101237.TXT")
data2 = pd.read_csv("abfss://[email protected]/belgium/dessel/c3/kiln/temp/Auto202012101237.TXT")

Any suggestions?

2 Answers

Pandas doesn't know about cloud storage and works with local files only. On Databricks you should be able to copy the file locally so you can open it with Pandas. This can be done either with %fs cp abfss://.... file:/your-location or with dbutils.fs.cp("abfss://....", "file:/your-location") (see docs).
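For example, a minimal sketch of the copy-then-read approach; the container, account, and paths are placeholders for your own values, not the asker's real ones:

# Copy the file from ADLS Gen2 to driver-local storage, then read it with Pandas.
dbutils.fs.cp(
    "abfss://<container>@<account>.dfs.core.windows.net/path/to/file.TXT",
    "file:/tmp/file.TXT",
)

import pandas as pd
# The driver-local file:/tmp location is visible to Python as plain /tmp.
data2 = pd.read_csv("/tmp/file.TXT", header=None)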

Another possibility is to use the Koalas library instead of Pandas; it provides a Pandas-compatible API on top of Spark. Besides access to data in the cloud, you also get the option of running your code in a distributed fashion.
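A sketch of the Koalas route (the path is again a placeholder; on newer Databricks runtimes the same API ships as pyspark.pandas):

import databricks.koalas as ks

# Koalas reads through Spark, so cloud URIs like abfss:// work directly.
kdf = ks.read_csv(
    "abfss://<container>@<account>.dfs.core.windows.net/path/to/file.TXT",
    header=None,
)

# Convert to a plain Pandas DataFrame only if the data fits in driver memory.
pdf = kdf.to_pandas()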


1 Comment

Thx, but extra libraries for something that 'simple' would be overkill. Mounting the ADLS helped me out. KR, Harry

I was able to solve it by mounting the cloud storage as a drive. It works fine now.
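The answer doesn't show the mount itself; here is a minimal sketch using an Azure service principal (all IDs, secret names, and paths below are placeholders). Once mounted, Pandas can read the file through the /dbfs FUSE path:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container under /mnt/lake.
dbutils.fs.mount(
    source="abfss://<container>@<account>.dfs.core.windows.net/",
    mount_point="/mnt/lake",
    extra_configs=configs,
)

import pandas as pd
# The mount is exposed to local processes via the /dbfs FUSE path,
# so Pandas can treat it like an ordinary local file.
data2 = pd.read_csv("/dbfs/mnt/lake/path/to/file.TXT", header=None)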

1 Comment

Can you elaborate on how you did this?
