My goal is to run a Spark job on Databricks. My problem is that I can't use the local filesystem: a file written there is saved on the driver, so when my executors try to access it, it doesn't exist for them, because it lives only in the driver's filesystem.

I want to use the Workspace to store my file; DBFS is not an option for me. In my notebook, storing the file with Python code works perfectly, and I can then see the file in the Databricks UI.

The following Python code works:

import os
from pathlib import Path

# Define the path where you want to create the directory and file
directory_path = Path("/Workspace/Shared/credentials/test")
file_path = directory_path / "abc.txt"

# Create the directory if it doesn't exist
os.makedirs(directory_path, exist_ok=True)

# Create and write to the file
with open(file_path, 'w') as file:
    file.write("This is a string stored in abc.txt")

I need to do this in Scala, but I tried the following code and encountered an issue:

%scala
import java.nio.file.{Files, Paths, StandardOpenOption}
val directoryPath = "/Workspace/Shared/credentials/test2"
val filePath = Paths.get(directoryPath, "abc.txt")

// Create the directory if it doesn't exist
val directory = Paths.get(directoryPath)
if (!Files.exists(directory)) {
    Files.createDirectories(directory)
}
println(Files.exists(directory))

// Write the content to the file
val content = "This is a string stored in abc.txt"
Files.write(filePath, content.getBytes(), StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)

println(s"Directory created at: $directoryPath")
println(s"File created at: $filePath with content: '$content'")

However, I am receiving the following error:

FileSystemException: /Workspace/Shared/credentials: Operation not permitted
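
Note that the error names the parent folder /Workspace/Shared/credentials rather than test2, which suggests the failure happens while creating an ancestor directory. A minimal diagnostic sketch, using only java.nio, can show which ancestor is missing or not writable from the JVM's point of view. The helper name describeAncestors is hypothetical and not part of any Databricks API, and this only reports what the JVM sees; it does not explain why Python and Scala might see different permissions.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: list every ancestor of `target`, root first, with
// (path, exists, isDirectory, isWritable) as seen by this JVM process.
def describeAncestors(target: Path): List[(Path, Boolean, Boolean, Boolean)] = {
  val abs = target.toAbsolutePath.normalize
  // Walk upward until getParent returns null (i.e. we passed the root).
  val chain = Iterator.iterate(abs)(_.getParent).takeWhile(_ != null).toList.reverse
  chain.map(p => (p, Files.exists(p), Files.isDirectory(p), Files.isWritable(p)))
}

// Path taken from the failing Scala cell above.
describeAncestors(Paths.get("/Workspace/Shared/credentials/test2")).foreach {
  case (p, exists, isDir, writable) =>
    println(s"$p exists=$exists dir=$isDir writable=$writable")
}
```

The first row that reports exists=false, or an existing directory that reports writable=false, is likely where Files.createDirectories gives up.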

Ultimately, I want to use Spark with a Kafka configuration and specify the location of my JKS file:

spark.read
     .format("kafka")
     .option("includeHeaders", IncludeHeaders)
     .option("kafka.bootstrap.servers", topic.innerSource.bootstrapServers.get)
     .option("subscribe", parsedTopicName)
     .option("kafka.security.protocol", jobConfig.kafka.get.securityProtocol)
     .option("kafka.ssl.enabled.protocols", "TLSv1.2")
     .option("kafka.ssl.keystore.location", MyLocationToTheWorkspace)

Do you have any suggestions on how I can use Scala to store my file in the Workspace, or in another location where all executors can access my JKS file?

  • What is /Workspace? Is that a Unix file path? Can you chown / chmod it? Commented Sep 17, 2024 at 20:08
  • I am already "root", so I think that will not help in this case. Commented Sep 18, 2024 at 9:34
  • I doubt Databricks lets code actually run as the root user. Your Python code shows /Workspace, but your Scala code does not? Commented Sep 18, 2024 at 12:35
  • I edited the post; I used "/" the same way as I did with Python. I am running this in my notebook, and a simple "whoami" shows that I am root. Commented Sep 18, 2024 at 13:04
  • whoami locally isn't the same as when the code runs remotely in Databricks. On Mac/Linux, there generally is no /Workspace folder Commented Sep 18, 2024 at 16:51

1 Answer

This works on both Shared and Single User clusters:

import java.nio.file.{Files, Paths}

val directoryPath = Paths.get("/Workspace/Shared/tmp/tmp")
val filePath = directoryPath.resolve("abcdef.txt")

Files.createDirectories(directoryPath)

Files.write(filePath, "This is a string stored in abcdef.txt".getBytes)
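
Since the file that ultimately has to land there is a binary JKS keystore rather than a string, the same approach can be sketched with Files.copy plus a read-back check. The helper name copyAndVerify and the temporary demo files are hypothetical; on Databricks the destination would be a workspace folder such as the one above. This only verifies the copy as seen from the driver JVM and does not show whether executors can read the resulting path.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import java.util.Arrays

// Hypothetical helper: copy a binary file (for example a JKS keystore) into
// destDir, then read it back and confirm the bytes are identical.
def copyAndVerify(source: Path, destDir: Path): Path = {
  Files.createDirectories(destDir) // no-op if the directory already exists
  val target = destDir.resolve(source.getFileName)
  Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)
  require(
    Arrays.equals(Files.readAllBytes(source), Files.readAllBytes(target)),
    s"Copy of $source to $target is not byte-identical"
  )
  target
}

// Demo with temporary files standing in for a real keystore and workspace folder.
val src = Files.createTempFile("demo-keystore", ".jks")
Files.write(src, Array[Byte](0, 1, 2, 3))
val copied = copyAndVerify(src, Files.createTempDirectory("demo-dest"))
println(s"Copied to $copied")
```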


Comments

But in our case we are not running on a single-node cluster; we have multiple nodes.
I tested it on Shared (Multiple Nodes) and Single User clusters.
I am getting this error: FileSystemException: /Workspace/Shared/tmp: Operation not permitted
Shared/tmp is my folder; please use a folder from your own workspace.
I know, I created the same folder. Can you share your cluster configuration with me, please?