I am trying to load multiple files and directories in Spark using Java. I have found a few examples of how to do this in Scala; can someone give an example, with an explanation, of how to do this in Java?

In particular, I would like to use regex-like paths so that I do not have to specify a fully qualified name for each file. I can already pass a comma-separated list of fully qualified file names.

I am loading from the local file system; I don't know whether this makes a difference.

The following is the code I have used to load the files:

    SparkConf sparkConf = new SparkConf().setAppName("TableAggregator");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    JavaRDD<String> lines = ctx.textFile(args[0], 1);

1 Answer

In Spark, the textFile() method takes a URI for the file: either a local path on the machine or a URI such as hdfs://.

You can also run this method on directories, compressed files, and wildcards:

    ctx.textFile("data.txt");
    ctx.textFile("/your/directory/");
    ctx.textFile("/your/directory/*");
    ctx.textFile("/your/directory/*.gz");

Be aware that when you use a local path for your input, the file must be accessible at the same path on all worker nodes. So you have to either copy the file to all workers or use a shared network-mounted file system.

So a wildcard pattern is the simplest way to load several files at once.
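Wildcards can also be combined with the comma-separated form mentioned in the question, so several patterns go into a single textFile() call. The following is only a minimal sketch: InputPaths and combine are hypothetical helper names (not part of Spark), and /your/other/directory is a made-up path for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Spark API.
final class InputPaths {
    private InputPaths() {}

    /** Joins path patterns into one comma-separated string, skipping blank entries. */
    static String combine(String... patterns) {
        return Arrays.stream(patterns)
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Both paths are placeholders; replace them with your own.
        String input = combine("/your/directory/*.gz", "/your/other/directory/*");
        System.out.println(input);
        // With a JavaSparkContext named ctx, as in the question:
        // JavaRDD<String> lines = ctx.textFile(input);
    }
}
```

The helper only builds the string; the patterns themselves are resolved when Spark reads the input, so the note about every worker seeing the same paths still applies.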

4 Comments

Thanks for your response. Is the directory lookup recursive?
For me, it is not recursive. You get an exception: java.io.IOException: not a file
@AntoineHars Where is this documented?
Unfortunately, I haven't found it in the documentation, but I tried it myself and got this error. If you search for the term "textfile" in the Spark Programming Guide (spark.apache.org/docs/latest/programming-guide.html) and in the Javadoc (spark.apache.org/docs/latest/api/java/index.html), there is no mention of the behavior for nested directories.
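Since the lookup was reported as non-recursive, one possible workaround on a local file system is to walk the directory tree yourself and hand Spark an explicit comma-separated list of files. This is a rough sketch using only java.nio; RecursiveInputs and listFiles are hypothetical names, and it assumes no file name contains a comma.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the Spark API.
final class RecursiveInputs {
    private RecursiveInputs() {}

    /** Returns every regular file under root (at any depth) as a sorted, comma-separated string. */
    static String listFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)       // skip the directories themselves
                       .map(p -> p.toAbsolutePath().toString())
                       .sorted()
                       .collect(Collectors.joining(","));
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String input = listFiles(root);
        System.out.println(input);
        // With a JavaSparkContext named ctx, as in the question:
        // JavaRDD<String> lines = ctx.textFile(input);
    }
}
```

The listing runs on the driver, so it only helps when the workers can see the same files at the same paths, as the answer points out.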
