
I use an EMR Notebook connected to an EMR cluster. The kernel is Spark and the language is Scala. I need some jars that are located in an S3 bucket. How can I add them?

With spark-shell it's easy:

spark-shell --jars "s3://some/path/file.jar,s3://some/path/file2.jar"

Also, in the Scala console I can do:

:require s3://some/path/file.jar

5 Comments
  • what is the kernel you are using? Commented Aug 13, 2019 at 9:32
  • Kernel is Spark and language is Scala Commented Aug 13, 2019 at 9:42
  • did you try AddJar s3://some/path/file.jar ? Commented Aug 13, 2019 at 10:38
  • yes, I receive the error: Incomplete statement Commented Aug 13, 2019 at 10:43
  • is there a way to add a Maven dependency? Commented Nov 9, 2019 at 6:02

3 Answers


Just put this in your first cell:

%%configure -f
{
    "conf": {
        "spark.jars": "s3://YOUR_BUCKET/YOUR_DRIVER.jar"
    }
}
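
To sanity-check that the jar was actually picked up, you can inspect the session in a later cell. This is a minimal sketch, assuming the standard sc handle the Spark kernel provides; sc.listJars() and sc.getConf are plain SparkContext/SparkConf calls, and the bucket path above is still a placeholder:

// Run this in a cell after the %%configure cell; the -f flag restarts the
// Livy session, so the new configuration only applies to later cells.
// listJars() returns the jars registered with the running SparkContext.
sc.listJars().foreach(println)

// Read the setting back from the active configuration as a second check.
println(sc.getConf.getOption("spark.jars").getOrElse("spark.jars not set"))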

3 Comments

This worked for me. Remember to run this before any Scala command.
@IgorTavares With EMR v5.29.0 the notebook stopped complaining about the library not being found, but I got a strange NullPointerException after adding spark.jars pointing to S3. I'm afraid the stack trace doesn't tell me much, since I'm not sure the EMR stack trace matches the open-source Spark code lines.
I get, "Error parsing magics!: Magic configure does not exist!"

After you start the notebook, you can do this in a cell:

%%configure -f
{
    "conf": {
        "spark.jars.packages": "com.jsuereth:scala-arm_2.11:2.0,ml.combust.bundle:bundle-ml_2.11:0.13.0,com.databricks:dbutils-api_2.11:0.0.3"
    },
    "jars": [
        "//path to external downloaded jars"
    ]
}
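
Once the session restarts with those packages resolved, they can be used directly in Scala cells. As a rough illustration with one of the sample coordinates above (com.jsuereth:scala-arm_2.11:2.0), assuming that package is one you kept in the list:

// Illustration only: exercises scala-arm, one of the sample packages above,
// to confirm the coordinates actually resolved on the cluster.
import resource._

for (in <- managed(new java.io.FileInputStream("/etc/hosts"))) {
  // The stream is closed automatically when this block exits.
  println(s"first byte: ${in.read()}")
}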

8 Comments

I tried it this way: %%configure -f { "conf": {"spark.jars.packages": "//path to external downloaded jars"} } and this way: %%configure -f { "conf": {"jars": "//path to external downloaded jars"} }
Should I use exactly this line: "conf": {"spark.jars.packages": "com.jsuereth:scala-arm_2.11:2.0,ml.combust.bundle:bundle-ml_2.11:0.13.0,com.databricks:dbutils-api_2.11:0.0.3"}?
Those are just sample jars that I needed for my notebook. You need to replace them with the jars you need.
Should I use "spark.jars.packages": "" or "jars": [""]?
I used this some time ago. You need to check it against the current version you are using.

If you're trying to automate, I'd suggest this:

In your cluster's bootstrap script, copy the jar file from S3 into a readable location, like so:

#!/bin/bash

aws s3 cp s3://path_to_your_file.jar /home/hadoop/

Then, in your cluster's software settings (in the EMR UI at cluster creation), set the classpath properties:

[
    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.driver.extraClassPath": "/home/hadoop/path_to_your_file.jar",
        "spark.jars": "/home/hadoop/path_to_your_file.jar"
      }
    }
  ]

(You can add extra properties here like spark.executor.extraClassPath or spark.driver.userClassPathFirst.) Then launch your cluster and the jar should be available through imports.

I had to log into the primary node and run spark-shell to see where the import was located (by typing import com. and pressing Tab to autocomplete; there's probably an easier way to do this).

Then I was able to import and use the class in Zeppelin/Jupyter.
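
For reference, a quick check you can run in a notebook cell once the cluster is up; the class name here is just a placeholder for whatever your jar actually provides:

// Placeholder class name; substitute a class from your own jar.
// Class.forName throws ClassNotFoundException if the extraClassPath /
// spark.jars settings above did not take effect on the driver.
val visible = scala.util.Try(Class.forName("com.example.YourClass")).isSuccess
println(s"class visible on the driver classpath: $visible")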

