
I use an EMR Notebook connected to an EMR cluster. The kernel is Spark and the language is Scala. I need some jars that are located in an S3 bucket. How can I add them?

With spark-shell it's easy:

spark-shell --jars "s3://some/path/file.jar,s3://some/path/file2.jar"

Also, in the Scala console I can do:

:require s3://some/path/file.jar

5 Comments
  • what is the kernel you are using? Commented Aug 13, 2019 at 9:32
  • Kernel is Spark and language is Scala Commented Aug 13, 2019 at 9:42
  • did you try AddJar s3://some/path/file.jar ? Commented Aug 13, 2019 at 10:38
  • yes, I receive the error: Incomplete statement Commented Aug 13, 2019 at 10:43
  • is there a way to add a Maven dependency? Commented Nov 9, 2019 at 6:02

3 Answers


Just put this in your first cell:

%%configure -f
{
    "conf": {
        "spark.jars": "s3://YOUR_BUCKET/YOUR_DRIVER.jar"
    }
}
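
To sanity-check that the jar was actually picked up, you can inspect the session in a later cell. This is a minimal sketch, assuming the standard sc handle the Spark kernel provides; sc.listJars() and sc.getConf are plain SparkContext/SparkConf calls, and the bucket path above is still a placeholder:

// Run this in a cell after the %%configure cell; the -f flag restarts the
// Livy session, so the new configuration only applies to later cells.
// listJars() returns the jars registered with the running SparkContext.
sc.listJars().foreach(println)

// Read the setting back from the active configuration as a second check.
println(sc.getConf.getOption("spark.jars").getOrElse("spark.jars not set"))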

3 Comments

This worked for me. Remember to run this before any Scala command.
@IgorTavares With EMR v5.29.0 the notebook stopped complaining about the library not being found, but I got a strange NullPointerException after adding spark.jars pointing to S3. I'm afraid the stack trace doesn't tell me much, since I'm not sure the EMR stack trace matches the open-source Spark code lines.
I get, "Error parsing magics!: Magic configure does not exist!"

After you start the notebook, you can do this in a cell:

%%configure -f
{
    "conf": {
        "spark.jars.packages": "com.jsuereth:scala-arm_2.11:2.0,ml.combust.bundle:bundle-ml_2.11:0.13.0,com.databricks:dbutils-api_2.11:0.0.3"
    },
    "jars": [
        "//path to external downloaded jars"
    ]
}
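
Once the session restarts with those packages resolved, they can be used directly in Scala cells. As a rough illustration with one of the sample coordinates above (com.jsuereth:scala-arm_2.11:2.0), assuming that package is one you kept in the list:

// Illustration only: exercises scala-arm, one of the sample packages above,
// to confirm the coordinates actually resolved on the cluster.
import resource._

for (in <- managed(new java.io.FileInputStream("/etc/hosts"))) {
  // The stream is closed automatically when this block exits.
  println(s"first byte: ${in.read()}")
}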

8 Comments

I tried it this way: %%configure -f { "conf": {"spark.jars.packages": "//path to external downloaded jars"} } and this way: %%configure -f { "conf": {"jars": "//path to external downloaded jars"} }
Should I use exactly this line: "conf": {"spark.jars.packages": "com.jsuereth:scala-arm_2.11:2.0,ml.combust.bundle:bundle-ml_2.11:0.13.0,com.databricks:dbutils-api_2.11:0.0.3"}?
Those are just sample jars that I needed for my notebook. You need to replace them with the jars you need.
Should I use "spark.jars.packages": "" or "jars": [""]?
I used this some time ago. You need to check it against the current version you are using.

If you're trying to automate, I'd suggest this:

In your cluster's bootstrap script, copy the jar file from S3 into a readable location, like so:

#!/bin/bash

aws s3 cp s3://path_to_your_file.jar /home/hadoop/

Then, in your cluster's software settings (in the EMR UI at cluster creation), set the classpath properties:

[
    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.driver.extraClassPath": "/home/hadoop/path_to_your_file.jar",
        "spark.jars": "/home/hadoop/path_to_your_file.jar"
      }
    }
  ]

(You can add extra properties here like spark.executor.extraClassPath or spark.driver.userClassPathFirst.) Then launch your cluster and the jar should be available through imports.

I had to log into the primary node and run spark-shell to see where the import was located (by typing import com. and pressing Tab to autocomplete; there's probably an easier way to do this).

Then I was able to import and use the class in Zeppelin/Jupyter.
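
For reference, a quick check you can run in a notebook cell once the cluster is up; the class name here is just a placeholder for whatever your jar actually provides:

// Placeholder class name; substitute a class from your own jar.
// Class.forName throws ClassNotFoundException if the extraClassPath /
// spark.jars settings above did not take effect on the driver.
val visible = scala.util.Try(Class.forName("com.example.YourClass")).isSuccess
println(s"class visible on the driver classpath: $visible")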

