
I had a Dockerfile as follows:

FROM python:3.7

RUN apt-get update
RUN apt-get install default-jdk -y

COPY requirements.txt ./
RUN pip install -r requirements.txt

which I was using in a CI pipeline on GitLab, and it was working fine.

Recently, however, it has stopped working. I haven't updated my requirements.txt, so could this be because default-jdk has changed in the base image?

How should I update my Dockerfile so that pyspark runs correctly again?
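
One way to test that hypothesis (a quick check, assuming you can run Docker locally against the same tag) is to see which Java version default-jdk currently resolves to inside the base image:

docker run --rm python:3.7 bash -c "apt-get update -qq && apt-get install -y -qq default-jdk > /dev/null && java -version"

If this prints an OpenJDK 11 banner rather than OpenJDK 8, the JDK shipped by default-jdk has indeed changed underneath the pipeline.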

EDIT

Example of the error:

/usr/local/lib/python3.7/site-packages/pyspark/rdd.py:824: in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
/usr/local/lib/python3.7/site-packages/py4j/java_gateway.py:1160: in __call__
    answer, self.gateway_client, self.target_id, self.name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

answer = 'xro1291'
gateway_client = <py4j.java_gateway.GatewayClient object at 0x7f6490c2a350>
target_id = 'z:org.apache.spark.api.python.PythonRDD', name = 'collectAndServe'

    def get_return_value(answer, gateway_client, target_id=None, name=None):
        """Converts an answer received from the Java gateway into a Python object.

        For example, string representation of integers are converted to Python
        integer, string representation of objects are converted to JavaObject
        instances, etc.

        :param answer: the string returned by the Java gateway
        :param gateway_client: the gateway client used to communicate with the Java
            Gateway. Only necessary if the answer is a reference (e.g., object,
            list, map)
        :param target_id: the name of the object from which the answer comes from
            (e.g., *object1* in `object1.hello()`). Optional.
        :param name: the name of the member from which the answer comes from
            (e.g., *hello* in `object1.hello()`). Optional.
        """
        if is_error(answer)[0]:
            if len(answer) > 1:
                type = answer[1]
                value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
                if answer[1] == REFERENCE_TYPE:
                    raise Py4JJavaError(
                        "An error occurred while calling {0}{1}{2}.\n".
>                       format(target_id, ".", name), value)
E                   py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
E                   : java.lang.IllegalArgumentException
E                       at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
E                       at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
E                       at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
E                       at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
E                       at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
E                       at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
E                       at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
E                       at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
E                       at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
E                       at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
E                       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
E                       at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
E                       at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
E                       at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
E                       at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
E                       at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
E                       at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
E                       at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
E                       at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
E                       at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
E                       at scala.collection.immutable.List.foreach(List.scala:381)
E                       at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
E                       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
E                       at org.apache.spark.SparkContext.clean(SparkContext.scala:2292)
E                       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2066)
E                       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
E                       at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
E                       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
E                       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
E                       at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
E                       at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
E                       at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:153)
E                       at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
E                       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E                       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E                       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E                       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E                       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E                       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E                       at py4j.Gateway.invoke(Gateway.java:282)
E                       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E                       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E                       at py4j.GatewayConnection.run(GatewayConnection.java:214)
E                       at java.base/java.lang.Thread.run(Thread.java:834)

/usr/local/lib/python3.7/site-packages/py4j/protocol.py:320: Py4JJavaError
  • What are the errors you got? Commented Oct 8, 2019 at 9:03
  • @LinPy I've edited the question to include one. Commented Oct 8, 2019 at 9:08

1 Answer


Changing the base image to python:3.7-stretch worked for me.
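
The likely cause: in mid-2019 the python:3.7 tag moved its base from Debian Stretch to Debian Buster, where default-jdk installs OpenJDK 11. Spark 2.x only supports Java 8 (Java 11 support arrived with Spark 3.0), which is why the job dies in Spark's bytecode reader with java.lang.IllegalArgumentException. Pinning the Stretch variant keeps default-jdk at OpenJDK 8. A minimal sketch of the updated Dockerfile, assuming the same requirements.txt as before:

FROM python:3.7-stretch

# On Stretch, default-jdk resolves to OpenJDK 8, which Spark 2.x requires.
# Combining update and install in one layer avoids a stale apt cache.
RUN apt-get update && \
    apt-get install -y default-jdk && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
RUN pip install -r requirements.txt

To be safer still, you could install openjdk-8-jdk explicitly instead of the default-jdk metapackage (it is available on Stretch), so a future change to the metapackage cannot silently pull in a newer JDK.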
