
I am deploying PySpark in my AKS Kubernetes cluster following these guides:

I have deployed my driver pod as explained in the links above:

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: spark
  name: my-notebook-deployment
  labels:
    app: my-notebook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-notebook
  template:
    metadata:
      labels:
        app: my-notebook
    spec:
      serviceAccountName: spark
      containers:
      - name: my-notebook
        image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
        ports:
          - containerPort: 8888
        volumeMounts:
          - mountPath: /root/data
            name: my-notebook-pv
        workingDir: /root
        resources:
          limits:
            memory: 2Gi
      volumes:
        - name: my-notebook-pv
          persistentVolumeClaim:
            claimName: my-notebook-pvc
---
apiVersion: v1
kind: Service
metadata:
  namespace: spark
  name: my-notebook-deployment
spec:
  selector:
    app: my-notebook
  ports:
    - protocol: TCP
      port: 29413
  clusterIP: None
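
Assuming both manifests are saved together in one file (the filename my-notebook.yaml is an assumption), they can be applied in a single step:

kubectl apply -f my-notebook.yaml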

Then I can create the Spark cluster using the following code:

import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
# Create Spark config for our Kubernetes based cluster manager
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
sparkConf.setAppName("spark")
sparkConf.set("spark.kubernetes.container.image", "<MYIMAGE>")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.executor.instances", "7")
sparkConf.set("spark.executor.cores", "2")
sparkConf.set("spark.driver.memory", "512m")
sparkConf.set("spark.executor.memory", "512m")
sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
sparkConf.set("spark.driver.port", "29413")
sparkConf.set("spark.driver.host", "my-notebook-deployment.spark.svc.cluster.local")
# Initialize our Spark cluster, this will actually
# generate the worker nodes.
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext

It works.
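
As a quick sanity check that the executor pods actually come up, a trivial distributed job can be run against the session (this snippet is illustrative, not part of the original setup):

# Run a small job across the executors; if the executor pods
# fail to start, this hangs or errors instead of printing 4950.
print(sc.parallelize(range(100)).sum())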

How can I create an external pod that executes a Python script living in my my-notebook-deployment pod? I can do it from my terminal:

kubectl exec my-notebook-deployment-7669bb6fc-29stw -- python3 myscript.py

But I want to automate it by executing this command from inside another pod.

2 Answers


In general, you can spin up a new pod with a specified command running in it, e.g.:

kubectl run mypod --image=python:3 --command -- <cmd> <arg1> ... <argN>

In your case you would need to provide the code of myscript.py to the pod (e.g. by mounting a ConfigMap with the script content) or build a new container image based on the official Python image, adding the script to it.
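
A minimal sketch of the ConfigMap approach (the ConfigMap and pod names, the script body, and the python:3 image are all assumptions for illustration):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: spark
  name: myscript-config          # hypothetical name
data:
  myscript.py: |
    print("hello from myscript") # placeholder script body
---
apiVersion: v1
kind: Pod
metadata:
  namespace: spark
  name: myscript-runner          # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: runner
    image: python:3
    command: ["python3", "/scripts/myscript.py"]
    volumeMounts:
    - mountPath: /scripts        # the ConfigMap is mounted read-only here
      name: script
  volumes:
  - name: script
    configMap:
      name: myscript-config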

1

You can launch a second pod based on the pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest container image inside a Kubernetes Job: https://kubernetes.io/docs/concepts/workloads/controllers/job/#running-an-example-job

If your script requires access to resources available inside the first pod, you have to use the service my-notebook-deployment or the volume my-notebook-pv to access them from the second pod. Sharing a read-write volume between pods requires the pods to run on the same node. Note that Kubernetes also offers CronJob for scheduled runs: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
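
A minimal sketch of such a Job, reusing the image, volume, and PVC from the question (the Job name and the script path /root/data/myscript.py are assumptions):

apiVersion: batch/v1
kind: Job
metadata:
  namespace: spark
  name: myscript-job             # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      serviceAccountName: spark  # carried over from the driver deployment
      restartPolicy: Never
      containers:
      - name: myscript
        image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
        command: ["python3", "/root/data/myscript.py"]  # assumed script location on the shared volume
        volumeMounts:
        - mountPath: /root/data
          name: my-notebook-pv
      volumes:
      - name: my-notebook-pv
        persistentVolumeClaim:
          claimName: my-notebook-pvc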
