I have a dataset and I need to extract data from a column based on index position.
The SERVICE_NAME column contains values such as "ISPFSDPartnerPubSub/4_2/ProxyServices/InboundAndOutbound/AP/InboundPartnerCommunicationsAPLPPS". I need to split the value on "/" and extract the segments at index 4 and index 5 (zero-based) as 'colX' and 'colY'.
How can I achieve this?
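import org.apache.spark.sql.functions.monotonically_increasing_id

// Read the CSV, using the first row as the header
val log = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .option("quote", "\"")
  .option("multiLine", "true")
  .load("OSB.csv").cache()
// Add a unique (not necessarily consecutive) id; monotonicallyIncreasingId is deprecated
val logs = log.withColumn("Id", monotonically_increasing_id() + 1)
// Register the DataFrame as a view so spark.sql can resolve "logs"
logs.createOrReplaceTempView("logs")
val df = spark.sql("select SERVICE_NAME, _raw from logs")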
Expected output:
colX: AP
colY: InboundPartnerCommunicationsAPLPPS
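
One possible approach, sketched here under the assumption that the segments are always separated by "/" and that "4th and 5th index" means zero-based positions 4 and 5, is to use Spark's built-in `split` function and pick elements with `getItem`. The local `SparkSession` and the one-row `sample` DataFrame below are stand-ins for the real `OSB.csv` data, added only so the snippet can run on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

// Assumption: a local session for illustration; in spark-shell `spark` already exists
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("split-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical one-row stand-in for the SERVICE_NAME column of OSB.csv
val sample = Seq(
  "ISPFSDPartnerPubSub/4_2/ProxyServices/InboundAndOutbound/AP/InboundPartnerCommunicationsAPLPPS"
).toDF("SERVICE_NAME")

// split() yields an array column; getItem() uses zero-based positions
val parts = split(col("SERVICE_NAME"), "/")
val result = sample
  .withColumn("colX", parts.getItem(4))
  .withColumn("colY", parts.getItem(5))

result.select("colX", "colY").show(false)
```

If a row has fewer than six segments, `getItem` normally returns null for the missing position rather than failing (with ANSI mode enabled it may raise an error instead), so it may be worth checking for nulls afterwards.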