I have a PySpark DataFrame like this:
    spark.createDataFrame(
        [
            (1, '1234ESPNnonzodiac'),
            (2, '1234ESPNzodiac'),
            (3, '963CNNnonzodiac'),
            (4, '963CNNzodiac'),
        ],
        ['id', 'col1']
    )
I would like to create a new column by splitting col1 on the trailing word zodiac or nonzodiac and keeping only the part before it, so that I can eventually group by this new column.
I would like the final output to look like this:
    spark.createDataFrame(
        [
            (1, '1234ESPNnonzodiac', '1234ESPN'),
            (2, '1234ESPNzodiac', '1234ESPN'),
            (3, '963CNNnonzodiac', '963CNN'),
            (4, '963CNNzodiac', '963CNN'),
        ],
        ['id', 'col1', 'col2']
    )