I'm new to Spark with Python, and I'm trying some basic exercises to get an understanding of both.
I have a file like the one below:
empid||deptid||salary
1||10||500
2||10||200
3||20||300
4||20||400
5||20||100
I want to write a small PySpark program that reads this file and prints the count of employees in each department.
I've been working with databases, and this is quite simple in SQL, but I'm trying to do it with PySpark. I don't have any working Spark code to share because I'm completely new to Python and Spark, but I'd like to understand how it works through a simple hands-on example.
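For reference, the SQL version of the desired result is a plain GROUP BY. The sketch below runs it through Python's built-in sqlite3 module only so that it is self-contained; the table name emp and the helper count_by_dept are made up for illustration and are not part of any existing setup.

```python
import sqlite3

# Sample rows from the file above: (empid, deptid, salary)
rows = [(1, 10, 500), (2, 10, 200), (3, 20, 300), (4, 20, 400), (5, 20, 100)]


def count_by_dept(rows):
    """Return (deptid, employee_count) pairs using a SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE emp (empid INTEGER, deptid INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT deptid, COUNT(*) FROM emp GROUP BY deptid ORDER BY deptid"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for deptid, n in count_by_dept(rows):
        print(deptid, n)
```

The goal is to get the same two counts (one per department) out of Spark.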
I've installed pyspark and did some quick reading here: https://spark.apache.org/docs/latest/quick-start.html
From my understanding, there are DataFrames on which one can perform SQL-like operations such as group by, but I'm not sure how to write the code properly.