I have a .csv file data.csv stored at location: dbfs:/raw/data/externalTables/emp_data_folder/emp_data.csv
Here is a sample of the data in the file:
Alice,25,50000,North
Bob,30,60000,South
Charlie,35,70000,East
David,40,80000,West
Eve,29,58000,North
Frank,50,90000,South
Grace,28,54000,East
Hannah,32,62000,West
Ian,45,72000,North
Jack,27,56000,South
Using this .csv file, I created an external table in Spark using the following SQL command:
%sql
CREATE TABLE IF NOT EXISTS tablesDbDef.emp_data_f (
  Name STRING,
  Age INTEGER,
  Salary INT,
  Region STRING
)
USING CSV
LOCATION '/raw/data/externalTables/emp_data_folder/'
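(For reference, the table's registered type and location can be checked with something like the following sketch; it assumes the same database and table names as above:)
%sql
DESCRIBE TABLE EXTENDED tablesDbDef.emp_data_f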
The table is created successfully, and I can query it without any issues.
Next, I inserted a new record into the table using the following command:
%sql
INSERT INTO tablesDbDef.emp_data_f VALUES ('Mark', 20, 50000, 'South')
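(A query along these lines is roughly what confirms the new row is visible; the filter on Name is just for illustration:)
%sql
SELECT * FROM tablesDbDef.emp_data_f WHERE Name = 'Mark'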
The record is inserted successfully, and I can see it when I query the table. My understanding is that when new data is inserted, Spark creates new files (.csv files in this case) for the newly inserted rows. However, when I check the emp_data_folder directory, I don't see any new file for this record. The only files present are the original emp_data.csv and a newly generated _SUCCESS file.
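(For reference, this is roughly how I list the directory in the notebook; the path is the one from above, and dbutils.fs.ls in a Python cell should show the same thing:)
%fs ls /raw/data/externalTables/emp_data_folder/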
My question is: where is this newly inserted data stored, if not in a file? I can see the new record in SQL queries, but no corresponding file appears in the table's location.