I have e-commerce product data in JSON format, and it is updated frequently. I am planning to load this data into BigQuery. Because each JSON document is large (a few hundred attributes) and the updates are frequent, I found that it is not cost-effective to store the entire payload in BigQuery and keep updating it with MERGE queries.
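For context, the update path I am trying to get away from looks roughly like this (a sketch using the Python client; `products`, `products_staging`, and the column names are placeholders, not my real schema):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Illustrative upsert: merge a staging batch of product updates into the
# main table. With a few hundred attributes, the UPDATE SET list gets long
# and every run rewrites a lot of data.
merge_sql = """
MERGE `my_dataset.products` AS target
USING `my_dataset.products_staging` AS source
ON target.product_id = source.product_id
WHEN MATCHED THEN
  UPDATE SET target.title = source.title,
             target.price = source.price
WHEN NOT MATCHED THEN
  INSERT (product_id, title, price)
  VALUES (source.product_id, source.title, source.price)
"""
client.query(merge_sql).result()
```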
I am now planning to store only the frequently used attributes in individual columns of a BigQuery table. However, I also need to support cases where someone wants to access any other attribute of the JSON for ad hoc analysis. To address this, can I use the following strategy?
Store each product's JSON in its own file in a Cloud Storage (GCS) bucket under a specific prefix (directory). Create an external table over all the files under that prefix, so that each file's content becomes one row of the external table describing a single product. When a product is updated, I update the BigQuery table and also replace that product's file with a new one, roughly as sketched below.
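In code, I imagine the per-product file replacement would look something like this (the bucket name and object layout are just my assumptions for illustration):

```python
import json

from google.cloud import storage


def replace_product_file(product: dict, bucket_name: str = "my-product-bucket") -> None:
    """Overwrite the per-product JSON object in GCS.

    One file per product, keyed by product_id, all under the same prefix
    so a single wildcard URI can cover them. Uploading to an existing
    object name replaces its contents.
    """
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(f"products/{product['product_id']}.json")
    # json.dumps without indentation keeps the document on a single line,
    # which should make each file a valid one-record NDJSON file.
    blob.upload_from_string(json.dumps(product), content_type="application/json")
```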
Is this a reasonable approach? If so, how can I create the external table over that directory?
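For reference, this is roughly what I have been experimenting with to define the external table, using the Python client. The project, dataset, table, and bucket names are placeholders, and I am not certain that `NEWLINE_DELIMITED_JSON` is the right source format when each file holds exactly one JSON object (one object per line per file seems to satisfy it, but I may be wrong):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Sketch of the external table definition I have in mind: one table backed
# by every per-product JSON file under the "products/" prefix.
table = bigquery.Table("my_project.my_dataset.products_json")
external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
external_config.source_uris = ["gs://my-product-bucket/products/*.json"]
# Let BigQuery infer the schema rather than declaring hundreds of columns.
external_config.autodetect = True
table.external_data_configuration = external_config
client.create_table(table)
```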