I need to manage a large dataset of approximately 100,000 records per year, over many years.

The data is retrieved with SOAP connectors using custom queries.

Each record has 'basic info' and 'historic info', which need to be extracted with different SOAP connectors.

In the future, records for new years will need to be retrieved from other databases/sources/APIs.

Retrieving 100,000 records of basic info over SOAP takes about 2 minutes. Including the historic info, it takes about 20-30 minutes (each record has an average of 10 'historic records' as children).

Only the 'present' period of data needs to be updated regularly. Old years in fact 'never' need to be updated, or only on sporadic explicit demand, e.g. when adding a new attribute.

So I tried the following design in order to prevent excessive update times:

  • I created a function getDataBySOAP(start, end, includeHistory). This does all the work (with different SOAP requests etc.).
  • Then 1 query for each year of data, let's say data22, data23, data24 etc., each just calling the SOAP function (so in the future, data for another period could be retrieved from another function/DB). All queries were set to never refresh automatically (all checks off), except for the 'current' query.
  • Then 1 query, say allData, appending all the yearly data together.
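A minimal Power Query (M) sketch of that layout, assuming the getDataBySOAP function and query names described above (the function body itself is not shown here):

```
// Query: data22 — one yearly query, with automatic refresh
// disabled in the query's load/refresh properties
getDataBySOAP(#date(2022, 1, 1), #date(2022, 12, 31), true)

// Query: allData — appends the yearly queries together
Table.Combine({data22, data23, data24, dataCurrent})
```

Note that allData merely *references* the yearly queries; it does not store their results.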

But now, when I hit 'refresh' on allData, the refresh re-evaluates everything, including history. So we're set up to have MANY coffees before the refresh is ready, because all the referenced queries are refreshed too.

What is going wrong? Should I design the queries differently? Right now the historic data sits in additional optional columns, but maybe I should separate it into separate queries. Even then, though, it seems like a refresh would trigger a FULL update, taking hours.

Note: I am rather expert in Excel/VBA, but this is my first Power Query/Power BI project, so really a novice.

Thanks for your tips. C.

1 Answer

That's just how Power Query works: if you have a master query that references other queries, they will all be re-evaluated on every refresh. In your case, why not build the master table as a calculated table in DAX instead? That way, the historic tables are stored and not refreshed (also mark them as hidden, since they won't be used except in the calculated table), the current table can be set to refresh, and the calculated table will append them all together. A drawback is the additional storage required (you will be storing the historic data twice), but this is offset by a much faster refresh time.
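As a minimal sketch of that calculated-table approach — the table names data22/data23/data24/dataCurrent are assumptions carried over from the question, and UNION requires all tables to have the same column structure:

```
-- Hypothetical DAX calculated table combining the hidden,
-- non-refreshing historic tables with the refreshing current table
allData =
UNION ( data22, data23, data24, dataCurrent )
```

On refresh, only dataCurrent is re-queried over SOAP; the calculated table is then recomputed from data already in the model, which is fast.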

Another option is incremental refresh, if you want to configure this properly: https://learn.microsoft.com/en-us/power-bi/connect-data/incremental-refresh-overview

6 Comments

Incremental refresh has the condition of only one source, but that's unlikely to be fulfilled in the (near) future. If I get it right, you are telling me I should create a 'static' aka 'calculated DAX' table where I store the old data of previous years. Static in the sense of: not using the formula getDataBySOAP in a Power Query, but 'copy-pasting' the old getDataBySOAP table into such a static table. Using the queries already defined in my initial question: copy-paste data22, data23, data24 into a static DAX 'oldData' table. And then the allData table should append dataRecent (i.e. 2025, the dynamic part) to oldData?
Yes, have a query (or multiple) which retrieves the historic data and set the query to "Do not refresh" in PQ. Then have another query for the most current data which does refresh. These queries all load into the Power BI data model, where you can create a calculated table that combines all the data. On the next refresh, only the query that retrieves current data will update, which updates its table in the data model and triggers the calculated table to update as well.
Probably a stupid question, but since I'm still in the debug/setup phase, I'm fooling around in my beloved/trusted Excel rather than Power BI. Is there an Excel equivalent to a Power BI 'calculated table'?
No - this would have to be done in Power BI
In order to try that, I initially imported the development Excel file into Power BI. Then it seemed to make more sense to copy-paste the custom Power Query SOAP scripts themselves directly into Power BI and run them from there rather than from Excel. Much to my surprise, the queries implemented directly in Power BI (Desktop) are MUCH slower than in Excel, by a factor of 50 or so. Rather unmanageable that way, since seconds became minutes (for small tables) and minutes became... CANCEL. What am I overlooking in Power BI?