I'm writing a .NET application that engineers will use to graph and report on our scrap and rework database. On launch, the application will show pre-canned graphs and reports in a dashboard-style view. Users will then be able to create their own graphs and reports (multiple graphs/reports can be open at the same time). A network connection and login are required for the application to run.

My question concerns how the application gathers and uses the data. The scrap and rework table in question currently has roughly 100,000 rows and is growing at about 16,000 rows per month.

I'm looking for a best-practice or experience-based answer; however, here are some of our ideas:

  1. Query the entire table on application launch in a "mecha-query", immediately converting the rows to objects for the rest of the program to work with. If the table grows too large in the future, add a setting for a partial or full load. (My favorite, but it seems like terrible practice.)

  2. Write a local copy of the table to the user's computer on application launch using something like SQLite. Data is queried from the on-disk SQLite DB as needed, and the local DB is cleaned up on application close, or on the next application start if a leftover copy is detected.

  3. Use an in-memory SQLite DB from which data is queried as needed.

  4. Query SQL Server as needed.

For options 1 and 3, I'm worried about the application's memory footprint 5-6 years from now. Given the dashboard functionality described above, the advantages of options 2 and 4 seem negated because the application is basically going to need all the data on start-up anyway. I'm also thinking about the application's extensibility; maybe it will be ported to a web app someday.

Thanks!

  • You're describing caching the database, and that's a really bad idea. Commented May 9, 2014 at 15:05
  • Sounds like premature optimization to me. If your app needs all the data at start-up, then that does not sound like a scalable design to me. Commented May 9, 2014 at 15:26
  • I agree with @Blam that #4 should be your default choice. But really you want to do some performance testing to see if it does/doesn't work. If it doesn't perform well, look at other options. One option that's missing is to have an additional read-only store. This could be a replicated version of your current database, a data warehouse, or some other sort of denormalized data store. Of your 4 choices, though, only #4 sounds like a remotely viable option. Commented May 9, 2014 at 17:10
  • As others have said, I see no reason why just querying the database wouldn't work, especially if the table isn't being updated. Any modern RDBMS will do its own data caching, so with that plus proper indexing I can't imagine performance problems for many, many years. Commented May 9, 2014 at 17:53
  • Silly question perhaps: given your simple use case, have you looked at simply using Excel tied to a SQL data source? Don't reinvent the wheel if you don't have to. :) Commented May 12, 2014 at 22:30

1 Answer

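Yes, I recommend option 4.

And rethink why you need to build all the objects at start.
Even if you do need to build them all at start, put them in a Dictionary and let SQL do what SQL does.
With .NET you have size limits for a collection (in the .NET Framework, a single object such as a collection's backing array is capped at 2 GB by default).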

I see no reason to worry about the load on SQL, but you could also use LINQ against the Dictionary.
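
As a rough illustration of "let SQL do what SQL does", here is a minimal sketch of option 4 in which the database filters and aggregates and only the summary rows reach the application. Everything in it is an assumption made for illustration: Python's built-in `sqlite3` module stands in for .NET talking to SQL Server, and the `scrap` table, its columns (`station`, `occurred_on`, `qty`), the index name, and the `scrap_by_station` helper are hypothetical.

```python
import sqlite3

# Stand-in for the real SQL Server table; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scrap (station TEXT, occurred_on TEXT, qty INTEGER)")
# An index on the date column lets the database narrow a date-range filter.
conn.execute("CREATE INDEX ix_scrap_occurred_on ON scrap (occurred_on)")
conn.executemany(
    "INSERT INTO scrap VALUES (?, ?, ?)",
    [
        ("A", "2014-03-02", 5),
        ("A", "2014-04-10", 3),
        ("B", "2014-04-11", 7),
        ("B", "2014-05-01", 2),
    ],
)


def scrap_by_station(conn, start, end):
    """Return (station, total_qty) rows for start <= occurred_on < end."""
    sql = """
        SELECT station, SUM(qty)
        FROM scrap
        WHERE occurred_on >= ? AND occurred_on < ?
        GROUP BY station
        ORDER BY station
    """
    # Parameters are bound by the driver rather than spliced into the SQL text.
    return conn.execute(sql, (start, end)).fetchall()


# One graph's worth of data: April totals per station.
april = scrap_by_station(conn, "2014-04-01", "2014-05-01")
print(april)  # [('A', 3), ('B', 7)]
```

Each graph or report would issue its own small query like this, so the application's memory use tracks what is on screen rather than the size of the table.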

100,000 rows is not even close to big.
100 million rows and you are starting to get big.
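
To put those numbers next to the question's own figures, the sketch below projects the table size 5 and 6 years out. It assumes growth stays constant at the stated 16,000 rows per month, which the question does not guarantee; the function and constant names are illustrative only.

```python
CURRENT_ROWS = 100_000     # from the question
ROWS_PER_MONTH = 16_000    # from the question; assumed constant here


def projected_rows(months):
    """Row count after the given number of months of linear growth."""
    return CURRENT_ROWS + ROWS_PER_MONTH * months


for years in (5, 6):
    print(years, projected_rows(years * 12))
# 5 1060000
# 6 1252000
```

Under that assumption the table reaches about 1.25 million rows after 6 years, a little over 1% of the 100 million mark.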

8 Comments

Thanks for the elaboration. I'm curious about the drawbacks of options 1-3 if the data is read-only and denormalized. Scaling seems to be an obvious problem, but it also seems easily fixable with a simple setting: don't load data older than 3 months, for instance. The reason I want to build objects at start is so that it happens while the user is logging in. Then the data is also there to populate option lists and such; for instance, you don't know which stations you can choose to get data from if you don't know which stations have data.
Then do 1-3. Not a single person has said it is a good idea, but you seem to be intent on it. Fix scale with a simple setting.
I'm just trying to learn; I'm looking for the reasoning or train of thought. I realize 4 will work. I'm not worried about it. What I am worried about is why 1-3 are bad besides scalability. If you can tell me that, or tell me that scalability is the only reason they're bad, I'll accept this answer.
SO is for specific programming questions, not for learning. You have the answer. Not only are 1-3 flawed, but your approach suffers from premature optimization. If scale is not enough reason for you to dismiss them, then I can't help you.
Populate the option lists with queries to the DB :). Caching those (and refreshing them on startup or on demand) would provide immediate options for the user to pick from.
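
A minimal sketch of that suggestion, under the same assumptions as any illustration here: Python's `sqlite3` stands in for the real SQL Server connection, and the `scrap` table, its `station` column, and the `OptionListCache` class are hypothetical names.

```python
import sqlite3


class OptionListCache:
    """Caches distinct station names so pick-lists fill without a round trip."""

    def __init__(self, conn):
        self._conn = conn
        self._stations = None  # not loaded yet

    def refresh(self):
        """Re-query the database; call on startup or on demand."""
        rows = self._conn.execute(
            "SELECT DISTINCT station FROM scrap ORDER BY station"
        ).fetchall()
        self._stations = [row[0] for row in rows]

    def stations(self):
        """Return the cached list, loading it on first use."""
        if self._stations is None:
            self.refresh()
        return list(self._stations)


# Hypothetical stand-in data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scrap (station TEXT, occurred_on TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO scrap VALUES (?, ?, ?)",
    [("B", "2014-04-11", 7), ("A", "2014-04-10", 3), ("A", "2014-05-01", 1)],
)

cache = OptionListCache(conn)
print(cache.stations())  # ['A', 'B']

# A new station appears in the database; the cache is stale until refreshed.
conn.execute("INSERT INTO scrap VALUES ('C', '2014-05-02', 4)")
print(cache.stations())  # ['A', 'B']
cache.refresh()
print(cache.stations())  # ['A', 'B', 'C']
```

Only the small list of distinct values is held in memory, so the option lists are available immediately without loading the underlying rows.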