
I want to select rows from, say, the Nth row to the Mth row of a table. I don't want to use any ORDER BY because the table is huge: 38 million rows. I found a solution which says to use the following query

SELECT *
  FROM (SELECT suppliers2.*, ROWNUM rnum          -- number the ordered rows
          FROM (SELECT *
                  FROM suppliers
                 ORDER BY supplier_name) suppliers2
         WHERE ROWNUM <= 5)                        -- upper bound (Mth row)
 WHERE rnum >= 3;                                  -- lower bound (Nth row)

But since it has two SELECT statements and my table is very big (38 million rows), I wanted to know whether there is another way that is less taxing on the DB. I could also use MINUS, but again I see a performance problem. I basically want to select the first one million rows and put them into a file, then select the second million rows and put them into a file, and so on. Please help.

1 Answer

It's not clear to me why you need to page through the results in the first place. You apparently want to grab an arbitrary 1 million rows, put that data in one file, grab another arbitrary 1 million rows (ensuring that you don't grab the same row twice), put that in a second file, and repeat the process until you've generated 38 separate files. What benefit do you derive from issuing 38 separate SELECT statements rather than issuing a single SELECT statement and letting the caller simply write the first million rows that it fetches to one file and then write the second million rows that it fetches to a second file?
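A minimal sketch of that single-cursor approach, assuming a Python client using the cx_Oracle driver (the driver choice, connection details, table name, and file naming here are illustrative assumptions, not something from the question):

import csv
import cx_Oracle  # assumption: cx_Oracle is the client driver in use

CHUNK = 1_000_000  # rows per output file

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")  # placeholder credentials
cur = conn.cursor()
cur.arraysize = 10_000                    # fetch in large batches per network round trip
cur.execute("SELECT * FROM suppliers")    # one statement, no ORDER BY needed for a plain export

file_no = 1
while True:
    rows = cur.fetchmany(CHUNK)           # next million rows from the same open cursor
    if not rows:
        break
    with open(f"suppliers_{file_no:02d}.csv", "w", newline="") as out:
        csv.writer(out).writerows(rows)   # write this chunk as one delimited file
    file_no += 1

cur.close()
conn.close()

Note that fetchmany(CHUNK) buffers a full million rows in client memory before writing; with wide rows you may prefer to loop in smaller batches while still rolling over to a new file every million rows.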

Are you trying to generate the files in parallel from 38 separate worker processes? If so, it seems unlikely that you'll get much benefit from parallelizing the writes at the expense of substantially increasing the amount of work the database has to do. I guess I could envision a system where writes are slow on the client but easy to parallelize, reads on the server are very fast, and there is a ton of memory available for sorting on the database server; in that case it might be quicker to write the files in parallel. But there aren't many systems with those characteristics. If you do want to use parallelism, you'd generally be better served letting the client issue a single SELECT to the database and allowing the database to run that SELECT statement in parallel.

If you are determined to select the results in pages, the query you posted should be the most efficient. The fact that there are nested select statements isn't particularly relevant to the analysis of performance. The query will only hit the table once. It still may be very expensive if it needs to fetch and sort all 38 million rows in order to determine which is the 3rd row and which is the 5th row. And it will likely get steadily slower when you look for subsequent pages of data. Fetching rows 37,000,001 - 38,000,000 will require, at a minimum, reading the entire table. That's one reason that it's unlikely to be all that helpful to write the files in parallel -- pulling the first few pages of data is likely to be so much more efficient than pulling the last page that you're going to be limited by that query and the time required to pull 38 million rows over the network.
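For a concrete sense of what paging through the whole table would involve, here is a sketch of the question's query generalized to an arbitrary page with bind variables, again assuming a Python client with cx_Oracle (connection details and names are placeholders):

import cx_Oracle  # assumption: same driver as in the sketch above

PAGE_SIZE = 1_000_000

# The question's ROWNUM pattern with the bounds turned into bind variables.
PAGE_SQL = """
SELECT *
  FROM (SELECT s.*, ROWNUM rnum
          FROM (SELECT * FROM suppliers ORDER BY supplier_name) s
         WHERE ROWNUM <= :max_row)
 WHERE rnum >= :min_row
"""

def fetch_page(cur, page_no):
    # page_no is 1-based; page 38 asks for rows 37,000,001 to 38,000,000,
    # which still forces Oracle to read and order the whole table first
    min_row = (page_no - 1) * PAGE_SIZE + 1
    max_row = page_no * PAGE_SIZE
    cur.execute(PAGE_SQL, max_row=max_row, min_row=min_row)
    return cur.fetchall()  # memory permitting; each page is a million rows

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")  # placeholder
last_page = fetch_page(conn.cursor(), 38)  # the most expensive page

Each later page repeats the full scan and sort, which is the cost the single-cursor approach avoids.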


Comments

I am not planning on doing a parallel run, but I need a way whereby I can offload one million rows each to 38 files, in a way that doesn't tax the system. Can you please suggest some way I can do that? Since this is only a one-time job I don't want to invest in tools. If not the SELECT statement, is there any other way? Maybe there is some command available which can help me?
@AmitRaya - Run a single SELECT * FROM your_table from your application. Open file #1. Fetch 1 million rows, write them to file #1. Close file #1. Open file #2. Fetch 1 million rows from the same cursor, write them to file #2. Repeat 38 times.
I don't want to issue one SELECT statement because I feel that will be too taxing for the system, though I understand I then land in the same problem of paging through the table, which I hadn't thought about :) .. The only idea is that I am looking for a way to offload the table to a delimited file without taxing the system. That is the end result I am looking at, please help.
@AmitRaya - A single SELECT will be far less taxing than 38 separate SELECT statements particularly since the single statement doesn't need an ORDER BY. You can't help reading 38 million rows from the database if you want to write 38 million rows of data to files. Reading 38 million rows isn't trivial but it's not a whole lot on modern hardware either.
If I use spool as given below, is it still as taxing for the system as a SELECT statement? I am sorry for my ignorance, but I am very new to PL/SQL. spool c:\myfile.txt select field1||', '||field2||', '||field3 from my_table; spool off -- turn spooling off set head on -- turn the heading parameter back on