How to count records faster querying linked DB2 server from SQL SERVER 2012

Question

My setup: on my machine I have an instance of SQL Server 2012 with 3 linked DB2 servers. Most operations take a long time, so I want to optimize each query as much as possible. Until now I've been using COUNT(*), which I know is a relatively slow method, but now it's taking 19 seconds to return a result, which is not acceptable.

I've read articles on this topic and I see that the main concern is the accuracy of the result. Since I'm using this data mostly for pagination, and it's not that important to get the absolutely exact number, I would be pretty happy with something faster and not 100% accurate.

I tried this query :

select *
 from openquery(MyLinkedServer,
 '
  select sum (spart.rows)
  from sys.partitions spart
  where spart.object_id = object_id(''MyTable'')
  and spart.index_id < 2

 '
)

but I got this error:

[IBM][iSeries Access ODBC Driver][DB2 UDB]SQL0204 - PARTITIONS in SYS type *FILE not found.

I have very basic knowledge of SQL, so I'm not really sure what the reason for this error is. I suspect it may be due to compatibility with the DB2 servers, but in the end I have two questions: if my query is wrong, how do I fix it so that it works? If it's not the query itself but the linked servers, any idea how I can perform a faster record count than using COUNT(*)?

P.S.

Additional information: the COUNT(*) query that I'm using is as follows:

select *
from openquery(MyLinkedServer,
'
select 
  COUNT(*)

from Person P

left outer join Information INF 
  ON P.ID = INF.PersonID 

where 

TRIM(P.FirstName)||'' ''||TRIM(P.MiddleName)||'' ''||TRIM(P.LastName) like ''%Peter%'' 

'
)

which yesterday took 19 seconds to execute and today I get:

**SQL0666 - SQL query exceeds specified time limit or storage limit.**

And I have no idea what's going on. I unchecked Allow Query Timeout, but with no effect.

Why are you bothering to join to Information? You don't need to.
– Charles, Commented Sep 19, 2014 at 12:38
@Charles this is just the full (and exact) query I use for COUNT(*). Even though the time differences are not that significant, the one where I look for matches in the name (as in the example) is the slowest. I also use columns from INF in the where clause, and I need to count the result in that case too. So the full query needs this join, even though in this example I only use columns from the first table.
– Leron, Commented Sep 19, 2014 at 12:47

Charles · Accepted Answer · 2014-09-19 13:04:35Z

1

SELECT COUNT(*) FROM mytable is a very quick operation on IBM i. I've got a table with almost 17 million rows and the above comes back instantly. And I have one of the smaller and older boxes available.

It's only when you add the WHERE clause that things may slow down. As with any other operation on any DB, indexes play an important role. Actually, IBM i has an advantage over most (every?) other DBs out there when it comes to counting rows. IBM i supports the standard binary radix index, comparable to the B-tree indexes of other DBs, but it also supports another index type called an encoded vector index (EVI). I won't go into all the details, but for the purposes of counting rows an EVI is quite useful, as the row count for each key is part of the index itself. As you can imagine, this makes counting rows with a matching key pretty much instant.
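As a rough sketch of the EVI idea, the statements below create an encoded vector index and then count rows for a single key value. The names MYLIB, PERSON, LASTNAME and PERSON_LASTNAME_EVI are placeholders rather than anything from the question, and whether the optimizer actually answers the count from the index should be checked with Run & Explain.

```sql
-- Hypothetical names: MYLIB, PERSON, LASTNAME and the index name are placeholders.
-- An encoded vector index keeps a per-key row count as part of the index.
CREATE ENCODED VECTOR INDEX MYLIB.PERSON_LASTNAME_EVI
    ON MYLIB.PERSON (LASTNAME);

-- An equality predicate on the indexed key gives the optimizer the chance
-- to answer the count from the index instead of reading the table.
SELECT COUNT(*)
FROM MYLIB.PERSON
WHERE LASTNAME = 'Peterson';
```

Note that this helps with exact key matches; a predicate such as LIKE '%Peter%' cannot be resolved by a simple key lookup.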

Going back to your code
[IBM][iSeries Access ODBC Driver][DB2 UDB]SQL0204 - PARTITIONS in SYS type *FILE not found.

The error is pretty clear. There's no SYS.PARTITIONS table in IBM i. SYS.PARTITIONS is not an ANSI/ISO standard catalog nor a JDBC/ODBC standard one; it's MS SQL Server specific.

Available DB2 for IBM i catalog views for the latest release (7.2) are shown here:
http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/db2/rbafzcatalog.htm

For v5r4 (5.4), look here:
http://www-01.ibm.com/support/knowledgecenter/ssw_i5_54/db2/rbafzmstcatalog.htm

It appears that SYSPARTITIONSTAT in QSYS2 is the closest IBM i has to MS SQL's SYS.PARTITIONS. However, there's no OBJECT_ID or INDEX_ID column.

If all you're after is a row count, I'd simply use...

SELECT COUNT(*)
FROM MYLIB.MYTABLE

It should return pretty much instantly. You'll have some overhead from going through the linked server. To see how much, run the query directly with an IBM tool such as the (green screen) STRSQL or (Java) iNav's Run SQL Scripts. SQuirreL could also be used.

If you really want to query the meta data, then:

SELECT NUMBER_ROWS
FROM QSYS2.SYSPARTITIONSTAT
WHERE TABLE_NAME = 'MYTABLE'
  AND TABLE_SCHEMA = 'MYLIB'

*Note: there are also SYSTEM_TABLE_NAME and SYSTEM_TABLE_SCHEMA columns containing the 10-character system table and schema names.

But I'd be surprised if it performed any faster. On my system, over that 17M row table, SELECT COUNT(*) took 39ms and querying SYSPARTITIONSTAT took 135ms.
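For comparison from the SQL Server side, both statements can be wrapped in OPENQUERY the same way the question does. This is only a sketch: MyLinkedServer, MYLIB and MYTABLE are placeholders, the column alias TOTAL_ROWS is made up for illustration, and whether SYSPARTITIONSTAT is available depends on the IBM i release.

```sql
-- Runs on SQL Server; the quoted text is passed through to DB2 for i unchanged.
-- MyLinkedServer, MYLIB and MYTABLE are placeholders.

-- Plain row count of the whole table.
SELECT *
FROM OPENQUERY(MyLinkedServer, '
    SELECT COUNT(*) AS TOTAL_ROWS
    FROM MYLIB.MYTABLE
');

-- Row count taken from the catalog view instead.
-- Single quotes inside the pass-through string are doubled.
SELECT *
FROM OPENQUERY(MyLinkedServer, '
    SELECT NUMBER_ROWS
    FROM QSYS2.SYSPARTITIONSTAT
    WHERE TABLE_NAME = ''MYTABLE''
      AND TABLE_SCHEMA = ''MYLIB''
');
```

If the table has more than one partition or member, the catalog view returns one row for each, so the values would need to be summed.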

Looking at your added COUNT(*) code...
The LIKE ''%Peter%'' is going to hurt performance as an index lookup can't be used. At best, you're looking at a full index scan. Make sure you have an index the system can use. I would try 3 separate scenarios and see which index the system uses. (Use the Run & Explain in iSeries Navigator's Run SQL Scripts)

separate (binary radix) index over each column
separate EVI index over each column
1 combined (first, middle, last) binary radix index over all three columns
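The three scenarios above could be set up roughly as follows, giving seven indexes in total. The library name MYLIB and all index names are placeholders; the column names are taken from the query in the question. Plain CREATE INDEX produces an ordinary index, while CREATE ENCODED VECTOR INDEX produces an EVI.

```sql
-- Hypothetical names: MYLIB and every index name are placeholders.

-- Scenario 1: a separate ordinary index over each column.
CREATE INDEX MYLIB.PERSON_FN_IX ON MYLIB.PERSON (FIRSTNAME);
CREATE INDEX MYLIB.PERSON_MN_IX ON MYLIB.PERSON (MIDDLENAME);
CREATE INDEX MYLIB.PERSON_LN_IX ON MYLIB.PERSON (LASTNAME);

-- Scenario 2: a separate encoded vector index over each column.
CREATE ENCODED VECTOR INDEX MYLIB.PERSON_FN_EVI ON MYLIB.PERSON (FIRSTNAME);
CREATE ENCODED VECTOR INDEX MYLIB.PERSON_MN_EVI ON MYLIB.PERSON (MIDDLENAME);
CREATE ENCODED VECTOR INDEX MYLIB.PERSON_LN_EVI ON MYLIB.PERSON (LASTNAME);

-- Scenario 3: one combined ordinary index over all three columns.
CREATE INDEX MYLIB.PERSON_NAME_IX
    ON MYLIB.PERSON (FIRSTNAME, MIDDLENAME, LASTNAME);
```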

Also have you tried the format

WHERE P.FirstName  like ''%Peter%''
   OR P.MiddleName like ''%Peter%''
   OR P.LastName   like ''%Peter%''

That should allow the DB to do less work, especially if most of the matches come from P.FirstName, plus it doesn't have to use any temporary storage to concatenate the data. Note: I also got rid of the TRIM(); it's not needed in either case, and I suspect it's costing you some. It's possible it helps, but in that case RTRIM() would be better. The best solution would be to have variable-length columns in the first place.
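Put together with the rest of the query from the question, the OR format would look something like this. It is a sketch that reuses the table, column and linked server names from the question as given.

```sql
-- Runs on SQL Server; the quoted text is passed through to DB2 for i unchanged.
SELECT *
FROM OPENQUERY(MyLinkedServer, '
    SELECT COUNT(*)
    FROM Person P
    LEFT OUTER JOIN Information INF
        ON P.ID = INF.PersonID
    WHERE P.FirstName  LIKE ''%Peter%''
       OR P.MiddleName LIKE ''%Peter%''
       OR P.LastName   LIKE ''%Peter%''
');
```

One behavioral difference to be aware of: if any of the three name columns can be NULL, the concatenated version yields NULL for that row and never matches, whereas the OR version can still match on the other columns, so the two counts may differ.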

It's possible that your original format (with or without the TRIM()) might be more conducive to the third index option above (1 combined index), whereas the OR format would be more conducive to the first two index options.

Create all 7 indexes mentioned above, then try the various scenarios using Run & Explain to see what's happening with each one.

In between tries:
- disconnect and reconnect from the system
- use the SETOBJACC OBJ(PERSON) OBJTYPE(*FILE) POOL(*PURGE) command

That should keep caching to a minimum and the comparisons equal.

edited Sep 19, 2014 at 13:04

answered Sep 18, 2014 at 13:48

Charles


8 Comments

Leron Over a year ago

@Charles Glad that you've paid attention to another of my IBM i related questions. In all the articles that I went through for MS SQL, the COUNT(*) option was marked as slow but accurate. For the sake of simplicity, and also to narrow down the data that I'm going to show to my users, I now use only one join:

left outer join aaa.RTrr5 EGN            ON P.ID = PIN.ID              AND PIN.fgdf5 = ''jjj''  AND (PIN.fgdf5 = ''zzz'' OR PIN.fgdf5 = ''www'')

and a where clause WHERE PIN.SomeVal LIKE ''%55%'', and that takes 19 seconds to return a result. Each table has at most 1 million records.

Leron Over a year ago

Also, the names are changed, but they are still very weird. I can't believe that those are the names that are actually used on the linked servers. Does IBM or SQL Server have some feature that makes the real table names look weird?

Leron Over a year ago

Sorry to be so annoying. I know what my table name is, but how can I find out what the lib name is? I read in some post that IBM DB2 uses this instead, but it's not obvious from the linked servers which one is actually the lib, or maybe more properly, I don't know where to look for it.

Charles Over a year ago

Edit your question and show the count(*) statement you're trying to use.

Leron Over a year ago

Please take a look at the P.S. part. It's getting pretty confusing for me to deal with this problem.


user1919238 · Accepted Answer · 2014-09-18 07:46:06Z

1

The SQL0204N error refers to an undefined name. It's hard to troubleshoot the exact error without more information about your data, but there is some documentation here (find the equivalent documentation for your version of DB2, as there are differences).

As for count(*), if you need to count rows, this is as efficient as anything you can use for that task. You won't gain anything by avoiding it.

If your query taking 19 seconds is similar to the one shown above, it is likely slow because you are calling the function object_id() for every row. You may be able to optimize this by specifying that the function is deterministic (returns the same output for the same input) when creating it. This would, in theory, prevent DB2 from calling the function every time.

Or you could rewrite the query to remove the function call from the where clause.

answered Sep 18, 2014 at 7:46

user1919238

1 Comment

Leron Over a year ago

Thanks for your answer. My version is V5R4, which as I understand is pretty old, and at the time it came out some SQL Server functions were still not implemented, which made me wonder whether I can actually use this approach for counting. As for the query, it's in fact not changed at all, just MyLinkedServer and MyTable in place of the real names. So if you think this can be optimized, please provide an example.
