Intersystems Cache Indexing Shocker - How to Fix Sluggish Queries
Photo: Markus Spiske / www.temporausch.com / CC-BY |
When going through a performance review of a grid on the website, I was struggling to see why the query was taking so long. It was a single table query with a single where clause which was on an indexed field. This indexed field could only contain 5 different values and the query was return fewer than 100 entries but was taking over 1 second.
On other lower powered servers the query was taking less than a third of a second and often returning more data albeit on a smaller total data set.
Checking the query plan revealed something interesting, the query was not using the index but was instead performing a whole table scan. Further testing revealed that only "=" queries use indexes. "!=" ignore indexes as do any queries with "OR" or "IN".
After further investigation I found that if you only return the ID then NOT, OR , IN queries do use indexes, however, if you return any other field including the field you are performing your where clause on you activate a full table scan
Uses index
SELECT ID FROM TABLE WHERE STATUS != 'Complete'SELECT count(*) FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS = 'Complete'
Full table scan
SELECT STATUS FROM TABLE WHERE STATUS != 'Complete'SELECT * FROM TABLE WHERE STATUS != 'Complete'
SELECT * FROM TABLE WHERE STATUS = 'Active' OR STATUS = 'Error' OR STATUS = 'Cancelled'
As far as I am concerned this is unbelievable behaviour, which is still in the product as of the latest 2014 version of Intersystems Cache database!
At least it had revealed a 3 different ways of substantially improving query performance
1. Union Query
If you know all of the data that can be contained in the field then you can perform a UNION querySELECT * FROM TABLE WHERE STATUS = 'Active'
UNION
SELECT * FROM TABLE WHERE STATUS = 'Error'
UNION
SELECT * FROM TABLE WHERE STATUS = 'Cancelled'
This of course means more maintenance when adding a new status.
2. ID Only Select
If you do not know the data then you can just return the ID and loop through the returned IDs to grab the data.e.g.
SELECT ID FROM TABLE WHERE STATUS != 'Complete'
Loop...
SELECT * FROM TABLE WHERE ID=...
3. Create New Property
e.g.SELECT * FROM TABLE WHERE Active = '1'
This requires a new property which is probably easiest to create in the OnUpdate method, again maintenance of this would be an issue.
I have not seen any documentation talking about these issues with queries. It does strike me this should be an easy thing for Intersystems to fix at the database level, but if they haven't considered it by now then I won't be holding my breath for it.
Comments
First of all, please contact wrc.intersystems.com whenever the behavior of Caché surprises you. You definitely will be surprised by the helpfulness of our support advisors and we'll all do our best for you to have the predictably best behavior from your system.
Secondly, Caché is not unique in having behavior that, at first glance, is surprising with (for example) indices and NOT EQUAL where clauses. (Try googling "ORACLE NOT EQUAL INDEX")
Thirdly (and I feel sort of lame saying this) running Tune Table http://docs.intersystems.com/cache20141/csp/docbook/DocBook.UI.Page.cls?KEY=GSQL_optimizing#GSQL_optimizing_tunetable will improve SOME of the behaviors you're seeing. [I feel lame because we pride ourselves on minimizing the tuning needed for Caché systems and yet we don't (re-)initialize some of the tuning parameters automatically. Shame on us but stay tuned (ha, ha !!)]
I'd prefer not to get into a lengthy explanation of how you can improve the plan used (and why) for the different cases you cite. There are relatively easy steps to take for most and easy-to-understand explanations. Give us a call.
RN
Running tune table sets the selectivity to 20% as there are 5 possible values for the state and it ignores the actual spread of data. Manually changing the tuning value of the table does not change how the query works it still always avoids the index.
Setting the index to contain all of the data returned from the query also failed to use the index on a NOT query.
I found that technically the fastest in every test case I made using SQL was to change a NOT query to an ID only OR query and then iterate through the results selecting by ID the required data.
1,000,000 record table with the state evenly spread between 5 states gave me the following performance
SELECT ID, Subject, State FROM Test WHERE State != 'Complete' - 3.356 seconds
SELECT ID FROM Test WHERE State='Active' OR State='Paused' OR State='Pending' OR State='Waiting' , then iterate by ID with SELECT ID, Subject, State FROM Test WHERE ID='id' - 2.221 seconds
SELECT ID FROM Test WHERE State='Active' OR State='Paused' OR State='Pending' OR State='Waiting' , then iterate through global storage using the returned ID - 1.635 seconds
Iterate through Global storage - 1.100 seconds
Iterate through Index Storage and obtain data from Global Storage - 0.794 seconds
Iterate through Index storage which contains data 0.612 seconds
InterSystems Technologies