What would be your first choice when making a case insensitive comparison, to use LOWER or UPPER? Do you think that there is a performance advantage in using one over the other? Some would say they are implemented and should work the same, some would choose LOWER thinking that in most cases the data distribution would favor the lower case characters and others could point that UPPER generates smaller ASCII values than LOWER and that would be an advantage for UPPER CPU wise. As the Oracle implementation of these functions is not public and as the members of various Oracle forums didn’t come to a conclusion in this matter let’s build a test case and see for ourselves.
I created a table with four varchar2 columns of 30 characters populated first with random uppercase and lowercase characters using dbms_random.string(‘A’,30), second with first character in uppercase and rest of characters in lowercase using dbms_random.string(‘U’,1) || dbms_random.string(‘L’,29), third with lowercase characters using dbms_random.string(‘L’,30) and fourth with
uppercase characters using dbms_random.string(‘U’,30). I ran the test first against 10 mil records and second in a 1 mil loop against 10 records. The characters were UTF8 (AL32UTF8) with NLS_SORT and NLS_COMP set to BINARY.
The two queries used to test LOWER and UPPER were:
select count(*) from c where lower(c1) like %abc%;
select count(*) from c where upper(c1) like %ABC%;
The results were interesting and consistent during the 10 tries for each test. CPU wise they were identical and time wise, UPPER was quicker from 1 to 3%:
STRING LOWER_RESULT UPPER_RESULT IMPROVE
------------------ ------------ ------------ -------
Random case string 6,83 sec 6,76 sec 1,11%
Title case string 6,78 sec 6,71 sec 0,98%
Lower case string 6,77 sec 6,66 sec 1,61%
Upper case string 6,78 sec 6,56 sec 3,48%
Conclusion: next time when I need to make a case insensitive comparison I will use UPPER.