Sql – Is SQL IN bad for performance

sqlsql-server-2000

I have a query doing something like:

SELECT FieldX, FieldY FROM A
WHERE FieldW IN (108, 109, 113, 138, 146, 160,
307, 314, 370, 371, 441, 454 ,457, 458, 479, 480,
485, 488, 490, 492, 519, 523, 525, 534, 539, 543,
546, 547, 550, 564, 573, 629, 642, 643, 649, 650,
651, 694, 698, 699, 761, 762, 768, 772, 773, 774,
775, 778, 784, 843, 844, 848, 851, 852, 853, 854,
855, 856, 857, 858, 859, 860, 861, 862, 863, 864,
865, 868, 869, 871, 872, 873, 891) 

Having an IN clause with so many options, is it bad for query performance? I'm experiencing many timeouts in my application, and I believe it could be a source of this kind of problem. Can I optimize the query without removing the numbers, using any good SQL Hint?

EDIT:

@KM these are keys in a different table. This is a forum application, explaining briefly: c# gets all forums from database and stores it in app cache. Before C# calls a procedure that gets the threads for these forums and for this user, c# does some logic filtering the "all forums" collection, considering permissions and some business logic. The timeout happens on database and not on application itself. Doing all this logic on the query will require a lot of inner joins and I'm not 100% sure I can do all this inside the procedure.

I'm using SQL Server 2000

Best Solution

There are several considerations when writing a query using the IN operator that can have an affect on performance.

First, IN clauses are generally internally rewritten by most databases to use the OR logical connective. So col IN ('a','b','c') is rewritten to: (COL = 'a') OR (COL = 'b') or (COL = 'c'). The execution plan for both queries will likely be equivalent assuming that you have an index on col.

Second, when using either IN or OR with a variable number of arguments, you are causing the database to have to re-parse the query and rebuild an execution plan each time the arguments change. Building the execution plan for a query can be an expensive step. Most databases cache the execution plans for the queries they run using the EXACT query text as a key. If you execute a similar query but with different argument values in the predicate - you will most likely cause the database to spend a significant amount of time parsing and building execution plans. This is why bind variables are strongly recommended as a way to ensure optimal query performance.

Third, many database have a limit on the complexity of queries they can execute - one of those limits is the number of logical connectives that can be included in the predicate. In your case, a few dozen values are unlikely to reach the built-in limit of the database, but if you expect to pass hundreds or thousands of value to an IN clause - it can definitely happen. In which case the database will simply cancel the query request.

Fourth, queries that include IN and OR in the predicate cannot always be optimally rewritten in a parallel environment. There are various cases where parallel server optimization do not get applied - MSDN has a decent introduction to optimizing queries for parallelism. Generally though, queries that use the UNION ALL operator are trivially parrallelizable in most databases - and are preferred to logical connectives (like OR and IN) when possible.