C# – EXTREMELY Poor LINQ Query Performance When Using Skip/Take for Paging

c++linqlinq-to-entitiespagingquery-performance

I need to query records from a DB2 database using LINQ. I have entities that have been generated from the DB schema and am attempting to perform a LINQ query using Skip and Take. The underlying table has like 25 columns and maybe a million records. When I execute the query without the "Skip()" it takes approximately .508 milliseconds to complete. When I include Skip() it takes close to 30 seconds. Big difference.

Can anyone tell me why this is happening?

UPDATE: Here is the LINQ query I am using.

var x = 30;

var results = context.ASSET_T
.OrderBy(c => c.ASSET_ID)
.Skip(x)
.Take(x)
.ToList();

UPDATE: So I just tried updating the query so that I only return a single column, ASSET_ID. When I only return that one column the query WITH the Skip() only takes .256 milliseconds.

var x = 30;

var results = context.ASSET_T
.OrderBy(c => c.ASSET_ID)
.Skip(x)
.Take(x)
.Select(c => c.ASSET_ID)
.ToList();

If I include any additional columns then the query execution time increases DRAMATICALLY.

The query below for example takes 10 seconds to execute.

var x = 30;

var results = context.ASSET_T
.OrderBy(c => c.ASSET_ID)
.Skip(x)
.Take(x)
.Select(c => new {
                 ASSET_ID = c.ASSET_ID,
                 ASSET_TYP = c.ASSET_TYP
                 ASSET_DESC = c.ASSET_DESC
                 })
.ToList();

UPDATE: I have now discovered that there are issues (perhaps index-related) with the columns in the table I am trying to query. As I mentioned above, when I execute a query that only returns the ASSET_ID column it only takes .256 milliseconds. If I try to execute a query that ONLY returns ASSET_DESC or a query that ONLY returns ASSET_TYP then the query execution time jumps to around 9 seconds.

Would this indicate that those other columns are not currently being indexed?

UPDATE: I have added the SQL output from the above LINQ query.

SELECT 
Project1.C1 AS C1, 
Project1.ASSET_ID AS ASSET_ID, 
Project1.ASSET_TYP AS ASSET_TYP, 
Project1.ASSET_DESC AS ASSET_DESC
FROM ( SELECT Project1.ASSET_ID AS ASSET_ID, Project1.ASSET_TYP AS ASSET_TYP, Project1.ASSET_DESC AS ASSET_DESC, Project1.C1 AS C1, row_number() OVER (ORDER BY Project1.ASSET_ID ASC, Project1.ASSET_TYP ASC, Project1.ASSET_DESC ASC) AS row_number
  FROM ( SELECT 
    Extent1.ASSET_ID AS ASSET_ID, 
    Extent1.ASSET_TYP AS ASSET_TYP, 
    Extent1.ASSET_DESC AS ASSET_DESC, 
    CAST(1 AS int) AS C1
    FROM MYDB.ASSET_T AS Extent1
  )  AS Project1
)  AS Project1
WHERE Project1.row_number > 1
ORDER BY Project1.ASSET_ID ASC, Project1.ASSET_TYP ASC, Project1.ASSET_DESC ASC FETCH  FIRST 31 ROWS ONLY 

Best Solution

Have you looked at the SQL that get generated for this query?

As far as I know Skip() Take() eventually results in a generated statement that uses a function called Row_Number(). This function is executed across the entire record set in the fashion shown below - To insert the row number as the first generated column in the result before taking values between the start and end values you desire, typically making it very very slow, on large record sets..

SELECT ...
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY [t0].[...]) AS [ROW_NUMBER], ... ,
    FROM [table] AS [t0]
    ) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t1].[ROW_NUMBER]

If you can use an indexed numeric column and arrange it so that you read >= start_value AND <= end-value yourself, then move those values up by your paging amount it will use the index and return results in in milliseconds.

I have well indexed databases with 100's millions of records and Skip().Take() can take up to 30 minutes to obtain 25 records. Where are the direct read takes around 20-40ms.

It would mean you would have to think about the way you code to achieve paging, and may not be practicable to implement in your case.

Related Question