Sql – Variant use of the GROUP BY clause in TSQL

group-bysqltsql

Imagine the following schema and sample data (SQL Server 2008):

OriginatingObject
----------------------------------------------
ID
1
2
3

ValueSet
----------------------------------------------
ID   OriginatingObjectID   DateStamp
1    1                     2009-05-21 10:41:43
2    1                     2009-05-22 12:11:51
3    1                     2009-05-22 12:13:25
4    2                     2009-05-21 10:42:40
5    2                     2009-05-20 02:21:34
6    1                     2009-05-21 23:41:43
7    3                     2009-05-26 14:56:01

Value
----------------------------------------------
ID   ValueSetID   Value
1    1            28
etc (a set of rows for each related ValueSet)

I need to obtain the ID of the most recent ValueSet record for each OriginatingObject. Do not assume that the higher the ID of a record, the more recent it is.

I am not sure how to use GROUP BY properly in order to make sure the set of results grouped together to form each aggregate row includes the ID of the row with the highest DateStamp value for that grouping. Do I need to use a subquery or is there a better way?

Best Solution

You can do it with a correlated subquery or using IN with multiple columns and a GROUP-BY.

Please note, simple GROUP-BY can only bring you to the list of OriginatingIDs and Timestamps. In order to pull the relevant ValueSet IDs, the cleanest solution is use a subquery.

Multiple-column IN with GROUP-BY (probably faster):

SELECT O.ID, V.ID
FROM Originating AS O, ValueSet AS V
WHERE O.ID = V.OriginatingID
AND
(V.OriginatingID, V.DateStamp) IN
(
    SELECT OriginatingID, Max(DateStamp)
    FROM ValueSet
    GROUP BY OriginatingID
)

Correlated Subquery:

SELECT O.ID, V.ID
FROM Originating AS O, ValueSet AS V
WHERE O.ID = V.OriginatingID
AND
V.DateStamp =
(
    SELECT Max(DateStamp)
    FROM ValueSet V2
    WHERE V2.OriginatingID = O.ID
)