SQL – Subquery to return the latest entry for each parent ID

db2 sql

I have a parent table with entries for documents and I have a history table which logs an audit entry every time a user accesses one of the documents.

I'm writing a search query to return a list of documents (filtered by various criteria) with the latest user id to access each document returned in the result set.

Thus, for these two tables:


    DOCUMENTS
    ID | NAME
    1  | Document 1
    2  | Document 2
    3  | Document 3
    4  | Document 4
    5  | Document 5

    HISTORY
    DOC_ID | USER_ID | TIMESTAMP
    1      | 12345   | TODAY
    1      | 11111   | IN THE PAST
    1      | 11111   | IN THE PAST
    1      | 12345   | IN THE PAST
    2      | 11111   | TODAY
    2      | 12345   | IN THE PAST
    3      | 12345   | IN THE PAST

I'd be looking to get a return from my search like


    ID | NAME       | LAST_USER_ID
    1  | Document 1 | 12345
    2  | Document 2 | 11111
    3  | Document 3 | 12345
    4  | Document 4 | 
    5  | Document 5 | 

Can I easily do this with one SQL query and a join between the two tables?

Best Solution

Revising what Andy White produced, and replacing square brackets (MS SQL Server notation) with DB2 (and ISO standard SQL) "delimited identifiers":

    SELECT d.id, d.name, h.last_user_id
        FROM Documents d LEFT JOIN
             (SELECT r.doc_id AS id, user_id AS last_user_id
                  FROM History r JOIN
                       (SELECT doc_id, MAX("timestamp") AS "timestamp"
                            FROM History
                            GROUP BY doc_id
                       ) AS l
                       ON  r."timestamp" = l."timestamp"
                       AND r.doc_id      = l.doc_id
             ) AS h
             ON d.id = h.id

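I'm not absolutely sure whether "timestamp" or "TIMESTAMP" is correct - probably the latter, since DB2 folds undelimited identifiers to upper case, so a column created without quotes is stored as TIMESTAMP and the delimited form has to match that exactly.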

The advantage of this is that it replaces the inner correlated sub-query in Andy's version with a simpler non-correlated sub-query, which has the potential to be (radically?) more efficient.
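To sanity-check the query against the sample data in the question, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in for DB2 here, and the dates are made-up values standing in for TODAY and IN THE PAST. The correlated query is my own guess at the general shape of a correlated version, not necessarily what Andy White wrote; it is there only to show that both forms return the same rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE History (doc_id INTEGER, user_id INTEGER, "timestamp" TEXT);

    INSERT INTO Documents VALUES
        (1, 'Document 1'), (2, 'Document 2'), (3, 'Document 3'),
        (4, 'Document 4'), (5, 'Document 5');

    -- Illustrative dates standing in for TODAY / IN THE PAST.
    INSERT INTO History VALUES
        (1, 12345, '2024-01-10'),
        (1, 11111, '2024-01-03'),
        (1, 11111, '2024-01-02'),
        (1, 12345, '2024-01-01'),
        (2, 11111, '2024-01-10'),
        (2, 12345, '2024-01-05'),
        (3, 12345, '2024-01-04');
""")

# The query from the answer: a non-correlated sub-query finds the latest
# timestamp per document, then joins back to History to pick up the user.
LATEST_BY_JOIN = """
    SELECT d.id, d.name, h.last_user_id
        FROM Documents d LEFT JOIN
             (SELECT r.doc_id AS id, r.user_id AS last_user_id
                  FROM History r JOIN
                       (SELECT doc_id, MAX("timestamp") AS "timestamp"
                            FROM History
                            GROUP BY doc_id
                       ) AS l
                       ON  r."timestamp" = l."timestamp"
                       AND r.doc_id      = l.doc_id
             ) AS h
             ON d.id = h.id
        ORDER BY d.id
"""

# Hypothetical correlated form, for comparison only: the sub-query
# refers to the outer row (h.doc_id), so it is correlated.
LATEST_BY_CORRELATED = """
    SELECT d.id, d.name, h.user_id AS last_user_id
        FROM Documents d LEFT JOIN History h
             ON  h.doc_id = d.id
             AND h."timestamp" = (SELECT MAX(h2."timestamp")
                                      FROM History h2
                                      WHERE h2.doc_id = h.doc_id)
        ORDER BY d.id
"""

rows_join = conn.execute(LATEST_BY_JOIN).fetchall()
rows_correlated = conn.execute(LATEST_BY_CORRELATED).fetchall()

for row in rows_join:
    print(row)
```

One caveat that applies to both forms: if two history rows for the same document share the same latest timestamp, that document comes back once per tied row.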