SQL – Group related records, but pick certain fields from only the first record

aggregate-functions, group-by, postgresql, sql

I'm performing an aggregate function on multiple records that are grouped by a common ID. The problem is that I also want to export some other fields, which may differ within the grouped records; I want to take those fields from just one of the records (the first one, according to the query's ORDER BY).

Starting point example:

SELECT
  customer_id,
  sum(order_total),
  referral_code
FROM order
GROUP BY customer_id
ORDER BY date_created

I need to query the referral code, but doing it outside of an aggregate function means I have to group by that field as well, and that's not what I want – I need exactly one row per customer in this example. I really only care about the referral code from the first order, and I'm happy to throw out any later referral codes.

This is in PostgreSQL, but maybe syntax from other DBs could be similar enough to work.

Rejected solutions:

  • Can't use max() or min() because order is significant.
  • A subquery might work at first, but does not scale; this is an extremely reduced example. My actual query has dozens of fields like referral_code which I only want the first instance of, and dozens of WHERE clauses which, if duplicated in a subquery, would make for a maintenance nightmare.

Best Solution

Well, it's actually pretty simple.

First, let's write a query that will do the aggregation:

-- "order" is a reserved word in PostgreSQL, so the table name must be quoted
select customer_id, sum(order_total)
from "order"
group by customer_id

Next, let's write a query that returns the first referral_code and date_created for each customer_id:

-- distinct on keeps the first row per customer_id in the order given by order by
select distinct on (customer_id) customer_id, date_created, referral_code
from "order"
order by customer_id, date_created
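
To see what distinct on does here, a small self-contained sketch may help. The sample_orders rows below are made up purely for illustration (they are not from the question), and the inline CTE just stands in for the real table:

```sql
-- Hypothetical sample data, invented for this illustration only.
with sample_orders (customer_id, date_created, referral_code) as (
    values
        (1, date '2020-01-05', 'B'),
        (1, date '2020-01-01', 'A'),
        (2, date '2020-02-01', 'C')
)
-- For each customer_id, keep only the row that sorts first by date_created.
select distinct on (customer_id) customer_id, date_created, referral_code
from sample_orders
order by customer_id, date_created;
-- customer 1 keeps its earliest order (2020-01-01, 'A');
-- customer 2 has a single order (2020-02-01, 'C').
```

Note that the distinct on expressions have to match the leftmost order by expressions, which is why customer_id comes first in the order by.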

Now, you can simply join the 2 selects:

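select
    x1.customer_id,
    x1.sum,            -- PostgreSQL names an unaliased sum(...) column "sum"
    x2.date_created,
    x2.referral_code
from
    (
        -- one row per customer with the aggregated total
        select customer_id, sum(order_total)
        from "order"   -- reserved word, so it must be quoted
        group by customer_id
    ) as x1
    join
    (
        -- one row per customer: the earliest order's fields
        select distinct on (customer_id) customer_id, date_created, referral_code
        from "order"
        order by customer_id, date_created
    ) as x2 using ( customer_id )
order by x2.date_created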

I didn't test it, so there could be typos in it, but generally it should work.
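
On the concern about duplicating dozens of WHERE clauses: one possible way around that is to put the filtering into a common table expression once and let both subqueries read from it. This is only a sketch, untested against the real schema; the WHERE condition and the names filtered, totals, first_order and order_sum are placeholders I made up, not anything from the question:

```sql
-- Sketch only: the filter below is a hypothetical stand-in for the real WHERE clauses.
with filtered as (
    select *
    from "order"
    where date_created >= date '2020-01-01'   -- hypothetical condition, written once
),
totals as (
    -- aggregation over the filtered rows
    select customer_id, sum(order_total) as order_sum
    from filtered
    group by customer_id
),
first_order as (
    -- earliest filtered order per customer
    select distinct on (customer_id) customer_id, date_created, referral_code
    from filtered
    order by customer_id, date_created
)
select t.customer_id, t.order_sum, f.date_created, f.referral_code
from totals t
join first_order f using (customer_id)
order by f.date_created;
```

Any further "first instance" fields can be added to the first_order select list without touching the filter.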