Sql – distinct sql query


I have a simple table with just name and email called name_email.

I am trying to fetch data out of it so that:
If two rows have the same name, but one has an email which is ending with ‘@yahoo.com’ and the other has a different email, then the one with the ‘@yahoo.com’ email should be discarded.

what would be best way to get this data out?

Best Solution

Okay, I'm not going to get involved in yet another fight with those who say I shouldn't advocate database schema changes (yes, you know who you are :-), but here's how I'd do it.

1/ If you absolutely cannot change the schema, I would solve it with code (either real honest-to-goodness procedural code outside the database or as a stored procedure in whatever language your DBMS permits).

This would check the database for a non-yahoo name and return it, if there. If not there, it would attempt to return the yahoo name. If neither are there, it would return an empty data set.

2/ If you can change the schema and you want an SQL query to do the work, here's how I'd do it. Create a separate column in your table called CLASS which is expected to be set to 0 for non-yahoo addresses and 1 for yahoo addresses.

Create insert/update triggers to examine each addition or change of a row, setting the CLASS based on the email address (what it ends in). This guarantees that CLASS will always be set correctly.

When you query your table, order it by name and class, and only select the first row. This will give you the email address in the following preference: non-yahoo, yahoo, empty dataset.

Something like:

select name, email
from tbl
where name = '[name]'
order by name, class
fetch first row only;

If your DBMS doesn't have an equivalent to the DB2 "fetch first row only" clause, you'll probably still need to write code to only process one record.

If you want to process all names but only the specific desired email for that name, a program such as this will suffice (my views on trying to use a relational algebra such as SQL in a procedural way are pretty brutal, so I won't inflict them on you here):

# Get entire table contents sorted in name/class order.
resultSet = execQuery "select name, email from tbl order by name, class"

# Ensure different on first row
lastName = resultSet.value["name"] + "X"

# Process every single row returned.
while not resultSet.endOfFile:
    # Only process the first in each name group (lower classes are ignored).
    if resultSet.value["name"] != lastName:
        processRow resultSet.value["name"] resultSet.value["email"]
    # Store the last name so we can detect next name group.
    lastName = resultSet.value["name"]