Sql – the best way to represent a many-to-many relationship between records in a single SQL table


I have a SQL table like so:

Update: I'm changing the example table as the existing hierarchical nature of the original data (State, Cities, Schools) is overshadowing the fact that a simple relationship is needed between the items.

id      name               
1       Apple     
2       Orange            
3       Banana             
4       Carrot                
5       Mushroom        

I want to define two-way relationships between these entities so a user viewing one entity can see a list of all related entities.

The relationships are defined by an end user.

What is the best way to represent these relationships in the database and subsequently query and update them?

One way as I see it…

My instinct says a relationship table like so:

entity_id_a       entity_id_b
1                 2
5                 1
4                 1
5                 4
1                 3

That being the case, given a supplied entity_id of 4, how would one get all related records, which would be 1 and 5?

Likewise a query of entity_id = 1 should return 2, 3, 4, and 5.

Thanks for your time and let me know if I can clarify the question at all.

Best Solution

Define a constraint: entity_id_a < entity_id_b.

Create indexes:

CREATE UNIQUE INDEX ix_a_b ON entity_entity(entity_id_a, entity_id_b);
CREATE INDEX ix_b ON entity_entity(entity_id_b);

Second index doesn't need to include entity_id_a as you will use it only to select all a's within one b. RANGE SCAN on ix_b will be faster than a SKIP SCAN on ix_a_b.

Populate the table with your entities as follows:

INTO entity_entity (entity_id_a, entity_id_b)
VALUES (LEAST(@id1, @id2), GREATEST(@id1, @id2))

Then select:

SELECT entity_id_b
FROM entity_entity
WHERE entity_id_a = @id
SELECT entity_id_a
FROM entity_entity
WHERE entity_id_b = @id

UNION ALL here lets you use above indexes and avoid extra sorting for uniqueness.

All above is valid for a symmetric and anti-reflexive relationship. That means that:

  • If a is related to b, then b is related to a

  • a is never related to a