Ok, I'm hitting a slight problem in a database I'm working with in MySQL. It is a temporary database for cleaning some data from a less normalized database and it has three tables that are giving me some trouble:
create table author (
id int primary key default null auto_increment,
name text not null);
create table source (
id int primary key default null auto_increment,
name text not null);
create table article (
id int primary key default null auto_increment,
title text not null,
author_id int default null references author(id),
source_id int default null references source(id));
Because of the structure of the database I am modeling, I know that stories with more than one author just have all the author names concatenated into a comma separated list and the system I'm importing into is no better, so there is no need to fix this. Some sample data might look like:
insert into author values (1, "Bob"), (2, "Roger and Mary"), (3, "Mary");
insert into source values (1, "Daily News"), (2, "WXQZ-TV");
insert into article values (null, "Bob's Big Day", 1, 1), (null, "Mary at the Beach", 2, 1),
(null, "Burger World", null, 2), (null, "Mien Kampf", null, null),
(null, "Summertime", 2, null);
A reasonable way then to reconstruct the data seemed to be:
select title, author.name as author, source.name as source
from article, author, source
where article.author_id = author.id and article.source_id = source.id;
title | author | source |
---|---|---|
Bob's Big Day | Bob | Daily News |
Mary at the Beach | Roger and Mary | Daily News |
So, no items that have null foreign keys are included in that result. How might I write it so as to include them?
select title, author.name as author, source.name as source
from article, author, source
where (article.author_id = author.id or article.author_id is null) and article.source_id = source.id;
title | author | source |
---|---|---|
Bob's Big Day | Bob | Daily News |
Burger World | Bob | WXQZ-TV |
Mary at the Beach | Roger and Mary | Daily News |
Burger World | Roger and Mary | WXQZ-TV |
Burger World | Mary | WXQZ-TV |
So articles with null
author_id
s are paired with every author. This is not what I am hunting for. For the answer I just have to remember SQL 101: join types…
- Cartesian product • pair every element of the first table with every element of the second:
select title, author.name as author, source.name as source from article, author, source;
(returns 30 rows = 2 * 3 * 5)- inner join • returns all rows from both tables where there is a match
- left join • returns all the rows from the first table, even if there are no matches in the second table
- right join • returns all the rows from the second table, even if there are no matches in the first table
A left join is what I want. For example, when I do:
select title, author.name as author from article left join (author) on (article.author_id = author.id);
title | author |
---|---|
Bob's Big Day | Bob |
Mary at the Beach | Roger and Mary |
Burger World | NULL |
Mien Kampf | NULL |
Summertime | Roger and Mary |
The problem is when I add the source:
select title, author.name as author, source.name as source
from article left join (author, source)
on (article.author_id = author.id and article.source_id = source.id);
title | author | source |
---|---|---|
Bob's Big Day | Bob | Daily News |
Mary at the Beach | Bob | NULL |
Burger World | Bob | NULL |
Mien Kampf | Bob | NULL |
Summertime | Bob | NULL |
Bob's Big Day | Roger and Mary | NULL |
Mary at the Beach | Roger and Mary | Daily News |
Burger World | Roger and Mary | NULL |
Mien Kampf | Roger and Mary | NULL |
Summertime | Roger and Mary | NULL |
Bob's Big Day | Mary | NULL |
Mary at the Beach | Mary | NULL |
Burger World | Mary | NULL |
Mien Kampf | Mary | NULL |
Summertime | Mary | NULL |
So I'm getting the Cartesian product of article
and artist
left joined with source
. Not what I want… Looking at the syntax a bit more, I need to be chaining my left joins:
select title, author.name as author, source.name as source
from article left join (author) on (article.author_id = author.id)
left join (source) on (article.source_id = source.id);
title | author | source |
---|---|---|
Bob's Big Day | Bob | Daily News |
Mary at the Beach | Roger and Mary | Daily News |
Burger World | NULL | WXQZ-TV |
Mien Kampf | NULL | NULL |
Summertime | Roger and Mary | NULL |
Voila.