How embarrassing. I learned something new today that I really should have known for some number of years now. Left joins can increase the result set size.
Here’s what I thought left joins do: When you combine two tables with a left join, the source table (the one on the left) becomes the “anchor” for the results, guaranteeing that each and every record in the left table shows up in the result. If there are records in the right table that don’t correspond to anything on the left, those records are omitted. If there are records in the left table that don’t have corresponding records in the right table, those records are shown anyway, with NULLs in place of the right table’s columns. For example…
Let’s say table A has 10 records pertaining to people’s names. And table B has five records pertaining to where those people live. No people live in two places.
If you did a left join on these two tables, you’d end up with five people and their addresses and five people with no address information (the address columns come back as NULL).
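Here is a minimal sketch of that first scenario. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption on my part, though the row counts for a left join should come out the same), and the `people` and `addresses` tables and their columns are hypothetical names made up for the example.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE addresses (person_id INTEGER, city TEXT)")

# Table A: 10 people. Table B: addresses for the first 5 of them only.
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(i, f"person{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                 [(i, f"city{i}") for i in range(1, 6)])

rows = conn.execute("""
    SELECT p.name, a.city
    FROM people p
    LEFT JOIN addresses a ON a.person_id = p.id
    ORDER BY p.id
""").fetchall()

# Rows with no match on the right come back with city = NULL (None in Python).
with_address = [r for r in rows if r[1] is not None]
without_address = [r for r in rows if r[1] is None]
print(len(rows), len(with_address), len(without_address))  # 10 5 5
```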
And…
Let’s say table A has 10 records pertaining to people’s names. And table B has 12 records pertaining to where those people live, where each person in A has a record in B. But two of those records don’t match up with anything in table A because some person records were accidentally deleted (oh no!).
If you did a left join on these two tables, you’d end up with 10 people with information about where each one lives. The extra records in B are simply ignored.
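The same kind of sketch works for the orphaned-records case. Again this is SQLite via Python standing in for MySQL, with hypothetical table and column names; the two orphaned address rows point at made-up person ids 11 and 12 that don't exist in the left table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE addresses (person_id INTEGER, city TEXT)")

# Table A: 10 people.
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(i, f"person{i}") for i in range(1, 11)])
# Table B: 12 addresses. person_id 11 and 12 belong to deleted people.
conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                 [(i, f"city{i}") for i in range(1, 13)])

rows = conn.execute("""
    SELECT p.id, a.city
    FROM people p
    LEFT JOIN addresses a ON a.person_id = p.id
""").fetchall()

cities = {city for _, city in rows}
print(len(rows))                                # 10
print("city11" in cities, "city12" in cities)   # False False
```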
Okay. That part was easy. Everybody knows that, even your grandmother. Let’s take this a few notches up.
Now say table A has 10 records pertaining to people’s names and table B has 15 records pertaining to where people live. This time, those extras are no mistake, because a bunch of people live in two places, thanks to vacation homes.
If you did a left join on these two tables, what happens? Well, embarrassingly, I predicted this sucker wrong. Assuming all 10 people from A are mentioned in B with some mentioned twice or more, the result would have 15 records!! What!? 15!? Yeah, that was my reaction too. I thought MySQL would spit back 10 and ignore duplicates in B. It doesn’t: a left join returns one row for every matching pair of records, so a person with two addresses shows up twice.
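To see the vacation-home case for yourself, here is a small sketch, once more using SQLite through Python as a stand-in for MySQL and hypothetical table names. I've assumed five of the ten people own a second home, which is one way to get to 15 address records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE addresses (person_id INTEGER, kind TEXT)")

conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(i, f"person{i}") for i in range(1, 11)])
# Everybody has a home; people 1 through 5 also have a vacation home.
conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                 [(i, "home") for i in range(1, 11)] +
                 [(i, "vacation home") for i in range(1, 6)])

rows = conn.execute("""
    SELECT p.id, a.kind
    FROM people p
    LEFT JOIN addresses a ON a.person_id = p.id
""").fetchall()

distinct_people = {person_id for person_id, _ in rows}
print(len(rows))             # 15 rows in the result...
print(len(distinct_people))  # ...but still only 10 different people
```

Comparing the total row count to the number of distinct left-table ids, as in the last two lines, is a quick way to spot whether a join has multiplied your rows.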
Let’s do one more example. How many records will we find if we join the following scenario:
Table A has 10 records pertaining to people’s names. And table B has 15 records pertaining to where people live. One guy has 15 vacation homes and everybody else is homeless (no records in B).
Ok. Do a left join. Not an inner join. Not a regular join. A left join. How many results do we get, huh?
Our result would be 24! Who the hell guessed that? Well, probably some of my more pretentious Computer Science readers, but certainly not me (so that’s what you learn in CS, huh?). It is 24 because you get 15 records for the one rich guy (one per vacation home) and 9 records, padded with NULLs, for the homeless saps.
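If you don't believe the 24, this sketch reproduces it. Same caveats as before: SQLite via Python instead of MySQL, and the table names, column names and "person 1 is the rich guy" setup are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE addresses (person_id INTEGER, home TEXT)")

conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(i, f"person{i}") for i in range(1, 11)])
# All 15 address records belong to person 1.
conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                 [(1, f"home{n}") for n in range(1, 16)])

rows = conn.execute("""
    SELECT p.id, a.home
    FROM people p
    LEFT JOIN addresses a ON a.person_id = p.id
""").fetchall()

rich_guy_rows = [r for r in rows if r[0] == 1]
no_home_rows = [r for r in rows if r[1] is None]
print(len(rows), len(rich_guy_rows), len(no_home_rows))  # 24 15 9
```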
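Thus, when every record in B matches at most one record in A (as in these examples) and B isn’t empty, the maximum number of records a left join can yield is sizeof(record set A) + sizeof(record set B) – 1. If a record in B is allowed to match several records in A, it gets worse: the result can grow all the way up to sizeof(record set A) × sizeof(record set B). Why is this never explicitly mentioned?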
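One way to think about the count: each record in A contributes as many rows as it has matches in B, or one row if it has none. The sketch below checks that rule against the rich-guy scenario, and also shows that the A + B – 1 ceiling only holds when each record in B matches at most one record in A. With a join condition that matches everything (the deliberately silly `ON 1 = 1`), the count reaches A × B. As before, this is SQLite via Python with hypothetical table names `a` and `b`, not MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (a_id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO b VALUES (?)", [(1,)] * 15)  # all 15 point at id 1


def count(sql):
    """Run a query that returns a single number and return that number."""
    return conn.execute(sql).fetchone()[0]


# Join on the key: each B row matches at most one A row.
keyed = count("SELECT COUNT(*) FROM a LEFT JOIN b ON b.a_id = a.id")

# Same number, computed per A row: max(1, number of matches in B).
per_row = count("""
    SELECT SUM(CASE WHEN n = 0 THEN 1 ELSE n END)
    FROM (SELECT a.id, COUNT(b.a_id) AS n
          FROM a LEFT JOIN b ON b.a_id = a.id
          GROUP BY a.id)
""")

# Join condition that matches every pair: every A row pairs with every B row.
everything = count("SELECT COUNT(*) FROM a LEFT JOIN b ON 1 = 1")

print(keyed, per_row, everything)  # 24 24 150
```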
For a long time, I thought left joins meant the result set can never have more rows than the left table. I don’t know how I managed to go through this many years without realizing my error, but I suppose through good query structuring and table use, I never encountered a problem with this until now… And, to my credit, it wasn’t a query I wrote either.
I have never seen this behavior mentioned in any documentation (even MySQL documentation). It seems to be an implicitly assumed function of the command. In fact, I found several “tutorials” about left joins that conveniently left out mentioning this fact but still showed it, unexplained, in their results. Nice.
For all of you non-Computer Science gurus, I hope you learned something new from reading this post. Wasted about an hour of my time.