Today, I wanted to go over a relatively simple MySQL feature that a lot of people don’t understand: SQL_CALC_FOUND_ROWS. To use this mystical keyword, simply put it in your query right after the SELECT statement. For example:
SELECT * FROM USER WHERE id > 10 LIMIT 2,1 --see just second record
Becomes
SELECT SQL_CALC_FOUND_ROWS * FROM USER WHERE id > 10 LIMIT 2,1
This won’t change your results. It may, however, make your query run slower than when you select just one row the regular way. What this statement does is tell MySQL to find out just how many total records exist that match your criteria (in this case, where id is bigger than 10). For example, let’s assume that the user table has 100 records that have an id bigger than 10, then the query will take as long as it would have taken for the engine to find those 100 records.
The returned result will still be one the records you are expecting (in this case, the second record it found). But here is where the magic starts: If the very next query you run is a special select statement, you will have access to the total that was found. As in:
SELECT FOUND_ROWS(); --returns 100
The MySQL documentation on this subject says:
[SELECT FOUND_ROWS()] returns a number indicating how many rows the first
SELECT
would have returned had it been written without theLIMIT
clause. In the absence of theSQL_CALC_FOUND_ROWS
option in the most recentSELECT
statement,FOUND_ROWS()
returns the number of rows in the result set returned by that statement.
No matter what your LIMIT clause looks like (such as LIMIT 10, 1), this second query will still return the same number (in this example, 100). Why is this useful? Pagination. Often times, beginners (including me a few years ago) are stuck doing something like this:
SELECT count(*) FROM USER WHERE id > 10 --figure out how many total records there are SELECT * FROM USER WHERE id > 10 LIMIT 50, 1 --get to record #50
People do this because you need the total to know if other matching results exist or what the last page number is.
This requires the engine to run the same query twice. This can be disastrous in cases where that query already takes a very long time to run. By including SQL_CALC_FOUND_ROWS, the overhead of running that count is grouped up with the process of actually retrieving the row of interest. So while the initial query might take a little longer to run than if you hadn’t tried to do a count, it is definitely faster than running the same query twice.
To take this to the next level, your pagination code should omit the use of SQL_CALC_FOUND_ROWS in subsequent page loads by caching the total count in the URL or session.
Happy hunting!
Sounds interesting, but how safe is it? Can it be realistically relied upon where you have more than one user accessing the db? Does the calculated value persist between different and/or subsequent connections?
Also, how do you flush the value when you change the where clause?
It is safe. It is session specific. “session” means for a single connection instance (such as a page load). It does not persist, and it is not shared (unless you are using connection pooling).
You “flush” it by making a new SELECT query statement. From the MySQL documentation:
“In the absence of the SQL_CALC_FOUND_ROWS option in the most recent SELECT statement, FOUND_ROWS() returns the number of rows in the result set returned by that statement.”
Correct me if I’m wrong, but in relation to Ben’s question about safety: if you do cache the total number of rows in the URL you need to be careful in instances where other users might be making submissions to the database. That could cause a scenario where user A’s session thinks there are a total of 100 rows, but user B just added 10 more. If user A is viewing all of those records with a paginated navigation, that there’s the potential for conflicts and you’d need to plan/code accordingly.
of course. but in many cases, knowing a general total is “good enough.” facebook did this. even google does this. in cases where knowing the exact total in a constantly changing table is a requirement, you would obviously avoid this solution. however, in a 40 million record result set for a search, being off one or two is no big deal. however, reducing subsequent page loads (after the caching) from 5 minutes down to 0.2 seconds IS a big deal.
This is not “definitely faster” than using count(*). In fact, when I tried it, the result was many times slower than the original two queries combined.
From the comments on http://dev.mysql.com/doc/refman/5.0/en/information-functions.html
Posted by Wade Bowmer on May 14 2006 11:40pm
Be aware that using SQL_CALC_FOUND_ROWS and FOUND_ROWS() disables ORDER BY … LIMIT optimizations (see bugs http://bugs.mysql.com/bug.php?id=18454 and http://bugs.mysql.com/bug.php?id=19553). Until it’s fixed, you should run your own benchmarks with and without it.
btw: LIMIT 2,1 will return the third row, not the second row.