Object Relational Mappers Series - The N+1 SELECTs Problem

This is the next post on ORMs. Suppose you have a collection of Category objects, which are really just rows in a Category table. Each Category has a collection of Products. It's not uncommon to want a listing of Categories with their Products. In SQL we would have code that looks something like this:

SELECT *
FROM Category c JOIN Product p on C.Id = P.CategoryId

This will run fairly fast. An ORM, however, will likely do something like this:

SELECT * FROM Category
--ForEach Category...
SELECT * FROM Product WHERE CategoryId = ?

That's very inefficient. The problem that has been showcased here is commonly referred to as the N+1 Select problem. The problem is that, for every record in the initial select, we are running an additional select to retrieve data that we need that was not retrieved with the initial select.

Most ORMs will give you a way to handle this gracefully. However, since the ORM is abstracting the SQL call, you may not discover the issue until you have a full-sized db that is experiencing performance problems. The larger the N, the worse the performance. Some people call this ripple loading. Others call it the N+1 SELECTs Problem. Hibernate specifically warns developers about the N + 1 problem. But what is their solution?

Using lazy initialization can help prevent unnecessary selects queries being executed when retrieving an object graph using HQL or Query by Criteria (QBC). However, to avoid the N+1 selects proble, the application developer needs to be careful to tune queries so that they match the way returned data is used in the application.

(Emphasis in original). So again, using an ORM still means understanding your data model, understanding query tuning, understanding SQL, etc. So, what did the ORM buy us?

Dave Wentzel 2013-02-25 CONTENT
data architecture orm