There are many ways to optimize SQL queries, but the case I am about to show you is a little different.
Last week my direct manager asked me to find out why a list of data was loading very slowly and, if possible, to optimize it. The data set consisted of around 1,000 rows, and loading all of them took about 7 seconds. Even though each row was an elaborate business entity, that did not justify the load time. As every sane developer could see, something was wrong.
We are using Entity Framework, so I started by looking at the LINQ statement. It was a complex statement with a lot of joins, and it also did some dynamic query building. But none of those things was the problem.
A few lines below, I saw several fat subqueries. They, too, had many joins, filters, and so on. But the most “exciting” part was that they were identical except for a single where clause.
(simplified pseudo-code example)
    from organization in organisations
    join … join … join …
    where …
    select
    {
        … other fields
        PPartners = (very complex query with WHERE p.Type == Type.Primary).ToList(),
        SPartners = (very complex query with WHERE p.Type == Type.Secondary).ToList(),
        OPartners = (very complex query with WHERE p.Type == Type.Other).ToList()
    }
I instantly understood that I had found the problem. To test how the load time would change, I removed the subqueries. It dropped under a second.
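If you want to reproduce such a measurement yourself, here is a minimal sketch; the query variable is an assumption standing in for the real LINQ statement, not the actual code:

    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    var rows = query.ToList();   // ToList() forces execution; EF queries are deferred
    stopwatch.Stop();
    Console.WriteLine($"{rows.Count} rows in {stopwatch.ElapsedMilliseconds} ms");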
Even setting aside the subqueries’ slow execution, you can imagine how big the flattened dataset transferred from the SQL server was: every one of those ~1,000 rows dragged three partner collections along with it.
Optimizing SQL queries, or “Divide et impera”
In some cases, SQL query optimization will not help you. Yes, you can try to squeeze out another 100 milliseconds using various techniques, but sooner or later you will hit a brick wall.
Some of you will say that queries like these are avoidable with better DB design. I agree to some extent. As you know, we cannot predict everything or optimize for everything.
So, what was the solution I introduced? I extracted the three subqueries into a single separate query without any filtering: it pulled the data for all the partners at once. With both datasets loaded into memory, I built the objects I actually needed.
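Roughly, the refactored version looked like the sketch below. Everything in it is a hypothetical stand-in (OrganizationDto, db, Partner, OrganizationId); only the overall shape matches what I did:

    // 1) Load the organizations without the three partner subqueries.
    var organizations =
        (from o in db.Organizations
         // join … join … join … where …  (unchanged from the original query)
         select new OrganizationDto
         {
             Id = o.Id
             // … other fields
         }).ToList();

    // 2) Pull ALL the partners for those organizations in one unfiltered query.
    var orgIds = organizations.Select(o => o.Id).ToList();
    var partners = db.Partners
        .Where(p => orgIds.Contains(p.OrganizationId))
        .ToList();

    // 3) Compose the final objects in memory, splitting the partners by type.
    var partnersByOrg = partners.ToLookup(p => p.OrganizationId);
    foreach (var org in organizations)
    {
        var own = partnersByOrg[org.Id];
        org.PPartners = own.Where(p => p.Type == Type.Primary).ToList();
        org.SPartners = own.Where(p => p.Type == Type.Secondary).ToList();
        org.OPartners = own.Where(p => p.Type == Type.Other).ToList();
    }

    // Hypothetical DTO shape, for completeness.
    class OrganizationDto
    {
        public int Id { get; set; }
        // … other fields
        public List<Partner> PPartners { get; set; }
        public List<Partner> SPartners { get; set; }
        public List<Partner> OPartners { get; set; }
    }

Two round trips instead of one, but each result set stays flat and narrow, and splitting a couple of thousand rows in memory costs next to nothing.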
Before doing this, I made sure that the datasets could never grow to more than a couple of thousand rows!
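If that assumption matters to you, it can also be enforced in code. Continuing the sketch above, with a made-up threshold:

    // 5,000 is a made-up budget, not a number from the real project.
    if (partners.Count > 5_000)
        throw new InvalidOperationException(
            $"Partner set has grown to {partners.Count} rows; revisit the split-query approach.");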
After that refactoring, the load time dropped to a second and a half. Could I make it faster? Sure! Is it worth it? At this point, no.
In short – sometimes it is much better to split one query into two or even several.
When you find yourself writing a very complex query that pulls “everything” from the DB, think of this case.