Hello guys, the difference between correlated and non-correlated subquery is one of the popular SQL interview question and an important concept for any programmer to learn. The correlated subquery is one of the tricky concepts of SQL. It's similar to recursion in programming which many programmers struggle to understand, but like recursion, it also offers the unique capability to solve many SQL query-based problems like the second-highest salary problem where you need to compare one row of the table to another row. It gives you a different kind of power. The main difference between a regular, non-correlated, and correlated subquery in SQL is in their working, a regular subquery just runs once and returns a value or a set of values that is used by the outer query.
While correlated subquery runs for each row returned by the outer query because the output of the whole query is based upon comparing the data returned by one row to all other rows of the table. That's why it is also very slow and generally avoided until you don't know any other way to solve the problem.
One of the most popular examples of the correlated subquery is about finding the second highest salary of an employee in a given table. Even though there are multiple ways to solve this problem like you can use window functions like row_number or rank but using a regular subquery to solve this problem is the easiest way.
Btw, even though you can solve this problem by using a regular query it becomes tricky when the Interviewer extends this problem to find the Nth highest salary then you just can't go with regular subquery because of unlimited nesting.
Correlated subquery solves this problem elegantly as shown here. It compares data returned by an outer query like salary and compares with other salaries to find out exactly how many salaries are higher than this salary.
In this article, I am going to compare correlated subquery with the regular one of different parameters e.g. their working, speed, performance, and dependency. I'll also give you a good example of a correlated subquery e.g. the Nth highest salary problem and explain how exactly it solves the problem.
So, if the interviewer asks you to find the 4th highest salary then there can only be at most 4 salaries which is equal to or greater than the 4th highest salary. This is just an example, you can use a correlated subquery to solve many such problems in the world of data and SQL.
In short, here are the main difference between correlated and non-correlated subqueries in SQL
For example, the following query is an example of a non-correlated subquery:
Here the subquery is SELECT MAX(Salary) from Employee, you can execute and substitute the result of that query e.g. if subquery return 10000 then the outer query is reduced to
This is not possible with a correlated subquery, which needs to be executed multiple times as shown below:
In this example, the subquery is SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary, you cannot swap its value for the outer query because it needs to be executed for each employee.
Let's say the first row of employees has a salary of 5000, in this case, e.salary will be 500 and subquery will be
and subquery will find how many salaries are higher than 5000 if count return 2 then it's the second-highest salary. This logic needs to be executed for each row the outer query will process.
From the above example, you can see that a correlated subquery like SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary depends upon outer query because it needs the value of e.salary, which comes from the table listed on the outer query.
On the other hand, regular subquery, SELECT MAX(Salary) FROM Employee doesn't depends upon the outer query and can be executed in isolation or independently of the outer query. You can further join these free SQL and database courses to learn more about Correlated and non-correlated subqueries and how to use them.
This happens because to execute a non-correlated subquery you need to examine just n rows of the table and similar to execute the outer query you need to examine n rows, so in total n + n = 2n rows.
This is the reason you should be very careful using a correlated subquery with large tables e.g. tables with millions of rows because that can take a long time and could potentially block other jobs and queries from accessing the table.
In many cases, you can replace correlated subquery with inner join which would result in better performance. For example, to find all employees whose salary is greater than the average salary of the department you can write the following correlated subquery:
Now, you can convert this correlated subquery to a JOIN based query for better performance as shown below:
If you want to learn more about how changing a query from a subquery to join result in better performance and other small changes which gives big performance boost, I strongly suggest you read a good book on SQL query and performance optimization like SQL Performance Explained by Markus Winand.
That's all about the difference between correlated and non-correlated subquery in SQL. You have learned that correlated subquery is executed for each row returned by an outer query, which makes it very slow, but at the same time gives it the power to compare one row of the table to other rows of the table. That's why sometimes the only solution possible was only by using a correlated subquery.
On the other hand regular or non-correlated subquery return a result which is then used by the outer query. It is only executed one time and not for every row returned by the outer query, hence it is faster than a correlated subquery.
Other SQL Interview Questions articles you may like to explore
Thanks for reading this article so far. If you like this SQL Interview question and explanation then please share it with your friends and colleagues. If you have any questions, suggestions,s or feedback then please drop a comment and I'll try to answer it.
One of the most popular examples of the correlated subquery is about finding the second highest salary of an employee in a given table. Even though there are multiple ways to solve this problem like you can use window functions like row_number or rank but using a regular subquery to solve this problem is the easiest way.
Btw, even though you can solve this problem by using a regular query it becomes tricky when the Interviewer extends this problem to find the Nth highest salary then you just can't go with regular subquery because of unlimited nesting.
Correlated subquery solves this problem elegantly as shown here. It compares data returned by an outer query like salary and compares with other salaries to find out exactly how many salaries are higher than this salary.
Difference between Correlated and Non-correlated Subquery in SQL
The difference between correlated and regular subquery is also a frequently asked SQL interview question. Mostly asked on a telephonic interview where they cannot ask you to solve queries and check the fundamentals and theoretical concepts.In this article, I am going to compare correlated subquery with the regular one of different parameters e.g. their working, speed, performance, and dependency. I'll also give you a good example of a correlated subquery e.g. the Nth highest salary problem and explain how exactly it solves the problem.
So, if the interviewer asks you to find the 4th highest salary then there can only be at most 4 salaries which is equal to or greater than the 4th highest salary. This is just an example, you can use a correlated subquery to solve many such problems in the world of data and SQL.
1. Working
A non-correlated subquery is executed only once and its result can be swapped back for a query, on the other hand, a correlated subquery is executed multiple times, precisely once for each row returned by the outer query.For example, the following query is an example of a non-correlated subquery:
SELECT MAX(Salary) FROM Employee WHERE Salary NOT IN ( SELECT MAX(Salary) FROM Employee)
Here the subquery is SELECT MAX(Salary) from Employee, you can execute and substitute the result of that query e.g. if subquery return 10000 then the outer query is reduced to
SELECT MAX(Salary) from Employee where Salary NOT IN (10000).
This is not possible with a correlated subquery, which needs to be executed multiple times as shown below:
SELECT e.Name, e.Salary FROM Employee e WHERE 2 = ( SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary)
In this example, the subquery is SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary, you cannot swap its value for the outer query because it needs to be executed for each employee.
Let's say the first row of employees has a salary of 5000, in this case, e.salary will be 500 and subquery will be
SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= 5000
and subquery will find how many salaries are higher than 5000 if count return 2 then it's the second-highest salary. This logic needs to be executed for each row the outer query will process.
2. Dependency
A correlated subquery depends upon the outer query and cannot execute in isolation, but a regular or non-correlated subquery doesn't depend on the outer query and can execute in isolation.From the above example, you can see that a correlated subquery like SELECT COUNT(Salary) FROM Employee p WHERE p.salary >= e.salary depends upon outer query because it needs the value of e.salary, which comes from the table listed on the outer query.
On the other hand, regular subquery, SELECT MAX(Salary) FROM Employee doesn't depends upon the outer query and can be executed in isolation or independently of the outer query. You can further join these free SQL and database courses to learn more about Correlated and non-correlated subqueries and how to use them.
3.Speed and Performance
A correlated subquery is much slower than a non-correlated subquery because in the former, the inner query executes for each row of the outer query. This means if your table has n rows then whole processing will take the n * n = n^2 time, as compared to 2n times taken by a non-correlated subquery.This happens because to execute a non-correlated subquery you need to examine just n rows of the table and similar to execute the outer query you need to examine n rows, so in total n + n = 2n rows.
This is the reason you should be very careful using a correlated subquery with large tables e.g. tables with millions of rows because that can take a long time and could potentially block other jobs and queries from accessing the table.
In many cases, you can replace correlated subquery with inner join which would result in better performance. For example, to find all employees whose salary is greater than the average salary of the department you can write the following correlated subquery:
SELECT e.id, e.name FROM Employee e WHERE salary > ( SELECT AVG(salary) FROM Employee p WHERE p.department = e.department)
Now, you can convert this correlated subquery to a JOIN based query for better performance as shown below:
SELECT e.id, e.name FROM Employee INNER JOIN (SELECT department, AVG(salary) AS department_average FROM Employee GROUP BY department) AS t ON e.department = t.department WHERE e.salary > t.department_average;
If you want to learn more about how changing a query from a subquery to join result in better performance and other small changes which gives big performance boost, I strongly suggest you read a good book on SQL query and performance optimization like SQL Performance Explained by Markus Winand.
That's all about the difference between correlated and non-correlated subquery in SQL. You have learned that correlated subquery is executed for each row returned by an outer query, which makes it very slow, but at the same time gives it the power to compare one row of the table to other rows of the table. That's why sometimes the only solution possible was only by using a correlated subquery.
On the other hand regular or non-correlated subquery return a result which is then used by the outer query. It is only executed one time and not for every row returned by the outer query, hence it is faster than a correlated subquery.
Other SQL Interview Questions articles you may like to explore
- How to find the second highest salary in MySQL? (solution)
- What is the difference between UNION and UNION ALL in SQL? (answer)
- The difference between TRUNCATE and DELETE in SQL? (answer)
- The difference between self and equi-join in SQL? (answer)
- What is the difference between View and Materialized View in the Oracle database? (answer)
- How to find the second highest salary in Oracle using ROW_NUMBER? (answer)
- The difference between WHERE and HAVING clause in SQL? (answer)
- The difference between LEFT and RIGHT OUTER JOIN in SQL? (answer)
- How to find duplicate records in a table? (query)
Thanks for reading this article so far. If you like this SQL Interview question and explanation then please share it with your friends and colleagues. If you have any questions, suggestions,s or feedback then please drop a comment and I'll try to answer it.
No comments:
Post a Comment
Feel free to comment, ask questions if you have any doubt.