Analyzer support correlated subqueries #64050
SELECT column
)
SELECT column
)
May I ask what kinds of correlated queries are supported here? As I understand it, decorrelation is like transforming
to
but in the test I only saw column in |
Automatic decorrelation is not possible for all types of correlated subqueries, and it can be implemented later as an optimization on top of this pull request. |
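For readers following the thread, here is a minimal sketch of what decorrelation means in the simplest case. It uses Python's built-in `sqlite3` module rather than ClickHouse, and the `users` and `orders` tables are hypothetical names chosen only for illustration. A correlated scalar subquery and its hand-written rewrite as a `LEFT JOIN` over a grouped subquery return the same rows:

```python
import sqlite3

# Hypothetical toy schema, not taken from the pull request or its tests.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount INTEGER);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO orders VALUES (1, 10), (1, 30), (2, 5);
""")

# Correlated form: the inner query references u.id from the outer query,
# so conceptually it is re-evaluated for every row of users.
correlated = conn.execute("""
    SELECT u.id,
           (SELECT max(o.amount) FROM orders o WHERE o.user_id = u.id)
    FROM users u
    ORDER BY u.id
""").fetchall()

# Decorrelated form: aggregate once per key, then join.
# LEFT JOIN keeps users without orders, matching the NULL that the
# scalar subquery yields for an empty input.
decorrelated = conn.execute("""
    SELECT u.id, m.max_amount
    FROM users u
    LEFT JOIN (
        SELECT user_id, max(amount) AS max_amount
        FROM orders
        GROUP BY user_id
    ) m ON m.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(correlated)
print(decorrelated)
```

This rewrite is safe for `max` because an empty group and a missing join partner both produce NULL. Other aggregates such as `count` need extra care, which is one reason automatic decorrelation is harder than it looks.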
This would be really hard to do. Implementing correlated subqueries as a separate logical plan step, with decorrelation on top of the query plan, is the standard approach today. The algorithm from the paper "Unnesting Arbitrary Queries" by Thomas Neumann and Alfons Kemper is usually used to support it in DBMSs. I think implementing it as a separate function is a bad design decision: in the future it will have to be rewritten completely as a separate query plan step. I also expect it to be very slow, because for each input row this implementation:
I don't think it can be used in real-world production. This feature needs more functional and performance tests. |
It is not possible to decorrelate correlated subqueries in all scenarios, so we will need a fallback to slow execution anyway (on top of the logical query plan or just with a separate function). Also not sure how decorrelation will work with We can push correlated subqueries into the logical query plan around the expression/filter steps that depend on the result of that correlated subquery. I will try to check whether this is easy. This feature is required for standard SQL support, and it is expected that the initial implementation can be incomplete. |
Actually, the paper I mentioned states that it is possible to decorrelate an arbitrary query, but that the result cannot always be expressed in SQL. This is still an open point of discussion, though. There are 2 options for what to do when it is impossible to automatically decorrelate a subquery:
I'm not sure whether we need to be able to run arbitrary correlated subqueries, or whether we can run only those we can decorrelate (as many DBMSs do).
It can be done using a JOIN. In any case, it cannot be supported with this approach either, because a set is not a scalar.
I find the ad hoc implementation in this PR problematic: it's not extensible, and it will be hard to improve correlated subquery support on top of it:
In conclusion: I don't know if we want a nested-loop implementation for correlated subqueries, but if we do, it should be implemented at the query plan level, so that we don't have to rewrite the implementation completely when we try to support the general case. |
Unrelated, but having a nested-loop implementation (with a cache) is nice in general (it can also be beneficial for joins). |
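To make the "nested loop with a cache" idea concrete, here is a rough Python sketch. It is not how this pull request implements it; `eval_correlated_nested_loop`, `key_of`, and `run_subquery` are hypothetical names. The subquery runs once per distinct value of the correlated columns instead of once per outer row:

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple


def eval_correlated_nested_loop(
    outer_rows: Iterable[Any],
    key_of: Callable[[Any], Hashable],
    run_subquery: Callable[[Hashable], Any],
) -> Tuple[List[Tuple[Any, Any]], int]:
    """Evaluate a correlated subquery row by row, caching results by key.

    Returns the (row, subquery result) pairs and the number of times the
    subquery was actually executed.
    """
    cache: Dict[Hashable, Any] = {}
    results: List[Tuple[Any, Any]] = []
    executions = 0
    for row in outer_rows:
        key = key_of(row)  # values of the correlated columns for this row
        if key not in cache:
            cache[key] = run_subquery(key)  # the expensive inner execution
            executions += 1
        results.append((row, cache[key]))
    return results, executions


# Hypothetical data: (user_id, amount) pairs standing in for an inner table.
orders = [(1, 10), (1, 30), (2, 5)]


def max_amount(user_id: Hashable) -> Any:
    amounts = [amount for uid, amount in orders if uid == user_id]
    return max(amounts) if amounts else None


outer = [1, 2, 1, 3, 2]
results, executions = eval_correlated_nested_loop(outer, lambda r: r, max_amount)
print(results, executions)
```

With five outer rows but only three distinct keys, the inner query runs three times. The cache helps only when correlated values repeat; with mostly unique keys the cost stays at one inner execution per outer row.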
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Analyzer supports correlated subqueries.