-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(sql): rewrite trivial expressions over same column in GROUP BY queries #4508
Conversation
…or Q35. More refactoring tba
if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL | ||
&& nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this could be safely relaxed
if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL | ||
&& nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) { | ||
|
||
CharSequenceIntHashMap nestedCandidates = new CharSequenceIntHashMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there a lighter option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could move this map into a field and reuse it between the invocations.
@@ -1447,6 +1447,49 @@ public void testOrderingOfSortsInSingleTimestampCase() throws Exception { | |||
}); | |||
} | |||
|
|||
@Test | |||
public void testRewriteTrivialExpressionsBasic() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more tests would be good
…_rewrite_trivial_expr
[PR Coverage check]😍 pass : 49 / 50 (98.00%) file detail
|
We need to mitigate perf degradation on the c6a.metal box (192 cores, 384GB RAM): 0.6s for Java vs 2.2s for Rosti. The reason is that Java code implements hash table sharding while C++ code doesn't. Maybe we could mitigate this by limiting the max number of Rosti tables in use here: questdb/core/src/main/java/io/questdb/griffin/engine/groupby/vect/GroupByRecordCursorFactory.java Line 82 in fe2aeca
|
I have a better idea: we should disable keyed Rosti for all types but SYMBOL. That's because SYMBOL type has low cardinality, while it's not always the case for INT or IPv4. So, for high cardinality columns Java-based GROUP BY will be faster than the Rosti one. @bluestreak01 WDYT? |
how does it compare effort wise with improving Rosti? After all it is C++ map, we have more options there? What do you think? |
This is not a trivial thing to do, but it's certainly feasible. The thing is that keyed Rosti has a very limited usage, so I'm not sure if it's worth spending the time just to speed up INT and IPv4 keys case. |
Plan is to redo this more generically, with bug fix included. Rosti/Async optimisations are a separate issue but should be addressed before next release, so as to preserve performance of this type of query on large boxes. |
Closes #4141
This relates to performance around Clickbench Q35.
For Clickbench Q35 on M2 Mac Mini, this speeds up the query from 1.7s to 0.9s.
The rewritten query runs using a Rosti implementation and an early limit, instead of async group by and a late limit.
With changes but async group by instead of rosti, it runs in 1.18s.
Query:
Before change:
Execute: 1.78s
After change:
Execute: 915.35ms