perf: merge parquet files #3545

Merged: 9 commits into main on May 20, 2024

taimingl (Collaborator)

Load small parquet files into memory and combine them into a single merged RecordBatch that is sorted directly, instead of querying DataFusion to do the same merge and sort.

Benchmark: merging 6 parquet files, each with 20 rows by 30k columns.

Before
openobserve::service::compact::merge: merge_parquet_files took: 35.473
openobserve::service::compact::merge: merge_parquet_files took: 37.249
openobserve::service::compact::merge: merge_parquet_files took: 38.337
openobserve::service::compact::merge: merge_parquet_files took: 38.759
openobserve::service::compact::merge: merge_parquet_files took: 38.920

After
openobserve::service::compact::merge: merge_parquet_files took 3.399
openobserve::service::compact::merge: merge_parquet_files took 3.781
openobserve::service::compact::merge: merge_parquet_files took 4.017
openobserve::service::compact::merge: merge_parquet_files took 4.162
openobserve::service::compact::merge: merge_parquet_files took 4.441
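
The approach described above (concatenate the small files in memory, then sort the merged batch directly) can be sketched roughly as follows. This is a std-only illustration and not the code from this PR: the `Batch` struct is a hypothetical stand-in for an Arrow RecordBatch, the two-column layout and ascending timestamp sort key are assumptions, and all function names are invented for the example. With the Arrow crate, the same three steps would roughly correspond to concatenating the batches, computing sort indices, and taking rows by those indices.

```rust
// Hypothetical stand-in for an Arrow RecordBatch: data is stored column-wise,
// with one timestamp column and one payload column.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    timestamps: Vec<i64>,
    messages: Vec<String>,
}

/// Step 1: concatenate the columns of every small batch into one in-memory batch.
fn concat_batches(batches: &[Batch]) -> Batch {
    let total: usize = batches.iter().map(|b| b.timestamps.len()).sum();
    let mut merged = Batch {
        timestamps: Vec::with_capacity(total),
        messages: Vec::with_capacity(total),
    };
    for b in batches {
        merged.timestamps.extend_from_slice(&b.timestamps);
        merged.messages.extend_from_slice(&b.messages);
    }
    merged
}

/// Step 2: compute the row permutation that orders the batch by timestamp.
/// Ascending order is an assumption made for this sketch.
fn sort_indices(timestamps: &[i64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..timestamps.len()).collect();
    // sort_by_key is stable, so rows with equal timestamps keep their order.
    idx.sort_by_key(|&i| timestamps[i]);
    idx
}

/// Step 3: apply the permutation to every column.
fn take(batch: &Batch, indices: &[usize]) -> Batch {
    Batch {
        timestamps: indices.iter().map(|&i| batch.timestamps[i]).collect(),
        messages: indices.iter().map(|&i| batch.messages[i].clone()).collect(),
    }
}

/// Merge and sort entirely in memory, without going through a query engine.
fn merge_sorted(batches: &[Batch]) -> Batch {
    let merged = concat_batches(batches);
    let indices = sort_indices(&merged.timestamps);
    take(&merged, &indices)
}

fn main() {
    let a = Batch {
        timestamps: vec![3, 1],
        messages: vec!["c".to_string(), "a".to_string()],
    };
    let b = Batch {
        timestamps: vec![2, 4],
        messages: vec!["b".to_string(), "d".to_string()],
    };
    let out = merge_sorted(&[a, b]);
    println!("{:?}", out);
}
```

Sorting an index vector once and reordering each column by it keeps the cost of the sort independent of the number of columns, which matters for very wide inputs like the 30k-column files in the benchmark.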

@taimingl changed the title from "Perf: merge parquet files w/o dtfusionmain" to "perf: merge parquet files w/o dtfusionmain" (May 20, 2024)
@taimingl changed the title from "perf: merge parquet files w/o dtfusionmain" to "perf: merge parquet files without DataFusion" (May 20, 2024)
@taimingl changed the title from "perf: merge parquet files without DataFusion" to "perf: merge parquet files" (May 20, 2024)
@taimingl marked this pull request as ready for review (May 20, 2024 15:11)
Review thread on src/config/Cargo.toml (outdated, resolved)
@taimingl merged commit 4b5a708 into main (May 20, 2024)
28 of 30 checks passed
@taimingl deleted the perf/compactor-merge-files-wo-dtfusionmain branch (May 20, 2024 16:26)