Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write speed drops by 10x after a few million row inserts unless database is restarted or flushed #25848

Open
moaazassali opened this issue May 19, 2024 · 4 comments
Labels
performance Performance-related questions

Comments

@moaazassali
Copy link

moaazassali commented May 19, 2024

Performance Issue

Write speed drops down from around 12M/s to 1.2M/s on my server after a few million rows are inserted.

Problem Description

I made a simple benchmark that uses the c# tdengine client library. I send 1000 batches across 1000 devices (1 batch per device) with 3000 rows per batch.

1- First run with 1000 batches (3M rows): write speed = ~12M/s
2- Second run with 1000 batches again: write speed = ~12M/s
3- Third run with 1000 batches: write speed = ~1-1.5M/s
4- Fourth run: ~1-1.5M/s
5- Fifth run: ~1-1.5M/s
6- Sixth run (random increase): ~5M/s
7- Seventh run (back to slow speed): ~1-1.5M/s
8- Eighth run: ~1-1.5M/s
9- Ninth run: ~1-1.5M/s
and so on...

None of the rows were deleted and I just kept adding more rows with new unique timestamps, so just purely new insertions are happening.
I tried up to 20 runs or so, and I also found that very few of them (just like the 6th run) randomly jumped to 3M/s or 5M/s sometimes.

But if I try restarting TDengine or running flush <db>; after the 3rd run, the write speed goes back up to 12M/s:
[3 runs have already happened before restart, so 9M rows already exist]
1- First run after restart: ~12M/s
2- Second run after restart: ~12M/s
3- Third run after restart: ~1-1.5M/s

I have then tried restarting multiple times after the 3rd benchmark run and the results were always the same: first two runs are fast, and third run is slower by 10 times.

Environment (please complete the following information):

  • OS: Ubuntu 22.04
  • Server specs: AMD 2400G, 8GB DDR4 RAM, 512GB 7200rpm HDD
  • TDengine Version: 3.3.0.0

Additional Context

This inconsistency with the write speed after inserting a couple million rows doesn't happen with other databases like ClickHouse.

Therefore, we can rule out that the issue is within my benchmark code.

We can also safely rule out that the issue is within the taos-dotnet-connector library since the write speed issue is seemingly fixed by restarting/flushing the database after around 2 batches (6M rows) have been inserted

I am using parameter binding for insertions (not SQL strings) using BindRow() function in the c# library to ensure high write speed. Using SQL strings will make it even slower anyways.

I am also not running out of RAM. My server's memory usage never exceeded 3GB while running the benchmark.

Some things I already tried

Changing the DURATION parameter to 1 day and making sure every batch is only inserting into 1 unique day per device, but that didn't change anything.

@moaazassali moaazassali added the performance Performance-related questions label May 19, 2024
@moaazassali moaazassali changed the title Write speed drops by 10x after a few million row inserts unless database is restarted Write speed drops by 10x after a few million row inserts unless database is restarted or flushed May 19, 2024
@yu285
Copy link
Contributor

yu285 commented May 20, 2024

May I have your "show create database xxxx\G".

And the "describe tablename".

I need to get your database configuration and table structure.

@yu285
Copy link
Contributor

yu285 commented May 20, 2024

in the first time the speed went down , could you provide the capture of "top -Hp pidof taosd" ? we can check the thread workload during the problem.

@moaazassali
Copy link
Author

moaazassali commented May 20, 2024

show create database test\G (this is using the default parameters except for DURATION=1d and VGROUPS=8):

Database: test
Create Database: CREATE DATABASE `test` BUFFER 256 CACHESIZE 1 CACHEMODEL 'none' COMP 2 DURATION 1440m WAL_FSYNC_PERIOD 3000 MAXROWS 4096 MINROWS 100 STT_TRIGGER 1 KEEP 5256000m,5256000m,5256000m PAGES 256 PAGESIZE 4 PRECISION 'ms' REPLICA 1 WAL_LEVEL 1 VGROUPS 8 SINGLE_STABLE 0 TABLE_PREFIX 0 TABLE_SUFFIX 0 TSDB_PAGESIZE 4 WAL_RETENTION_PERIOD 3600 WAL_RETENTION_SIZE 0 KEEP_TIME_OFFSET 0 ENCRYPT_ALGORITHM 'none' S3_CHUNKSIZE 262144 S3_KEEPLOCAL 5256000m S3_COMPACT 0

describe test.devices:

             field              |          type          |   length    |        note        |     encode     |    compress    |     level      |
================================================================================================================================================
 ts                             | TIMESTAMP              |           8 |                    | delta-i        | lz4            | medium         |
 temperature                    | INT                    |           4 |                    | simple8b       | lz4            | medium         |
 pressure                       | DOUBLE                 |           8 |                    | delta-d        | lz4            | medium         |
 humidity                       | BIGINT                 |           8 |                    | simple8b       | lz4            | medium         |
 device_id                      | VARCHAR                |          10 | TAG                | disabled       | disabled       | disabled       |

top -Hp 341534 of first run with write speed of 11.6M/s (not full list of threads but sorted by highest CPU%):

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 341760 root      20   0 8548344 408336  18568 S   3.3   5.8   0:00.10 vnode-write
 341844 root      20   0 8548344 408336  18568 S   3.3   5.8   0:00.10 vnode-write
 341845 root      20   0 8548344 408336  18568 S   3.3   5.8   0:00.10 vnode-write
 341758 root      20   0 8548344 408336  18568 S   3.0   5.8   0:00.09 vnode-write
 341840 root      20   0 8548344 408336  18568 S   3.0   5.8   0:00.09 vnode-write
 341756 root      20   0 8548344 408336  18568 S   2.7   5.8   0:00.08 vnode-write
 341761 root      20   0 8548344 408336  18568 S   2.7   5.8   0:00.08 vnode-write
 341839 root      20   0 8548344 408336  18568 S   2.7   5.8   0:00.08 vnode-write
 341536 root      20   0 8548344 408336  18568 S   2.0   5.8   0:00.08 trans-svr-work
 341537 root      20   0 8548344 408336  18568 S   1.7   5.8   0:00.12 trans-svr-work
 341539 root      20   0 8548344 408336  18568 S   1.7   5.8   0:00.08 trans-svr-work
 341538 root      20   0 8548344 408336  18568 S   1.3   5.8   0:00.07 trans-svr-work
 341535 root      20   0 8548344 408336  18568 S   0.3   5.8   0:00.32 log
 341681 root      20   0 8548344 408336  18568 S   0.3   5.8   0:00.01 vnode-fetch
 341703 root      20   0 8548344 408336  18568 S   0.3   5.8   0:00.01 vnode-fetch
 341721 root      20   0 8548344 408336  18568 S   0.3   5.8   0:00.01 vnode-fetch
 341534 root      20   0 8548344 408336  18568 S   0.0   5.8   0:00.09 taosd
 341540 root      20   0 8548344 408336  18568 S   0.0   5.8   0:00.00 trans-accept
 341541 root      20   0 8548344 408336  18568 S   0.0   5.8   0:00.00 dnode-cli
 341542 root      20   0 8548344 408336  18568 S   0.0   5.8   0:00.00 dnode-cli
 341543 root      20   0 8548344 408336  18568 S   0.0   5.8   0:00.37 tmr-taskQ
 341544 root      20   0 8548344 408336  18568 S   0.0   5.8   0:01.63 tmr
 341545 root      20   0 8548344 408336  18568 S   0.0   5.8   0:00.04 dnode-sta-cli                                                             

top -Hp 341534 of second run with write speed of 12.0M/s:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 341758 root      20   0 8548236 750200  18568 S   3.7  10.7   0:00.20 vnode-write
 341844 root      20   0 8548236 750200  18568 S   3.7  10.7   0:00.21 vnode-write
 341845 root      20   0 8548236 750200  18568 S   3.7  10.7   0:00.21 vnode-write
 341756 root      20   0 8548236 750200  18568 S   3.3  10.7   0:00.18 vnode-write
 341760 root      20   0 8548236 750200  18568 S   3.3  10.7   0:00.20 vnode-write
 341839 root      20   0 8548236 750200  18568 S   3.3  10.7   0:00.18 vnode-write
 341761 root      20   0 8548236 750200  18568 S   3.0  10.7   0:00.17 vnode-write
 341840 root      20   0 8548236 750200  18568 S   3.0  10.7   0:00.18 vnode-write
 341536 root      20   0 8548236 750200  18568 S   1.7  10.7   0:00.17 trans-svr-work
 341537 root      20   0 8548236 750200  18568 S   1.3  10.7   0:00.24 trans-svr-work
 341539 root      20   0 8548236 750200  18568 S   1.3  10.7   0:00.15 trans-svr-work
 341538 root      20   0 8548236 750200  18568 S   1.0  10.7   0:00.15 trans-svr-work
 341544 root      20   0 8548236 750200  18568 S   0.3  10.7   0:03.04 tmr
 341534 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.17 taosd
 341535 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.59 log
 341540 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.00 trans-accept
 341541 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.01 dnode-cli
 341542 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.01 dnode-cli
 341543 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.70 tmr-taskQ
 341545 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.09 dnode-sta-cli
 341546 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.00 dnode-sync-cli
 341547 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.00 dnode-sync-cli
 341548 root      20   0 8548236 750200  18568 S   0.0  10.7   0:00.00 dnode-mgmt                         

top -Hp 341534 of third run with write speed of 1.9M/s:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 342432 root      20   0 8564628 998.3m  18692 D  12.0  14.6   0:00.36 vnode-commit
 342433 root      20   0 8564628 998.3m  18692 D  12.0  14.6   0:00.36 vnode-commit
 342434 root      20   0 8564628 998.3m  18692 D  10.0  14.6   0:00.30 vnode-commit
 342435 root      20   0 8564628 998.3m  18692 D   8.3  14.6   0:00.25 vnode-commit
 341840 root      20   0 8564628 998.3m  18692 S   4.7  14.6   0:00.32 vnode-write
 341844 root      20   0 8564628 998.3m  18692 S   4.3  14.6   0:00.34 vnode-write
 341758 root      20   0 8564628 998.3m  18692 S   4.0  14.6   0:00.32 vnode-write
 341845 root      20   0 8564628 998.3m  18692 S   4.0  14.6   0:00.33 vnode-write
 341756 root      20   0 8564628 998.3m  18692 S   3.7  14.6   0:00.29 vnode-write
 341760 root      20   0 8564628 998.3m  18692 S   3.7  14.6   0:00.31 vnode-write
 341839 root      20   0 8564628 998.3m  18692 S   3.7  14.6   0:00.29 vnode-write
 341761 root      20   0 8564628 998.3m  18692 S   3.0  14.6   0:00.26 vnode-write
 341537 root      20   0 8564628 998.3m  18692 S   1.7  14.6   0:00.32 trans-svr-work
 341536 root      20   0 8564628 998.3m  18692 S   1.3  14.6   0:00.24 trans-svr-work
 341539 root      20   0 8564628 998.3m  18692 S   1.3  14.6   0:00.23 trans-svr-work
 341538 root      20   0 8564628 998.3m  18692 S   1.0  14.6   0:00.21 trans-svr-work
 341721 root      20   0 8564628 998.3m  18692 S   0.3  14.6   0:00.04 vnode-fetch
 341534 root      20   0 8564628 998.3m  18692 S   0.0  14.6   0:00.18 taosd
 341535 root      20   0 8564628 998.3m  18692 S   0.0  14.6   0:00.65 log
 341540 root      20   0 8564628 998.3m  18692 S   0.0  14.6   0:00.00 trans-accept
 341541 root      20   0 8564628 998.3m  18692 S   0.0  14.6   0:00.01 dnode-cli
 341542 root      20   0 8564628 998.3m  18692 S   0.0  14.6   0:00.01 dnode-cli
 341543 root      20   0 8564628 998.3m  18692 S   0.0  14.6   0:00.77 tmr-taskQ    

It seems that the difference is the vnode-commit threads in the third run.

Fourth run had a write speed of 7.3M/s with only one vnode-commit thread with CPU usage of 11.7%.
Fifth run had a write speed of 3.4M/s with two vnode-commit threads with 11.6% and 11.0% CPU usage.

Note that the default frequency of top is 3 seconds, so the numbers aren't very accurate especially since the first two runs are done in like 0.25s. Nonetheless, they show the relevant differences.

@moaazassali
Copy link
Author

@yu285 any updates on this performance issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance-related questions
Projects
None yet
Development

No branches or pull requests

2 participants