Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce pyroscope integration for continuous profiling on demand #4254

Merged
merged 34 commits into from
May 29, 2024

Conversation

KShivendu
Copy link
Member

@KShivendu KShivendu commented May 17, 2024

This will allow us to understand the bottlenecks (through chaos testing) and debug customer nodes when required.

Steps to run:

  • Prepare yaml config
    # config/pyroscope.yaml
    debugger:
      pyroscope:
        url: http://localhost:4040
        identifier: yaml
        sampling_rate: 100
  • Run these commands:
    cargo build
    docker run --rm --name pyroscope --network=host grafana/pyroscope:latest
    ./target/debug/qdrant --config-path config/pyroscope.yaml
  • Open localhost:4040 in browser
  • If required, pass process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="qdrant"} as the query to see only Qdrant profiling results

Alternatively you may call the API:

# enable/alter:
PATCH /debugger
{
   "pyroscope": {
      "url": "http://localhost:4040",
      "identifier": "api",
      "sampling_rate": 100
    }
}
# disable:
PATCH /debugger
{
   "pyroscope": null
}

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@KShivendu KShivendu changed the title Draft: Introduce pyroscope integration for continuous profiling on demand WIP: Introduce pyroscope integration for continuous profiling on demand May 17, 2024
@generall generall marked this pull request as draft May 17, 2024 11:05
@KShivendu
Copy link
Member Author

I personally don't like the idea of attaching PyroscopeState to actix server. It should be independent. Wdyt?

Also, it looks like the pyroscope server sometimes fails to shutdown if I call any API (say /debug) before it's completely initiated. Any suggestions to debug the same?

@KShivendu KShivendu marked this pull request as ready for review May 21, 2024 14:51
@KShivendu KShivendu requested a review from generall May 21, 2024 14:53
@KShivendu KShivendu changed the title WIP: Introduce pyroscope integration for continuous profiling on demand Introduce pyroscope integration for continuous profiling on demand May 21, 2024
Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help if you could also include an instruction of how to enable profiler and see the actual flamegraph

src/actix/api/debug_api.rs Outdated Show resolved Hide resolved
src/actix/api/debug_api.rs Outdated Show resolved Hide resolved
src/actix/api/debug_api.rs Outdated Show resolved Hide resolved
src/common/pyroscope_state.rs Show resolved Hide resolved
src/common/pyroscope_state.rs Outdated Show resolved Hide resolved
src/common/pyroscope_state.rs Outdated Show resolved Hide resolved
src/common/pyroscope_state.rs Outdated Show resolved Hide resolved
src/common/pyroscope_state.rs Outdated Show resolved Hide resolved
src/common/pyroscope_state.rs Outdated Show resolved Hide resolved
src/settings.rs Outdated Show resolved Hide resolved
@generall
Copy link
Member

When I try to run with config in settings, i get this:


2024-05-21T23:16:51.102375Z ERROR qdrant::startup: Panic backtrace: 
   0: qdrant::startup::setup_panic_hook::{{closure}}
             at ./src/startup.rs:19:25
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2034:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:783:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:649:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:171:18
   5: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   6: core::panicking::panic_nounwind_fmt::runtime
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:110:18
   7: core::panicking::panic_nounwind_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:123:9
   8: core::panicking::panic_nounwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:156:5
   9: core::slice::raw::from_raw_parts::precondition_check
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/intrinsics.rs:2799:21
  10: core::slice::raw::from_raw_parts
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/slice/raw.rs:98:9
  11: <pprof::collector::TempFdArrayIterator<T> as core::iter::traits::iterator::Iterator>::next
             at /home/generall/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pprof-0.12.1/src/collector.rs:225:26
  12: core::iter::traits::iterator::Iterator::fold
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/traits/iterator.rs:2586:29
  13: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/adapters/chain.rs:93:19
  14: core::iter::traits::iterator::Iterator::for_each
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/traits/iterator.rs:817:9
  15: pprof::report::ReportBuilder::build
             at /home/generall/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pprof-0.12.1/src/report.rs:110:17
  16: pyroscope_pprofrs::Pprof::dump_report
             at /home/generall/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope_pprofrs-0.2.7/src/lib.rs:202:22
  17: <pyroscope_pprofrs::Pprof as pyroscope::backend::backend::Backend>::report
             at /home/generall/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope_pprofrs-0.2.7/src/lib.rs:159:9
  18: pyroscope::pyroscope::PyroscopeAgent<pyroscope::pyroscope::PyroscopeAgentReady>::start::{{closure}}
             at /home/generall/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope-0.5.7/src/pyroscope.rs:676:38
  19: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  20: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:528:17
  21: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
  22: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  23: __rust_try
  24: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  25: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  26: std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:527:30
  27: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  28: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
  29: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
  30: std::sys::pal::unix::thread::Thread::new::thread_start
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys/pal/unix/thread.rs:108:17
  31: <unknown>
  32: <unknown>
    
2024-05-21T23:16:51.102457Z ERROR qdrant::startup: Panic occurred in file library/core/src/panicking.rs at line 156: unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`    

Which doesn't look too good. And it is also not related to this: grafana/pyroscope-rs#165, which is also doesn't look good.

src/common/debug.rs Outdated Show resolved Hide resolved
src/common/debug.rs Outdated Show resolved Hide resolved
@KShivendu
Copy link
Member Author

KShivendu commented May 27, 2024

TLS tests seem to be failing because we already use debug: true in tls_config.yaml (But we just (re)defined debug).

See this. Is this debug actually doing something? Because it isn't a part of the current Settings struct.

src/common/debug.rs Outdated Show resolved Hide resolved
@timvisee
Copy link
Member

timvisee commented May 27, 2024

TLS tests seem to be failing bcecause we already use debug: true in tls_config.yaml (But we just (re)defined debug).

See this. Is this debug actually doing something? Because it isn't a part of the current Settings struct.

IIRC we used it before, but deprecated it some time ago. That would mean you can safely remove it.

One problem though is that users might still have debug: true in their config. I think we should try to prevent a crash on start if this is the case, to stay compatible with old configuration files.

@timvisee
Copy link
Member

timvisee commented May 27, 2024

Sadly, this still crashes on my machine when enabling pyroscope through PATCH /debug:

2024-05-27T08:14:57.765853Z  INFO actix_web::middleware::logger: 127.0.0.1 "PATCH /debug HTTP/1.1" 200 53 "http://localhost:6333/dashboard" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" 2.632543
2024-05-27T08:15:00.390173Z ERROR qdrant::startup: Panic backtrace:
   0: qdrant::startup::setup_panic_hook::{{closure}}
             at ./src/startup.rs:19:25
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2034:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:783:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:649:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:171:18
   5: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   6: core::panicking::panic_nounwind_fmt::runtime
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:110:18
   7: core::panicking::panic_nounwind_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:123:9
   8: core::panicking::panic_nounwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:156:5
   9: core::slice::raw::from_raw_parts::precondition_check
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/intrinsics.rs:2799:21
  10: core::slice::raw::from_raw_parts
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/slice/raw.rs:98:9
  11: <pprof::collector::TempFdArrayIterator<T> as core::iter::traits::iterator::Iterator>::next
             at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pprof-0.12.1/src/collector.rs:225:26
  12: core::iter::traits::iterator::Iterator::fold
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/traits/iterator.rs:2586:29
  13: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/adapters/chain.rs:93:19
  14: core::iter::traits::iterator::Iterator::for_each
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/iter/traits/iterator.rs:817:9
  15: pprof::report::ReportBuilder::build
             at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pprof-0.12.1/src/report.rs:110:17
  16: pyroscope_pprofrs::Pprof::dump_report
             at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope_pprofrs-0.2.7/src/lib.rs:202:22
  17: <pyroscope_pprofrs::Pprof as pyroscope::backend::backend::Backend>::report
             at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope_pprofrs-0.2.7/src/lib.rs:159:9
  18: pyroscope::pyroscope::PyroscopeAgent<pyroscope::pyroscope::PyroscopeAgentReady>::start::{{closure}}
             at /home/timvisee/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyroscope-0.5.7/src/pyroscope.rs:676:38
  19: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  20: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:528:17
  21: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
  22: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  23: __rust_try
  24: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  25: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  26: std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:527:30
  27: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  28: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
  29: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
  30: std::sys::pal::unix::thread::Thread::new::thread_start
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys/pal/unix/thread.rs:108:17
  31: start_thread
             at ./nptl/pthread_create.c:447:8
  32: __GI___clone3
             at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

2024-05-27T08:15:00.390249Z ERROR qdrant::startup: Panic occurred in file library/core/src/panicking.rs at line 156: unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX` 
2024-05-27T08:15:00.398996Z DEBUG reqwest::connect: starting new connection: https://staging-telemetry.qdrant.io/
thread caused non-unwinding panic. aborting.
fish: Job 1, 'cargo run $argv' terminated by signal SIGABRT (Abort)

Update: it only crashes when running the binary directly, it is fine when running through our default Docker image. I'm assuming some system library mismatch.

@KShivendu KShivendu requested a review from generall May 27, 2024 19:38
@KShivendu KShivendu merged commit 15de581 into dev May 29, 2024
18 checks passed
@KShivendu KShivendu deleted the feat/pyroscope-profiler branch May 29, 2024 11:27
timvisee added a commit that referenced this pull request Jun 11, 2024
* feat: Working pyroscope profiler integration

* fix: Remove UpdateDebugConfigResponse

* fix: Format

* fix: Simplify debug settings type

* fix: Remove /debug from API spec and clean pyrostate state

* fix: Update openapi.json

* refactor: Move models to suitable paths

* refactor: Improve API and pyroscope state structure

* fix: Format

* feat: Use null debug for pyroscope config

* fix: Pyroscope shouldnt be used and enabled in windows

* fix: Format

* fix: Remove pyroscope for macos

* fix: Add more compile conditions

* fix: Try to fix OS target issue

* feat: Move DebugConfig and Pyroscope to common::debug

* feat: Introduce and use DebugState instead of PyroscopeState

* feat: Simplify debug_api.rs and move logic to debug.rs

* fix: format

* fix: Use debug patch enum

* feat: Use parking lot Mutex

* fix: Propagate errors instead of panic

* fix: Format

* feat: Forward stop agent errors

* feat: Use enums

* fix: For non linux OS

* fix: Missing import for non linux os

* fix: Remove redundant logs

* feat: Take lock throughout the patch

* Don't unwrap pyroscope state, it may be None

* fix: Remove debug: true from tls config tests

* refactor: Rename debug setting to debugger

* refactor: Rename debug to debugger everywhere

* Improve logs

---------

Co-authored-by: timvisee <tim@visee.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants