Add blogpost on "monitoring FerretDB performance using Coroot" #4279

Open: wants to merge 14 commits into base: main

Fashander
Member

Description

Closes FerretDB/engineering#168.

Readiness checklist

  • I added/updated unit tests (and they pass).
  • I added/updated integration/compatibility tests (and they pass).
  • I added/updated comments and checked rendering.
  • I made spot refactorings.
  • I updated user documentation.
  • I ran task all, and it passed.
  • I ensured that the PR title is good enough for the changelog.
  • (for maintainers only) I set Reviewers (@FerretDB/core), Milestone (Next), Labels, Project and project's Sprint fields.
  • I marked all done items in this checklist.

@Fashander added the blog/marketing label (Marketing (and releases) blog posts) May 9, 2024
@Fashander added this to the v1.22.0 milestone May 9, 2024
@Fashander requested a review from a team May 9, 2024 05:04
@Fashander self-assigned this May 9, 2024
@Fashander enabled auto-merge (squash) May 9, 2024 05:04
Contributor

mergify bot commented May 9, 2024

Marketing blog posts should be reviewed by @ptrfarkas and @AlekSi.


codecov bot commented May 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.76%. Comparing base (798e667) to head (a4dc206).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4279      +/-   ##
==========================================
- Coverage   75.26%   74.76%   -0.50%     
==========================================
  Files         326      326              
  Lines       22471    22471              
==========================================
- Hits        16913    16801     -112     
- Misses       4337     4436      +99     
- Partials     1221     1234      +13     

see 22 files with indirect coverage changes

Flag Coverage Δ
filter-false ?
filter-true 68.20% <ø> (-0.77%) ⬇️
hana-1 13.96% <ø> (-0.02%) ⬇️
integration 68.20% <ø> (-0.81%) ⬇️
mongodb-1 5.09% <ø> (-0.01%) ⬇️
mysql-1 ?
mysql-2 ?
mysql-3 ?
postgresql-1 44.61% <ø> (-8.52%) ⬇️
postgresql-2 44.77% <ø> (-9.65%) ⬇️
postgresql-3 41.48% <ø> (-12.02%) ⬇️
postgresql-4 41.07% <ø> (ø)
postgresql-5 43.24% <ø> (-0.02%) ⬇️
sqlite-1 43.72% <ø> (-8.37%) ⬇️
sqlite-2 44.08% <ø> (-9.38%) ⬇️
sqlite-3 40.72% <ø> (-11.95%) ⬇️
sqlite-4 40.42% <ø> (-0.06%) ⬇️
sqlite-5 42.37% <ø> (-0.02%) ⬇️
unit 33.11% <ø> (ø)

Flags with carried forward coverage won't be shown.

@Fashander added the trust label (PRs that can access Actions secrets) May 9, 2024
Comment on lines 13 to 19
Effective real-time monitoring is a critical aspect of any infrastructure.
[Coroot](https://coroot.com/) is an open source observability platform that can provide real-time monitoring and visibility into a [FerretDB](https://www.ferretdb.com/) setup.

<!--truncate-->

Effective real-time monitoring is a critical aspect of any infrastructure.
[Coroot](https://coroot.com/) is an open source observability platform that can provide real-time monitoring and visibility into a [FerretDB](https://www.ferretdb.com/) setup.
Member

Why do we repeat it twice?

[Screenshot attached: CleanShot 2024-05-14]

Member

This one wasn't solved

## Setting up Coroot for FerretDB monitoring

Since Coroot uses eBPF, you need the right environment before setting it up.
The most recent versions of the Linux kernel (v 4.16 and above) should be compatible since they offer at least minimal eBPF support.
Member

Suggested change
The most recent versions of the Linux kernel (v 4.16 and above) should be compatible since they offer at least minimal eBPF support.
The most recent versions of the Linux kernel (v4.16 and above) should be compatible since they offer at least minimal eBPF support.
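A quick way to check a host against the v4.16 minimum stated in the draft is to compare the running kernel's release string. The sketch below is a hypothetical helper (`parse_kernel_version` and `kernel_meets_minimum` are illustrative names, not part of Coroot or FerretDB); it only compares version numbers and assumes a Linux release string such as `5.15.0-105-generic`. Actual eBPF feature availability also depends on how the kernel was configured, so Coroot's own requirements documentation remains the authority.

```python
import platform
import re

# Minimum (major, minor) kernel version, taken from the draft's "v4.16 and above" claim.
MIN_KERNEL = (4, 16)


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a Linux release string such as '5.15.0-105-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_minimum(release: str, minimum: tuple[int, int] = MIN_KERNEL) -> bool:
    """Return True if the release's (major, minor) is at least `minimum`."""
    # Tuple comparison is lexicographic, so (5, 4) >= (4, 16) and (4, 15) < (4, 16).
    return parse_kernel_version(release) >= minimum


if __name__ == "__main__":
    if platform.system() != "Linux":
        print("This check only applies to Linux hosts.")
    else:
        release = platform.release()
        status = "OK" if kernel_meets_minimum(release) else "too old"
        print(f"kernel {release}: {status} (need >= {MIN_KERNEL[0]}.{MIN_KERNEL[1]})")
```

Comparing integer tuples rather than raw strings avoids the trap where "4.9" would sort after "4.16" as text.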

Comment on lines 137 to 140
The Coroot dashboard provides the full details on all components.

At first glance, we can see a memory leak on the `ferretdb` and `postgres` databases.
That suggests that allocated memory is not being efficiently reused or deallocated, causing the total memory usage to grow progressively as the services operate.
Member

… so we want to publish a blog post that casually mentions that FerretDB and PostgreSQL have memory leaks just like that?

Comment on lines 165 to 167
Using distributed tracing, Coroot provides a heat map showing operation requests, their status, durations, and details.

![Latency](/img/blog/ferretdb-coroot/07-latency.png)
Member

Why do we write something that has nothing to do with FerretDB?

Comment on lines 169 to 171
The above image shows how response time for the `ferretdb` increased progressively over time.
It shows that the system takes a long time to handle queries.
That should prompt us to take additional measures to improve performance.
Member

wat

@Fashander requested a review from AlekSi June 3, 2024 09:59
@AlekSi had a problem deploying to cloudflare-dev-blog June 3, 2024 14:38 with GitHub Actions (Failure)
Member

@AlekSi left a comment

Build fails. Not much I could review.

website/static/img/blog/.DS_Store (outdated, resolved)
You also get CPU, memory, storage, network, and log management metrics.

To get started with FerretDB, [see our documentation](https://docs.ferretdb.io/).
And if you want to contact the team for help or have any questions, [contact us on Slack](https://join.slack.com/t/ferretdb/shared_invite/zt-zqe9hj8g-ZcMG3~5Cs5u9uuOPnZB8~A).
Member

As we have discussed many times before, we should not use that link because it may change. We should link to the community section in our docs or in the README.

Member

Use `kebab-case-with-dashes` instead of `snake_case_with_underscores` or spaces

Alex, it is your responsibility to enforce that guide. And you are not even following it yourself.

@AlekSi added and removed the not ready label (Issues that are not ready to be worked on; PRs that should skip CI) Jun 4, 2024
Member

@noisersup left a comment

Some suggestions from me, apart from that looks great!


Setting up effective monitoring for your entire system can be resource-intensive, time-consuming, and expensive.
Imagine a scenario where your FerretDB instance is deployed as a Docker container along with other components and services.
Observability into all critical areas of the system in as little time is vital.
Member

I believe the whole intro needs a short explanation. Why can typical monitoring be resource-intensive, time-consuming, and expensive?

We could potentially expand the "Imagine a scenario..." section to show the potential issues in practice, and then move on to how relying on eBPF solves them :)

If I can help in any way, we could sync on this one.

Member Author

PTAL

…oot.md

Co-authored-by: Patryk Kwiatek <patryk@kwiatek.xyz>
Labels
blog/marketing: Marketing (and releases) blog posts
trust: PRs that can access Actions secrets
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

None yet

3 participants