Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cloud Security] Refactor D4C metering function #183814

Merged
merged 18 commits into from
May 30, 2024

Conversation

CohenIdo
Copy link
Contributor

@CohenIdo CohenIdo commented May 20, 2024

Summary

  • Refactor the Cloud Defend metering to sample each agent separately and do so once per hour.

@CohenIdo CohenIdo added release_note:skip Skip the PR/issue when compiling release notes Team:Cloud Security Cloud Security team related v8.14.0 labels May 20, 2024
@CohenIdo CohenIdo requested a review from a team as a code owner May 20, 2024 10:45
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-cloud-security-posture (Team:Cloud Security)

@CohenIdo CohenIdo marked this pull request as draft May 20, 2024 10:45
@CohenIdo CohenIdo added the ci:project-deploy-security Create a Security Project label May 20, 2024
@CohenIdo
Copy link
Contributor Author

@elasticmachine merge upstream

@CohenIdo CohenIdo marked this pull request as ready for review May 20, 2024 10:49
@CohenIdo CohenIdo added backport:skip This commit does not require backporting and removed v8.14.0 labels May 20, 2024
@CohenIdo
Copy link
Contributor Author

@elasticmachine merge upstream

@CohenIdo CohenIdo marked this pull request as draft May 22, 2024 05:13
@CohenIdo
Copy link
Contributor Author

@elasticmachine merge upstream

@CohenIdo CohenIdo requested a review from kfirpeled May 22, 2024 11:53
@CohenIdo CohenIdo requested a review from joeypoon May 22, 2024 11:54
@CohenIdo CohenIdo marked this pull request as ready for review May 22, 2024 11:54
@CohenIdo CohenIdo changed the title [Cloud Security] update d4c metering function [Cloud Security] Refactor D4C metering function May 22, 2024
{
index: CLOUD_DEFEND_HEARTBEAT_INDEX,
sort: 'event.ingested',
query: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need an explicit size param here otherwise it will only pull 10 docs per run and miss the other heartbeats.

@@ -25,20 +26,31 @@ export const cloudSecurityMetringCallback = async ({
const tier: Tier = getCloudProductTier(config, logger);

try {
const cloudSecuritySolutions: CloudSecuritySolutions[] = [CSPM, KSPM, CNVM, CLOUD_DEFEND];
const cloudSecuritySolutions: CloudSecuritySolutions[] = [CLOUD_DEFEND];

const promiseResults = await Promise.allSettled(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to additionally return a shouldRunAgain and latestTimestamp along with the records so that if there are more heartbeat documents, the task will run again immediately to pick up the rest of the heartbeats.

Copy link
Contributor Author

@CohenIdo CohenIdo May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeypoon It's great input,
here's how I addressed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your approach makes sense to me so I went ahead and approved. Something to keep in mind with this approach is that if there is a very large amount of heartbeats the task might potentially run into the 1 minute timeout.

@CohenIdo CohenIdo requested a review from joeypoon May 23, 2024 14:29
@kfirpeled kfirpeled requested a review from eyalkraft May 23, 2024 15:07
@CohenIdo
Copy link
Contributor Author

@elasticmachine merge upstream

};
}

const buildMeteringRecord = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const buildMeteringRecord = (
const buildUsageRecord = (

);
};

export const getCloudDefendUsageRecord = async ({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export const getCloudDefendUsageRecord = async ({
export const getCloudDefendUsageRecords = async ({

if (usageRecords.hits.hits.length < BATCH_SIZE) {
fetchMore = false;
} else {
searchAfter = usageRecords.hits.hits[usageRecords.hits.hits.length - 1].sort;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this return the latest timestamp from the previous batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep:)

timestamp.setMilliseconds(0);

const usageRecord = {
id: `container-defend-${agentId}-${timestamp.toISOString()}`,
Copy link
Contributor

@eyalkraft eyalkraft May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understood correctly - we do create a usage, record record per heartbeat, but we trust the id uniqueness to squash them into a single record every half an hour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent sends a heartbeat every 30 minutes. We want to count records with an hourly granularity, so every 2 records will have the same record ID. Later, in the glue function, they will be merged into one.

@CohenIdo
Copy link
Contributor Author

@elasticmachine merge upstream

@@ -153,7 +153,7 @@ export class SecurityUsageReportingTask {
return { state: taskInstance.state, runAt: new Date() };
}

this.logger.debug(`received usage records: ${JSON.stringify(usageRecords)}`);
this.logger.warn(`received usage records: ${JSON.stringify(usageRecords)}`); // TODO: change to info before merge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CohenIdo this is still warn

@kibana-ci
Copy link
Collaborator

kibana-ci commented May 29, 2024

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Canvas Sharable Runtime

The Canvas "shareable runtime" is an bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed canvas and therefor should be as slim as possible.

id before after diff
module count - 5412 +5412
total size - 8.8MB +8.8MB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@CohenIdo CohenIdo merged commit fb12588 into elastic:main May 30, 2024
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting ci:project-deploy-security Create a Security Project release_note:skip Skip the PR/issue when compiling release notes Team:Cloud Security Cloud Security team related v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants