Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generational parser does not seem to handle GC Cause included in young event. #352

Open
Bluetopia opened this issue May 1, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@Bluetopia
Copy link

Describe the bug
When processing a line like:
5.374: [GC (Allocation Failure) [PSYoungGen: 262144K->28340K(305664K)] 262144K->28364K(1005056K), 0.0146665 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
GenerationalHeapParser.psYoungGen(GCLogTrace, String) is called and the offset (5) appears to not take into account the cause, resulting in:

May 01, 2024 10:03:19 AM com.microsoft.gctoolkit.parser.GCLogTrace getOccupancyBeforeAfterWithMemoryPoolSizeSummary
WARNING: Unable to calculate generational memory pool summary.
vert.x-eventloop-thread-0, not implemented: 5.374: [GC (Allocation Failure) [PSYoungGen: 262144K->28340K(305664K)] 262144K->28364K(1005056K), 0.0146665 secs]2024-05-01 10:03:19,806 [vert.x-eventloop-thread-0] INFO STDOUT: vert.x-eventloop-thread-0, not implemented: 5.374: [GC (Allocation Failure) [PSYoungGen: 262144K->28340K(305664K)] 262144K->28364K(1005056K), 0.0146665 secs]

with the following groups:

1: 5.374
2: null
3: null
4: 5.374
5: null
6: (Allocation Failure)
7: PSYoungGen
8: 262144
9: K
10: 28340
11: K
12: 305664
13: K
14: 262144
15: K
16: 28364
17: K
18: 1005056
19: K
20: 0.0146665

getOccupancyBeforeAfterWithMemoryPoolSizeSummary tries to convert null (5), "PSYoungGen" (7), and "K" to KByte values, which generates a NumberFormatException, and causing the notYetImplemented() logging.

To Reproduce
Steps to reproduce the behavior:

Expected behavior
GCToolkit is able to properly parse the GC event.

Desktop (please complete the following information):

  • OS: Windows 11
  • Java 11 (Amazon Corretto)
  • GCToolkit tip (April 30, 2024)

Additional context
This is an older log file from Java 8/CMS that likely won't come up as frequently anymore, so it may be a lower priority. It's not clear if this should be a feature request instead due to the fact this triggers "Not yet implemented" logging.

@Bluetopia
Copy link
Author

Similar issues with another file, slightly different in format.

218982.552: [GC [PSYoungGen: 463514K->98518K(531328K)] 2167820K->1805185K(2239360K), 0.0933556 secs]

parallel_simple_nostamps.zip

Sample file attached.

@Bluetopia
Copy link
Author

I may have addressed this locally by changing the indexes from 5 to 8 for trace.getOccupancyBeforeAfterWithMemoryPoolSizeSummary() and from 11 to 14 for getTotalOccupancyBeforeAfterWithTotalHeapPoolSizeSummary() in the following lines in GenerationalHeapParser (new values shown):
1581:collection.add(trace.getOccupancyBeforeAfterWithMemoryPoolSizeSummary(8), getTotalOccupancyBeforeAfterWithTotalHeapPoolSizeSummary(trace, 14));
1717: collection.add(trace.getOccupancyBeforeAfterWithMemoryPoolSizeSummary(8), getTotalOccupancyBeforeAfterWithTotalHeapPoolSizeSummary(trace, 14));

I need to do some work on my end around the CLA before I can contribute, but before I get carried away, I wanted to find out whether there are any expectations of consistency between parsers around the regex groups, and if so, where this information can be found.

It's one thing for me to say that these changes resolve my specific issues, and there are no failing test cases with the maven build, but if it goes against some design decisions, then there may be different changes that are required. Also, there may still be other event definitions that are impacted that my use cases don't trigger a "not implemented" message for that still remain.

@kcpeppe
Copy link
Collaborator

kcpeppe commented Jun 8, 2024

Before I get into some details, the log line you've posted is a PSYoung which is a parallel young gen collector. It is paired with the PSOld, the parallel old gen collector The ParNew collector is a young gen collector that is paired with the CMS collector. The concurrent mode failure is a single threaded Full GC
This is an odd error. We do have tests for this so I'm not sure how this was missed. Certainly @5 would give you an NPE. While that shouldn't prevent the file from being parsed but it will mess up the event being generated.

How this works: The regex should precisely match the target line and it should not match any other line.The test for the rules make use of an array of an array of log lines. Each group of log lines in the inner arrays should match 1 rule and 1 rule only. Each individual rule is run over the entire array of arrays to ensure that it capture it's set of log lines (maybe a set of size 1) and that it doesn't capture any other set of log lines. I suspect that the log that you've pushed contains log lines that are missing from that test.

There is another set of tests that will take a GC log segment and ensure that the generated event contains the expected values. I would say that these log lines are missing from this test also. So a test failure that also should be corrected. I'll file a task

@jlittle-ptc
Copy link

Thanks Kirk. (Switching to my work account, as this is where I'll be making the changes)

I've run the tests in GenerationalHeapParserTest, but it appears that this specific case is missing. All test cases pass with or without my changes. Because of this, I've added a test case for some lines from the files I'm using, which can be seen in the difference here: main...jlittle-ptc:gctoolkit:gctoolkit-352

Also, I did some before/after runs (with debug on) for some of the files in the gclogs under preunified\ps\details and noticed the same sort of messages occurring there. The following files generate "not implemented" messages, which are not exposed by the current test suite (all tests pass):

  • 1047_2.log: 85970.989: [GC [PSYoungGen: 96879K->5527K(100480K)] 236639K->156157K(1032576K)
  • memoryleak2.log: 2011-08-07T18:30:17.711+0100: 15.973: [GC [PSYoungGen: 524160K->0K(524224K)] 2649456K->2133504K(20971456K), 0.0742930 secs]
  • 932_def.par.wd.nt.log: 0.880: [GC [PSYoungGen: 12288K->1648K(14336K)] 12288K->1648K(47104K), 0.0093244 secs]

While I haven't tested every file, I have confirmed that the not implemented lines no longer occur for the above 3 files with my changes.

I did notice that the way the tests are checking for parsed lines, there doesn't seem to be a way to check the changes for the GC Failure. The event doesn't get added to the event list, though it does get parsed successfully (NotImplemented error on Debug before, no error after) I have some questions about detecting errors that I'll post in a discussion post so the knowledge isn't buried in a bugfix.

This is the first time I've contributed back to an OSS program, so hopefully you and your team don't mind a bit of hand-holding along the way. I can put together a PR for my changes above, but wanted to make sure everything meets your expectations before I do so.

@karianna karianna added the bug Something isn't working label Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants