Proposal for an Optimization Toggle for Real-time Mode to Reduce Memory Overhead Using RingBuffer #1963
Comments
Hi @floydchenv. There are a few questions here, so let me try to address each one.
First, I am not surprised that you see a lot of memory usage in GrowableArray when you enable high-verbosity events in a real-time session. This is most often because we must keep a certain amount of data around to support later analysis. For example, we must keep dynamic symbol information around if we're going to resolve symbols for jitted code in stacks. This is over and above the ring-buffer-style approach that is used to limit how much data is kept on hand. How many ETW events are kept around during a live session depends on how quickly you process the incoming events. If you process them more slowly than they arrive, you'll see committed size grow.
As a next step, I would be interested to understand which GrowableArray(s) are taking up most of the memory, and what percentage of the total process committed size these data structures represent.
On the two libraries: there isn't really a relationship between them other than that they both can parse ETW events. They grew up separately and are maintained by two different teams. I'm not super familiar with Microsoft.Windows.EventTracing, so I can't comment on how it compares to TraceEvent.
With regard to stack handling, the Sample event will contain the IP, but the event also has a stack associated with it. The stack was captured during collection by the kernel's stack walker and saved into the trace. You should not need to walk the stack explicitly. Depending on how the trace is collected (if stack compression is enabled), you may need to do the work to capture the stack and then match it to the event. Take a look at perfview/src/TraceEvent/TraceLog.cs Line 1569 in afd33a9
To parse the ContextSwitch event, here's a pointer to the code that TraceEvent uses to parse the payload:
Hope that helps.
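To make the point about processing rate concrete, here is a minimal toy model. It is not TraceEvent code: the function name `simulate_backlog` and the event rates are made up purely for illustration. It shows how the number of pending events keeps growing when events arrive faster than they are handled, and stays at zero when the consumer keeps up.

```python
def simulate_backlog(incoming_per_sec, processed_per_sec, seconds):
    """Return the number of unprocessed events after each second.

    Toy model: events arrive and are consumed at constant rates.
    The rates are hypothetical, not measured from any real ETW session.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += incoming_per_sec                  # new events arrive
        backlog -= min(backlog, processed_per_sec)   # consumer drains what it can
        history.append(backlog)
    return history


# Consumer slower than producer: the backlog grows every second.
slow_consumer = simulate_backlog(incoming_per_sec=50_000, processed_per_sec=40_000, seconds=5)
# Consumer faster than producer: the backlog stays empty.
fast_consumer = simulate_backlog(incoming_per_sec=50_000, processed_per_sec=60_000, seconds=5)

print(slow_consumer)
print(fast_consumer)
```

In the slow case the backlog grows by the rate difference each second, which matches the kind of steady committed-size growth described above.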
Hello PerfView Team,
I am working on a UE4 game project and currently utilizing the Microsoft.Windows.EventTracing library to parse stack information corresponding to samples and ContextSwitches recorded in ETL for the game process. My approach involves classifying stack information for each frame based on timestamps, ultimately yielding stack data for all threads in each frame. The results are as follows:
Currently, we use xperf to record the necessary performance data. Here is a snippet of the parsing code we use:
We are now looking to leverage ETW's Real-time mode for on-the-fly data recording and parsing. However, we've encountered a significant issue: if we enable the collection of ContextSwitch and Dispatcher stack information in a Microsoft.Diagnostics.Tracing.TraceEvent session, we observe a rapid increase in memory usage (more than 1 MB/s), with no signs of stabilization or decrease.
Upon investigating memory allocations with dotMemory, we noticed that most of the memory usage is concentrated in GrowableArray.
Would it be possible to implement a RingBuffer mechanism to store this data in Real-time Sessions? This feature could greatly optimize memory usage for real-time performance analysis, particularly in complex applications like ours.
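For concreteness, the bounded behavior being requested can be sketched roughly as follows. This is a generic illustration only, not a proposal for TraceEvent's internals: `BoundedEventBuffer` and its methods are hypothetical names, and the tuples stored in it are placeholder data rather than real ETW events.

```python
from collections import deque


class BoundedEventBuffer:
    """Fixed-capacity buffer that evicts the oldest item when full.

    Hypothetical helper for illustration; not part of any ETW library.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) discards items from the opposite end when full.
        self._items = deque(maxlen=capacity)
        self.dropped = 0  # how many old items have been evicted so far

    def append(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(item)

    def snapshot(self):
        """Return the retained items, oldest first."""
        return list(self._items)


buf = BoundedEventBuffer(capacity=3)
for timestamp in range(5):
    buf.append(("cswitch", timestamp))  # placeholder event records

print(buf.snapshot())
print(buf.dropped)
```

Memory stays proportional to `capacity`, at the cost of losing the oldest entries, so this only works for data that later analysis no longer needs.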