NvFBC retrieves slightly outdated images. #2472

Open
3 tasks done
hgaiser opened this issue Apr 26, 2024 · 17 comments

@hgaiser
Contributor

hgaiser commented Apr 26, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Is your issue described in the documentation?

  • I have read the documentation

Is your issue present in the nightly release?

  • This issue is present in the nightly release

Describe the Bug

NvFBC has a few different methods to capture an image. The one used in Sunshine is:

  /*!
   * Capturing does not wait for a new frame nor a mouse move.
   *
   * It is therefore possible to capture the same frame multiple times.
   * When this occurs, the dwCurrentFrame parameter of the
   * NVFBC_FRAME_GRAB_INFO structure is not incremented.
   */
  NVFBC_TOSYS_GRAB_FLAGS_NOWAIT = (1 << 0),

In other words, whenever Sunshine requests a new frame, NvFBC provides one immediately, but that frame can be "old" (at most 1/fps seconds old). At 60fps, a frame can therefore be up to roughly 16msec old.

Changing to NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY would mean that the frame request blocks until a new frame becomes available, but returns immediately if NvFBC already has a frame the client hasn't seen yet. It also means that when the host is serving static content, the FPS drops to 13.33FPS in my tests (not sure why exactly this amount).

  /*!
   * Similar to NVFBC_TOCUDA_GRAB_FLAGS_NOFLAGS, except that the capture will
   * not wait if there is already a frame available that the client has
   * never seen yet.
   */
  NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY = (1 << 2),

And for context (since that flag basically extends the NOFLAGS flag):

  /*!
   * Default, capturing waits for a new frame or mouse move.
   *
   * The default behavior of blocking grabs is to wait for a new frame until
   * after the call was made.  But it's possible that there is a frame already
   * ready that the client hasn't seen.
   * \see NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY
   */
  NVFBC_TOCUDA_GRAB_FLAGS_NOFLAGS = 0,

As far as I can see, we can simply use NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY instead of NVFBC_TOSYS_GRAB_FLAGS_NOWAIT. Sunshine's own wait between frames then becomes somewhat redundant, but it won't hurt either.
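The three documented behaviors can be summarised as a small decision table. The C++ below is only a paraphrase of the header comments quoted above: `GrabMode` and `grab_blocks` are hypothetical names, and nothing here calls into NvFBC or reproduces its real enum values.

```cpp
#include <cassert>

// Hypothetical model of the three grab modes, paraphrasing the NvFBC header
// comments quoted above. These are NOT the real NvFBC enums.
enum class GrabMode {
  NoWait,                 // ~ *_GRAB_FLAGS_NOWAIT
  NoFlags,                // ~ *_GRAB_FLAGS_NOFLAGS
  NoWaitIfNewFrameReady,  // ~ *_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY
};

// Returns true if a grab issued in `mode` would block, given whether a frame
// the client has never seen is already available when the call is made.
bool grab_blocks(GrabMode mode, bool unseen_frame_ready) {
  switch (mode) {
    case GrabMode::NoWait:
      return false;                // returns at once, possibly a repeated frame
    case GrabMode::NoFlags:
      return true;                 // waits for a frame (or mouse move) after the call
    case GrabMode::NoWaitIfNewFrameReady:
      return !unseen_frame_ready;  // waits only if nothing new is pending
  }
  return true;
}
```

Read this way, swapping NOWAIT for NOWAIT_IF_NEW_FRAME_READY changes only the case where nothing new is pending: the grab then waits instead of handing back a repeated frame.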

Expected Behavior

N/A

Additional Context

N/A

Host Operating System

Linux

Operating System Version

Arch Linux

Architecture

64 bit

Sunshine commit or version

7fb8c76

Package

other (self built)

GPU Type

Nvidia

GPU Model

GeForce RTX 3090

GPU Driver/Mesa Version

550.76

Capture Method (Linux Only)

NvFBC

Config

N/A

Apps

N/A

Relevant log output

N/A

@cgutman
Contributor

cgutman commented Apr 30, 2024

Does that work properly in the case where the host display's refresh rate is higher than the stream frame rate? IIUC, in that case, we could potentially capture and encode more frames per second than the client is expecting because there will often be a frame ready when we expect to be sleeping until the next frame is due.

@hgaiser
Contributor Author

hgaiser commented May 1, 2024

We are asking NvFBC to sample at the requested framerate, so that isn't an issue:

capture_params.dwSamplingRateMs = 1000 /* ms */ / config.framerate;
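Since dwSamplingRateMs is an integer millisecond count, the expression above truncates. A minimal sketch of the arithmetic, assuming only the integer division as written (the helper names are hypothetical):

```cpp
#include <cassert>

// Mirrors `1000 /* ms */ / config.framerate` with integer division.
// Assumes 0 < framerate <= 1000 so the result is never zero.
constexpr unsigned sampling_rate_ms(unsigned framerate) {
  return 1000u / framerate;
}

// Effective internal sampling rate in millihertz, to keep the math integral.
constexpr unsigned effective_rate_millihz(unsigned framerate) {
  return 1000u * 1000u / sampling_rate_ms(framerate);
}

static_assert(sampling_rate_ms(60) == 16, "60 fps truncates to 16 ms");
static_assert(effective_rate_millihz(60) == 62500, "16 ms period is 62.5 Hz");
```

So at 60 fps the internal sampling period is 16 ms, i.e. 62.5 Hz rather than 60 Hz; other rates can be off by more (144 fps truncates to 6 ms, about 166.7 Hz).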

@gschintgen
Contributor

> We are asking NvFBC to sample at the requested framerate, so that isn't an issue:
>
> capture_params.dwSamplingRateMs = 1000 /* ms */ / config.framerate;

I think that's a red herring. I had a look at this for #2333 and also thought that was the crucial line but it's rather this one:

next_frame += delay;

I.e. we manually capture at the configured rate. The other line is probably just to put the capture feature in a reasonable state.
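For reference, a minimal sketch of this kind of fixed-deadline loop, assuming a plain sleep-until-deadline approach. It is not Sunshine's actual capture code; `run_paced_capture` and `grab_nowait` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical fixed-rate capture loop in the spirit of `next_frame += delay;`.
// `grab_nowait` stands in for a non-blocking grab that always returns at once.
int run_paced_capture(int framerate, int frames,
                      const std::function<void()> &grab_nowait) {
  using Clock = std::chrono::steady_clock;
  const auto delay = std::chrono::duration_cast<Clock::duration>(
      std::chrono::nanoseconds(1'000'000'000LL / framerate));
  auto next_frame = Clock::now();
  int grabbed = 0;
  for (int i = 0; i < frames; ++i) {
    std::this_thread::sleep_until(next_frame);  // pacing happens here...
    grab_nowait();                              // ...the grab itself never waits
    ++grabbed;
    next_frame += delay;  // fixed grid, independent of when the backend sampled
  }
  return grabbed;
}
```

In this arrangement the stream rate is set entirely by the `next_frame` grid; NvFBC's dwSamplingRateMs only affects how fresh the returned frame is.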

@hgaiser
Contributor Author

hgaiser commented May 6, 2024

Did you test that with the current NOWAIT flag? Because in that case the call to NvFBC indeed does not wait. The value is presumably still important for the internal loop of NvFBC, but not for the FPS that Sunshine maintains.

Which raises an interesting issue. If NvFBC updates internally every 16msec and Sunshine always waits exactly 16.666msec between frames, then every 24 (16 / 0.666) frames it theoretically misses a frame. If that is true, it would be better to only let NvFBC do the sleeping (even though it can only do 16msec and not 16.666msec, so you'd get 62.5Hz).
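The 1-in-24 estimate can be checked with an idealized model. It assumes NvFBC publishes a new internal sample exactly every 16 ms starting at t = 0, that Sunshine grabs with NOWAIT exactly every 1/60 s from the same instant, and that a grab returns the newest sample; real timing will jitter, and `count_skipped_samples` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

using std::int64_t;

// Toy model (assumption): sample k is published at k * sampling_ms, grab j
// happens at j / fps seconds and returns the newest published sample.
// Returns how many samples were published but never returned by any grab.
int64_t count_skipped_samples(int64_t fps, int64_t sampling_ms, int64_t grabs) {
  const int64_t sample_ns = sampling_ms * 1'000'000;
  int64_t skipped = 0;
  int64_t prev = 0;  // grab 0 at t = 0 sees sample 0
  for (int64_t j = 1; j < grabs; ++j) {
    const int64_t t_ns = j * 1'000'000'000 / fps;  // time of grab j
    const int64_t latest = t_ns / sample_ns;        // newest sample at t_ns
    if (latest > prev + 1) {
      skipped += latest - prev - 1;  // samples that fell between two grabs
    }
    prev = latest;
  }
  return skipped;
}
```

Under these assumptions no sample is skipped in the first 24 grabs, the 25th grab skips one, and 241 grabs (4 s) skip 10, i.e. one every 24 frames.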

@gschintgen
Contributor

> Did you test that with the current NOWAIT flag? Because in that case the call to NvFBC indeed does not wait. The value is presumably still important for the internal loop of NvFBC, but not for the FPS that Sunshine maintains.
>
> Which raises an interesting issue. If NvFBC updates internally every 16msec and Sunshine always waits exactly 16.666msec between frames, then every 24 (16 / 0.666) frames it theoretically misses a frame. If that is true, it would be better to only let NvFBC do the sleeping (even though it can only do 16msec and not 16.666msec, so you'd get 62.5Hz).

You’re right, I did miss the last (but crucial) part of your initial post, i.e. the part where you point out the “manual” waiting done by Sunshine. Anyway, I only had superficial contact with this part of the code base, since I don’t have Nvidia hardware and it came up in the PR discussion mentioned above.

I guess it all depends on what exactly NvFBC does with its 16.000 ms sampling rate:
a) It captures at 62.5Hz (i.e. the theoretical next_frame = previous + delay like currently implemented in Sunshine’s logic), which would probably lead to pacing issues due to the rate mismatch. But that sounds more like the NOWAIT approach.
b) It captures a frame, then waits for 16.000ms and then blocks to wait for the next frame which will hopefully come at the 16.667 timepoint (due to a framelimiter). With perfect render timing that should theoretically lead to the best results (precise 60fps, no latency). But in that case I do think that 16ms is too long (rendering will still be subject to jitter) and frames will be missed.

Anyway I just wanted to give a heads up, but missed that you correctly saw sunshine’s own waiting.

@hgaiser
Contributor Author

hgaiser commented May 6, 2024

> I guess it all depends on what exactly NvFBC does with its 16.000 ms sampling rate:
> a) It captures at 62.5Hz (i.e. the theoretical next_frame = previous + delay like currently implemented in Sunshine’s logic), which would probably lead to pacing issues due to the rate mismatch. But that sounds more like the NOWAIT approach.
> b) It captures a frame, then waits for 16.000ms and then blocks to wait for the next frame which will hopefully come at the 16.667 timepoint (due to a framelimiter). With perfect render timing that should theoretically lead to the best results (precise 60fps, no latency). But in that case I do think that 16ms is too long (rendering will still be subject to jitter) and frames will be missed.

It seems like your option a) is correct. I tested this in moonshine (similar to Sunshine, but only supports NvFBC + NVENC), where I rely on NvFBC to block until a new frame arrives. According to Moonlight, I get roughly 62.5Hz when streaming dynamic content (in this case I just moved the cursor a lot).


I guess Sunshine waits in order to achieve an accurate 60Hz framerate, but by doing so, it skips a frame every 24 frames.

> Anyway I just wanted to give a heads up, but missed that you correctly saw sunshine’s own waiting.

No worries! Appreciate it :).

@gschintgen
Contributor

Out of curiosity (since I'm not familiar with the inner workings of hardware pointers and such): did you check what happens to the framerate reported by Moonlight if you stream, say, an animated title screen of a game, framecapped at 60.00Hz, but without any mouse input?

  /*!
   * Default, capturing waits for a new frame or mouse move.
   *
   * The default behavior of blocking grabs is to wait for a new frame until
   * after the call was made.  But it's possible that there is a frame already
   * ready that the client hasn't seen.
   * \see NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY
   */
  NVFBC_TOCUDA_GRAB_FLAGS_NOFLAGS = 0,

If taken literally the framerate should then drop to 60fps. Maybe it's "just" the mouse input (e.g. 1000Hz gaming mouse) that messes with the capture timing. Is it possible to hide mouse input from NvFBC?

@hgaiser
Contributor Author

hgaiser commented May 6, 2024

I wasn't entirely sure what would happen either, but it still seems to stream at 62.5Hz if I don't move the mouse in a game (even though the Steam FPS overlay showed a steady 60Hz for the game I was streaming).

@gschintgen
Contributor

Ok, so it does seem that NvFBC runs its blocking image capture on a precise 16.00 ms timer, and most of the time it finds a frame that it has not yet returned, so it returns that one immediately. That holds until another 24 or 25 frames (about 0.4 s) have passed and the difference has accumulated to a whole frame interval.

So the ideal capture routine for frame-capped (e.g. 60fps) content (which is not really under our control...) would be:

  1. run a first blocking capture for a single frame
  2. sleep for e.g. 80% of the theoretical frametime in order to avoid capturing way too many frames in case of uncapped rendering
  3. emit another blocking capture for a single frame, which will return ~16.7 ms after the previous one, but without any latency between rendering and capturing.
  4. goto 2

Except that this breaks down if the game is running uncapped: the capture in step 3 then returns as soon as the step 2 sleep ends, so frames are captured too early and too often.
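The four steps could look roughly like this. This is a sketch under assumptions: `grab_blocking` is a hypothetical stand-in for a grab that waits for a frame the client has not seen yet, and the 80% pre-sleep is the example value from step 2.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch of the proposed routine; not a real NvFBC binding.
void paced_blocking_capture(int framerate, int frames, double sleep_fraction,
                            const std::function<void()> &grab_blocking) {
  const std::chrono::nanoseconds frame_time(1'000'000'000LL / framerate);
  const auto pre_sleep = std::chrono::duration_cast<std::chrono::nanoseconds>(
      frame_time * sleep_fraction);
  grab_blocking();  // step 1: first blocking capture
  for (int i = 1; i < frames; ++i) {
    std::this_thread::sleep_for(pre_sleep);  // step 2: sleep ~80% of a frame
    grab_blocking();                         // step 3: block for the next frame
  }                                          // step 4: repeat from step 2
}
```

With uncapped rendering, the grab in step 3 returns almost immediately, so the loop runs at about 1 / (0.8 * frame time), e.g. roughly 75 fps for a 60 fps target.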

Isn't this then similar to the problem that the Windows side of Sunshine has to deal with when it is using the Desktop Duplication API (which I don't know anything about except the few tidbits I picked up here and there)?
I saw some pacing code in there...

// Try to continue frame pacing group, snapshot() is called with zero timeout after waiting for client frame interval
if (frame_pacing_group_start) {
  const uint32_t seconds = (uint64_t) frame_pacing_group_frames * client_frame_rate_adjusted.Denominator / client_frame_rate_adjusted.Numerator;
  const uint32_t remainder = (uint64_t) frame_pacing_group_frames * client_frame_rate_adjusted.Denominator % client_frame_rate_adjusted.Numerator;
  const auto sleep_target = *frame_pacing_group_start +
                            std::chrono::nanoseconds(1s) * seconds +
                            std::chrono::nanoseconds(1s) * remainder / client_frame_rate_adjusted.Numerator;
  const auto sleep_period = sleep_target - std::chrono::steady_clock::now();
  if (sleep_period <= 0ns) {
    // We missed next frame time, invalidating current frame pacing group
    frame_pacing_group_start = std::nullopt;
    frame_pacing_group_frames = 0;
    status = capture_e::timeout;
  }
  else {
    high_precision_sleep(sleep_period);
    std::chrono::nanoseconds overshoot_ns = std::chrono::steady_clock::now() - sleep_target;
    log_sleep_overshoot(overshoot_ns);
    status = snapshot(pull_free_image_cb, img_out, 0ms, *cursor);
    if (status == capture_e::ok && img_out) {
      frame_pacing_group_frames += 1;
    }
    else {
      frame_pacing_group_start = std::nullopt;
      frame_pacing_group_frames = 0;
    }
  }
}

(I did not try to analyze or understand this code in detail.)
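The core of that snippet is the deadline arithmetic: frame N of a pacing group is due at start + N * Denominator / Numerator seconds, computed as whole seconds plus a remainder so that rounding never accumulates. A stand-alone sketch of just that computation (`frame_offset` is a hypothetical helper, and 60000/1001 is used as an example rational frame rate):

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using std::int64_t;
using std::uint32_t;
using std::uint64_t;

// Offset of frame `frames` from the group start for a rate of num/den fps,
// split into whole seconds plus a sub-second remainder (no accumulated drift).
std::chrono::nanoseconds frame_offset(uint32_t frames, uint32_t num, uint32_t den) {
  const int64_t seconds = static_cast<int64_t>(uint64_t{frames} * den / num);
  const int64_t remainder = static_cast<int64_t>(uint64_t{frames} * den % num);
  const std::chrono::nanoseconds one_second = std::chrono::seconds(1);
  return one_second * seconds + one_second * remainder / static_cast<int64_t>(num);
}
```

Anchoring every deadline to the group start keeps each one within 1 ns of the exact value; by contrast, adding a pre-rounded 16,683,333 ns delay 60,000 times ends 20 µs short of the exact 1001 s.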

@hgaiser
Contributor Author

hgaiser commented May 6, 2024

Why would the second frame come after 16.7msec? I would expect NvFBC to internally retrieve frames at fixed 16msec intervals, so if you wait for 80%, say exactly 13msec, I'd expect the next blocking call to return after exactly 3msec more.

@gschintgen
Contributor

Right, there's also the question of what "blocking" actually means:
a) Blocking until the timer is up.
b) Blocking until a new frame is available.
I suppose it means b). Isn't that the point of blocking vs NOWAIT?

If the game is running at 60fps / 16.667ms, the blocking call issued 13ms after the previous capture would then wait until a new frame has been rendered and is ready to be captured. And those frames should be emitted at an interval of 16.666ms due to the framecap.

@gschintgen
Contributor

I think it all depends on when the timer will trigger next:

  • precisely x ms after the previous theoretical frame instant, independently of when frames arrive
  • x ms after the previous capture (and then the capture will block until the 16.666ms are up)

It seems like NvFBC is doing the former, while the latter would be what's needed.

Unfortunately I can't test any of this. (I'm on Intel & AMD)
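The two timer semantics can be compared in a toy model. It assumes an ideally frame-capped 60 fps game whose frame i appears at exactly i/60 s, exact timers, and integer-nanosecond time; the function names are hypothetical and nothing here reflects NvFBC's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::int64_t;

// Policy 1 (fixed grid): a timer fires every period_ns; each tick returns the
// newest frame presented at or before the tick, even if it was returned before.
std::vector<int64_t> grid_policy(int64_t fps, int64_t period_ns, int ticks) {
  std::vector<int64_t> out;
  for (int k = 0; k < ticks; ++k) {
    out.push_back(k * period_ns * fps / 1'000'000'000);  // floor(t * fps)
  }
  return out;
}

// Policy 2 (relative): after each capture, wait delay_ns, then block until
// the first frame presented at or after that instant.
std::vector<int64_t> relative_policy(int64_t fps, int64_t delay_ns, int captures) {
  std::vector<int64_t> out;
  int64_t t = 0;
  for (int n = 0; n < captures; ++n) {
    const int64_t frame = (t * fps + 999'999'999) / 1'000'000'000;  // ceil(t * fps)
    out.push_back(frame);
    t = frame * 1'000'000'000 / fps + delay_ns;  // capture time + delay
  }
  return out;
}

// Counts captures that returned the same frame as the previous capture.
int count_repeats(const std::vector<int64_t> &frames) {
  int repeats = 0;
  for (std::size_t i = 1; i < frames.size(); ++i) {
    if (frames[i] == frames[i - 1]) ++repeats;
  }
  return repeats;
}
```

In this model a 16 ms grid delivers 251 captures but only 241 distinct frames (one repeat every 25 ticks, the 62.5 Hz vs 60 Hz beat), while the relative policy with a 13 ms delay returns every game frame exactly once at 60 Hz. A delay longer than a frame (20 ms) skips every other frame.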

@hgaiser
Contributor Author

hgaiser commented May 6, 2024

My suspicion is that NvFBC runs an internal loop, separate from the frame generation loop, which polls the latest frame every dwSamplingRateMs milliseconds. I ran the following code with nvfbc-rs:

	capturer.start(BufferFormat::Rgb, 60)?;

	// In case it needs to warm up or initialize something.
	for _ in 0..10 {
		capturer.next_frame(CaptureMethod::Blocking)?;
	}

	let now = std::time::Instant::now();
	for _ in 0..100 {
		std::thread::sleep(std::time::Duration::from_millis(13));
		capturer.next_frame(CaptureMethod::Blocking)?;
	}
	println!("{}", now.elapsed().as_micros() / 100);

That is: we set the framerate to 60Hz (which sets the sampling rate to 16 msec), sleep 13msec and then wait for the next frame with a blocking grab. We repeat this 100 times and print the average duration of one iteration (sleep plus blocking wait) in microseconds. I am getting this timing:

16033

Without the 13msec sleep I also get 16016, roughly the same amount.

@gschintgen
Contributor

Hm, but somehow it must be possible to make NvFBC actually wait for a new frame:

Default, capturing waits for a new frame or mouse move.

     * When using blocking calls each captured frame will have
     * this flag set to NVFBC_TRUE since the blocking mechanism waits for
     * the display server to render a new frame.
      NVFBC_BOOL bIsNewFrame;

     * The default behavior of blocking grabs is to wait for a new frame until
     * after the call was made.  But it's possible that there is a frame already
     * ready that the client hasn't seen.
     * \see NVFBC_TOSYS_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY

At least with NVFBC_TOSYS_GRAB_FLAGS_NOFLAGS the call should always be blocking, no? That should then drag out the interval to the 16.666ms (supposing a framecap on the game as always).

With NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY we could have the same behavior as with the simple NOWAIT? I.e. every time the 16.00ms timer fires it detects that a new frame is already there, it emits it, and 16.00 ms later it again detects that a new frame is immediately ready, etc. until 24-25 frames (about 0.4 s) have passed.

@hgaiser
Contributor Author

hgaiser commented May 6, 2024

> Hm, but somehow it must be possible to make NvFBC actually wait for a new frame:
>
> Default, capturing waits for a new frame or mouse move.
>
>      * When using blocking calls each captured frame will have
>      * this flag set to NVFBC_TRUE since the blocking mechanism waits for
>      * the display server to render a new frame.
>       NVFBC_BOOL bIsNewFrame;
>
>      * The default behavior of blocking grabs is to wait for a new frame until
>      * after the call was made.  But it's possible that there is a frame already
>      * ready that the client hasn't seen.
>      * \see NVFBC_TOSYS_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY
>
> At least with NVFBC_TOSYS_GRAB_FLAGS_NOFLAGS the call should always be blocking, no? That should then drag out the interval to the 16.666ms (supposing a framecap on the game as always).
>
> With NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY we could have the same behavior as with the simple NOWAIT? I.e. every time the 16.00ms timer fires it detects that a new frame is already there, it emits it, and 16.00 ms later it again detects that a new frame is immediately ready, etc. until 24-25 frames (about 0.4 s) have passed.

Ah sorry, in nvfbc-rs I had renamed NOFLAGS to Blocking (https://github.com/hgaiser/nvfbc-rs/blob/main/nvfbc/src/cuda.rs#L41). So I was using NOFLAGS, which still waits approximately 16msec, not 16.666msec.

@gschintgen
Contributor

That's weird. (And I'm out of ideas.)

@hgaiser
Contributor Author

hgaiser commented May 6, 2024

> That's weird. (And I'm out of ideas.)

That's okay, we tried ;)

A mystery for another day 🪄


3 participants