Apply Input Lag reduction settings to appropriate cores #3134

Open
dankcushions opened this issue May 10, 2020 · 10 comments

Comments

@dankcushions
Member

dankcushions commented May 10, 2020

The Pi4 now has enough overhead to apply some of the more aggressive input lag tweaks, some of which we previously warned against here. I would like to investigate these settings with a view to perhaps defaulting them where appropriate, or to provide a function to easily enable them for suitable cores. Said settings could also be applied to x86 defaults (no idea about Odroid, etc).

Known input lag reduction settings:

video_threaded = "false"

turning off threaded video seems to reduce input lag by ~1 frame, at the cost of performance. source: https://forums.libretro.com/t/an-input-lag-investigation/4407 (this huge thread has lots of great stuff on input lag)

run_ahead_enabled = "true"
run_ahead_frames = "1"

enables run-ahead and sets the number of frames to 1. run-ahead explained here. saves 1 frame of input lag. costs memory/cpu. 1 frame is an accepted safe amount for ALL games on ALL cores (?). some games can handle more frames, but there isn't enough per-game information to work with for that.

run_ahead_secondary_instance = "true"

creates a secondary instance of the emulator. i have no idea if this setting is necessary, but i have always had it on. does anyone know why you would turn it off?
UPDATE: shouldn't be necessary unless you encounter issues, so keep it off - thanks @barbudreadmon !

video_max_swapchain_images = 2

there's a great explanation of this somewhere here but i have since lost it! saves 1 frame, at the cost of performance (?). @Brunnis mentioned this can cause tearing here, but i have not noticed this. it requires the CPU governor to be set to 'performance' rather than the default to avoid issues (see here)
Causes intermittent tearing on fkms, even with governor = performance.
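The opening post asks for a function to easily enable these settings for suitable cores. A minimal Python sketch of what that could look like, assuming RetroArch's plain `key = "value"` config format and RetroPie's per-system config files (a path such as `/opt/retropie/configs/snes/retroarch.cfg` is an assumption here). `apply_settings` and `INPUT_LAG_SETTINGS` are hypothetical names, not an existing RetroPie helper:

```python
import re
from typing import Mapping

# Hypothetical preset built from the settings proposed in this issue.
# video_max_swapchain_images = 2 is left out (intermittent tearing on fkms),
# as is run_ahead_secondary_instance (recommended to stay off unless needed).
INPUT_LAG_SETTINGS = {
    "video_threaded": "false",
    "run_ahead_enabled": "true",
    "run_ahead_frames": "1",
}

# Matches an uncommented `key = ...` line and captures the key.
_KEY_LINE = re.compile(r"\s*([A-Za-z0-9_]+)\s*=")


def apply_settings(cfg_text: str, settings: Mapping[str, str]) -> str:
    """Return cfg_text with every key in `settings` set to its value.

    The first uncommented line for a key is rewritten in place; keys that
    are not present yet are appended. Commented lines are left untouched.
    """
    lines = cfg_text.splitlines()
    remaining = dict(settings)  # copy so the caller's mapping is not modified
    for i, line in enumerate(lines):
        match = _KEY_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = f'{key} = "{remaining.pop(key)}"'
    for key, value in remaining.items():
        lines.append(f'{key} = "{value}"')
    return "\n".join(lines) + "\n"
```

Applying it to one system would then be a read, `apply_settings`, write cycle on that system's file; deciding which systems count as "suitable" is the open question of this issue.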

Any more settings?

SNES
Tested in lr-snes9x (so presumably the same settings could be applied to all older lr-snes9x-YYYY versions). 40-50% CPU usage in the random games i tested. Does anyone have any savestates for the later levels of Yoshi's Island? they're supposed to be some of the most intense stuff on SNES.

(please help testing on more systems!)

tagging @Brunnis

@cmitu
Contributor

cmitu commented May 10, 2020

An explanation for how the 2nd instance works (taken from https://i.reddit.com/r/emulation/comments/886ucq/):

In Single-Instance mode, when it wants to run a frame, instead it does this:

  • Disable audio and video, run a frame, Save State
  • Run additional frames with audio and video disabled if we want to run ahead more than one frame
  • Enable audio and video and run the frame we want to see
  • Load State

All save states and load states are done to RAM and never reach the disk.
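The Single-Instance steps above can be modelled with a toy core. This is a sketch, not RetroArch's code: `ToyCore` is a hypothetical core with exactly one frame of built-in lag, and the hidden look-ahead frames are assumed to reuse the current input.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical stand-in for a libretro core with one frame of built-in
    lag: input latched in one frame only changes the picture in the next."""
    pos: int = 0            # game state that input moves
    pending: int = 0        # input latched this frame, applied next frame
    av_enabled: bool = True
    video: list = field(default_factory=list)  # frames actually shown

    def run_frame(self, inp: int) -> None:
        self.pos += self.pending   # apply last frame's input
        self.pending = inp         # latch this frame's input
        if self.av_enabled:
            self.video.append(self.pos)

    def save_state(self):
        return (self.pos, self.pending)  # held in RAM, never written to disk

    def load_state(self, state) -> None:
        self.pos, self.pending = state


def run_ahead_single(core: ToyCore, inp: int, frames: int = 1) -> None:
    """One displayed frame of Single-Instance run-ahead."""
    core.av_enabled = False
    core.run_frame(inp)             # the "real" frame, not shown
    state = core.save_state()
    for _ in range(frames - 1):     # extra hidden frames (assumed: same input)
        core.run_frame(inp)
    core.av_enabled = True
    core.run_frame(inp)             # the frame we want to see
    core.load_state(state)          # roll back to the real timeline
```

Feeding the input sequence `[1, 0, 0]` frame by frame, a plain `run_frame` loop shows `[0, 1, 1]` (the press appears one frame late), while `run_ahead_single` shows `[1, 1, 1]`: the one frame of built-in lag is hidden.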

In Two-Instance mode, it does this:

  • Primary core does Audio only, then saves state
  • Secondary core loads state, runs frames ahead discarding audio and video, then runs a frame with video only.

For performance reasons, it only resyncs the secondary core when input is dirty, otherwise it keeps running additional frames on the secondary core while the input is clean.

Why bother with Two-Instance mode at all? Many of the cores do not leave audio emulation in a clean state after loading state, so you would get buzzing. Using Two-Instance mode makes the primary core not do any load states and avoids that.

In Single-Instance mode, it is possible to improve performance further by running ahead without loading state while input is clean, but I am not currently doing that. I'd imagine there'd be issues if calling the "run a frame" function left you in a state further along than a single frame.
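A sketch of the Two-Instance scheme, reusing the same kind of hypothetical one-frame-lag toy core (audio is not modelled; the primary simply produces no video). "Dirty" is assumed here to mean the input differs from the previous frame's; `TwoInstanceRunAhead` is an illustrative name, not RetroArch's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical core with one frame of built-in input lag."""
    pos: int = 0
    pending: int = 0
    video_on: bool = True
    video: list = field(default_factory=list)

    def run_frame(self, inp: int) -> None:
        self.pos += self.pending  # apply last frame's input
        self.pending = inp        # latch this frame's input
        if self.video_on:
            self.video.append(self.pos)

    def save_state(self):
        return (self.pos, self.pending)

    def load_state(self, state) -> None:
        self.pos, self.pending = state


class TwoInstanceRunAhead:
    def __init__(self, frames: int = 1):
        self.frames = frames
        self.primary = ToyCore(video_on=False)    # real timeline (audio only)
        self.secondary = ToyCore(video_on=False)  # produces the shown video
        self.last_input = None
        self.resyncs = 0

    def step(self, inp: int) -> None:
        self.primary.run_frame(inp)  # the primary never loads state
        if inp != self.last_input:
            # Input is dirty: copy the primary's state into the secondary,
            # run ahead with video off, then show one frame.
            self.resyncs += 1
            self.secondary.load_state(self.primary.save_state())
            self.secondary.video_on = False
            for _ in range(self.frames - 1):
                self.secondary.run_frame(inp)
            self.secondary.video_on = True
            self.secondary.run_frame(inp)
        else:
            # Input is clean: the secondary's prediction still holds,
            # so it just advances one more (shown) frame.
            self.secondary.run_frame(inp)
        self.last_input = inp
```

With the input sequence `[1, 0, 0]` the secondary shows `[1, 1, 1]`, and only the first two frames need a resync. The primary core never loads state, which is the property the quote credits with avoiding audio buzzing.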

@darksaviorx

darksaviorx commented May 10, 2020

Marvelous is one of the most demanding SNES games, so I'll test that: the camp site after the intro.

Pi4 at stock speeds, lr-snes9x with stock emulator settings. There has to be some headroom for people who want to use a shader, so I'm using the crt-pi shader at 1080p for these tests.

video_threaded = off, video_max_swapchain_images = 1: 30 fps
video_threaded = off, video_max_swapchain_images = 2: 30 fps
video_threaded = off, video_max_swapchain_images = 3: 60 fps
video_threaded = on, video_max_swapchain_images = 1: 60 fps

I don't really notice the input lag so I'm not sure which setting gives better results.
My 2 cents, I personally don't like runahead due to the anomalies that might show up the higher you go, and it just feels weird to me so I'll turn that off if you guys enable it by default.

@Brunnis

Brunnis commented May 10, 2020

Just a quick heads-up: a max swapchain images setting of 2 requires the performance CPU governor to perform well. Otherwise, the governor seems to be fooled into periodically downclocking the CPU. The performance impact is pretty big.
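If swapchain images = 2 were ever offered as an option, a guard could check the governor first. A minimal sketch, assuming the standard Linux cpufreq sysfs file for CPU 0 and treating 3 as the fallback (assumed here to be RetroArch's default); `swapchain_setting` is a hypothetical helper, and it does not address the fkms tearing reported in this thread.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs file holding CPU 0's current governor.
GOVERNOR_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def swapchain_setting(governor_path: Path = GOVERNOR_PATH) -> str:
    """Suggest a video_max_swapchain_images value for the current governor.

    2 images only performs well with the 'performance' governor, so any
    other (or unreadable) governor falls back to 3.
    """
    try:
        governor = governor_path.read_text().strip()
    except OSError:
        governor = "unknown"
    return "2" if governor == "performance" else "3"
```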

I'll see if I can comment a bit more later today.

@dankcushions
Member Author

@darksaviorx

My 2 cents, I personally don't like runahead due to the anomalies that might show up the higher you go, and it just feels weird to me so I'll turn that off if you guys enable it by default.

runahead will cause unpleasant animation skipping if you raise it above the inherent input lag in the game/console/core, but my understanding is that run_ahead_frames = 1 is ALWAYS safe to use, as ANY core/emulator/game will inherently have at least 1 frame of input lag. i am definitely not suggesting we go above 1.
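The reasoning above fits a simple toy model, assuming lag can be counted in whole frames: with an inherent lag of L frames and run-ahead N, the visible lag is max(L - N, 0), and when N > L the first N - L frames of the game's reaction to a new input are never displayed (the skipping). `run_ahead_effect` is a hypothetical helper for illustration.

```python
from typing import Tuple


def run_ahead_effect(inherent_lag: int, run_ahead_frames: int) -> Tuple[int, int]:
    """Toy model: (frames of lag still visible, reaction frames skipped)."""
    if inherent_lag < 0 or run_ahead_frames < 0:
        raise ValueError("frame counts must be non-negative")
    remaining_lag = max(inherent_lag - run_ahead_frames, 0)
    skipped = max(run_ahead_frames - inherent_lag, 0)
    return remaining_lag, skipped
```

Under the assumption that every game has at least one frame of inherent lag, `run_ahead_frames = 1` always gives zero skipped frames; for example `run_ahead_effect(1, 1)` is `(0, 0)`, while `run_ahead_effect(1, 3)` is `(0, 2)`.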

@dankcushions
Member Author

removing video_max_swapchain_images = 2 from contention. with it on i get intermittent tearing in lr-snes9x (Super Metroid). it's easily spotted during scrolling, but isn't always present (perhaps the tearing line is sometimes hidden in the overscan). it may become viable if we ever switch to pure KMS, or after driver updates, etc.

@Brunnis

Brunnis commented May 10, 2020

Yes, it appears to be fixed on 5.4 + full KMS. Could be fine with 5.4 + fake KMS as well, but I haven’t tested yet. Will let you know when I have.

@Brunnis

Brunnis commented May 24, 2020

To be honest, I'm leaning towards just using video_threaded = "false" and leaving the rest as-is. Run-ahead is nice, but it's still what I would consider a hack. For future updates, video_max_swapchain_images = 2 might technically work fine, but will heavily eat into the margin available for shaders (and really should be paired with the performance governor to work well).

Just using the KMS video driver (fake or full) and disabling threaded video should provide ~2 frames lower input lag than the pre-Buster RetroPie builds anyway. That's a pretty nice and noticeable improvement.
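To put "~2 frames" in time terms, a trivial conversion, assuming a 60 Hz display (50 Hz for PAL content); `frames_to_ms` is just an illustrative helper.

```python
def frames_to_ms(frames: float, refresh_hz: float = 60.0) -> float:
    """Convert a latency expressed in display frames to milliseconds."""
    return frames * 1000.0 / refresh_hz


print(f"{frames_to_ms(2):.1f} ms")  # prints 33.3 ms (2 frames at 60 Hz)
```

So each frame saved is worth roughly 16.7 ms at 60 Hz, or 20 ms at 50 Hz.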

@dankcushions
Member Author

we would have to be careful with video_threaded = "false". a lot of RetroPie users use overlays and shaders, and on the Pi3, IIRC, threading was crucial for that combo to maintain speed. i also note that even the Pi4 seems to struggle with threading turned on: libretro/RetroArch#10688

(if anyone cares to benchmark to see if threaded = false affects this combo please do! i may try to later)

@bluestang2006

On my Pi4 4GB I’ve been using the 5.4 kernel (64-bit) with vsync off, threaded video off, run-ahead off, and swapchain images set to 2, and the few 2D fighting ROMs I play have performed very well. Input lag seems much lower than with the settings I had before.

However, Killer Instinct 1/2 (mame2003plus) need threaded video on, otherwise they play at 30 fps. Killer Instinct 2 still needs work: playable, but not enjoyable yet. Raiden Fighters 1/2 and Jet play very well on mame2016.

For me, enabling shaders costs FPS. Some games don’t seem to be affected as much, but there is a noticeable hit. I left them off for now, but I’ll be testing some more in the coming days.

I’ve also been playing around with the KMS audio driver (vc4hdmi), and I was able to reduce the audio latency setting to 20 ms with alsathread. I need to go back and validate that again with the above settings. The audio driver is still in development and has no software mixing, so no ES background-music (BGM) apps should be enabled.
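For a sense of scale, 20 ms of audio latency corresponds to a buffer of a little over one video frame. A minimal arithmetic sketch, assuming a 48 kHz output rate (the rate actually used on a given setup may differ); `audio_buffer_samples` is a hypothetical helper.

```python
def audio_buffer_samples(latency_ms: float, sample_rate_hz: int = 48_000) -> int:
    """Samples per channel needed to hold `latency_ms` of audio."""
    return round(latency_ms * sample_rate_hz / 1000)


def latency_in_video_frames(latency_ms: float, refresh_hz: float = 60.0) -> float:
    """Express an audio latency in video frames at the given refresh rate."""
    return latency_ms * refresh_hz / 1000
```

At the assumed 48 kHz, 20 ms is 960 samples per channel, or about 1.2 frames at 60 Hz.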

This thread has been helpful with being able to tweak my settings.

@barbudreadmon
Contributor

@dankcushions second instance is basically a workaround for bad savestate code. supposedly it has some minor performance impact, and i heard it also causes some additional disk I/O by duplicating the core, so it's probably better to avoid it whenever you can.

6 participants