Server stops transmitting frames #121

Closed
nappy opened this issue Feb 28, 2023 · 24 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@nappy

nappy commented Feb 28, 2023

Description of the bug
The server stops transmitting any updates of the view. The screen on the viewer freezes while the connection and inputs are still working. The issue seems to correlate with the use of input fields and animations, especially when memory is low.

How To Reproduce
While no steps are known so far that reproduce the issue deterministically, opening suggested websites (amazon, facebook, google, ebay, youtube, wikipedia etc.) in new tabs, preferably ones with input elements, reproduces it reasonably fast.
Steps to reproduce the behavior:

  1. Setup droidVNC-NG in server or repeater mode.
  2. Launch websockify and connect with novnc viewer.
  3. Open new chrome tabs of popular websites until the screen on the viewer freezes.

Sometimes it can take a while to reproduce the issue. If it does not happen within a minute it can help to pause sending inputs for 10-20 seconds. Often when you want to resume the session, it is frozen right away.

Expected Behavior
The server should keep transmitting the frames that, according to the logs, it still receives but does not send to the client connection.

Logs

2023-02-28 11:55:12.882  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:12.899  1175-1175  droidvnc-ng (native)    net.christianbeier.droidvnc_ng       D  vncUpdateFramebuffer: copy took 8.512 ms
2023-02-28 11:55:14.280  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:14.299  1175-1175  droidvnc-ng (native)    net.christianbeier.droidvnc_ng       D  vncUpdateFramebuffer: copy took 8.639 ms
2023-02-28 11:55:17.998  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:18.017  1175-1175  droidvnc-ng (native)    net.christianbeier.droidvnc_ng       D  vncUpdateFramebuffer: copy took 8.569 ms
2023-02-28 11:55:31.397  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:31.416  1175-1175  droidvnc-ng (native)    net.christianbeier.droidvnc_ng       D  vncUpdateFramebuffer: copy took 8.355 ms
2023-02-28 11:55:43.064  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:43.083  1175-1175  droidvnc-ng (native)    net.christianbeier.droidvnc_ng       D  vncUpdateFramebuffer: copy took 10.247 ms

The server keeps receiving frames, as seen above. There are no errors or exceptions. According to my analysis, however, the server is not sending any packets on the TCP level.
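To put numbers on the log excerpt above, the following is a minimal sketch that extracts the intervals between consecutive "image available" lines. The function name `frame_gaps` and the regular expression are illustrative assumptions, not part of droidVNC-NG; the sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches the leading timestamp of logcat lines that end in "image available".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s.*\bimage available\s*$"
)


def frame_gaps(log_text):
    """Return the seconds between consecutive 'image available' log lines."""
    stamps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


SAMPLE = """\
2023-02-28 11:55:12.882  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:12.899  1175-1175  droidvnc-ng (native)    net.christianbeier.droidvnc_ng       D  vncUpdateFramebuffer: copy took 8.512 ms
2023-02-28 11:55:14.280  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:17.998  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:31.397  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
2023-02-28 11:55:43.064  1175-1175  MainService             net.christianbeier.droidvnc_ng       D  image available
"""

if __name__ == "__main__":
    for gap in frame_gaps(SAMPLE):
        print(f"{gap:.3f} s since previous frame")
```

Frames arriving on the server side say nothing about what leaves the socket, so a comparison with a packet capture taken at the same time is still needed.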

Environment:

  • Device Model: jhs558 (Giada DN75)
  • Build number: rk3399_all-userdebug 7.1.2 NHG47K eng.zc.20210907.105050 test-keys
  • droidVNC-NG version: 1.3.5
  • Android version: 7.1.2
  • Client-side OS and version: Microsoft Windows 10 Version 21H2
  • VNC client and version: noVNC-1.4.0 / websockify 0.10.0

Additional context
When you disconnect and reconnect to the server, you will receive the updated screen content and the connection continues to work until the next occurrence of the issue.

@nappy nappy added the bug Something isn't working label Feb 28, 2023
@bk138
Owner

bk138 commented Feb 28, 2023

Hi @nappy have you tried:

  • without websockify, i.e. directly
  • with another server device

What happens in these cases?

@nappy
Author

nappy commented Feb 28, 2023

With TightVNC 2.8.75 I was able to reproduce the issue with a direct connection.
With UltraVNC 1.3.8.1 I have so far not been able to reproduce it, although it has been freezing with a "not responding" error (possibly unrelated).

In general I experience this kind of issue on many devices. On the device above, however, it is particularly easy to reproduce.

@bk138
Owner

bk138 commented Feb 28, 2023

I can try with the specified tightvnc viewer, but have only android 12 devices. Emulator with android 7 worked fine...

@nappy
Author

nappy commented Feb 28, 2023

If it helps I can lend you my device for analysis. It is not that it does not happen elsewhere (even on newer Android version iirc) but to debug the issue that device might be helpful. I think a similar issue has been reported before right?

@bk138
Owner

bk138 commented Feb 28, 2023

That's very nice of you, this would be a powerful last resort :-). I'll get back to you once I found the time to test with the emu and my devices. Is #113 the same or different from what you're seeing?

@nappy
Author

nappy commented Feb 28, 2023

It's different in that I don't see the underlying connection terminating. It's a little bit similar in that it might be related to low-memory conditions. For the most part, though, it seems to be a different issue. The closest issue I found was this one: LibVNC/libvncserver#371, although I did not measure or notice any CPU load.

From my own analysis I suspect that there is some bug in libvnc in the detection of whether the screen has changed (there were some conditions where the client defines the area it's interested in; again, the framebuffer still seems to get updated), or maybe the sending gets blocked somehow. I could not really get my native debugger to work properly, but if I get directions to add some logging to a particular piece of code, I could do that.

@bk138 bk138 mentioned this issue Mar 14, 2023
@bk138 bk138 self-assigned this Jul 19, 2023
@bk138
Owner

bk138 commented Jul 19, 2023

@nappy I'm now experiencing the same issue with an rk3399 board running Android 10, especially when I have mouse pointers enabled (current master branch). Can you try with droidVNC-NG's current master and mouse pointers enabled to see if you can repro more quickly?

Edit: happens with MultiVNC and TigerVNC viewers, so I reckon it's not a client bug.

@bk138
Owner

bk138 commented Jul 19, 2023

Found out that the client output thread goes in here
https://github.com/LibVNC/libvncserver/blob/7fda996b43a57da5aaaecaa5cb489c7e0ccc3dff/src/libvncserver/main.c#L477
i.e. cl->requestedRegion stays empty.

@Phliplip

@bk138 I'm trying out droidVNC on an Android kiosk device. I'm experiencing the same after a few seconds.
I'm using RealVNC's VNC Connect on a Mac, but have also tried same client on Windows with same result.

I have now tried with Mac's built-in VNC client, and so far it has not crashed, however the experience with this client seems a bit more laggy in terms of screen updates.

To add: only the screen updates stopped. I have the test kiosk next to me, and I can see that the mouse input I do in the VNC client actually happens on the device; the screen update just does not reach the VNC client.

@bk138
Owner

bk138 commented Sep 11, 2023

@Phliplip is your board coincidentally also an rk3399?

@Phliplip

@Phliplip is your board coincidentally also an rk3399?

Device is a BIGPOS® 2150; the specifications from the manufacturer don't mention what board it is based on.
To follow up, I also have it working with the TigerVNC client, so it seems to be a problem related only to RealVNC's client, VNC Connect.

@badfish

badfish commented Nov 1, 2023

I can reproduce this consistently on an android tv: screen updates work fine in the kodi app, but freeze when going to flauncher, and start up again when switching back to kodi.

I've built this app from the git source and will try to investigate the bug, but any pointers as to where I should look would be welcome.

@bk138
Owner

bk138 commented Nov 1, 2023

@badfish #121 (comment) is the closest I got so far.

@badfish

badfish commented Nov 2, 2023

#121 (comment)
Sorry: red herring. The onImageAvailable() method is never called when 'flauncher' is the active app, except for once when it first appears. So it's a bug in flauncher.

@bk138
Owner

bk138 commented Nov 2, 2023

Yeah, that's in line with #121 (comment) - thing is, I saw that it worked initially with the same app in the foreground on the server device and then wound down.

@bk138 bk138 added the help wanted Extra attention is needed label Nov 2, 2023
@RaulMerelli

I think I'm experiencing the same issue with the android 13 device I'm using. After some hours, after the screen turned off all I can see is a black image. Input still works. The viewer is RealVNC on Windows. droidVNC-NG version: 2.2

@bk138 bk138 removed their assignment Apr 22, 2024
@bk138 bk138 self-assigned this May 11, 2024
@bk138
Owner

bk138 commented May 30, 2024

working theory 1 - cl->requestedRegion race ( ⛔ )

  • cl->requestedRegion is set from rfbProcessClientNormalMessage(), which receives a framebuffer update request
  • cl->requestedRegion is emptied in rfbSendFramebufferUpdate() before sending the update
  • race condition: the output thread empties a requestedRegion that was just set by the input thread, so that no framebuffer update is sent in response to a request, ending the whole request/response cycle?
  • added debug tracing hints that this is not the case 😐 or trace timing flaky
  • write access to requestedRegion is protected by cl->updateMutex
  • investigate cl->updateMutex for reading requestedRegion 🤔
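
The request/response dependency behind this theory can be illustrated with a toy model. It is a deterministic sketch with hypothetical names (`ToyClientState`, `try_send_update`, `run_cycle`), not libvncserver code, and it assumes a viewer that only sends its next update request after receiving an update. It shows why an emptied requested region with no update sent would freeze the view even though frames keep arriving; it does not show that this is what actually happened here.

```python
class ToyClientState:
    """Hypothetical stand-in for per-client state; not libvncserver's client record."""

    def __init__(self):
        self.requested_region = set()  # areas the viewer has asked for
        self.modified_region = set()   # areas that changed on the server


def handle_update_request(cl, areas):
    # Input side: the viewer asks for these areas.
    cl.requested_region |= set(areas)


def try_send_update(cl):
    """Output side: send only what is both requested and modified."""
    to_send = cl.requested_region & cl.modified_region
    if not to_send:
        return None  # nothing to send; a real output loop would wait here
    cl.requested_region.clear()
    cl.modified_region -= to_send
    return to_send


def run_cycle(frames, lose_request_at=None):
    """Count updates sent over `frames` captured frames.

    If lose_request_at is set, the pending request is dropped at that frame
    without an update being sent (the situation the theory describes).
    """
    cl = ToyClientState()
    handle_update_request(cl, {"screen"})          # initial request from viewer
    sent = 0
    for i in range(frames):
        cl.modified_region.add("screen")           # a new frame was captured
        if i == lose_request_at:
            cl.requested_region.clear()            # simulated lost request
        if try_send_update(cl) is not None:
            sent += 1
            handle_update_request(cl, {"screen"})  # viewer re-requests on receipt
    return sent
```

In this model `run_cycle(5)` sends an update for every frame, while `run_cycle(5, lose_request_at=2)` stops after the first two, because the viewer never gets the update that would trigger its next request.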

working theory 2 - no framebuffer update request because framebuffer update failed ⛔

  • request, response cycle ended by a response being prepared (requestedRegion emptied), but not actually sent?
  • rfbSendFramebufferUpdate always succeeds as per tracing ⛔

working theory 3 - an encodings thing as it only happens on certain hardware? ⛔

  • finding: server indicates using LastRect, but in certain cases client does not receive it 💡
  • client waits for lastRect-encoded rect, but never receives it -> decoder stuck
    • seen with LibVNCClient-based clients and also xtightvncviewer
  • wireshark indeed shows that LastRect does not arrive at client 💡
    • server says it sent it to socket
    • but: client says after 4 rects that it's waiting for the next one, wireshark decoded 9 rects 💡
    • [screenshot not included: VNC client, stuck after the last message]
    • [screenshot not included: client-side Wireshark capture]
  • seems like a bug in the tight encoder 💡
  • try other network interface: bingo! works flawlessly over WLAN 💡
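
Why a missing LastRect leaves the decoder stuck can be sketched with a small parser for the FramebufferUpdate framing. The message layout (type 0, rectangle count, 12-byte rectangle headers) and the LastRect pseudo-encoding value -224 come from the RFB protocol; everything else is a simplifying assumption. In particular, the sketch only understands Raw rectangles, whereas the captures in this issue used Tight encoding, whose payload lengths cannot be computed this way. The helper names are hypothetical.

```python
import struct

ENCODING_RAW = 0
ENCODING_LAST_RECT = -224   # RFB pseudo-encoding "LastRect"
UNKNOWN_RECT_COUNT = 0xFFFF  # count used when the update is ended by LastRect


def walk_update(data, bytes_per_pixel=4):
    """Walk one FramebufferUpdate that contains only Raw rectangles.

    Returns (rects_seen, complete). complete is False when the bytes run out
    before the update ends, i.e. a decoder would keep waiting for more data.
    """
    if len(data) < 4:
        return 0, False
    msg_type, _padding, nrects = struct.unpack_from(">BBH", data, 0)
    if msg_type != 0:
        raise ValueError("not a FramebufferUpdate message")
    pos, seen = 4, 0
    while nrects == UNKNOWN_RECT_COUNT or seen < nrects:
        if pos + 12 > len(data):
            return seen, False                    # next rect header never arrived
        _x, _y, w, h, encoding = struct.unpack_from(">HHHHi", data, pos)
        pos += 12
        if encoding == ENCODING_LAST_RECT:
            return seen, True                     # explicit end-of-update marker
        if encoding != ENCODING_RAW:
            raise ValueError("sketch only handles Raw rectangles")
        payload = w * h * bytes_per_pixel
        if pos + payload > len(data):
            return seen, False                    # pixel data truncated
        pos += payload
        seen += 1
    return seen, True


def raw_rect(x, y, w, h, bytes_per_pixel=4):
    """Build a Raw rectangle with all-zero pixels (test data only)."""
    header = struct.pack(">HHHHi", x, y, w, h, ENCODING_RAW)
    return header + bytes(w * h * bytes_per_pixel)


def last_rect():
    """Build the LastRect marker rectangle."""
    return struct.pack(">HHHHi", 0, 0, 0, 0, ENCODING_LAST_RECT)


HEADER = struct.pack(">BBH", 0, 0, UNKNOWN_RECT_COUNT)
RECTS = b"".join(raw_rect(2 * i, 0, 2, 2) for i in range(4))
FULL_UPDATE = HEADER + RECTS + last_rect()
TRUNCATED_UPDATE = HEADER + RECTS  # LastRect never reaches the client
```

`walk_update(FULL_UPDATE)` reports four rectangles and a complete update; `walk_update(TRUNCATED_UPDATE)` reports four rectangles and an incomplete one, which corresponds to a client that has decoded some rectangles and is waiting for the next.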

working theory 4 - EEE settings ⛔

  • after installing termux and in there ethtool
  • check powersave of NICs via /data/data/com.termux/files/usr/bin/ethtool --show-eee eth0 says
    Cannot get EEE settings: Operation not supported on transport endpoint

working theory 5 - MTU/speed woes? ✅

@bk138
Owner

bk138 commented Jun 5, 2024

@nappy were you seeing this using LAN or WLAN on your rk3399 board?

@nappy
Author

nappy commented Jun 5, 2024

The device I mentioned above, where the issue is most prevalent, is connected via LAN.

@bk138
Owner

bk138 commented Jun 5, 2024

@nappy Would you be available to try and see if the workaround works for your board?

It's basically:

  • install termux, e.g. from https://f-droid.org/en/packages/com.termux/
  • open it, in there:
    • pkg install root-repo
    • pkg install net-tools
  • on some computer attached to device via adb:
    • adb root
    • adb shell /data/data/com.termux/files/usr/bin/mii-tool eth0 # should show 1000 MBit/s and that is auto-negotiated
    • adb shell /data/data/com.termux/files/usr/bin/mii-tool -F 100baseTx-FD eth0 # the actual workaround forcing the interface to 100MBit/s, after this I saw no more hangs
    • adb shell /data/data/com.termux/files/usr/bin/mii-tool eth0 # should now show 100 MBit/s
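
For repeated testing, the adb part of the steps above could be wrapped in a small script. This is a sketch under the assumption that adb is on the PATH of the host computer and that Termux's net-tools are installed at the path shown above; the function names are hypothetical. The commands themselves are the ones listed in the steps.

```python
import subprocess

# Path of mii-tool inside Termux, as used in the steps above.
MII_TOOL = "/data/data/com.termux/files/usr/bin/mii-tool"


def workaround_commands(iface="eth0", mode="100baseTx-FD"):
    """Return the adb invocations from the steps above, in order."""
    return [
        ["adb", "root"],
        ["adb", "shell", MII_TOOL, iface],               # show current link mode
        ["adb", "shell", MII_TOOL, "-F", mode, iface],   # force the link mode
        ["adb", "shell", MII_TOOL, iface],               # show link mode again
    ]


def apply_workaround(iface="eth0", run=subprocess.run):
    """Run each command; `run` is injectable so the sequence can be dry-run."""
    for cmd in workaround_commands(iface):
        run(cmd, check=True)


if __name__ == "__main__":
    apply_workaround()
```

Like the manual steps, this does not persist across reboots.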

I could not find a way to persist this on my test device (userdebug build) via init.rc though, must be some SELinux issue (but no audit logs) :-/

@nappy
Author

nappy commented Jun 5, 2024

It's currently running a newer firmware version, but I will make sure I reproduce the issue first and then apply your workaround by tomorrow.

@bk138
Owner

bk138 commented Jun 5, 2024

It's currently running a newer firmware version, but I will make sure I reproduce the issue first and then apply your workaround by tomorrow.

Nice! Thanks very much! Also see my note in #121 (comment) about IPv6 and how it exacerbated the issue, maybe it helps.

Edit: also see #121 (comment) i.e. happens on Android 10 rk3399 board as well.

@nappy
Author

nappy commented Jun 6, 2024

@bk138 The issue was reproducible with Android 8 very easily, but the workaround does indeed mitigate the issue:
After setting the network mode to 100baseTx-FD the problem is not occurring anymore.

@bk138
Owner

bk138 commented Jun 6, 2024

@nappy great news! Do you have any idea on how to persist this?

@bk138 bk138 closed this as completed in 3256b4b Jun 10, 2024