Can dpkg-query be made multithreaded, and would that help? #228

Open
ctrlcctrlv opened this issue Nov 9, 2022 · 19 comments

Comments

@ctrlcctrlv
Contributor

It's really slow. I don't mind working on it if maintainers think it's possible.

@TheAssassin
Member

Would be quite a lot of work. Many processes rely on others running beforehand. For instance, during deployment, the copy operations may not run in parallel, since this might induce race conditions.

Some aspects can be parallelized, though I'm not sure there would be a huge speed benefit. For instance, looking up copyright files on Debian and derivatives is really, really slow on some systems (less of a problem in one-time VMs as used in CI/CD environments).

The pre-copy phase (which we consider the deployment phase, i.e., looking up dependencies and registering them in queues to defer the execution of the actual copy process) could be done in multiple threads as long as access to the data storage is synchronized. But, again, I am not entirely sure there's a huge gain.
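As a rough illustration of "multiple threads as long as access to the data storage is synchronized", here is a minimal sketch. `DeploymentQueue` and `register_in_parallel` are hypothetical names made up for this example, not linuxdeploy's actual classes, and the dependency lookup itself is left as a placeholder.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared "data storage": a set of files
// queued for the later, strictly sequential copy phase.
class DeploymentQueue {
public:
    // Returns true if the path was newly registered.
    bool enqueue(const std::string& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        return paths_.insert(path).second;
    }

    // Returns a copy of the registered paths.
    std::set<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return paths_;
    }

private:
    mutable std::mutex mutex_;
    std::set<std::string> paths_;
};

// Splits the inputs across worker threads. Each worker touches the shared
// queue only through the mutex-protected enqueue().
inline void register_in_parallel(const std::vector<std::string>& inputs,
                                 DeploymentQueue& queue,
                                 std::size_t worker_count) {
    if (worker_count == 0) worker_count = 1;
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back([&inputs, &queue, w, worker_count] {
            for (std::size_t i = w; i < inputs.size(); i += worker_count) {
                // Placeholder: a real implementation would look up the
                // dependencies of inputs[i] here before registering them.
                queue.enqueue(inputs[i]);
            }
        });
    }
    for (auto& worker : workers) worker.join();
}
```

The copy phase would still run on a single thread afterwards, consuming `snapshot()`, so the race conditions mentioned above for copy operations do not come into play.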

What exactly is your problem, though? "It is slow" doesn't tell me anything useful. There's a million ways in which an application can be slow.

@ctrlcctrlv
Contributor Author

I only mean it takes a long time, but I expect it to as the resulting AppImage is, in this case, 128MB.

@TheAssassin
Member

Yeah, but why? Are you using plugins which take a long time because they redundantly run some deployment processes, for instance? Or is it your system's I/O that slows everything down? Are you on Debian and are affected by that copyright files deployment slow-down I observe a lot? Come on, please be a little more helpful. https://www.chiark.greenend.org.uk/~sgtatham/bugs.html

@ctrlcctrlv
Contributor Author

Yes I'm on Debian and yes I did see a lot of dpkg-query notices. I am trying, I hate when users do this too, lol. I just couldn't imagine my distribution mattered.

@TheAssassin
Member

Just for testing, re-run the process with export DISABLE_COPYRIGHT_FILES_DEPLOYMENT=1. Not recommended for production, but for A/B testing, it'll do.

@ctrlcctrlv
Contributor Author

Yep that's the issue. The deployment flies by with that flag.

@ctrlcctrlv
Contributor Author

That sure is annoying. I see why it's happening: the files are being looked up under /lib and not under /usr/lib.

Compare:

fred@mapache:~/Workspace/TTAegisub/packages/appimage_bundle$ dpkg-query -S /usr/lib/x86_64-linux-gnu/libffms2.so.5.0.0
libffms2-5:amd64: /usr/lib/x86_64-linux-gnu/libffms2.so.5.0.0
fred@mapache:~/Workspace/TTAegisub/packages/appimage_bundle$ dpkg-query -S /lib/x86_64-linux-gnu/libffms2.so.5.0.0
dpkg-query: no path found matching pattern /lib/x86_64-linux-gnu/libffms2.so.5.0.0
$ ls -alh /|grep lib
lrwxrwxrwx   1 root root    7 Feb 19  2022 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Feb 19  2022 lib32 -> usr/lib32
lrwxrwxrwx   1 root root    9 Feb 19  2022 lib64 -> usr/lib64
lrwxrwxrwx   1 root root   10 Feb 19  2022 libx32 -> usr/libx32

Why not resolve symlink before the call out to dpkg-query? Am I missing something?

@TheAssassin
Member

So just for the record, this can happen when your I/O is slow (e.g., HDDs, or SSDs attached via SATA), when you have hundreds of packages installed, when your filesystem induces a slowdown, etc. I'd call this an I/O bandwidth issue. As I said, in the real world it usually doesn't matter much, since most people use automated builds to generate their release binaries (which is something I'd recommend, too). For local testing, the export workaround will help you speed things up. The copyright files deployment typically shouldn't introduce any bugs in production, as it's hardly ever touched anyway.

By the way, you should post links to public projects so devs can have a glance at it to look for typical issues.

@TheAssassin
Member

> Why not resolve symlink before the call out to dpkg-query? Am I missing something?

Probably an oversight, I guess? On the other hand, dpkg-query should be aware of the symlinks, since they're part of the package. Does resolving the links beforehand speed up the process?

@ctrlcctrlv
Contributor Author

The symlinks are not part of the package in this case, the symlinks in the root are put there by the base-files package.

@ctrlcctrlv
Contributor Author

> hundreds of packages installed

Thousands.

fred@mapache:~$ dpkg -l | wc -l
4087

@TheAssassin
Member

> The symlinks are not part of the package in this case, the symlinks in the root are put there by the base-files package.

I guess in that case, for peace of mind, you'd actually want to ask dpkg-query about either location...

@ctrlcctrlv
Contributor Author

I actually think this is a bug in linuxdeploy and you should be calling readlink.

@TheAssassin
Member

But how can we guarantee (at least to some extent) we capture the right copyright file if we look up the paths first? And, again, does it speed things up? As said, I think actually it might make sense to even look up both locations.

@ctrlcctrlv
Contributor Author

I don't know if it speeds it up as I haven't patched this file yet or built linuxdeploy yet.

subprocess::subprocess proc{{"dpkg-query", "-S", path.c_str()}};

Patching this should be easy, unless you're against fixing it this way.

@ctrlcctrlv
Contributor Author

As I see it we now have two issues: copyright detection failure due to non-resolution of symlinks, and the original speed issue.

@TheAssassin
Member

> Patching this should be easy, unless you're against fixing it this way.

Surely not. I expressed my concerns about changing the process. This just needs testing. Please don't hesitate to open a PR, ideally with some number crunching.
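For the "number crunching", a small timing helper along these lines might be enough to compare the lookup with and without symlink resolution. `mean_runtime_ms` is a hypothetical name, and the callable passed in is whatever lookup is being measured.

```cpp
#include <chrono>
#include <functional>

// Runs `work` `repetitions` times and returns the mean wall-clock
// duration per run in milliseconds (0.0 if repetitions is not positive).
inline double mean_runtime_ms(const std::function<void()>& work, int repetitions) {
    if (repetitions <= 0) return 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) work();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repetitions;
}
```

Since the slowdown looks I/O-bound, repeated runs may be served from the page cache, so the first run and the later ones are worth reporting separately.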

@ctrlcctrlv changed the title from "Can the deployment be made multithreaded?" to "Can dpkg-query be made multithreaded, and would that help?" on Nov 9, 2022
@ctrlcctrlv
Contributor Author

With #231, I went from almost all my libraries (except the ones I compiled myself in /opt) failing to find copyright files to none of them failing. It does nothing for the speed issue, though, so I have renamed this issue.

@TheAssassin
Member

As said before, I'm not sure multithreading will speed things up, since it looks like some I/O bottleneck to me. We'd need an alternative frontend for dpkg, I guess... (maybe we can ask it to list multiple packages at once...?).
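On asking for several files at once: as far as I know, `dpkg-query -S` accepts more than one filename pattern per invocation, so a single process could cover all libraries. A minimal sketch, assuming output lines of the form shown earlier in this thread (`libffms2-5:amd64: /usr/lib/...`). `build_batched_search` and `parse_search_output` are hypothetical helper names, and actually spawning the process is left out.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Builds one argv covering all paths: dpkg-query -S <path1> <path2> ...
inline std::vector<std::string> build_batched_search(const std::vector<std::string>& paths) {
    std::vector<std::string> argv{"dpkg-query", "-S"};
    argv.insert(argv.end(), paths.begin(), paths.end());
    return argv;
}

// Parses lines of the form "<package[, package...]>: <path>" into a
// path -> package(s) map. Diversion notices, dpkg-query's own error
// messages, and lines without a ": " separator are skipped.
inline std::map<std::string, std::string> parse_search_output(const std::string& output) {
    std::map<std::string, std::string> owners;
    std::istringstream lines(output);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.rfind("diversion by ", 0) == 0 ||
            line.rfind("local diversion", 0) == 0 ||
            line.rfind("dpkg-query: ", 0) == 0) {
            continue;
        }
        const auto separator = line.find(": ");
        if (separator == std::string::npos) continue;
        owners[line.substr(separator + 2)] = line.substr(0, separator);
    }
    return owners;
}
```

Whether this helps depends on where the time goes: it saves the per-file process startup and database load, but not the search through the file lists itself.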
