tests: Mark some tests as slow and schedule them at the beginning #15950

Draft · kjarosh wants to merge 6 commits into master from slow-tests
Conversation

@kjarosh (Member) commented Apr 10, 2024

By setting `is_slow = true`, a test may be marked as slow, which means that the duration of its execution is exceptionally long compared to other tests.

In order to minimize the duration of `cargo test`, slow tests should be executed at the beginning. This maximizes load on multithreaded CPUs by fully utilizing all threads and preventing late slow tests from stalling the whole suite.

By enabling the `fast` feature, slow tests may be skipped. This is useful e.g. when someone wants to quickly run the test suite without waiting for a long time, in order to roughly assess whether their changes broke something.

| Test command | Before | After | After with `fast` |
|---|---|---|---|
| `cargo test` | 106.74s | 77.45s (-27%) | 43.55s (-59%) |
| `cargo test -F imgtests` | 115.94s | 80.55s (-31%) | 52.22s (-55%) |
| `cargo nextest run` | 118.382s | 80.872s (-32%) | 49.526s (-58%) |
| `cargo nextest run -F imgtests` | 132.738s | 91.500s (-31%) | 69.857s (-47%) |
| `cargo nextest run --cargo-profile ci` | 23.916s | 13.178s (-45%) | 10.070s (-58%) |
| CI ubuntu | 118.932s | 82.319s (-31%) | N/A |
| CI macos | 69.940s | 29.981s (-57%) | N/A |
| CI windows | 310.591s | 217.358s (-30%) | N/A |
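
For illustration, a minimal sketch of the scheduling idea on top of libtest-mimic (the `schedule` helper and the tuple representation are hypothetical, not the actual PR code; only the `fast` feature name matches this PR):

```rust
use libtest_mimic::{Arguments, Trial};

// Sort the collected trials so that slow ones are scheduled first, and
// optionally skip them entirely when the (hypothetical) `fast` feature
// is enabled.
fn schedule(mut tests: Vec<(Trial, bool)>) -> Vec<Trial> {
    // Slow tests first: they start early and finish while the remaining
    // fast tests keep all cores busy, instead of stalling an idle suite.
    tests.sort_by_key(|(_, is_slow)| !*is_slow);
    tests
        .into_iter()
        .filter(|(_, is_slow)| !(cfg!(feature = "fast") && *is_slow))
        .map(|(trial, _)| trial)
        .collect()
}

fn main() {
    let args = Arguments::from_args();
    let tests = vec![
        (Trial::test("fast_case", || Ok(())), false),
        (Trial::test("slow_case", || Ok(())).with_kind("slow"), true),
    ];
    libtest_mimic::run(&args, schedule(tests)).exit();
}
```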

@Dinnerbone (Contributor)

(For what it's worth, I highly recommend using `nextest` rather than `test` for local development!)

@kjarosh (Member, Author) commented Apr 11, 2024

Right! I did not think about using nextest. 😄 It certainly has a better interface, but unfortunately tests take a bit longer to execute :(

I added nextest benchmarks to the table in the description. At first glance, the performance improvements are similar to those for `cargo test`. However, it seems that nextest does not execute tests marked as slow, which is weird... 🤔 I'm going to investigate it tomorrow and mark this PR as a draft for now.

@kjarosh marked this pull request as draft on April 11, 2024, 00:02
@torokati44 (Member)

You could try the profile dedicated to CI - it's a bit more optimized:
https://github.com/ruffle-rs/ruffle/pull/15320/files

@torokati44 (Member) commented Apr 11, 2024

Although, I do like the idea of somehow trying to sort the test cases in decreasing real-time duration. But I wish it were somehow ✨automagical✨, with no need to manually annotate anything. That sort (pun may or may not be intended) of thing is bound to get outdated.
I'm on the fence about the `fast` feature... It doesn't feel like the right tool to achieve this (tests can already be filtered, at least by name, after all), but what do I know, maybe it is.

@kjarosh (Member, Author) commented Apr 11, 2024

This PR is just a suggestion; I tried to speed the tests up and noticed that towards the end of a run my CPU was sitting idle. Regarding the `fast` feature, I'm not sure about it either. It executes 98.86% of the tests while taking only 60% of the time. I'm not sure how useful it is, but I decided to include it anyway for you to judge.

I also thought about automating test annotation somehow. The idea of no annotations seems eerily close to the halting problem though ;) I was thinking about how to annotate the tests in order to schedule them properly. It could be a slow flag which divides tests into two categories, but it could also be a non-discrete value (e.g. estimated execution time) which constitutes a total order. The latter seems to be overkill, because (1) those values would depend more on the system, and (2) the time improvement would be minimal compared to the former, which is a much simpler approach.

Regarding automation, I thought about giving feedback to the user about slow-marked tests which run fast and non-slow tests which run slow. That would require two time thresholds in order to create hysteresis. They could be set manually (e.g. 1s, 10s), or calculated at the end based on the probability distribution of test durations (e.g. σ, 2σ). That is of course a bit complicated, so I did not go in this direction, as I wasn't sure whether we would want this approach at all. It also sounds like a problem that could possibly be solved at a toolchain/library level.
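
As a rough illustration of that feedback idea, a sketch using the two manually chosen thresholds mentioned above (all names here are hypothetical; none of this exists in the PR):

```rust
use std::time::Duration;

// Hypothetical thresholds; the gap between them provides the hysteresis
// mentioned above (tests between 1s and 10s never trigger a warning).
const FAST_LIMIT: Duration = Duration::from_secs(1);
const SLOW_LIMIT: Duration = Duration::from_secs(10);

fn check_annotation(name: &str, is_slow: bool, elapsed: Duration) {
    if is_slow && elapsed < FAST_LIMIT {
        eprintln!("warning: `{name}` is marked slow but took only {elapsed:?}");
    } else if !is_slow && elapsed > SLOW_LIMIT {
        eprintln!("warning: `{name}` is not marked slow but took {elapsed:?}");
    }
}
```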

@torokati44 (Member) commented Apr 11, 2024

> This PR is just a suggestion,

Of course, and thanks! :)

> Regarding automation, I thought about giving feedback to the user about slow-marked tests which run fast and non-slow tests which run slow.

nextest already warns about `[SLOW]` tests by itself, but of course it doesn't have the `is_slow` annotation, which would mark the slowness as more or less expected for a given test case.

> The idea of no annotations seems eerily close to the halting problem though ;) I was thinking about how to annotate the tests in order to schedule them properly. It could be a slow flag which divides tests into two categories, but it could also be a non-discrete value (e.g. estimated execution time)

Bah, it could also be just a little system-local cache of "the last 5-10 times this test case was run, it took xx milliseconds". Then, to schedule, look up these records, average them per case, and sort. The averaging should even out any discrepancies caused by P-cores vs. E-cores, powersave/performance governors, dynamic clocks, overall system load, thermal throttling, etc. And how a given test case is identified isn't that crucial either, as long as it works "reasonably enough", since it only determines the order of test executions, which shouldn't affect anything ™️, other than load balancing at the end, of course.

This could even be a generic cargo-test or cargo-nextest feature, maybe even enabled by default?
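
A rough sketch of what that cache could look like (everything here is hypothetical; no such feature exists in cargo-test or cargo-nextest):

```rust
use std::collections::HashMap;
use std::time::Duration;

// The last few observed durations per test name, persisted locally
// (e.g. somewhere under `target/`); all names here are illustrative.
type DurationCache = HashMap<String, Vec<Duration>>;

fn average(samples: &[Duration]) -> Duration {
    let total: Duration = samples.iter().sum();
    total / samples.len().max(1) as u32
}

// Sort test names so the historically longest-running ones come first;
// tests with no recorded history are scheduled first as a safe default.
fn sort_by_history(tests: &mut [String], cache: &DurationCache) {
    tests.sort_by_key(|name| {
        std::cmp::Reverse(cache.get(name).map(|s| average(s)).unwrap_or(Duration::MAX))
    });
}
```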

@torokati44 (Member) commented Apr 11, 2024

> This could even be a generic cargo-test or cargo-nextest feature, maybe even enabled by default?

Ahh, see: nextest-rs/nextest#905
Also somewhat related: nextest-rs/nextest#742
Ah, and of course: nextest-rs/nextest#958

@kjarosh (Member, Author) commented Apr 12, 2024

Okay, I think I found out why nextest does not run tests marked as slow; I also found out why tests on nextest are slower compared to `cargo test`.

It turns out libtest-mimic adds the test kind as a prefix before the test's name (e.g. `[slow] test`). Nextest executes tests one by one, matching them by name (with the prefix), but the filtering is done manually by our code (`is_candidate`), which does not take the kind prefix into account the way libtest-mimic does (see `TestInfo::test_name_with_kind` in libtest-mimic). As a result, any test with a kind is silently ignored: it is never executed, yet still reported as passed.
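
A simplified illustration of the mismatch (the functions below are illustrative, not Ruffle's actual `is_candidate`):

```rust
// How libtest-mimic builds the displayed name (cf. its
// `TestInfo::test_name_with_kind`): the kind becomes a bracketed prefix.
fn name_with_kind(kind: &str, name: &str) -> String {
    if kind.is_empty() {
        name.to_owned()
    } else {
        format!("[{kind}] {name}")
    }
}

fn is_candidate(filter: &str, kind: &str, name: &str) -> bool {
    // Matching against the prefixed name; comparing `filter == name`
    // instead would never match `"[slow] test"`, silently skipping
    // every test that has a kind.
    filter == name_with_kind(kind, name)
}
```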

Nextest's strategy of executing tests also explains why it's significantly slower than `cargo test`. The regression tests always perform a full walk over `tests/swfs`, regardless of whether a test is matched exactly by its name or not, and this full walk is of course performed for each test.

Removing the full walk on exact matches makes nextest about 15s faster than before, which brings it close to `cargo test`. (I suspect the difference will be even greater on Windows, as it is not very efficient with file operations.)
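
A sketch of that optimization (paths and signatures are assumptions, not the actual Ruffle code):

```rust
use std::path::{Path, PathBuf};

// When the filter is an exact test name, resolve the single directory
// directly; a lone existence check replaces the full recursive walk.
fn find_test_dir(root: &Path, filter: Option<&str>, exact: bool) -> Option<PathBuf> {
    if exact {
        let candidate = root.join(filter?);
        return candidate.is_dir().then_some(candidate);
    }
    None // substring filters still fall back to the full walk
}
```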

@kjarosh marked this pull request as ready for review on April 13, 2024, 08:51
@kjarosh force-pushed the slow-tests branch 2 times, most recently from 1a196a2 to 12713e7, on April 16, 2024, 22:33
@torokati44 (Member)

While I'm still not entirely sold on the manual `[slow]` annotation, I think the first two commits could be considered independently as well - they seem useful!

@kjarosh (Member, Author) commented Apr 18, 2024

Created a new PR with these two commits: #16031

@adrian17 (Collaborator)

I know I'm a bit late to the thread but

> Although, I do like the idea of somehow trying to sort the test cases in decreasing real-time duration.

I'm not sure why I'd ever want that, tbh. And I haven't really seen other projects do this kind of thing before either?
(If we wanted to sort (which I personally don't), then if anything I'd go for increasing duration: if there's a failing test, this minimizes the average time until it is reported.)

@torokati44 (Member)

> I'm not sure why I'd ever want that, tbh.

Because it minimizes the chance that a many-core CPU grinds through most tests in X seconds while one or two really long tests happen to get started at, say, 0.8X, extending the overall wait by another large fraction of X while most cores sit idle. If the short, quick tests are left to the end, they can fully utilize all of the cores right up until all tests are done, making the run quicker overall. It's a classic "packing problem" kind of thing...
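
As a toy illustration of that packing effect (made-up numbers, two simulated workers; none of this is in the PR):

```rust
// Greedy assignment of each test, in the given order, to the
// least-loaded of two simulated workers; returns the total wall time.
fn makespan(durations: &[u32]) -> u32 {
    let mut workers = [0u32; 2];
    for &d in durations {
        *workers.iter_mut().min().unwrap() += d;
    }
    workers.into_iter().max().unwrap()
}

fn main() {
    // One 10s test plus ten 1s tests, scheduled slow-first vs. slow-last.
    let slow_first = [10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1];
    let slow_last = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10];
    assert_eq!(makespan(&slow_first), 10); // the slow test overlaps the fast ones
    assert_eq!(makespan(&slow_last), 15);  // the slow test starts once most work is done
}
```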

@kjarosh (Member, Author) commented Apr 18, 2024

> if anything I'd go for increasing duration

That is better when we expect tests to fail rather than pass, but in CI it's generally far more probable for the tests to pass. When running locally we might have different expectations, although I usually have a positive attitude and expect them to pass ;)

Commit message excerpts:

The option `sleep_to_meet_frame_rate` artificially increases the duration of tests in order to run at realtime speed. This patch marks all tests which execute significantly longer than an average test with `is_slow = true`.