Add experimental hash protection from concurrent modifications #1783
This was my attempt to resolve the systematic "cannot iterate over VMNull" errors I get with my work application. This patch does make it possible to catch unsafe operations in the following example:

    use v6.e.PREVIEW;
    my %h;
    constant THREADS-PER-GROUP = 20;
    my $starter = Promise.new;

    # Each writer stores 100000 keys into the shared, unprotected hash.
    sub writer($tid) {
        my $id = $tid.fmt('%5d');
        for ^100000 -> $i {
            %h{$i} = $i ~ ":" ~ $id;
        }
    }

    # Each reader looks up the same keys while the writers are running.
    sub reader($tid) {
        my $id = $tid.fmt('%5d');
        for ^100000 -> $i {
            note "[$id] $i: ", %h{$i};
        }
    }

    my @w;
    my @pw;
    my @pr;
    for ^THREADS-PER-GROUP -> $thread-id {
        @pw[$thread-id] = Promise.new;
        @w.push: start {
            @pw[$thread-id].keep;
            await $starter;
            writer($thread-id);
        }
        @pr[$thread-id] = Promise.new;
        @w.push: start {
            @pr[$thread-id].keep;
            await $starter;
            reader($thread-id);
        }
    }

    # Wait until every thread is up, then release them all at once.
    await |@pr, |@pw;
    $starter.keep;
    await @w;
    note "Done";

But it proved that corrupted hash data is not the cause of the error above. Instead, on a few occasions, I observed another problem where metamodel classes were receiving a
Unfortunately, I haven't managed to produce a reliable scenario where the problem shows up more or less regularly.
Perhaps relatedly: the REA ecosystem harvester regularly crashes on https://github.com/rakudo/rakudo/blob/225533d6f2a207f8b913c0f58bc9b6a4dcd4b3b8/src/core.c/IO/Pipe.rakumod#L19 where apparently $!nl-in has become unset. Now, https://github.com/rakudo/rakudo/blob/main/src/core.c/IO/Handle.rakumod#L5 shows that $!nl-in will normally always be set. And that uses the (possibly fragile) As an experiment, I will change that code in
In response to #5444 and the research that vrurg did for MoarVM/MoarVM#1783, this moves the default value setting of $!chomp, $!nl-in and $!nl-out to TWEAK, instead of specifying it on the attribute definition. By doing this, the default value won't be set internally by the nqp::attrinited op, which is now suspected of being a possible cause of random crashes. If the REA harvester doesn't crash anymore (it runs hourly and crashed about twice a week), then we have a direction in which to look further into these crashes.
Sounds promising. While I was able to reproduce the problem with a local test (then something changed and the test doesn't crash anymore), the scenario was always pointing at the deserialization stage. If memory doesn't lie to me, MoarVM does lazy deserialization on demand, and I was thinking that a race is possible where, while one thread is still deserializing, the other one already reads
This is the only part where I'm slightly confused. Rakudo doesn't currently use attrinit with MoarVM – all is done by dispatchers. NQP does, and my experience backs this by pointing out that whenever I was able to pinpoint the location where
I think that's incorrect: https://github.com/rakudo/rakudo/blob/main/src/Perl6/World.nqp#L3853 is where the
And the default
Documentation of "Note that any access to the attribute that results in a I wonder if that would be the explanation: that a
Continuing to think out loud re attribute initialization: I wonder if Opaque objects could have their attributes (cheaply) initialized with a sentinel value, and then an
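To make the sentinel idea above concrete, here is a minimal C11 sketch. It is not taken from MoarVM or Rakudo: `Attr`, `ATTR_UNSET`, `attr_init` and `attr_get` are hypothetical names, and the choice to install the default with a compare-and-swap on first read is an assumption about one way such a scheme might avoid two threads racing on initialization.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sentinel: the address of a private object that can never
 * be a legitimate attribute value. */
static int attr_unset_marker;
#define ATTR_UNSET ((void *)&attr_unset_marker)

/* Hypothetical attribute slot; a real object layout would differ. */
typedef struct {
    _Atomic(void *) slot;
} Attr;

/* Mark the slot as "not yet initialized" at allocation time. */
static void attr_init(Attr *a) {
    atomic_init(&a->slot, ATTR_UNSET);
}

/* Return the attribute value, installing default_value on first access.
 * If several threads race, exactly one compare-and-swap succeeds and
 * every caller ends up seeing the same installed value. */
static void *attr_get(Attr *a, void *default_value) {
    void *cur = atomic_load(&a->slot);
    if (cur != ATTR_UNSET)
        return cur;
    void *expected = ATTR_UNSET;
    if (atomic_compare_exchange_strong(&a->slot, &expected, default_value))
        return default_value;
    return expected; /* another thread installed its value first */
}
```

With this shape there is no separate "initialized" flag that can be observed out of step with the value itself, which is the property the discussion is looking for.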
In the course of the investigation you missed another location: https://github.com/rakudo/rakudo/blob/69b8a24ae34ec70f51e1b9f20b37d240cec1ffd3/src/Perl6/Actions.nqp#L977
This is kind of what @jnthn has done to it, in my understanding (I never really analyzed that part in depth): https://github.com/rakudo/rakudo/blob/69b8a24ae34ec70f51e1b9f20b37d240cec1ffd3/src/Perl6/bootstrap.c/BOOTSTRAP.nqp#L1425
The REA harvester crashed again with a VMNull error at the same location. So it does NOT appear to be related to
@niner Would it be possible to have a look at this PR and decide what to do about it? It has already let me locate a problem with
@@ -158,6 +158,7 @@ MVM_trycas_AO(volatile AO_t *addr, uintptr_t old, const uintptr_t new) {
 #define MVM_HASH_RANDOMIZE 1
 #define MVM_HASH_MAX_PROBE_DISTANCE 255
 #define MVM_HASH_INITIAL_BITS_IN_METADATA 5
+#define MVM_HASH_PROTECT 1
We surely want this to be disabled by default, don't we?
For sure. The first comment says that this is a draft. I don't remember why I haven't explicitly marked it as one.
I'm more worried about the need to disable speshing, because I haven't found a way to initialize the per-hash lock for spesh-allocated objects.
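One common way to make such a switch off by default is sketched below. Only the macro name `MVM_HASH_PROTECT` comes from the diff; the `#ifndef` default and the helper `hash_protect_enabled` are illustrative assumptions, not part of the patch.

```c
/* Assumed default: protection is compiled out unless the build passes
 * -DMVM_HASH_PROTECT=1. Only the macro name is taken from the patch. */
#ifndef MVM_HASH_PROTECT
#define MVM_HASH_PROTECT 0
#endif

/* Hypothetical helper that reports whether this build carries the
 * protection code. */
static int hash_protect_enabled(void) {
#if MVM_HASH_PROTECT
    return 1;
#else
    return 0;
#endif
}
```

A debugging build could then opt in with a compiler flag such as `-DMVM_HASH_PROTECT=1`, while regular builds pay no cost.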
09c3ec9 to 1595d9a
This is a quick-baked attempt to provide protection from a hash being modified while concurrent threads are also accessing it.
1595d9a to 751a2ff
This PR is a working draft. It does nothing to the lower-level MVMStrHashTable to protect from MoarVM internal problems. But it does intercept NQP and Raku attempts to break the rules. Also, I know too little about speshing to get it to play well with the protection. Therefore this PR simply disables speshing of hashes and slurpy named arguments.
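As an illustration of what intercepting a rule-breaking access can look like, here is a small C11 sketch of a guard word that detects a write overlapping with other accesses. It is not the code from this PR, which is described as using a per-hash lock: `HashGuard` and the `guard_*` functions are hypothetical names, and the sketch assumes the policy is "many readers or one writer, report anything else".

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard word stored next to a hash:
 *   0  idle,  >0  number of active readers,  -1  one active writer. */
typedef struct {
    atomic_int state;
} HashGuard;

static void guard_init(HashGuard *g) {
    atomic_init(&g->state, 0);
}

/* Try to register a reader. Returns false if a writer is active, which
 * the caller can report as an unsafe concurrent access. */
static bool guard_enter_read(HashGuard *g) {
    int cur = atomic_load(&g->state);
    while (cur >= 0) {
        /* On failure cur is reloaded with the current state. */
        if (atomic_compare_exchange_weak(&g->state, &cur, cur + 1))
            return true;
    }
    return false;
}

static void guard_leave_read(HashGuard *g) {
    atomic_fetch_sub(&g->state, 1);
}

/* Try to register a writer. Succeeds only when nobody else is inside. */
static bool guard_enter_write(HashGuard *g) {
    int expected = 0;
    return atomic_compare_exchange_strong(&g->state, &expected, -1);
}

static void guard_leave_write(HashGuard *g) {
    atomic_store(&g->state, 0);
}
```

A failed `guard_enter_*` call does not block; it only signals that the access would have raced, so the caller can throw an exception instead of silently corrupting the table.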