Add experimental hash protection from concurrent modifications #1783

vrurg · 2024-01-09T14:29:31Z

This is a quick-baked attempt to provide protection from a hash being modified while concurrent threads are also accessing it.

This PR is a working draft. It does nothing to the lower-level MVMStrHashTable to protect from MoarVM internal problems. But it does intercept NQP and Raku attempts to break the rules.

Also, I know too little about speshing to get it play well with the protection. Therefore this PR simply disables speshing of hashes and slurpy named arguments.

vrurg · 2024-01-09T14:44:19Z

This was my attempt to resolve systematic cannot iterate over VMNull errors I get with my work application. This patch does allow to catch unsafe operations in the following example:

use v6.e.PREVIEW;

my %h;

constant THREADS-PER-GROUP = 20;

my $starter = Promise.new;

sub writer($tid) {
    my $id = $tid.fmt('%5d');
    for ^100000 -> $i {
        %h{$i} = $i ~ ":" ~ $id;
    }
}

sub reader($tid) {
    my $id = $tid.fmt('%5d');
    for ^100000 -> $i {
        note "[$id] $i: ", %h{$i};
    }
}

my @w;
my @pw;
my @pr;
for ^THREADS-PER-GROUP -> $thread-id {
    @pw[$thread-id] = Promise.new;
    @w.push: start {
        @pw[$thread-id].keep;
        await $starter;
        writer($thread-id);
    }

    @pr[$thread-id] = Promise.new;
    @w.push: start {
        @pr[$thread-id].keep;
        await $starter;
        reader($thread-id);
    }
}

await |@pr, |@pw;

$starter.keep;

await @w;

note "Done";

But it proved that corrupted hash data is not the cause of the error above. Instead, on a few occasions, I observed another problem where metamodel classes was recveiving a VMNull while reading from attributes. I.e. in something like the following code if %!attr has never been accessed by a thread yet then there is a chance it'd be nullish:

method foo() {
    if $debug { note(%!attr.HOW.name(%!attr)); } # Here we may get VMNull
    my $v = %!attr<foo>; # Here a valid hash is observed
}

Unfortunately, I haven't managed to produce a reliable scenario where the problem would show up more or less regularly.

lizmat · 2024-01-09T15:29:37Z

Perhaps relatedly: the REA ecosystem harvester regularly crashes on https://github.com/rakudo/rakudo/blob/225533d6f2a207f8b913c0f58bc9b6a4dcd4b3b8/src/core.c/IO/Pipe.rakumod#L19 where apparently $!nl-in has become unset.

Now. https://github.com/rakudo/rakudo/blob/main/src/core.c/IO/Handle.rakumod#L5 shows that $!nl-in will normally always be set. And that uses the (possibly fragile) nqp::attrinited functionality, of which I know that @jnthn wants to get rid of.

As an experiment, I will change that code in IO::Handle to set the default in TWEAK, which would not use nqp::attrinited. And see if the harvester crashes less. If it does crash less, then I guess we need to look at nqp::attrinited more closely.

In response to #5444 and the research that vrurg did for MoarVM/MoarVM#1783 this removes the default value setting of the $!chomp, $!nl-in and $!nl-out to TWEAK, instead being specified on the attribute definition. By doing this, the default value won't be set internally by the nqp::attrinited op, which is now suspected of being a possible cause of random crashes. If the REA harvester doesn't crash anymore (runs hourly, crashed about twice a week), then we have a direction to further look into these crashes.

vrurg · 2024-01-09T15:57:22Z

As an experiment, I will change that code in IO::Handle to set the default in TWEAK, which would not use nqp::attrinited. And see if the harvester crashes less. If it does crash less, then I guess we need to look at nqp::attrinited more closely.

Sounds promising.

While I was able to reproduce the problem with a local test (then something changed and the test doesn't crash anymore) the scenario was always pointing and deserialization stage. If memory doesn't lie to me then MoarVM does lazy deserialization on demand and I was thinking that there is a race possible where while one thread is still deserializing, the other one already reads NULL from undeserialized attribute. But later observations did show that the real app log has errors recorded long after all related modules should've been loaded. It doesn't mean that I don't overlook something, but attrinit is way more likely source of issues.

Now. https://github.com/rakudo/rakudo/blob/main/src/core.c/IO/Handle.rakumod#L5 shows that $!nl-in will normally always be set.

This is the only part where I'm slightly confused. Rakudo doesn't currently use attrinit with MoarVM – all is done by dispatchers. NQP does and my experience backs this by pointing out that whenever I was able to pinpoint the location where VMNull pops up it was NQP code. But here it is clearly a Raku case. It would make sense to somehow get a full backtrace to see if it's the attribute itself or something in the underlying NQP.

lizmat · 2024-01-09T16:26:07Z

Rakudo doesn't currently use attrinit with MoarVM – all is done by dispatchers

I think that's incorrect: https://github.com/rakudo/rakudo/blob/main/src/Perl6/World.nqp#L3853 is where the BUILDALL builder creates nqp::attrinited code.

lizmat · 2024-01-09T16:29:21Z

And the default BUILDALL at https://github.com/rakudo/rakudo/blob/main/src/core.c/Mu.rakumod#L222 contains references to nqp::p6attrinited. Not sure what the difference is, if there is any.

lizmat · 2024-01-09T16:32:39Z

Documentation of nqp::attrinited: https://github.com/Raku/nqp/blob/main/docs/ops.markdown#attrinited

"Note that any access to the attribute that results in a getattr call causes it to be inited."

I wonder if that would be the explanation: that a getattr sneaking in would mark the attribute as initialized, when it wasn't yet.

lizmat · 2024-01-09T16:35:11Z

Continue to think out loud re attribute initialization: I wonder if Opaque objects could have their attributes (cheaply) initialized with a sentinel value, and then an nqp::eqaddr could be used to see whether the attribute had been initialized already or not.

vrurg · 2024-01-09T20:36:45Z

Along the way of investigation you missed another location: https://github.com/rakudo/rakudo/blob/69b8a24ae34ec70f51e1b9f20b37d240cec1ffd3/src/Perl6/Actions.nqp#L977

Continue to think out loud re attribute initialization: I wonder if Opaque objects could have their attributes (cheaply) initialized with a sentinel value

This is kind of what @jnthn has done to it, in my understanding (I never really in-depth analyzed that part): https://github.com/rakudo/rakudo/blob/69b8a24ae34ec70f51e1b9f20b37d240cec1ffd3/src/Perl6/bootstrap.c/BOOTSTRAP.nqp#L1425

lizmat · 2024-01-10T10:41:36Z

The REA harvester crashed again with a VMnull error at the same location. So it does NOT appear to be related to nqp::attrinited.

vrurg · 2024-01-26T22:31:44Z

@niner Would it be possible to have a look at this PR and decide what to do about it? It's already let me locate a problem with %!mro in Metamodel::C3MRO which otherwise wouldn't be that easy to spot. So, there is a point in having this functionality in the VM.

niner · 2024-01-27T12:24:47Z

src/moar.h

@@ -158,6 +158,7 @@ MVM_trycas_AO(volatile AO_t *addr, uintptr_t old, const uintptr_t new) {
 #define MVM_HASH_RANDOMIZE 1
 #define MVM_HASH_MAX_PROBE_DISTANCE 255
 #define MVM_HASH_INITIAL_BITS_IN_METADATA 5
+#define MVM_HASH_PROTECT 1


We surely want this to be disabled by default, don't we?

For sure. The first comment says that this is draft. Don't remember why I haven't explicitly marked it so.

I more worried about the necessity to disable speshing because I haven't found a way to initialize per-hash lock for spesh-allocated objects.

This is a quick-baked attempt to provide protection from a hash being modified while concurrent threads are also accessing it.

niner reviewed Jan 27, 2024

View reviewed changes

vrurg force-pushed the draft-of-hash-lock-protection branch from 09c3ec9 to 1595d9a Compare January 30, 2024 19:53

vrurg mentioned this pull request Feb 11, 2024

Metamodel concurrency problems Raku/problem-solving#418

Open

Add experimental hash protection from concurrent modifications

751a2ff

This is a quick-baked attempt to provide protection from a hash being modified while concurrent threads are also accessing it.

vrurg force-pushed the draft-of-hash-lock-protection branch from 1595d9a to 751a2ff Compare February 22, 2024 14:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add experimental hash protection from concurrent modifications #1783

Add experimental hash protection from concurrent modifications #1783

vrurg commented Jan 9, 2024 •

edited

vrurg commented Jan 9, 2024

lizmat commented Jan 9, 2024

vrurg commented Jan 9, 2024

lizmat commented Jan 9, 2024

lizmat commented Jan 9, 2024

lizmat commented Jan 9, 2024

lizmat commented Jan 9, 2024

vrurg commented Jan 9, 2024

lizmat commented Jan 10, 2024 •

edited

vrurg commented Jan 26, 2024

niner Jan 27, 2024

vrurg Jan 27, 2024

Add experimental hash protection from concurrent modifications #1783

Are you sure you want to change the base?

Add experimental hash protection from concurrent modifications #1783

Conversation

vrurg commented Jan 9, 2024 • edited

vrurg commented Jan 9, 2024

lizmat commented Jan 9, 2024

vrurg commented Jan 9, 2024

lizmat commented Jan 9, 2024

lizmat commented Jan 9, 2024

lizmat commented Jan 9, 2024

lizmat commented Jan 9, 2024

vrurg commented Jan 9, 2024

lizmat commented Jan 10, 2024 • edited

vrurg commented Jan 26, 2024

niner Jan 27, 2024

Choose a reason for hiding this comment

vrurg Jan 27, 2024

Choose a reason for hiding this comment

vrurg commented Jan 9, 2024 •

edited

lizmat commented Jan 10, 2024 •

edited