
added lock when updating actor via UpdateActorHandler #620

Merged: asdfzdfj merged 2 commits into main from new/update-actor-message-locking on May 22, 2024

Conversation

asdfzdfj
Contributor

while working on incoming activity, the UpdateActorMessage could be fired more than once per actor, and even with #602 applied there could still be multiple UpdateActorHandler instances running at the same time, updating the same actor before the added guard could kick in

this adds a bit of locking in UpdateActorHandler per actor url to ensure that only one handler is actually updating a given actor at a time, discarding any subsequent update actor message that comes in while the update is running
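a minimal sketch of that handler-level locking, assuming Symfony's Lock component with an injected LockFactory; the handler and message shapes here are illustrative, not the exact Mbin code:

use Symfony\Component\Lock\LockFactory;

class UpdateActorHandler
{
    public function __construct(private readonly LockFactory $lockFactory)
    {
    }

    public function __invoke(UpdateActorMessage $message): void
    {
        // one lock per actor url, with a ttl so a crashed worker
        // cannot hold the lock forever
        $lock = $this->lockFactory->createLock('actor:update:'.$message->actorUrl, ttl: 3600);

        // non-blocking acquire: if another handler already holds the
        // lock, discard this message instead of waiting
        if (!$lock->acquire()) {
            return;
        }

        try {
            // ... fetch the remote actor and update the local record ...
        } finally {
            $lock->release();
        }
    }
}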

added the force parameter to UpdateActorMessage and its handler to force an actor update regardless of the #602 last-updated time (it still has to get past the lock, though)

the mbin:actor:update command also got some jank fixed, as it also uses UpdateActorMessage to do its job; the force parameter is used here as well

also tried moving the fetched-time guard from #602 up to the handler level: the guard itself is fine, but such a guard should reside at the higher level, before the update method is called

@BentiGorlich added the enhancement and backend labels on Mar 24, 2024
@BentiGorlich
Member

Great idea to use locks for this, I didn't know they existed :D I would probably have just used a DB field for it, which is just bad in comparison.

I think the lock should be acquired when dispatching the message and then passed along with the actor in the message (see Lock#Serialization). With your solution you're ensuring that only one update actor job runs at a time per actor; however, it doesn't solve the problem of dispatching multiple update actor messages for the same actor, so we still have a strain on rabbit...

What do you think?
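For reference, the Lock#Serialization pattern being suggested would look roughly like this (a sketch only; the key field on UpdateActorMessage is hypothetical):

use Symfony\Component\Lock\Key;
use Symfony\Component\Lock\LockFactory;

// at dispatch time: acquire the lock, then hand its key to the message;
// autoRelease must be disabled so the lock survives until the handler runs
$key = new Key('actor:update:'.$actorUrl);
$lock = $lockFactory->createLockFromKey($key, ttl: 3600, autoRelease: false);

if ($lock->acquire()) {
    $bus->dispatch(new UpdateActorMessage($actorUrl, key: $key));
}

// in the handler: re-create the lock from the serialized key and
// release it once the update is done
$lock = $lockFactory->createLockFromKey($message->key, autoRelease: false);
try {
    // ... update the actor ...
} finally {
    $lock->release();
}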

@BentiGorlich
Member

And I just noticed that the default lock DSN everybody uses atm is the flock store. I think we should think about changing the default (it uses filesystem locks, which I doubt are great). I would lean towards redis...
Since your new code is the first and only bit of code that uses the lock component, a wrong setup may only now start producing errors, because the component just wasn't needed beforehand.

@asdfzdfj
Contributor Author

I think the lock should be acquired when dispatching the message and then passed along with the actor in the message (see Lock#Serialization). With your solution you're ensuring that only one update actor job runs at a time per actor; however, it doesn't solve the problem of dispatching multiple update actor messages for the same actor, so we still have a strain on rabbit...

What do you think?

while I was making this I couldn't find an example of the lock+key usage, and lock feature sets (including keys) basically depend on which store you use, so I opted to do it in a way that only needs plain locks to work and depends less on fancier features that may or may not be present in any particular store (also, for some inexplicable reason I used the semaphore store for locks on my test instance, and the key feature doesn't work with that store). maybe I'll see what I can come up with there

though I do agree that this should be guarded from before the update actor message is even dispatched


And I just noticed that the default lock DSN everybody uses atm is the flock store. I think we should think about changing the default (it uses filesystem locks, which I doubt are great). I would lean towards redis... Since your new code is the first and only bit of code that uses the lock component, a wrong setup may only now start producing errors, because the component just wasn't needed beforehand.

I think the flock store we have configured is fine, as the symfony caching components already use locks behind the scenes and so far that seems to be working fine (this is actually how I found out that symfony has locks: I happened to notice some debug log lines mentioning lock acquisition using a cache key while a caching call was running)

still, a small docs section on configuring lock stores, in case someone wants a more durable lock store, might be nice. I'm leaning more towards DoctrineDbalPostgreSqlStore myself, same feature set as flock, but I think redis is fine too
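for anyone wanting to switch, the store is selected via LOCK_DSN in .env; the values below are illustrative examples (credentials and hosts need adjusting to the actual deployment):

# default: filesystem locks via the flock store
LOCK_DSN=flock

# redis store (supports key serialization / sharing)
LOCK_DSN=redis://localhost:6379

# postgresql advisory locks
LOCK_DSN=postgresql+advisory://db_user:db_password@localhost:5432/mbin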

@BentiGorlich
Member

and lock feature sets (including keys) basically depend on which store you use

I guess the feature that is required is "sharing", right? It is not explicitly stated which one it is...
DoctrineDbalPostgreSqlStore sounds good to me too, let's go with that

@melroy89
Member

melroy89 commented Mar 24, 2024

You could even consider redis/keydb/dragonfly for these lock key objects. With redis you have support for both expiry (ttl) and sharing.

Also, depending on the number of locks, it could be more performant than using postgresql.

@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from a6d1d1e to 46fefcf Compare March 24, 2024 16:42
@asdfzdfj
Contributor Author

asdfzdfj commented Mar 24, 2024

perhaps it's just like benti pointed out: the goal of this patch is to prevent multiple update actor handlers running on the same actor at once (and I've seen that happen on my instance), rather than to avoid dispatching multiple UpdateActorMessage in quick succession, as I find the former more feasible

anyway, I took a shot at moving the lock up to the dispatch level to avoid dispatching multiple UpdateActorMessage in quick succession; the change is here in a separate branch

from a quick test it seems to work, but I'm not merging it into this patch right now as I'm not sure the added complexity is worth it:

  • the lock store MUST be reconfigured, otherwise the changes won't work: the lock keys must be serializable, and the following stores cannot have their lock keys serialized, including the flock store that's our current default and the postgresql advisory locks I was aiming for too, leaving the redis store as the only practical choice for us:
mbin % rg markUnserializable vendor/symfony/lock
vendor/symfony/lock/Store/FlockStore.php
132:        $key->markUnserializable();

vendor/symfony/lock/Store/SemaphoreStore.php
79:        $key->markUnserializable();

vendor/symfony/lock/Store/DoctrineDbalPostgreSqlStore.php
98:                $key->markUnserializable();
133:                $key->markUnserializable();

vendor/symfony/lock/Store/ZookeeperStore.php
72:        $key->markUnserializable();

vendor/symfony/lock/Store/PostgreSqlStore.php
88:                $key->markUnserializable();
124:                $key->markUnserializable();

vendor/symfony/lock/Key.php
58:    public function markUnserializable(): void
  • there's also an odd symfony serializer problem where it failed to deserialize the key object properly, complaining about missing values for the Key constructor, so I had to manually (un)serialize the keys in a roundabout way to get it working (see the sketch after this list)
  • the payload for UpdateActorMessage is relatively light compared to other messages, so while it still strains rabbitmq, I personally find that somewhat more acceptable compared to the complexity I'd have to add to avoid the situation
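roughly, the (un)serialization workaround looks like this (a sketch; the serializedKey field on the message is hypothetical):

// dispatch side: bypass the messenger serializer by storing the key as
// an opaque string (Key::__sleep() throws UnserializableKeyException if
// the configured store marked the key unserializable)
$message = new UpdateActorMessage($actorUrl, serializedKey: serialize($key));

// handler side: rebuild the key and the lock from the stored string
$key = unserialize($message->serializedKey);
$lock = $this->lockFactory->createLockFromKey($key, autoRelease: false);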

@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from 46fefcf to df712cc Compare March 25, 2024 11:52
// fetched-time guard: skip the refresh if the actor was fetched within the last hour
if ($magazine instanceof Magazine && $magazine->apFetchedAt > (new \DateTime())->modify('-1 hour')) {
    return $magazine;
}

Member

You removed this intentionally, or?

Member

oh wait, it's now part of __invoke...

Contributor Author

yeah, I moved this up to the handler level, before calling the update function
something about putting the guard down there doesn't sit right with me, but I couldn't quite put my finger on it

I could push it down to updateActor() or restore it back if that's better

@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from 8ca481a to 8273b6e Compare March 28, 2024 09:42
@asdfzdfj
Contributor Author

ok, I've extended it to try and avoid dispatching multiple UpdateActorMessage in the first place with the help of lock keys, provided that key serialization is available; if it isn't, it falls back to the handler-level locking from the first patch

this way existing deployments should continue to work and can 'opt in' to this locking later by setting LOCK_DSN="${REDIS_DNS}" in the .env, without adding another hard dependency on redis just for this change (not in the mood to add another hard dependency to the list when this thing already uses/needs so much supporting infrastructure)

@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from 8273b6e to 81020fc Compare March 29, 2024 10:06
@asdfzdfj
Contributor Author

asdfzdfj commented Mar 29, 2024

ok, I've now come up with an even crazier proposition: if the main thing you want is to limit dispatching multiple update actor messages for the same actor, then just (rate) limit them:

rather than doing the lock + key serialization rigamarole when dispatching the actor update, I've added a rate limiter guard so that an (auto) update actor message is dispatched at most once every 5 min. this limits a burst of multiple update messages being dispatched in quick succession while still allowing a retry if the earlier dispatched message failed to update the actor; otherwise the outer fetched-time guard should take care of the rest (this is based on the personal observation that there seems to be some time locality to it, and I think the rate period could be reduced to 1 min)

doing it this way means changing the lock store to one with key serialization support is no longer needed, no special upgrade instructions are required, and (imo) the added guard logic is simpler and more straightforward
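a sketch of the dispatch-side guard, assuming the ap_update_actor limiter factory is autowired as $apUpdateActorLimiter; the surrounding names are illustrative rather than the exact code:

use Symfony\Component\RateLimiter\RateLimiterFactory;

// one limiter window per actor url
$limiter = $this->apUpdateActorLimiter->create($actorUrl);
$limit = $limiter->consume(1);

if (!$limit->isAccepted()) {
    // log the refusal together with when a retry would be accepted
    $this->logger->debug('not dispatching update actor for {actor}', [
        'actor' => $actorUrl,
        'retry' => $limit->getRetryAfter()->format(\DateTimeInterface::ATOM),
    ]);

    return;
}

$this->bus->dispatch(new UpdateActorMessage($actorUrl));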

@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch 2 times, most recently from 5792f00 to 63fef53 Compare April 4, 2024 08:42
@nobodyatroot nobodyatroot added this to the v1.6.0 milestone Apr 5, 2024
@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from 63fef53 to 8705d38 Compare April 5, 2024 05:34
@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from 8705d38 to c7eac5b Compare April 11, 2024 15:42
@BentiGorlich (Member) left a comment

Code-wise it looks good to me. Did you test it thoroughly? If so, then I trust you and do not need to test it myself.

Just a summary of what we have here:

  • we rate limit the dispatches of UpdateActorMessage per actor url, so each actor url can trigger an UpdateActorMessage at most every 5 minutes
  • we lock the actual updating part for 60 minutes per actor, so every actor is updated at most once every hour

src/Repository/MagazineRepository.php (review thread, resolved)
ap_update_actor:
policy: 'sliding_window'
limit: 1
interval: '5 minutes'
Member

Why not use 60 minutes here, too?

Contributor Author

my intention here is to limit a burst of multiple update messages being dispatched in quick succession (from my observation, when it's time to update it tends to dispatch something like six update actor messages at once and then basically none after the update goes through), while still allowing a faster retry if the earlier dispatched message failed to update the actor, e.g. due to an intermittent connectivity issue with the remote instance

Member

And the lock only blocks the processing while one is currently running, right? If yes, I just misunderstood it before

Contributor Author

yes

@asdfzdfj
Contributor Author

asdfzdfj commented May 3, 2024

Code-wise it looks good to me. Did you test it thoroughly? If so, then I trust you and do not need to test it myself.

I have had this running on my own instance since the PR was created, and from what I can read in the debug log traces it seems to be working as intended

debug log sample for a random actor
[2024-05-03T02:13:32.892057+07:00] app.DEBUG: not dispatching updating actor for https://misskey.io/users/9hxecnw2ne: one has been dispatched recently {"actor":"https://misskey.io/users/9hxecnw2ne","retry":"2024-05-02T19:18:32+00:00"} []
[2024-05-03T02:13:32.892994+07:00] app.DEBUG: not dispatching updating actor for https://misskey.io/users/9hxecnw2ne: one has been dispatched recently {"actor":"https://misskey.io/users/9hxecnw2ne","retry":"2024-05-02T19:18:32+00:00"} []
[2024-05-03T02:13:33.017289+07:00] app.DEBUG: not dispatching updating actor for https://misskey.io/users/9hxecnw2ne: one has been dispatched recently {"actor":"https://misskey.io/users/9hxecnw2ne","retry":"2024-05-02T19:18:32+00:00"} []
[2024-05-03T02:13:33.267509+07:00] app.INFO: updating user https://misskey.io/users/9hxecnw2ne {"name":"https://misskey.io/users/9hxecnw2ne"} []
[2024-05-03T18:12:40.367327+07:00] app.DEBUG: not dispatching updating actor for https://misskey.io/users/9hxecnw2ne: one has been dispatched recently {"actor":"https://misskey.io/users/9hxecnw2ne","retry":"2024-05-03T11:17:40+00:00"} []
[2024-05-03T18:12:40.367923+07:00] app.DEBUG: not dispatching updating actor for https://misskey.io/users/9hxecnw2ne: one has been dispatched recently {"actor":"https://misskey.io/users/9hxecnw2ne","retry":"2024-05-03T11:17:40+00:00"} []
[2024-05-03T18:12:40.492754+07:00] app.DEBUG: not dispatching updating actor for https://misskey.io/users/9hxecnw2ne: one has been dispatched recently {"actor":"https://misskey.io/users/9hxecnw2ne","retry":"2024-05-03T11:17:40+00:00"} []
[2024-05-03T18:12:40.496183+07:00] app.INFO: updating user https://misskey.io/users/9hxecnw2ne {"name":"https://misskey.io/users/9hxecnw2ne"} []
[2024-05-03T19:32:16.696423+07:00] app.INFO: updating user https://misskey.io/users/9hxecnw2ne {"name":"https://misskey.io/users/9hxecnw2ne"} []
[2024-05-03T19:32:17.048946+07:00] app.DEBUG: not dispatching updating actor for https://misskey.io/users/9hxecnw2ne: one has been dispatched recently {"actor":"https://misskey.io/users/9hxecnw2ne","retry":"2024-05-03T12:37:16+00:00"} []

Just a summary of what we have here:

  • we rate limit the dispatches of UpdateActorMessage per actor url, so each actor url can trigger an UpdateActorMessage at most every 5 minutes
  • we lock the actual updating part for 60 minutes per actor, so every actor is updated at most once every hour

that sounds about right

@BentiGorlich
Member

we lock the actual updating part for 60 minutes per actor, so every actor is updated at most once every hour

I think this is not really correct, since you release the lock at the end 🤔

@asdfzdfj
Contributor Author

asdfzdfj commented May 3, 2024

we lock the actual updating part for 60 minutes per actor, so every actor is updated at most once every hour

I think this is not really correct, since you release the lock at the end 🤔

the lock is only meant to prevent more than one UpdateActorMessage worker from updating the same actor at the same time (though you could say it's been superseded by the rate limiter that avoids sending multiple UpdateActorMessage in the first place); the apFetchedAt time guard is what limits the actor auto-update frequency

@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from c7eac5b to 5c08f6b Compare May 5, 2024 09:01
while working on incoming activity, the UpdateActorMessage could be
fired more than once per actor, and even with #602 applied there could
still be multiple UpdateActorHandler instances running at the same
time, updating the same actor before the added guard could kick in

this adds a bit of locking in UpdateActorHandler per actor url to
ensure that only one handler is actually updating a given actor at a
time, discarding any subsequent update actor message that comes in
while the update is running

added the force parameter to UpdateActorMessage and its handler to
force an actor update regardless of the #602 last-updated time (it
still has to get past the lock, though)

the `mbin:actor:update` command also got some jank fixed, as it also
uses UpdateActorMessage to do its job; the force parameter is used
here as well

also tried moving the fetched-time guard from #602 up to the handler
level: the guard itself is fine, but such a guard should reside at the
higher level, before the update method is called
if what you want is to limit dispatching UpdateActorMessage then just
(rate) limit them

use the symfony rate limiter to throttle auto-dispatching of the
UpdateActorMessage to once per 5 minutes, allowing for redispatch
recovery if the previously dispatched message+handler failed to update
the actor

it appears that multiple UpdateActorMessage get dispatched when the
actor becomes eligible for updating, e.g. last fetched over an hour ago

this should prevent dispatching multiple UpdateActorMessage in quick
succession, in addition to the existing guard that only dispatches the
update message if the actor was last updated more than an hour ago
@asdfzdfj asdfzdfj force-pushed the new/update-actor-message-locking branch from 5c08f6b to b4d1868 Compare May 21, 2024 06:36
@nobodyatroot (Member) left a comment

Been testing this for several hours now... considering it's been almost 2 months to get this PR reviewed and approved, and it doesn't appear to break anything, I'm approving... whether the rate limiter is enough, I'm not sure.

@asdfzdfj asdfzdfj merged commit 4d53b84 into main May 22, 2024
7 checks passed
@asdfzdfj asdfzdfj deleted the new/update-actor-message-locking branch May 22, 2024 13:13