NBFTReliability #401
base: master
Conversation
For now, my main blocker is not the code (I haven't read much of it yet), but the lack of documentation.
The filenames are also a bit awkward: why not group all NBFT things into a directory rather than using prefixed files (at least for the ones in src)? For the builders, I would ask why they don't just extend the standard builders, but I guess it's mostly because you might need access to data that the builders keep private. If possible, it would be nice to add a small trait to turn … This last comment makes me think: maybe it would be useful to have a (forever unstable) way to break a builder into its component parts, so that …
Most of the answers (if they exist) are there: https://github.com/eclipse-zenoh/roadmap/blob/main/rfcs/ALL/Non%20Blocking%20Fault%20Tolerant%20Reliability.md
Sure
That's indeed the main reason.
```rust
let _cache = session
    .declare_reliability_cache(key_expr)
    .history(history)
    .queryable_prefix(prefix)
```
What does `queryable_prefix` do? How does it differ from the `key_expr` used in `declare_reliability_cache`?
As described here, data published on key `k` by a publisher with id `<publisher_id>` will be made available by the cache for queries on `<publisher_id>/k`. So by changing this prefix you indicate which publishers this cache stores data for (see here). Typically, the cache attached to a `NBFTReliablePublisher` will have the publisher id as prefix, to only store data from this publisher, while a cache deployed in the infrastructure will have a `*` prefix to store data from all publishers.
The code was partially inspired by the existing publication cache and the names were taken from there. But I agree better names could be found.
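The prefixing scheme described above can be sketched as a small helper (illustrative names only, not the actual zenoh API):

```rust
// Hypothetical sketch: how a reliability cache could derive the key it
// answers queries on, from its configured prefix and the published key.
fn cache_queryable_key(prefix: &str, key: &str) -> String {
    // A publisher-attached cache uses its publisher id as prefix; an
    // infrastructure-wide cache uses "*" to match every publisher.
    format!("{}/{}", prefix, key)
}

fn main() {
    // Data published on "k" by publisher "p1" is served on "p1/k".
    assert_eq!(cache_queryable_key("p1", "k"), "p1/k");
    // A "*" prefix serves data from all publishers.
    assert_eq!(cache_queryable_key("*", "demo/example"), "*/demo/example");
}
```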
```rust
println!("Declaring NBFTReliabilityCache on {}", key_expr);
let _cache = session
    .declare_reliability_cache(key_expr)
    .history(history)
```
If `history` is not configured, what's the default? Is it mandatory?
The default history is 1024
The size of the reliability cache really depends on the use case and should be configured accordingly. Making it mandatory would force the user to think about how big or small it has to be. The same applies to the publisher and subscriber. Should we make it mandatory?
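The trade-off discussed here (a silent default vs. a mandatory setting) can be sketched with a toy builder; all names are illustrative, not the actual API:

```rust
// Illustrative builder sketch: `history` currently defaults to 1024
// samples per key when not configured.
struct CacheBuilder {
    history: usize,
}

impl CacheBuilder {
    fn new() -> Self {
        // Default discussed in the thread: 1024 samples per key.
        CacheBuilder { history: 1024 }
    }

    // Making this a required constructor argument instead of an optional
    // setter would be one way to force users to choose a size explicitly.
    fn history(mut self, depth: usize) -> Self {
        self.history = depth;
        self
    }
}

fn main() {
    assert_eq!(CacheBuilder::new().history, 1024);
    assert_eq!(CacheBuilder::new().history(16).history, 16);
}
```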
```rust
println!("Declaring NBFTReliablePublisher on {}", &key_expr);
let publ = session
    .declare_reliable_publisher(&key_expr)
    .with_cache(cache)
```
Here I was expecting `with_cache` to accept a `cache` object created by `declare_reliability_cache`. Instead, it seems to take a `bool`. This is a bit counter-intuitive for the `with_X` pattern.
A cache associated with a publisher will be preconfigured for it: it will typically have the publisher id as queryable prefix. So taking a cache declared independently does not seem the best way to me. Still, a better name for the function could be found...
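The design being defended here can be sketched as follows (hypothetical types and names, not the actual API): the boolean merely enables a cache that the builder constructs itself, preconfigured with the publisher id as prefix.

```rust
// Illustrative sketch of a publisher builder whose `with_cache(bool)`
// enables an internally built, publisher-specific cache instead of
// accepting an externally declared one.
struct Cache {
    prefix: String,
}

struct PublisherBuilder {
    id: String,
    cache: bool,
}

impl PublisherBuilder {
    fn new(id: &str) -> Self {
        Self { id: id.into(), cache: false }
    }

    fn with_cache(mut self, enabled: bool) -> Self {
        self.cache = enabled;
        self
    }

    fn build_cache(self) -> Option<Cache> {
        // When enabled, the cache is derived from the publisher itself
        // (publisher id as queryable prefix), never passed in from outside.
        self.cache.then(|| Cache { prefix: self.id })
    }
}

fn main() {
    let cache = PublisherBuilder::new("p1").with_cache(true).build_cache();
    assert_eq!(cache.unwrap().prefix, "p1");
    assert!(PublisherBuilder::new("p1").build_cache().is_none());
}
```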
```rust
let publ = session
    .declare_reliable_publisher(&key_expr)
    .with_cache(cache)
    .history(history)
```
If history is not configured, what's the default? Is it mandatory?
The default history is 1024
```rust
println!("Declaring NBFTReliableSubscriber on {}", key_expr);
let subscriber = session
    .declare_reliable_subscriber(key_expr)
    .history(history)
```
If history is not configured, what's the default? Is it mandatory?
The default is `false`.
While the `history` function of the `NBFTReliabilityCache` and `NBFTReliablePublisher` builders accepts an integer and configures the depth of the reliability cache, in number of samples per key, the `history` function of the `NBFTReliableSubscriber` builder defines whether or not the `NBFTReliableSubscriber` should query for historical data at startup. This is indeed a bit confusing; one of them should be renamed.
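The naming clash can be sketched like this (illustrative names; `query_history` is one hypothetical rename, not an existing API):

```rust
// Two builders whose `history` settings mean different things: a depth in
// samples vs. a boolean "fetch history at startup" flag. Renaming the
// subscriber's setter makes the boolean semantics explicit.
struct CacheBuilder {
    history_depth: usize,
}

struct SubscriberBuilder {
    query_history: bool,
}

impl CacheBuilder {
    fn new() -> Self {
        Self { history_depth: 1024 }
    }
    // Depth of the cache, in samples per key.
    fn history(mut self, depth: usize) -> Self {
        self.history_depth = depth;
        self
    }
}

impl SubscriberBuilder {
    fn new() -> Self {
        Self { query_history: false }
    }
    // Hypothetical rename of `history`: whether to query historical data
    // at startup.
    fn query_history(mut self, enabled: bool) -> Self {
        self.query_history = enabled;
        self
    }
}

fn main() {
    assert_eq!(CacheBuilder::new().history(8).history_depth, 8);
    assert!(SubscriberBuilder::new().query_history(true).query_history);
}
```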
```rust
.accept_replies(ReplyKeyExpr::Any)
.target(self.query_target)
.timeout(self.query_timeout)
.res_sync();
```
This should be `res_async` since it is executed within an `async fn run`.
```rust
.callback({
    move |r: Reply| {
        if let Ok(s) = r.sample {
            let (ref mut states, wait) = &mut *zlock!(handler.statesref);
```
Is there a reason to use a sync lock here instead of an async one? In case a lock needs to be used in both async and sync contexts, I would use an async lock and call `task::block_on` in the sync context. That way we avoid potentially blocking the executor in the async context.
```rust
move |r: Reply| {
    if let Ok(s) = r.sample {
        let (ref mut states, wait) =
            &mut *zlock!(handler.statesref);
```
It seems that here the lock is taken and kept until `handle_sample` has finished. However, `handle_sample` calls a callback while holding the lock. Shouldn't the lock be released before calling the callback?
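The fix being suggested can be sketched with plain `std` types (illustrative signatures, not the PR's actual ones): extract what the callback needs inside a scoped lock, drop the guard, then invoke the callback.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical sketch: deliver a sample to a user callback without
// holding the state lock while user code runs.
fn handle_sample(states: &Arc<Mutex<Vec<u64>>>, sample: u64, callback: impl Fn(u64)) {
    let ready = {
        let mut guard = states.lock().unwrap();
        guard.push(sample);
        // Copy out the data to deliver while the lock is held...
        guard.last().copied()
    }; // ...the guard is dropped here, before the callback runs.
    if let Some(s) = ready {
        // The callback may now take the lock itself without deadlocking.
        callback(s);
    }
}

fn main() {
    let states = Arc::new(Mutex::new(Vec::new()));
    let observer = states.clone();
    // This callback re-acquires the lock: safe only because the guard
    // above was released first.
    handle_sample(&states, 7, move |s| {
        assert_eq!(*observer.lock().unwrap().last().unwrap(), s);
    });
    assert_eq!(*states.lock().unwrap(), vec![7u64]);
}
```

With a `std::sync::Mutex`, calling back into locking code while the guard is alive deadlocks immediately, which is why scoping the guard matters.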
@OlivierHecart Since it's been a while, is this PR still alive? As the guard this week, I just want to make sure no PR is forgotten. 😃
@OlivierHecart Please change your pull request's base branch to main (the new default branch) and rebase your branch against it.
This implementation is going to be rewritten according to the requirements in #669. This will definitely be started after the release; currently we don't have time for it. Converting the PR to draft.
No description provided.