This repository has been archived by the owner on Jul 23, 2019. It is now read-only.

Optimize cursor movement and selection manipulation #45

Merged
merged 5 commits into master from optimize-selections on Mar 16, 2018

Conversation


@as-cii as-cii commented Mar 15, 2018

This pull request contains a few changes that will optimize selection manipulation and cursor movement. I'd recommend reviewing it on a per-commit basis:

  • f1361e8 fixes a logic error that was using Anchor::offset as an offset into the fragment
  • cc7ce29 caches anchor positions until the buffer is edited
  • a829f24 wraps FragmentId's vector of identifiers into an Rc
  • 29b886e uses statically-allocated values for FragmentId::{min, max}_value
  • 2d6c1b7 caches the mapping between offsets and points until the buffer is edited
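To illustrate the Rc change in a829f24: wrapping the identifier vector in an Rc makes cloning a FragmentId a reference-count bump rather than a fresh heap allocation plus a copy of the whole vector. The sketch below is hypothetical (the names mirror the PR, but the fields and constructor are assumptions, not the real xray definitions):

```rust
use std::rc::Rc;

// Hypothetical sketch: the identifier vector lives behind an Rc, so
// Clone only bumps a reference count instead of copying the Vec.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct FragmentId(Rc<Vec<u16>>);

impl FragmentId {
    fn new(entries: Vec<u16>) -> Self {
        FragmentId(Rc::new(entries))
    }
}

fn main() {
    let a = FragmentId::new(vec![1, 2, 3]);
    let b = a.clone(); // no new Vec allocation; both share one buffer
    assert!(Rc::ptr_eq(&a.0, &b.0));
    assert_eq!(a, b); // ordering and equality still delegate to the Vec
}
```

Deriving Ord on the wrapper keeps lexicographic comparison of the underlying vector intact, which is what tree traversals rely on.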

I have tested the above optimizations following these steps:

  • Edit the buffer 10000 times
  • Add 1000 selections
  • Start profiling
  • Move right
  • Stop profiling
[Profiling screenshots: before vs. after]

Note how the time spent in TextEditor (i.e., the call to Editor::move_right) has been reduced by a factor of 35. Performance is similar when moving cursors left, up, or down.

Next Steps

We think this is a good first improvement that will allow xray to stay well under the frame budget, even when the user is manipulating thousands of selections at the same time. We are currently focusing on other areas of the system and are therefore thinking of putting further improvements to this on hold until more architectural concerns have been sorted out.

In the future we may revisit some areas of this code to optimize it even further, for example:

  • Accesses to the B+ tree could have more cache locality. Right now, accessing nodes and fragment ids always requires following a pointer wrapped in an Arc. This reduces the chances of having the data we need ready to consume in the cache during a tree traversal. As such, it might be interesting to explore different representations for Node and FragmentId that are more cache-friendly.
  • Reuse the same cursor across different tree traversals. Cursors use a Vec to represent a stack of nodes seen during a traversal. While a single vector allocation is innocuous, performing one for each selection means we will repeatedly allocate a value on the heap and free it shortly afterward. By using the same (pool of) cursor(s) over and over again, we will minimize memory allocations, which we've learned are very expensive on hot code paths like this one. One implementation idea would be to instantiate a cursor during the initialization of a buffer, wrapping it in an Arc. We could then hand out clones of that Arc and, in order to use the cursor, callers would invoke make_mut on it. If the cursor is in use by someone else, it will simply be copied, and only then will we pay for a heap allocation.
  • Speed up FragmentId comparisons. When resolving the position of an anchor, we may end up comparing many fragment identifiers. Right now the size of a fragment id isn't known at compile time, so the number of comparison operations is not constant. We can partially address this by using a more cache-friendly representation, as suggested above, but constant factors may still be relatively high. As such, another idea would be to explore using SIMD instructions to compare two arbitrary fragment ids. This StackOverflow thread gives some insight into how we could do that.
  • Overall, minimize heap accesses and use the stack as much as possible.
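The cursor-pooling idea above can be sketched with Arc::make_mut. This is a minimal, hypothetical illustration of the semantics described (the Cursor and Buffer types here are stand-ins, not the real xray structures): make_mut reuses the allocation when the caller holds the only reference, and copies only under sharing.

```rust
use std::sync::Arc;

// Hypothetical sketch: the buffer owns an Arc'd cursor and hands out
// clones; callers call Arc::make_mut before use. A copy (and thus a
// heap allocation) only happens when the cursor is shared.
#[derive(Clone)]
struct Cursor {
    stack: Vec<usize>, // stack of nodes seen during a traversal
}

struct Buffer {
    cursor: Arc<Cursor>,
}

impl Buffer {
    fn new() -> Self {
        Buffer {
            cursor: Arc::new(Cursor { stack: Vec::with_capacity(8) }),
        }
    }

    fn cursor(&self) -> Arc<Cursor> {
        self.cursor.clone()
    }
}

fn main() {
    let buffer = Buffer::new();
    let mut handle = buffer.cursor();
    // Two strong references exist (buffer + handle), so make_mut
    // copies the cursor before handing back a mutable reference...
    let cursor = Arc::make_mut(&mut handle);
    cursor.stack.push(0);
    assert_eq!(handle.stack.len(), 1);
    // ...leaving the buffer's original cursor untouched.
    assert_eq!(buffer.cursor.stack.len(), 0);
}
```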

/cc: @nathansobo

Antonio Scandurra added 4 commits March 15, 2018 11:22
This fixes a logic error on master that uses `Anchor::offset` as an
offset into the fragment. This is wrong because fragments could be
subject to further splits, thus making the relative offset potentially
invalid.
This avoids unnecessary lookups in the buffer tree when the anchor's
position has already been retrieved since the last edit.

We are representing the cache as a RefCell to avoid the need to acquire
a mutable reference to the buffer even if we just need to query it. If
the cache is already borrowed for some reason, we will simply skip
writing to it.
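The RefCell pattern described above can be sketched as follows. This is an assumed, simplified shape (the cache key, the HashMap, and resolve_in_tree are stand-ins for the real anchor resolution), showing the two behaviors the message calls out: queries work through a shared reference, and a write is skipped if the cache is already borrowed.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical sketch: anchor positions are memoized behind a RefCell
// so lookups need only &self; edits invalidate the whole cache.
struct Buffer {
    anchor_cache: RefCell<HashMap<u64, usize>>, // anchor id -> offset
}

impl Buffer {
    fn new() -> Self {
        Buffer { anchor_cache: RefCell::new(HashMap::new()) }
    }

    fn offset_for_anchor(&self, anchor_id: u64) -> usize {
        if let Some(&offset) = self.anchor_cache.borrow().get(&anchor_id) {
            return offset; // cache hit: no tree traversal
        }
        let offset = self.resolve_in_tree(anchor_id); // expensive path
        if let Ok(mut cache) = self.anchor_cache.try_borrow_mut() {
            cache.insert(anchor_id, offset);
        } // else: cache already borrowed; skip writing to it
        offset
    }

    fn edit(&mut self) {
        // Any edit invalidates all cached positions.
        self.anchor_cache.borrow_mut().clear();
    }

    fn resolve_in_tree(&self, anchor_id: u64) -> usize {
        anchor_id as usize * 2 // stand-in for the real tree lookup
    }
}

fn main() {
    let mut buffer = Buffer::new();
    assert_eq!(buffer.offset_for_anchor(21), 42);
    assert!(buffer.anchor_cache.borrow().contains_key(&21));
    buffer.edit();
    assert!(buffer.anchor_cache.borrow().is_empty());
}
```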
This will allow us to clone the FragmentId efficiently (i.e., without
heap allocations) for comparison purposes when it's used during a tree
traversal.
`lazy_static!` requires variables to be thread-safe (for good reasons),
so we have also replaced `Rc` with an `Arc`. This doesn't seem to affect
performance in a significant way.
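The thread-safety constraint described here can be sketched without the external crate by using std::sync::OnceLock as a dependency-free stand-in for lazy_static (an assumption for illustration; the PR itself uses the lazy_static macro). The key point is the same: the statics must be Sync, which is why the inner pointer is an Arc rather than an Rc.

```rust
use std::sync::{Arc, OnceLock};

// Hypothetical sketch of statically allocated boundary ids. OnceLock
// stands in for lazy_static; both require the value to be Sync.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct FragmentId(Arc<Vec<u16>>);

fn min_id() -> &'static FragmentId {
    static MIN: OnceLock<FragmentId> = OnceLock::new();
    MIN.get_or_init(|| FragmentId(Arc::new(vec![u16::MIN])))
}

fn max_id() -> &'static FragmentId {
    static MAX: OnceLock<FragmentId> = OnceLock::new();
    MAX.get_or_init(|| FragmentId(Arc::new(vec![u16::MAX])))
}

fn main() {
    assert!(min_id() < max_id());
    // Repeated calls hand back the same allocation: no per-call
    // construction of the boundary values.
    assert!(Arc::ptr_eq(&min_id().0, &min_id().0));
}
```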
@nathansobo

When visiting a given node in the tree, we currently have to follow all the child pointers in order to fetch their spatial summaries. We should store a copy of the spatial summary of each child in the node that points to those children, so that we don't need to follow the pointer to a child unless we know it contains our target leaf. Combined with your other suggestions, this seems like it will greatly increase the cache friendliness of tree searches. Once we do this, we should experiment with different maximum node sizes to see what works the best, although it probably depends on the size of the summary nodes for any given tree.
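The suggestion above can be sketched as follows; the Node and Summary shapes are hypothetical stand-ins, showing only the access pattern: the search scans the inline child summaries and follows at most one child pointer.

```rust
use std::sync::Arc;

// Hypothetical sketch: an internal node keeps a copy of each child's
// spatial summary inline, so choosing the right child during a search
// never dereferences the other children's Arcs.
#[derive(Clone, Copy, Default)]
struct Summary {
    chars: usize, // extent covered by the subtree
}

enum Node {
    Internal {
        child_summaries: Vec<Summary>, // stored inline, cache-friendly
        children: Vec<Arc<Node>>,
    },
    Leaf {
        summary: Summary,
    },
}

impl Node {
    // Find which child contains `target` by scanning only the inline
    // summaries; the caller then follows exactly one child pointer.
    fn child_for_offset(&self, mut target: usize) -> Option<usize> {
        if let Node::Internal { child_summaries, .. } = self {
            for (ix, summary) in child_summaries.iter().enumerate() {
                if target < summary.chars {
                    return Some(ix);
                }
                target -= summary.chars;
            }
        }
        None
    }
}

fn main() {
    let node = Node::Internal {
        child_summaries: vec![Summary { chars: 10 }, Summary { chars: 5 }],
        children: vec![
            Arc::new(Node::Leaf { summary: Summary { chars: 10 } }),
            Arc::new(Node::Leaf { summary: Summary { chars: 5 } }),
        ],
    };
    assert_eq!(node.child_for_offset(3), Some(0));
    assert_eq!(node.child_for_offset(12), Some(1));
}
```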

It's also worth noting that #24 is relevant to some of what you have described here.

When moving cursors, we heavily rely on `Buffer::len_for_row` to
understand when such cursors should wrap. This method is implemented by
traversing the tree twice: the first traversal gets the offset of the
beginning of the requested line, whereas the second traversal gets the
offset of the line next to the requested one. Then, the two offsets are
subtracted to obtain the length of the line.

With this commit we start caching the mapping between offsets and
points. This allows us to skip the two traversals and retrieve the
length of a line in constant time, provided the line length has been
computed before and no edit has occurred since then.
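The caching scheme this commit describes can be sketched like so. This is an assumed, simplified model (a Vec of lines stands in for the real fragment tree and its two traversals), illustrating only the memoize-then-invalidate-on-edit behavior:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical sketch: len_for_row consults a memo keyed by row before
// falling back to the expensive path, and every edit clears the memo.
struct Buffer {
    lines: Vec<String>, // stand-in for the real fragment tree
    line_len_cache: RefCell<HashMap<u32, u32>>,
}

impl Buffer {
    fn len_for_row(&self, row: u32) -> u32 {
        if let Some(&len) = self.line_len_cache.borrow().get(&row) {
            return len; // constant-time hit
        }
        // Stand-in for the two traversals: in the real tree, the
        // start-of-line offsets of `row` and `row + 1` are subtracted.
        let len = self.lines[row as usize].len() as u32;
        self.line_len_cache.borrow_mut().insert(row, len);
        len
    }

    fn edit(&mut self, row: u32, text: String) {
        self.lines[row as usize] = text;
        self.line_len_cache.borrow_mut().clear(); // invalidate on edit
    }
}

fn main() {
    let mut buffer = Buffer {
        lines: vec!["hello".into(), "xray".into()],
        line_len_cache: RefCell::new(HashMap::new()),
    };
    assert_eq!(buffer.len_for_row(0), 5); // computed, then cached
    buffer.edit(0, "hi".into());
    assert_eq!(buffer.len_for_row(0), 2); // recomputed after the edit
}
```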
@nathansobo nathansobo merged commit dfd692b into master Mar 16, 2018
@nathansobo nathansobo deleted the optimize-selections branch March 16, 2018 22:58