What is this?

This repository contains editing histories recorded from real-world, character-by-character editing sessions. The goal is to provide standard benchmarks that we can use to compare the performance of rope libraries and various OT / CRDT implementations.

Where is the data?

This repository stores two kinds of data, in two subdirectories:

Sequential Traces

The sequential_traces folder contains a set of simple editing traces where all the edits can be applied in sequence to produce a final text document.

Most of these data sets come from individual users typing into text documents. Each editing event (keystroke) has been recorded so it can be replayed later.

Some of these traces are generated by linearizing ("flattening") the concurrent traces (below). Regardless, the data format is the same.

These traces are simple to replay: apply each change, one by one, to an empty document and you'll get the expected output.
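As a rough sketch of what "apply each change, one by one" can look like, the snippet below replays a tiny made-up trace. The patch shape used here, a (position, number of characters deleted, inserted text) tuple, and the names replay and example_patches are assumptions for illustration only. The real on-disk format is described in sequential_traces/README.md.

```python
def replay(patches):
    """Apply (position, num_deleted, inserted_text) patches, in order, to an empty document.

    The tuple shape is a hypothetical stand-in for the real trace format.
    """
    doc = []  # list of characters, so positions count Unicode code points
    for pos, num_deleted, inserted in patches:
        if pos < 0 or pos + num_deleted > len(doc):
            raise ValueError(f"patch out of range at position {pos}")
        # Replace the deleted span (possibly empty) with the inserted characters.
        doc[pos:pos + num_deleted] = inserted
    return "".join(doc)


# Hypothetical trace: type "helo", fix the typo, then append a word.
example_patches = [
    (0, 0, "helo"),
    (3, 0, "l"),
    (5, 0, " world"),
]

if __name__ == "__main__":
    print(replay(example_patches))
```

A Python list is used only to keep the sketch short. Each edit costs time proportional to the document length, which is the cost a rope library is meant to avoid.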

See sequential_traces/README.md for details on the data format used and other notes.

These traces are useful for benchmarking how CRDTs behave when only a single user is making changes to a text document, or for benchmarking rope libraries.

These data sets describe their editing positions using Unicode character offsets. If you don't want to think about Unicode offsets while benchmarking, use the ascii_only variants of these traces. In the ASCII variants, all non-ASCII inserts have been replaced with the underscore character.
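To see why the offset unit matters, the sketch below compares three ways of counting the same position in a string, and shows one way non-ASCII characters could be replaced with underscores. It assumes that "character offsets" means Unicode code points and that each non-ASCII character becomes exactly one underscore. The helper names to_ascii_only and offsets are hypothetical, and this is not the repository's strip_non_ascii.js.

```python
def to_ascii_only(text):
    """Replace every non-ASCII character with a single underscore (assumed behaviour)."""
    return "".join(ch if ord(ch) < 128 else "_" for ch in text)


def offsets(text, char_index):
    """Express the position char_index (in code points) in three different units."""
    prefix = text[:char_index]
    return {
        "code_points": len(prefix),
        "utf8_bytes": len(prefix.encode("utf-8")),
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
    }


# "naive" with a diaeresis, then an emoji, written with escapes to stay encoding-safe.
sample = "na\u00efve \U0001F600 text"

if __name__ == "__main__":
    print(offsets(sample, 8))                  # position of the word "text"
    print(to_ascii_only(sample))
    print(offsets(to_ascii_only(sample), 8))   # all three units now agree
```

With ASCII-only text the three units coincide, so a library that indexes by bytes or by UTF-16 code units can replay the trace without converting positions.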

Concurrent Traces

The concurrent_traces folder contains editing traces where multiple users typed into a shared text document concurrently. (Concurrently means they were typing at the same time.)

These traces are much harder to replay, because each editing position listed in the file is relative to the version of the document on that user's computer when they were typing. This complexity is, unfortunately, necessary to replay a collaborative editing session between multiple users, which is what we need when benchmarking text-based CRDTs.
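The toy example below illustrates why positions that are relative to each user's own version cannot simply be applied in order. Two hypothetical users start from the same text and each insert one character. Everything here (the names, the two-insert scenario, and the tie-breaking rule) is an assumption for illustration. It is neither the trace format nor a full OT / CRDT algorithm, which must also handle deletes, ties between users, and longer histories.

```python
def insert(doc, pos, text):
    """Return doc with text inserted at pos."""
    return doc[:pos] + text + doc[pos:]


def shift_past_insert(pos, other_pos, other_len):
    """Adjust pos to account for a concurrent insert of other_len characters at other_pos.

    Ties (other_pos == pos) are resolved by shifting, which is an arbitrary choice here.
    """
    return pos + other_len if other_pos <= pos else pos


base = "abc"
a_pos, a_text = 0, "X"   # user A inserts at the start of their copy
b_pos, b_text = 3, "Y"   # user B, at the same time, appends to their copy

after_a = insert(base, a_pos, a_text)
# Applying B's position as-is lands in the wrong place, because B never saw A's edit.
naive = insert(after_a, b_pos, b_text)
# Shifting B's position past A's insert restores B's intent (append after "c").
adjusted = insert(after_a, shift_past_insert(b_pos, a_pos, len(a_text)), b_text)

if __name__ == "__main__":
    print(after_a, naive, adjusted)
```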

See concurrent_traces/README.md for details on the data format used and other notes.

