Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read references from manager queries #2993

Draft
wants to merge 47 commits into
base: develop
Choose a base branch
from

Conversation

dickermoshe
Copy link
Collaborator

@dickermoshe dickermoshe commented May 5, 2024

🚧 WIP 🚧


The current syntax is as follows:

final listings = await db.managers.listing.withReferences().withProduct().get();
for (var l in listings){
  print(l.listingData);
  print(l.references.product); // Statically type to  ProductData? 
  print(l.references.store) // Statically type to void - this is a compile time error https://dart.dev/diagnostics/use_of_void_result
}

Everything is added with joins, so it's all well optimized

Code to try this out is in the manager testing code

@dickermoshe
Copy link
Collaborator Author

The code generator for this will be an abysmal mess until the api design for this is ironed out

@dickermoshe
Copy link
Collaborator Author

After some investigation, it turns you there is no way to read multiple column of a reverse reference in a single query
I'm going to go with this implementation where a second query is done in transaction

@dickermoshe

This comment was marked as resolved.

@simolus3
Copy link
Owner

simolus3 commented May 6, 2024

The API still looks good to me, but I think there should be a way to be more explicit about the impact of joining a reverse reference. Things that suddenly have the potential to make queries much slower should be more obvious than calling withBackreference() somewhere and forgetting about it.

Also, one thing we should consider is the size of generated code (drift is already generating a fair bit of code, and increasing this further has costs on things like analyzer performance). Users are also typically not too happy about generated code they don't need. Code generators are also harder to understand than library code, so the maintenance cost for us is higher. So we should check whether there are ways to simplify this, such as:

  1. By not generating the join-creating logic for each container reader. Instead, I think the main table manager could hold a structure describing all columns involved in references. The runtime would interpret that structure to add the necessary joins. I think the selectable getter could refer to a helper method receiving a list of references to resolve instead of duplicating most of that logic.
  2. This is heavier and impacts usability quite a bit, but perhaps it's still worth it: Most references are probably never joined, so maybe we could approach this from the other and and let users specify which joins they need and then generate a method for those only?

For 2., a possible API I didn't think through at all could perhaps roughly look like this:

class TodoItems extends Table {
  // ... column definitions

  @JoinQuery()
 SomeQueryType<(TodoItem, Category)> withCategory;
}

And then we'd only generate a manager for the @JoinQuery annotations added on a table? I think that could avoid some of the complexity we're paying for a general solution.

These are just early ideas though, there are probably other approaches to reduce some of the complexity in the generator.

@dickermoshe
Copy link
Collaborator Author

I've spent a considerable amount of time on this, but I'll have to agree, in practice this is handing a loaded gun to a child.
Reverse references should be returned as a new Selectable that can be watched or awaited.

final category =await  category.filter((f)=> f.id(5)).withReferences().getSingle()
todos = await category.references.todos.get()

The question is how regular joins should be handled.

Option 1

final category =await  category.filter((f)=> f.id(5)).withReferences().getSingle()
final user = category.references.user; // Returns actual User, even if not needed
final todos  = category.references.todos // A class to Stream/Await the Todos

I'm tempted to have withReferences return all by default. It's consistent to reverse references, which are included by default (albeit, as a Selectable, not an actual Dataclass). In addition, withReferences().withListings() does looks kinda redundant.

The drawback is that they are forced to all or none, which lowers performance, especially with watch, where changes to any referenced table will trigger a new event.

Option 2

final category =await  category.filter((f)=> f.id(5)).with().users().getSingle()
final user = category.references.user; // Returns actual User, but only if requested, inconsistent with `todos` which always returns
final todos  = category.references.todos // A class to Stream/Await the Todos

Only include the requested dataclasses, this is better in every other way, except with doesnt seem like it's descriptive enough $ todos and user act differently

  1. todos is always included
  2. todos is a selectable

Thoughts?

@simolus3
Copy link
Owner

simolus3 commented May 7, 2024

I've spent a considerable amount of time on this, but I'll have to agree, in practice this is handing a loaded gun to a child.

Yeah, unfortunately this looks like a hard problem to get right, and since nobody likes to rewrite all their queries we can't really change the API substantially once we have released something.

Reverse references should be returned as a new Selectable that can be watched or awaited.

I agree with that API 👍 A downside would be that we then can't automatically run the 1+n statements in a transaction. But I'm fine with that, users will know that they're separate queries because they have to use a second await and so it's expected that the two queries aren't atomic unless manually wrapped in a transaction.

Regarding option one: How terrible do you think it is from a UX perspective if we just make references nullable by default and then have boolean flags to control what to include? E.g. withReferences(user: true)? And then user would be non-null for that query and developers would still have to call category.references.user!. It's not great, but I think we may have to make some references nullable either way since foreign keys aren't enforced by default (or it could be an optional foreign key). So maybe users will just have to know when it's safe to do a null assertion.
Loading all foreign keys by default sounds bad if we have a larger chain of references between tables (e.g. todo -> category -> owning user -> ...).

Only include the requested dataclasses, this is better in every other way, except with doesnt seem like it's descriptive enough $ todos and user act differently

I'm fine with back- and forward-references acting differently.

And maybe that's something that's not ever going to come up, but I still want to ask whether we can make this possible. What if we have unbounded chains of references? E.g. something contrived like

class Employees extends Table {
  IntColumn get id => integer().autoIncrement()();
  IntColumn get supervisor => integer().references(Employees, #id)();
}

In SQL, I could write a query to find the supervisor two layers up in the hierarchy. I think it would be cool if we could have an API where that can be expressed, but it will likely look a lot different than the things we (and probably most other ORMs) have discussed.
This would be possible if we had a way to generate code for only the joins that were requested, e.g.

abstract class Employees extends Table {
  IntColumn get id => integer().autoIncrement()();
  IntColumn get supervisor => integer().references(Employees, #id)();

  JoinQuery<({self: Employee, another: Employee})> get withGrandSupervisor => buildJoin(['self', 'another: self.supervisor.supervisor']);
}

I don't want to derail the discussion with these examples (and I'll shut up about them if it sounds like no one will ever need this), but I think that's an opportunity to not become a limitation (every join possible in SQL is possible here), not generate superfluous code (joins have to be requested), and be precise about which tables are included in the result.
A downside is that this is harder to use of course, but perhaps we can ease this somewhat by adding syntax sugar on references? E.g.

abstract class Employees extends Table {
  IntColumn get id => integer().autoIncrement()();
  @GenerateJoin
  IntColumn get supervisor => integer().references(Employees, #id)();
}

@dickermoshe
Copy link
Collaborator Author

I'm still thinking about how to proceed on this...

@simolus3
Copy link
Owner

It's not an easy API to design for sure. My "list all joins beforehand" idea also suffers from poor discoverability. I'm happy to help if as well if you have some rough ideas for an API and how the implementation could look like of course.

@dickermoshe
Copy link
Collaborator Author

Let me walk the logic:

Requirements
Getting references should be supported without any additional code being written.
Reading any other information out of the query (Aggregates) are out of scope at this time, but we should try to do this in a way that will work with aggregates.
Everything should be statically typed.
Should be well optimized & powerful while still ensuring that developers don't create any performance issues

We should only prefetch what a user explicitly asks for. The withReferences().withUser() works well for this.
We should still provide a async way to get the item even if it isn't explicitly requested.

todos.withReferences().withUser().getSingle().references.user // Returns User
todos.withReferences().getSingle().references.user // Returns Future<User> OR Selectable<User>

The above should work with back references too.

The only issue that we are faced with is streaming.
Whether we support it, or we don't, we are faced with the same issue. Performance.

If we support the above for streaming, then it is possible that the user can create a stream that is tied to every table in the database.

todos.withReferences().withUser().stream() // Re-trigger on every write to `todos` or `users`

On the other hand, if we don't add this feature, users may do the following

final results = todos.stream.asyncMap((e) async => (todos: e, user: await  users.filter((f)=>f.id(e.user)).getSingleOrNull()))
results.first.todo // The 1st todo
results.first.user // & it's first user

This means that there is a stream for every object being returned, this is really bad.
Also, there is no simple way for him to stream the user, unless he starts going into the weeds with RxDart with combineStream.

In a perfect world the user should only be streaming what he needs, in other words, pagination.
Pagination is the solution for all of this, allow the user to read entire tables with references, but only stream in batches.

What such an API could look like, I'm still thinking about...

@simolus3
Copy link
Owner

Pagination would definitely be awesome to have, but I think it doesn't really solve the stream invalidation problem (every write to either table would still invalidate every page in every paginated stream, right?). We'd have to parse fewer rows into result classes, but ultimately not run fewer queries than before.

That's not a problem directly related to a join API in my opinion - manual joins also suffer from the same problem. So a pagination API should be orthogonal to a join API and work with or without it (users may also want to paginate simple reads without references to a single table).

0biWanKenobi and others added 27 commits May 19, 2024 12:47
In `data_class_writer.dart` we expect a mixin named `${table.entityInfoName}ToColumns`, but then we create one named `${table.baseDartName}ToColumns` in `table_writer.dart.
define `toColumnsMixin` String in `DriftElementWithResultSet`
and reuse it in `DataClassWriter` and `TableWriter` to avoid
future mismatches in mixin name.
Fix: toColumns mixin generated with wrong name
Mention minimum version in docs and use it in CI tests
The companion class makes an extremely useful way to bundle up a set
of changes to make to a record.  This copyWithCompanion method makes
it possible to apply those changes directly to a record as represented
in Dart, as an instance of the data class.

For example, it can be used by a wrapper that keeps a write-through
cache in Dart for a given database table.  Here's slightly simplified
from a real example we have in Zulip:

    /// Update an account in the store, returning the new version.
    ///
    /// The account with the given account ID will be updated.
    /// It must already exist in the store.
    Future<Account> updateAccount(int accountId, AccountsCompanion data) async {
      assert(!data.id.present);
      await (_db.update(_db.accounts)
        ..where((a) => a.id.equals(accountId))
      ).write(data);
      final result = _accounts.update(accountId,
        (value) => value.copyWithCompanion(data));
      notifyListeners();
      return result;
    }

It's possible to write such a method by hand, of course (in an
extension), or to call copyWith directly.  But it needs a line for
each column of the table, which makes either of those error-prone:
in particular, because copyWith naturally has its parameters all
optional, it would be very easy for someone adding a new column to
overlook the need to add a line to this method, and then updates to
the new column would just silently get dropped.  So this is a case
where there's a large benefit to generating the code.
@dickermoshe
Copy link
Collaborator Author

dickermoshe commented May 28, 2024

@simolus3
What do you think of following?

Let's have all references be accessible by FutureOr.
Using groups.withReferences(users: true) will prefetch the users and place it in a cache.

final group1 = await groups.withReferences(users: true).getSingle(); // Get first group, loads users into cache
final users = await group1.users; // Get's users from the cache, doesn't make another query

In previous examples, withReferences().withUsers() would do some typing magic to make group1.users synchronous,
However, this increased complexity substantially.

There isn't really a practical benefit from a performance perspective between awaiting FutureOr<User> and User.
The only issue is that Flutter has to wait a single frame between loading User.
This could be a workaround for that https://stackoverflow.com/questions/73024738/can-i-use-a-futurebuilder-with-a-value-of-the-type-futureort

@simolus3
Copy link
Owner

Since you have to await the query either way I'm not sure it makes much of a difference in the end, most methods using the API will be async either way. So the FutureOr trick looks fine to me 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants