Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #27452 -- Added serial fields to contrib.postgres. #18123

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

csirmazbendeguz
Copy link

@csirmazbendeguz csirmazbendeguz commented May 3, 2024

Trac ticket number

ticket-27452

Branch description

Added SmallSerialField, SerialField, BigSerialField to contrib.postgres.

Previous PR

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@ngnpope ngnpope self-requested a review May 3, 2024 06:37
@csirmazbendeguz csirmazbendeguz force-pushed the ticket_27452 branch 4 times, most recently from e892da4 to 97cbc23 Compare May 3, 2024 08:31
@csirmazbendeguz csirmazbendeguz marked this pull request as draft May 3, 2024 10:21
@csirmazbendeguz csirmazbendeguz force-pushed the ticket_27452 branch 12 times, most recently from fccac26 to 2b48193 Compare May 4, 2024 13:05
Copy link
Contributor

@LilyFoote LilyFoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start!

In addition to the specific comments I added, we'll need some migrations tests to cover adding/removing serial fields and converting between serial and auto fields.

django/contrib/postgres/fields/serial.py Outdated Show resolved Hide resolved
django/contrib/postgres/fields/serial.py Outdated Show resolved Hide resolved
django/contrib/postgres/fields/serial.py Outdated Show resolved Hide resolved
django/contrib/postgres/fields/serial.py Outdated Show resolved Hide resolved
django/db/backends/postgresql/operations.py Show resolved Hide resolved
django/db/backends/postgresql/schema.py Outdated Show resolved Hide resolved
django/db/backends/postgresql/schema.py Show resolved Hide resolved
Comment on lines 5742 to 5812
@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_serial_to_integer(self):
from django.contrib.postgres.fields import SerialField

self.serial_to_integer_test(SerialField, IntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_serial_to_small_integer(self):
from django.contrib.postgres.fields import SerialField

self.serial_to_integer_test(SerialField, SmallIntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_serial_to_big_integer(self):
from django.contrib.postgres.fields import SerialField

self.serial_to_integer_test(SerialField, BigIntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_small_serial_to_integer(self):
from django.contrib.postgres.fields import SmallSerialField

self.serial_to_integer_test(SmallSerialField, IntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_small_serial_to_small_integer(self):
from django.contrib.postgres.fields import SmallSerialField

self.serial_to_integer_test(SmallSerialField, SmallIntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_small_serial_to_big_integer(self):
from django.contrib.postgres.fields import SmallSerialField

self.serial_to_integer_test(SmallSerialField, BigIntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_big_serial_to_big_integer(self):
from django.contrib.postgres.fields import BigSerialField

self.serial_to_integer_test(BigSerialField, BigIntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_big_serial_to_integer(self):
from django.contrib.postgres.fields import BigSerialField

self.serial_to_integer_test(BigSerialField, IntegerField)

@unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
@isolate_apps("schema")
@tag("serial")
def test_big_serial_to_small_integer(self):
from django.contrib.postgres.fields import BigSerialField

self.serial_to_integer_test(BigSerialField, SmallIntegerField)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a good use case for subTest to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the issue is we would either need to generate the test models with a dynamic name, or clean up the test models after the subtests - IMO it's more trouble than worth it.

tests/schema/tests.py Outdated Show resolved Hide resolved
@csirmazbendeguz csirmazbendeguz force-pushed the ticket_27452 branch 2 times, most recently from e7fdd21 to 8052e52 Compare May 5, 2024 10:37
@csirmazbendeguz
Copy link
Author

This looks like a great start!

In addition to the specific comments I added, we'll need some migrations tests to cover adding/removing serial fields and converting between serial and auto fields.

@LilyFoote , thanks for the review! Could you point me to some tests I could copy to get started? I'm not sure how to test the migrations other than with the tests/schema tests.

@LilyFoote
Copy link
Contributor

For database defaults I added tests in tests/migrations/test_autodetector.py and tests/migrations/test_operations.py.

@csirmazbendeguz csirmazbendeguz force-pushed the ticket_27452 branch 5 times, most recently from 569c960 to 0ebc30b Compare May 6, 2024 06:40
Copy link
Contributor

@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates @csirmazbendeguz
I think there are a few tests/asserts we need to add 👍

django/contrib/postgres/fields/serial.py Outdated Show resolved Hide resolved
django/contrib/postgres/fields/serial.py Outdated Show resolved Hide resolved
django/core/management/commands/inspectdb.py Show resolved Hide resolved
django/db/backends/postgresql/operations.py Show resolved Hide resolved
@csirmazbendeguz
Copy link
Author

Thanks @sarahboyce , fair enough, I made some adjustments.

Copy link
Contributor

@InvalidInterrupt InvalidInterrupt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't really read many of the tests yet; not much has caught my eye that isn't already being discussed.

Comment on lines 41 to 50
def _check_default(self):
if self.default is NOT_PROVIDED:
return []
return [
checks.Error(
f"{self.__class__.__name__} does not accept default values.",
obj=self,
id="fields.E014",
),
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I would expect most users to operate this way; it feels overly prescriptive to me.

Suppose for some reason I only wanted a subset of objects to be part of the sequence, and the rest to default to 0, I might want to set default on the field and pass DatabaseDefault() only when I want to use the sequence.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @InvalidInterrupt !

I don't think this is an issue, it's possible to create an object without using the sequence:

class Dummy(Model):
    serial = SerialField()

Dummy.objects.create(serial=0).serial  # 0
Dummy.objects.create(serial=0).serial  # 0
Dummy.objects.create(serial=0).serial  # 0
Dummy.objects.create().serial  # 1
Dummy.objects.create().serial  # 2
Dummy.objects.create().serial  # 3
Dummy.objects.create(serial=1).serial  # 1
Dummy.objects.create().serial  # 4

Is this what you're suggesting?

class Dummy(Model):
    serial = SerialField(default=0)

Dummy.objects.create().serial  # 0
Dummy.objects.create().serial  # 0
Dummy.objects.create().serial  # 0
Dummy.objects.create(serial=DatabaseDefault()).serial  # 1
Dummy.objects.create(serial=DatabaseDefault()).serial  # 2
Dummy.objects.create(serial=DatabaseDefault()).serial  # 3
Dummy.objects.create(serial=1).serial  # 1
Dummy.objects.create(serial=DatabaseDefault()).serial  # 4

I think the first one is better because it's consistent with how AutoField works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been testing on a little project, if we add a SerialField to an existing model I cannot make it nullable or assign a default and so I have to give a temporary default in order to create a migration and add the field.

However, whatever I give here is ignored and the field is populated everywhere with as a serial starting from 1
I feel like it shouldn't prompt us in that case (forcing my input and then ignoring it to me is a bug).
Also considering this, a default option might be nice 🤔 (cc @LilyFoote)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 🤔 Thanks for catching that @sarahboyce ! I tested it too but I didn't notice the the input was ignored. 😱 I'll investigate.

Yes, default makes sense in the context of migrations.

Copy link
Contributor

@LilyFoote LilyFoote May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I think the minimal change feature-wise would be to adapt the migration generation code to know that SerialField can be created without an explicit default.

For a user-chosen default my question would be "why a SerialField and not an IntegerField?".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining @InvalidInterrupt !

I think the only place this could be useful is in migrations. So when adding a serial field, it could be pre-filled with a default number.

I would find this API confusing otherwise, users may confuse it with the start parameter for example.
I don't think it's too restrictive since users can still opt out of using the sequence by providing an explicit value.

That said, it's always easier to remove a restriction than to add one.

In my opinion, the use-case of migrations is an edge-case and it can be solved by first creating an IntegerField then changing it to a SerialField, so it wouldn't justify enabling this.

If someone can provide a good use-case this restriction could be removed in the future, but for now I have to disagree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My description of the features potential use may have been confusing; but using default in this manner would be just like (every?) other field type, so I'm sceptical it would be confused for something like start outside of the context of this thread. In regards to API design, as it stands SerialField will be the exception. Eliminating that exception would probably also fix the migrations without adding a kludge to the migration code for this.

I've just finished mocking up the removal of this restriction to confirm that it works as expected and doesn't break anything else. I'm going to finish cleaning up the code and then push it to my fork for reference as a PoC (if anyone is curious: https://github.com/InvalidInterrupt/django/tree/ticket_27452).

That being said, I don't think this is a big issue; and it shouldn't hold up this PR. If I'm not persuading anyone that we should remove this guardrail and trust the Django users, then we should just move along for now (Sarah's migration issue notwithstanding).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, even AutoField allows you to set a default, even though that means that you'll only be able to save a single instance of the model without specifying an explicit value for the PK

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and AutoField allows a default too, it will cause integrity errors of course but it allows it. So yes you are raising a valid point, this would be an exception, thank you for pointing that out. It may be worth enabling for consistency's sake. @LilyFoote , @sarahboyce what do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's a problem with migrating a SerialField with a default value. It's impossible to do this in Postgres, since serial already implies the field has a default sequence. As far as I can tell, there's no trivial way to add a serial field to a table with an initial default value other than the sequence.

So I think the simplest option would be to disable default after all. Or to not use the built-in serial type when creating a field.

docs/ref/contrib/postgres/fields.txt Outdated Show resolved Hide resolved
@csirmazbendeguz
Copy link
Author

Thanks for the review @InvalidInterrupt , it's much appreciated! 🙏

@@ -75,6 +76,7 @@ def handle_inspection(self, options):
"field names."
)
yield "from %s import models" % self.db_module

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csirmazbendeguz these are minor clean up changes that will happen on the PR either by you or by a merger

  • we want to limit the changed lines to keep the diff really tight to what's important, occasionally you have added a new line or removed a line, I will probably revert some of this (you can too if you like). This is considered formatting changes
  • you have added @tag("serial") for your tests, this is not how we organise tests generally and needs to be removed
  • the commits need to be squashed. This will likely be 1 commit (or if we can think of a nice way of splitting them we will)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this for informational purposes 👍

I forgot to say that I have tested this with a local project and tried out a number of things to try and see if I can find any issues (had a look in the admin, reversed migrations, used them in GeneratedFields) and this comment is the only thing I'm concerned about. It looks really good ⭐

@csirmazbendeguz
Copy link
Author

csirmazbendeguz commented May 29, 2024

I realized instead of adding new system checks, I could just raise a ValueError in __init__. This is how GeneratedField does it. WDYT?

I think it's better, because:

  1. it's consistent with how it's done elsewhere
  2. it's stricter, the code will fail immediately, doesn't need to wait for checks to run
  3. it doesn't require adding new checks

tests/inspectdb/tests.py Outdated Show resolved Hide resolved
django/db/backends/postgresql/schema.py Outdated Show resolved Hide resolved
django/db/migrations/autodetector.py Outdated Show resolved Hide resolved
@csirmazbendeguz
Copy link
Author

csirmazbendeguz commented May 29, 2024

So I would like to summarize what I figured out today for both myself and others to see.

We have two options: either enable serial fields to define a default value, or not. As @InvalidInterrupt pointed out, it would be preferrable to enable this for consistency's sake.

If default is not allowed, then:

  1. Adding a serial field to a table shouldn't trigger a prompt for a default value. The new field will be populated by the sequence.
  2. Changing a nullable integer field to a serial field will trigger an error if the table contains any null values. I am looking into if we could also populate the null values with the sequence in this case.

If default is allowed, then:

  1. Adding a serial field to a table should trigger a prompt for a default value. The new field should be populated by the default value. QUESTION: what if I want to populate the field with a sequence by default? I think if we need to pick one, it's more sensible to populate the serial field with a sequence instead of a single default value.
  2. We can't use Postgres's built-in serial type when creating a serial field. This is because it's impossible to create a serial field with a default value, since serial fields already have a sequence by default. We could create a regular integer field instead and attach a sequence to it in multiple steps.

@csirmazbendeguz
Copy link
Author

csirmazbendeguz commented May 29, 2024

some notes so I don't forget:
when changing a nullable integer to a serial field, the issue is with the order the queries are executed (fragment vs other_actions).
it would be nice if _alter_column_type_sql could return a list of actions instead of a single action (fragment). this is because setting up a serial field needs multiple queries.

EDIT:
it's not necessary, queries can be executed directly in _alter_column_type_sql instead.

@csirmazbendeguz
Copy link
Author

csirmazbendeguz commented May 30, 2024

So I decided to stop fighting against what's already in place and try to conform to the existing code. I managed to achieve the following:

  1. no default is allowed, it's always set to DatabaseDefault
  2. added the SerialFieldSequence class for serial fields db_default
  3. adding a serial field to an existing table doesn't trigger a prompt
  4. adding a serial field to an existing table will populate it with a sequence starting from 1
  5. changing a nullable integer to a serial field will populate the null values with a sequence starting from max(field)

def test_serial_to_integer(self):
from django.contrib.postgres.fields import SerialField

self.serial_to_integer_test(SerialField, IntegerField)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of wish we tested all these in something like postgres_tests/test_schema but given this is a pre-existing pattern here import testing against django.contrib.postgres this doesn't have to be done in this PR.

Comment on lines 12 to 14
table = self.field.model._meta.db_table
column = self.field.column
return f"nextval(pg_get_serial_sequence('{table}', '{column}'))", []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be an SQL injection vector and will crash if any table or column name includes a quote.

Suggested change
table = self.field.model._meta.db_table
column = self.field.column
return f"nextval(pg_get_serial_sequence('{table}', '{column}'))", []
table = self.field.model._meta.db_table
column = self.field.column
return "nextval(pg_get_serial_sequence(%s, %s))", [table, column]

I would personally use Func instead to avoid this error prone boilerplate

class NextSerialSequence(Func):
    template = "nextval(pg_get_serial_sequence(%(expressions)s))"

And override SerialFieldMixin._db_default_expression to return

@cached_property
def _db_default_expression(self):
    return NextSerialSequence(
        Value(self.model._meta.db_table),
        Value(self.column),
        output_field=self,
    )

Which should also avoid the db_default injection in __init__ and removal in deconstruct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @charettes , fair enough. We should really set db_default to something though, overriding _db_default_expression is not enough because this variable is checked during migrations.

Comment on lines +43 to +47
def get_prep_value(self, value):
value = super().get_prep_value(value)
if value is None:
value = self.get_default()
return value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is required given we set default = DatabaseDefault()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for supporting None arguments:

SerialModel.objects.create(serial=None)

Since it's possible to provide None to auto fields, it should also be possible to provide it for serial fields.

If that's not necessary, we can remove it.

Comment on lines +16 to +25
sql_create_sequence = (
"CREATE SEQUENCE IF NOT EXISTS %(sequence)s "
"AS %(data_type)s "
"OWNED BY %(table)s.%(column)s"
)
sql_reset_sequence = (
"SELECT setval(pg_get_serial_sequence(%s, %s), "
"coalesce(max(%s), 1), max(%s) IS NOT NULL) FROM %s;"
)
sql_rename_sequence = "ALTER SEQUENCE %s RENAME TO %s;"
Copy link
Member

@charettes charettes Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it isn't there something fundamentally wrong here about not using the native serial types?

Why are we going the extra length of manually emulating through schema definition and explicitly using nextval(pg_get_serial_sequence(...)) on INSERT when simply using serial and friends does it by itself?

I could be missing something here but it seems that most of the complexity of this patch stems from trying to emulate what Postgres does very well already by itself?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charettes , it's because Postgres doesn't support it very well. serial is not a true type, and it's not possible to alter a field's type to serial. we have to emulate it because that's the only way it can work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on how Postgres doesn't support serial very well? It has been used to power AutoField for over a decade before we switched to IDENTITY so I'm surprised to learn that it poses problems.

From my understanding the main reason for the manual sequence management and db_default assignment is to support altering back and forth between integer, auto, and serial fields. Is that really going to be a common enough use case that it warrants all the complexity it requires?

I think it's a question worth asking because there are no mentions of supporting this use case in the accepted ticket-27452 and I personally don't think that it will be common enough operation to warrant that much extra complexity to the schema editor.

Don't get me wrong, it is impressive that you got it working and I'm sure you learned a ton doing so. I'm just doubtful that the code complexity tradeoff is worth it (1.4k additional line) considering that this could be simply implemented using mixins including db_returning = True and using get_db_type = "serial" at first.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use serial when creating a table or when adding a field, but you can't use it for anything else, e.g. you can't alter a column's type to serial and you can't cast to serial. But do let me know if I'm wrong, it didn't work when I tested it.

I agree it adds some complexity, but the majority of that 1.4k lines is just tests, the schema editor received 160 additions and 44 removals only. I assumed the functionality to transition between integer fields <-> serial fields <-> auto fields would be a requirement because of the similarities between these fields. If all we wanted to support was creating serial fields, I'm not sure this would need any changes to core, it could be a package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants