
Status of value semantics? #387

Open
AprilArcus opened this issue Dec 24, 2023 · 57 comments

Comments

@AprilArcus

According to @littledan on JS Party Episode 305, value semantics for records and tuples are being reconsidered by the committee. Could you expand a little more on the concerns here?

@acutmore
Collaborator

acutmore commented Jan 8, 2024

Hi @AprilArcus, I'm hoping something more complete can be written up on the status of this soon.

In lieu of that, a short summary is that we have received feedback from multiple major JS engine implementation teams that their investigations into === equality for R&T suggest it will be more complicated than we initially hoped, and that even if implemented there is doubt the performance of the operation would match developer expectations. Additionally, due to the way most JS engines are architected, it would add an extra load instruction to existing usages of ===, so existing apps that haven't adopted R&T could still be negatively impacted.

This was not the only feedback, but it is perhaps the most significant in terms of the proposal now looking at other potential designs for how equality could work.

I'll be sure to post an update here when there is a link to more information.

@acusti
Contributor

acusti commented Jan 11, 2024

@acutmore thanks for the update! Does this preclude different R&T instances with equivalent values being SameValue-equal? My immediate expectation would be that if strict equality isn't possible, they could still be compared using Object.is.

@demurgos

What about using these types as keys in Map and Set? Using a tuple such as [x, y] as a key is currently a bit awkward since you have to convert it to a primitive first. Being able to represent composite keys more easily was a great benefit of this proposal.

@acutmore
Collaborator

does this preclude different R&T instances with equivalent values being SameValue equals? my immediate expectation would be that if strict equality isn’t possible, that they could be compared using Object.is.

@acusti most likely it does preclude that. SameValue and Object.is are very exact comparisons. Ignoring NaN canonicalisation, when two values are considered "same" there is no way to tell them apart. They are literally the same value from the language's perspective.

That said, there could be a new API, Something.equals, that would provide a new form of equality that covers R&T.

@demurgos, similar to the above. It's likely that by default R&T would behave the same as any other object in Map and Set, but there could be either new collection types, or a flag passed to the constructor, providing a kind of Map or Set that would allow R&T to be used as a more descriptive key.
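
To make that concrete, here is a purely illustrative sketch of the kind of APIs being described; every name below (Record.equal, the compositeKeys option) is invented for the example and is not part of any proposal.

const a = #{ x: 1, y: 2 };
const b = #{ x: 1, y: 2 };

a === b;            // false – records/tuples compared by identity, like other objects
Record.equal(a, b); // true  – a hypothetical explicit deep-equality API

const m = new Map([], { compositeKeys: true }); // hypothetical opt-in flag
m.set(a, "hit");
m.get(b);           // "hit" – R&T usable as content-based keys in this Map variant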

@demurgos

Thank you for your reply. It feels like R&T will end up being syntax sugar for Object.freeze (and no prototype). If so, it will still be useful but it's disappointing that perf concerns caused such a scope reduction ☹️

@mhofman
Member

mhofman commented Jan 17, 2024

@demurgos the R&T champions and other delegates are still very much interested in supporting the use cases that R&T would solve. For example, composite keys, as you mention, are definitely something I would like to see better supported natively, as they are extremely painful to build in userland. However, for various compatibility and performance reasons it will likely require some kind of explicit opt-in.

@AprilArcus
Author

If === is too performance-sensitive to be extended to cover a special case for Records & Tuples, could a useful set of semantics be reclaimed from the already slower and more complicated == operator? Or could a novel operator be introduced such as =#=?

@acutmore
Collaborator

Hi @AprilArcus, are there particular situations where you would find having an operator useful over having to use a function call?

Adding a new overload to == is, I feel, also unlikely. While it does have slow paths, comparing two objects is not one of them. There is also a general theme in the ecosystem of preferring === over == (though == null is sometimes seen as an acceptable exception), so adding new functionality to == might muddy the water in terms of advice on which to use.

Adding a new operator, like any new syntax, tends to need stronger justification than adding a new API. This is likely because the remaining design space is somewhat limited (there are only so many non-letter ASCII characters to choose from), and documentation for syntax is harder to look up (IDEs currently tend to expose docs for APIs but not for syntax). A new operator could also be researched as a separate follow-on proposal.

@gregmartyn

If the problem (interning) already exists with strings (does it?), why is it not acceptable for R&T? Shouldn't the problem be addressed in the same way? I.e. instead of introducing e.g. RnT.sometimesSlowCompare(), have === be the sometimes-slow one -- like it already is for strings -- and introduce a constantTimeCompare() function for interned strings & R&T?

I'm sure there's something(s) I'm missing though because I didn't follow the bit about why that would "add an extra load instruction to existing usages of ===".

@errorx666

I'd rather not have R&T at all than have them without value comparison semantics.

@acutmore
Collaborator

Hi @gregmartyn, I'll try and respond as best I can!

If the problem (interning) already exists with strings,

You are correct that comparing strings with === has a similar risk of being a 'slow' (non-constant-time) comparison. The concerns are a mix of:

  • strings already having this risk does not necessarily make it OK to increase the number of types with this risk
  • applying the same string lazy-interning optimisations to R&T makes it much harder for R&T to re-use the existing optimisations for Objects and Arrays, as the memory layouts won't match.

I didn't follow the bit about why that would "add an extra load instruction to existing usages of ===".

This does not apply to all JS engines; it depends on the information that comes with their 'tagged pointers'. In some engines the pointer itself encodes that it points to a JS object or a JS string. This means the instructions for === have a fast path: if the two pointers both point to JS objects, they can immediately return the comparison of the two addresses. If R&T are built re-using most of the existing mechanisms for JS objects, so that they can benefit from the existing optimisations and GC, then whenever two objects are compared with === the instructions lose that fast path and need to load/follow the two addresses to further inspect whether the objects are regular objects or R/T, in order to know whether they can be compared by address or by content. This extra load instruction, while small on its own, could add observable overhead in a hot path of an application.
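
A rough, self-contained toy model of that fast path in plain JavaScript. Everything here (the tag/ref/kind shapes, the function names) is invented for illustration; real engines work with tagged machine words, not JS objects.

function makeObjectPointer(kind, fields) {
  return { tag: "object", ref: { kind, fields } };
}

// Today: if both tags say "object", compare identities and stop.
function strictEqualsToday(a, b) {
  if (a.tag === "object" && b.tag === "object") {
    return a.ref === b.ref; // fast path: compare addresses, no further load needed
  }
  return false; // other type combinations elided from the sketch
}

// If R&T reuse the object representation, both objects must first be
// inspected before choosing identity vs. content comparison.
function strictEqualsWithRT(a, b) {
  if (a.tag === "object" && b.tag === "object") {
    const aKind = a.ref.kind; // the extra load described above
    const bKind = b.ref.kind; // ditto
    if (aKind === "record/tuple" && bKind === "record/tuple") {
      return compareByContent(a.ref, b.ref);
    }
    return a.ref === b.ref; // identity; also covers mixed object/R&T pairs
  }
  return false; // other type combinations elided from the sketch
}

function compareByContent(a, b) {
  return a.fields.length === b.fields.length &&
    a.fields.every((v, i) => v === b.fields[i]); // shallow, for the sketch only
}

const x = makeObjectPointer("ordinary", [1]);
strictEqualsToday(x, x);  // true, without reading x.ref.kind
strictEqualsWithRT(x, x); // true, but only after the two extra kind loads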

@lautarodragan

What about using these types as keys in Map and Set? Using a tuple such as [x, y] as key currently is a bit awkward since you have to convert it to a primitive first. Being able to represent composite key more easily was a great benefit of this proposal.

I'd rather not have R&T at all than have them without value comparison semantics.

@demurgos

Myself, I'd still prefer this to go forward, even without them.

FWIW, even though I agree it'd be ideal to have strict equality, a Map[Symbol.keyEquality] would more or less address this issue. Less than ideal since it moves the problem to userland and will probably perform worse, but it could work. Or a new, native RecordTupleMap that implements a new key-equality algorithm, provided by JS engines, which would be more performant.

None of these options are ideal, but moving forward with R&Ts without strict equality doesn't close these doors, and decoupling it from this requirement seems to be an acceptable path forward.

A future, separate proposal could address === comparing by value rather than reference for records and tuples, which may have much less friction if records and tuples are already implemented in engines, or at least the proposal is in stage 3 or 4.

Also, I'm not sure changing === would affect SameValueZero equality; isn't that a spec-internal algorithm that doesn't necessarily use === under the hood? If so, even if this proposal did change how the operator works, that might still not affect how Maps and Sets compare keys. These problems might be decoupled from each other: proposing that SameValueZero compares records and tuples by value, regardless of what ===, Object.is, etc. do, might be a more tractable proposal.

@ljharb
Member

ljharb commented Feb 2, 2024

A future proposal would never ever be capable of changing how any operator behaves with a specific kind of object. Once R&T ships as a non-primitive, that is forever an unavailable path for them.

@demurgos

demurgos commented Feb 19, 2024

Myself, I'd still prefer this to go forward, even without them.

If we take a step back, what is the goal this proposal is even trying to achieve? Is it a new feature or new syntax? In a reply above I was disappointed by the scope reduction but felt the sugar might still be useful. Since then I've had some time to think more about it, and I don't want to give up the value semantics. Please keep them; they are the core of this proposal.

The first paragraphs of the README are (emphasis mine):

Records and Tuples can only contain primitives and other Records and Tuples. You could think of Records and Tuples as "compound primitives". By being thoroughly based on primitives, not objects, Records and Tuples are deeply immutable.

Records and Tuples support comfortable idioms for construction, manipulation and use, similar to working with objects and Arrays. They are compared deeply by their contents, rather than by their identity.

The expectation from the start was that the goal was to get compound primitives, aka value semantics. These are an incredibly powerful feature because they fully integrate with regular operators and methods. ===, Object.is, Array#indexOf, etc. just work, and you avoid round-tripping through a string serialization. Compound values become regular values with R&T.
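
To make the "just works" point concrete, this is roughly what the original design promised (illustrative, using the proposal's # syntax):

const p1 = #{ x: 1, y: 2 };
const p2 = #{ x: 1, y: 2 };

p1 === p2;                           // true – compared by content, not identity
Object.is(#[1, 2], #[1, 2]);         // true
[#[0, 0], #[1, 1]].indexOf(#[1, 1]); // 1 – indexOf just works
new Set([p1, p2]).size;              // 1 – no string round-trip needed for set membership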

I don't want to be too harsh on all the people working on this who want to drop value semantics, but it's also one of the most important ongoing proposals for JS. Syntax sugar can be great; for example, the class keyword did not initially unlock new features but made an awful lot of patterns more convenient. Without value semantics, #[] and #{} could just be an Object.deepFreeze method, saving a few keystrokes. But is that really what the proposal intended to achieve?

I understand that compound primitives are harder to support, but they may lead to a much better language in the long run. The JS module system is one of the best of any language I have ever used; it required huge amounts of effort, but the result was well worth it. Native async support was also great and deeply transformed JS for the better. R&T with value semantics is one of those landmark proposals that should be fully realized, IMO. JS does not have features like operator overloading, so R&T is the best shot at first-class language support for composite values without stringifying the world. Compound primitives are also a solid foundation to expand on in the future, so it's better to take a bit more time and get the best version of them that we can.

@Samuel-Martineau

The most common case I run into where I feel like records and tuples would be a godsend is when I am trying to use structured data as a map key, without having to encode it as string or resorting to any other hacks. However, this doesn't work at all if records / tuples are just frozen objects instead of primitives. Although I appreciate the technical difficulty of adding them, I feel like it isn't worth it without this key feature.

@ljharb
Member

ljharb commented Feb 20, 2024

That use case - being able to control the "key" of Maps and Sets - is valuable to solve, and doesn't need R&T to do so.

@Samuel-Martineau

That use case - being able to control the "key" of Maps and Sets - is valuable to solve, and doesn't need R&T to do so.

Whilst you are right that R&T is not needed for this use case, it feels natural to use an immutable, compared by value data structure for a Map key, doesn't it? It is simply my opinion that the real value of R&T isn't so much the immutability as the possibilities that are opened by a complex data primitive.

@mhofman
Member

mhofman commented Feb 20, 2024

Immutability is a requirement for any stable keying (it's also the reason the value must not be key-comparable before being stable). What is not automatically needed is for the composite key value to expose the values composing it.

The original R&T proposal solved this by making values deeply immutable from birth, which made them directly usable as composite keys. What the champions are now exploring is the problem space where the immutable R/T and the composite key are not the same thing.

@demurgos

demurgos commented Feb 20, 2024

Having key semantics different from equality leads to many inconsistencies. It's possible to build many examples, but they all boil down to expecting that a Map/Set having a key implies that there exists an item equal to the key. In particular, I expect all three functions below to be equivalent*.

function setHas1(s: Set<unknown>, x: unknown): boolean {
  return s.has(x);
}

function setHas2(s: Set<unknown>, x: unknown): boolean {
  for (const v of s) {
    if (v === x) {
      return true;
    }
  }
  return false;
}

function setHas3(s: Set<unknown>, x: unknown): boolean {
  return [...s].indexOf(x) >= 0;
}

The strength of value semantics is that it's not a special case for Map and Set, but that it fully integrates with the language at all levels.

*: Yes, I'm aware of NaN, it does not mean that we should add more edge cases

@ljharb
Member

ljharb commented Feb 20, 2024

@demurgos that's not an assumption that holds for subclasses, so it's a false assumption. Only the first is valid.

@mhofman
Member

mhofman commented Feb 20, 2024

boil down to expecting that a Map/Set having a key implies that there exists an item equal to the key

I don't think anyone has suggested changing the default key semantics of Map/Set. However, one line being explored is that Map/Set could be configured at construction with a keying function to derive the keying value from the provided "key" (this is roughly equivalent to an alternative Map/Set implementation). The keying value would have normal equality / SameValue semantics.

I expect all three functions below to be equivalent*.

I don't believe a program can assume the behavior of dynamically invoked methods on parameters. Anyone today can already provide a custom map/set/array instance with overridden methods.

The program can only trust the behavior of instances it itself creates (modulo prototype pollution, which is an orthogonal concern). Since no one is proposing to change the default behavior of existing Map/Set, I don't believe there is a problem.

@demurgos

I understand that JS is dynamic by nature, everything can be patched, etc. But I would still argue that any subclass or method override where iteration disagrees with membership no longer provides a consistent Set interface.

JS already has multiple ways to check for equality (Object.is, ==, ===), so I get why parametrizing a map/set to use a different behavior is a reasonable idea. It's actually a good idea, but I don't think it's the best fit for R&T. I'll wait to see what this exploration leads to.

@Maxdamantus

The keying function idea seems a bit awkward to me, because it's possible that the keying function will return different values for the same original key over time (eg, k => JSON.stringify(k) will vary if someone modifies the original k object, or an object referred to by it, etc).

This means that either the mapped key needs to be stored indefinitely in the Map, or else the Map would be able to get into an undefined state due to inconsistencies in derived key hashing (eg, if the implementation uses a hash map or similar) or key ordering (eg, if the implementation uses a tree map or similar).

const map = new Map(k => JSON.stringify(k));
const k = { foo: 42 };
map.set(k, "hello");
k.foo++;
map.set(k, "world");

Presumably the Map above now contains two entries with the same original key value k, but if it's backed by a hash map and the mapped values ("{\"foo\":42}", "{\"foo\":43}") are not stored, there's a chance that both set operations hash to the same bucket so it only ends up with one entry, thus exposing behaviour of the underlying hashing/bucket.

I can imagine there might be some use for it, but I think if it becomes the main way to implement "compound keys", it could be a big footgun.

Having a separate constructor for compound keys just seems more sound, but it does indeed just sound like a less useful version of tuples (since they're basically just tuples that can't be read).

@mhofman
Member

mhofman commented Feb 20, 2024

The keying function idea seems a bit awkward to me, because it's possible that the keying function will return different values for the same original key over time (eg, k => JSON.stringify(k) will vary if someone modifies the original k object, or an object referred to by it, etc).

It's the responsibility of the creator of the Map/Set to use a keying function that's stable. Relying on the data in the key itself while the key is mutable is obviously not compatible with stability, hence why I mentioned that immutability is a requirement for any stable keying.

the mapped key needs to be stored indefinitely in the Map

Yes, the Map implementation would not rely on the keying function for existing entries, not just because of stability, but because otherwise it would expose the internal implementation of the Map to the keying function.

Having a separate constructor for compound keys just seems more sound

Right, a composite key is a useful building block for custom collection keying: the keyBy option can just return a composite key. I personally believe it should actually be the only accepted return value of keyBy, to avoid potential footguns like the one you describe.

@Maxdamantus

Maxdamantus commented Feb 20, 2024

I personally believe it actually should be the only accepted return values of keyBy, to avoid potential footguns like you describe.

How does that avoid the footgun? Presumably k => Map.key(k.foo) isn't any more stable than k => JSON.stringify(k) in the example I gave. Even if it returns an object, I would expect the underlying equality (and hash) operation to continue to be stable (eg, SameValue or SameValueZero isn't going to give different results for the same pair of values, even if the values are mutable objects).

@ljharb
Member

ljharb commented Feb 20, 2024

My expectation is that a keying function would only ever be called once for any given value, and the result memoized.

@acutmore
Collaborator

While a dedicated key-creating API does not prevent mutated objects from being unstable sources of key generation, it still has benefits compared to JSON.stringify: it could avoid worrying about key order (JSON.stringify({ a: 1, b: 2 }) !== JSON.stringify({ b: 2, a: 1 })), and it could support more values than JSON (JSON.stringify([1n]) // throws).

While the primitive, value-semantics R&T design guarantees stability, the same indirection issues can arise because R&T could only store other primitives. Indirection (via symbols in WeakMaps) was therefore required to store non-primitive values (e.g. Temporal.ZonedDateTime). Just as someone can provide a non-stable keyBy function to a Map or Set, someone could have a non-stable lookup for the objects referenced by their R&T.

So to me the question is less "is it possible to create an odd scenario with the API?" and more "do the APIs encourage good patterns by default, and do less safe usages of the API clearly stand out when reading the code?"

@Maxdamantus

My expectation is that a keying function would only ever be called once for any given value, and the result memoized.

If that's taken literally, it means it would need to remember multiple original key values, eg:

const map = new Map(k => {
    console.log("called");
    return JSON.stringify(k);
});
const k0 = { foo: 42 };
const k1 = { foo: 42 };
map.set(k0, "hello"); // "called"
map.set(k1, "world"); // "called"
map.get(k0); // ??
map.get(k1); // ??

Either both distinct key objects (k0 and k1) would need to be stored in the map, or at least one of the get operations will need to call the keying function again.

The former sounds very weird to me, since repeated sets of the "same" key will grow the map. Perhaps the implementation can mitigate this by having an implicit weak map associated (so temporary objects at least are not maintained), but this seems like a lot of undesirable overhead.

@ljharb
Member

ljharb commented Feb 20, 2024

Yes, I'd expect k0 and k1 to both be stored somewhere, mapping to the JSON stringification of the key. Object keys would indeed need to be held weakly in the "somewhere".

@sethfowler

I'm just a JS developer, and not part of the standards body, so this is just my two cents, but my preference would be to either retain the value semantics or abandon the entire proposal.

Records and tuples with value semantics would be transformative. As I'm sure everyone posting here already knows, many popular libraries and frameworks in the JavaScript ecosystem (React, Redux, RxJS, etc) are "reactive" in the sense that they involve computing the difference between a new state and a prior state and taking some action based on the changes. Any programmer who works professionally with these libraries always needs to be aware of the reference semantics of objects and arrays; carelessly replacing one object or array with another, identical object or array can trigger a large amount of unnecessary downstream work or even an infinite loop. Records and tuples with value semantics would make it much easier to write correct, performant code in these situations.

Records and tuples without value semantics, by contrast, don't really help with this problem at all. It doesn't seem to me that they would offer much value in terms of solving my day-to-day problems programming in the JS ecosystem. Since there's a limited amount of nice syntax available in the language, if value semantics are not on offer, my preference would be to save the syntax for something more impactful.
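
A minimal illustration of that pitfall outside any particular framework (the selector name and data are invented for the example):

const appState = { width: 800, height: 600, scrollY: 0 };

function selectViewport(state) {
  // Returns a fresh object on every call, even for identical data.
  return { width: state.width, height: state.height };
}

const prev = selectViewport(appState);
const next = selectViewport(appState); // same underlying data
prev === next; // false – so identity-based change detection re-runs downstream work

// Under the original R&T design, returning #{ width: state.width, height: state.height }
// would make prev === next true whenever the contents match.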

@slorber

slorber commented Apr 2, 2024

Framework benefits of value semantics

my preference would be to either retain the value semantics or abandon the entire proposal.

100% agree with this. I don't see myself using R&T much if it's only about immutability, and not value semantics.

I've tried to express the importance of this behavior in my article Records & Tuples for React, but IMHO this is not just about React; many other frameworks would benefit from this behavior (for example Angular's ChangeDetection OnPush?).

IMHO JavaScript makes the React idea f(state) = UI hard to use efficiently because reconciliation is not optimal, and that's also why more and more frameworks are moving to a signal-based approach.


Upcoming meeting and pushback

As far as I understand, the new direction will be discussed in a few days at the next TC39 meeting, and I hope value semantics will remain part of this proposal.


IMHO interning is not a bad idea even if there's a creation-time cost (is it high?). These new data structures are opt-in; people are not forced to use them everywhere. Teaching that Records are slower to create and should be used purposefully doesn't seem like a huge deal to me, in a world where devs aren't spending much time micro-optimizing their code anyway.

I mean, aren't Set/Map also more expensive to create, and only then improve lookup performance? (That's my intuition, but I may be wrong here.) I don't think anyone here would argue that introducing Set was a bad idea despite that tradeoff. People usually don't create a short-lived Map/Set just to do a single set.has() call.

Devs could start adopting this by simply wrapping JSON fetch results at the edge of their programs. A JSON payload translates well to a Record, and this would very likely automatically optimize React apps on refetches that return the same data, which is quite a low-hanging fruit.

In addition, the friction of not being able to add a Date or function to a Record without using symbols will likely discourage many devs from using Records as a default data structure. Lint rules can also be set up to ensure they are only used for good reasons, on long-lived objects at the edges of programs.


Related to the structs proposal?

As far as I understand, browser implementors are pushing back on value semantics. Notably @syg from V8 has expressed concerns, and he also works on a structs proposal that allows sharing data structures across agents (like Workers).

The two proposals do not have the same goal, but to me they remain somewhat related. That's also what others like @Jarred-Sumner apparently think:

(screenshot of a post by @Jarred-Sumner)

There's also this issue from @leebyron

#269

I envision one of R/T's long term benefits to be unlocking multithreaded programming models for JavaScript.

I really like the current Records & Tuples proposal, but to be honest I feel like not being able to efficiently share a Record across agents would also limit its usefulness.

IMHO the most powerful outcome would be to have both: compound primitives that can be shared across agents efficiently.

Of course, easier said than done 😅 but this would be a truly disruptive capability that would probably lead to the creation of a new wave of frontend frameworks. I envision usage of a Record-based JSX where you can render things concurrently.


Test traction in userland first?

Sometimes I wonder: what prevents implementing interning in userland, as a lib (apart from paying a higher creation-time upfront cost)? That's an idea I've had for a while but never had time to execute.

People are already using things like deepEqual(), JSON.stringify() and use-deep-compare-effect anyway 🤷‍♂️ .
I recently came across this article for example:
https://profy.dev/article/react-useeffect-with-object-dependency

Maybe if the community shows traction for a userland implementation of the patterns enabled by Records & Tuples, there would be more incentive for browsers to implement this natively?

I mean something like this:

function userlandInterning(obj) {
  // TODO
}

async function doFetchUser() {
  // replace with actual fetch call
  return {
    id: 1,
    name: "Seb",
    company: {
      id: 1,
      name: "This Week In React"
    }
  };
}

export async function fetchUser() {
  return doFetchUser()
    .then(userlandInterning)
    .then(applyDeepFreeze);
}

await fetchUser() === await fetchUser()

(await fetchUser()).company === (await fetchUser()).company;
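
For concreteness, here is one naive way the userlandInterning placeholder above could be filled in. This is only a sketch, assuming JSON-like data (plain objects, arrays and primitives); it never prunes its table, so interned values stay alive for the lifetime of the program, and JSON.stringify normalises -0 to 0.

const internTable = new Map();

function userlandInterning(value) {
  if (value === null || typeof value !== "object") return value;

  // Intern children first so nested objects are shared too.
  const canonical = Array.isArray(value)
    ? value.map(userlandInterning)
    : Object.fromEntries(
        Object.keys(value)
          .sort()
          .map((k) => [k, userlandInterning(value[k])])
      );

  // A stable serialization of the canonical form is the lookup key.
  const key = JSON.stringify(canonical);
  const existing = internTable.get(key);
  if (existing !== undefined) return existing;

  internTable.set(key, Object.freeze(canonical));
  return canonical;
}

With something like this (and a suitable applyDeepFreeze), the two comparisons above would evaluate to true.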

Here's again the example of fetching JSON payloads, but there are other places where this could be useful.

For example, it would be very convenient for React apps if reading URL search params would always give an object with the same identity (unless those params change of course)


Does it make any sense?

@sebinsua

sebinsua commented Apr 2, 2024

It's surprising that there's been push-back over the performance impact. As others have mentioned already, due to the way reactive applications tend to work in JavaScript, we all have to jump through a lot of hoops to memoise our arrays and objects so that our applications can use inequality to determine whether extra work should be done. Because of this, the performance of our applications is often fragile and will break if we accidentally create a new array or object (a new memory reference causes an avalanche of reactive/reconciliation work).

It'd help tremendously if we had "value semantics" -- I suspect that in React applications this would give us much better performance as well as making our code easier to reason about.

@bakkot
Contributor

bakkot commented Apr 3, 2024

Sometimes I wonder: what prevents implementing interning in userland, as a lib?

Nothing; see for example the following npm packages: tuplerone and immutable-tuple.

@Maxdamantus

It's surprising that there's been push-back over the performance impact.

Indeed, this has been bugging me for a while, so I thought I'd try to express some of my thoughts around it:

#387 (comment)

Additionally due to the way most JS engines are architected it would add an extra load instruction to existing usages of ===, so existing apps that haven't adopted R&T could still be negatively impacted.

#387 (comment)

You are correct that comparing string with === has a similar risk of also being a 'slow' (non-constant-time) comparison. The concerns are a mix of:
* strings already having this risk does not necessarily make it OK to increase the number of types with this risk
* applying the same string lazy-interning optimisations to R&T makes it much-harder for R&T to re-use the existing optimisations that exist for Objects and Arrays - as the memory layouts won't match.
...
If R&T are built re-using most of the existing mechanisms for JS-objects, so they can benefit from the existing optimisations and GC, then this means that whenever two objects are compared with === the instructions now lose that fast path and will need to load/follow the two addresses to further inspect if the two objects are regular objects or R/T to know if they can be compared by address or by content.

As well as strings, this presumably applies to bigints as they are similarly of arbitrary size so might also be represented by a pointer to data that needs to be compared. I suspect even numbers are problematic, since the possibility of comparing NaN values will also mean that a byte-wise comparison is insufficient.

I would expect an optimal JS implementation supporting R&T (with "value semantics") would treat R&T values as having types distinct from object, just as it must already do for string, bigint and number.

There's a question of whether typeof #[] is "object" or "tuple" (#292 (comment)), but I think this should be mostly irrelevant to the underlying representation of tuple values [0]. I think this would only really be relevant in cases such as:

if (typeof v0 === "object" && typeof v1 === "object") {
    // `v0` and `v1` are statically known to be pointer values, below comparison is FAST
    const isEqual = v0 === v1;
}

But I really don't see how this is relevant to most code. In my experience, the vast majority of uses of === or !== are not within blocks of code with some typeof v === "object" checks, so if the engine knows, checks or assumes (eg, using potential deoptimisations) that the compared values are objects, it should be able to use the same logic in a R&T world, where R&T values simply have some different underlying type/representation, again just as it already does for string/bigint/number.

Additionally, as mentioned above, I think native R&T implementations with value semantics will allow people to naturally write better-performing code, and I suspect this will greatly outweigh the performance losses that might occur within hypothetical typeof v === "object"-constrained code.


[0] I don't see why the engine can't have multiple underlying representations mapping to the same typeof value. I suspect engines already do this with numbers, for supporting both IEEE-754 and integer representations, which would be another complication for the === operator that is already managed.

@acutmore
Collaborator

acutmore commented Apr 3, 2024

Sometimes I wonder: what prevents implementing interning in userland, as a lib?

Nothing; see for example the following npm packages:

Yep, and the R&T polyfill linked in this repo's README https://github.com/bloomberg/record-tuple-polyfill.

One thing that these userland implementations do show is that it's not easy to get them 100% right. There are common issues that tend to crop up.

Even if they get things 100% correct, using these values in a Weak{Map,Set,Ref} should be avoided as they can get collected. And while using FinalizationRegistry can help reduce total memory usage, the maps created to perform the lookups are likely to outlive the young generation and not be collected until there is a more complete GC.

So while it is possible in userland today, there are still advantages to a level of interning being provided by the language.

@bakkot
Contributor

bakkot commented Apr 3, 2024

Even if they get things 100% correct, using these values in a Weak{Map,Set,Ref} should be avoided as they can get collected.

From the implementations I wouldn't expect that to be true for tuplerone or immutable-tuple? Those libraries use a global interning table and do not use WeakRefs, so if all of the constituents are still live that should prevent collecting the value. (The values can get collected if they're no longer held and not all the constituents are still live, but in that case it would be impossible to recreate the value anyway, so that's expected.) Of course that does mean that if you create a tuple holding only primitives it can't be collected (which tuplerone solves by forbidding this).

I might be missing something though. It is certainly true that getting this right in userland without leaks is quite subtle.

@acutmore
Collaborator

acutmore commented Apr 3, 2024

From the implementations I wouldn't expect that to be true for tuplerone or immutable-tuple

Yeah, for userland today there are essentially two choices:

  • only WeakMap
    • pro: Now the key will remain alive/reachable as long as all the object parts of it are
    • con: key(1) and key(Object, 1) are effectively leaks
  • WeakMap + WeakRef + FinalizationRegistry
    • pro: Now the key will remain alive only while the key itself is held by something. Dropping the key lets parts of the interning table get collected
    • con: if the key is only held weakly, then it can disappear - even if it can technically be re-minted.
      • this can be worked around by patching Weak{Map,Set,Ref} to intercept and handle these keys in a different way but patching globals comes with its own set of issues

@bakkot
Contributor

bakkot commented Apr 3, 2024

I suppose there's a third option for native, which is that the resulting objects are branded and prevented from being used as keys in weak collections (or maybe have this restriction apply only to tuples whose values are all primitives). That makes the GC unobservable so you can drop them once they're not held even if they could be re-created.

But otherwise native has basically the same choices as userland, I think? Native could do it a little more efficiently, but the tradeoffs are basically the same.

(Which isn't to say I'm opposed to having this be part of the standard library!)

@acutmore
Collaborator

acutmore commented Apr 3, 2024

Object interning in Java (Project Valhalla) had/has an enum for how interned objects should behave in a weak map: https://github.com/openjdk/valhalla/blob/af7f7aba57670bacc3f9d556a5554828b4a42c1f/src/java.base/share/classes/java/util/WeakHashMap.java#L1590-L1609

  • soft - allow interned values but they may be dropped if there is memory pressure
  • strong - allow interned values but they are held strongly by the weakmap
  • discard - silently ignore interned values, they are never in the weakmap
  • throw - loudly throw on attempts to insert interned values into the weakmap

@ljharb
Member

ljharb commented Apr 4, 2024

Having anything that's an object not be weakly holdable would be a pretty disruptive breaking change, given that browsers refused a "canBeWeaklyHeld" predicate (due to web-compat concerns over time).

@bakkot
Contributor

bakkot commented Apr 4, 2024

That wouldn't be a breaking change, at least not in the way those words are usually used. Do you have a link to the discussion where browsers refused a "canBeHeldWeakly" predicate? I'd hope that if we went this route such a predicate would be palatable.

@ljharb
Member

ljharb commented Apr 4, 2024

The reason they rejected it, as I understand it, is that they didn't want the answer such a predicate gives for a given value to change over time. The counterargument I and others offered was that this was the entire benefit of such a predicate: you wouldn't have to look at the object, you would just branch on the predicate, and so it would be safe to migrate all current checks to use it, since it would always do what's expected even if the set of weakly-holdable things changed.

I can't locate the specific point in the notes, but I believe @syg was among the more vocal folks on the other side of that discussion?
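
To make the shape of that argument concrete: without such a predicate, code has to hard-code the current rules by hand. A sketch (the registered-symbol exclusion follows the symbols-as-WeakMap-keys proposal; functions are objects and so can be held weakly):

function canBeHeldWeaklyToday(x) {
  return (
    (typeof x === "object" && x !== null) ||
    typeof x === "function" ||
    (typeof x === "symbol" && Symbol.keyFor(x) === undefined)
  );
}

function trackWeakly(weakset, x) {
  // With the rejected predicate this would just be `if (canBeHeldWeakly(x))`.
  if (canBeHeldWeaklyToday(x)) {
    weakset.add(x);
    return true;
  }
  return false;
}

const ws = new WeakSet();
trackWeakly(ws, {});              // true
trackWeakly(ws, Symbol.for("x")); // false – registered symbols can't be held weakly
trackWeakly(ws, 42);              // false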

@bakkot
Contributor

bakkot commented Apr 4, 2024

The reason they rejected it, as I understand, is because they didn't want the answer such a predicate gives, for a given value, to change over time.

Ah, sure. But now that we have symbols as WeakMap keys, it wouldn't change over time: there are no more values for which the answer to that question could plausibly go from "false" to "true", and of course you can't change the answer from "true" to "false" for anything without that actually being a breaking change (because people could have been using those values in a WeakMap). So if that was the only objection, I don't see a reason not to add the predicate now, especially if we take the "tuples are objects but can't be held weakly" route.

@ljharb
Member

ljharb commented Apr 4, 2024

Because current code will already exist that says:

if (x && typeof x === 'object') {
  weakset.add(x);
}

and rely on it not throwing for any x.

@bakkot
Contributor

bakkot commented Apr 4, 2024

That code will not break unless someone passes one of the new values in. That's not a breaking change any more than adding a new kind of primitive would be - in a very similar vein, lots of code assumes that they have an exhaustive switch over possible typeof values.

"Old code will break if you give it a new kind of value" isn't something we regard as a breaking change.

@syg

syg commented Apr 4, 2024

Right, the compat requirement is that the predicate does not change its return value for any particular value over time. Adding new kinds of values means you have to decide on what the new return value ought to be for the predicate, which isn't the same thing as changing the result for existing values.

@ljharb
Member

ljharb commented Apr 4, 2024

Fair enough. I still think that making the description of "what can be weakly held" even more complex, absent a predicate, is not a positive change in the language.

@bakkot
Contributor

bakkot commented Apr 8, 2024

WeakSets and WeakMaps do guarantee that values are held as long as the key is reachable, so no, it can't just do nothing. But throwing is fine.

@travislwatson

If we wanted to avoid impacting the performance of existing applications, couldn't the performance penalty of equality be paid when #[] and #{} are used, rather than waiting for them to be compared? The constructor could first look for an existing reference before allocating new memory. Equality could then fall back to standard reference comparison. That seems to be what the userland offerings do.

@acutmore
Collaborator

couldn't the performance penalty of equality be paid when #[] and #{} are used, rather than waiting for them to be compared

Yes, this approach is referred to as "interning" and is how the polyfill works. There are downsides of this approach. -0 can't be both stored in the structure and also equal to 0, so one of those would have to give. Also this makes the values more expensive in cases where the values are not being compared, if they are only being used for the immutability. There was also concern from implementors that this approach would be difficult for the garbage collector to handle efficiently, as it could cause frequent invalidation of "the generational hypothesis" that most objects are short-lived.
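
Spelling out the -0 tension (proposal syntax, hypothetical behaviour): with interning, === on records/tuples boils down to object identity, so only one of the following options can hold.

// Option 1: normalise -0 to 0 at construction time.
//   #[0] === #[-0]           // true  (both intern to the same object)
//   Object.is(#[-0][0], -0)  // false (the stored -0 gave way to 0)

// Option 2: keep -0 as stored.
//   Object.is(#[-0][0], -0)  // true
//   #[0] === #[-0]           // false (different interned objects), unlike 0 === -0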

@travislwatson

There are downsides of this approach. -0 can't be both stored in the structure and also equal to 0, so one of those would have to give.

Isn't that just a limitation of having to use Weak{Map,Set,Ref} for userland/polyfill?

this makes the values more expensive in cases where the values are not being compared, if they are only being used for the immutability.

Are we certain of this? If the lookup is not performed, then new objects have to be created that may not be needed, meaning both unnecessary memory allocations and extra garbage collection.

@AprilArcus
Author

AprilArcus commented May 10, 2024

-0 can't be both stored in the structure and also equal to 0, so one of those would have to give. Also this makes the values more expensive in cases where the values are not being compared, if they are only being used for the immutability.

these seem like reasonable and easily documented trade-offs to me.

@Maxdamantus

I agree that interning is generally undesirable performance-wise (otherwise engines would be interning strings and bigints, and they demonstrably don't: #292 (comment)), but it would be nice if someone could explain the actual performance reasons why another type of content-compared value can't be added.

Was this not a concern when adding bigint (another type of arbitrary-sized value that in general isn't interned)? Does this mean that there will never be another similar type (eg, decimal)?

Is the performance issue contingent on typeof #[] === "object", or does it apply even if typeof #[] === "tuple"? In the former case I imagine someone would need to look at the proportion of comparisons that are guarded by a similar typeof condition (I suspect this is very low), and in the latter case I think some explanation is needed as to why R/T values are fundamentally different from bigint or string.

@ljharb
Member

ljharb commented May 10, 2024

@Maxdamantus yes, my understanding is that the lack of adoption of bigint has made browsers feel that the effort isn't worth it, which means that (unless that changes) there will never again be another new primitive type, which precludes both R&T and Decimal as primitives.

@Maxdamantus

@ljharb That doesn't really answer the question though. Was there a significant performance impact from adding bigint to the language? All major browsers currently support it. And if there was an impact, will it be increased by R/T?

As for usefulness, I suspect the addition of R/T values is more important than bigint was, since emulation of compound comparisons is fairly common (cf. React.useMemo, or the discussions above about "keying functions" in Map).

I suspect another issue with bigint usability is that it was harder to polyfill (maybe this doesn't apply nowadays, but I feel like that probably limited adoption when people should have been excited to use it), whereas R/T can already be polyfilled (with the minor caveat of normalising -0 to 0).

@ljharb
Member

ljharb commented May 10, 2024

I'm not clear on how much of the large impact was performance and how much was implementation difficulty, but the impact was large, it would be again for R&T, and they aren't willing to repeat it.

I 100% agree that R&T (and Decimal) are each much more useful than BigInt, and would have gotten way more adoption, and that it's turned out to be a shame that BigInt went first - but this is where we are.

@errorx666

I don't really have a use-case that needs BigInts, but R&T helps with the ubiquitous problem that is state management.

I also don't feel that implementation difficulty should be a driving factor in steering the language design. My primary concern is always what's best for the language itself.
