SPICE-0010: Overhauled mapping/listing typechecks #11

bioball · 2024-08-15T20:11:25Z

This addresses the following issues:

The implementation for this SPICE can be found here: apple/pkl#628

translatenix · 2024-10-22T06:45:45Z

spices/SPICE-0010-overhauled-mapping-listing-typechecks.adoc

+Effectively, `listing` and `listingOfNumbers` are _different_ in-memory objects; one which has type assertions at the end of each member, and one which does not.
+This difference is opaque to users; these two objects have the same hash code, and same equality semantics.
+
+In the amends chain, `listingOfNumbers` has the same parent as `listing`. It delegates its member lookups to `listing`, and adds a typecheck at the end of each lookup.


@bioball Isn’t this an undesirable breaking change? If I’m not mistaken, introducing a delegation step means that overridden values are no longer reflected in dependent values, breaking late binding. Same for the as operator.

Not totally sure what you mean. Can you provide a snippet that demonstrates what you believe is breaking?

BTW: this PR is already merged into main, so feel free to play around with it and see if there is a regression here in Pkl's late-binding semantics.

I thought this would break, but fortunately it doesn't:

m1: Mapping = new { ["name"] = "Bob" ["name2"] = this["name"] } m2: Mapping<String, String> = (m1) { ["name"] = "Marley" } // if m2["name2"] simply delegated to m1["name2"], I'd expect "Bob" here name2 = m2["name2"] // "Marley"

One more question:
Will the following result in 10 objects and 9-fold delegation from m10 to m1?

m1: Mapping<String, String> = new { ["name"] = "Bob" } m2: Mapping<String, String> = m1 m3: Mapping<String, String> = m2 ... m10: Mapping<String, String> = m9

There's an optimization that we make here: if the target type results in the same typecheck, it doesn't do any delegation. So, in this case, there's still just one mapping. See https://github.com/apple/pkl/blob/a7cc09892533b12a80c434937544406b18ae24b2/pkl-core/src/main/java/org/pkl/core/ast/type/TypeNode.java#L1621-L1623 for implementation details!

This is the delegation that's happening:

m1: Mapping = new { ["name"] = "Bob" ["name2"] = this["name"] } m2: Mapping<String, String> = (m1) { ["name"] = "Marley" } ^^^^^^^^^^^^^^^^^^^^^^^^ // original VmMapping ^^^^^^^^^^^^^^^^^^^^^^^ // new VmMapping that delegates to the one above

I'm still trying to wrap my head around how the delegation works:

m1: Mapping = new { ["name"] = "Bob" ["name2"] = this["name"] } // apparently, `m2 = m1 as Mapping<String, String>` isn't equivalent // to the following. Otherwise, late binding would break (see m3). m2 = new Mapping { ["name"] = m1["name"] as String ["name2"] = m1["name2"] as String } m3 = (m2) { ["name"] = "Marley" }

Don't know if this is any more helpful, but... the lookup on name isn't m1["name"] as String; it's ((m1) { ["name"] = "Marley" })["name"] as String.

After checking the implementation, I think I understand what's going on:
The new object resulting from m1 as Mapping<String, String> doesn't really define new members that delegate to m1's members.
Instead, the new object knows its value type, and as String happens as part of member access.

Yup, that's right!

translatenix · 2024-10-22T22:38:00Z

spices/SPICE-0010-overhauled-mapping-listing-typechecks.adoc

+
+listing2 = (listing) { "hello" } // <1>
+----
+<1> Okay; result is `new Listing { 1; 2; 3; "hello" }`


Intuitively, this makes sense to me:
We append a string to a Listing<Int> and get a listing containing ints and strings.

However, the following makes less sense to me:

local listing1: Listing<Int> = new { 1 } local listing2: Listing<String> = listing1 local listing3 = (listing2) { "2" } // append a string to a Listing<String> first = listing3[0] // we got an int (and no error)

Hm, there must be a bug, then. I would expect this to error. Good catch!

Fix: apple/pkl#725

translatenix · 2024-10-23T19:06:36Z

spices/SPICE-0010-overhauled-mapping-listing-typechecks.adoc

+
+There are two underlying issues with this approach.
+
+=== Performance cost


Note that the current implementation of surrogate objects is itself quite expensive: It adds 5 fields to, and instantiates two empty collections for each instance of, class VmListing and VmMapping (via a common superclass). These classes aren't only used for surrogate objects but also for regular listings/mappings. Additionally, case distinctions for surrogate objects are added throughout the codebase.

One way to improve on the current implementation is to move towards an object model that supports having multiple internal representations for the same type of Pkl object. For example, different representations could be used for small, large, and surrogate listings/mappings. Truffle offers special support for implementing such object models.

translatenix · 2024-10-25T21:56:34Z

spices/SPICE-0010-overhauled-mapping-listing-typechecks.adoc

+
+== Alternatives considered
+
+N/A


Have you considered to preserve the element type of Listing<X> when the listing is amended? Although a breaking change, this doesn't break the common case xs: Listing<X> where xs is repeatedly amended because the property type Listing<X> rules out adding elements of other types. And it's much more space/time efficient because it doesn't require adding another surrogate after every amend.

We've thought about this in the past, but this would be quite limiting on the usage side.

For example:

dogs: Listing<Dog> = new { new { breed = "Greyhound" } } animals: Listing<Animal> = (dogs) { new Horse { ... } // fail; can only add `Dog` here }

I wonder how often this occurs in practice, and if something like the following would be acceptable in such cases:

animals: Listing<Animal> = new { ...dogs new Horse { ... } }

"obj-surrogate-obj-surrogate-..." is quite an unfortunate data structure to handle the common case.
I think the current surrogate implementation roughly quadruples the memory overhead of listings and mappings in this common case. A deep chain will probably also slow down many operations.

FYI: I'm switching terminology to "delegate" for this, because "surrogate" turned out to be kind of a confusing term when discussing this concept internally. Will update docs.

I don't think this is quite so bad for perf. You only get one delegate per typecast, and in the normal case, this happens infrequently.

Idiomatic Pkl code typically looks like this, where there is a schema, and then data that adheres to the schema:

// MyPeople people: Listing<Person>

// myData.pkl amends "MySchema.pkl" people { new { name = "Barney" } }

This results in only one delegate when myData is eval'd; when people is accessed and cast to Listing<Person> ¹. Creating new members within people here isn't any more expensive than it is today.

There are two cases that I can think of where this might end up being problematic:

There are many collections, rather than many members within a collection. E.g. data that ends up looking like [[1], [2], [3], [4], [5]].

A type-cast is executed, where the member type is different, and both the original and the casted object are forced (both original and casted receive their own cached members):
people1: Listing<People> = new { ... } people2: Listing<People(lastName == "Smith")> = people1

In any case, I think this is an optimization problem, rather than a problem with the design of the language. Changing this would be a big breaking change, and it would also differ from how this works in most languages (concat/plus usually accepts contravariant members).

Footnotes

Note: people { ... } desugars to people = (super.people) { ... }, except we do not type-check when accessing super.people. ↩

Design and implementation should inform each other.

Throwing away all reified type information on every amend is unfortunate from an implementation perspective:

It is expensive in the common case where a listing is amended and assigned to a property of the same type.
(Each assignment requires a new delegate.)

It rules out many potential optimizations, such as using a special representation for Listing<Int>.

The current design also means that bugs can go undetected in programs that aren't fully typed.

Fortunately, accepting this SPICE doesn't rule out giving alternative designs a fair shot later on (before 1.0).

It is expensive in the common case where a listing is amended and assigned to a property of the same type.
(Each assignment requires a new delegate.)

Can you elaborate here? It would be helpful to see a code snippet that demonstrates this.

It rules out many potential optimizations, such as using a special representation for Listing.

I'm not so sure; imagine we have a special representation of Listing<Int> called VmIntListing:

class ListingAmendNode { @Specialization VmIntListing eval(VmIntListing parent, long[] members) { // we can keep the specialized representation } @Fallback VmListing eval(VmIntListing parent, Object[] members) { // need to drop back to generic `VmListing` now } }

Fortunately, accepting this SPICE doesn't rule out giving alternative designs a fair shot later on (before 1.0).

Yup! And this is a orthogonal to this SPICE anyway (how typechecking works vs. how amend is typed).

Can you elaborate here? It would be helpful to see a code snippet that demonstrates this.

a: Listing<Int> = new { 0 } b: Listing<Int> = (a) { 1 } c: Listing<Int> = (b) { 2 }

In the common case that an amended listing is assigned back to a property of the same type (or even the same property), one delegate is created for every amend, doubling the number of VmListing objects.

It gets worse for Listing<Listing<Int>>.

I'm not so sure; imagine we have a special representation of Listing called VmIntListing:

Right, that's exactly what I imagined. My point is that if we have an optimized VmIntListing representation, we likely don't want to have both VmListing and VmIntListing in a parent/delegate chain.

Add SPICE for overhauled mapping/listing typechecks

8c0032a

bioball changed the title ~~Add SPICE for overhauled mapping/listing typechecks~~ SPICE-0010: Overhauled mapping/listing typechecks Aug 15, 2024

bioball mentioned this pull request Aug 15, 2024

Overhaul typechecks for Mapping/Listing apple/pkl#628

Merged

bioball added 2 commits August 15, 2024 19:10

Update SPICE-0010-overhauled-mapping-listing-typechecks.adoc

eac0c1e

Update SPICE-0010-overhauled-mapping-listing-typechecks.adoc

6672131

translatenix reviewed Oct 22, 2024

View reviewed changes

translatenix reviewed Oct 23, 2024

View reviewed changes

translatenix reviewed Oct 25, 2024

View reviewed changes

Update terminology: surrogate -> delegate

d342d2f

bioball merged commit c4c48d7 into apple:main Nov 5, 2024

bioball deleted the overhauled-mapping-listing-typechecks branch November 5, 2024 19:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPICE-0010: Overhauled mapping/listing typechecks #11

SPICE-0010: Overhauled mapping/listing typechecks #11

bioball commented Aug 15, 2024 •

edited

Loading

translatenix Oct 22, 2024 •

edited

Loading

bioball Oct 22, 2024 •

edited

Loading

translatenix Oct 22, 2024 •

edited

Loading

bioball Oct 22, 2024

bioball Oct 22, 2024 •

edited

Loading

translatenix Oct 22, 2024

bioball Oct 22, 2024

translatenix Oct 22, 2024

bioball Oct 23, 2024

translatenix Oct 22, 2024 •

edited

Loading

bioball Oct 22, 2024

bioball Oct 23, 2024

translatenix Oct 23, 2024 •

edited

Loading

translatenix Oct 25, 2024 •

edited

Loading

bioball Oct 25, 2024 •

edited

Loading

translatenix Oct 25, 2024 •

edited

Loading

bioball Oct 27, 2024 •

edited

Loading

translatenix Nov 1, 2024

bioball Nov 4, 2024 •

edited

Loading

translatenix Nov 5, 2024 •

edited

Loading


		There are two underlying issues with this approach.

		=== Performance cost

SPICE-0010: Overhauled mapping/listing typechecks #11

SPICE-0010: Overhauled mapping/listing typechecks #11

Conversation

bioball commented Aug 15, 2024 • edited Loading

translatenix Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

bioball Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

translatenix Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bioball Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

translatenix Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

translatenix Oct 23, 2024 • edited Loading

Choose a reason for hiding this comment

translatenix Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

bioball Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

translatenix Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

bioball Oct 27, 2024 • edited Loading

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

bioball Nov 4, 2024 • edited Loading

Choose a reason for hiding this comment

translatenix Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

bioball commented Aug 15, 2024 •

edited

Loading

translatenix Oct 22, 2024 •

edited

Loading

bioball Oct 22, 2024 •

edited

Loading

translatenix Oct 22, 2024 •

edited

Loading

bioball Oct 22, 2024 •

edited

Loading

translatenix Oct 22, 2024 •

edited

Loading

translatenix Oct 23, 2024 •

edited

Loading

translatenix Oct 25, 2024 •

edited

Loading

bioball Oct 25, 2024 •

edited

Loading

translatenix Oct 25, 2024 •

edited

Loading

bioball Oct 27, 2024 •

edited

Loading

bioball Nov 4, 2024 •

edited

Loading

translatenix Nov 5, 2024 •

edited

Loading