(Translated by https://www.hiragana.jp/)
SPICE-0010: Overhauled mapping/listing typechecks by bioball · Pull Request #11 · apple/pkl-evolution · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPICE-0010: Overhauled mapping/listing typechecks #11

Merged
merged 4 commits into from
Nov 5, 2024

Conversation

bioball
Copy link
Contributor

@bioball bioball commented Aug 15, 2024

This addresses the following issues:

apple/pkl#405
apple/pkl#406

The implementation for this SPICE can be found here: apple/pkl#628

@bioball bioball changed the title Add SPICE for overhauled mapping/listing typechecks SPICE-0010: Overhauled mapping/listing typechecks Aug 15, 2024
Effectively, `listing` and `listingOfNumbers` are _different_ in-memory objects; one which has type assertions at the end of each member, and one which does not.
This difference is opaque to users; these two objects have the same hash code, and same equality semantics.

In the amends chain, `listingOfNumbers` has the same parent as `listing`. It delegates its member lookups to `listing`, and adds a typecheck at the end of each lookup.
Copy link
Contributor

@translatenix translatenix Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bioball Isn’t this an undesirable breaking change? If I’m not mistaken, introducing a delegation step means that overridden values are no longer reflected in dependent values, breaking late binding. Same for the as operator.

Copy link
Contributor Author

@bioball bioball Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally sure what you mean. Can you provide a snippet that demonstrates what you believe is breaking?

BTW: this PR is already merged into main, so feel free to play around with it and see if there is a regression here in Pkl's late-binding semantics.

Copy link
Contributor

@translatenix translatenix Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this would break, but fortunately it doesn't:

m1: Mapping = new {
  ["name"] = "Bob"
  ["name2"] = this["name"]
}
m2: Mapping<String, String> = (m1) {
  ["name"] = "Marley"  
}
// if m2["name2"] simply delegated to m1["name2"], I'd expect "Bob" here
name2 = m2["name2"] // "Marley"

One more question:
Will the following result in 10 objects and 9-fold delegation from m10 to m1?

m1: Mapping<String, String> = new { ["name"] = "Bob" }
m2: Mapping<String, String> = m1
m3: Mapping<String, String> = m2
...
m10: Mapping<String, String> = m9

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an optimization that we make here: if the target type results in the same typecheck, it doesn't do any delegation. So, in this case, there's still just one mapping. See https://github.com/apple/pkl/blob/a7cc09892533b12a80c434937544406b18ae24b2/pkl-core/src/main/java/org/pkl/core/ast/type/TypeNode.java#L1621-L1623 for implementation details!

Copy link
Contributor Author

@bioball bioball Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the delegation that's happening:

m1: Mapping = new {
  ["name"] = "Bob"
  ["name2"] = this["name"]
}

m2: Mapping<String, String> = (m1) { ["name"] = "Marley"  }
                                   ^^^^^^^^^^^^^^^^^^^^^^^^ // original VmMapping

    ^^^^^^^^^^^^^^^^^^^^^^^                                 // new VmMapping that delegates to the one above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still trying to wrap my head around how the delegation works:

m1: Mapping = new {
  ["name"] = "Bob"
  ["name2"] = this["name"]
}

// apparently, `m2 = m1 as Mapping<String, String>` isn't equivalent
// to the following. Otherwise, late binding would break (see m3).
m2 = new Mapping {
  ["name"] = m1["name"] as String 
  ["name2"] = m1["name2"] as String
}

m3 = (m2) {
  ["name"] = "Marley"  
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if this is any more helpful, but... the lookup on name isn't m1["name"] as String; it's ((m1) { ["name"] = "Marley" })["name"] as String.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking the implementation, I think I understand what's going on:
The new object resulting from m1 as Mapping<String, String> doesn't really define new members that delegate to m1's members.
Instead, the new object knows its value type, and as String happens as part of member access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's right!


listing2 = (listing) { "hello" } // <1>
----
<1> Okay; result is `new Listing { 1; 2; 3; "hello" }`
Copy link
Contributor

@translatenix translatenix Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intuitively, this makes sense to me:
We append a string to a Listing<Int> and get a listing containing ints and strings.

However, the following makes less sense to me:

local listing1: Listing<Int> = new { 1 }
local listing2: Listing<String> = listing1
local listing3 = (listing2) { "2" } // append a string to a Listing<String>
first = listing3[0] // we got an int (and no error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, there must be a bug, then. I would expect this to error. Good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


There are two underlying issues with this approach.

=== Performance cost
Copy link
Contributor

@translatenix translatenix Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the current implementation of surrogate objects is itself quite expensive: It adds 5 fields to, and instantiates two empty collections for each instance of, class VmListing and VmMapping (via a common superclass). These classes aren't only used for surrogate objects but also for regular listings/mappings. Additionally, case distinctions for surrogate objects are added throughout the codebase.

One way to improve on the current implementation is to move towards an object model that supports having multiple internal representations for the same type of Pkl object. For example, different representations could be used for small, large, and surrogate listings/mappings. Truffle offers special support for implementing such object models.


== Alternatives considered

N/A
Copy link
Contributor

@translatenix translatenix Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered to preserve the element type of Listing<X> when the listing is amended? Although a breaking change, this doesn't break the common case xs: Listing<X> where xs is repeatedly amended because the property type Listing<X> rules out adding elements of other types. And it's much more space/time efficient because it doesn't require adding another surrogate after every amend.

Copy link
Contributor Author

@bioball bioball Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've thought about this in the past, but this would be quite limiting on the usage side.

For example:

dogs: Listing<Dog> = new {
  new { breed = "Greyhound" }
}

animals: Listing<Animal> = (dogs) {
  new Horse { ... } // fail; can only add `Dog` here
}

Copy link
Contributor

@translatenix translatenix Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how often this occurs in practice, and if something like the following would be acceptable in such cases:

animals: Listing<Animal> = new {
  ...dogs
  new Horse { ... }
}

"obj-surrogate-obj-surrogate-..." is quite an unfortunate data structure to handle the common case.
I think the current surrogate implementation roughly quadruples the memory overhead of listings and mappings in this common case. A deep chain will probably also slow down many operations.

Copy link
Contributor Author

@bioball bioball Oct 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I'm switching terminology to "delegate" for this, because "surrogate" turned out to be kind of a confusing term when discussing this concept internally. Will update docs.

I don't think this is quite so bad for perf. You only get one delegate per typecast, and in the normal case, this happens infrequently.

Idiomatic Pkl code typically looks like this, where there is a schema, and then data that adheres to the schema:

// MyPeople
people: Listing<Person>
// myData.pkl
amends "MySchema.pkl"

people {
  new { name = "Barney" }
}

This results in only one delegate when myData is eval'd; when people is accessed and cast to Listing<Person> 1. Creating new members within people here isn't any more expensive than it is today.

There are two cases that I can think of where this might end up being problematic:

  1. There are many collections, rather than many members within a collection. E.g. data that ends up looking like [[1], [2], [3], [4], [5]].
  2. A type-cast is executed, where the member type is different, and both the original and the casted object are forced (both original and casted receive their own cached members):
    people1: Listing<People> = new { ... }
    people2: Listing<People(lastName == "Smith")> = people1

In any case, I think this is an optimization problem, rather than a problem with the design of the language. Changing this would be a big breaking change, and it would also differ from how this works in most languages (concat/plus usually accepts contravariant members).

Footnotes

  1. Note: people { ... } desugars to people = (super.people) { ... }, except we do not type-check when accessing super.people.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design and implementation should inform each other.

Throwing away all reified type information on every amend is unfortunate from an implementation perspective:

  • It is expensive in the common case where a listing is amended and assigned to a property of the same type.
    (Each assignment requires a new delegate.)
  • It rules out many potential optimizations, such as using a special representation for Listing<Int>.

The current design also means that bugs can go undetected in programs that aren't fully typed.

Fortunately, accepting this SPICE doesn't rule out giving alternative designs a fair shot later on (before 1.0).

Copy link
Contributor Author

@bioball bioball Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is expensive in the common case where a listing is amended and assigned to a property of the same type.
(Each assignment requires a new delegate.)

Can you elaborate here? It would be helpful to see a code snippet that demonstrates this.

It rules out many potential optimizations, such as using a special representation for Listing.

I'm not so sure; imagine we have a special representation of Listing<Int> called VmIntListing:

class ListingAmendNode {
  @Specialization
  VmIntListing eval(VmIntListing parent, long[] members) {
    // we can keep the specialized representation 
  }

  @Fallback
  VmListing eval(VmIntListing parent, Object[] members) {
    // need to drop back to generic `VmListing` now
  }
}

Fortunately, accepting this SPICE doesn't rule out giving alternative designs a fair shot later on (before 1.0).

Yup! And this is a orthogonal to this SPICE anyway (how typechecking works vs. how amend is typed).

Copy link
Contributor

@translatenix translatenix Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate here? It would be helpful to see a code snippet that demonstrates this.

a: Listing<Int> = new { 0 }
b: Listing<Int> = (a) { 1 }
c: Listing<Int> = (b) { 2 }

In the common case that an amended listing is assigned back to a property of the same type (or even the same property), one delegate is created for every amend, doubling the number of VmListing objects.

It gets worse for Listing<Listing<Int>>.

I'm not so sure; imagine we have a special representation of Listing called VmIntListing:

Right, that's exactly what I imagined. My point is that if we have an optimized VmIntListing representation, we likely don't want to have both VmListing and VmIntListing in a parent/delegate chain.

@bioball bioball merged commit c4c48d7 into apple:main Nov 5, 2024
@bioball bioball deleted the overhauled-mapping-listing-typechecks branch November 5, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants