-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPICE-0010: Overhauled mapping/listing typechecks #11
SPICE-0010: Overhauled mapping/listing typechecks #11
Conversation
Effectively, `listing` and `listingOfNumbers` are _different_ in-memory objects; one which has type assertions at the end of each member, and one which does not. | ||
This difference is opaque to users; these two objects have the same hash code, and same equality semantics. | ||
|
||
In the amends chain, `listingOfNumbers` has the same parent as `listing`. It delegates its member lookups to `listing`, and adds a typecheck at the end of each lookup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bioball Isn’t this an undesirable breaking change? If I’m not mistaken, introducing a delegation step means that overridden values are no longer reflected in dependent values, breaking late binding. Same for the as
operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not totally sure what you mean. Can you provide a snippet that demonstrates what you believe is breaking?
BTW: this PR is already merged into main, so feel free to play around with it and see if there is a regression here in Pkl's late-binding semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this would break, but fortunately it doesn't:
m1: Mapping = new {
["name"] = "Bob"
["name2"] = this["name"]
}
m2: Mapping<String, String> = (m1) {
["name"] = "Marley"
}
// if m2["name2"] simply delegated to m1["name2"], I'd expect "Bob" here
name2 = m2["name2"] // "Marley"
One more question:
Will the following result in 10 objects and 9-fold delegation from m10 to m1?
m1: Mapping<String, String> = new { ["name"] = "Bob" }
m2: Mapping<String, String> = m1
m3: Mapping<String, String> = m2
...
m10: Mapping<String, String> = m9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an optimization that we make here: if the target type results in the same typecheck, it doesn't do any delegation. So, in this case, there's still just one mapping. See https://github.com/apple/pkl/blob/a7cc09892533b12a80c434937544406b18ae24b2/pkl-core/src/main/java/org/pkl/core/ast/type/TypeNode.java#L1621-L1623 for implementation details!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the delegation that's happening:
m1: Mapping = new {
["name"] = "Bob"
["name2"] = this["name"]
}
m2: Mapping<String, String> = (m1) { ["name"] = "Marley" }
^^^^^^^^^^^^^^^^^^^^^^^^ // original VmMapping
^^^^^^^^^^^^^^^^^^^^^^^ // new VmMapping that delegates to the one above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still trying to wrap my head around how the delegation works:
m1: Mapping = new {
["name"] = "Bob"
["name2"] = this["name"]
}
// apparently, `m2 = m1 as Mapping<String, String>` isn't equivalent
// to the following. Otherwise, late binding would break (see m3).
m2 = new Mapping {
["name"] = m1["name"] as String
["name2"] = m1["name2"] as String
}
m3 = (m2) {
["name"] = "Marley"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know if this is any more helpful, but... the lookup on name
isn't m1["name"] as String
; it's ((m1) { ["name"] = "Marley" })["name"] as String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking the implementation, I think I understand what's going on:
The new object resulting from m1 as Mapping<String, String>
doesn't really define new members that delegate to m1
's members.
Instead, the new object knows its value type, and as String
happens as part of member access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's right!
|
||
listing2 = (listing) { "hello" } // <1> | ||
---- | ||
<1> Okay; result is `new Listing { 1; 2; 3; "hello" }` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intuitively, this makes sense to me:
We append a string to a Listing<Int>
and get a listing containing ints and strings.
However, the following makes less sense to me:
local listing1: Listing<Int> = new { 1 }
local listing2: Listing<String> = listing1
local listing3 = (listing2) { "2" } // append a string to a Listing<String>
first = listing3[0] // we got an int (and no error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, there must be a bug, then. I would expect this to error. Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix: apple/pkl#725
|
||
There are two underlying issues with this approach. | ||
|
||
=== Performance cost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the current implementation of surrogate objects is itself quite expensive: It adds 5 fields to, and instantiates two empty collections for each instance of, class VmListing
and VmMapping
(via a common superclass). These classes aren't only used for surrogate objects but also for regular listings/mappings. Additionally, case distinctions for surrogate objects are added throughout the codebase.
One way to improve on the current implementation is to move towards an object model that supports having multiple internal representations for the same type of Pkl object. For example, different representations could be used for small, large, and surrogate listings/mappings. Truffle offers special support for implementing such object models.
|
||
== Alternatives considered | ||
|
||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered to preserve the element type of Listing<X>
when the listing is amended? Although a breaking change, this doesn't break the common case xs: Listing<X>
where xs
is repeatedly amended because the property type Listing<X>
rules out adding elements of other types. And it's much more space/time efficient because it doesn't require adding another surrogate after every amend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've thought about this in the past, but this would be quite limiting on the usage side.
For example:
dogs: Listing<Dog> = new {
new { breed = "Greyhound" }
}
animals: Listing<Animal> = (dogs) {
new Horse { ... } // fail; can only add `Dog` here
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how often this occurs in practice, and if something like the following would be acceptable in such cases:
animals: Listing<Animal> = new {
...dogs
new Horse { ... }
}
"obj-surrogate-obj-surrogate-..." is quite an unfortunate data structure to handle the common case.
I think the current surrogate implementation roughly quadruples the memory overhead of listings and mappings in this common case. A deep chain will probably also slow down many operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I'm switching terminology to "delegate" for this, because "surrogate" turned out to be kind of a confusing term when discussing this concept internally. Will update docs.
I don't think this is quite so bad for perf. You only get one delegate per typecast, and in the normal case, this happens infrequently.
Idiomatic Pkl code typically looks like this, where there is a schema, and then data that adheres to the schema:
// MyPeople
people: Listing<Person>
// myData.pkl
amends "MySchema.pkl"
people {
new { name = "Barney" }
}
This results in only one delegate when myData
is eval'd; when people
is accessed and cast to Listing<Person>
1. Creating new members within people
here isn't any more expensive than it is today.
There are two cases that I can think of where this might end up being problematic:
- There are many collections, rather than many members within a collection. E.g. data that ends up looking like
[[1], [2], [3], [4], [5]]
. - A type-cast is executed, where the member type is different, and both the original and the casted object are forced (both original and casted receive their own cached members):
people1: Listing<People> = new { ... } people2: Listing<People(lastName == "Smith")> = people1
In any case, I think this is an optimization problem, rather than a problem with the design of the language. Changing this would be a big breaking change, and it would also differ from how this works in most languages (concat
/plus
usually accepts contravariant members).
Footnotes
-
Note:
people { ... }
desugars topeople = (super.people) { ... }
, except we do not type-check when accessingsuper.people
. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design and implementation should inform each other.
Throwing away all reified type information on every amend is unfortunate from an implementation perspective:
- It is expensive in the common case where a listing is amended and assigned to a property of the same type.
(Each assignment requires a new delegate.) - It rules out many potential optimizations, such as using a special representation for
Listing<Int>
.
The current design also means that bugs can go undetected in programs that aren't fully typed.
Fortunately, accepting this SPICE doesn't rule out giving alternative designs a fair shot later on (before 1.0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is expensive in the common case where a listing is amended and assigned to a property of the same type.
(Each assignment requires a new delegate.)
Can you elaborate here? It would be helpful to see a code snippet that demonstrates this.
It rules out many potential optimizations, such as using a special representation for Listing.
I'm not so sure; imagine we have a special representation of Listing<Int>
called VmIntListing
:
class ListingAmendNode {
@Specialization
VmIntListing eval(VmIntListing parent, long[] members) {
// we can keep the specialized representation
}
@Fallback
VmListing eval(VmIntListing parent, Object[] members) {
// need to drop back to generic `VmListing` now
}
}
Fortunately, accepting this SPICE doesn't rule out giving alternative designs a fair shot later on (before 1.0).
Yup! And this is a orthogonal to this SPICE anyway (how typechecking works vs. how amend is typed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate here? It would be helpful to see a code snippet that demonstrates this.
a: Listing<Int> = new { 0 }
b: Listing<Int> = (a) { 1 }
c: Listing<Int> = (b) { 2 }
In the common case that an amended listing is assigned back to a property of the same type (or even the same property), one delegate is created for every amend, doubling the number of VmListing
objects.
It gets worse for Listing<Listing<Int>>
.
I'm not so sure; imagine we have a special representation of Listing called VmIntListing:
Right, that's exactly what I imagined. My point is that if we have an optimized VmIntListing
representation, we likely don't want to have both VmListing
and VmIntListing
in a parent/delegate chain.
This addresses the following issues:
apple/pkl#405
apple/pkl#406
The implementation for this SPICE can be found here: apple/pkl#628