Alexander Kuklev (akuklev) wrote,
Alexander Kuklev

Vision of Scala 3

This is an essey about a programing language that does not yet exist.
I'm going to show you what you can do with an extension of Scala 2.10 presented here and why this extension has certain universality. Please read the linked post, I'm going to use new syntax proposed there throughout this post.

Type system should Scala 3 be based on

Scala 3 certainly has to be based upon a type system called “Dependent Object Types”. It's rather complicated type system with both nominal and structural types and subtyping. Its subjects are essentially dependent tuples which can contain abstract type members besides normal members. There are also product types (functions), but their construction and elimination rules are not yet settled. Together, sum and product types provide a complete propositions-as-types encoding for first-order logic, which relates it to constructive type theories.

Due to presence of nominal types DOT can be an open-world theory, and it really is: it does not assume that all types it knows (by name) were necessarily defined within dot. So there might be foreign types that are not defined (or definable) within the system, but one can still work with objects of these types when the come with modules of operations working on them. Even though these operations are not and cannot be defined within DOT, they still can be used in it: one can compose them with other operations of compatible signatures. A concrete example would be the type of Reals, which are in general not representable in finite amount of computer memory and thus not definable within any constructive type theory, but assuming such type "exists in the outer world" we can write down programs working with them and reason about this programs even though we're not able to execute them. What's the use of that? — Well, after having defined a program for Reals we can translate it into a program manipulating Doubles (floating point numbers) and analyse if the latter is a correct approximation of the former. The nominality and open-world setting is a big advantage over Per Martin-Löf style type systems.

As it was already mentioned, DOT does not yet specify precise construction and elimination rules for product types. There were efforts to define the rules in such a way that DOT gains the strength of Fω with subtyping, but they led to contradictions. It's quite likely that this is possible and there is an ongoing work on this issue, but in this essay I will argue that we should stick to the small DOT core without extending it even to the full strength of System F with unrestricted recursion!
My vision is a weak core and pluggable completions producing dialects for reflecting different topoi (mathematical setups) to work in. There could be a strong Fω-like Scala variety and a weak variety with totality checking. The core is flexible enough to be adapted for arbitrary topoi to work in, from the classical ZFC set theory to Calculus of Constructions to Homotopic Type Theory. It's of course also fit for the legacy Scala topos, constituted by computations which can be carried out on an idealized JVM.

What are theories and models?
(Exposition of the syntax we need to describe theories comfortably)

Classical quote of Nicolas Bourbaki reads “Mathematics is a science that studies abstract structures”. In modern language one could rather say, a mathematics studies theories and their models.

A theory consists of a signature and a couple of properties it must conform to. Signature consists of a couple of abstract type members and members of specified types. Here's a signature of a semigroup (not quite complete):
trait Semigroup {
  type G
  def compose(a: G, b: G): G

If we consider a theory to be a structure endowing a type, it's better to use a typeclass notation:
trait Semigroup[G] {
  def compose(a: G, b: G): G
А model for a given signature is a named tuple assigning concrete types, concrete implementations of operations and concrete proofs demanded by signature. Concrete means, they are explicitly constructed in the topos (or a weaker realm) inside of which we model our structure. Trait (signature) is a type for corresponding models.

Whenever we want to have a type endowed with certain structure (say, Semigroup), we write type G: Semigroup. This notation is suggestive that typeclasses act like types of types, but actually it's just a shorthand for {type G; def model: Semigroup[G]}, a pair of a type and a model of the theory for this particular type.

Predicates can be modeled either as types or like operations returning Boolean, the latter is easier to understand for programmers and is also the topos-theoretic way. Here's a signature for equality theory:
trait Eq[T] {
  def _==(a: T, b: T): Boolean
We can also incorporate properties. Say, == must be built so, that (x == x) is true for any x. So, we might demand the proof of that fact in the signature. For boolean expressions, there's a macro type Evidence(expr), which is the type of proofs that the expression is true. (I'll use the unicode sortcut ✓ for Evidence) So, that's how we encode it:
trait Eq[T] {
  def _==(a: T, b: T): Boolean
  def reflexivity(x: T): ✓(x == x)
  def symmetry(x: T, y: T)(a: ✓(x == y): ✓(y == x)
  def transitivity(x: T, y: T, z:T)(a: ✓(x == y), b: ✓(y == z)): ✓(x == z)
We want to incorporate the property of associativity into the signature of the semigroup, so the elements of the group must have equality:
trait SemiGroup[G] extends Eq[G]{
  def compose(a: G, b: G): G
  // Alias for compactness:
  def _∘(a: G, b: G) = compose(a, b)
  def associativity(a: G, b: G, c: G):
    ✓((a ∘ (b ∘ c)) == ((a ∘ b) ∘ c))
Few more examples for you to get a feeling:
trait Monoid[M] extends SemiGroup[M] {
  def unit: M
  def leftCancelativeness(x: M): ✓(unit == unit ∘ x)
  def rightCancelativeness(x: M): ✓(unit == x ∘ unit)
type VectorSpace[V] extends AbelianGroup[V] = {
  type F: Field
  def action(scalar: F, vector: V): V

How to write proofs
Say you want to prove that a monoid has only one unit, i.e. every element of monoid havind the left or right cancellation property is equal to the unit. In other words, we want to define a procedure which being “given an element unit2 with left cancellation property” produces an “evidence that unit2 is the same as unit”:
def lemma1[M: Monoid](unit2: M)(leftCanc2: (x: M) => ✓(unit2 == unit2 ∘ x)) = {
  // apply left cancellation property of unit2 to unit and vice verse
  val p1: ✓(unit2 == unit2 ∘ M.unit) = leftCanc2(M.unit)
  val p2: ✓(M.unit == unit2 ∘ M.unit) = M.rightCancellation(unit2)
  // now we just have to simplify “unit2 == something == unit” to “unit2 == unit”
  val p3: ✓(unit2 ∘ M.unit == M.unit) =
      M.symmetry(M.unit, unit2 ∘ M.unit)(p2)
  val p4: ✓(unit2 == M.unit) =
      M.transitivity(unit2, unit2 ∘ M.unit, M.unit)(p2, p3)
The used dependent function type is a shortcut:
(x: M) => ✓(unit = unit ∘ x) ≡ {
  def apply(x: M): ✓(unit == unit ∘ x)

The notation Type[_] is used not only for typeclasses, these are generally ‘types parametrized by other types’. Due to availability of subtyping constraints on abstract type members, in DOT we can demand abstract type members (or type parameters) of traits to be not just types, but types parametrized by other types, so called higher-kinded types.

Beautiful example is provided by semicategories and categories which are "typed" versions of semigroups and monoids. You can always think of a semigroup as consisting of functions with their usual composition which is naturally associative. These must be functions type T => T for some fixed type T (say, real numbers) so that their composition is always defined. Semicategory is the generalization for the case we have many types and only functions with matching source and target types can be composed. Monoid is semigroup with unit, category is a semicategory with units. The signatures of semicategory and category repeat the ones of semigroup and monoid in all respects except that in first ones we had one type T of the semigroup/monoid elements, whereas in second ones we'll have parametrized (by source and target) families Arr[_,_] of element types:
trait SemiCategory[Arr[_,_] : Eq] {
  def compose[A, B, C](a: Arr[A, B], b: Arr[B, C]): Arr[A, C]
  def _∘[A, B, C](a: Arr[A, B], b: Arr[B, C]) = compose(a, b)
  def associativity[A, B, C, D](a: Arr[A, B], b: Arr[B, C], c: Arr[C, D]):
    ✓((a ∘ (b ∘ c)) == ((a ∘ b) ∘ c))
trait Category[Arr[_,_] : Eq] extends SemiCategory[Arr[_,_]] {
  def id[O]: Arr[O, O]
  def leftCancelativeness[O](x: M): ✓(id[O] == id[O] ∘ x)
  def rightCancelativeness[O](x: M): ✓(id[O] == x ∘ id[O])
(In Scala ≤2.10, the notation T[_] presupposes that the polymorphic type T should accept any types as arguments and in general, when bounds of type-parameter in a polymorphic definition are not specified, absence of bounds is presupposed. We suggest that in Scala 3 when no bounds for type parameters are specified, the most broad bounds allowed by the rest of the signature should be taken. It spares vast amounts of unnecessary writing. In particular, we want the definitions above to be applicable to types of functions on restricted or extended type universes, e.g. Function[A <: Collection, B <: Collection] (function between collection types) or Function[A: Ord, B: Ord] (functions between types endowed with order).)

The last thing we want to explore, is how to write signatures of theories which make use of ‘powerset’ and ‘subsets’ (powerobjects and subobjects in general setting, where we don't restrict us to set-theoretic topoi). Well, in general topoi, these are constructed using the _ => _ type (type of maps from one type to the other) and Boolean type (which is called subobject classifier in general setting). First note, that ‘subsets’ S of A are classified by maps isMember: A => Boolean.
So, the type of subobjects of T can be defined as
case class Restriction[T](isMember: T => Boolean) {
  type Element = T with {self =>
    membership: ✓(isMember(self))
(We can explicitly define the type for elements of the restriction as a dependent pair)

Once you have a restriction r: Restriction[S], you can define functions from elements of this restriction:
val even = Restriction[Nat](isMember = {_ % 2 == 0}) 
def bisect(x: even.Element): Nat = ...

We can also easiliy define structures like topology:
trait TopSpace[S]  {
  def topology: Restriction[Restriction[S]]

In DOT there is a subtyping relation which allows to use expressions of one type in the context where another type is expected in cases this is entirely safe and transperent. It solves the following two issues:
– Functions A => B to be applicable to terms of types which are restrictions of A;
– Stronger structures such as monoids to be accepted in context where a weaker structure (such as a semigroup) is required.

DOT uses a cumbersome mixture of nominal and structural subtyping which we I don't want to describe here completely, but I just want to underline that the mentioned features are solved by DOT perfectly: restriction types (as defined above) are subtypes of the types they restrict, stronger signatures are subtypes of weaker ones. Subtyping relation enables specifying lower and upper bounds for abstract type members of traits, which is precisely the mechanism that makes nominal induction, higher kinded types and lots of other valuable things possible.

Subtyping relation for types internal to DOT (i.e. traits) is even a complete one, traits build a lattice, their least upper bounds and greatest lower bounds are always well defined, which can be used to demand a type to conform to two signatures in a coherent way:
radixSort[T: Ord ∨ Weighted](xs: Collection[T])

The complicated issue is the subtyping for abstracted value types. We want all containers (List, Tree, etc) to be subtypes of Collection and Nat (natural numbers) to be subtype of Int. Iff abstracted type A (say, Nat or List) has a unique natural embedding into B (say, Int or Collection), we want that A <: B. So far I have no concrete idea how to implement it on the type system level. It's a rather deep issue: all minimal models of Peano Arithmetic have unique natural embeddings into each other, which means that while there is a multitude of models, there is exactly one abstracted type of natural numbers Nat which is also isomorphic to the universal Semiring. A type system that could take account of such subtleties must be at least as strong as HoTT.

* * *

This was an exposition of the trait definition sublanguage of Scala. When you're not using type providers and path-dependent types, like Evidence or .Element, the language has no term-level constructions, we only define member names and their types in a straightforward ways, interpretable in any category with finite products (or more generally, any circuitry) with all external types we use, like, say, Boolean, provided. The trait itself (as a type) will lay inside the category/circuitry iff the trait contains only types and constants or the category/circuitry is a (cartesian) closed one. Otherwise it lays in the enclosing topos.

When dependent types are used, suddenly the term-level language arises (see the expressions like (x ∘ unit) above). They can contain creation of modules and member extraction (harmless things interpretable in all circuitries), but also application and (dependent) abstraction, which are interpretable only in (locally) closed categories/circuitries.

By enhancing or restricting the richness of terms allowed in trait signatures (i.e. by working in more or less powerful circuitries), one can precisely control the tradeoff between expressiveness and manageability.

Typeclass-based polymorphism and open world setting

As we already mentioned, DOT doesn't assume, all types are defined inside of DOT, some might come externally (open world). Now consider a polymorphic function
def foo[T](x: T, y: T)
What can be done to the arguments in the body of the function? Absolutely nothing! Because, there is nothing you can do to values of unknown type, not even check their equality. Even though all types that can be defined within Scala automatically come with equality, we do not assume that all types have equality. The open world setting for typed systems reads “Inconstructibility of type does not imply its nonexistance”, which is closely related to open world assumption of ontology systems “Inability to derive a fact does not imply the opposite”. In practice, the types that are nonconstructible represent codata, such as possibly infinite sequences or (true) real numbers. (Of course, you can generate some infinite sequences deterministically and construct lots of real numbers, but constructible sequences/reals form an infinitely small fraction of all possible ones.) Constructible are only the interfaces (typeclasses) for codata types, Seq[Int] being an example.

To be able to do something, you must restrict, you must assume that T is endowed with some structure. For instance, like this:
def foo[T: Eq](x: T, y: T)
This is a shorthand for
def foo[T](x: T, y: T, T: Eq[T] = implicit)
So, the function takes an additional hidden argument: the model for the theory Eq[T], i.e. a tuple, where the operation eq(x: T, y: T) that checkes the equality, is defined. (It's no problem that I have used the same name for the type and for the associated model, they can always be differentiated from the context.) Now we can write something into the body of foo, for instance
def foo[T: Eq](x: T, y: T) = (x == y)
One of the main strengths of Scala is the presence of encapsulation, i.e. we can abstract out the implementation details of a certain type and forbid any functions to interact with instances of the type in any other way through the defined interface (signature).
implicit val m: Monoid = new Monoid {
  type M = (Nat, Int)
  def compose(a: M, b: M): M = ???
  // Implementation details, which are aware that M = (Nat, Int)
type M = m.M
// Implementation details of M are now hidden, you
// cannot find out that M = (Nat, Int) in the outer scope.*
// The only thing you know about it is that M : Monoid,
// since there is a defined implicit model Monoid[M].

There is a plenty of real-life examples for type-class bound polymorphism
def sort[T: Ord](xs: Collection[T]): List[T]

... and very few with unbound polymorphism:
def first[T](xs: List[T]): T
def the[T](x: Collection[T], uniqueness: ✓(size(x) == 1) = implicit): T
In order to ensure you're working with a type which is defined within Scala, use the typeclass TypeTag:
def foo[T: TypeTag](x: T)
Within body of this function you can be sure that T is not some foreign stuff like type of Real numbers, but a type defined by hand in some Scala source file, you can perform reflection, i.e. analyze the type signature and so on. Universe of TypeTag'ged type is the native topos of Scala.

You can work in other topoi as well. For example we can create a model of Agda II language type system within Scala.
Here's the code in Agda
data Nat: Set where
  zero: Nat
  succ: Nat → Nat
data List (T: Set): Set where
  nil: T → List
  cons: T → List T → List T

Here's the Scala code (Agda'esque DSL):
val nat = agda.datatype(constructors: Map(
  zero -> nat,
  succ -> (nat → nat)
def list(t: agda.Set) = agda.datatype(constructors: Map(
  nil -> list(t),
  cons -> (t → list(t) → list(t))

Now we can supply this model with a Scala type provider, i.e. a macro which converts t: agda.Set into native Scala types:
type Nat = agda.Type(nat)
type List[T: agda.Set] = agda.Type(list(T))

When creating the native type T type provider also puts the implicit value agda.Set[T] into the scope. It is the Agda'esque reification of this type much like TypeTag[T] is the native reification of T for native Scala types. Consider the following example where Scala signature transforms to an Agda'esque signature upon erasure:
last[T: agda.Set](xs: agda.List[T]): T
// Desugar context bounds:
last[T](t: agda.Set, xs: agda.List[T]): T
// Desugar type aliases
last[T](t: agda.Set, xs: agda.Type(agda.list(t))): T
// Pre-erasure type normalization
// (Special macro is being called before erasure)
last[T](t: agda.Set)(xs: agda.Type(agda.list(t))): agda.Type(t)
// Perform erasure
{val t: Set} => Type(list(t)) => Type(t)
// Compare with Agda:
{T: Set} -> List T -> T
(The only difference is that in Agda there is no difference between term-level and type-level, whereas in Scala you have to lift between levels; the lift from term-level type representation into the type-level is agda.Type(type).)

We can also define the model for ZFC set theory, but as opposed to Agda topos which can be equipped with operational semantics and used for computations, set theory can be essentially used only for reasoning and making proofs, because most of its operations operations are not constructive (either non-computable, or non-deterministic such as taking an arbitrary element from a set of size > 1). The type provider for ZFC-sets could produce only mock types, which wouldn't be a problem since the proofs are never executed, only (type)checked.
Tags: scala

  • Прогресс

    Десять дней назад, вторая ступень SpaceX'овского корабля Starship своим ходом слетала своим ходом на десять километров вверх, и усмепшно приземлилась…

  • О водосбережении

    Как известно, питьевая вода во многих странах дефицитный ресурс. И даже в дождливой Германии летом иногда случаются засухи, в результате которых она…

  • 36

    Традиционный деньрожденный пост. Год выдался необычный. :)

  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.

  • Прогресс

    Десять дней назад, вторая ступень SpaceX'овского корабля Starship своим ходом слетала своим ходом на десять километров вверх, и усмепшно приземлилась…

  • О водосбережении

    Как известно, питьевая вода во многих странах дефицитный ресурс. И даже в дождливой Германии летом иногда случаются засухи, в результате которых она…

  • 36

    Традиционный деньрожденный пост. Год выдался необычный. :)