Essence and Foundation of the Practical Type System (PTS)
First Published |
2023-10-17 |
Author |
Christian Neumanns |
Editor |
Tristano Ajmone |
License |
This is part 2 in a series of articles titled How to Design a Practical Type System to Maximize Reliability, Maintainability, and Productivity in Software Development Projects.
It is recommended (but not required for experienced programmers) to read the articles in their order of publication, starting with Part 1: What? Why? How?.
For a quick summary of previous articles you can read Summary of the Practical Type System (PTS) Article Series.
Note
Please be aware that PTS is a new paradigm and still a work-in-progress. As explained in section History of the article Essence and Foundation of the Practical Type System (PTS), I created a proof-of-concept implementation which is now a bit outdated — therefore you won’t be able to try out the PTS code examples shown in this article.
Introduction
In this article we'll take a closer look at PTS — its goals and non-goals, its history, what to avoid, and its core types.
Context
This section provides essential contextual information necessary for grasping the essence of PTS.
Goals and Non-goals
The aim of this series is to present and discuss a partly-implemented-and-tested type system designed to maximize reliability, maintainability, and productivity in software development projects.
We'll assume that PTS is used in a high-level, imperative, object-oriented, compiled programming language, designed for application programming (including the development of libraries and frameworks).
However, PTS is not bound to a certain programming language paradigm. Most PTS concepts could be used, for example, in a procedural or functional programming language, or in a language designed for systems programming.
PTS is meant to simplify life for software developers working on mid- to large-size code bases. As we'll see in subsequent articles, this can't be achieved with a simple type system — it can only be achieved with an advanced type system composed of a sound set of mechanisms that need to work seamlessly together. Simple type systems that are easy to learn might be appropriate in small, one-developer projects. They can be nice to use — until the project grows in size and we start spending hours upon hours fixing bugs that could have been reported immediately at compile- or edit-time by a more advanced type system. Therefore simple type systems are ill-suited in mission-critical, real-life enterprise projects, because they significantly aggravate the task of writing reliable and maintainable software in reasonable time. PTS is an advanced type system, and therefore non-trivial to implement.
The goal is to design a type system that is easy to use and difficult to misuse, rather than one which is easy to implement.
I'll focus on showing why some features/mechanisms should be part of a practical type system, and why other features should be omitted. The rationale behind each concept will be illustrated by source code examples to illustrate how PTS code looks like. However, I won't provide a comprehensive specification or dig into implementation details and APIs. We'll primarily focus on the why (the benefits), not the how.
History
The Practical Type System (PTS) originated in the Practical Programming Language (PPL). I created PPL to try out how a programming language could help writing more reliable and maintainable code in less time. PPL is a compiled, high-level, object-oriented and functional programming language designed for application programming.
I wrote the first versions of PPL in Java. After some iterations, I bootstrapped the PPL compiler in PPL, so that PPL was eventually written in PPL.
Notably, PPL was used to write:
-
The first versions of the Practical Data and Markup Language (PDML).
-
The first versions of the Practical Markup Language (PML) (which I've been using since 2019 to write all my articles, including the original version of this one).
Note: PDML and PML were eventually rewritten in Java, to render them more suitable for open-source projects.
-
Some minor parts of a commercial ERP application.
Hence PPL wasn't just a toy language — it was tested and usable to a certain extent, albeit still far from being production-ready (incomplete documentation, a rudimentary standard library, no IDE support, etc.). Because of its immaturity, PPL was never open-sourced. It is now hibernating for a few years — but that might change in the future.
The point is, PPL helped me to learn a lot about language design and type systems, and now I want to share and discuss what I've learned. Constructive feedback is most welcome, and the resulting knowledge will hopefully be useful to design future programming languages.
In a nutshell: PTS is a set of ideas which I fleshed out in PPL, and I'm now publishing them since they seem to be useful.
We need to differentiate between the PTS paradigm and the PTS syntax. Both emerged from PPL, and they'll be introduced in the next sections.
PTS Paradigm
The PTS paradigm encompasses a set of type system features designed to maximize reliability, maintainability, and productivity. The aim is to provide a unique set of practical features that complement each other, work together seamlessly, are easy to use, and ultimately help developers to write better code in less time.
Some PTS features are borrowed from other programming languages (e.g. static typing and immutability by default). Other common type system features are designed differently from how other languages approach them (e.g. null-safety and error-handling). Some novel approaches will be presented too. In the upcoming articles I'll mostly focus on these non-ordinary features.
PTS Syntax
Besides the PTS paradigm, the PTS syntax also originated in PPL. It's a new syntax which aims to be succinct, easy to understand, and well suited to write code based on the PTS paradigm. In the following articles I'll use this syntax in all PTS source code examples.
Note
The PTS syntax is an improved version of the original PPL syntax. Therefore, most PTS source code examples will not work if you try them out in the current PPL version.
The syntax will be explained whenever needed, however it's worth mentioning right away some basic syntax rules.
The common C-style approach to code blocks is to enclose them within curly braces:
if (condition) {
i = 1;
} else {
i = 2;
}
The PTS syntax doesn't use braces or other paired delimiters. Instead, a period (.
) is used to terminate a block, as shown below:
if condition
i = 1
else
i = 2
.
Note
The required period at the end of the block eliminates problems and ambiguities that can occur in languages that use only whitespace to define control flow and structure (like Python).
For example, the above code could (but shouldn't) be written as follows, and still work correctly:
if condition
i = 1
else
i = 2
.
Naming conventions are as follows:
-
Snake case style is used for identifiers, and uppercase letters are used for acronyms (e.g.
first_item
,XML_to_JSON_converter
). -
There is no rule for starting specific kinds of identifiers with uppercase letters. For example, the rule "All type names start with an uppercase letter" is not applied in the PTS syntax.
In rare cases of ambiguity (or if one wants to explicitly state the kind of identifier), a dedicated prefix (specified for each kind of identifier) can be used (e.g.
ty_string
instead ofstring
, to explicitly denote type string). -
Identifiers are case-sensitive (e.g.
foo
,Foo
, andFOO
are different identifiers).
Some source code examples in the following articles assume that PTS is used in an object-oriented programming language. For example, I'll use o.m()
, where o
denotes an object, and m()
is a method call on that object.
Important!
The PTS paradigm (features) and PTS syntax do not depend on each other.
The PTS paradigm could be implemented, for instance, in a programming language that uses a syntax which is very different from the PTS syntax shown in this article series.
No-Go 'Features'
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
— Antoine de Saint-Exupéry
Excellent advice!
Before adding features, we should first think hard about what not to add.
Whole classes of bugs and unnecessary complexity can easily be eliminated by simply not supporting several features, even if they are popular and might make sense in other contexts.
Below is a list of features that raise the risk of software bugs, add unnecessary complexity, and often jeopardize productivity and deprive us of the joy of coding. Some of these features are more related to the programming language, but since they go hand in hand with the type system they are worth mentioning here.
The following features have no place in PTS:
-
Undefined state
Example: No protection against using an uninitialized variable. The variable might contain any random garbage that happens to be in memory when its value is accessed.
For more information and examples, read the Wikipedia article Undefined value, which warns us:
There is no limit on what might happen. At best, an easily detectable crash; at worst, a subtle error in a seemingly unrelated computation.
-
Undefined behavior means that the behavior of the application can become totally unpredictable because of some language use case which is neither specified by the programming language, nor by its implementations (compilers/interpreters).
For example, in some languages integer division by zero results in undefined behavior.
Wikipedia states:
In the C community, undefined behavior may be humorously referred to as 'nasal demons', after a comp.std.c post that explained undefined behavior as allowing the compiler to do anything it chooses, even 'to make demons fly out of your nose'.
For practical illustrations showing what this really means I recommend delving into the following articles:
-
What Every C Programmer Should Know About Undefined Behavior #2/3, written by Chris Lattner (co-founder of LLVM and Clang compiler)
-
A Guide to Undefined Behavior in C and C++, Part 1, written by John Regehr, Professor of Computer Science, University of Utah
Undefined state and undefined behavior are scary — really scary.
-
-
Unspecified behavior is closely related to undefined behavior, albeit different from it. In this case the behavior is unspecified by the programming language too, but the compiler/interpreter chooses what to do, and this might be specified in the documentation of the tool.
For example, compiler A does X in a given situation, while compiler B does Y.
This opens the door to different run-time behaviors for the same source code, depending on the compiler/interpreter used in the given context, as well as the host OS and underlying hardware architecture. For example, an application might run correctly if compiled with one compiler, but fails to run properly if compiled with another compiler.
-
Silently ignored errors
Example: silently ignored arithmetic overflow errors, such as
2_000_000_000 + 2_000_000_000
(addition of two signed 32 bits integers) being incorrectly evaluated to-294_967_296
in C#, Java, and other programming languages. -
Unsafe null-handling
No protection against the infamous null pointer dereference (nowadays synonym with the "billion dollar mistake").
-
Mutable data by default
Data should be immutable by default, and mutable only when explicitly specified. For a summary of the rationale see section Make all data structures immutable, unless there is a good reason to make them mutable! in my article Fundamental Pragmatics for Successful Programmers.
-
Shared mutable data
Shared mutable data can cause weird behavior and bugs that are extremely difficult to find and fix. This is well explained in Henrik Eichenhardt's article Why shared mutable state is the root of all evil.
-
Implicit type conversion/coercion
Implicit type conversions/coercions are error-prone whenever there is a risk of silently ignored loss of information, wrong data, or other nasty bugs.
There are different variations of this "no-go feature".
Probably the most prominent case is the implicit conversion/coercion between numeric types of different natures and/or different ranges, such as coercing a floating point number into an integer number, or a 32-bit integer into a 16-bit integer.
Another example would be a string that is implicitly coerced into a number, or zero if the string doesn't represent a valid number.
-
Dynamic typing
A type system designed for mid- to large-size projects should support static typing, because it allows catching more bugs at compile-time, and offers other advantages.
The following are some dynamically typed programming languages: JavaScript, Lisp, Lua, PHP, Python, and Ruby.
Static typing is used in C, C++, C#, Go, Haskell, Java, Kotlin, Rust, and other languages.
Note
Static typing does not prevent a type system from providing also mechanisms that mimic dynamic typing, if needed.
-
Pointer arithmetics
For a quick overview, see What are the dangers of pointers in C?
-
Value types and reference types co-existing for the same data
There are important differences between value types and reference types, and ignoring them can lead to subtle bugs.
For example, when an object is changed in a function, this change will be reflected in the calling context only in case of a reference type. If the data is passed by value, the change will not be propagated to the calling context.
In C# and Java, integers can be values or references.
C# example:
int i1 = 1; object i2 = 1;
Java example:
int i1 = 1; Integer i2 = 1;
For more information, read Value Type and Reference Type (related to C#, but mostly applicable to other languages too).
-
Excessive functionality in the root type of the type hierarchy
Examples: Equality, comparison, hashing, copying functions/methods, and further functionality present in the root type (well explained for C# and Java in Jon Skeet's article Redesigning System.Object/java.lang.Object).
-
Other
Unneeded complexity and error-proneness can be caused by various other type system or language constructs. For example, the following horrors are all found in JavaScript:
-
truthy, falsy, and nullish values
-
shadowed and hoisted variables
-
complex rules for
==
and===
operators -
exotic objects.
-
Note
Some features make sense in environments with performance constraints, such as in systems programming and real-time systems. For example, not checking for buffer/arithmetic overflows/underflows, pointer arithmetics, and manual memory management can increase efficiency. Manual memory management (instead of using an automatic garbage collector) is often the only viable choice in real-time systems with hard constraints on maximum time delays, where arbitrary time lags of even just a few microseconds (typically caused by garbage collectors) can't be tolerated. However, these features should generally not be part of a high-level type system designed for application programming where maximum reliability, maintainability and productivity are more important than maximum efficiency.
We all want performance, but if a choice must be made between reliability and performance in PTS, then reliability wins. For example, checking for arithmetic overflow errors increases reliability, but decreases performance. We don't want to deal with difficulties like those explained in Russ Cox's article C and C++ Prioritize Performance over Correctness. To cover performance-critical parts of an application, a high-level programming language should support calling code written in low-level languages optimized for performance, such as C or Rust.
To keep this section short, we won't elaborate on the perils of the above features. Practice shows that they have all caused numerous software application failures — some of them resulting in disasters. Just remember:
-
The Ariane 5 rocket exploded due to coercing a 64-bit number into a 16-bit space.
-
Microsoft states: "70 percent of all security bugs are memory safety issues".
More examples can easily be found in JavaScript . The good part of the type system in JavaScript is that it teaches us how not to design a practical type system. People are now well aware of the many idiosyncrasies lurking in this language and hitting unexpectedly in all kinds of applications. Some of them are just funny, others might leave you speechless. Many examples are shown in What the f*ck JavaScript? (my favorite: null is falsy, but not false). There is even a song called Bug in the JavaScript, written by Dylan Beattie, an excellent speaker and the creator of the esoteric Rockstar programming language.
Core Types
Every type system needs to provide some core, native types. Core types are the building blocks for more complex types; they provide an essential foundation for creating customized, project-specific types.
Note
The ability to create custom types is a basic requirement that allows developers to apply the Data First Approach and the PTS Coding Rule, both explained in the previous article.
This section provides a quick overview of PTS core types. Type operations (e.g. how to get a sub-string or how to filter a collection) and implementation details are out of scope.
Note
Summary for experienced software developers:
PTS provides scalar types (character, string, number, boolean), collections (list, set, and map), and null
(yes, null
, not Maybe
or Option
, for reasons explained later).
You can just skim through this section or skip it.
Scalar Types
Type character
A character
is a single letter, digit, symbol, or other unit typically used in text processing. Characters are encoded in UTF-8. Hence, a character can represent any Unicode code point.
A character
literal is surrounded by apostrophes, and \
is used as escape character (C-style syntax).
Examples:
'a'
'1'
'!'
'\n' // <line feed>
'\'' // '
'\u2714' // ✔
'\U001F600' // 😀
Type string
A string
is a sequence of characters/Unicode code points.
Common C-style syntax is used for strings. A string
literal is surrounded by quotes, and \
is used as escape character.
Examples:
"foo" // foo
"line 1\nline 2" // line 1
// line 2
"\"Hello\" \\ \U0001F60A" // "Hello" \ 😊
A practical type system should also provide good support for:
-
literal expressions — for long, multiline strings, without escape sequences
-
string interpolation, to embed expressions in a string literal
Here's a PTS example:
write_line ( """File: C:\foo\bar.txt
SQL command: select "first_name", "last_name" from "employees"
line \3\
""" )
const name = "Peter Deutsch"
const quote = "If you get the data structures and their invariants right, most of the code will just write itself."
write_line ( """{{name.to_upper_case}} said:
"{{quote}}"""" )
Output:
File: C:\foo\bar.txt SQL command: select "first_name", "last_name" from "employees" line \3\ PETER DEUTSCH said: "If you get the data structures and their invariants right, most of the code will just write itself."
Type number
There are two sub-types of number: integer
and decimal
.
Type integer
An integer
is a positive or negative number, or zero, without a fractional component. It includes whole numbers (0, 1, 2, ...), and natural/counting numbers (1, 2, 3, ...).
Examples:
123
0
-123
// largest prime number found with a mechanical calculator by Aimé Ferrier:
20988936657440586486151264256610222593863921
Type decimal
A decimal
is a positive or negative number, or zero, with a fractional component.
Examples:
123.45
0.0
-1.007
// pi with 50 decimals:
3.14159265358979323846264338327950288419716939937510
// using an exponent:
10E10
-123.45E-12
Note
integer
and decimal
are arbitrary-precision numbers, limited only by the available memory of the host system. There's potentially no limit to the number of digits.
Signed, unsigned, and non-zero numbers, as well as number ranges (e.g. 1 .. 10
) play an important role in PTS. This will be covered in subsequent articles.
Type boolean
Type boolean
has two values in its set: true
and false
.
An alternative (adopted in PPL) would be to use type yes_no
with values yes
and no
.
Collections
There are three native collection types: list, set, and map.
Type list
A list is an ordered collection of elements.
A list literal containing three strings looks as follows:
[list<string> "good" "better" "best"]
Whitespace and/or a comma are used to separate list elements.
Note that the type name of the collection is written right after the opening bracket. Moreover, the type name of the elements is written between angle brackets (<>
). All collections are immutable by default. Thus, the above code defines an immutable list
whose elements are all of type string
.
Here is an example of a mutable, heterogeneous list that can contain elements of any type:
[mutable_list<any> "abc" 'd' 'd' 1+1]
If the type of the list elements is not explicitly specified, then the compiler infers it. However, this is only supported if all elements are of the same type. Hence, this code:
[list<boolean> true false true]
... can be shortened to:
[list true false true]
Type set
A set is an unordered collection of unique elements.
Example:
[set "Tim" "Tom" "Tam"]
Type map
A map is a collection of key/value pairs. The keys in a map must be unique, hence they are a set.
A map is also known as dictionary, associative array, or symbol table, depending on the programming language.
Example of a map whose keys are strings, and the values can be of any type:
[map<string,any>
"digit" : 1
"letter": 'a'
]
Type null
Consider a customer_order
object with attribute delivery_date
of type date
. Suppose that the delivery date of a given order is still unknown. What should be the value of delivery_date
?
A common approach is to use null
to represent the absence of a value.
We do the same in PTS.
Yes, we do use null
to represent the absence of a value, even though null
is the reason for the infamous null pointer error and has become a synonym with "the billion dollar mistake," a term coined by Tony Hoare, the inventor of null
.
No, we do not use a Maybe
monad or an Option
type — two popular solutions used in some modern programming languages to eliminate null pointer errors.
However, the concept of null
in PTS, as well as the rules and specific operators for handling null
, differ from approaches used in other languages. Type null
will be fully covered in a subsequent, dedicated article titled Null-Safety in PTS. We'll see why a dedicated type null
is used (and not Maybe
or Option
), how null
handling looks like in source code, and how null-safety is fully ensured by the compiler.
There is no need to fear null
anymore — null
is our friend, not our enemy. We need it, and it doesn't cause any harm if we use it correctly. But we also need a type system that protects us against misusing null
.
What's Next?
By now you should have a good overview of what PTS is, isn't, and aims to achieve.
We want a type system that simplifies defining reliable types, and we want bugs to be detected as early as possible in the development cycle — preferably at compile-/edit-time, or else as soon as possible at run-time.
In the next articles we'll progressively discover the features required to achieve the goal of writing more reliable and maintainable code in less time. We'll talk about record and union types, null-safety, error handling, constrained scalars, eradication of the empty string, and more.
Acknowledgment
Many thanks to Tristano Ajmone for his useful feedback to improve this article.