As much as we’ve abstracted and industrialized programming, under the hood it’s still math. The underlying mathematics of software drive our programming languages and our algorithms, providing tools and concepts we use to build code.
Code is complex, a knot of functions that execute in different ways at different times. We can think that it works, we can see that it works, but can we prove that it works? It’s possible to use some of the concepts at the heart of functional programming to design languages that produce code that’s mathematically provable, so we can use static analysis techniques to understand how our code will run under various starting conditions.
Having provable code is important when we want secure code. We need to be able to see where code violates type safety, where there’s a risk of unplanned halting or memory overruns. We know how code failures can affect the security of our applications, so we need to be able to prove that our code can never get into states that break our security model.
When we look at the cutting edge of computer science, we can see the intersection of math and code in experimental tools and languages that aim to implement some of these techniques. One such project is being worked on by Microsoft Research and the French national research center, Inria.
F* (F-star to its friends) is a functional programming language designed to support program verification techniques. You write code in F*, verify it, and then export it in any one of its target languages and environments. The language is mature enough that it’s used to develop itself, compiling in OCaml with an active development community on GitHub.
It’s already showing interesting results, being used to develop a secure, verified version of HTTPS as part of Project Everest. This intriguing academic project delivers formal proofs for key security technologies that underpin much of our modern e-commerce environment.
Project Everest has been used to build parts of the HTTPS stack already, including the TLS-1.3 record layer. This is an important part of the protocol, acting as the bridge between applications and the internals of HTTPS. It manages encrypted messages and needs to be secure to ensure that the only weak point in the system is its cryptographic libraries. By implementing it in F*, Project Everest has been able to ensure that the record layer itself is secure, with the resulting miTLS code forming part of Microsoft’s implementation of the QUIC HTTP standard.
F* is also being used to produce verified versions of common cryptographic libraries, exporting its code as C and assembly as part of the high assurance cryptographic library (HACL*) and ValeCrypt library, as well as in the EverCrypt cryptographic provider that sits on top of HACL* and ValeCrypt, choosing the best implementation of your chosen algorithm based on your processor and execution environment. It’s in use in the Azure Confidential Consortium framework and the WireGuard VPN used by the Linux kernel.
Other tools that have benefited from a F* implementation include a WebAssembly implementation of the Signal secure messaging protocol and a verified version of the Device Identifier Composition Engine (DICE) measured boot tool running in microcontroller firmware. You’ll find much of the Project Everest work on GitHub, with source code and a Linux Docker image that’s built daily.
Working with F*
So how do you use F*? It’s a surprisingly flexible language, with tools for most popular editors, including Visual Studio Code. You write code in F*, run it through a verifier, and when ready to use it, export it to your target language ready for use. The authors describe it as a “dependently typed” language, with a focus on delivering functional correctness and managing security properties and resource usage. The language wiki provides resources to help you get started, and there’s an online guide to programming in F*.
There’s very little F* can’t do. It owes a lot to functional programming methods, but it can be used like any other programming language as well. Used one way it’s a low-level systems programming language; used another way it’s a tool for building message-based distributed applications, ready for use in the public cloud alongside other microservice tools.
As a language, F* is perhaps better thought of as an ecosystem of different languages, each focused on specific use cases. These internal, domain-specific languages are the best way to approach F* programming, picking the one that’s closest to both your experience and your application.
One of the more useful options is Low*. This is for situations where you might normally use C, for low-level systems applications where security is critical. You can use familiar C-like programming methods, with an internal compiler that generates C code for compilation as part of a C-based project. There’s even stack and heap memory management constructs and tools for managing correctness. Functions in Low* have type signatures that indicate they’re both safe and correct, which build the proofs used to ensure your code does what it indicates.
Dependent types and theorem provers
At the heart of F* is the concept of dependent types. Here the definition of a type depends on a value: for example, constructing dependent functions where the type returned depends on the value of one of the functions arguments, or dependent pairs where the type of one value depends on the first. This approach helps make a function type-safe, ensuring that an array can’t exceed a predefined size.
This approach allows the F* tools to check for types and values as part of their proof-checking function, generating code to test your application based on the structure of the dependent types you’ve used. F* uses this to ensure that there’s a proof for your code, which itself can be used to compare the proof with security policies, ensuring memory safety, for example.
F* uses an automated theorem prover to check your code, using satisfiability modulo theories (SMT). When your code is checked, F* brings together the facts to be proven and builds a proof that’s then run through the F* Z3 SMT Solver. This is an award-winning tool that’s been used to rapidly solve some complex problems that would have otherwise taken significant amounts of compute time, such as checking the Azure firewall code in seconds rather than millions of years. Using Z3 can be complex, especially when you have to build models by hand. With F* automating the process, it becomes a tool anyone can use.
From computer science to everyday coding
One of the more useful features of F* is its ability to target many different development environments, including WebAssembly. With its increasing importance as a cross-platform runtime for all classes of device, WASM support makes a lot of sense as it simplifies the compiler aspects of F*. If your default target for provably correct code is a sandboxed runtime, then you’ve significantly reduced security risks. The combination of F* and WASM (using Web Assembly’s standalone WebAssembly System Interface [WASI] runtime) could offer a way of delivering enhanced security for SCADA industrial systems.
It’s good to see what was pure computer science research only a few years ago starting the journey to become commonplace. F* isn’t quite mainstream yet, but it’s a long way from the original Z3 SMT, and it shows how provable code can become part of your everyday development environment.
As F* gains new domain-specific languages it’s possible to imagine it becoming part of technologies like the Roslyn compiler. Using .NET to generate dependent types and treating it as just another F* domain-specific language, it’s possible to imagine significantly safer C# applications, reducing risk to users. It’s still a long road, but Microsoft’s internal use of F* to build and verify its own QUIC networking stack is a sign that it’s almost ready for prime time.
Copyright © 2021 IDG Communications, Inc.