The Blog

  • printf*

    It must have been around 2002 when I first read a paper by Lennart Augustsson about a language called Cayenne. It was the first time I had come across dependent types. Lennart’s first example motivating the need for (and the power of) dependent types is printf: a function used by legions of programmers that is, in an essential way, very dependently typed. Having worked since then on many programming languages with various forms of dependent types, now, 15 years later, a language that I contribute to has a proper, dependently typed printf. Well, a sprintf, but Cayenne’s example was also a sprintf. This post is about explaining how sprintf works in F*.

    What is sprintf?

    Some variant of sprintf appears in the standard library of many programming languages—it allows for a string to be formatted and assembled from a number of programmer-provided values in a convenient way. A basic usage is like this:

       let format_date (month:string) (day:nat) (year:nat) =
           sprintf "%s %d, %d" month day year
    

    Where, format_date "Nov." 22 2017 returns the string "Nov. 22, 2017".

    Or:

       let format_date (dow:string) (month:string) (day:nat) (year:nat) =
           sprintf "%s, %s %d, %d" dow month day year
    

    The curious thing here is that sprintf’s first argument is a “format string”, and the number and types of its remaining arguments depends on the format string: i.e., sprintf is the quintessential dependently typed function.

    A sketch of sprintf in F*

    Here’s one type for a simplified version of sprintf in F*.

    val sprintf: s:string{valid_format_string s} -> sprintf_type s
    

    Given a first argument s, a valid_format_string, the return value has a type sprintf_type s that is a function of the format string s.

    Defining valid_format_string s is relatively straightforward, e.g., if one wanted to only allow %d and %s as format specifiers (with %% as an escape sequence for printing “%” itself), one could do it like this:

    let rec valid_format_string = function
        | [] -> true
        | '%'::'d'::s 
        | '%'::'s'::s 
        | '%'::'%'::s -> valid_format_string s    
        | '%'::_ -> false
        | _::s -> valid_format_string s
    

    sprintf_type s is a little more unusual: it is a function that computes a type from a format string s. Here’s one version of it:

    let rec sprintf_type (s:string{valid_format_string s}) =
        match s with
        | [] -> string
        | '%'::'d'::s -> (int -> sprintf_type s)
        | '%'::'s'::s -> (string -> sprintf_type s)
        | '%'::'%'::s -> sprintf_type_chars s
        | _::s -> sprintf_type s
    

    Look at that closely: in the first case, if the format string is empty, sprintf "" just returns a string. If the format string begins with "%d", then sprintf expects one more argument, an int, followed by whatever arguments are needed by the rest of the format string, and so on.

    Note: I cut a corner there, treating a string as a list of characters, whereas in F* you have to explicitly coerce a string to a list of characters. You can see the whole program, corners restored, at the end of this post.

    That was simple! Why did it take so long for F* to get sprintf?

    The version of sprintf provided in F*’s standard library (FStar.Printf.sprintf) is somewhat more complex than the simple example we’ve just seen. For one, it supports many more format specifiers than just %d and %s. But, that’s just a detail. More essentially, in order for callers of sprintf to be checked efficiently, the type-checker has to compute the recursive function sprintf_type s at type-checking time.

    This kind of type-level computation is a distinctive feature of many dependently typed languages, a feature that F* acquired in full generality about 2 years ago. Unlike other dependently typed languages, in addition to type-level computation, the F* type-checker also makes use of algebraic and logical reasoning via an SMT solver. Balancing these features has required quite a lot of care: e.g., compute too much in the type-checker and the SMT solver is faced with reasoning about enormous, partially reduced terms; compute too little and the SMT solver has to do a lot of computation, which is not its strong point.

    For sprintf, we don’t want to rely on SMT reasoning at all, since everything can be checked simply using type-level computation. For this, F* requires a hint in the type of sprintf:

    val sprintf : s:string{normalize (valid_format_string s)}
               -> normalize (sprintf_type s)
    

    Notice the additional calls to normalize, which signals to F* to fully reduce valid_format_string s and sprintf_type s before resorting to SMT reasoning.

    Additional material

    See the actual library (based on F* code that Catalin Hritcu initially modeled after Arthur Azevedo de Amorim’s Coq sprintf) for more details about how this works, including how we compile away uses of sprintf to a bunch of string concatenations.

    Here’s a small standalone version. Compare it to the version in Cayenne … it’s pretty similar, but it took (well, at least it took me) 15 years to get there! : )

    module SimplePrintf
    open FStar.Char
    open FStar.String
    
    let rec valid_format_chars = function
        | [] -> true
        | '%'::'d'::s 
        | '%'::'s'::s 
        | '%'::'%'::s -> valid_format_chars s    
        | '%'::_ -> false
        | _::s -> valid_format_chars s
    
    let valid_format_string s =
        valid_format_chars (list_of_string s)
    
    let rec sprintf_type_chars (s:list char) =
        match s with
        | [] -> string
        | '%'::'d'::s -> (int -> sprintf_type_chars s)
        | '%'::'s'::s -> (string -> sprintf_type_chars s)
        | '%'::'%'::s -> sprintf_type_chars s
        | _::s -> sprintf_type_chars s
    
    let sprintf_type s = sprintf_type_chars (list_of_string s)
    
    let rec sprintf_chars 
            (s:list char{normalize (valid_format_chars s)})
            (res:string)
        : normalize (sprintf_type_chars s)
        = match s with
          | [] -> res
          | '%'::'d'::s -> fun (x:int) -> sprintf_chars s (res ^ string_of_int x)
          | '%'::'s'::s -> fun (x:string) -> sprintf_chars s (res ^ x)
          | '%'::'%'::s -> sprintf_chars s (res ^ "%")
          | x::s -> sprintf_chars s (res ^ string_of_char x)
    
    let sprintf (s:string{normalize (valid_format_string s)})
       : normalize (sprintf_type s)
       = sprintf_chars (list_of_string s) ""
    
  • F* v0.9.5.0 released

    On 24 August 2017 we released F* v0.9.5.0. This is another big release with lots of changes and new features compared to v0.9.4.0.

    Main new features

    • Proofs by reification (see this paper)
    • A revision of the libraries based on a new formal account of monotonic state (see this paper)
    • Extraction of programs with user-defined effects
    • Experimental support for tactics
    • New IDE protocol and new IDE features: autocompletion, evaluation, real-time syntax checking, jump-to-definition, type-at-point, etc.

    Changes and other improvements

    • A reorganization of the library and a single fstarlib.cmxa against which to link all F* programs compiled to OCaml (this change is incompatible with previous versions of F*)
    • A new printer of source terms
    • Revised error reporting from failed SMT queries
    • Improved support for separate compilation via a binary format for checked modules
    • Fixed a ton of bugs (179 closed GitHub issues)
  • Teaching F*

    Having to teach F* provides strong motivation to dust off the cobwebs and tidy away long forgotten bread crumbs hidden deep down in remote directories to make the language easier to install and use. It is thus no coincidence that major releases have been aligned with some of us going back to school after a long summer of coding to step out there and present the newest features of the language to a crowd of rowdy students.

    The latest two occasions:

    It is no coincidence that both courses had a decidedly cryptographic focus given that F* is the language of choice of the Everest project.

    What features would you like to see included in future research schools? And what do you think are the biggest stumbling blocks when learning and teaching a hot-off-the-press research language like F*?

  • F* v0.9.4.0 released

    The F* team is pleased to announce the release of F* v0.9.4.0. This is the culmination of exactly one year of hard work from a very quickly expanding F* team. We’re not very good at keeping precise change lists, but here are the main highlights of this release:

    • Predicative hierarchy of universes with universe polymorphism
    • Uniform syntax between expressions and types allowing rich type-level computation
    • Dijkstra Monads for Free
    • Extraction to C via KreMLin
    • New parser based on Menhir
    • New pretty printer for surface syntax and fstar --indent
    • Changed default effect to Tot
    • Strict positivity check for inductives
    • New synatax for inductive type projectors and discriminators
    • Better semantics for module open and support for local opens
    • Better dependency analysis
    • Better error locations for Z3 failures
    • Replaced Z3 timeouts with machine independent resource limits
    • Cleaned up libraries and examples (a bit)
    • Improvements to interactive mode
    • Docker builds
    • Fixed a ton of bugs (262 closed GitHub issues)

    Enjoy the best F* release ever!

  • Introducing KreMlin

    The work we do these days on F* is often in service of Project Everest. The goal of Everest is to verify and deploy a drop-in replacement for the HTTPS stack, the protocol using which you are probably reading this page, securely (hopefully). So far, we’ve been focusing most of our efforts in Everest on TLS, the protocol at the heart of HTTPS.

    Right now, I’m stuck in the Eurostar back from our week-long meeting in Cambridge, UK, so it feels like a good time to write down some thoughts about KreMLin, a new compiler backend that we’re using in Everest, that several of us have been working on over the summer, at MSR and INRIA.

    As a reminder, Everest sets out to verify and deploy secure cryptographic protocols, starting with TLS 1.3. Deploy is the salient part: in order to see people adopt our code, we not only need to write and prove our TLS library, but also to

    • make sure it delivers a level of performance acceptable for browser vendors, and
    • package it in a form that’s palatable for a hardcode Windows developer that started writing C before I was born.

    A TLS library can, roughly speaking, be broken down into two parts: the protocol layer that performs the handshake (“libssl”) and the cryptographic layer that actually encrypts the data to be transmitted (“libcrypto”). The handshake connects to the server, says hi, agrees on which algorithms to use, and agrees on some cryptographic parameters. Once parameters have been setup, the cryptographic layer is responsible for encrypting the stream of data.

    ...Read more
  • Welcome to F*!

    After many discussions, and in the spirit of the Gallium Blog (where I was a regular), the F* team is happy to announce the F*-blog! Expect a variety of posts, ranging from technical digressions about Dijkstra Monads to engineering discussions about parsing technology, and pretty much anything in between.

    One of our stated goals is to make F* more accessible to beginners; this means making the setup easier, but also writing more documentation, so that people who are not in the vicinity of the F* team can write programs, too. We’ve started an effort on the wiki; the goal of this blog is to complement the wiki and make it easy for F* enthusiasts to keep up with the development; be notified about breaking changes on the master branch, and more generally make the development process more open.

    I expect that this blog will also cover related projects, such as KreMLin, our F*-to-C translator, and miTLS, our ongoing implementation of TLS 1.3 in F*. Stay tuned! ☭

subscribe via RSS