printf*

It must have been around 2002 when I first read a paper by Lennart Augustsson about a language called Cayenne. It was the first time I had come across dependent types. Lennart’s first example motivating the need for (and the power of) dependent types is printf: a function used by legions of programmers that is, in an essential way, very dependently typed. Having worked since then on many programming languages with various forms of dependent types, now, 15 years later, a language that I contribute to has a proper, dependently typed printf. Well, a sprintf, but Cayenne’s example was also a sprintf. This post is about explaining how sprintf works in F*.

What is `sprintf`?

Some variant of sprintf appears in the standard library of many programming languages—it allows for a string to be formatted and assembled from a number of programmer-provided values in a convenient way. A basic usage is like this:

   let format_date (month:string) (day:nat) (year:nat) =
       sprintf "%s %d, %d" month day year

Where, format_date "Nov." 22 2017 returns the string "Nov. 22, 2017".

Or:

   let format_date (dow:string) (month:string) (day:nat) (year:nat) =
       sprintf "%s, %s %d, %d" dow month day year

The curious thing here is that sprintf’s first argument is a “format string”, and the number and types of its remaining arguments depends on the format string: i.e., sprintf is the quintessential dependently typed function.

A sketch of `sprintf` in F*

Here’s one type for a simplified version of sprintf in F*.

val sprintf: s:string{valid_format_string s} -> sprintf_type s

Given a first argument s, a valid_format_string, the return value has a type sprintf_type s that is a function of the format string s.

Defining valid_format_string s is relatively straightforward, e.g., if one wanted to only allow %d and %s as format specifiers (with %% as an escape sequence for printing “%” itself), one could do it like this:

let rec valid_format_string = function
    | [] -> true
    | '%'::'d'::s 
    | '%'::'s'::s 
    | '%'::'%'::s -> valid_format_string s    
    | '%'::_ -> false
    | _::s -> valid_format_string s

sprintf_type s is a little more unusual: it is a function that computes a type from a format string s. Here’s one version of it:

let rec sprintf_type (s:string{valid_format_string s}) =
    match s with
    | [] -> string
    | '%'::'d'::s -> (int -> sprintf_type s)
    | '%'::'s'::s -> (string -> sprintf_type s)
    | '%'::'%'::s -> sprintf_type_chars s
    | _::s -> sprintf_type s

Look at that closely: in the first case, if the format string is empty, sprintf "" just returns a string. If the format string begins with "%d", then sprintf expects one more argument, an int, followed by whatever arguments are needed by the rest of the format string, and so on.

Note: I cut a corner there, treating a string as a list of characters, whereas in F* you have to explicitly coerce a string to a list of characters. You can see the whole program, corners restored, at the end of this post.

That was simple! Why did it take so long for F* to get `sprintf`?

The version of sprintf provided in F*’s standard library (FStar.Printf.sprintf) is somewhat more complex than the simple example we’ve just seen. For one, it supports many more format specifiers than just %d and %s. But, that’s just a detail. More essentially, in order for callers of sprintf to be checked efficiently, the type-checker has to compute the recursive function sprintf_type s at type-checking time.

This kind of type-level computation is a distinctive feature of many dependently typed languages, a feature that F* acquired in full generality about 2 years ago. Unlike other dependently typed languages, in addition to type-level computation, the F* type-checker also makes use of algebraic and logical reasoning via an SMT solver. Balancing these features has required quite a lot of care: e.g., compute too much in the type-checker and the SMT solver is faced with reasoning about enormous, partially reduced terms; compute too little and the SMT solver has to do a lot of computation, which is not its strong point.

For sprintf, we don’t want to rely on SMT reasoning at all, since everything can be checked simply using type-level computation. For this, F* requires a hint in the type of sprintf:

val sprintf : s:string{normalize (valid_format_string s)}
           -> normalize (sprintf_type s)

Notice the additional calls to normalize, which signals to F* to fully reduce valid_format_string s and sprintf_type s before resorting to SMT reasoning.

Additional material

See the actual library (based on F* code that Catalin Hritcu initially modeled after Arthur Azevedo de Amorim’s Coq sprintf) for more details about how this works, including how we compile away uses of sprintf to a bunch of string concatenations.

Here’s a small standalone version. Compare it to the version in Cayenne … it’s pretty similar, but it took (well, at least it took me) 15 years to get there! : )

module SimplePrintf
open FStar.Char
open FStar.String

let rec valid_format_chars = function
    | [] -> true
    | '%'::'d'::s 
    | '%'::'s'::s 
    | '%'::'%'::s -> valid_format_chars s    
    | '%'::_ -> false
    | _::s -> valid_format_chars s

let valid_format_string s =
    valid_format_chars (list_of_string s)

let rec sprintf_type_chars (s:list char) =
    match s with
    | [] -> string
    | '%'::'d'::s -> (int -> sprintf_type_chars s)
    | '%'::'s'::s -> (string -> sprintf_type_chars s)
    | '%'::'%'::s -> sprintf_type_chars s
    | _::s -> sprintf_type_chars s

let sprintf_type s = sprintf_type_chars (list_of_string s)

let rec sprintf_chars 
        (s:list char{normalize (valid_format_chars s)})
        (res:string)
    : normalize (sprintf_type_chars s)
    = match s with
      | [] -> res
      | '%'::'d'::s -> fun (x:int) -> sprintf_chars s (res ^ string_of_int x)
      | '%'::'s'::s -> fun (x:string) -> sprintf_chars s (res ^ x)
      | '%'::'%'::s -> sprintf_chars s (res ^ "%")
      | x::s -> sprintf_chars s (res ^ string_of_char x)

let sprintf (s:string{normalize (valid_format_string s)})
   : normalize (sprintf_type s)
   = sprintf_chars (list_of_string s) ""

What is sprintf?

A sketch of sprintf in F*

That was simple! Why did it take so long for F* to get sprintf?

Additional material

What is `sprintf`?

A sketch of `sprintf` in F*

That was simple! Why did it take so long for F* to get `sprintf`?