There is quite a nice feature of Python that allows dictionaries to be applied to format strings, rather than just relying on the position of each conversion specification. To give a trivial example:
>>> print "Hello %s my name is %s." % ("Gaius", "Python")
Hello Gaius my name is Python.
Becomes:
>>> print "Hello %(name)s my name is %(lang)s." % {"name":"Gaius", "lang":"Python"}
Hello Gaius my name is Python.
I use this technique all the time for building complex strings from templates, e.g. generating configuration files, or composing email from a program, where it would be unmanageable to rely on just position, especially when the body is very likely to change. I couldn’t see this facility in OCaml, so I quickly knocked something up (based on a couple of functions from earlier):
(* Dict printf - a la Python. Only Str for now. *)
module type DFPRINTF =
  sig
    val dsprintf : string -> (string * string) list -> string
    val dfprintf : out_channel -> string -> (string * string) list -> unit
    val dprintf  : string -> (string * string) list -> unit
  end

(* Find all occurrences of the % character in a string and return their indexes *)
let findq s =
  let rec findq' s offset acc =
    try
      let pos = String.index_from s offset '%' in
      findq' s (pos + 1) (pos :: acc)
    with Not_found -> List.rev acc
  in
  findq' s 0 []

(* replace characters from position x to position y in string a with string b *)
let splice a x y b =
  let before = String.sub a 0 x in
  let after = String.sub a y ((String.length a) - y) in
  before ^ b ^ after

(* substitute from dict h into string s *)
let dsprintf s h =
  let q = findq s in
  let s' = ref s in
  (* work from the last % backwards so earlier indexes stay valid *)
  for i = (List.length q) - 1 downto 0 do
    let ob = (List.nth q i) + 1 in (* the open bracket immediately following the % *)
    try
      match s.[ob] with
      | '(' ->
        let cb = String.index_from s ob ')' in
        let k = String.sub s (ob + 1) (cb - ob - 1) in (* dict key between ( and ) *)
        (* seems to be no way to use a format string at runtime *)
        if s.[cb + 1] = 's' then
          s' := splice !s' (ob - 1) (cb + 2) (List.assoc k h)
        else ()
      | _ -> ()
    with
    | Not_found -> ()          (* not really a format/key not found *)
    | Invalid_argument _ -> () (* string ends "%(...)" *)
  done;
  !s'

let dfprintf c s h = output_string c (dsprintf s h)

let dprintf s h = dfprintf stdout s h

(* End of file *)
# dsprintf "Hello %(name)s my name is %(lang)s." [("name", "Gaius"); ("lang", "OCaml")];;
- : string = "Hello Gaius my name is OCaml."
The most obvious problem with this is that it only handles strings, and even then it does not permit any of the flags for formatting those strings, e.g. justification. That seems to be a non-trivial problem. In Python again:
>>> x="%s"
>>> print x % "hello"
hello
However back in OCaml:
# open Printf;;
# printf "%s" "hello";;
hello- : unit = ()
# let x = "%s";;
val x : string = "%s"
# printf x "hello";;
Error: This expression has type string but an expression was expected of type
         ('a -> 'b, out_channel, unit) format =
           ('a -> 'b, out_channel, unit, unit, unit, unit) format6
# printf (format_of_string x) "hello";;
Error: This expression has type string but an expression was expected of type
         ('a, 'b, 'c, 'd, 'e, 'f) format6
As far as I can tell, there doesn’t seem to be a way to get the functions in the Printf module to take a format string from a value; they accept only a string literal that is known at compile time. Otherwise I could take the characters between the ) and the s (or d or whatever, not that a mixed-type assoc list is allowed) and just pass them to sprintf at the splice call in dsprintf. Hmm. Annoying, but not a showstopper: it just means I need to convert everything I want to the appropriate string (using literals with sprintf if necessary) when building the association list. So my immediate string-constructing need is met, but it feels like a bit of a hack.
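A minimal sketch of that workaround, assuming hypothetical keys name, count and ratio (they are illustrative only and do not appear in the code above): each value is rendered with a literal format first, so the association list holds nothing but strings.

```ocaml
(* Workaround sketch: pre-format every value with a literal format string,
   then store only the resulting strings in the association list.
   The keys and values here are hypothetical. *)
let dict =
  [ ("name",  Printf.sprintf "%-10s" "Gaius");  (* left-justified, width 10 *)
    ("count", Printf.sprintf "%03d" 7);         (* zero-padded integer *)
    ("ratio", Printf.sprintf "%.2f" 0.5) ]      (* two decimal places *)

let () =
  (* Show each pre-formatted value between brackets to make padding visible *)
  List.iter (fun (k, v) -> Printf.printf "%s = [%s]\n" k v) dict
```

A list built this way can be passed as the second argument to dsprintf, since every value is already a string.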
The standard library has this function in the Buffer module, which you could use straightforwardly:
val add_substitute : t -> (string -> string) -> string -> unit
add_substitute b f s appends the string pattern s at the end of the buffer b with substitution. The substitution process looks for variables into the pattern and substitutes each variable name by its value, as obtained by applying the mapping f to the variable name. Inside the string pattern, a variable name immediately follows a non-escaped $ character and is one of the following:
* a non empty sequence of alphanumeric or _ characters,
* an arbitrary sequence of characters enclosed by a pair of matching parentheses or curly brackets.
An escaped $ character is a $ that immediately follows a backslash character; it then stands for a plain $. Raise Not_found if the closing character of a parenthesized variable cannot be found.
Very interesting – I hadn’t spotted that, but I don’t think it does what I want either – there doesn’t seem to be a way to get format characters into it, e.g.:
# sprintf "Hello %15s" "Gaius";;
- : string = "Hello Gaius"
But it is definitely more concise!
# open Buffer;;
# let b = create 10;;
val b : Buffer.t = <abstr>
# add_substitute b (fun x -> List.assoc x [("name", "Gaius")]) "Hello $(name)";;
- : unit = ()
# contents b;;
- : string = "Hello Gaius"
Thanks for the tip! It seemed too obvious a feature to actually be missing 🙂
That hasn’t formatted correctly (the padding was collapsed); it should look like:
- : string = "Hello           Gaius"
You can parse the string in $(…), e.g. $(15 name) to tell the substitute function to format your string. However, this is a bit of work.
You should take a look at Batteries’ extended printf. It features extensible tags, so you might be able to pull out exactly what you want.
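The $(15 name) idea can be sketched roughly as follows. This is only an illustration under an assumed convention, not anything the standard library defines: the helper names lookup and substitute are hypothetical, and "$(WIDTH KEY)" is taken to mean "right-justify the value of KEY in WIDTH columns". The width is passed to Printf.sprintf at run time through the "%*s" conversion.

```ocaml
(* Hypothetical convention: "$(WIDTH KEY)" right-justifies the value of KEY
   in WIDTH columns; a plain "$(KEY)" is substituted unchanged. *)
let lookup dict var =
  match String.index_opt var ' ' with
  | None -> List.assoc var dict
  | Some i ->
    let width = int_of_string (String.sub var 0 i) in
    let key = String.sub var (i + 1) (String.length var - i - 1) in
    (* "%*s" takes the minimum field width as an extra integer argument *)
    Printf.sprintf "%*s" width (List.assoc key dict)

(* Run Buffer.add_substitute over the template with the lookup above *)
let substitute dict template =
  let b = Buffer.create (String.length template) in
  Buffer.add_substitute b (lookup dict) template;
  Buffer.contents b

let () =
  print_endline (substitute [("name", "Gaius")] "Hello $(15 name)")
```

In this sketch a missing key raises Not_found and a non-numeric width raises Failure from int_of_string; a real version would probably want to report both more gracefully.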
Hmm, that’s odd. I have OCaml 3.11.2, FindLib and oUnit, just installed Camomile and Batteries isn’t happy:
root@debian:~/batteries-1.2.2# make
test ! -e src/batteries_config.ml || rm src/batteries_config.ml
ocamlbuild syntax.otarget byte.otarget src/batteries_help.cmo META shared.otarget
Finished, 1 target (0 cached) in 00:00:00.
+ ocamlfind ocamldep -package camomile,num,str -package camlp4.lib -pp camlp4of -pp camlp4of -modules libs/estring/pa_estring.mli > libs/estring/pa_estring.mli.depends
sh: camlp4of: command not found
Preprocessing error on file libs/estring/pa_estring.mli
Command exited with code 2.
Compilation unsuccessful after building 1 target (0 cached) in 00:00:00.
make: *** [all] Error 10
I’ll see if apt-get upgrade ocaml sorts it… Thanks for the tip 🙂
There is the package ocaml-batteries-included in Debian Squeeze/Sid. Better to use it than to recompile.
To fix your specific problem (if you still want to compile), install camlp4-extra.
To add to the former comment by Benedikt:
# let buf = Buffer.create 13;;
val buf : Buffer.t = <abstr>
# Buffer.add_substitute buf
    (fun str -> List.assoc str ["name", "Gaius"; "lang", "Python"])
    "Hello $(name) my name is $(lang).";;
- : unit = ()
# Buffer.contents buf;;
- : string = "Hello Gaius my name is Python."
Except this time my name is OCaml 🙂
Hey, I was just about to code my own wrapper for df -Ph, then I noticed you’d already written statvfs in ExtUnix. Merci beaucoup!
To read a printf-format from a string, you can use the Scanf module with the %{ fmt %} tag:
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Scanf.html
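For a string that is already in hand, the same module also provides Scanf.format_from_string, which may be the simplest route. A small sketch, assuming the run-time string is expected to have the same type as the literal "%s" (the names x, formatted and accepts are illustrative only):

```ocaml
(* Turn a run-time string into a typed format. The second argument is a
   literal format whose type the run-time string must match. *)
let x = "%15s"  (* could equally come from a file or user input *)

let formatted =
  let fmt = Scanf.format_from_string x "%s" in
  Printf.sprintf fmt "Gaius"

(* A string whose conversions do not match the expected type is rejected
   at run time with Scanf.Scan_failure rather than at compile time. *)
let accepts s =
  try ignore (Scanf.format_from_string s "%s"); true
  with Scanf.Scan_failure _ -> false

let () =
  Printf.printf "[%s] %b %b\n" formatted (accepts "%15s") (accepts "%d")
```

Flags such as the width survive the conversion, so this covers the justification case, but the number and types of the arguments are still fixed by the literal given as the second argument.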
Thanks, I shall have a play with that!
I can see I really need to stop doing stuff and have a thorough study of the standard libraries.
Another way to do it is with camlp4 macros. They can be type-safe too.
You could have a look at how we did SQL-safe type-safe string interpolation in PG’OCaml. There is also a camlp4 extension which does something similar to what you want, although the name of it escapes me right now …
I think I’d better learn to walk before I can run 😀
Very nice! And in retrospect, everything I needed to know was actually in the message from the type system!