I wrote some Python code a while ago for neatly formatting tabular output, for generating reports on the command line or from cron jobs to email. Typically this would be populated with a result set from a query “joined” with some computation done in the code. In order to make it possible to use OCaml for this type of report (or for the next version of an existing report) I reimplemented it, and here the two versions are side by side.
The first thing to notice is that they’re exactly the same length in lines, but there is more OCaml code. Admittedly this is probably not very idiomatic OCaml on my part (and it is in the Camlp4 syntax). Tokens in OCaml tend to be much longer, e.g. Array.length or String.length instead of an (overloaded) len. Does this make it more legible or maintainable? I’m not convinced, at least not for a snippet this size, but I’ve yet to use OCaml “in the large”. I get type safety – but I know that I will only be dealing with strings, ints or floats (dates are effectively strings for the purposes of this) and Python does the casts “for free”.
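To make the “for free” remark concrete, here is a minimal sketch (the row values and field widths are made up for illustration). Python’s %s conversion calls str() on whatever it is given, so ints, floats, strings and None all go through the same format string.

```python
# Hypothetical row mixing the kinds of values a report might hold.
row = (42, 3.5, "2008-01-31", None)

# One right-aligned %s field per column; the widths are arbitrary here.
fmt = "%6s %6s %12s %6s"

# %s applies str() to each value, so no explicit conversion is needed.
line = fmt % row
print(line)
```

In the OCaml version the rows are arrays of strings, so the caller has to do that conversion before calling add_row.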
Anyway, this forms part of a sub-project to put all the scaffolding in place so that OCaml can be a “drop-in” replacement for Python for my work. Most importantly, the compiled OCaml code has fewer external dependencies.
Python:
from sys import stdout

class Report:
    def __init__(self, cols=[]):
        "Takes column headings as an argument"
        self.widths = []
        self.columns = cols
        self.records = []
        for c in self.columns:
            self.widths.append(len(c))

    def addRow(self, r=[]):
        "Add one row, padding or truncating if necessary"
        row = r
        if len(row) > len(self.columns):
            row = row[:len(self.columns)]
        while len(row) < len(self.columns):
            row.append(None)
        for x in xrange(0, len(row)):
            if len(str(row[x])) > self.widths[x]:
                self.widths[x] = len(str(row[x]))
        self.records.append(row)

    def printReport(self, o=stdout):
        "Generate the report of column headings, a divider, then the rows"
        h, d = [], []
        for c in self.widths:
            h.append('%%%ds' % c)
            d.append('-' * c)
        fmt = ' '.join(h)
        print >>o, fmt % tuple(self.columns)
        print >>o, fmt % tuple(d)
        for r in self.records:
            print >>o, fmt % tuple(r)
OCaml:
class report (header : array string) =
object (self)
  (** Array holding the width of each column *)
  value mutable widths = ([||] : array int);
  (** List of arrays each of which is one row in the report *)
  value mutable rows = ([] : list (array string));
  (** Initialize widths to be the same as the widths of the header *)
  initializer do {
    widths := Array.make (Array.length header) 0;
    self#set_widths header
  };
  method private set_widths x =
    for i = 0 to min (Array.length x) (Array.length header) - 1 do {
      if widths.(i) < String.length x.(i) then
        widths.(i) := String.length x.(i)
      else ()
    };
  (** Return r.(i) padded with spaces to widths.(i) *)
  method private pad_column r i =
    sprintf "%*s" widths.(i) r.(i);
  method private print_row chan r = do {
    for i = 0 to min (Array.length r) (Array.length header) - 1 do {
      output_string chan (self#pad_column r i ^ " ")
    };
    output_string chan "\n"
  };
  (** Add a row to this report - column widths will automatically adjust @param r an array of string values*)
  method add_row r = do {
    self#set_widths r;
    rows := [r :: rows]
  };
  (** Generate the report to STDOUT @param chan an optional out_channel *)
  method print_report ?(chan = Pervasives.stdout) () = do {
    self#print_row chan header;
    self#print_row chan (Array.init (Array.length widths)
      (fun i -> String.make widths.(i) '-'));
    List.iter (fun x -> self#print_row chan x) (List.rev rows)(* get the rows in order added *)
  };
end;
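The Python listing above is Python 2 (xrange, print >>o). For anyone wanting to run it today, here is a rough Python 3 sketch of the same idea. It is an adaptation rather than the original code: the list copy in addRow and the o=None default are assumptions, added so that the caller's list is not modified and stdout is looked up at call time. The sample data at the bottom is made up.

```python
import io
import sys


class Report:
    """Python 3 sketch of the Report class (same method names as the listing)."""

    def __init__(self, cols=()):
        self.columns = list(cols)
        self.widths = [len(c) for c in self.columns]
        self.records = []

    def addRow(self, r=()):
        """Add one row, truncating or padding with None to the header length."""
        n = len(self.columns)
        row = list(r)[:n]                    # copy, then drop extra columns
        row += [None] * (n - len(row))       # pad short rows
        for i, cell in enumerate(row):
            self.widths[i] = max(self.widths[i], len(str(cell)))
        self.records.append(row)

    def printReport(self, o=None):
        """Write the column headings, a divider, then the rows."""
        o = sys.stdout if o is None else o
        # '%%%ds' % 5 yields '%5s': a right-aligned field five characters wide.
        fmt = ' '.join('%%%ds' % w for w in self.widths)
        print(fmt % tuple(self.columns), file=o)
        print(fmt % tuple('-' * w for w in self.widths), file=o)
        for r in self.records:
            print(fmt % tuple(r), file=o)


# Small demonstration with invented data.
rep = Report(["id", "name"])
rep.addRow([1, "alpha", "extra"])   # third value is truncated
rep.addRow([22])                    # missing value is padded with None
buf = io.StringIO()
rep.printReport(buf)
print(buf.getvalue(), end='')
```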
One small change to be slightly more idiomatic would be to change lines 35 and 36 of the OCaml snippet (the Array.init call inside print_report) to:
(Array.map (fun width -> String.make width '-') widths);
It saves a manual call to Array.length and results in code which is a bit clearer once you are familiar with functions like map.
You can also take advantage of currying on line 37 (the List.iter call):
List.iter (self#print_row chan) (List.rev rows)
That is perhaps less idiomatic and more a matter of personal preference.
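For readers coming from the Python side, a rough analogue of that second tip is functools.partial. The sketch below is only an illustration (print_row here is a stand-in function, not the method from the listings): fixing the first argument gives a one-argument function, much as self#print_row chan is handed to List.iter.

```python
import io
from functools import partial


def print_row(chan, row):
    """Stand-in for the print_row method: write one space-separated row."""
    chan.write(' '.join(row) + '\n')


rows = [["a", "b"], ["c", "d"]]
out = io.StringIO()

# Explicit wrapper, like (fun x -> self#print_row chan x).
for row in rows:
    (lambda r: print_row(out, r))(row)

# Partial application, like (self#print_row chan).
emit = partial(print_row, out)
for row in rows:
    emit(row)

print(out.getvalue(), end='')
```

Both loops write the same two lines, so which form to use is mostly a question of taste, as the comment says.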
Thanks for the tips – the latter one is what the Haskell community call “point-free” style, I believe.
Do you have any thoughts about Camlp4 vs “regular” syntax?
A bit of terminology – the syntax you are using is called the revised syntax. Camlp4 is the tool which allows syntax extensions to OCaml, including the revised vs original OCaml syntaxes.
The revised syntax cleans up some potential inconsistencies in the original syntax. It does not seem to be very commonly used, but camlp4 can translate code from revised to original and back so you can jump back and forth between the two with some effort.
report2 and report3 are maybe a little bit more typical for OCaml. We use lists more often than arrays. We also avoid keeping state in the object (like widths) when we can compute it at the end.
class report (header: string array) =
object (self)
(** Array holding the width of each column *)
val mutable widths = ([||] : int array)
(** List of arrays each of which is one row in the report *)
val mutable rows = ([] : (string array) list)
(** Initialize widths to be the same as the widths of the header *)
initializer
begin
widths <- Array.make (Array.length header) 0; self#set_widths header
end
method private set_widths x =
for i = 0 to min (Array.length x) (Array.length header) - 1 do
if widths.(i) < String.length x.(i) then
widths.(i) <- String.length x.(i)
done
(** Return r.(i) padded with spaces to widths.(i) *)
method private pad_column r i =
Printf.sprintf "%*s" widths.(i) r.(i)
method private print_row chan r =
for i = 0 to min (Array.length r) (Array.length header) - 1 do
output_string chan (self#pad_column r i ^ " ");
done;
output_string chan "\n"
(** Add a row to this report - column widths will automatically adjust @param r an array of string values*)
method add_row r =
self#set_widths r;
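rows <- r :: rows
(** Generate the report to STDOUT @param chan an optional out_channel *)
method print_report ?(chan = Pervasives.stdout) () =
self#print_row chan header;
self#print_row chan (Array.init (Array.length widths) (fun i -> String.make widths.(i) '-'));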
List.iter (fun x -> self#print_row chan x) (List.rev rows)(* get the rows in order added *);
()
end
(* A little long, but support adding data of different size e.g. 4 cols with 3
* headers
*)
class report2 (header: string array) =
object (self)
(** List of arrays each of which is one row in the report *)
val mutable rows = ([] : (string array) list)
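method add_row r =
rows <- r :: rows
method print_report ?(chan = Pervasives.stdout) () =
begin
let map2 f lst1 dflt1 lst2 dflt2 =
let rec map2' =
function
| e1 :: tl1, e2 :: tl2 ->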
let e = f e1 e2 in
e :: map2' (tl1, tl2)
| [], e2 :: tl2 ->
let e = f dflt1 e2 in
e :: map2' ([], tl2)
| e1 :: tl1, [] ->
let e = f e1 dflt2 in
e :: map2' (tl1, [])
| [], [] ->
[]
in
map2' (lst1, lst2)
in
let lst =
List.map Array.to_list (header :: List.rev rows)
in
let widths =
List.fold_left
(fun widths row ->
map2
(fun s len -> max (String.length s) len)
row ""
widths 0)
[]
lst
in
let pads =
List.map (fun len -> String.make (len + 1) ' ') widths
in
let lst =
(* Introduce header split *)
match lst with
| hdr :: data ->
hdr
::
(List.map (fun len -> String.make len '-') widths)
::
data
| [] ->
[]
in
List.iter
(fun row ->
let _u : unit list =
map2
(fun s (pad, len) ->
let s_len = String.length s in
String.fill pad 0 len ' ';
String.blit s 0 pad (len - s_len) s_len;
output_string chan pad)
row ""
(List.combine pads widths) ("", 0)
in
output_string chan "\n")
lst;
flush chan;
()
end
end
(* Same version as report, cannot add extra cols
*)
class report3 (header: string array) =
object (self)
(** List of arrays each of which is one row in the report *)
val mutable rows = ([] : (string array) list)
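method add_row r =
rows <- r :: rows
method print_report ?(chan = Pervasives.stdout) () =
begin
let lst =
List.map Array.to_list (header :: List.rev rows)
in
let widths =
List.fold_left
(fun widths row ->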
List.map2
(fun s len -> max (String.length s) len)
row widths)
(List.map (fun _ -> 0) (Array.to_list header))
lst
in
let pads =
List.map (fun len -> String.make (len + 1) ' ') widths
in
let lst =
(* Introduce header split *)
match lst with
| hdr :: data ->
hdr
::
(List.map (fun len -> String.make len '-') widths)
::
data
| [] ->
[]
in
List.iter
(fun row ->
List.iter2
(fun s (pad, len) ->
let s_len = String.length s in
String.fill pad 0 len ' ';
String.blit s 0 pad (len - s_len) s_len;
output_string chan pad)
row
(List.combine pads widths);
output_string chan "\n")
lst;
flush chan;
()
end
end
let data =
[
[|"01"; "abcdef"; "ghfig"|];
[|"02"; "abcdef"; "ghfidfsg"|];
[|"03"; "abcdef"; "ghfig"|];
[|"04"; "abcd"; "ghfazeig"|];
[|"05"; "abcdefffff"; "gherfig"|];
]
let () =
let rprt = new report [|"a"; "b"; "c"|] in
let () =
List.iter rprt#add_row data;
rprt#print_report ()
in
let rprt = new report2 [|"a"; "b"; "c"|] in
let () =
List.iter rprt#add_row data;
rprt#add_row [|"z"; "z"; "z"; "z"|];
rprt#print_report ()
in
let rprt = new report3 [|"a"; "b"; "c"|] in
let () =
List.iter rprt#add_row data;
rprt#print_report ()
in
()
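The idea used in report2 (keep only the rows, and work out the widths when printing) can be sketched in Python as well. This is a hedged illustration, not a translation of the code above: render is a hypothetical helper, cells are assumed to be strings already, and short rows are padded with empty cells.

```python
def render(header, rows):
    """Return the report lines; cells are assumed to be strings already."""
    table = [list(header)] + [list(r) for r in rows]
    ncols = max(len(r) for r in table)
    # Widths are derived from the data here, not tracked on every add.
    widths = [max(len(r[i]) for r in table if i < len(r))
              for i in range(ncols)]
    divider = ['-' * w for w in widths]
    lines = []
    for r in [table[0], divider] + table[1:]:
        padded = r + [''] * (ncols - len(r))   # short rows get empty cells
        lines.append(' '.join(c.rjust(w) for c, w in zip(padded, widths)))
    return lines


# A 3-column header with one 4-column row, as in the test data above.
lines = render(["a", "b", "c"],
               [["01", "abcdef", "ghfig"], ["z", "z", "z", "z"]])
print('\n'.join(lines))
```

Because nothing is stored besides the rows, a wider row simply produces an extra column when the report is rendered.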
The code in my previous comment is not well displayed, here is a better version (I hope).
BTW, I have converted the revised syntax to standard OCaml syntax.
Very interesting, thanks!
Here is a short version (which also allows adding a row of 4 columns with a header of 3). I wrote the code quickly; I hope it respects your specs.
Test:
Hell, the system ate part of the code (because of the ← ?). Let us try again (the possibility to preview would be welcome!)
Thanks!
When I first wrote it (in Python) I couldn’t think of a case in which getting a row with more columns than the header wouldn’t be a bug (e.g. accidentally splitting on a space or a comma in some result from the DB), so I truncated, allowing the potentially lengthy report generation to continue without crashing. I could, however, think of cases where there would be fewer columns (e.g. a computation that returned no meaningful result), in which case I padded.
BTW, do not put types in your comments: they are already in the code (or can be displayed after compilation — e.g. with C-c C-t in Emacs) and you may want to change the data structures later… with the risk of forgetting to change your comments.