OCI*ML: 100× Faster with Bulk Operations

Continuing with my fixation on reducing the number of network roundtrips in OCI*ML I have enhanced the orabindexec function to perform bulk DML, that is to say, batching many similar INSERT, UPDATE or DELETE statements on the same table into a single OCI call. The function signature is unchanged, taking a list of arrays each of which represents one row to bind into a statement prepared with oraparse, so existing code will get a boost with no modification. The size of the batch is simply the size of the list; effectively this function is the opposite of orafetchall.

$ ./ocimlsh 
        Objective Caml version 3.12.0

not connected > orabindexec;;
- : Ociml.meta_statement -> Ociml.col_value array list -> unit = <fun>
not connected > orafetchall;;
- : Ociml.meta_statement -> Ociml.col_value array list = <fun>
not connected >

Using the same system from the previous benchmark on prefetching, a 1G client VM with a ~35ms ping/~70ms TNS ping time over a WAN/VPN connection to the server, loading a 10,000-line CSV file with a COMMIT at the end shows the following results:

Batch size Time (s) Rows/sec Speedup factor
1 855 11.70 1
10 93.53 106.92 9.14
100 20.45 489.00 41.79
1000 17.33 577.03 49.32
2000 17.85 560.22 47.88
5000 25.01 399.84 34.17
10000 36.28 275.63 23.56

Repeating the same test locally on the server shows just what a brutal effect network latency has on SQL*Net, while showing an even greater improvement with bulk DML:

Batch size Time (s) Rows/sec Speedup factor
1 11.75 851.06 1
10 1.27 7874.02 9.25
100 0.22 45454.55 53.41
1000 0.11 90909.09 106.82
2000 0.12 83333.33 97.92
5000 0.15 66666.67 78.33
10000 0.19 52631.58 61.84

Interesting to see that the optimal batch size is the same for this particular workload.

While this was somewhat more work to implement than simply switching on OCI’s built-in prefetching mechanism, the numbers show that it was definitely worthwhile. While the rows/sec over the WAN/VPN were lower than a SELECT statement, as to be expected with a pre-warmed block buffer cache, the speedup from tuning is nearly 50×, and over 100× locally. OCI*ML is definitely fast enough now, without even any tricks like INSERT APPEND or NOLOGGING to be used as an ETL tool in place of Perl, as well as for monitoring and report generation.

It also demonstrates the value of experimental over theoretical tuning, bigger batches are not always better. In part this will be because two copies of the batch will exist in memory: one on the OCaml side as it is being parsed and the list built, and another on the OCI side as it being reconstructed there in a suitable format.

About Gaius

Jus' a good ol' boy, never meanin' no harm
This entry was posted in Ocaml, OCIML, Oracle. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s