Form vs Function in Applications

When you have been around the industry for a little while, you will see that nearly all applications, especially user-facing ones, are made of only three kinds of things: Forms, Reports and Workflows. A Form is simply an interface, a screen or a page, for the user to input some data. It is made of text fields, checkboxes, drop-down menus, and so on. A good Form offers prompting and validation, such as a text field that is supposed to be a number is actually a number and within a sensible range. It can be dynamic, updating fields depending on each other, all sorts of fancy things, and when the user is done, saves the results in a database.

A Report is a screen or page that accesses a database, often one populated at least in part by Form input, filters, sorts and aggregates the records, and presents them in a useful way, for example a graph or a table. Often Reports have a few Form-like elements too, so the user can refine or drill down into the data. A Workflow simply links Forms and Reports together. Behind both you might have Services (APIs). So 4 things, to make (nearly!) any application you can think of.

Let’s consider a website like Amazon. You go to it, and it runs the default Report. You refine that Report into a selection of things you’re interested in, and select one, fill in a Form to buy it, and the database is updated, one item transferred atomically from their inventory to your basket, by calling a Service. Then in the warehouse someone runs a Report that tells them what to pack and who to ship to, then they fill in the Form to update the status. There’s nothing in this, fundamentally, that couldn’t have been done on a 1971 IBM 3270, it just might not look as pretty to modern eyes, but all the functionality and interactivity would be there. If you’re buying a book, do you really need a photo of the picture on the cover to decide?

Once you start thinking at this level, suddenly it doesn’t matter that the lifetime of the current fashionable web page generation language is only a few years, or the latest Javascript framework will only be around for a few months. Let junior programmers (who I’ll define as those who define themselves primarily in terms of their language rather than their domain) worry about that and worry about running just to keep up with their peers jumping on the latest bandwagon. Concentrate on the meat of the application, and use tried and true languages and platforms to do it. This isn’t an anti-technology rant by the way, just a piece of advice for those ready for the next level.

Posted in Business, C++, Oracle | Leave a comment

Can anyone learn to code?

Anyone can learn to code, we are constantly told these days, but is it true? Yes… in the same sense that it is true that anyone can learn to play a musical instrument. Many people do, and many of them get a great deal of personal satisfaction from it, play in bands, write songs, meet new people, and lots of other good things.

But if a man selling guitars and guitar lessons tells you that playing the guitar is a guaranteed route to a well paid, secure career, then you would be wise to take that with a pinch† of salt. Or worse, if he tries to convince you that your kids should just learn to play the guitar and join a Silicon Roundabout startupband, instead of studying traditional subjects.

Similarly anyone can learn to cook, but the skills required to cater a dinner party for some friends don’t scale up to catering a formal reception for hundreds, or running a restaurant. Not even Jamie Oliver is stupid enough to try to convince anyone his books and TV shows will teach you that. If schools taught people to cook meals from basic ingredients then as a nation we would be healthier, wealthier and happier, but they can’t even manage that… Why does anyone think they can teach “coding”?

† Handful

Posted in Business, Random thoughts | Leave a comment


So, how was 2014 for me, in tech at least? Pretty good. It has been quite an interesting experience adapting to an all-proprietary tech stack (database, language, IDE, job scheduler) but I feel I am finally getting a grip on how all the moving parts hang together and how the sum is greater than the whole of the parts. So, professionally, I’m pretty happy and I find out early next year how happy they are with me 🙂

The systemd farce has prompted me, after nearly 20 years, to look at alternatives to Debian for my personal Unix(-like). Candidates for a replacement so far are DragonFly, Minix3 and Debian/HURD, tho’ I have to wonder how viable the latter is. The Debian project simply doesn’t have the manpower to maintain two parallel distros, once systemd has fully infected Linux. Windows 95 called, it wants its registry back. I have these running in VMs to kick the tyres right now. But more and more of my scarce time for personal projects is taken up with BBC Micro and Atari ST stuff now anyway…

Speaking of the Beeb, Elite: Dangerous is available now. I have to admit I was skeptical that it would ever ship, and I’ll hold my hand up to say I was wrong about that, good work Braben. However I won’t be playing. The lack of a single player mode is the deal breaker for me – you can play single-player but you are forced to participate in the shared universe, which runs in real time. But in a game I want the universe to stop when I’m not there and resume where I left off whether that’s an hour later or a month. And consider this: I bought my copy of the original Elite, in 1984, with pocket money I’d saved. Acornsoft ceased to exist as a separate entity in 1986 as the bottom dropped out of the home computer market. But I still play it in 2014, on original hardware. Should something happen to the servers owned and operated by Frontier, what happens to online-only Elite?

Nothing to report on the OCaml front. I’m still into it, but just haven’t had time to do much on OCI*ML. As I said previously, I had gotten it to the point at which it was useful for the work I was doing at the time. Features such as handling BLOBs are still outstanding, maybe I will have a chance to add that (and find a way to do it without my company claiming the IP). I’d like to maintain my connection to the OCaml community, because you never know… And I need to get properly up to speed on C++14, again to keep options open. (RIP Dr Dobbs). Will I ever return to the world of Oracle? I won’t rule it out but what I am doing now is a radically different way of thinking about databases and the applications that run on them; a few years of this can only strengthen my database and development career long term.

My prediction in wider tech in 2015 is this is the make-or-break year, either something is done about impossible-to-secure HTTP, SMTP and TCP/IP networks, or the public massively disengages from online services in general, sticking only to those websites with a) a good track record of security and b) an even better track record of making things right financially and logistically when their customers are impacted by a breach.

Posted in BBC, C++, Linux, Ocaml, OCIML, Oracle, Random thoughts | Leave a comment


Long time since I have updated here, a lot has been happening. For a start part 4 of my series on Oracle 12c new features is unlikely to be written since in October I started a new job which is non-Oracle. I’m now working for an organization with specialized enough needs to have written its own in-house database, the raw building blocks of which are Linux, C++ and Python, integrating soft-realtime data access, replication with flexible topology, sophisticated batch scheduling and a whole range of other features, globally distributed and operating 24/7. My new role is Application Development Lead.

It is a bit strange; for the last 15-odd years Oracle has been my bread-and-butter, from versions 7 → 11g in Prod and 12c in Dev, as a DBA and a developer, but it is good once in a while to step outside one’s comfort zone. Here’s wishing for an exciting and prosperous 2014 for all!

Posted in Oracle, Random thoughts | Leave a comment

Real World OCaml

In the course of my work with OCaml I have traditionally resisted using anything other than “pure” OCaml, and the facilities of the underlying OS. So rather than OMake or OASIS I just used plain, old-fashioned Makefiles. For package management, I relied on APT on Debian and MacPorts on OSX. And I avoided both Batteries and Core. No so much out of a fear of “backing the wrong horse” but just to make whatever I did as portable and easy to adopt as possible. And also, in the early days, I didn’t really know enough to choose anyway, and I wanted to work with the raw language rather than a high-level framework. Sort of like you can learn to program MFC without ever really learning C++.

But now Real World OCaml (which I have on pre-order) is in final draft, and spent some of yesterday getting my Debian and OSX environments set up for it†. One quirk I quickly found is that both have pkgconfig as a prereq, which for whatever reason, neither system had already, and that’s not mentioned on the page, maybe everyone else has it by default. I have a bunch of OCaml stuff in-flight at the moment – OCI*ML test suite and new features, some playing with Project Euler (solved 1516 problems at time of writing) and now working my way through this (trying not to skip to FFI which is a keen interest of mine, obviously!). That’s on top of playing with Oracle 12c, and I have barely started properly playing with C++11 new features yet!

† Links to the draft of the book will stop working at some point I expect…

Posted in Ocaml | 7 Comments

Strange Datetime Problem

While working on my unit tests, I came across a sporadic failure in inserting and selecting Datetimes to the database, so I wrote a quick test harness to see what’s going on:

open Ociml
open Printf

let () = 
  let lda =  oralogon "ociml_test/ociml_test" in
  let sth = oraopen lda in
  for i = 0 to 100 do
    orasql sth "truncate table test_date";
    let d = (Datetime (localtime (Random.float (time() *. 2.)))) in
    oraparse sth "insert into test_date values (:1)";
    orabind sth (Pos 1) d;
    oraexec sth;
    oracommit lda;
    orasql sth "select * from test_date";
    let rs = orafetch sth in
    match (rs.(0) = d) with
    |true -> print_endline (sprintf "Inserted %s, got %s, OK" (orastring d) (orastring rs.(0)))
    |false -> print_endline (sprintf "Inserted %s, got %s <-------- FAIL" (orastring d) (orastring rs.(0)))

This fails about 3% of the time, for reasons I cannot fathom, there seems to be no correlation with summer time. Here’s a set of results incase anyone else can figure it out:

gaius@debian7:~/Projects/ociml$ o
        Objective Caml version 3.12.1

	OCI*ML 0.3 built against OCI 11.2

not connected > #use "tests/";;
Inserted 31-Jul-1993 20:21:44, got 31-Jul-1993 20:21:44, OK
Inserted 01-Mar-2022 09:17:53, got 01-Mar-2022 09:17:53, OK
Inserted 18-Jan-1995 05:48:57, got 18-Jan-1995 05:48:57, OK
Inserted 24-Jul-2024 14:10:44, got 24-Jul-2024 14:10:44, OK
Inserted 12-Jan-1991 12:32:01, got 12-Jan-1991 12:32:01, OK
Inserted 03-Nov-2018 18:26:46, got 03-Nov-2018 18:26:46, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 08-Jan-2036 17:58:34, got 08-Jan-2036 17:58:34, OK
Inserted 31-May-2001 07:29:34, got 31-May-2001 07:29:34, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 07-Aug-1986 04:59:54, got 07-Aug-1986 04:59:54, OK
Inserted 08-Mar-2036 15:19:15, got 08-Mar-2036 15:19:15, OK
Inserted 23-Mar-1975 09:36:54, got 23-Mar-1975 09:36:54, OK
Inserted 26-Aug-1998 10:39:15, got 26-Aug-1998 10:39:15, OK
Inserted 23-Jan-1985 13:23:09, got 23-Jan-1985 13:23:09, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 02-Dec-2024 12:06:13, got 02-Dec-2024 12:06:13, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 20-Mar-1985 11:47:15, got 20-Mar-1985 11:47:15, OK
Inserted 25-Oct-1976 20:16:19, got 25-Oct-1976 20:16:19, OK
Inserted 29-Dec-1972 23:46:42, got 29-Dec-1972 23:46:42, OK
Inserted 17-Aug-1993 06:41:06, got 17-Aug-1993 06:41:06, OK
Inserted 26-Sep-2037 23:58:20, got 26-Sep-2037 23:58:20, OK
Inserted 15-Aug-1994 07:44:57, got 15-Aug-1994 07:44:57, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 13-Sep-2022 23:14:04, got 13-Sep-2022 23:14:04, OK
Inserted 23-Mar-2031 15:29:59, got 23-Mar-2031 15:29:59, OK
Inserted 27-Dec-1983 21:15:25, got 27-Dec-1983 21:15:25, OK
Inserted 29-Feb-2032 06:55:03, got 29-Feb-2032 06:55:03, OK
Inserted 17-Jan-2019 10:58:15, got 17-Jan-2019 10:58:15, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 09-Oct-2031 21:33:34, got 09-Oct-2031 21:33:34, OK
Inserted 21-Aug-2023 20:31:46, got 21-Aug-2023 20:31:46, OK
Inserted 20-Sep-1992 16:30:21, got 20-Sep-1992 16:30:21, OK
Inserted 11-Nov-1970 10:00:13, got 11-Nov-1970 09:00:13 <-------- FAIL
Inserted 24-Feb-1984 20:46:46, got 24-Feb-1984 20:46:46, OK
Inserted 19-May-2005 00:45:39, got 19-May-2005 00:45:39, OK
Inserted 22-Apr-1986 05:51:55, got 22-Apr-1986 05:51:55, OK
Inserted 10-Apr-1987 11:32:32, got 10-Apr-1987 11:32:32, OK
Inserted 28-May-2016 15:43:58, got 28-May-2016 15:43:58, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 11-Feb-2033 01:03:55, got 11-Feb-2033 01:03:55, OK
Inserted 10-Jul-2031 19:50:26, got 10-Jul-2031 19:50:26, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 23-Nov-1982 04:12:36, got 23-Nov-1982 04:12:36, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 11-May-2009 21:59:43, got 11-May-2009 21:59:43, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 07-Jun-2007 01:11:58, got 07-Jun-2007 01:11:58, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 14-Mar-2002 06:34:51, got 14-Mar-2002 06:34:51, OK
Inserted 09-Nov-2009 06:40:03, got 09-Nov-2009 06:40:03, OK
Inserted 30-Jul-2037 06:55:44, got 30-Jul-2037 06:55:44, OK
Inserted 26-Nov-2030 21:14:53, got 26-Nov-2030 21:14:53, OK
Inserted 05-Sep-1996 15:14:24, got 05-Sep-1996 15:14:24, OK
Inserted 07-Apr-1980 11:34:26, got 07-Apr-1980 11:34:26, OK
Inserted 02-Jan-2037 18:55:00, got 02-Jan-2037 18:55:00, OK
Inserted 14-Mar-1977 15:07:19, got 14-Mar-1977 15:07:19, OK
Inserted 16-Oct-1995 01:51:15, got 16-Oct-1995 01:51:15, OK
Inserted 04-Aug-1990 06:50:10, got 04-Aug-1990 06:50:10, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 23-May-2021 16:00:23, got 23-May-2021 16:00:23, OK
Inserted 17-Aug-1982 02:21:05, got 17-Aug-1982 02:21:05, OK
Inserted 27-Aug-2013 20:52:49, got 27-Aug-2013 20:52:49, OK
Inserted 13-Dec-2027 14:10:48, got 13-Dec-2027 14:10:48, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 29-Jun-2025 02:53:11, got 29-Jun-2025 02:53:11, OK
Inserted 24-Jul-2031 23:54:31, got 24-Jul-2031 23:54:31, OK
Inserted 15-Mar-1971 21:08:49, got 15-Mar-1971 20:08:49 <-------- FAIL
Inserted 27-Apr-1981 21:35:54, got 27-Apr-1981 21:35:54, OK
Inserted 22-Dec-2008 19:00:03, got 22-Dec-2008 19:00:03, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 11-Feb-1986 09:25:28, got 11-Feb-1986 09:25:28, OK
Inserted 24-Mar-1971 12:46:15, got 24-Mar-1971 11:46:15 <-------- FAIL
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 29-Sep-2016 04:41:02, got 29-Sep-2016 04:41:02, OK
Inserted 03-Dec-2000 09:58:00, got 03-Dec-2000 09:58:00, OK
Inserted 10-Dec-1991 18:08:10, got 10-Dec-1991 18:08:10, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 29-Oct-1999 04:17:24, got 29-Oct-1999 04:17:24, OK
Inserted 21-Oct-1988 14:15:16, got 21-Oct-1988 14:15:16, OK
Inserted 27-May-2022 04:21:34, got 27-May-2022 04:21:34, OK
Inserted 16-Oct-1982 05:25:39, got 16-Oct-1982 05:25:39, OK
Inserted 19-Nov-1998 14:57:54, got 19-Nov-1998 14:57:54, OK
Inserted 29-Jun-1975 10:06:11, got 29-Jun-1975 10:06:11, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 02-Jul-1996 03:08:55, got 02-Jul-1996 03:08:55, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 17-Jan-2016 17:46:01, got 17-Jan-2016 17:46:01, OK
Inserted 28-Feb-1993 16:57:25, got 28-Feb-1993 16:57:25, OK
Inserted 21-Dec-1977 16:54:30, got 21-Dec-1977 16:54:30, OK
Inserted 05-Mar-2003 12:58:52, got 05-Mar-2003 12:58:52, OK
Inserted 03-Jul-2023 17:06:21, got 03-Jul-2023 17:06:21, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK
Inserted 06-Aug-1981 02:57:40, got 06-Aug-1981 02:57:40, OK
Inserted 17-Nov-1983 23:55:58, got 17-Nov-1983 23:55:58, OK
Inserted 17-Mar-1999 22:40:16, got 17-Mar-1999 22:40:16, OK
Inserted 04-Oct-2023 18:55:54, got 04-Oct-2023 18:55:54, OK
Inserted 28-Feb-1991 09:52:00, got 28-Feb-1991 09:52:00, OK
Inserted 13-Dec-1901 20:45:52, got 13-Dec-1901 20:45:52, OK

The only thing I can see is that they’re near the Unix epoch, but why would that cause it to be exactly 1 hour out…? The latest version of the code is up on Github. The underlying C code is in oci_types.c.

Anyway, at least this illustrates the value of soak-testing with randomly generated data – I had never experienced this issue “in the wild”, not has it been reported.

Update: Fixed! Was a double-application of localtime. I never noticed it because at the company I was at when I wrote this, there was a policy of all machines everywhere in the world being in GMT all year round! The epoch thing was a red herring. I suppose the moral of the story is make sure your random data is really random…

Posted in Ocaml, OCIML, Oracle | 5 Comments

OCI*ML: Make Test

Before resuming feature implementation in OCI*ML I thought I ought to tighten up the test suite a bit, so I have started on a make test target, including some utilities for generating large test datasets, which should be useful elsewhere. In the process I uncovered a couple of bugs, which I also fixed. Once I’m happy with the level of coverage, I might even get around to doing LOBs…


It feels pretty good to be stretching the old OCaml muscles again 🙂

Posted in Ocaml, OCIML, Oracle | Leave a comment

Oracle 12c Launch Event (3)

This is the third segment of my series of posts on interesting new features in Oracle 12c. In this I will cover enhancements to RAC, DataGuard (DG) and RMAN.

Application Continuity (AC) is the next logical step in Oracle’s existing Transparent Application Failover capability. TAF works as follows: you start a big SELECT on one node of a RAC, and midway through, the node fails. The TAF-aware driver reconnects to another node and re-submits the select, discarding the rows that it has already retrieved, and finally delivers the complete result set to the application which, other than it taking a bit longer due to actually starting again, should not be aware that anything has happened, hence “transparent”. However when performing DML TAF is less capable, it is up to the application to catch the exception and retry the transaction, on a connection which has been reestablished by the driver. AC encapsulates much of this logic, enabling on reconnection a sessions’s state to be reproduced, and its work resumed or retried. However it requires the use of WebLogic or Universal Connection Pool – it is basically a transaction processing monitor like CICS. It’s great that we have this capability natively in the Oracle stack now, but I wish this was implemented in OCI and happened “for free” in any application that was simply recompiled against the 12c libraries, perhaps that will be possible in a later version. AC is built on top of an enabling technology called Transaction Guard (TG), which is in OCI. TG gives each transaction a unique identifier, which can be used to reliably determine the commit state of that transaction even following a database outage, as it is stored in the session in OCI. This means there is no risk for a TG aware application duplicating the same work – and even more cleverly, that there is no risk of another session doing some work that would make a subsequent attempt impossible (e.g. if a customer has bought the last item that we had in stock). For the DBA there is a new table LTXID_TRANS in the SYSAUX tablespace for viewing TG activity, and to enable the feature there is some additional configuration to do. I am going to see about incorporating this into OCI*ML eventually.

The next interesting feature is Global Data Services (GDS), an intelligent load balancing and connection routing layer, for use in replication scenarios. Sometimes replication is used for resilience, and sometimes it is used to improve performance, either offloading read-only work from a read-write database, or to make available a copy of the data locally in a remote location, to cut down on the latency of round trips (and with the added bonus that that site can continue to work even if losing connectivity). GDS works with both physical replicas (Active DataGuard) or logical (GoldenGate) – indeed as mentioned in part 1 these two products are now licensed together. At a previous employer, we had a very sophisticated replication topology implemented on Oracle 10g with Quest Shareplex, and I know a few old hands that still swear that Sybase Replication Server is the best thing since sliced bread. Oracle’s own story in this space has been a bit mixed; GG was an acquisition which deprecated Streams which in turn obsoleted Logical Standbys. Perhaps now there will be a solid platform on which we can build going forwards, with GDS, DG and GG, certainly the licensing gives the customer more flexibility.

Continuing with DG, there are a slew of new features here, including Far Sync (FS), the DBMS_ROLLING package for leveraging DG in upgrade scenarios, writeable temporary tables in read-only active dataguard DBs, unique sequences across a DG topology and automatic validation of readiness for DG role transitions. FS is designed for the “zero data loss” scenario. To meet this requirement, typically we would deploy across two, geographically dispersed datacentres, call them A and B. When a transaction is committed at A, it is replicated to B, when B acknowledges the commit, A commits and the call returns to the application. The latency on this system can quickly add up, double the disk I/O time plus the network roundtrip, plus a small computational overhead. Craig “The Hammer” Shallahamer has shown in his work on queuing theory how even a small increase in response time can massively impact a system’s throughput. FS therefore is a proxy that sits between databases A and B, in the same datacentre as B. Now the process is a transaction is committed at A, it is replicated to FS which immediately acknowledges it so that A can proceed as normal, and then passes it on to B which will then perform its disk I/O and commit it there, thus reducing overall response time while still getting the data safely away. I don’t fully understand yet how FS is related to or interacts with the fast start failover observer, if at all, but this is something I will study before attempting to deploy!

The next DG enhancement is the DBMS_ROLLING (DR) package, which as the name suggests is used for performing rolling upgrades, which is the term describing the approach of upgrading a group of databases that together offer a service one at a time, while the others maintain the service for the users. I have done this before in RAC by taking one node at a time out of service, and it was always possible to do manually with an Active DG configuration, but now the tools are provided to make it safer and faster. You designate one or more databases in the configuration as the “future primary” and its immediate standbys (leading group), this is the one upgraded and when everything is OK, the future primary takes over and the current primary (part of the trailing group) becomes the next upgrade target. As good DBA practice I like to cue up all the commands I will need in a spreadsheet or in scripts, having tested them procedure with VMs and had it “code reviewed” by another DBA, so that on the day to upgrade production, I can just execute the plan – DR builds this plan in a DB table. It looks like it will be a great tool for production DBAs, however it will be a long time before we can use it, not until 12.2 is out I expect, since you need to be on 12.1 to use it, and you need something to upgrade to… 🙂 Just as TG enables AC, DR is enabled by some underlying code that pre-validates DG role changes, similar to the way VCS assesses a node to see it if is a viable failover target. Again, something you can do yourself (the “one button failover” control panel I developed in my current job does this with a nice Bootstrap interface) but it is nice that there is an “official” solution now, it frees up the DBA to concentrate on value-adding tasks.

The most interesting enhancement to RMAN is the ability to recover a single table now. It is normal in a well-run DB to have several layers protecting the data. There are regular RMAN backups of the datafiles and more frequently of the archived redo logs enabling a database or a tablespace to be recovered to any point in time. But in the event that a user requires a single table back (and it can’t be gotten via Flashback or a lagging standby) this has a significant overhead – the DB must be restored to a different system, with sufficient storage, then the table can be copied over a DBLINK or via DataPump. This is a tedious process that doesn’t offer great turnaround times to the user, so many experienced DBAs also like to schedule regular logical backups using DataPump or previously exp. This has been normal practice for as long as I have been a DBA! At the cost of some – perhaps a lot of – extra disk space, this makes retrieving a table as it existed at the time of the export very easy, but it is more stuff to manage. The new RMAN merges these capabilities meaning that it can be done with a single tool, this is something that will make the DBAs life easier and make the end user happier too.

One final thing I would like to mention is the new ability to move a datafile online. We have been able to move a table from one tablespace to another (and hence from one or more datafiles, to different ones) but the process of moving a datafile has meant going offline however briefly. My usual approach has been to use RMAN to make a copy in the new location, set the original to read-only, recover the copy to apply changes made since the copy started, then switch them over with RMAN. This is a much better way than offlining the tablespace, moving the datafiles, renaming and re-onlining as it minimizes the unavailability. With ASM, there is a technique for moving an entire diskgroup from one set of LUNs to another online by exploiting a behavior of the rebalance operation which I have used when upgrading storage hardware. But now we can do it online with the ALTER DATABASE MOVE DATAFILE statement for an individual datafile on any filesystem. This is a feature that is genuinely new in the sense that it wasn’t possible to do it at all previously to 12c.

In part 4, performance and tuning enhancements.

Posted in OCIML, Oracle | Tagged , , , | Leave a comment

Oracle 12c Launch Event (2)

Continuing from the previous post, here are some more of the new features of Oracle 12c that I am excited about. The first is transparent redaction. Like the privilege analysis feature, this is something that you could implement using 11g features, such as views with inline functions or virtual columns and very careful allocation of grants, or even VPD. All of these methods carry a significant manageability overhead. Another alternative would be to do it in the application – e.g. in the middleware, performing a regexp-based substitution before sending the result to the GUI. That approach is fraught with problems – what if there is a bug and the redaction is not applied to a particular column, as it has been added recently or the app developer was unaware that it needed redacting, and of course, the app’s credentials will give it unrestricted access. The redaction facility allows us to apply a policy at the database without changing any existing applications as the datatypes will all match – a credit card number will still be a 16-digit number, stored in one, normal column in the database, and one app, say the one that talks to the payment gateway will see it in its true form, whereas another app, say the one that prints invoices, will see the redacted version. This should be of great interest to developers too according to the DRY principle.

A related feature is data masking. This is known by many names, such as anonymizing, or at one company I worked for, it was called “snipsnip”. There are many reasons that you don’t want real customer data in your dev and test environments, including that it violates the Data Protection Act. Another is the risk of using real email addresses and accidentally spamming customers or even worse, real credit card details and double-billing. And yet, there is a dilemma, since to be useful, tests must be done with representative data. Like several other 12c features, masking is a built-in mechanism that for something everyone had had to implement with a one-off solution for every database they managed. At the snipsnip company this was a major undertaking, many hours to run the script over a copy of production restored from backup, during which time it had to be securely fenced off from the rest of the development environment – any by their very nature, dev enviroments are not designed to be a securable as production, so there was a risk there. Data Masking however contains enhancement to make this process as fast as possible, for example rather than updating scattered rows in-place, it creates new contiguous blocks to store the masked rows, and to reduce maintenance overhead on hand-written scripts, it resolves foreign-key dependencies automatically. It is also possible to apply the masking policies on-the-fly during DataPump exports, so the sensitive data never even touches the development environment. However I personally prefer to refresh development environments via a production backup (which can then be masked, then cloned for rapid provisioning of development DBs). The reason is simple: it validates the backup and restore process, daily, for free. Ideally you would actually read back from the tape before it goes offsite rather than just copying the RMAN files from a disk staging area.

Continuing the security theme, the next feature to look at is the Audit Vault/Database Firewall, which are now integrated into a single product. This makes sense as really they are two sides of the same coin, validating access and detecting and logging unauthorized access. Previously I have used Imperva devices to fulfill this role – it was a while ago but I think the reason could have been the licensing cost of the Oracle solution. It will be worth revisiting in 12c to see if the value proposition stacks up now – as I often say there is no cheap or expensive in business, there in only worth the money, or not. The DB Firewall operates on a similar principle to conventional protocol aware deep packing inspecting firewalls, rather that just permitting or blocking based simply on IP address, it parses and analyzes SQL statements and accepts or rejects them accordingly. This is on top of privileges granted on tables to users inside the DB, and on top of redaction, providing defence in depth (this is what I was referring to in part 1 when I said there are better methods for securing data). Oracle does not log “insufficient privilege” errors normally, and its internal audit functions write to a table inside the database, SYS.AUD$, the firewall/vault solution neatly addresses both of these by providing an external repository that can log anything a firewall rule detects. At one site I worked at, a combination of over-zealous auditing and sloppy coding brought the datawarehouse to its knees – rather than a single select to bring back all the rows, the developer had somehow coded one select per row, and on a large table, had swamped the SYSAUX tablespace with millions of audit entries every time the ran their code! The firewall/vault solution allows us to separate these duties from the DB and where necessary for regulatory compliance, even from the DBAs.

Another kind of separation that has historically been missing from Oracle is that of the roles and responsibilities of DBAs. This is the same problem as in most (but not all) Unixes – an all-powerful root user. There was the SYSOPER role which despite the official word, I have never actually seen used anywhere. In fact connecting / as sysdba (or rman target /) seems to be the norm among working DBAs, despite the obvious potential for “fat fingers” causing catastrophe. The concept of “least privileges” in my mind as far as a trusted user like a DBA is concerned is less about security and more about protecting an administrator from himself! Anyway there are some new roles now, SYSBACKUP, SYSDG (for administering DataGuard) and SYSKM (for key management). I hope to make use of these when I have a 12c in Production 🙂

In part 3: New features in RAC, RMAN and DataGuard, and in part 4, some performance/monitoring enhancements.

Posted in Oracle | Tagged , , , , | Leave a comment

Oracle 12c Launch Event (1)

Back in July, I attended the Oracle Database 12c Launch event in London, and this is my rather belated writeup of some of the new features, some of which were introduced to me on the day, and others I had already started playing with in my Debian 12c VM.

The major new feature in 12c of course is multitenancy (MT). I have worked on big consolidation projects before, in the early 2000s at Reuters I was using VMware to crunch down 42U racks into only 3U or 6U, and the idea of a “pluggable database” will be familiar to any SQL Server DBA, but Oracle have taken these two concepts and refined them. This is of extreme interest to me when wearing my architect’s hat; one of the main problems faced when consolidating is accidentally creating physical dependencies where no logical dependency exists. E.g. if system A depends on system B, or both are used by the same end users, then they can be considered logically the “same thing” as far as scheduling downtime for upgrades or maintenance (not just the DB, the OS, the hardware, the storage, the network and everything else that actually makes a useful system). But say there is no relationship between A and B, and they have completely different sets of end users (timezones, business lines, etc). Now there is an interesting problem for the consolidator: do the benefits of consolidation actually outweigh the complexity of getting any downtime required from both communities at the same time? It is all too easy to fall into the trap of short term benefits from reduced hardware, licensing, datacentre space, and all the other reasons to consolidate, but in doing so create a system that is all but unmanageable. I see MT as another tool in the toolbox for exactly this kind of problem.

One of the use cases presented for MT was fast upgrades. In this scenario, a container database (CDB) at version X would be hosting one or more pluggable databases (PDB), and the upgrade process to version X+1 would be to create a new CDB at that level, then unplug/plug the PDBs into it. This is a very fast operation as only the metadata actually moves; if both systems can see the same storage, and if the patch is to the binaries only. Once again this is another tool in the toolbox; another approach would be to actually physically create the new version X+1 DB and replicate into it with GoldenGate† at the cost of more storage, but of course now an upgrade script can be run in the new one and both can be tested. But a better option would be a hybrid of the two approaches: use the new copy-on-write cloning mechanism offered by the MT engine, and replicate changes, or do a one-off upgrade. There is a lot more flexibility in 12c compared to 11g by the introduction of this layer between CDB and PDB. Another use case for MT is management of service levels (SLA). It is very common for an infrastructure group to offer “the business” bronze, silver and gold levels of service, where these might determine the speed at which a service can be recovered in the event of catastrophic failure, how frequently it is backed up, performance levels, etc. Therefore you can have a CDB at each level and unplug/plug DBs to move them between tiers. Sounds very easy but of course there is more to it than the DB – you would also be moving the underlying DBFs from cheaper to faster/more resilient storage if you moved from Bronze to Gold in practice! Outside of the Oracle Database Machines, I don’t know how seamless this would be, so there may be some integration effort involved. Probably it will still be faster than doing an RMAN duplicate into a new tier, but it is not quite as straightforward as the marketing blurb suggests.

Another new feature that I am excited about is Privilege Analysis. This will allow us to run an application and watch what objects it touches and fine-tune its grants accordingly – a bit like AppArmor does for applications. I am less worried these days about deliberate, malicious attempts to access data (in the context of GRANT SELECT etc, there are much bigger threats and much better strategies for mitigating them) than I am about creating accidental dependencies, e.g. app A comes to rely on tables maintained by team B, who decide they have a much better way of doing things and simply stop maintaining them, and A gets stale data back. This can be done the old-fashioned way with auditing and roles, but to be manageable, they are too coarse grained in my experience, and there is always the risk that a developer will change something without informing a DBA that new grants are needed, or using tables that a role has incidentally given them along with the one that actually needed at the time. I also think more needs to be done to educate developers that DBAs are not just the gatekeepers of data; we fulfill the vital development job of keeping track of how the system is plumbed from app to app, whereas individual developers tend to only see their app. It is not a criticism to say this, merely an observation.

Speaking of developers, the next feature on my list is pattern matching SQL. Not as in regexps which we have had for years, but patterns in data across many rows. This is radical new stuff; Oracle have shown their interest in this area with the integration of R and Exalytics but this is the first time it can be done in pure SQL, meaning fewer integration issues, less impedance mismatch (for want of a better term, between query and statistical languages) and hopefully much better performance on less hardware, since the same block buffer cache is used to fetch and process the data. The speaker presented some (obviously cherry-picked, but still impressive) code comparing the Java and SQL required to do some analysis; the SQL was much shorter and easier to read.

This is a thousand words now – in parts 2, 3 and 4 I will write about redaction policies, new features for resilience with RAC, DG and RMAN and more…

† GoldenGate and Active Dataguard licenses are now bundled.

Posted in Oracle | Tagged , , , | Leave a comment