When dealing with relational databases, there are three major kinds of data structure/object that we need. The actual names vary between platforms and APIs, but generally there will be, in order of creation:
- A connection. This is often called conn in the code. There will typically be one per thread per database, and there will be a corresponding object on the database server. In Oracle terms those would be called shadow processes and could be observed in the view v$session. Creating one requires establishing a connection to the database (e.g. over TCP/IP) that will persist for the lifetime of the thread, and authenticating.
- A statement handle. This is often called cur. Creating a statement handle allocates whatever data structures are necessary on the client and the server to actually execute a SQL statement. Usually it is possible to create many statement handles per connection. For a long time a well-known issue with JDBC was that it wouldn't release statement handles, which constituted a space leak (in Oracle terms, hitting the open_cursors limit, which prevents a runaway application from exhausting memory on the server).
- A result set. This is the actual tabular data returned by executing a SELECT statement. Typically called rs in code, this is usually a list of rows, each row being a list or array of values.
Some of this terminology varies depending on exactly how the API works, e.g. creating a statement handle may be implicit in generating a result set, or a result set might actually be a cursor that steps through the result set a row at a time (fetching one row each time it is called and then raising an exception such as EOF at the end) rather than a list of rows delivered before moving on to the next statement. Anyway.
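As a concrete illustration, here is how the three objects appear in Python's DB-API, using the standard library's sqlite3 module (a sketch; the table name is made up, and with SQLite the "server" is in-process, so opening the connection is cheap in a way it would not be over a network):

```python
import sqlite3

# Connection: typically one per thread per database; with a networked
# database, creating this is the expensive, authenticated step.
conn = sqlite3.connect(":memory:")

# Statement handle (a "cursor" in DB-API terms): cheap to create per query.
cur = conn.cursor()
cur.execute("create table t (n integer)")
cur.execute("insert into t values (1), (2)")
conn.commit()

# Result set: fetchall() delivers it as a list of row tuples; iterating
# the cursor instead would step through it one row at a time.
cur.execute("select n from t order by n")
rs = cur.fetchall()
print(rs)  # [(1,), (2,)]
conn.close()
```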
The proper scope of a result set is obvious: it lives within the function that executes the query and processes the results that are returned. That processing may involve poking some of the values into a different or wider scope (e.g. using them to update a GUI). Refactoring may necessitate a function returning a result set (e.g. a function that decides which SQL to run), but IMHO it would be cleaner for a function like that to return the SQL only, and for the scope of the result set to remain entirely within the calling function.
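The "return the SQL, not the result set" refactoring might be sketched like this in Python (function and table names are hypothetical, for illustration only):

```python
import sqlite3

def orders_sql(include_cancelled):
    # Decides which SQL to run, but returns only the text and its bind
    # variables -- the result set never escapes the caller's scope.
    if include_cancelled:
        return "select id from orders order by id", ()
    return "select id from orders where status <> ? order by id", ("cancelled",)

def list_orders(conn, include_cancelled=False):
    sql, params = orders_sql(include_cancelled)
    cur = conn.cursor()
    cur.execute(sql, params)
    # Plain values escape this function; the result set itself does not.
    return [row[0] for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, status text)")
conn.executemany("insert into orders values (?, ?)",
                 [(1, "open"), (2, "cancelled"), (3, "open")])
all_ids = list_orders(conn, include_cancelled=True)
open_ids = list_orders(conn)
conn.close()
```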
The scope of a statement handle is very nearly as obvious. While it can be passed around as an argument, it doesn't make sense to do so: statement handles naturally precede result sets, they're cheap to create and destroy (in comparison to establishing a connection or running a query), and while they can be reused, that only makes sense for running several queries in sequence in the same function. Using an existing statement handle to start a new query when its last result set has only been partially consumed is bound to cause mysterious problems, especially in the case where, for each row of one result set, another function is called that may execute some SQL, if that function takes the calling function's statement handle as an argument.
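The safe pattern is for the nested function to take the connection and create its own statement handle, never the caller's cursor. A Python sketch (table and function names invented for the example):

```python
import sqlite3

def detail_count(conn, dept):
    # Takes the connection, not the caller's cursor: it creates its own
    # statement handle, so the caller's partially-consumed result set
    # is left undisturbed.
    cur = conn.cursor()
    cur.execute("select count(*) from emp where dept = ?", (dept,))
    return cur.fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (name text, dept text)")
conn.executemany("insert into emp values (?, ?)",
                 [("a", "hr"), ("b", "hr"), ("c", "it")])

outer = conn.cursor()
outer.execute("select distinct dept from emp order by dept")
counts = {}
for (dept,) in outer:                        # still consuming outer's rows...
    counts[dept] = detail_count(conn, dept)  # ...while running nested SQL
conn.close()
```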
The question is, what should the scope of a connection be? It doesn’t make sense to create a new connection for every new query; firstly it’s usually an expensive operation, involving network roundtrips, forking a process on the server, etc. Secondly it is helpful for both the database and the DBA who looks after it to be able to account for the total activity of any application. And thirdly, doing so would raise the same question of scope, but for the authentication credentials rather than the connection object itself!
In a single-threaded application, it would be easy to make lda a named global variable and for each function to just assume it. I don't like that, as it discourages code reuse. It makes functions less "pure" – I know that a function that does I/O via the DB is impure anyway, but assuming the underlying tables don't change, executing identical SQL ought to result in identical results being returned, so a function that takes as an argument a value that is used as a bind variable ought to see the same results without relying on any global state.
The convention I am using at the moment makes the connection scope the main function and also passes it as an argument to each function that will execute SQL. E.g. (in pseudo-OCaml):
```ocaml
let function1 lda a b c =
  let sth = occi_create_statement lda "" in
  let (_, _, rs) = ora_select sth "some sql..." [|a; b; c|] in
  ...

(* main *)
let _ =
  let lda = occi_connect username password database () in
  let x = function1 lda a b c in
  function2 lda x;
  function3 lda;
  occi_disconnect lda
```
Or the same shape in Haskell, with HDBC (whose run takes a list of bind parameters, here empty):

```haskell
myFunction lda = do
  run lda "insert into mytable values (1)" []
  commit lda

main = handleSqlError $ do
  lda <- connectSqlite3 "test.db"
  myFunction lda
  disconnect lda
```
The proper Haskell solution probably involves the State monad somehow…
You might want to read this: http://blog.ezyang.com/2010/08/type-kata-controlled-sharing-of-references/
Also, this is a simpler type kata to maybe (maybe) shed some light on the technique above: http://blog.ezyang.com/2010/08/type-kata-newtypes/
Thanks – looks like some serious study is in order there!
Pingback: Scope in database code (2) | So I decided to take my work back underground
Pingback: LDAP (1) / Scope in database code (3) | So I decided to take my work back underground