TIP #185 Version 1.4: Null Handling

This is not necessarily the current version of this TIP.


TIP:185
Title:Null Handling
Version:$Revision: 1.4 $
Authors: John H. Harris <JHHarris at valley dot net>
Zubair <sheik_zubairahmed at yahoo dot com>
State:Draft
Type:Project
Tcl-Version:8.5
Vote:Pending
Created:Thursday, 08 April 2004
Keywords:Tcl, absent value

Abstract

can I use null statements within switch. Give example

Rationale

Tcl deals with strings, the universal medium for representing data. It lacks, however, the intrinsic ability to represent missing data, or nulls. A <b style="color:black;background-color:#ffff66">null</b> datum (or just "<b style="color:black;background-color:#ffff66">null</b>") is very different from an empty string. A database may represent Prince's last name as "" -- his name is know and it is an empty string, but if a respondent forgets to give his weight on a questionnaire, he is not weightless; his weight is not ""; it is unknown, or <b style="color:black;background-color:#ffff66">null</b>. Nulls are common in real-world data, particularly data obtained from interviews or questionnaires. Because of this, most modern database engines and statistics tools recognize nulls. A large fraction of the applications we are writing are client programs to such databases, though the user is not always aware of it.

The programmer can use whatever is provided with the database application program interface (API) he is using, but most Tcl APIs provide none, probably because Tcl itself lacks nulls.

A Tcl programmer writing an interface to a database must improvise something to deal with nulls. If the data representation domain does not include an empty string, as for an integer or date, then an empty string may suffice, but arbitrary-string data is common, so something fancier is often needed.

For example, here is an approach I used in a current project built on SQLite [1]. I prepend each nonnull string with an apostrophe. Thus, nulls are universally represented by an empty string. I accomplish this using the coalesce() function provided by SQLite, e.g., if I want

  select id, name from persons

I must ask for

  select id, coalesce(''||name,'') as name from persons

I wrote a layer that automates this ugly mess. To keep it simple (and because SQLite is untyped) it prepends all fields with an apostrophe, even ones that do not need it. I deal with the apostrophes later -- testing for nullness is easy, but the apostrophe gets in the way for most other I/O and processing.

This is more than you wanted to know about my personal problems, but you can be sure this kind of thing is happening over and over with hundreds of programmers and applications.

Now let us imagine how it might be simplified, assuming Tcl <b style="color:black;background-color:#ffff66">null </b>support and a revamped SQLite API:

  set result [ db eval {select id, name from person}
  foreach -<b style="color:black;background-color:#ffff66">null</b> <unknown> { id name } $result {
     puts "$id: $name" }

I will explain more later, but basically, the database returns <b style="color:black;background-color:#ffff66">null </b>information on every query, so there is no fancy SQL code. The -<b style="color:black;background-color:#ffff66">null</b> option tells foreach how to represent a <b style="color:black;background-color:#ffff66">null</b> when it assigns id and name.

Specification

It would be nice if <b style="color:black;background-color:#ffff66">null</b> handling could be added to, or with, a single command suite. If we could just extend, say, the dict command, these objects would be ideal media for database APIs. Unfortunately, it seems impossible to implement any change with out involving Tcl's string implementation, list syntax, and command interpreter.

Representing Nulls

Not all commands will understand nulls. We will speak of <b style="color:black;background-color:#ffff66">null</b>-smart and <b style="color:black;background-color:#ffff66">null</b>-dumb commands. All commands follow these rules:

  1. Nulls represent a lack of information.

  2. An empty string is not <b style="color:black;background-color:#ffff66">null</b>.

  3. Nulls can not be made "unnull", merely by being processed. This is equivalent to creating information out of thin air. Substituting a nonnull value for a <b style="color:black;background-color:#ffff66">null</b> must programmed explicitly.

  4. For <b style="color:black;background-color:#ffff66">null</b>-smart commands, nulls propagate. A <b style="color:black;background-color:#ffff66">null</b> combined with any nonnull is <b style="color:black;background-color:#ffff66">null</b>. Appending a <b style="color:black;background-color:#ffff66">null</b> to a string, or substituting a <b style="color:black;background-color:#ffff66">null</b> into a string nulls the entire string.

  5. Logical comparisons with nulls with respect to magnitude or identity evaluate to <b style="color:black;background-color:#ffff66">null</b> (i.e., unknown).

  6. <b style="color:black;background-color:#ffff66">Null</b>-dumb commands must treat nulls as empty strings.

Strings Objects

An empty string is not a <b style="color:black;background-color:#ffff66">null</b>. We have the further constraint that we must be able to handle any arbitrary binary string, including <b style="color:black;background-color:#ffff66">null</b> (in the sense of 0x00) bytes. Clearly the implementation must be augmented; we must add a <b style="color:black;background-color:#ffff66">null</b> flag. In implementation, <b style="color:black;background-color:#ffff66">null</b> strings would probably have their string part set to an empty string, to accommodate <b style="color:black;background-color:#ffff66">null</b>-dumb commands, as in rule 6 above. Thus, dumb commands can simply ignore the <b style="color:black;background-color:#ffff66">null</b> flag.

List Syntax

It is easy to add a <b style="color:black;background-color:#ffff66">null</b> flag to the string class. To be of any use, however, we need to pass a <b style="color:black;background-color:#ffff66">null</b> string to commands, which means embedding nulls in lists. We can indicate this much in the same way the {expand}{} syntax TIP #157 works, by exploiting otherwise illegal list syntax. I propose using {<b style="color:black;background-color:#ffff66">null</b>}! as <b style="color:black;background-color:#ffff66">null</b> representation for lists. (Perhaps new syntax could be a rider on the {expand}{} rule. I fear that increasing the number of rules could scuttle this TIP.) At present this syntax (like {expand}{}) produces an "extra characters after close-brace" error.

Note that the string "{<b style="color:black;background-color:#ffff66">null</b>}!" is not interpreted as a <b style="color:black;background-color:#ffff66">null </b>string, instead it is a nonnull string that is also a well-formed list with one <b style="color:black;background-color:#ffff66">null</b> element.

Interpreter Behavior

If the command interpreter encounters the word "{<b style="color:black;background-color:#ffff66">null</b>}!" in a list, this element is replaced by a <b style="color:black;background-color:#ffff66">null</b> string in the array of strings passed to the command.

When the Tcl command interpreter encounters a word that contains a variable substitution, if the variable contains a <b style="color:black;background-color:#ffff66">null</b>, the interpreter will behave as if the entire word were replaced with "{<b style="color:black;background-color:#ffff66">null</b>}!", and pass a <b style="color:black;background-color:#ffff66">null</b> string to the command in its place.

Command substitutions behave the same way when a command returns a <b style="color:black;background-color:#ffff66">null</b>.

Manipulating Nulls

We need not be decided how commands should respond to nulls all at once. Indeed, it would be better to let this evolve as we gain experience with this new dimension of data. At first we will need at least some basic support to create, copy and test nulls.

The String Command

The simplest approach is simply to test for nullness. The obvious candidate for this test would be an addition to the string is command suite:

  if { [ string is <b style="color:black;background-color:#ffff66">null</b> $s ] } {
     error {Missing data for s.} }

The string command could also be used to generate nulls:

  set <b style="color:black;background-color:#ffff66">null</b> [ string <b style="color:black;background-color:#ffff66">null</b> ]

The Set Command

We can use set to create a variable that contains a <b style="color:black;background-color:#ffff66">null</b> string:

   % set s {<b style="color:black;background-color:#ffff66">null</b>}!

Now s exists and has its <b style="color:black;background-color:#ffff66">null</b> flag set. This is such a natural syntax that it probably make string <b style="color:black;background-color:#ffff66">null</b> unnecessary.

We can retrieve a <b style="color:black;background-color:#ffff66">null</b> value with set -- suppose puts is dumb and sees the <b style="color:black;background-color:#ffff66">null</b> as an empty string:

  % puts "s equals [ set s ]"

Set returns a <b style="color:black;background-color:#ffff66">null</b>. In this case, the <b style="color:black;background-color:#ffff66">null</b> is in a command substitution, and it nulls the entire string being passed to puts, which, being <b style="color:black;background-color:#ffff66">null</b>-dumb, outputs an empty string and newline. This may be unsatisfactory. We would rather substitute a nonnull, perhaps an empty string, or "<NULL>".

When and what to substitute is an ad hoc programming choice, so should be an option. Here is the set command with an option that tells how to represent a <b style="color:black;background-color:#ffff66">null</b>:

  % set s {<b style="color:black;background-color:#ffff66">null</b>}!
  % set -<b style="color:black;background-color:#ffff66">null</b> <NULL> s
  <NULL>
  % set -<b style="color:black;background-color:#ffff66">null</b> huh? s
  huh?

The default value for this option is a <b style="color:black;background-color:#ffff66">null</b>, so [set s] can be used as a direct substitution for $s, as you would expect.

While we must allow for the worst case of representing a <b style="color:black;background-color:#ffff66">null</b> amidst the set of arbitrary binary strings, in practical data this seldom occurs. When it does, we must resort to an explicit test and conditional execution, but more often there is some gap in the domain of valid data that can be used to represent a <b style="color:black;background-color:#ffff66">null</b>. We have already seen the example of using and empty string for a <b style="color:black;background-color:#ffff66">null</b> integer or date. For other data types there are better choices. The programmer knows these gaps and can choose a string that fits in the gap and is also easily understood by humans or other software.

The -<b style="color:black;background-color:#ffff66">null</b> option should have the same meaning when applied to any <b style="color:black;background-color:#ffff66">null</b>-smart command: if the return value is a <b style="color:black;background-color:#ffff66">null</b>, change it to the option value.

The set command can use a similar option to assign a <b style="color:black;background-color:#ffff66">null</b>. This time the option -nullify tells what value, by exact match, should be replaced by a <b style="color:black;background-color:#ffff66">null</b>.

  % set s <b style="color:black;background-color:#ffff66">NULL
</b>|  <b style="color:black;background-color:#ffff66">NULL
</b>|  % set -nullify <b style="color:black;background-color:#ffff66">NULL</b> t $s
  % puts '[ set -<b style="color:black;background-color:#ffff66">null</b> Void t ]'
  'Void'

Notice that string s above is not a <b style="color:black;background-color:#ffff66">null</b>, it is the string "<b style="color:black;background-color:#ffff66">NULL</b>". The second command translates it to a true <b style="color:black;background-color:#ffff66">null</b>, and the third translates it to "Void".

The -nullify option provides a way of assigning a <b style="color:black;background-color:#ffff66">null</b> to a variable that is independent of list syntax:

  % set -nullify {} s {}

The -nullify option should have the same meaning when applied to any <b style="color:black;background-color:#ffff66">null</b>-smart command: if an argument value is an exact match to the options value, change it to a <b style="color:black;background-color:#ffff66">null</b>.

List Commands

Most smart commands can just test the <b style="color:black;background-color:#ffff66">null</b> flag of their arguments and take appropriate action. A few commands, such as lindex, lset, join, and split must understand list <b style="color:black;background-color:#ffff66">null </b>syntax. Consider the string:

  % set s {a {b0 {<b style="color:black;background-color:#ffff66">null</b>}! b2} c}

This string represents a list whose second element is a list containing a <b style="color:black;background-color:#ffff66">null</b>. We expect this behavior:

  % lindex -<b style="color:black;background-color:#ffff66">null</b> ~ $s 1 1 
  ~
  % lset -nullify {} s 1 2 {}
  a {b0 {<b style="color:black;background-color:#ffff66">null</b>}! {<b style="color:black;background-color:#ffff66">null</b>}!} c
  % lindex -<b style="color:black;background-color:#ffff66">null</b> NUL $s 1 2
  NUL

Join creates a string from a list and must understand -<b style="color:black;background-color:#ffff66">null</b>. Notice that the second command below returns <b style="color:black;background-color:#ffff66">null</b>, by rule 4.

  % join -<b style="color:black;background-color:#ffff66">null</b> void {a {<b style="color:black;background-color:#ffff66">null</b>}!}
  a void
  % join {a {<b style="color:black;background-color:#ffff66">null</b>}!}

Split creates a list from a string and must understand -nullify:

  % split {a {<b style="color:black;background-color:#ffff66">null</b>}!}
  a {{<b style="color:black;background-color:#ffff66">null</b>}!}
  % split -nullify <b style="color:black;background-color:#ffff66">NULL</b> {a <b style="color:black;background-color:#ffff66">NULL</b>}
  a {<b style="color:black;background-color:#ffff66">null</b>}!

Expr and Control-flow Commands

Nulls can be tested using string is <b style="color:black;background-color:#ffff66">null</b>, but testing occurs so often in practice that we need to have expr and the control commands behave properly, allowing three-valued logic -- true, false, and <b style="color:black;background-color:#ffff66">null</b>. (In this context, a logic value of <b style="color:black;background-color:#ffff66">null</b> is often called "unknown".) Expr should recognise the -<b style="color:black;background-color:#ffff66">null</b> option. These examples illustrate typical three-valued logic tautologies:

  % set u {<b style="color:black;background-color:#ffff66">null</b>}!
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u }
  unk
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u && 1 }
  1
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u && 0 }
  unk
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u || 0 }
  0
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u == $u }
  unk
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u != 1 }
  unk
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u eq {} }
  unk
  % expr -<b style="color:black;background-color:#ffff66">null</b> unk { $u > 0 }
  unk

Notice that, logic expressions containing nulls may may have nonnull results. This my seem like a violation of rule 5, but actually is is just a special kind of lazy logic. Expr can simply ignore the <b style="color:black;background-color:#ffff66">null</b> term because it is tautologously irrelevant.

The control commands if, while, and for all throw errors if the expression evaluates to <b style="color:black;background-color:#ffff66">null</b>.

We can also use nulls to represent undefined mathematical results, or NaNs (not a number), in the terminology of IEEE 754 floating point arithmetic [2]. This allows Tcl to give the programmer access to hardware features that are currently hidden. To do this we need a -nocomplain option:

  % expr -nocomplain -<b style="color:black;background-color:#ffff66">null</b> nan { log(-1) }
  nan

We can use the -<b style="color:black;background-color:#ffff66">null</b> option to assign a usable value if a goes to zero:

  % set INF 1e16
  % set x [ expr -nocomplain -<b style="color:black;background-color:#ffff66">null</b> $INF { 5 / $a } ]

Nulls can propagate sensibly through a computation and give a useful result when, without them, the same expression would have throw an exception. Again, suppose a goes to zero:

  % expr -nocomplain -<b style="color:black;background-color:#ffff66">null</b> unk { $x / $a > 0.54 || $c < 0.042 }
  1

Other Basic Commands

We need not make all commands <b style="color:black;background-color:#ffff66">null</b>-smart immediately. Old commands can treat nulls as empty strings and function as before. They may be less useful than they could be, but nothing need break, because old code contains nothing that would introduce nulls in the first place. It is up to the programmer who decides to use Tcl's <b style="color:black;background-color:#ffff66">null</b> facility to be aware of which commands respond intelligently to nulls. Perhaps a greater danger is that, once nulls are introduced, evolving commands may break applications that use nulls. Because of this, it would be wise to choose the initial set of <b style="color:black;background-color:#ffff66">null</b>-smart commands with care.

I propose modifying the following commands, if necessary, to respond to nulls appropriately: lset, foreach, format, the list commands, return, set, subst, string, <b style="color:black;background-color:#a0ffff">switch</b>. Except as noted below, this means recognizing the -<b style="color:black;background-color:#ffff66">null</b> option.

Now let us discuss the modifications needed for these commands.

String command suite

Except for string is commands, all string subcommands should return <b style="color:black;background-color:#ffff66">null</b> if any argument is <b style="color:black;background-color:#ffff66">null</b>, simply because the commands are meaningless when applied to a <b style="color:black;background-color:#ffff66">null</b> string.

Note that a <b style="color:black;background-color:#ffff66">null</b> string is not unequal to any given string. Batman may or may not be Bruce Wayne -- his identity is unknown. Similarly, two <b style="color:black;background-color:#ffff66">null</b> strings are neither equal nor unequal to each other. Batman may or may not be the same guy as the Lone Ranger.

String is

Except for string is <b style="color:black;background-color:#ffff66">null</b>, all string is subcommands return 0 with a <b style="color:black;background-color:#ffff66">null</b> argument.

Switch

The <b style="color:black;background-color:#a0ffff">switch</b> command should recognize the -<b style="color:black;background-color:#ffff66">null</b> option. Having a <b style="color:black;background-color:#a0ffff">switch</b> leg that matches <b style="color:black;background-color:#ffff66">null</b> is easy to imagine:

  <b style="color:black;background-color:#a0ffff">switch</b> $s {
     a { do this }
     b { do that }
     {<b style="color:black;background-color:#ffff66">null</b>}! { punt } }

but by rule 5 it would never execute -- nothing matches a <b style="color:black;background-color:#ffff66">null</b>. Instead, use -<b style="color:black;background-color:#ffff66">null</b>:

  <b style="color:black;background-color:#a0ffff">switch</b> -<b style="color:black;background-color:#ffff66">null</b> <b style="color:black;background-color:#ffff66">NULL</b> $s {
     a { do this }
     b { do that }
     <b style="color:black;background-color:#ffff66">NULL</b> { punt } }

If you must match any other nonnull string separately from nulls, use -glob and catch the <b style="color:black;background-color:#ffff66">null</b> with default:

  <b style="color:black;background-color:#a0ffff">switch</b> -glob -<b style="color:black;background-color:#ffff66">null</b> <b style="color:black;background-color:#ffff66">NULL</b> -- $s {
     a { do this }
     b { do that }
     * { any other nonnull }
     default { punt } }
Foreach

The Foreach command recognizes both -<b style="color:black;background-color:#ffff66">null</b> and -nullify.

Format

The format command should understand the -<b style="color:black;background-color:#ffff66">null</b> option.

The list commands

In general, all the list commands treat nulls as distinct elements. Nulls never disappear from a list. llength includes them in the count. Commands that convert lists to or from strings (lappend, lindex, linsert, list, lset, split, join) should understand the -<b style="color:black;background-color:#ffff66">null</b> option. Those that stay in the list domain (concat, lrange) should not.

lsort

If a list contains a <b style="color:black;background-color:#ffff66">null</b> it cannot be sorted -- lsort returns a <b style="color:black;background-color:#ffff66">null</b> string. With the -<b style="color:black;background-color:#ffff66">null</b> option, it can be sorted after substituting the -<b style="color:black;background-color:#ffff66">null</b> option value.

Issues and Extensions

In this section we discuss design issues that are either debatable or not entirely resolved.

Discussion

References

[[1]]

SQLite

[[2]]

IEEE Floating point

[[3]]

MySQL

Copyright

This document has been placed in the public domain.


Powered by TclThis is not necessarily the current version of this TIP.

TIP AutoGenerator - written by Donal K. Fellows