<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE TIP SYSTEM "http://tcl.activestate.com/cgi-bin/tct/tip/tipxml.dtd">
<!-- Converted at Wed May 16 19:53:10 GMT 2012 -->
<!-- TIP AutoGenerator - written by Donal K. Fellows -->

<TIP number='114'>
<header><title>Eliminate Octal Parsing of Leading Zero Integer Strings</title><author address="mailto:dgp@users.sf.net">Don Porter</author><status type='project' state='draft' tclversion="9.0" vote='after'>$Revision: 2.3 $</status><history></history><created day='16' month='oct' year='2007' /><keyword>octal</keyword></header>
<abstract>This TIP proposes elimination of Tcl&apos;s practice of using octal notation to interpret a string with a leading zero when an integer value is expected.</abstract>
<body><section title="History and Rationale">
<para>Many Tcl commands accept an integer value in one or more places in their syntax. Routines such as <emph style="bold">Tcl_GetInt()</emph> parse an integer value from the string supplied in these places. Ultimately, these routines have been built on C standard library functions such as <emph style="bold">strtol()</emph>. Because of this implementation choice, Tcl integer parsing has inherited features from <emph style="bold">strtol()</emph>, including the rule that a leading zero marks the string as an integer value in octal format.</para>
<para>Several programmers and programs have hit this feature by surprise, resulting in nasty bugs such as:</para>
<verbatim><vline encoding='base64'>ICUgcHJvYyBtIGRhdGUgew==</vline><vline encoding='base64'>ICAgICAgbGFzc2lnbiBbc3BsaXQgJGRhdGUgLV0geSBtIGQ=</vline><vline encoding='base64'>ICAgICAgcmV0dXJuIFtzdHJpbmcgaW5kZXggX0pGTUFNSkpBU09ORCAkbV0=</vline><vline encoding='base64'>ICAgfQ==</vline><vline encoding='base64'>ICUgbSAyMDA3LTAyLTE0</vline><vline encoding='base64'>IEY=</vline><vline encoding='base64'>ICUgbSAyMDA3LTEyLTI1</vline><vline encoding='base64'>IEQ=</vline><vline encoding='base64'>ICUgbSAyMDA3LTA5LTI2</vline><vline encoding='base64'>IGJhZCBpbmRleCAiMDkiOiBtdXN0IGJlIGludGVnZXI/WystXWludGVnZXI/IG9yIGVuZD9bKy1daW50ZWdlcj8gKGxvb2tzIGxpa2UgaW52YWxpZCBvY3RhbCBudW1iZXIp</vline></verbatim>
<para>There are very few places in Tcl scripts where this feature is actually useful. Octal format for integers simply isn&apos;t encountered all that often in most programming tasks tackled by Tcl scripts. The main counterexample is the use of octal format integers to describe filesystem permissions on Unix systems. The Tcl commands that operate on filesystem permission values are <emph style="bold">open</emph> and <emph style="bold">file attributes</emph>, and it is a simple matter to directly code them to recognize octal format, rather than have them rely on octal parsing as a general integer value recognition feature. On the HEAD, these commands have already been so revised. With those few cases accounted for, it&apos;s been observed that removing this feature of Tcl integer parsing &quot;will likely fix more scripts than it breaks.&quot;</para>
<para>The opportunity to make this change in Tcl 8.5 arises because we&apos;ve already replaced our old parsing routines based on <emph style="bold">strtol()</emph> with our own number parser <tipref type="text" tip="249"/>.</para>
</section>
<section title="Proposal">
<para>Revise all integer parsing in Tcl by making modifications to the <emph style="bold">TclParseNumber()</emph> routine. With reference to the state machine graph in <tipref type="text" tip="249"/>, we change the exit edges of state <emph style="italic">integer[1]</emph>. Characters <emph style="bold">0</emph> - <emph style="bold">7</emph> and characters <emph style="bold">8</emph> - <emph style="bold">9</emph> should now lead to state <emph style="italic">integer[4]</emph>, so that they continue decimal parsing, and not octal parsing. The states <emph style="italic">integer[2]</emph> and <emph style="italic">error[5]</emph> will now be accessible only if the character <emph style="bold">o</emph> or <emph style="bold">O</emph> is seen while in state <emph style="italic">integer[1]</emph> and there will no longer be any exit from those states when the characters <emph style="bold">.</emph> or <emph style="bold">e</emph> or <emph style="bold">E</emph> are observed.</para>
<para>This change to <emph style="bold">TclParseNumber()</emph> is achieved with a <emph style="bold">#define KILL_OCTAL</emph> in the file <emph style="italic">tclStrToD.c</emph>.</para>
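<para>As a hedged illustration of the intended result (a sketch of what an interpreter with this change applied would be expected to print, not a captured session), a leading zero no longer selects octal. The explicit <emph style="bold">0o</emph> prefix, reached through the <emph style="bold">o</emph> or <emph style="bold">O</emph> edge described above, is assumed to remain the way to request octal:</para>
<verbatim><vline encoding='base64'>ICUgZXhwciB7MDEwICsgMX0=</vline><vline encoding='base64'>IDEx</vline><vline encoding='base64'>ICUgZXhwciB7MG8xMCArIDF9</vline><vline encoding='base64'>IDk=</vline><vline encoding='base64'>ICUgZXhwciB7MDkgKyAxfQ==</vline><vline encoding='base64'>IDEw</vline></verbatim>
<para>Before the change, the first command yields 9 and the third fails because 09 is an invalid octal number.</para>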
</section>
<section title="Compatibility">
<para>This change is an incompatibility. It&apos;s long been believed that such a change should not happen until Tcl 9 because of this, but over time the consensus belief has developed that far fewer programs and programmers will be harmed by the incompatibility than will be helped by removing the misfeature.</para>
<para>That said, the incompatibility is serious. The same string in the same place in a script can now have a completely different meaning. Before the change:</para>
<verbatim><vline encoding='base64'>ICUgbGluZGV4IHthIGIgYyBkIGUgZiBnIGggaSBqIGt9IDAxMA==</vline><vline encoding='base64'>IGk=</vline></verbatim>
<para>After the change:</para>
<verbatim><vline encoding='base64'>ICUgbGluZGV4IHthIGIgYyBkIGUgZiBnIGggaSBqIGt9IDAxMA==</vline><vline encoding='base64'>IGs=</vline></verbatim>
<para>This is not the usual situation where a new feature causes scripts that were an error to become non-errors -- a compatible change.</para>
<para>This is also not a situation where a change causes legal scripts to become errors. Such a change would break scripts, but would at least leave behind scripts that raise noisy errors alerting about the breakage.</para>
<para>This is the most serious kind of incompatibility, where we replace a working script with another working script that does something completely different. An illustration of the problem from Tcl&apos;s own test suite highlights the danger. Some of Tcl&apos;s tests in <emph style="bold">io.test</emph> depend on the umask value, so that value is captured:</para>
<verbatim><vline encoding='base64'>IHNldCB1bWFza1ZhbHVlIFtleGVjIC9iaW4vc2ggLWMgdW1hc2td</vline></verbatim>
<para>Note that the shell command <emph style="bold">umask</emph> returns a mask value as an integer in octal format. The test suite has relied on Tcl&apos;s built-in ability to recognize this format, and the expected result of test <emph style="bold">io-40.3</emph> has been computed:</para>
<verbatim><vline encoding='base64'>IGZvcm1hdCAlMDRvIFtleHByIHswNjY2ICYgfiR1bWFza1ZhbHVlfV0=</vline></verbatim>
<para>After the proposed change, <emph style="italic">$umaskValue</emph> is treated as a decimal number, and the wrong expected result is computed. (This test has already been updated on the HEAD to avoid such problems.)</para>
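<para>One way to make such a computation independent of the parsing rule is to convert the shell output explicitly with <emph style="bold">scan %o</emph> and to write the constant with the <emph style="bold">0o</emph> prefix. The following is a minimal sketch under those assumptions, and is not necessarily the form of the update made on the HEAD:</para>
<verbatim><vline encoding='base64'>IHNjYW4gW2V4ZWMgL2Jpbi9zaCAtYyB1bWFza10gJW8gdW1hc2tWYWx1ZQ==</vline><vline encoding='base64'>IGZvcm1hdCAlMDRvIFtleHByIHswbzY2NiAmIH4kdW1hc2tWYWx1ZX1d</vline></verbatim>
<para>With a umask of 0022, this yields 0644 whether or not a leading zero is read as octal, because neither operand depends on that rule any longer.</para>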
<para>It is not difficult to imagine more serious problems in scripts that make use of the result returned by the shell command <emph style="bold">umask</emph> where a file might be created or modified with completely unintended permissions as a result of the proposed change. Such scripts might easily raise security concerns.</para>
<para>Even in the light of the judgment that such (hopefully rare) compatibility issues are acceptable in exchange for the benefits of purging the misfeature, we really ought to consider seriously how we can alert those migrating to Tcl 8.5 to this possibility and to the need to examine their scripts for this issue.</para>
<para>Besides the impact on Tcl commands, this change may also cause incompatibilities in extensions, to the extent their commands rely on Tcl&apos;s integer parsing to support octal notation.</para>
</section>
<section title="Rejected Alternatives">
<para>Motivated largely by the serious incompatibilities lurking here, a few people have suggested that some means be provided to toggle Tcl&apos;s integer parsing behavior between two modes, one which recognizes octal and one which does not. This idea appears inspired in part by the <emph style="bold">::tcl_precision</emph> variable, which has long exercised control over Tcl&apos;s floating point number formatting.</para>
<para>While the motivation may be well-intentioned, this suggestion is basically unworkable, and can&apos;t really help anybody. The point of the proposed change is to make simple code work as simple coders expect it to. Our original example proc can already be corrected like so:</para>
<verbatim><vline encoding='base64'>IHByb2MgbSBkYXRlIHs=</vline><vline encoding='base64'>ICAgIHNjYW4gJGRhdGUgJWQtJWQtJWQgeSBtIGQ=</vline><vline encoding='base64'>ICAgIHJldHVybiBbc3RyaW5nIGluZGV4IF9KRk1BTUpKQVNPTkQgJG1d</vline><vline encoding='base64'>IH0=</vline></verbatim>
<para>The point of this proposal is to make the original code just work. It doesn&apos;t help to offer this complexity as a solution:</para>
<verbatim><vline encoding='base64'>ICUgcHJvYyBtIGRhdGUgew==</vline><vline encoding='base64'>ICAgICAgc2V0IG1vZGUgW3RjbDo6dW5zdXBwb3J0ZWQ6Om9jdGFsXQ==</vline><vline encoding='base64'>ICAgICAgdGNsOjp1bnN1cHBvcnRlZDo6b2N0YWwgb2Zm</vline><vline encoding='base64'>ICAgICAgbGFzc2lnbiBbc3BsaXQgJGRhdGUgLV0geSBtIGQ=</vline><vline encoding='base64'>ICAgICAgc2V0IHJlc3VsdCBbc3RyaW5nIGluZGV4IF9KRk1BTUpKQVNPTkQgJG1d</vline><vline encoding='base64'>ICAgICAgdGNsOjp1bnN1cHBvcnRlZDo6b2N0YWwgJG1vZGU=</vline><vline encoding='base64'>ICAgICAgcmV0dXJuICRyZXN1bHQ=</vline><vline encoding='base64'>ICAgfQ==</vline></verbatim>
<para>No one would choose that over just fixing the code to use <emph style="bold">scan</emph>, in which case the mode toggle isn&apos;t needed. Also, this kind of management of a shared mode setting cannot (easily and cheaply) be avoided, because at the point we need to control the Tcl number parser, the most specific context we have is the thread, so the mode has to be set thread-wide.</para>
<para>Likewise, the (hopefully rare) set of scripts that would actually want to turn octal parsing back on are not going to announce themselves. In order to know that </para>
<verbatim><vline encoding='base64'>IHRjbDo6dW5zdXBwb3J0ZWQ6Om9jdGFsIG9u</vline></verbatim>
<para>needs to be added to a script to make it function correctly, some kind of audit has to reach that conclusion, and once that conclusion is reached and the issues are understood, it&apos;s just as easy to insert <emph style="bold">scan %o</emph> in the proper places as it would be to insert the <emph style="bold">tcl::unsupported::octal</emph> stopgap.</para>
<para>In short, any coder in a position to consider using a <emph style="bold">tcl::unsupported::octal</emph> tool would quickly decide against it in favor of just fixing their code. Thus users of this feature are mythical, and it will not be implemented.</para>
</section>
<section title="Note">
<para>This TIP has been <emph style="italic">explicitly</emph> rejected as a feature for Tcl 8.5. Consensus was that the type of breakage it inherently induces is not acceptable in a minor version change.</para>
</section>
<section title="Copyright">
<para>This document is placed in the public domain.</para>
</section>
</body></TIP>

