TIP #122 Version 1.2: Use tcl_{non,}wordchars Throughout Tcl/Tk

This is not necessarily the current version of this TIP.


TIP: 122
Title: Use tcl_{non,}wordchars Throughout Tcl/Tk
Version: $Revision: 1.2 $
Authors: Martin Weber <ephaeton at gmx dot net>
State: Draft
Type: Project
Tcl-Version: 8.5
Vote: Pending
Created: Thursday, 12 December 2002

Abstract

This TIP proposes flexible management of word and non-word characters in Tcl, to be used consistently throughout Tcl/Tk, e.g. in [regexp]'s \w and \W escapes and in Tk's [text] widget.

Specification

Assignment to tcl_{non,}wordchars shall influence every place in Tcl that decides whether a character is a word character or not, including the detection of word boundaries in, e.g., regular expressions and Tk's text widget.

To this end, lists of word and non-word characters shall not be hard-coded, nor shall Tcl rely on its implementation language (i.e. C's is*() functions) for the decision, as either approach prevents tcl_{non,}wordchars from being changed dynamically.

Instead, the value(s) of tcl_{non,}wordchars shall be used to determine whether a given character is part of a word or not.
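The runtime lookup asked for here can be sketched at the script level. The proc name isWordChar and the patterns assigned below are assumptions made for illustration only; they are not part of Tcl and not this TIP's implementation, which would live in the C core.

```tcl
# Hypothetical helper, not part of Tcl: classify one character by
# consulting the global tcl_wordchars pattern at call time.
proc isWordChar {ch} {
    # Anchor the pattern so it must match the whole single character.
    set re [format {^(?:%s)$} $::tcl_wordchars]
    return [regexp -- $re $ch]
}

# Assumed starting value for this sketch; the real default is
# chosen by the Tcl library.
set ::tcl_wordchars {\w}
set before [isWordChar -]

# Reassigning the variable changes the answer on the very next call.
set ::tcl_wordchars {[[:alnum:]_-]}
set after [isWordChar -]
```

Because the pattern is read on every call, no classification is fixed ahead of time, which is the property the specification requires.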

Rationale

Currently Tcl has several different hard-coded ways to decide whether a given character is a word character or a non-word character. Because they are separate, a change made in one place might not carry over to the others, so the different mechanisms can come to yield different results. Because they are hard-coded, there is also no way to change or fix that potentially broken behavior. Having Tcl look up the values of those variables at runtime provides the needed flexibility, both for nonstandard demands and for nonstandard character sets.

As an example of the breakage, you can assign a regular expression to tcl_{non,}wordchars, and the double-click binding in the text widget will honor that pattern when selecting a "whole word". When you ask the text widget for the data at a given coordinate using the indices 'wordstart' and 'wordend', however, the value of tcl_{non,}wordchars is not used.
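The same kind of inconsistency can be shown without Tk. The sketch below compares the library routine tcl_endOfWord, which consults tcl_{non,}wordchars, with [regexp]'s \w escape, which does not. The sample string and the patterns assigned are assumptions chosen for illustration.

```tcl
# Call a word routine once so the library's word-handling code is
# auto-loaded before the variables are assigned.
tcl_endOfWord "" 0

# Assumed custom definition: treat '-' as part of a word.
set ::tcl_wordchars    {[[:alnum:]_-]}
set ::tcl_nonwordchars {[^[:alnum:]_-]}

set s "foo-bar baz"

# The library routine honors the variables: the first word is "foo-bar".
set libEnd [tcl_endOfWord $s 0]

# regexp's \w ignores them: the first word is only "foo".
regexp -indices {\w+} $s m
set reEnd [expr {[lindex $m 1] + 1}]
```

With these settings the two mechanisms disagree about where the first word ends (index 7 versus index 3), which is the divergence this TIP aims to remove.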

Performance of the lookup may be a concern, but C's is*() functions are themselves implemented via a table lookup. A cached static character table could provide the needed performance.
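One possible shape for such a cache, sketched in Tcl rather than C: answers are memoized per character and discarded whenever tcl_wordchars is written. The namespace wordcache and its procs are hypothetical names introduced here; an actual implementation would presumably keep the table in C.

```tcl
# Hypothetical namespace; nothing here exists in Tcl itself.
namespace eval wordcache {
    variable table              ;# array: character -> 0 or 1
    array set table {}

    # Write-trace callback: drop every cached answer when the
    # pattern changes. The trace arguments are ignored.
    proc invalidate {args} {
        variable table
        array unset table *
    }

    # Answer from the table when possible; otherwise run the
    # regular expression once and remember the result.
    proc isWordChar {ch} {
        variable table
        if {![info exists table($ch)]} {
            set re [format {^(?:%s)$} $::tcl_wordchars]
            set table($ch) [regexp -- $re $ch]
        }
        return $table($ch)
    }
}

# Assumed starting value for this sketch.
set ::tcl_wordchars {\w}
trace add variable ::tcl_wordchars write ::wordcache::invalidate

set first  [wordcache::isWordChar -]   ;# computed and cached
set ::tcl_wordchars {[[:alnum:]_-]}    ;# trace empties the cache
set second [wordcache::isWordChar -]   ;# recomputed under the new pattern
```

The write trace keeps the table in step with the variable, so repeated queries cost one array lookup while reassignment remains fully dynamic.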

Implementation

None yet.

Copyright

This document is placed in the public domain.


