TIP #75 Version 1.7: Refer to Sub-RegExps Inside 'switch -regexp' Bodies

This is not necessarily the current version of this TIP.


TIP: 75
Title: Refer to Sub-RegExps Inside 'switch -regexp' Bodies
Version: $Revision: 1.7 $
Authors: János Holányi <csani at lme dot linux dot hu>
         m13666 at maidfnzgnl dot ru
         Donal K. Fellows <donal dot k dot fellows at man dot ac dot uk>
State: Draft
Type: Project
Tcl-Version: 8.5
Vote: Pending
Created: Wednesday, 28 November 2001
Discussions To: http://purl.org/mini/cgi-bin/chat.cgi
Keywords: switch, regexp, parentheses

Abstract

Currently, a script that uses [switch -regexp] must match the regular expression against the string a second time in order to extract the sub-expressions of the match. This TIP changes that by making those sub-expressions directly available to the body of the script to be executed.

Rationale

Similarly to the

   regexp -- <RE> $string matchvar submatchvar ...

of Tcl and the

   interact -re <RE> {
      set matches "$interact_out(0,string) $interact_out(1,string) ..."
   }

of Tcl/Expect, it would be very helpful, and would also make Tcl more consistent, if the [switch] command supported references to the parenthesized sub-expressions of a switch pattern from within the body associated with that pattern. At present, the regular expression must be matched against the string twice to obtain this information.
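The double match described above can be sketched as follows. This is only an illustration of the current workaround; the input line, the pattern, and the variable names are hypothetical and not part of the proposal.

```tcl
# Today's workaround: [switch] matches the pattern once to choose a branch,
# and [regexp] matches it again inside the body to extract the sub-match.
set line "error 42: disk full"   ;# hypothetical input
set pattern {^error (\d+):}      ;# hypothetical pattern

switch -regexp -- $line {
    {^error (\d+):} {
        # Second match of the same expression, needed only to fill "code".
        regexp -- $pattern $line whole code
        set result "error code $code"
    }
    default {
        set result "not an error line"
    }
}
puts $result
```

The pattern has to be written (or stored) twice, and the regular expression engine runs twice on the same string.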

Specification

The easiest way to deliver the information is to place it into a variable; all that remains is a way to specify which variable should receive it. This is done by a new option to the [switch] command: -matchvar. The argument to this option gives the name of a variable that will receive a Tcl list of the matches discovered by the RE engine, such that the part of the string that was matched is given by [lindex $var 0], the first parenthesized sub-expression by [lindex $var 1], etc. The alternative is to use the name of an array, but this is more expensive.

The indices at which the match occurred can sometimes be useful as well. Therefore, a new option -indexvar will also be provided, naming a variable that will receive a list of match indices (each a two-item list of values, computed in the same way as by [regexp -indices]). It will be legal for both -matchvar and -indexvar to be specified in the same [switch] command, but only if the matching mode is -regexp. (The other match modes always match against the whole string anyway.)
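The shape of the two lists can be previewed with the existing [regexp] command, whose -inline and -indices options return the same kind of data. This is a sketch of the expected list layout only, not of the proposed implementation; the variable names are illustrative.

```tcl
set string "some long complicated message"
set re {\w*(e)\w*}

# Layout expected for the -matchvar variable: the whole match first,
# then one element per parenthesized sub-expression.
set matches [regexp -inline -- $re $string]

# Layout expected for the -indexvar variable: one {first last} pair for
# each element of the list above.
set indices [regexp -inline -indices -- $re $string]

puts [lindex $matches 0]    ;# whole match
puts [lindex $matches 1]    ;# first parenthesized sub-expression
puts [lindex $indices 1 0]  ;# start index of that sub-expression
```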

Both variables (if specified, of course) will contain the empty list if the default branch is taken.
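A minimal sketch of the default-branch rule, assuming an interpreter that implements the options proposed here (it will not run on one that lacks them). The input and the variable names m, i and branch are hypothetical.

```tcl
# Assumption: -matchvar and -indexvar are available as specified above.
set input "12345"   ;# hypothetical input containing no lower-case letters

switch -regexp -matchvar m -indexvar i -- $input {
    {([a-z]+)} { set branch "word" }
    default    { set branch "default" }
}

# No pattern matched, so both variables should hold the empty list.
puts "branch=$branch matches=[llength $m] indices=[llength $i]"
```

Because the variables are always written, a body can test [llength $m] without first checking whether the variable exists.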

Example

   set string "some long complicated message"
   switch -matchvar foo -indexvar bar -regexp -- $string {
      {\w*(e)\w*} {
         puts "matched [lindex $foo 0] with 'e' at [lindex $bar 1 0]"
      }
      default {
         puts "no words containing a letter 'e' at all"
      }
   }

Alternatives

Strictly speaking, no new syntax is needed to provide this capability. The solution could instead adopt the behavior of [regsub] (description taken from regsub(n)):

If subSpec contains a `&' or `\0', then it is replaced in the substitution with the portion of string that matched exp. If subSpec contains a `\n', where n is a digit between 1 and 9, then it is replaced in the substitution with the portion of string that matched the n-th parenthesized subexpression of exp. Additional backslashes may be used in subSpec to prevent special interpretation of `&' or `\0' or `\n' or backslash.

This has the disadvantage of being incompatible with existing code that makes use of the -regexp option to [switch] and which may well have characters matching the above sequences inside already.
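The incompatibility can be illustrated with a body that is valid today and that itself hands a `\1` sequence to [regsub]. This is a hypothetical example; under the regsub-style alternative, [switch] would replace that sequence before the body ran, changing what the inner [regsub] receives.

```tcl
# Valid with the current [switch -regexp]: the "\1" below belongs to the
# inner [regsub] call, not to [switch].
set input "name=value"   ;# hypothetical input

switch -regexp -- $input {
    {^(\w+)=(\w+)$} {
        # Rewrite "name=" as "name -> " using regsub's own back-reference.
        regsub -- {^(\w+)=} $input {\1 -> } rewritten
    }
    default {
        set rewritten $input
    }
}
puts $rewritten
```

A body containing a literal `&`, for example inside a message string, would be affected in the same way.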

Reference Implementation

Not yet...

Copyright

This document has been placed in the public domain.

