This is not necessarily the current version of this TIP.
| TIP: | 75 |
| Title: | Refer to Sub-RegExps Inside 'switch -regexp' Bodies |
| Version: | $Revision: 1.6 $ |
| Authors: |
János Holányi <csani at lme dot linux dot hu> m13666 at maidfnzgnl dot ru Donal K. Fellows <donal dot k dot fellows at man dot ac dot uk> |
| State: | Draft |
| Type: | Project |
| Tcl-Version: | 8.5 |
| Vote: | Pending |
| Created: | Wednesday, 28 November 2001 |
| Discussions To: | http://purl.org/mini/cgi-bin/chat.cgi |
| Keywords: | switch, regexp, parentheses |
Currently, it is necessary to match a regular expression against a string twice in order to get the sub-expressions out of the matched string. This TIP alters that so that those sub-expressions can be substituted directly into the body of the script to be executed.
Similarly to the
regexp -- <RE> $string matchvar submatchvar ...
of Tcl and the
interact -re <RE> {
set matches "$interact_out(0,string) $interact_out(1,string) ..."
}
of Tcl/Expect, it would be very helpful, and would make Tcl more consistent, if the [switch] command allowed the body associated with each pattern to refer to the parenthesized sub-expressions of that pattern. At present, the regular expression must be matched against the string a second time to obtain this information.
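The double match described above can be sketched as follows. The input string and the variable names (line, whole, code, text, result) are illustrative assumptions and do not come from the TIP itself.

```tcl
set line "error 42: disk full"

switch -regexp -- $line {
    {^error (\d+): (.*)$} {
        # [switch] has already matched the RE, but the sub-expressions
        # are not available here, so the same RE is matched a second time.
        regexp -- {^error (\d+): (.*)$} $line whole code text
        set result "code=$code text=$text"
    }
    default {
        set result "unrecognised"
    }
}

puts $result
```

The pattern has to be written (and kept in sync) in two places, which is the duplication this proposal aims to remove.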
The easiest way to expose the information is to place it in a variable; all that remains is a way to specify which variable should receive it. This is done with a new option to the [switch] command: -matchvar. The argument to this option gives the name of a variable that will receive a Tcl list of the matches discovered by the RE engine, such that the part of the string that was matched is given by [lindex $var 0], the first parenthesized sub-expression by [lindex $var 1], etc. The alternative is to use the name of an array, but this is more expensive.
The indices at which the matches occurred can also be useful. Therefore, a new option -indexvar will also be provided, naming a variable into which a list of match indices (each a two-item list, in the same form that [regexp -indices] produces) will be placed. Both -matchvar and -indexvar may be specified in the same [switch] command, but only if the matching mode is -regexp. (The other kinds of match modes always match against the whole string anyway.)
Both variables (if specified, of course) will contain the empty list if the default branch is taken.
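For reference, the index pairs that [regexp -indices] produces today look like the sketch below; the proposal says -indexvar would collect pairs of the same form into a single list. The sample string and the variable names (s, whole, first, second) are assumptions made for illustration.

```tcl
set s "hello world"

# Each variable receives a two-item list: the indices of the first and
# last characters of the corresponding match.
regexp -indices -- {(o) (w)} $s whole first second

puts "whole:  $whole"    ;# 4 6
puts "first:  $first"    ;# 4 4
puts "second: $second"   ;# 6 6

# The pairs can be fed straight to [string range].
puts [string range $s {*}$whole]
```

Note that the {*} expansion syntax on the last line needs Tcl 8.5 or later; on older interpreters, [string range $s [lindex $whole 0] [lindex $whole 1]] does the same job.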
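set string "some long complicated message"
# -matchvar and -indexvar are only legal in -regexp mode, and the
# string to be matched must be passed before the pattern/body list.
switch -regexp -matchvar foo -indexvar bar -- $string {
    {\w*(e)\w*} {
        # [lindex $foo 0] is the whole match; [lindex $bar 1 0] is the
        # start index of the first parenthesized sub-expression.
        puts "matched [lindex $foo 0] with 'e' at [lindex $bar 1 0]"
    }
    default {
        puts "no words containing a letter 'e' at all"
    }
}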
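A second sketch shows several sub-expressions and the default branch together. It assumes an interpreter that implements the -matchvar option as specified here; parseLine is a hypothetical helper invented for this illustration, as are the sample inputs.

```tcl
# Sketch only: requires a [switch] that supports -matchvar as proposed.
proc parseLine {line} {
    switch -regexp -matchvar m -- $line {
        {^(\w+)=(\d+)$} {
            # m holds {whole-match key value}
            return [list [lindex $m 1] [lindex $m 2]]
        }
        default {
            # Per the proposal, m is the empty list on the default branch.
            return [list default [llength $m]]
        }
    }
}

puts [parseLine "retries=3"]
puts [parseLine "garbage"]
```

Because the sub-expressions arrive as one list, the body can pick out what it needs with [lindex] (or [lassign], where available) without repeating the pattern.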
Strictly speaking, no new syntax is needed to provide this ability. The solution could instead adopt the behavior of [regsub] (description taken from regsub(n)):
If subSpec contains a `&' or `\0', then it is replaced in the substitution with the portion of string that matched exp. If subSpec contains a `\n', where n is a digit between 1 and 9, then it is replaced in the substitution with the portion of string that matched the n-th parenthesized subexpression of exp. Additional backslashes may be used in subSpec to prevent special interpretation of `&' or `\0' or `\n' or backslash.
This has the disadvantage of being incompatible with existing code that uses the -regexp option to [switch] and whose bodies may well already contain characters matching the above sequences.
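The incompatibility can be illustrated with [regsub] itself. The first call shows the substitution rules quoted above; the second simulates, purely as an assumption about how such a scheme would behave, what would happen to an existing body that contains a literal `&`. All strings and variable names here are made up for the illustration.

```tcl
# regsub's handling of & and \n in subSpec.
regsub -- {(\w+)@(\w+)} "user@host" {\2!\1 (was &)} swapped
puts $swapped    ;# host!user (was user@host)

# Simulation: treat an existing switch body as if it were a subSpec.
# The literal & in the body is replaced by the matched string.
set body {puts "R&D matched"}
regsub -- {^.*$} "XYZ" $body mangled
puts $mangled    ;# puts "RXYZD matched"
```

Any script that uses `&`, `&&` or a backslash followed by a digit inside a -regexp body would be silently rewritten in the same way.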
Not yet...
This document has been placed in the public domain.