TIP #113 Version 1.7: Multi-Line Searches in the Text Widget

This is not necessarily the current version of this TIP.


TIP: 113
Title: Multi-Line Searches in the Text Widget
Version: $Revision: 1.7 $
Author: Vince Darley <vincentdarley at users dot sourceforge dot net>
State: Draft
Type: Project
Tcl-Version: 8.5
Vote: Pending
Created: Friday, 11 October 2002

Abstract

This TIP proposes enhancing the $textwidget search subcommand so that both exact strings and regexp patterns can match across multiple lines, and so that all matches in a range can be reported.

Proposal

If the string or pattern given to the search subcommand contains elements that match newlines, the command may return a match that spans multiple lines. Where a match could occur both within a single line and across multiple lines, the first such match will be found, and the length of the match will follow the usual regexp rules, as documented in the regexp man page.
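As a minimal sketch of the behaviour described above, assuming a wish interpreter whose text widget implements this proposal, the following searches for an exact string and a regexp that each contain a newline. The widget name .multi and its contents are hypothetical and chosen only for illustration.

```tcl
package require Tk

# Hypothetical widget and contents, used only for illustration.
text .multi
.multi insert end "alpha\nbeta\ngamma\n"

# Exact-string search: the pattern itself contains a newline, so any
# match must start on one line and end on the next.
set exactIdx [.multi search -count exactLen -- "alpha\nbe" 1.0 end]

# Regexp search: \n inside the braces is the regexp escape for newline.
set reIdx [.multi search -regexp -count reLen -- {beta\ngam+a} 1.0 end]

puts "exact match at $exactIdx, length $exactLen"
puts "regexp match at $reIdx, length $reLen"
```

The length stored by -count includes the newline character, so a match that crosses a line boundary is one character longer than its visible text.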

This can be implemented very efficiently, given the TCL_REG_CANMATCH flag supported by the regexp library, with no impact at all on the speed of matching single lines.

In addition, two new options to the search subcommand are available:

If the new -all option is given to the search subcommand, then all matches within the given range will be reported. The return value of the command will then be a list of indices, and, if a -count var option was given, var will be set to a list of the corresponding match lengths, one per index.
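A short sketch of how the -all result might be consumed, again assuming a text widget that implements this proposal. The widget name .all, the variable names, and the sample text are hypothetical.

```tcl
package require Tk

# Hypothetical widget and contents, used only for illustration.
text .all
.all insert end "one two\ntwo three\n"

# With -all, the command returns a list of start indices and the
# -count variable receives a parallel list of match lengths.
set lengths {}
set hits [.all search -all -count lengths -- "two" 1.0 end]

# Walk both lists together, one (index, length) pair per match.
foreach idx $hits len $lengths {
    puts "match at $idx spanning $len characters"
}
```

Iterating over the two lists in step keeps each index paired with its own length, which is convenient for tagging every match in the widget.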

If the new -nolinestop option is given, then regexp searches will allow . and bracket expressions beginning with [^ to match newline characters (which is normally not the case). This is equivalent to not providing the -linestop flag to Tcl's regexp command.
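The line-stop semantics can be illustrated without Tk by using Tcl's regexp command directly, since the paragraph above defines -nolinestop in terms of it. This is only a sketch of the underlying regexp behaviour; the sample string and variable names are hypothetical.

```tcl
# Hypothetical two-line sample string.
set sample "first line\nsecond line"

# With -linestop, neither "." nor a [^...] bracket expression may match
# a newline, so patterns that need to cross the line break fail.
set dotStopped     [regexp -linestop -- {first.*second} $sample]
set bracketStopped [regexp -linestop -- {line[^x]second} $sample]

# Without -linestop (the behaviour -nolinestop asks for in the text
# widget), both constructs are free to match the newline.
set dotFree     [regexp -- {first.*second} $sample]
set bracketFree [regexp -- {line[^x]second} $sample]

puts "with -linestop:    . -> $dotStopped, \[^x\] -> $bracketStopped"
puts "without -linestop: . -> $dotFree, \[^x\] -> $bracketFree"
```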

The text widget man page will be updated to reflect the new -all and -nolinestop options, and to remove the "single line" caveat.

Reference implementation

This is available from:

http://sourceforge.net/tracker/?func=detail&aid=621901&group_id=12997&atid=312997

The patch includes objectification of the entire Text widget, so the multi-line search changes are not easy to isolate. In fact, the changes required amount to fewer than 100 lines of code (given that the rest has been objectified, that is).

(However, this patch has to work around a bug in Tcl's Unicode string manipulation, so search performance is probably reduced.)

Issues

On the implementation side, it might be interesting to abstract the search interface away from the text widget, so that it could in principle be applied to any line-based textual source.

As in the single-line matching implementation in Tcl 8.x, the lack of support for backwards matching in Tcl's regexp library means that backwards matching can only be implemented as repeated forward matches, with a commensurate performance penalty (the solution to which is outside the scope of this TIP).

Copyright

This document has been placed in the public domain.

