TIP #80 Version 1.2: Additional Options for 'lsearch'

This is not necessarily the current version of this TIP.


TIP:80
Title:Additional Options for 'lsearch'
Version:$Revision: 1.2 $
Author:Tom Wilkason <tom dot wilkason at home dot com>
State:Draft
Type:Project
Tcl-Version:8.4
Vote:Pending
Created:Wednesday, 02 January 2002
Discussions To:news:comp.lang.tcl

Abstract

This TIP proposes additional options for the lsearch command to return and work with all matching items in the return rather than the first matching item.

Rationale

The lsearch function works well for finding the first item in a list that matches a pattern. However it is often useful to find all of the items in the list that match a pattern. This TIP proposes adding options to return the entire list of matches. With this capability, additional options are proposed to return the data rather than the indices (since you often want to work the the data anyway), and to add an option to return the logical exclusion of the matching items (i.e. those that don't match the search pattern).

Specification

I propose the following options be added to lsearch:

Option: -all

Returns a list of all indices that match the search condition (rather than the first one). The indices are returned low to high order. For a no match condition, a {} (empty result) is returned. If the the -all or -data switches are not specified, a -1 is returned for a no match condition just as it is done now.

Option: -data

Returns a list of the data that matches the search condition rather than the index (or indices with the -all option). A {} is returned for a no match condition. The data is returned in proper list order. This option is useful when you want to iterate over the returned data anyway. e.g.

    foreach item [lsearch -all -data -glob $someList *stuff] {
       # deal with item
    }

Option: -not

Negates the sense of the search condition (i.e. what doesn't match). When used with the -data or -all options, the return set will be the items that do match. If all items match then a {} is returned. Without the -all option, the first item in the list that does not match will be returned.

These can be combined as needed and yield some powerful capabilities when iterating over sub-lists (esp. with the new lset command).

Reference implementation

Changes to the Tcl_LsearchObjCmd command in generic/tclCmdIL.c are needed along with documentation and test code. The changes to the 8.4 head version of tclCmdIL.c are available here.

(URL missing)

Notes

The changes to lsearch are entirely backward compatible and do no change the behaviour or performance of the command for existing options. Moreover, these changes should not impact any of the other list changes in TIP #22, TIP #33 or TIP #45.

Copyright

This document has been placed in the public domain.

Appendix

##
# performs a lsearch -all -data -glob search
#
proc lsearch_dataGLOB {listData pattern} {
    set result [list]
    foreach item $listData {
        if {[string match $pattern $item]} {
            lappend result $item
        }
    }
    return $result
}
##
# performs a lsearch -all -data -regexp search
#
proc lsearch_dataRE {listData pattern} {
    set result [list]
    foreach item $listData {
        if {[regexp $pattern $item]} {
            lappend result $item
        }
    }
    return $result
}
##
# performs a lsearch -all -glob search
#
proc lsearch_allGLOB {listData pattern} {
    set result [list]
    set count 0
    foreach item $listData {
        if {[string match $pattern $item]} {
            lappend result $count
        }
        incr count
    }
    return $result
}

# Build a 2K list of data
catch {unset LIST}
time {lappend LIST someStuff} 1000
time {lappend LIST otherStuff} 1000

# Case with all data matching in a 2K list 5x speedup
puts "#C implementation [time {listx search -glob -all -data $LIST *Stuff} 100]"
#=> C implementation 6810 microseconds per iteration
puts "#tcl implementation [time {lsearch_dataGLOB $LIST *Stuff} 100]"
#=> tcl implementation 33050 microseconds per iteration

# Case with all data matching but returning indicies 6.5X speed up
puts "#C implementation [time {listx search -glob -all $LIST *Stuff} 100]"
#=> C implementation 5800 microseconds per iteration
puts "#tcl implementation [time {lsearch_allGLOB $LIST *Stuff} 100]"
#=> tcl implementation 39060 microseconds per iteration

# Case with no matching data 12X speed up
puts "#C implementation [time {listx search -glob -all -data $LIST none*} 100]"
#=> C implementation 1010 microseconds per iteration
puts "#tcl implementation [time {lsearch_dataGLOB $LIST none*} 100]"
#=> tcl implementation 13210 microseconds per iteration


# Repeat with RE, note more time spent in RE engine 2X speedup
puts "#C implementation [time {listx search -regexp -all -data $LIST Stuff} 100]"
#=> C implementation 39050 microseconds per iteration
puts "#tcl implementation [time {lsearch_dataRE $LIST Stuff} 100]"
#=> tcl implementation 76200 microseconds per iteration

# Case with no matching data 2X speedup
puts "#C implementation [time {listx search -regexp -all -data $LIST none*} 100]"
#=> C implementation 17520 microseconds per iteration
puts "#tcl implementation [time {lsearch_dataRE $LIST none} 100]"
#=> tcl implementation 33250 microseconds per iteration

Powered by TclThis is not necessarily the current version of this TIP.

TIP AutoGenerator - written by Donal K. Fellows