This is not necessarily the current version of this TIP.
| TIP: | 217 |
| Title: | Getting Sorted Indices out of Lsort |
| Version: | $Revision: 1.8 $ |
| Author: | James P. Salsman <james at bovik dot org> |
| State: | Draft |
| Type: | Project |
| Tcl-Version: | 8.4 |
| Vote: | Pending |
| Created: | Thursday, 26 August 2004 |
| Keywords: | Tcl, lsort, parallel lists |
An -indices option is proposed for the lsort command, returning the indices of the given list's elements in the order that they would have otherwise been sorted.
When corresponding parallel lists must be simultaneously sorted or accessed in the order given by sorting them all according to one used as a list of keys, it is necessary to obtain the indices of the key list's elements in the order that they would be sorted, without actually sorting the list. For example, a list of first names and a corresponding list of last names can be displayed in side-by-side Tk listboxes, in which case we may want to sort both lists by either one used as the sorting key, or we may want to simultaneously iterate over both in either order. To do so, merely sorting a list is unhelpful; we need to obtain the indices of the key list in the order that its corresponding elements would be sorted.
Tk listboxes, database I/O, and statistics applications often involve heavy use of parallel lists. For this and other reasons, many programming languages starting at least as early as APL, up to present-day, numerics-oriented languages such as MATLAB, have included the ability to directly obtain the indices required to access a list (or "vector") in sorted order. As shown below, the pure Tcl solution to this problem can take more than 10 times as long as the given reference implementation, which adds virtually no overhead when it is not invoked.
The lsort command shall accept a new option, -indices. When lsort is invoked with this option, it shall return a list of integer indices of the elements of the list given as the final argument to lsort, in the order that the elements would have been sorted had the -indices option not been specified.
This means an alternative (though less efficient for single lists) mechanism for producing a sorted list could be:
set resultList {}
foreach idx [lsort -indices $sourceList] {
lappend resultList [lindex $sourceList $idx]
}
Available on SourceForge [1] and at: http://www.bovik.org/lsort-indices-diff.txt
That reference implementation is a 109-line context diff, involving adding 20 lines of code to tclCmdIL.c, only one additional int of data memory overhead, and just one additional integer comparison at run time if the new option is not invoked.
Compared to the following pure Tcl implementation, the reference implementation takes 15% of the execution time for a list of 50,000 random real numbers, and just 9% of the execution time for a list of 5,000 random reals.
This pure Tcl implementation was adapted by Richard Suchenwirth from an earlier version by the author:
proc lsort-indices list {
if [llength $list] {
set i -1
foreach e $list {lappend tmp [list [incr i] $e]}
foreach e [lsort -index 1 -real $tmp] {lappend res [lindex $e 0]}
set res
}
}
In the lsort man page, under DESCRIPTION, change the first sentence:
"This command sorts the elements of list, returning a new list in sorted order."
... to read:
"This command sorts the elements of list, and returns a new list in sorted order, unless the -indices option is specified, in which case a list of integers is returned, corresponding to the indices of the given list's elements in the order that they otherwise would have been sorted."
Under EXAMPLES, at the end of the section, include the following lines:
Obtaining ordered indices:
% lsort -indices [list a c b]
0 2 1
% lsort -indices -unique -decreasing -real -index 0 \
{{1.2 a} {34.5 b} {34.5 c} {5.6 d}}
2 3 0
This document has been placed in the public domain by the author.
This is not necessarily the current version of this TIP.