1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013
|
=head1 NAME
Data::Sorting - Multi-key sort using function results
=head1 SYNOPSIS
use Data::Sorting qw( :basics :arrays :extras );
# Sorting functions default to simple string comparisons
@names = qw( Bob Alice Ellen Charlie David );
@ordered = sorted_by( undef, @names );
# Various options can be passed before the list values
@ordered = sorted_by( [ -order=>'reverse' ], @names );
# You can also generate a sorting function and then apply it
$function = sort_function();
@ordered = $function->( @names ); # or &{$function}(@names)
@ordered = sort_function( -order=>'reverse' )->( @names );
# The :array functions are prototyped to take the array first
@ordered = sorted_array( @names );
@ordered = sorted_arrayref( \@names );
# You can also sort an array in place, changing its internal order
sort_array( @names );
sort_arrayref( \@names );
# There are several sorting options, such as -compare => 'natural'
@movies = ( 'The Matrix', 'Plan 9', '2001', 'Terminator 2' );
@ordered = sort_function( -compare => 'natural' )->( @movies );
# @ ordered now contains '2001', 'The Matrix', 'Plan 9', 'Terminator 2'
# To sort numbers, pass the -compare => 'numeric' option
@numbers = ( 18, 5, 23, 42, 156, 91, 64 );
@ordered = sorted_by( [ -compare => 'numeric' ], @numbers );
@ordered = sort_function( -compare => 'numeric' )->( @numbers );
@ordered = sorted_array( @numbers, -compare => 'numeric' );
sort_array( @numbers, -compare => 'numeric' );
# You can sort by the results of a function to be called on each item
sort_array( @numbers, -compare => 'numeric', sub { $_[0] % 16 } );
# @numbers now contains 64, 18, 5, 23, 42, 91, 156
# For arrays of datastructures, pass in keys to extract for sorting
@records = (
{ 'rec_id'=>3, 'name'=>{'first'=>'Bob', 'last'=>'Macy'} },
{ 'rec_id'=>1, 'name'=>{'first'=>'Sue', 'last'=>'Jones'} },
{ 'rec_id'=>2, 'name'=>{'first'=>'Al', 'last'=>'Jones' } },
);
@ordered = sorted_array( @records, 'rec_id' );
# For nested data structures, pass an array of keys to fetch
@ordered = sorted_array( @records, ['name','first'] );
# Pass multiple sort keys for multiple-level sorts
@ordered = sorted_array( @records, ['name','last'], ['name','first'] );
# Any selected sort options are applied to all subsequent sort keys
@ordered = sorted_array( @records,
-order => 'reverse', ['name','last'], ['name','first'] );
# Options specified within a hash-ref apply only to that key
@ordered = sorted_array( @records,
{ order=>'reverse', sortkey=>['name','last'] },
['name','first'] );
# Locale support is available if you have Perl 5.004 or later and POSIX
POSIX::setlocale( POSIX::LC_COLLATE(), 'en_US' );
POSIX::setlocale( POSIX::LC_CTYPE(), 'en_US' );
@ordered = sorted_array( @records,
-compare=>'locale', ['name','last'], ['name','first'] );
=head1 ABSTRACT
Data::Sorting provides functions to sort the contents of arrays based on a collection of extraction and comparison rules. Extraction rules are used to identify the attributes of array elements on which the ordering is based; comparison rules specify how those values should be ordered.
Index strings may be used to retrieve values from array elements, or function references may be passed in to call on each element. Comparison rules are provided for numeric, bytewise, and case-insensitive orders, as well as a 'natural' comparison that places numbers first, in numeric order, followed by the remaining items in case-insensitive textual order.
=head1 DESCRIPTION
This module provides several public functions with different calling interfaces that all use the same underlying sorting mechanisms.
These functions may be imported individually or in groups using the following tags:
=over 9
=item :basics
sorted_by(), sort_function(): General-purpose sorting functions.
=item :array
sorted_array(), sorted_arrayref(), sort_array(), sort_arrayref(): Prototyped functions for arrays.
=item :extras
sort_key_values(), sort_description(): Two accessory functions that explain how sorting is being carried out.
=back
All of these functions take a list of sorting rules as arguments. See L<"Sort Rule Syntax"> for a discussion of the contents of the $sort_rule or @sort_rules parameters shown below.
=cut
########################################################################
package Data::Sorting;
require 5.003;
use strict;
use Carp;
use Exporter;
use vars qw( $VERSION @ISA %EXPORT_TAGS );
$VERSION = 0.9;
push @ISA, qw( Exporter );
%EXPORT_TAGS = (
basics => [qw( sorted_by sort_function )],
arrays => [qw( sorted_array sorted_arrayref sort_array sort_arrayref)],
extras => [qw( sort_key_values sort_description )],
);
Exporter::export_ok_tags( keys %EXPORT_TAGS );
use vars qw( @Array @Rules $PreCalculate $Rule @ValueSet );
########################################################################
=head2 sorted_by
@ordered = sorted_by( $sort_rule, @value_array );
@ordered = sorted_by( $sort_rule, @$value_arrayref );
@ordered = sorted_by( $sort_rule, $value1, $value2, $value3 );
@ordered = sorted_by( \@sort_rules, @value_array );
@ordered = sorted_by( \@sort_rules, @$value_arrayref );
@ordered = sorted_by( \@sort_rules, $value1, $value2, $value3 );
This is a general-purpose sorting function which accepts one or more sort order rules and a list of input values, then returns the values in the order specified by the rules.
=cut
# @in_order = sorted_by( $sort_rules_ary, @values );
sub sorted_by ($;@) {
my @sort_params = ( ! defined $_[0] ) ? () :
( ref($_[0]) eq 'ARRAY' ) ? @{ (shift) } :
shift;
( my $sorter, local @Rules ) = _parse_sort_args( @sort_params );
local *Array = \@_;
&$sorter;
}
########################################################################
=head2 sort_function
@ordered = sort_function( @sort_rules )->( @value_array );
@ordered = sort_function( @sort_rules )->( @$value_arrayref );
@ordered = sort_function( @sort_rules )->( $value1, $value2, $value3 );
Creates an anonymous function which applies the provided sort rules. The function may be cached and used multiple times to apply the same rules again.
=cut
# @in_order = sort_function( @sort_rules )->( @array );
sub sort_function (@) {
my ( $sorter, @rules ) = _parse_sort_args( @_ );
return sub {
local *Array = \@_;
local @Rules = @rules;
my @results = &$sorter;
# Kludge to clear extracted data; there's gotta be a better way...
foreach my $rule (@rules) {
map { delete $rule->{$_} } grep /^ext_/, keys %$rule
}
@results;
}
}
########################################################################
=head2 sorted_array
@ordered = sorted_array( @value_array, @sort_rules );
@ordered = sorted_array( @$value_arrayref, @sort_rules );
Returns a sorted list of the items without altering the order of the original list.
=cut
# @in_order = sorted_array( @array, @sort_rules );
sub sorted_array (\@;@) {
local *Array = shift;
( my $sorter, local @Rules ) = _parse_sort_args( @_ );
&$sorter;
}
=head2 sorted_arrayref
@ordered = sorted_arrayref( \@value_array, @sort_rules );
@ordered = sorted_arrayref( $value_arrayref, @sort_rules );
Returns a sorted list of the items without altering the order of the original list.
=cut
# @in_order = sorted_arrayref( $array_ref, @sort_rules );
sub sorted_arrayref ($;@) {
local *Array = shift;
( my $sorter, local @Rules ) = _parse_sort_args( @_ );
&$sorter;
}
########################################################################
=head2 sort_array
sort_array( @value_array, @sort_rules );
sort_array( @$value_arrayref, @sort_rules );
Sorts the contents of the specified array using a list of sorting rules.
=cut
# sort_array( @array, @sort_rules );
sub sort_array (\@;@) {
local *Array = shift;
( my $sorter, local @Rules ) = _parse_sort_args( @_ );
@Array = &$sorter;
}
=head2 sort_arrayref
sort_arrayref( \@value_array, @sort_rules );
sort_arrayref( $value_arrayref, @sort_rules );
Equivalent to sort_array, but takes an explicit array reference as its first argument, rather than an array variable.
=cut
# sort_arrayref( $array_ref, @sort_rules );
sub sort_arrayref ($;@) {
local *Array = shift;
( my $sorter, local @Rules ) = _parse_sort_args( @_ );
@Array = &$sorter;
}
########################################################################
=head2 sort_key_values
@key_values = sort_key_values( \@value_array, @sort_rules );
@key_values = sort_key_values( $value_arrayref, @sort_rules );
Doesn't actually perform any sorting. Extracts and returns the values which would be used as sort keys from each item in the array, in their original order.
=cut
# @results = sort_key_values( $array, @sort_rules );
sub sort_key_values ($;@) {
local *Array = shift;
my ($sorter, @rules) = _parse_sort_args( @_ );
if ( scalar @rules == 1 ) {
_extract_values_for_rule( $rules[0], @Array );
} else {
map [ _extract_values_for_item( $_, @rules ) ], @Array;
}
}
########################################################################
=head2 sort_description
@description = sort_description( $descriptor, @sort_rules );
Doesn't actually perform any sorting. Provides descriptive information about the sort rules for diagnostic purposes.
=cut
# @sort_rules = sort_description( 'text', @sort_rules );
sub sort_description ($;@) {
my $descriptor = shift;
my $desc_func;
if ( ! $descriptor ) {
$desc_func = \&_desc_text;
} elsif ( ref($descriptor) eq 'CODE' ) {
$desc_func = $descriptor;
} elsif ( ! ref($descriptor) ) {
no strict 'refs';
$desc_func = \&{"_desc_$descriptor"}
or croak("Can't find a function named '_desc_$descriptor'");
} else {
croak("Unsupported descriptor '$descriptor'")
}
my ($sorter, @rules) = _parse_sort_args( @_ );
map { &$desc_func( $_ ) } @rules;
}
sub _desc_text {
my $rule = shift;
my $comp = $rule->{compare};
$rule->{extract} .
join( '', map $_ ? "($_) " : " ", join(', ', map "'$_'", @{ $rule->{extract_args} }) ) .
"in " . ( $rule->{order_sign} < 0 ? "descending" : "ascending" ) . " " .
( ! ref($comp) ? "$comp" :
ref($comp) eq 'CODE' ? "with custom function ($comp)":
ref($comp) eq 'ARRAY' ? join(', ', @$comp) : "with $comp" ) .
" order"
}
########################################################################
=head2 Sort Rule Syntax
The sort rule argument list may contain several different types of parameters, which are parsed identically by all of the public functions described above.
A sort rule definition list may contain any combination of the following argument structures:
=over 4
=item I<nothing>
If no sort keys are specified, a default sort key is created using the C<extract =E<gt> "self"> option.
@ordered = sorted_array( @names );
=item I<sortkey>
Specifies a sort key. Each I<sortkey> may be either a scalar value, or an array reference. Appropriate values for a I<sortkey> vary depending on which "extract" option is being used, and are discussed further below.
@ordered = sorted_array( @numbers, sub { $_[0] % 8 } );
@ordered = sorted_array( @records, 'rec_id' );
@ordered = sorted_array( @records, ['name','first'] );
Any number of sortkeys may be provided:
@ordered = sorted_array( @records, ['name','last'],
['name','first'] );
=item -sortkey => I<sortkey>
Another way of specifying a sort key is by preceding it with the "-sortkey" flag.
@ordered = sorted_array( @numbers, -sortkey => sub { $_[0] % 8 } );
@ordered = sorted_array( @records, -sortkey => ['name','last'],
-sortkey => ['name','first'] );
=item { sortkey => I<sortkey>, I<option> => I<option_value>, ... }
Additional options can be specified by passing a reference to a hash containing a sortkey and values for any number of options described in the list below.
@ordered = sorted_array( @numbers, { sortkey => sub { abs(shift) },
compare => 'numeric', } );
=item -I<option> => I<option_value>
Sets a default option for any subsequent sortkeys in the argument list.
@ordered = sorted_array( @records, -compare => 'numeric',
-sortkey => sub { abs(shift) });
@ordered = sorted_array( @records, -compare => 'textual',
-sortkey => ['name','last'],
-sortkey => ['name','first'] );
=back
The possible I<option> values are:
=over 4
=item extract
Determines the function which will be used to retrieve the sort key value from each item in the input list.
=item compare
Determines the function which will be used to order the extracted values.
=item order
Can be set to "reverse" or "descending" to invert the sort order. Defaults to "ascending".
=item engine
Determines the underlying sorting algorithm which will be used to implement the sort. Generally left blank, enabling the module to select the best one available.
=back
Each of these options is discussed at further length below.
=cut
my @DefaultState = ( order=>'ascending', compare=>'cmp', extract=>'any' );
my %SupportedOptions = ( map { $_=>1 } qw( engine order compare extract ) );
my %FunctionCache;
sub _parse_sort_args {
my @arguments = ( @_ );
my %state;
my @rules;
while ( scalar @arguments ) {
my $token = shift @arguments;
my ( $flagname ) = ( $token =~ /^\-(\w+)$/ );
if ( $flagname and $SupportedOptions{$flagname} ) {
$state{ $flagname } = shift @arguments;
} elsif ( defined($flagname) and $flagname eq 'sortkey' ) {
push @rules, { @DefaultState, %state, 'sortkey' => shift @arguments };
} elsif ( ref($token) eq 'HASH' ) {
push @rules, { @DefaultState, %state, %$token };
} else {
push @rules, { @DefaultState, %state, 'sortkey' => $token };
}
}
if ( ! scalar @rules ) {
push @rules, { @DefaultState, 'extract' => 'self', %state, sortkey => [] };
}
no strict 'refs';
foreach my $rule ( @rules ) {
# Select the appropriate comparison function
my $compare = $rule->{compare};
croak("Missing compare option for sorting") unless ( $compare );
$rule->{compare_func} = ref($compare) eq 'CODE' ? $compare :
$FunctionCache{"_cmp_$compare"} ||= \&{"_cmp_$compare"}
|| croak("Can't find a function named '_cmp_$compare'");
# Optional parameter for "reverse" or "descending" sorts
$rule->{order_sign} = ( $rule->{order} =~ /^desc|^rev/i ) ? -1 : 1;
# Select the appropriate value extraction function
my $extract = $rule->{extract};
croak("Missing extract option for sorting") unless ( length $extract );
$extract = 'code' if ($extract eq 'any' && ref($rule->{sortkey}) eq 'CODE');
$rule->{extract_func} = ref($extract) eq 'CODE' ? $extract :
$FunctionCache{"_ext_$extract"} ||= \&{"_ext_$extract"} ||
croak("Can't find a function named '_ext_$extract'");
# Optional array of arguments to the extraction function
my $skey = $rule->{sortkey};
$rule->{extract_args} = ( ! defined $skey ) ? [] :
(ref($skey) eq 'ARRAY') ? $skey :
[ $skey ];
if ( $extract eq 'compound' ) {
foreach ( 0 .. $#{ $rule->{extract_args} } / 2 ) {
my $xa = $rule->{extract_args}->[ $_ * 2 ];
if ( ! ref $xa ) {
$rule->{extract_args}->[$_ * 2] = $FunctionCache{"_ext_$xa"} ||=
\&{"_ext_$xa"} || croak("Can't find a function named '_ext_$xa'");
}
}
}
}
# If $PreCalculate is set, do our lookups ahead of time for all of the items
my $engine = defined($PreCalculate) ? 'precalc' :
$rules[0]->{engine} ? $rules[0]->{engine} :
( @rules == 1 and $rules[0]->{order_sign} > 0
and $rules[0]->{compare} eq 'cmp'
and $rules[0]->{extract} eq 'self' ) ? 'trivial' :
(! grep {$_->{compare} ne 'cmp' or $_->{order_sign} < 0} @rules) ? 'packed' :
( scalar @rules == 1 ) ? 'precalc' :
'orcish' ;
# warn "Sorting using '$engine' engine\n";
my $sorter = ref($engine) eq 'CODE' ? $engine :
$FunctionCache{"_sorted_$engine"} ||= \&{"_sorted_$engine"} ||
croak("No such sort mode '$engine'; can't find function '_sorted_$engine'");
return $sorter, @rules;
}
########################################################################
=head2 Extraction Functions
For the extract option, you may specify one of the following I<option_value>s:
=over 4
=item any
The default. Based on the I<sortkey> may behave as the 'self', 'key', or 'method' options described below.
=item self
Uses the input value as the sort key, unaltered. Typically used when sorting strings or other scalar values.
=item key
Allows for indexing in to hash or array references, allowing you to sort a list of arrayrefs based on the I<n>th value in each, or to sort a list of hashrefs based on a given key.
If the sortkey is an array reference, then the keys are looked up sequentially, allowing you to sort on the contents of a nested hash or array structure.
=item method
Uses the sortkey as a method name to be called on each list value, enabling you to sort objects by some calculated value.
If the sortkey is an array reference, then the first value is used as the method name and the remaining values as arguments to that method.
=item I<CODEREF>
You may pass in a reference to a custom extraction function that will be used to retrieve the sort key values for this rule. The function will be called separately for each value in the input list, receiving that current value as an argument.
If the sortkey is an array reference, then the first value is used as the function reference and the remaining values as arguments to be passed after the item value.
=back
extract => self | method | key | code | CODEREF | ...
sortkey => - | m.name | key/idx | CODEREF | args
=cut
# $value = _extract_value( $item, $rule );
sub _extract_value {
my ( $item, $rule ) = @_;
my $value = &{ $rule->{extract_func} }( $item, @{ $rule->{extract_args} } );
return defined($value) ? $value : '';
}
# $value = _extract_values_for_item( $item, @rules );
sub _extract_values_for_item {
my $item = shift;
map { defined($_) ? $_ : '' }
map { &{ $_->{extract_func} }( $item, @{ $_->{extract_args} } ) } @_;
}
# $value = _extract_values_for_rule( $rule, @item );
sub _extract_values_for_rule {
my $rule = shift;
return @_ if ( $rule->{extract} eq 'self' );
map { defined($_) ? $_ : '' }
map { &{ $rule->{extract_func} }( $_, @{ $rule->{extract_args} } ) } @_;
}
sub _ext_self {
my ( $item, @sortkey ) = @_;
return $item;
}
sub _ext_split {
my ( $item, $delim, @indexes ) = @_;
# warn "Split '$item' with '$delim'\n";
my @values = split /$delim/, $item;
join $delim, @values[ @indexes ];
}
sub _ext_substr {
my ( $item, @sortkey ) = @_;
$#sortkey ? substr($item, $sortkey[0], $sortkey[1] ) : substr($item, $sortkey[0] );
}
sub _ext_self_code {
my ( $item, @sortkey ) = @_;
&$item( @sortkey );
}
sub _ext_code {
my ( $item, $code, @sortkey ) = @_;
&$code( $item, @sortkey );
}
sub _ext_method {
my ( $item, $method, @sortkey ) = @_;
$item->$method( @sortkey );
}
sub _ext_index {
my ( $item, @sortkey ) = @_;
while ( scalar @sortkey ) {
my $index = shift @sortkey;
if ( ! ref $item ) {
return;
} elsif ( UNIVERSAL::isa($item, 'HASH') ) {
$item = $item->{$index};
} elsif ( UNIVERSAL::isa($item, 'ARRAY') ) {
carp "Use of non-numeric key '$index'"
unless ( $index eq '0' or $index != 0 );
$item = $item->[$index];
} else {
carp "Can't _ext_index from '$item' ($index)";
}
}
return $item;
}
sub _ext_any {
my ( $item, @sortkey ) = @_;
if ( ref($item) eq 'CODE' ) {
# &_ext_self_code;
&$item( @sortkey );
} elsif ( ! scalar @sortkey ) {
return $item;
} elsif ( ref($sortkey[0]) eq 'CODE' ) {
&_ext_code;
} elsif ( UNIVERSAL::can( $item, $sortkey[0] ) ) {
&_ext_method;
} elsif ( ! ref $sortkey[0] ) {
&_ext_index;
} else {
confess "Unsure how to extract value for sorting purposes";
}
}
sub _ext_compound {
my $item = shift;
while ( scalar @_ ) {
my ($extr_sub, $sortkey) = ( shift, shift );
$item = &$extr_sub( $item, $sortkey ? @$sortkey : () );
}
return $item;
}
########################################################################
=head2 Comparison Functions
For the compare option, you may specify one of the following I<option_value>s:
=over 4
=item cmp
The default comparison, using Perl's default cmp operator.
=item numeric
A numeric comparison using Perl's <=> operator.
=item textual
A text-oriented comparison that ignores whitespace and capitalization.
=item natural
A multi-type comparison that places empty values first, then numeric values in numeric order, then non-textual values like punctuation, followed by textual values in text order. The natural ordering also includes moving subsidiary words to the end, eg "The Book of Verse" is sorted as "Book of Verse, The"
=item locale : $three_way_cmp
Comparator functions which use the POSIX strcoll function for ordering.
=item lc_locale : $three_way_cmp
A case-insensitive version of the POSIX strcoll ordering.
=item num_lc_locale
Like the 'natural' style, this comparison distinguishes between empty and numeric values, but uses the lc_locale function to sort the textual values.
=item I<CODEREF>
You may pass in a reference to a custom comparison function that will be used to order the sort key values for this rule.
=back
Each of these functions may return a postive, zero, or negative value based on the relationship of the values in the $a and $b positions of the current @ValueSet array. An undefined return indicates that the comparator is unable to provide an ordering for this pair, in which case the choice will fall through to the next comparator in the list; if no comparator specifies an order, they are left in their original order.
=cut
# $three_way_cmp = _cmp_cmp;
sub _cmp_cmp {
$ValueSet[$a] cmp $ValueSet[$b]
}
# $three_way_cmp = _cmp_bytewise;
sub _cmp_bytewise {
$ValueSet[$a] cmp $ValueSet[$b]
}
# $three_way_cmp = _cmp_numeric;
sub _cmp_numeric {
$ValueSet[$a] <=> $ValueSet[$b]
}
# $three_way_cmp = _cmp_empty_first;
sub _cmp_empty_first {
# If neither is empty, we have no opinion.
# If only one is empty, place it first
# If they're both empty, they're equivalent
( ! length($ValueSet[$a]) )
? ( ( ! length($ValueSet[$b]) ) ? 0 : -1 )
: ( ( ! length($ValueSet[$b]) ) ? 1 : undef );
}
# $three_way_cmp = _cmp_numbers_first;
sub _cmp_numbers_first {
# Use an extra array to cache our converted value
$Rule->{'ext_numeric'} ||= [];
my $is_numeric = $Rule->{'ext_numeric'};
# If we haven't already, check to see if the values are purely numeric
defined $is_numeric->[$a] or
$is_numeric->[$a] = ( $ValueSet[$a] =~ /\A\-?(?:\d*\.)?\d+\Z/ );
defined $is_numeric->[$b] or
$is_numeric->[$b] = ( $ValueSet[$b] =~ /\A\-?(?:\d*\.)?\d+\Z/ );
# If they're both numeric, use numeric comparison,
# If one's numeric and the other isn't, put the number first
# If neither is numeric, we have no opinion
( $is_numeric->[$a] )
? ( ( $is_numeric->[$b] ) ? ( $ValueSet[$a] <=> $ValueSet[$b] ) : -1 )
: ( ( $is_numeric->[$b] ) ? 1 : undef );
}
# $three_way_cmp = _cmp_textual;
sub _cmp_textual {
# Use an extra array to cache our converted value
$Rule->{'ext_textual'} ||= [];
my $mangled = $Rule->{'ext_textual'};
# If we haven't already, generate a lower-case, alphanumeric-only value
foreach my $idx ( $a, $b ) {
next if defined $mangled->[$idx];
local $_ = lc( $ValueSet[$idx] );
tr/0-9a-z/ /cs;
s/\A\s+//;
s/\s+\Z//;
$mangled->[$idx] = $_
}
# If both items have an alphanumeric value, compare them on that basis
# If one is alphanumeric and the other is punctuation/empty, put alpha last.
( length($mangled->[$a]) )
? ( length($mangled->[$b]) ? ( $mangled->[$a] cmp $mangled->[$b] ) : -1 )
: ( length($mangled->[$b]) ? 1 : undef );
}
# $three_way_cmp = _cmp_locale
sub _cmp_locale {
require POSIX;
POSIX::strcoll( $ValueSet[$a], $ValueSet[$b] );
}
# $three_way_cmp = _cmp_lc_locale
sub _cmp_lc_locale {
require POSIX;
POSIX::strcoll( lc($ValueSet[$a]), lc($ValueSet[$b]) );
}
sub _cmp_make_compound {
my @comparators = @_;
sub {
foreach my $comparator ( @comparators ) {
# Call each comparison function in an attempt to establish an ordering
my $rc = &$comparator;
# If the comparator returns undef, it has no opinion; call the next one
return($rc) if defined($rc);
}
}
}
{
no strict 'refs';
*{'_cmp_num_lc_locale'} = _cmp_make_compound( \&_cmp_empty_first, \&_cmp_numbers_first, \&_cmp_lc_locale );
}
# $three_way_cmp = _cmp_natural;
sub _cmp_natural {
# If neither is empty, we have no opinion.
# If only one is empty, place it first
# If they're both empty, they're equivalent
( ! length($ValueSet[$a]) )
? ( ( ! length($ValueSet[$b]) ) ? return 0 : return -1 )
: ( ( ! length($ValueSet[$b]) ) ? return 1 : undef );
# Use an extra array to cache our converted value
$Rule->{'ext_numeric'} ||= [];
my $is_numeric = $Rule->{'ext_numeric'};
# If we haven't already, check to see if the values are purely numeric
defined $is_numeric->[$a] or
$is_numeric->[$a] = ( $ValueSet[$a] =~ /\A\-?(?:\d*\.)?\d+\Z/ );
defined $is_numeric->[$b] or
$is_numeric->[$b] = ( $ValueSet[$b] =~ /\A(?:\d*\.)?\d+\Z/ );
# If they're both numeric, use numeric comparison,
# If one's numeric and the other isn't, put the number first
# If neither is numeric, we have no opinion
( $is_numeric->[$a] )
? return( ( $is_numeric->[$b] ) ? ( $ValueSet[$a] <=> $ValueSet[$b] ) : -1 )
: ( ( $is_numeric->[$b] ) ? return 1 : undef );
# Use an extra array to cache our converted value
$Rule->{'ext_textual'} ||= [];
my $mangled = $Rule->{'ext_textual'};
# If we haven't already, generate a lower-case, alphanumeric-only value
foreach my $idx ( $a, $b ) {
next if defined $mangled->[$idx];
local $_ = lc( $ValueSet[$idx] );
tr/0-9a-z/ /cs;
s/\A\s+//;
s/\s+\Z//;
s/\A(the)\s(.*)/$2 $1/;
$mangled->[$idx] = $_
}
# If both items have an alphanumeric value, compare them on that basis
# If one is alphanumeric and the other is punctuation/empty, put alpha last.
( length($mangled->[$a]) )
? ( length($mangled->[$b]) ? ( $mangled->[$a] cmp $mangled->[$b] ) : -1 )
: ( length($mangled->[$b]) ? 1 : undef );
}
########################################################################
=head2 Ascending or Descending Order
For the order option, you may specify one of the following I<option_value>s:
=over 4
=item forward I<or> ascending
The default order, from lower values to higher ones.
=item reverse I<or> descending
Reverses the ordering dictated by a sort rule.
=back
=head2 Sorting Engines
Depending on the specific sorting rules used in a given call, this module automatically selects an internal function that provides an appropriate approach to implementing the sort, called the sort "engine".
You can override this selection by setting an "engine" option on the first sort key, which can either contain either the name of one of the engines, described below, or a CODEREF with equivalent behavior.
=over 4
=item trivial
In the common case of sorting raw values with a cmp comparison, the fast-but-simple "trivial" engine is used, which simply applies Perl's default sorting.
=item orcish
For a complex multi-key sort the "orcish" engine is typically selected.
=item precalc
Used when there's only one sorting key.
You may also set the $PreCalculate package variable to true to force this engine to be selected. Because the sort key values for the list are calculated before entering Perl's sort operation, there's less of a chance of possible re-entry problems due to nested uses of the sort operator, which causes a fatal error in at least some versions of Perl.
=item packed
Some sorts are handled with the Guttman-Rosler technique, extracting packed keys and using Perl's default sort function, which is substantially faster, but currently only a limited set of simple comparisons can be handled this way. (For more information on packed-default sorting, see http://www.sysarch.com/perl/sort_paper.html or search for "Guttman-Rosler".)
=back
=cut
sub _sorted_trivial {
sort @Array
}
sub _sorted_precalc {
foreach my $rule (@Rules) {
$rule->{ext_value} = [ _extract_values_for_rule( $rule, @Array ) ]
}
return @Array[ sort _sorted_indexes_precalc 0 .. $#Array ];
}
# Compare indexes $a and $b acording to each of the specified rules
# $three_way_cmp = _sorted_indexes_precalc;
sub _sorted_indexes_precalc {
# implicit: $a, $b
RULE: foreach $Rule (@Rules) {
local *ValueSet = ( $Rule->{ext_value} ||= [] );
# If the function returns zero or undef, the values are equivalent
my $rc = &{ $Rule->{compare_func} }
or next RULE;
# Else return the comparison results, reversing them first if necessary
return $rc * $Rule->{order_sign};
}
# If the items are equivalent for all of the rules, don't change their order
# warn "Comparing $a and $b: '$ValueSet[$a]' " . ('=') . " '$ValueSet[$b]'\n";
return $a <=> $b;
}
sub _sorted_orcish {
return @Array[ sort _sorted_indexes_orcish 0 .. $#Array ];
}
sub _sorted_indexes_orcish {
# implicit: $a, $b
RULE: foreach $Rule (@Rules) {
# If we haven't already, calculate the value of each item for this rule
local *ValueSet = ( $Rule->{ext_value} ||= [] );
defined $ValueSet[$a] or $ValueSet[$a] = _extract_value($Array[$a], $Rule);
defined $ValueSet[$b] or $ValueSet[$b] = _extract_value($Array[$b], $Rule);
# If the function returns zero or undef, the values are equivalent
my $rc = &{ $Rule->{compare_func} }
or next RULE;
# Else return the comparison results, reversing them first if necessary
return $rc * $Rule->{order_sign};
}
# If the items are equivalent for all of the rules, don't change their order
# warn "Comparing $a and $b: '$ValueSet[$a]' " . ('=') . " '$ValueSet[$b]'\n";
return $a <=> $b;
}
sub _sorted_packed {
my @packed;
if ( @Rules == 1 ) {
@packed = map
&{ $Rules[0]->{extract_func} }( $Array[$_], @{ $Rules[0]->{extract_args} } )
. "\0" . $_,
( 0 .. $#Array );
} else {
@packed = map {
my $item = $Array[$_];
join( "\0",
map(&{ $_->{extract_func} }( $item, @{ $_->{extract_args} } ), @Rules),
$_
)
} ( 0 .. $#Array );
}
# warn "Packed: " . join(', ', map "'$_'", @packed ) . "\n";
return @Array[ map substr($_, 1 + rindex $_, "\0"), sort @packed ];
}
########################################################################
=head1 STATUS AND SUPPORT
This release of Data::Sorting is intended for public review and feedback.
Name DSLIP Description
-------------- ----- ---------------------------------------------
Data::
::Sorting bdpfp Multi-key sort using function results
Further information and support for this module is available at www.evoscript.org.
Please report bugs or other problems to E<lt>bugs@evoscript.comE<gt>.
=head1 BUGS AND TO DO
The following issues have been noted for future improvements:
Convert more types of comparisons to packed-default sorts for speed.
Further investigate the current status of the Sort::Records module.
Add a comparator function for an alpha-numeric-spans sorting model
like Sort::Naturally.
Interface to Sort::PolySort for alternate comparator styles, like
"name" and "usdate".
For non-scalar values, compare referents along the lines of
Ref::cmpref().
Provide better handling for nested sorts; perhaps throw an exception
from the inner instance to the outer, catch and set $PreCalculate,
then go back into the loop?
Replace dynamic scoping with object instances for thread safety.
May not be necessary given changes in threading models.
=head1 CREDITS AND COPYRIGHT
=head2 Developed By
M. Simon Cavalletto, simonm@cavalletto.org
Evolution Softworks, www.evoscript.org
=head2 Copyright
Copyright 2003 Matthew Cavalletto.
Portions copyright 1996, 1997, 1998, 1999 Evolution Online Systems, Inc.
=head2 License
You may use, modify, and distribute this software under the same terms as Perl.
=cut
########################################################################
1;
|