=head1 NAME
XML::XQL::Tutorial - Describes the XQL query syntax
=head1 DESCRIPTION
This document describes the basic features of the XML Query Language (XQL).
A proposal for the XML Query Language (XQL) specification was submitted
to the XSL Working Group in September 1998.
The spec can be found at L<http://www.w3.org/TandS/QL/QL98/pp/xql.html>.
Since it is only a proposal at this point, things may change, but it is very
likely that the final version will be close to the proposal.
Most of this document was copied straight from the spec.
See also the L<XML::XQL> man page.
=head1 INTRODUCTION
XQL (XML Query Language) provides a natural extension to the XSL pattern
language. It builds upon the capabilities XSL provides for identifying classes
of nodes, by adding Boolean logic, filters, indexing into collections of nodes,
and more.
XQL is designed specifically for XML documents.
It is a general purpose query language, providing a single syntax
that can be used for queries, addressing, and patterns.
XQL is concise, simple, and powerful.
XQL is designed to be used in many contexts. Although it is a superset of XSL
patterns, it is also applicable to providing links to nodes, for searching
repositories, and for many other applications.
Note that the term XQL is a working term for the language described in this
proposal. It is not the spec authors' intent that this term be used permanently.
Also, beware that another query language exists called XML-QL,
which uses a syntax very similar to SQL.
The L<XML::XQL> module has added functionality to the XQL spec, called I<XQL+>.
To allow only XQL functionality as described in the spec, use the
XML::XQL::Strict module. Note that the XQL spec makes the distinction between
core XQL and XQL extensions. This implementation makes no distinction and
the Strict module, therefore, implements everything described in the XQL spec.
See the L<XML::XQL> man page for more information about the Strict module.
This tutorial will clearly indicate when referring to XQL+.
=head1 XQL Patterns
This section describes the core XQL notation. These features should be part
of every XQL implementation, and serve as the base level of functionality
for its use in different technologies.
The basic syntax for XQL mimics the URI directory navigation syntax, but
instead of specifying navigation through a
physical file structure, the navigation is through elements in the XML tree.
For example, the following URI means find the foo.jpg file within the bar
directory:
bar/foo.jpg
Similarly, in XQL, the following means find the collection of fuz elements
within baz elements:
baz/fuz
Throughout this document you will find numerous samples. They refer to the data
shown in the sample file at the end of this man page.
=head1 Context
A I<context> is the set of nodes against which a query operates.
For the entire query, which is passed to the L<XML::XQL::Query>
constructor through the I<Expr> option, the context is the list of input nodes
that is passed to the query() method.
XQL allows a query to select between using the current context as the input
context and using the 'root context' as the input context.
The 'root context' is a context containing only the root-most
element of the document. When using XML::DOM, this is the Document object.
By default, a query uses the current context. A query prefixed with '/'
(forward slash) uses the root context. A query may
optionally explicitly state that it is using the current context by using
the './' (dot, forward slash) prefix. Both of these
notations are analogous to the notations used to navigate directories in a file
system.
The './' prefix is only required in one situation. A query may use the '//'
operator to indicate recursive descent. When
this operator appears at the beginning of the query, the initial '/' causes the
recursive descent to be performed relative to the
root of the document or repository. The prefix './/' allows a query to perform
a recursive descent relative to the current context.
=over 4
=item Examples:
Find all author elements within the current context. Since the period is really
not used alone, this example forward-references other features:
./author
Note that this is equivalent to:
author
Find the root element (bookstore) of this document:
/bookstore
Find all author elements anywhere within the current document:
//author
Find all books where the value of the style attribute on the book is equal to
the value of the specialty attribute of the bookstore element at the root of
the document:
book[/bookstore/@specialty = @style]
=back
=head1 Query Results
The collection returned by an XQL expression preserves document order,
hierarchy, and identity, to the extent that these are defined.
That is, a collection of elements will always be returned in document order
without repeats. Note that the spec states that the order of attributes within
an element is undefined, but that this implementation does keep attributes
in document order. See the L<XML::XQL> man page for more details regarding
I<Document Order>.
=head1 Collections - 'element' and '.'
The collection of all elements with a certain tag name is expressed using the
tag name itself. This can be qualified by showing that the elements are
selected from the current context './', but the current context is assumed and
often need not be noted explicitly.
=over 4
=item Examples:
Find all first-name elements. These examples are equivalent:
./first-name
first-name
Find all unqualified book elements:
book
Find all first.name elements:
first.name
=back
=head1 Selecting children and descendants - '/' and '//'
The collection of elements of a certain type can be determined using the path
operators ('/' or '//'). These operators take as their arguments a collection
(left side) from which to query elements, and a collection indicating which
elements to select (right side). The child operator ('/') selects from immediate
children of the left-side collection, while the descendant operator ('//')
selects from arbitrary descendants of the left-side collection.
In effect, the '//' can be thought of as a substitute for one or more levels of
hierarchy. Note that the path operators change the context as the
query is performed. By stringing them together users can 'drill down' into the
document.
=over 4
=item Examples:
Find all first-name elements within an author element. Note that the author
children of the current context are found, and then first-name children are
found relative to the context of the author elements:
author/first-name
Find all title elements, one or more levels deep in the bookstore
(arbitrary descendants):
bookstore//title
Note that this is different from the following query, which finds all title
elements that are grandchildren of bookstore elements:
bookstore/*/title
Find emph elements anywhere inside book excerpts, anywhere inside the bookstore:
bookstore//book/excerpt//emph
Find all titles, one or more levels deep in the current context. Note that this
situation is essentially the only one where
the period notation is required:
.//title
=back
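The difference between the child and descendant operators can be sketched outside XQL. The following is only an analogy: it uses Python's xml.etree.ElementTree, whose limited path syntax is not XQL, and the sample data is hypothetical (it is not the sample file from this man page). The search also starts at the bookstore element itself, so the paths are written relative to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, loosely modeled on the bookstore examples.
DOC = """
<bookstore>
  <book>
    <title>Outer</title>
    <excerpt><p>Some <emph>text</emph> <title>Inner</title></p></excerpt>
  </book>
  <magazine><title>Weekly</title></magazine>
</bookstore>
"""

bookstore = ET.fromstring(DOC)

# Analogous to bookstore/*/title: titles that are grandchildren of bookstore.
grandchild_titles = [t.text for t in bookstore.findall("*/title")]

# Analogous to bookstore//title: titles at any depth below bookstore.
descendant_titles = [t.text for t in bookstore.findall(".//title")]

print(grandchild_titles)
print(descendant_titles)
```

The title nested inside the excerpt is found only by the descendant form.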
=head1 Collecting element children - '*'
An element can be referenced without using its name by substituting the '*'
collection. The '*' collection returns all
elements that are children of the current context, regardless of their tag name.
=over 4
=item Examples:
Find all element children of author elements:
author/*
Find all last-names that are grand-children of books:
book/*/last-name
Find the grandchildren elements of the current context:
*/*
Find all elements with specialty attributes. Note that this example uses
subqueries, which are covered in Filters, and
attributes, which are discussed in Finding an attribute:
*[@specialty]
=back
=head1 Finding an attribute - '@'
Attribute names are preceded by the '@' symbol. XQL is designed to treat
attributes and sub-elements impartially,
and capabilities are equivalent between the two types wherever possible.
Note: attributes cannot contain subelements. Thus, attributes cannot have path
operators applied to them in a query.
Such expressions will result in a syntax error.
The XQL spec states that attributes are inherently unordered and indices
cannot be applied to them, but this implementation allows it.
=over 4
=item Examples:
Find the style attribute of the current element context:
@style
Find the exchange attribute on price elements within the current context:
price/@exchange
The following example is not valid:
price/@exchange/total
Find all books with style attributes. Note that this example uses subqueries,
which are covered in Filters:
book[@style]
Find the style attribute for all book elements:
book/@style
=back
=head1 XQL Literals
XQL query expressions may contain literal values (i.e. constants).
Numbers (integers and floats) are wrapped in XML::XQL::Number objects and
strings in XML::XQL::Text objects. Booleans (as returned by true() and false())
are wrapped in XML::XQL::Boolean objects.
Strings must be enclosed in single or double quotes. Since XQL does not allow
escaping of special characters, it's impossible to create a string with both
a single and a double quote in it. To remedy this, XQL+ has added the q// and
qq// string delimiters which behave just like they do in Perl.
For Numbers, exponential notation is not allowed. Use the XQL+ function eval()
to circumvent this problem. See L<XML::XQL> man page for details.
The empty list or undef is represented by [] (i.e. reference to empty array)
in this implementation.
=over 4
=item Examples:
Integer Numbers:
234
-456
Floating point Numbers:
1.23
-0.99
Strings:
"some text with 'single' quotes"
'text with "double" quotes'
Not allowed:
1.23E-4 (use eval("1.23E-4", "Number") in XQL+)
"can't use \"double \"quotes" (use q/can't use "double" quotes/ in XQL+)
=back
=head1 Grouping - '()'
Parentheses can be used to group collection operators for clarity or where the
normal precedence is inadequate to express an operation.
=head1 Filters - '[]'
Constraints and branching can be applied to any collection by adding a filter
clause '[ ]' to the collection. The filter is analogous to the SQL WHERE clause
with ANY semantics. The filter contains a query within it, called the
subquery. The subquery evaluates to a Boolean, and is tested for each element
in the collection. Any elements in the collection failing the subquery test are
omitted from the result collection.
For convenience, if a collection is placed within the filter, a Boolean TRUE
is generated if the collection contains any members, and a FALSE is generated
if the collection is empty. In essence, an expression such as author/degree
implies a collection-to-Boolean conversion function like the following
mythical 'there-exists-a' method.
author[.there-exists-a(degree)]
Note that any number of filters can appear at a given level of an expression.
Empty filters are not allowed.
=over 4
=item Examples:
Find all books that contain at least one excerpt element:
book[excerpt]
Find all titles of books that contain at least one excerpt element:
book[excerpt]/title
Find all authors of books where the book contains at least one excerpt, and
the author has at least one degree:
book[excerpt]/author[degree]
Find all books that have authors with at least one degree:
book[author/degree]
Find all books that have an excerpt and a title:
book[excerpt][title]
=back
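As a rough illustration of the collection-to-Boolean conversion described above, a filter can be read as "keep the element if the subquery returns a non-empty collection". The sketch below expresses that reading in Python with xml.etree.ElementTree; it is not how XML::XQL is implemented, and the data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: three books with different children.
DOC = """
<bookstore>
  <book><title>A</title><excerpt/></book>
  <book><title>B</title></book>
  <book><excerpt/></book>
</bookstore>
"""

bookstore = ET.fromstring(DOC)
books = bookstore.findall("book")

# book[excerpt]: a non-empty result list counts as TRUE, an empty one as FALSE.
with_excerpt = [b for b in books if b.findall("excerpt")]

# book[excerpt][title]: the second filter is applied to the survivors of the first.
with_both = [b for b in with_excerpt if b.findall("title")]

# book[excerpt]/title: continue the path from the filtered books.
titles = [t.text for b in with_excerpt for t in b.findall("title")]

print(len(with_excerpt), len(with_both), titles)
```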
=head2 Any and all semantics - '$any$' and '$all$'
Users can explicitly indicate whether to use any or all semantics through
the $any$ and $all$ keywords.
$any$ flags that a condition will hold true if any item in a set meets that
condition. $all$ means that all elements in a
set must meet the condition for the condition to hold true.
$any$ and $all$ are keywords that appear before a subquery expression within
a filter.
=over 4
=item Examples:
Find all author elements where one of the last names is Bob:
author[last-name = 'Bob']
author[$any$ last-name = 'Bob']
Find all author elements where none of the last-name elements are Bob:
author[$all$ last-name != 'Bob']
Find all author elements where the first last name is Bob:
author[last-name[0] = 'Bob']
=back
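The difference between the two keywords matters most for negated comparisons. The sketch below models a set of last-name values as a plain Python list (an assumption made for illustration only) and uses the built-in any() and all() functions. How XQL treats $all$ on an empty set is not covered here.

```python
# Hypothetical values of the last-name children of one author element.
last_names = ["Bob", "Smith"]

# author[$any$ last-name = 'Bob']  (also the default when no keyword is given)
any_is_bob = any(name == "Bob" for name in last_names)

# author[last-name != 'Bob']: default any semantics, true if SOME name differs.
any_is_not_bob = any(name != "Bob" for name in last_names)

# author[$all$ last-name != 'Bob']: true only if EVERY name differs.
all_are_not_bob = all(name != "Bob" for name in last_names)

print(any_is_bob, any_is_not_bob, all_are_not_bob)
```

For this author the plain inequality holds (because of 'Smith'), while the $all$ form does not (because of 'Bob').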
=head1 Indexing into a collection - '[]' and '$to$'
XQL makes it easy to find a specific node within a set of nodes.
Simply enclose the index ordinal within square brackets. The ordinal is 0 based.
A range of elements can be returned. To do so, specify an expression rather
than a single value inside of the subscript operator (square brackets).
Such expressions can be a comma separated list of any of the following:
  n          Returns the nth element
  -n         Returns the element that is n-1 units from the last element.
             E.g., -1 means the last element. -2 is the next to last element.
  m $to$ n   Returns elements m through n, inclusive
=over 4
=item Examples:
Find the first author element:
author[0]
Find the third author element that has a first-name:
author[first-name][2]
Note that indices are relative to the parent. In other words, consider the
following data:
  <x>
    <y/>
    <y/>
  </x>
  <x>
    <y/>
    <y/>
  </x>
The following expression will return the first y from each of the x's:
x/y[0]
The following will return the first y from the entire set of y's within x's:
(x/y)[0]
The following will return the first y from the first x:
x[0]/y[0]
Find the first and fourth author elements:
author[0,3]
Find the first through fourth author elements:
author[0 $to$ 3]
Find the first, the third through fifth, and the last author elements:
author[0, 2 $to$ 4, -1]
Find the last author element:
author[-1]
=back
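The subscript rules above can be sketched as a small Python function. This is an illustration only, not the XML::XQL implementation. The function name select_by_subscript and the representation of a subscript as a list of integers and (m, n) tuples are hypothetical. It also assumes that out-of-range positions are silently ignored and that results come back in document order without repeats, in line with the Query Results section.

```python
def select_by_subscript(nodes, subscript):
    """Select items from nodes using a list of ints and inclusive (m, n) ranges."""
    size = len(nodes)
    positions = set()
    for item in subscript:
        if isinstance(item, tuple):
            start, stop = item                      # m $to$ n, inclusive
            candidates = range(start, stop + 1)
        else:
            # -1 is the last element, -2 the next to last, and so on.
            candidates = [item + size if item < 0 else item]
        positions.update(p for p in candidates if 0 <= p < size)
    return [nodes[p] for p in sorted(positions)]


authors = ["a0", "a1", "a2", "a3", "a4", "a5", "a6"]

# author[0, 2 $to$ 4, -1]
picked = select_by_subscript(authors, [0, (2, 4), -1])

# Indices are relative to the parent: x/y[0] versus (x/y)[0].
xs = [["x0y0", "x0y1"], ["x1y0", "x1y1"]]
per_parent = [y for ys in xs for y in select_by_subscript(ys, [0])]
flattened = select_by_subscript([y for ys in xs for y in ys], [0])

print(picked, per_parent, flattened)
```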
=head1 Boolean Expressions
Boolean expressions can be used within subqueries. For example, one could use
Boolean expressions to find all nodes of a particular value, or all nodes with
nodes in particular ranges. Boolean expressions are of the form
${op}$, where {op} may be any expression of the form {b|a} - that is, the
operator takes lvalue and rvalue arguments and returns a Boolean result.
Note that the XQL Extensions section defines additional Boolean operations.
=head2 Boolean AND and OR - '$and$' and '$or$'
$and$ and $or$ are used to perform Boolean ands and ors.
The Boolean operators, in conjunction with grouping parentheses, can be used to
build very sophisticated logical expressions.
Note that spaces are not significant and can be omitted, or included for
clarity as shown here.
=over 4
=item Examples:
Find all author elements that contain at least one degree and one award.
author[degree $and$ award]
Find all author elements that contain at least one degree or award and at
least one publication.
author[(degree $or$ award) $and$ publication]
=back
=head2 Boolean NOT - '$not$'
$not$ is a Boolean operator that negates the value of an expression within a
subquery.
=over 4
=item Examples:
Find all author elements that contain at least one degree element and that
contain no publication elements.
author[degree $and$ $not$ publication]
Find all author elements that contain publications elements but do not contain
either degree elements or award elements.
author[$not$ (degree $or$ award) $and$ publication]
=back
=head1 Union and intersection - '$union$', '|' and '$intersect$'
The $union$ operator (shortcut is '|') returns the combined set of values from
the query on the left and the query on the right. Duplicates are filtered out.
The resulting list is sorted in document order.
Note: because this is a union, the set returned may include 0 or more elements
of each element type in the list. To restrict the returned set to nodes that
contain at least one of each of the elements in the list, use a filter, as
discussed in Filters.
The $intersect$ operator returns the set of elements in common between two sets.
=over 4
=item Examples:
Find all first-names and last-names:
first-name $union$ last-name
Find all books and magazines from a bookstore:
bookstore/(book | magazine)
Find all books and all authors:
book $union$ book/author
Find the first-names, last-names, or degrees from authors within either books
or magazines:
(book $union$ magazine)/author/(first-name $union$ last-name $union$ degree)
Find all books with author/first-name equal to 'Bob' and all magazines with
price less than 10:
book[author/first-name = 'Bob'] $union$ magazine[price $lt$ 10]
=back
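The set behaviour described above (duplicates removed, result in document order) can be sketched in Python. The helper names union and intersect and the sample data are hypothetical, and xml.etree.ElementTree stands in for the DOM. This is an illustration of the semantics, not of the XML::XQL code.

```python
import xml.etree.ElementTree as ET

# Hypothetical data.
DOC = """
<bookstore>
  <book><author>A1</author></book>
  <magazine/>
  <book><author>A2</author></book>
</bookstore>
"""

bookstore = ET.fromstring(DOC)

# Document order: position of each node in a depth-first walk of the tree.
order = {id(node): pos for pos, node in enumerate(bookstore.iter())}


def union(left, right):
    """Combine two node lists, drop duplicates, sort in document order."""
    unique = {id(n): n for n in list(left) + list(right)}
    return sorted(unique.values(), key=lambda n: order[id(n)])


def intersect(left, right):
    """Keep the nodes present in both lists, in document order."""
    right_ids = {id(n) for n in right}
    common = {id(n): n for n in left if id(n) in right_ids}
    return sorted(common.values(), key=lambda n: order[id(n)])


books = bookstore.findall("book")
authors = bookstore.findall("book/author")

# book $union$ book/author
combined = [n.tag for n in union(books, authors)]
print(combined)
```

Each author appears right after its own book, because the result is ordered by position in the document rather than by operand.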
=head1 Equivalence - '$eq$', '=', '$ne$' and '!='
The '=' sign is used for equality; '!=' for inequality. Alternatively, $eq$ and
$ne$ can be used for equality and inequality.
Single or double quotes can be used for string delimiters in expressions.
This makes it easier to construct and pass XQL from within scripting languages.
For comparing values of elements, the value() method is implied. That is,
last-name < 'foo' really means last-name!value() < 'foo'.
Note that filters are always with respect to a context. That is, the expression
book[author] means for every book element that is found, see if it has an
author subelement. Likewise, book[author = 'Bob'] means for
every book element that is found, see if it has a subelement named author
whose value is 'Bob'. One can examine the value of the context as well, by
using the . (period). For example, book[. = 'Trenton'] means for every
book that is found, see if its value is 'Trenton'.
=over 4
=item Examples:
Find all author elements whose last name is Bob:
author[last-name = 'Bob']
author[last-name $eq$ 'Bob']
Find all degrees where the from attribute is not equal to 'Harvard':
degree[@from != 'Harvard']
degree[@from $ne$ 'Harvard']
Find all authors where the last-name is the same as the /guest/last-name element:
author[last-name = /guest/last-name]
Find all authors whose text is 'Matthew Bob':
author[. = 'Matthew Bob']
author = 'Matthew Bob'
=back
=head2 Comparison - '<', '<=', '>', '>=', '$lt$', '$ilt$' etc.
A set of binary comparison operators is available for comparing numbers and
strings and returning Boolean results.
$lt$, $le$, $gt$, $ge$ are used for less than, less than or equal, greater
than, or greater than or equal. These same
operators are also available in a case insensitive form: $ieq$, $ine$, $ilt$,
$ile$, $igt$, $ige$.
<, <=, > and >= are allowed short cuts for $lt$, $le$, $gt$ and $ge$.
=over 4
=item Examples:
Find all author elements whose last name is Bob and whose price is greater than 50:
author[last-name = 'Bob' $and$ price $gt$ 50]
Find all degrees where the from attribute is not equal to 'Harvard':
degree[@from != 'Harvard']
Find all authors whose last name begins with 'M' or greater:
author[last-name $ge$ 'M']
Find all authors whose last name begins with 'M', 'm' or greater:
author[last-name $ige$ 'M']
Find the first three books:
book[index() $le$ 2]
Find all authors who have more than 10 publications:
author[publications!count() $gt$ 10]
=back
=head2 XQL+ Match operators - '$match$', '$no_match$', '=~' and '!~'
XQL+ defines additional operators for pattern matching. The $match$ operator
(shortcut is '=~') returns TRUE if the lvalue matches the pattern described by
the rvalue. The $no_match$ operator (shortcut is '!~') returns FALSE if they
match. Both lvalue and rvalue are first cast to strings.
The rvalue string should have the syntax of a Perl rvalue, that is the delimiters
should be included and modifiers are allowed. When using delimiters other than
slashes '/', the 'm' should be included. The rvalue should be a string, so don't
forget the quotes! (Or use the q// or qq// delimiters in XQL+, see L<XML::XQL>
man page.)
Note that you can't use the Perl substitution operator s/// here. Try using the
XQL+ subst() function instead.
=over 4
=item Examples:
Find all authors whose name contains bob or Bob:
author[first-name =~ '/[Bb]ob/']
Find all book titles that don't contain 'Trenton' (case-insensitive):
book[title !~ 'm!trenton!i']
=back
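For readers more familiar with other regular expression libraries, the two examples above behave roughly like the following Python sketch. Python's re module is not Perl's regex engine, so this is only an approximation. The lists of names and titles are hypothetical.

```python
import re

# Hypothetical first-name values.
first_names = ["Bob", "bob", "Robert"]

# Roughly: first-name =~ '/[Bb]ob/'
matching = [name for name in first_names if re.search(r"[Bb]ob", name)]

# Hypothetical title values.
titles = ["Trenton Today", "History of TRENTON", "Other"]

# Roughly: title !~ 'm!trenton!i'  (the 'i' modifier means case-insensitive)
not_matching = [t for t in titles if not re.search(r"trenton", t, re.IGNORECASE)]

print(matching, not_matching)
```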
=head2 Other XQL+ comparison operators - '$isa$', '$can$'
See the L<XML::XQL> man page for other operators available in XQL+.
=head2 Comparisons and vectors
The lvalue of a comparison can be a vector or a scalar. The rvalue of a
comparison must be a scalar or a value that can be cast at runtime to a scalar.
If the lvalue of a comparison is a set, then any (exists) semantics are used
for the comparison operators. That is, the result of a comparison is true if
any item in the set meets the condition.
=head2 Comparisons and literals
The spec states that the lvalue of an expression cannot be a literal.
That is, I<'1' = a> is not allowed. This implementation allows it, but it's not
clear how useful that is.
=head2 Casting of literals during comparison
Elements, attributes and other XML node types are cast to strings (Text)
by applying the value() method. The value() method calls the text() method by
default, but this behavior can be altered by the user, so the value() method
may return other XQL data types.
When two values are compared, they are first cast to the same type.
See the L<XML::XQL> man page for details on casting.
Note that the XQL spec is not very clear on how values should be cast for
comparison. Discussions with the authors of the XQL spec revealed that there
was some disagreement and their implementations differed on this point.
This implementation is closest to that of Joe Lapp from webMethods, Inc.
=head1 Methods - 'method()' or 'query!method()'
XQL makes a distinction between functions and methods.
See the L<XML::XQL> man page for details.
XQL provides methods for advanced manipulation of collections. These methods
provide specialized collections of nodes (see Collection methods), as well as
information about sets and nodes.
Methods are of the form I<method(arglist)>
Consider the query book[author]. It will find all books that have authors.
Formally, we call the book corresponding to a particular author the reference
node for that author. That is, every author element that is examined is an author
for one of the book elements. (See the Annotated XQL BNF Appendix for a much
more thorough definition of reference node and other terms. See also the
XML::XQL man page.) Methods always apply to the reference node.
For example, the text() method returns the text contained within a node,
minus any structure. (That is, it is the concatenation of all text nodes
contained within an element and its descendants.) The following expression will
return all authors named 'Bob':
author[text() = 'Bob']
The following will return all authors containing a first-name child whose
text is 'Bob':
author[first-name!text() = 'Bob']
The following will return all authors containing a child whose text is 'Bob':
author[*!text() = 'Bob']
Method names are case sensitive.
See the L<XML::XQL> man page on how to define your own methods and functions.
=head2 Information methods
The following methods provide information about nodes in a collection.
These methods return strings or numbers,
and may be used in conjunction with comparison operators within subqueries.
=over 4
=item Method: text()
The text() method concatenates text of the descendants of a node,
normalizing white space along the way. White space will be preserved for a node
if the node has the xml:space attribute set to 'preserve', or if the
nearest ancestor with the xml:space attribute has the attribute set to
'preserve'. When white space is normalized, it is normalized across the
entire string. Spaces are used to separate the text between nodes.
When entity references are used in a document, spacing is not inserted
around the entity refs when they are expanded.
In this implementation, the method may receive an optional parameter
to indicate whether the text() of Element nodes should include the text() of
its Element descendants. See L<XML::XQL> man page for details.
Examples:
Find the authors whose last name is 'Bob':
author[last-name!text() = 'Bob']
Note this is equivalent to:
author[last-name = 'Bob']
Find the authors with value 'Matthew Bob':
author[text() = 'Matthew Bob']
author[. = 'Matthew Bob']
author = 'Matthew Bob'
=item Method: rawText()
The rawText() method is similar to the text() method, but it does not
normalize whitespace.
In this implementation, the method may receive an optional parameter
to indicate whether the rawText() of Element nodes should include the
rawText() of its Element descendants. See L<XML::XQL> man page for details.
=item Method: value()
Returns a type cast version of the value of a node. If no data type is
provided, returns the same as text().
=over 4
=item Shortcuts
For the purposes of comparison, value() is implied if omitted.
In other words, when two items are compared, the comparison is between
the value of the two items. Remember that in absence of type information,
value() returns text().
The following examples are equivalent:
author[last-name!value() = 'Bob' $and$ first-name!value() = 'Joe']
author[last-name = 'Bob' $and$ first-name = 'Joe']
price[@intl!value() = 'canada']
price[@intl = 'canada']
=back
=item Method: nodeType()
Returns a number to indicate the type of the node. The values were based
on the node type values in the DOM:
    element         1
    attribute       2
    text            3
    entity          6   (not in XQL spec)
    PI              7
    comment         8
    document        9
    doc. fragment   10  (not in XQL spec)
    notation        11  (not in XQL spec)
Note that in XQL, CDATASection nodes and EntityReference nodes also return 3,
whereas in the DOM CDATASection returns 4 and EntityReference returns 5.
Use the XQL+ method DOM_nodeType() to get DOM node type values.
See the L<XML::DOM> man page for node type values of nodes not mentioned here.
=item Method: nodeTypeString()
Returns the name of the node type in lowercase or an empty string. The
following node types are currently supported: 1 (element), 2 (attribute),
3 (text), 7 (processing_instruction), 8 (comment), 9 (document).
=item Method: nodeName()
Returns the tag name for Element nodes and the attribute name of attributes.
=back
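The whitespace handling of text() described above, compared with rawText(), can be approximated with a short Python sketch. The function name normalized_text is hypothetical, xml.etree.ElementTree stands in for the DOM, and the sketch deliberately ignores xml:space='preserve' and entity references. It is an illustration of the idea, not the module's code.

```python
import xml.etree.ElementTree as ET


def normalized_text(element):
    """Join all descendant text with spaces, then collapse runs of white space."""
    pieces = element.itertext()
    return " ".join(" ".join(pieces).split())


# Hypothetical author element with irregular white space.
author = ET.fromstring(
    "<author>\n  <first-name>Matthew</first-name>\n"
    "  <last-name>  Bob </last-name>\n</author>"
)

# Comparable to text(): white space normalized across the whole string.
print(repr(normalized_text(author)))

# Closer to rawText(): the same text with white space left untouched.
raw = "".join(author.itertext())
print(repr(raw))
```

With normalization, the comparison author[. = 'Matthew Bob'] can succeed regardless of how the source document is indented.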
=head2 Collection index methods
=over 4
=item Method: index()
Returns the index of the value within the search context (i.e. within the input
list of the subquery). This is not necessarily the same as the index of a
node within its parent node. Note that the XQL spec doesn't explain it well.
=over 4
=item Examples:
Find the first 3 degrees:
degree[index() $lt$ 3]
Note that it skips over other nodes that may exist between the degree elements.
Consider the following data:
  <x>
    <y/>
    <y/>
  </x>
  <x>
    <y/>
    <y/>
  </x>
The following expression will return the first y from each x:
x/y[index() = 0]
This could also be accomplished by (see Indexing into a Collection):
x/y[0]
=back
=item Method: end()
The end() method returns true for the last element in the search context.
Again, the XQL spec does not explain it well.
=over 4
=item Examples:
Find the last book:
book[end()]
Find the last author for each book:
book/author[end()]
Find the last author from the entire set of authors of books:
(book/author)[end()]
=back
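The difference between the last two queries can be tried out with a small Perl sketch. The <store> document and the author names below are made up for illustration; only the two query shapes come from the examples above.

```perl
use strict;
use warnings;
use XML::DOM;
use XML::XQL;
use XML::XQL::DOM;    # adds the xql() method to XML::DOM nodes

# Hypothetical data: two books with two authors each.
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse(
    '<store>'
  . '<book><author>A1</author><author>A2</author></book>'
  . '<book><author>B1</author><author>B2</author></book>'
  . '</store>'
);

# end() is evaluated per book here, so each book should yield its last author.
my @per_book = $doc->xql('store/book/author[end()]');

# The parentheses first collect all authors; end() then applies to that one set.
my @overall = $doc->xql('(store/book/author)[end()]');

print "last author per book: ",
    join(', ', map { $_->getFirstChild->getData } @per_book), "\n";
print "last author overall:  ",
    join(', ', map { $_->getFirstChild->getData } @overall), "\n";
```

The grouping changes the search context that end() sees, which is why the second query is expected to return a single author.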
=back
=head2 Aggregate methods
=over 4
=item Method: count( [QUERY] )
Returns the number of values inside the search context.
In XQL+, when the optional QUERY parameter is supplied, it returns the number of
values returned by the QUERY.
=back
=head2 Namespace methods
The following methods can be applied to a node to return namespace information.
=over 4
=item Method: baseName()
Returns the local name portion of the node, excluding the prefix.
Local names are defined only for element nodes and attribute nodes.
The local name of an element node is the local
portion of the node's element type name. The local name of an attribute node is
the local portion of the node's attribute name. If a local name is not defined
for the reference node, the method evaluates to the empty set.
=item Method: namespace()
Returns the URI for the namespace of the node.
Namespace URIs are defined only for element nodes and attribute nodes.
The namespace URI of an element node is the namespace URI associated with the
node's element type name. The namespace URI of an attribute node is
the namespace URI associated with the node's attribute name. If a namespace
URI is not defined for the reference node, the method evaluates to the
empty set.
=item Method: prefix()
Returns the prefix for the node. Namespace prefixes are defined only for
element nodes and attribute nodes. The namespace prefix of an element
node is the shortname for the namespace of the node's element type name.
The namespace prefix of an attribute
node is the shortname for the namespace of the node's attribute name.
If a namespace prefix is not defined
for the reference node, the method evaluates to the empty set.
The spec states: A node's namespace prefix may be defined
within the query expression, within the document under query, or within both
the query expression and the document under query. If it is defined in both
places the prefixes may not agree. In this case, the prefix assigned by
the query expression takes precedence.
In this implementation you cannot define the namespace for a query, so this
can never happen.
=over 4
=item Examples:
Find all unqualified book elements. Note that this does not return my:book
elements:
book
Find all book elements with the prefix 'my'. Note that this query does not
return unqualified book elements:
my:book
Find all book elements with a 'my' prefix that have an author subelement:
my:book[author]
Find all book elements with a 'my' prefix that have an author subelement with a
my prefix:
my:book[my:author]
Find all elements with a prefix of 'my':
my:*
Find all book elements from any namespace:
*:book
Find any element from any namespace:
*
Find the style attribute with a 'my' prefix within a book element:
book/@my:style
=back
All attributes of an element can be returned using @*.
This is potentially useful for applications that treat attributes
as fields in a record.
=over 4
=item Examples:
Find all attributes of the current element context:
@*
Find style attributes from any namespace:
@*:style
Find all attributes from the 'my' namespace, including unqualified attributes on
elements from the 'my' namespace:
@my:*
=back
=back
=head1 Functions
This section defines the functions of XQL. The spec states that:
XQL defines two kinds of functions:
collection functions and pure functions. Collection functions use the search
context of the Invocation instance, while pure functions ignore the
search context, except to evaluate the function's parameters. A collection
function evaluates to a subset of the search context, and a pure function
evaluates to either a constant value or to a value that depends only on the
function's parameters.
Don't worry if you don't get it. Just use them!
=head2 Collection functions
The collection functions provide access to the various types of nodes in a
document. Any of these collections can be constrained and indexed.
The collections return the set of children of the reference node meeting the
particular restriction.
=over 4
=item Function: textNode()
The collection of text nodes.
=item Function: comment()
The collection of comment nodes.
=item Function: pi()
The collection of processing instruction nodes.
=item Function: element( [NAME] )
The collection of all element nodes. If the optional text
parameter is provided, it only returns element children
matching that particular name.
=item Function: attribute( [NAME] )
The collection of all attribute nodes. If the optional text
parameter is provided, it only returns attributes matching that
particular name.
=item Function: node()
The collection of all non-attribute nodes.
=over 4
=item Examples:
Find the second text node in each p element in the current context:
p/textNode()[1]
Find the second comment anywhere in the document. See Context for details on
setting the context to the document root:
//comment()[1]
=back
=back
=head2 Other XQL Functions
=over 4
=item Function: ancestor(QUERY)
Finds the nearest ancestor matching the provided query. It returns either a
single element result or an empty set [].
Note that this node is never the reference node itself.
=over 4
=item Examples:
Find the nearest book ancestor of the current element:
ancestor(book)
Find the nearest ancestor author element that is contained in a book element:
ancestor(book/author)
=back
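Because ancestor() works relative to a reference node, it helps to see it called on a specific element. In the Perl sketch below, the small document and the choice of the first-name element as the reference node are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::DOM;
use XML::XQL;
use XML::XQL::DOM;    # adds the xql() method to XML::DOM nodes

# Hypothetical data: a first-name nested two levels below a book.
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse(
    '<bookstore><book><author><first-name>Joe</first-name></author></book></bookstore>'
);

# Pick a reference node, then ask for its nearest book ancestor.
my ($first_name) = $doc->xql('//first-name');
my @ancestors    = $first_name->xql('ancestor(book)');

print "nearest book ancestor: ", $_->getTagName, "\n" for @ancestors;
```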
=item Function: id(NAME)
Pure function that evaluates to a set. The set contains an element node that
has an 'id' attribute whose value is identical to the string that the Text
parameter quotes. The element node may appear anywhere within the
document under query. If more than one element node meets these criteria,
the function evaluates to a set that contains the first node appearing in a
document ordering of the nodes.
=item Function: true() and false()
Pure functions that each evaluate to a Boolean. "true()" evaluates to 'true',
and "false()" evaluates to 'false'. These functions are useful in expressions
that are constructed using entity references or variable substitution, since
they may replace an expression found in an instance of Subquery without
violating the syntax required by the instance of Subquery.
They return an object of type XML::XQL::Boolean.
=item Function: date(QUERY)
"date" is a pure function that typecasts the value of its parameter to a set of
dates. If the parameter matches a single string, the value of the function is a
set containing a single date. If the parameter matches a QUERY, the value of
the function is a set of dates, where the set contains one date for each member
of the set to which the parameter evaluates.
XQL does not define the representation of the date value, nor does it
define how the function translates parameter values into dates.
This implementation uses the Date::Manip module to parse dates, which accepts
almost any imaginable format. See L<XML::XQL> to plug in your own
Date implementation.
Include the L<XML::XQL::Date> package to add the XQL date type and the date()
function, like this:
use XML::XQL::Date;
=item Perl builtin functions and other XQL+ functions
XQL+ provides XQL function wrappers for most Perl builtin functions.
It also provides other cool functions like subst(), map(), and eval() that
allow you to modify documents and embed perl code.
If this is still not enough, you can add your own functions and methods.
See the L<XML::XQL> man page for details.
=back
=head1 Sequence Operators - ';' and ';;'
The white paper 'The Design of XQL' by Jonathan Robie, which can be found
at L<http://www.texcel.no/whitepapers/xql-design.html>, describes the sequence
operators ';;' (precedes) and ';' (immediately precedes). Although these
operators are not included in the XQL spec, I thought I'd add them anyway.
=head2 Immediately Precedes - ';'
=over 4
=item Example:
With the following input:
<TABLE>
<ROWS>
<TR>
<TD>Shady Grove</TD>
<TD>Aeolian</TD>
</TR>
<TR>
<TD>Over the River, Charlie</TD>
<TD>Dorian</TD>
</TR>
</ROWS>
</TABLE>
Find the TD node that contains "Shady Grove" and the TD node that immediately
follows it:
//(TD="Shady Grove" ; TD)
=back
Note that in XML::DOM there is actually a text node containing whitespace between
the two TD nodes, but such text nodes are ignored by this operator, unless the text node
has 'xml:space' set to 'preserve'. See ??? for details.
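The 'immediately precedes' query can be tried with a short Perl sketch. It is only a sketch: the TD elements of each row are written on one line here, an assumption made to keep the example independent of the whitespace handling described above.

```perl
use strict;
use warnings;
use XML::DOM;
use XML::XQL;
use XML::XQL::DOM;    # adds the xql() method to XML::DOM nodes

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse(<<'XML');
<TABLE>
<ROWS>
<TR><TD>Shady Grove</TD><TD>Aeolian</TD></TR>
<TR><TD>Over the River, Charlie</TD><TD>Dorian</TD></TR>
</ROWS>
</TABLE>
XML

# The TD containing "Shady Grove" and the TD that immediately follows it.
my @cells = $doc->xql('//(TD="Shady Grove" ; TD)');

print $_->getFirstChild->getData, "\n" for @cells;
```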
=head2 Precedes - ';;'
=over 4
=item Example:
With the following input (from Hamlet):
<SPEECH>
<SPEAKER>MARCELLUS</SPEAKER>
<LINE>Tis gone!</LINE>
<STAGEDIR>Exit Ghost</STAGEDIR>
<LINE>We do it wrong, being so majestical,</LINE>
<LINE>To offer it the show of violence;</LINE>
<LINE>For it is, as the air, invulnerable,</LINE>
<LINE>And our vain blows malicious mockery.</LINE>
</SPEECH>
Return the STAGEDIR and all the LINEs that follow it:
SPEECH//( STAGEDIR ;; LINE )
Suppose an actor playing the ghost wants to know when to exit; that is, he
wants to know who says what line just before
he is supposed to exit. The line immediately precedes the stagedir, but the
speaker may occur at any time before the line.
In this query, we will use the "precedes" operator (";;") to identify a speaker
that precedes the line somewhere within a
speech. Our ghost can find the required information with the following query,
which selects the speaker, the line, and the stagedir:
SPEECH//( SPEAKER ;; LINE ; STAGEDIR="Exit Ghost")
=back
=head1 Operator Precedence
The following table lists operators in precedence order, highest precedence
first, where operators of a given row have the same precedence.
The table also lists the associated productions:
 Production      Operator(s)
 ----------      -----------
 Grouping        ( )
 Filter          [ ]
 Subscript       [ ]
 Bang            !
 Path            / //
 Match           $match$ $no_match$ =~ !~ (XQL+ only)
 Comparison      = != < <= > >= $eq$ $ne$ $lt$ $le$ $gt$
                 $ge$ $ieq$ $ine$ $ilt$ $ile$ $igt$ $ige$
 Intersection    $intersect$
 Union           $union$ |
 Negation        $not$
 Conjunction     $and$
 Disjunction     $or$
 Sequence        ; ;;
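To see what the table means in practice, the Perl sketch below compares a filter that relies on $and$ binding tighter than $or$ with one that uses grouping parentheses. The <authors> document is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;
use XML::DOM;
use XML::XQL;
use XML::XQL::DOM;    # adds the xql() method to XML::DOM nodes

# Hypothetical data: three authors.
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse(
    '<authors>'
  . '<author><first-name>Joe</first-name><last-name>Bob</last-name></author>'
  . '<author><first-name>Mary</first-name><last-name>Bob</last-name></author>'
  . '<author><first-name>Mary</first-name><last-name>Smith</last-name></author>'
  . '</authors>'
);

# $and$ ranks above $or$, so this reads as: Joe, or (Mary and Smith).
my @implicit = $doc->xql(
    q{authors/author[first-name = 'Joe' $or$ first-name = 'Mary' $and$ last-name = 'Smith']}
);

# Parentheses regroup the condition: (Joe or Mary) and Smith.
my @grouped = $doc->xql(
    q{authors/author[(first-name = 'Joe' $or$ first-name = 'Mary') $and$ last-name = 'Smith']}
);

printf "without grouping: %d, with grouping: %d\n",
    scalar @implicit, scalar @grouped;
```

Going by the table, the first query should match Joe Bob and Mary Smith, while the second should match only Mary Smith.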
=head1 Sample XML Document - bookstore.xml
This file is also available as samples/bookstore.xml in the
XML::XQL distribution.
<?xml version='1.0'?>
<!-- This file represents a fragment of a book store inventory database -->
<bookstore specialty='novel'>
<book style='autobiography'>
<title>Seven Years in Trenton</title>
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
<book style='textbook'>
<title>History of Trenton</title>
<author>
<first-name>Mary</first-name>
<last-name>Bob</last-name>
<publication>
Selected Short Stories of
<first-name>Mary</first-name> <last-name>Bob</last-name>
</publication>
</author>
<price>55</price>
</book>
<magazine style='glossy' frequency='monthly'>
<title>Tracking Trenton</title>
<price>2.50</price>
<subscription price='24' per='year'/>
</magazine>
<book style='novel' id='myfave'>
<title>Trenton Today, Trenton Tomorrow</title>
<author>
<first-name>Toni</first-name>
<last-name>Bob</last-name>
<degree from='Trenton U'>B.A.</degree>
<degree from='Harvard'>Ph.D.</degree>
<award>Pulizer</award>
<publication>Still in Trenton</publication>
<publication>Trenton Forever</publication>
</author>
<price intl='canada' exchange='0.7'>6.50</price>
<excerpt>
<p>It was a dark and stormy night.</p>
<p>But then all nights in Trenton seem dark and
stormy to someone who has gone through what
<emph>I</emph> have.</p>
<definition-list>
<term>Trenton</term>
<definition>misery</definition>
</definition-list>
</excerpt>
</book>
<my:book style='leather' price='29.50' xmlns:my='http://www.placeholder-name-here.com/schema/'>
<my:title>Who's Who in Trenton</my:title>
<my:author>Robert Bob</my:author>
</my:book>
</bookstore>
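A minimal Perl sketch for trying queries against this data follows. To stay self-contained it parses a trimmed copy of the first two book entries from an inline string, which is an assumption of this example rather than how the distribution is laid out.

```perl
use strict;
use warnings;
use XML::DOM;
use XML::XQL;
use XML::XQL::DOM;    # adds the xql() method to XML::DOM nodes

# Trimmed excerpt of the sample document (first two books only).
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse(<<'XML');
<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
    </author>
    <price>12</price>
  </book>
  <book style='textbook'>
    <title>History of Trenton</title>
    <author>
      <first-name>Mary</first-name>
      <last-name>Bob</last-name>
    </author>
    <price>55</price>
  </book>
</bookstore>
XML

# A path query and a filter query taken from this tutorial.
my @titles  = $doc->xql('bookstore/book/title');
my @authors = $doc->xql(q{//author[last-name = 'Bob' $and$ first-name = 'Joe']});

print "title: ", $_->getFirstChild->getData, "\n" for @titles;
printf "authors named Joe Bob: %d\n", scalar @authors;
```

To query the complete file instead, the parse() call can be replaced with $parser->parsefile('samples/bookstore.xml'), with the path adjusted to wherever the distribution was unpacked.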
=head1 SEE ALSO
The Japanese version of this document can be found on-line at
L<http://member.nifty.ne.jp/hippo2000/perltips/xml/xql/tutorial.htm>
L<XML::XQL>, L<XML::XQL::Date>, L<XML::XQL::Query> and L<XML::XQL::DOM>