1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734
|
/*
* $Id$
*
* This file is part of the OpenLink Software Virtuoso Open-Source (VOS)
* project.
*
* Copyright (C) 1998-2012 OpenLink Software
*
* This project is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
* Free Software Foundation; only version 2 of the License, dated June 1991.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
*/
%option 8bit
%option case-sensitive
%option never-interactive
%option noyywrap
%option reentrant
%option bison-bridge
%option stack
%{
#include "Dk.h"
#include "xmltree.h"
#include "rdf_core.h"
#include "turtle_p.h"
#include "numeric.h"
#include "turtle_p.h"
#include "nquad_p.h"
#include "security.h"
#include "sqlbif.h"
#if (__NQUAD_NONPUNCT_END != __TTL_NONPUNCT_END)
Sources of parsers are out of sync: mismatch between token declarations in nquad_p.y and turtle_p.y
#endif
#define YY_EXTRA_TYPE ttlp_t *
#define ttlyyerror(strg) ttlyyerror_impl(ttlp_arg, yytext, (strg))
#define TTLYYERROR_OR_REPORT(strg) do { \
if (!(ttlp_arg[0].ttlp_flags & TTLP_ERROR_RECOVERY)) \
ttlyyerror (strg); \
else tf_report (ttlp_arg[0].ttlp_tf, 'E', NULL, NULL, (strg)); } while (0)
#define TTLYYERROR_OR_RECOVER(strg) do { \
TTLYYERROR_OR_REPORT(strg); \
return TTL_RECOVERABLE_ERROR; } while (0)
#define ttlyylval (((YYSTYPE *)(yylval_param))[0])
#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
do \
{ \
ttlp_t *ttlp_arg = ttlyyget_extra (yyscanner); \
int rest_len = ttlp_arg[0].ttlp_text_len - ttlp_arg[0].ttlp_text_ofs; \
int get_len = (max_size); \
if (rest_len > 0) \
{ \
if (get_len > rest_len) \
get_len = rest_len; \
memcpy ((buf), (ttlp_arg[0].ttlp_text + ttlp_arg[0].ttlp_text_ofs), get_len); \
(result) = get_len; \
ttlp_arg[0].ttlp_text_ofs += get_len; \
break; \
} \
if (NULL != ttlp_arg[0].ttlp_iter) \
{ \
(result) = ttlp_arg[0].ttlp_iter (ttlp_arg[0].ttlp_iter_data, buf, max_size); \
break; \
} \
(result) = 0; \
} while (0);
int ttl_yy_null = YY_NULL;
struct yyguts_t;
extern int ttlyyerror_if_long_qname (ttlp_t *ttlp_arg, int lexcode, caddr_t box, const char *lex_type_descr, struct yyguts_t * yyg);
#define TTL_TOKBOX_Q_FINAL(lexcode,lex_type_descr) \
if (box_length (ttlyylval.box) > MAX_XML_LNAME_LENGTH) \
return ttlyyerror_if_long_qname (ttlp_arg, (lexcode), ttlyylval.box, lex_type_descr , yyg); \
return (lexcode);
#define TTL_TOKBOX_Q(n,lexcode,lex_type_descr) { \
ttlyylval.box = box_dv_short_string (yytext+(n)); \
TTL_TOKBOX_Q_FINAL(lexcode,lex_type_descr) }
#define TTL_TOKBOX_BNODE(n,lexcode,lex_type_descr) { \
int slen = strlen (yytext+(n)); \
if (slen > 1+MAX_XML_LNAME_LENGTH) { \
if (!(ttlp_arg[0].ttlp_flags & TTLP_ACCEPT_DIRTY_NAMES)) \
TTLYYERROR_OR_RECOVER ("Blank node label is abnormally long; this error can be suppressed by parser flags"); \
else \
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "Blank node label is abnormally long"); \
} \
ttlyylval.box = box_dv_short_string (yytext+(n)); \
return (lexcode); }
#define TTL_SPECIAL_QNAME(bit,id) \
if (!(ttlp_arg[0].ttlp_special_qnames & bit)) \
return id; \
ttlyylval.box = box_dv_short_string (yytext); \
return QNAME;
extern int ttlp_NUMBER_int (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, struct yyguts_t * yyg);
extern int ttlp_NUMBER_decimal (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, struct yyguts_t * yyg);
extern int ttlp_NUMBER_double (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, struct yyguts_t * yyg);
extern int ttlyylex (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, yyscan_t yyscanner);
#define YY_DECL int ttlyylex (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, yyscan_t yyscanner)
extern int ttlyydebug;
%}
/* Top-level INITIAL state */
/*%x INITIAL*/
/* Internals of single-quoted INITIAL string lit */
%x TURTLE_SQ
/* Internals of double-quoted INITIAL string lit */
%x TURTLE_DQ
/* Internals of invalid single-quoted INITIAL string lit */
%x TURTLE_BAD_SQ
/* Internals of invalid double-quoted INITIAL string lit */
%x TURTLE_BAD_DQ
/* Internals of triple-single-quoted INITIAL string lit */
%x TURTLE_SSSQ
/* Internals of triple-double-quoted INITIAL string lit */
%x TURTLE_DDDQ
/* State after '@' right after quoted string */
%x TURTLE_AT_AFTER_QUOTED
/* State after _really_ unexpected character, to skip to the dot and whitespace */
%x TURTLE_SKIP_TO_DOT_WS
/* Special unreacheable state to fill the first item of ttlp_arg[0].ttlp_lexstates */
%x UNREACHEABLE
INTEGER_LITERAL ([0-9]+)
DECIMAL_LITERAL (([0-9]+"."[0-9]*)|("."[0-9]+))
DOUBLE_LITERAL (({INTEGER_LITERAL}|{DECIMAL_LITERAL})[eE][+-]?[0-9]+)
SPAR_SQ_PLAIN ([^\x00-\x1f\\''\r\n]|[\t])
SPAR_DQ_PLAIN ([^\x00-\x1f\\""\r\n]|[\t])
SPAR_ECHAR ([\\]([atbvnrf\\""'']|("u"{HEX}{HEX}{HEX}{HEX})|("U"{HEX}{HEX}{HEX}{HEX}{HEX}{HEX}{HEX}{HEX})))
SPAR_LANGTAG ([a-zA-Z]+)(("-"([a-zA-Z0-9]+))*)
SPAR_OLD_LANGTAG ([a-z]+)"_"([a-zA-Z0-9]+)
S_NL ((\r\n)|(\n\r)|\n|\r)
HEX ([0-9A-Fa-f])
SPAR_NCCHAR1p ([A-Za-z\x7f-\xfe])
SPAR_NCCHAR1 ([A-Za-z\x7f-\xfe_])
SPAR_VARNAME ([A-Za-z0-9\x7f-\xfe_]+)
SPAR_NCCHAR ([A-Za-z0-9_\x7f-\xfe-])
SPAR_NCCHAR_X ([A-Za-z0-9_#%?&=*\x7f-\xfe-])
SPAR_NCNAME_PREFIX ({SPAR_NCCHAR1p}([A-Za-z0-9_.\x7f-\xfe-]*{SPAR_NCCHAR})?)
SPAR_NCNAME ({SPAR_NCCHAR1}([A-Za-z0-9_.\x7f-\xfe-]*{SPAR_NCCHAR})?)
SPAR_NCNAME_X (([A-Za-z0-9_./#%+?&=@*:\x7f-\xfe-]*[A-Za-z0-9_/#%+?&=@*:\x7f-\xfe-]+)|([A-Za-z0-9_./#%+?&=@*:\x7f-\xfe-]*[./#%+?&=@*:][A-Za-z0-9_./#%+?&=@*:\x7f-\xfe-]*[.]))
%%
<INITIAL>"^"[ \t] { return _CARET_WS ; }
<INITIAL>"^"{S_NL} { ttlp_arg[0].ttlp_lexlineno++; return _CARET_WS ; }
<INITIAL>"^" { return _CARET_NOWS ; }
<INITIAL>"^^" { return _CARET_CARET ; }
<INITIAL>"," { return _COMMA; }
<INITIAL>"."[ \t] { return _DOT_WS; }
<INITIAL>"."{S_NL} { ttlp_arg[0].ttlp_lexlineno++; return _DOT_WS; }
<INITIAL>"."[^0-9] { TTLYYERROR_OR_RECOVER ("Whitespace is required after dot if dot is not inside decimal number, string or IRI"); }
<INITIAL>"." { return _DOT_WS; }
<INITIAL>":" { return _COLON; }
<INITIAL>";" { return _SEMI; }
<INITIAL>"=" { return (((0 == ttlp_arg[0].ttlp_lexdepth) && (ttlp_arg[0].ttlp_flags & TTLP_ALLOW_TRIG)) ? _EQ_TOP_TRIG : _EQ); }
<INITIAL>"=>" { return _EQ_GT; }
<INITIAL>"<=" { return _LT_EQ; }
<INITIAL>"!" { return _BANG; }
<INITIAL>"@a" { return _AT_a_L; }
<INITIAL>"@base" { return _AT_base_L; }
<INITIAL>"@has" { return _AT_has_L; }
<INITIAL>"@is" { return _AT_is_L; }
<INITIAL>"@of" { return _AT_of_L; }
<INITIAL>"@this" { return _AT_this_L; }
<INITIAL>"@keywords" { return _AT_keywords_L; }
<INITIAL>"@prefix" { return _AT_prefix_L; }
<INITIAL>"@forAll" { ttlyyerror ("Current version of Virtuoso does not support @forAll keyword"); }
<INITIAL>"@forSome" { ttlyyerror ("Current version of Virtuoso does not support @forSome keyword"); }
<INITIAL>"-INF" { return _MINUS_INF_L; }
<INITIAL>[+]?"INF" { return INF_L; }
<INITIAL>[+-]?"NaN" { return NaN_L; }
<INITIAL>"false" { return false_L; }
<INITIAL>"true" { return true_L; }
<INITIAL>"a" { TTL_SPECIAL_QNAME (TTLP_ALLOW_QNAME_A, _AT_a_L) }
<INITIAL>"has" { TTL_SPECIAL_QNAME (TTLP_ALLOW_QNAME_HAS, _AT_has_L) }
<INITIAL>"is" { TTL_SPECIAL_QNAME (TTLP_ALLOW_QNAME_IS, _AT_is_L) }
<INITIAL>"of" { TTL_SPECIAL_QNAME (TTLP_ALLOW_QNAME_OF, _AT_of_L) }
<INITIAL>"this" { TTL_SPECIAL_QNAME (TTLP_ALLOW_QNAME_THIS, _AT_this_L) }
<INITIAL>"(" { ttlp_arg[0].ttlp_lexdepth++; return _LPAR; }
<INITIAL>")" { ttlp_arg[0].ttlp_lexdepth--; return _RPAR; }
<INITIAL>"[" { ttlp_arg[0].ttlp_lexdepth++; return _LSQBRA; }
<INITIAL>"[]" { return _LSQBRA_RSQBRA; }
<INITIAL>"]" { ttlp_arg[0].ttlp_lexdepth--; return _RSQBRA; }
<INITIAL>"{" {
int depth = ttlp_arg[0].ttlp_lexdepth++;
return (((0 == depth) && (ttlp_arg[0].ttlp_flags & TTLP_ALLOW_TRIG)) ? _LBRA_TOP_TRIG : _LBRA);
}
<INITIAL>"}" { ttlp_arg[0].ttlp_lexdepth--; return _RBRA; }
<INITIAL>"<"([^\\>\001-\040])*">" {
ttlyylval.box = box_dv_short_nchars (yytext + 1, yyleng - 2);
return Q_IRI_REF;
}
<INITIAL>"<"(([^\\>\001-\040])|{SPAR_ECHAR})*">" {
ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_LTGT, '>');
return Q_IRI_REF;
}
<INITIAL>"<"([^<>\001-\037]|"\t")*">" {
if (!(ttlp_arg[0].ttlp_flags & TTLP_ACCEPT_DIRTY_NAMES))
TTLYYERROR_OR_RECOVER ("Invalid characters in angle-bracketed name; this error can be suppressed by parser flags");
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "Invalid characters in angle-bracketed name");
ttlyylval.box = box_dv_short_nchars (yytext + 1, yyleng - 2);
return Q_IRI_REF;
}
<INITIAL>"<"([^<>\001-\037]|"\t")+{S_NL}([^<>.;,\[\]{}\'\"\\\001-\037]|"\t")*">" {
if (!(ttlp_arg[0].ttlp_flags & TTLP_ACCEPT_DIRTY_NAMES))
TTLYYERROR_OR_RECOVER ("Line break in angle-bracketed name; this error can be suppressed by parser flags");
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "Line break in angle-bracketed name");
ttlyylval.box = box_dv_short_nchars (yytext + 1, yyleng - 2);
ttlp_arg[0].ttlp_lexlineno++;
return Q_IRI_REF;
}
<INITIAL>"<"([^<>\001-\037]|"\t")* {
TTLYYERROR_OR_REPORT ("Invalid characters in angle-bracketed name");
BEGIN (TURTLE_SKIP_TO_DOT_WS); yymore (); }
<INITIAL>({SPAR_NCNAME_PREFIX}?)":"{SPAR_NCNAME} { TTL_TOKBOX_Q(0,QNAME,"qualified URI"); }
<INITIAL>({SPAR_NCNAME_PREFIX}?)":"{SPAR_NCNAME_X} {
if (!(ttlp_arg[0].ttlp_flags & TTLP_NAME_MAY_CONTAIN_PATH))
TTLYYERROR_OR_RECOVER ("Invalid characters in local part of QName; this error can be suppressed by parser flags");
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "Invalid characters in local part of QName");
TTL_TOKBOX_Q(0,QNAME,"qualified URI"); }
<INITIAL>({SPAR_NCNAME_PREFIX}?)":" { TTL_TOKBOX_Q(0,QNAME_NS,"namespace"); }
<INITIAL>{SPAR_NCNAME} { TTL_TOKBOX_Q(0,QNAME,"name without prefix"); }
<INITIAL>"_:"{SPAR_NCNAME} { TTL_TOKBOX_BNODE(0,BLANK_NODE_LABEL,"blank node label"); }
<INITIAL>"_:"{SPAR_NCCHAR_X}+ {
if (!(ttlp_arg[0].ttlp_flags & TTLP_ACCEPT_DIRTY_NAMES))
TTLYYERROR_OR_RECOVER ("Ill formed blank node label; this error can be suppressed by parser flags");
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "Ill formed blank node label");
TTL_TOKBOX_BNODE(0,BLANK_NODE_LABEL,"blank node label"); }
<INITIAL>"?"{SPAR_NCNAME} {
if (!(ttlp_arg[0].ttlp_flags & TTLP_ACCEPT_VARIABLES))
TTLYYERROR_OR_RECOVER ("Full N3 syntax allow variables, TURTLE does not; this error can be suppressed by parser flags");
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "Full N3 syntax allow variables, TURTLE does not");
TTL_TOKBOX_Q(0,VARIABLE,"variable name"); }
<INITIAL>"@"([a-zA-Z]+)(("-"([a-zA-Z0-9]+))*) {
ttlyylval.box = box_dv_short_nchars (yytext + 1, yyleng - 1);
return KEYWORD;
}
<TURTLE_AT_AFTER_QUOTED>{SPAR_LANGTAG} {
BEGIN INITIAL;
ttlyylval.box = box_dv_short_nchars (yytext, yyleng);
return LANGTAG;
}
<TURTLE_AT_AFTER_QUOTED>{SPAR_OLD_LANGTAG} {
char *tail;
BEGIN INITIAL;
if (!(ttlp_arg[0].ttlp_flags & TTLP_ACCEPT_DIRTY_NAMES))
TTLYYERROR_OR_RECOVER ("Obsolete format of language; this error can be recovered if proper parser flags are set");
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "Obsolete format of language");
ttlyylval.box = box_dv_short_nchars (yytext, yyleng);
for (tail = ttlyylval.box + (yyleng); tail >= ttlyylval.box; tail--)
if ('_' == tail[0])
tail[0] = '-';
return LANGTAG;
}
<TURTLE_AT_AFTER_QUOTED>([a-zA-Z0-9-]+) { TTLYYERROR_OR_RECOVER ("The identifier after '@' at the end of quoted literal is not a valid language id"); }
<TURTLE_AT_AFTER_QUOTED><<EOF>> { TTLYYERROR_OR_RECOVER ("Missing language id after '@' at the end of quoted literal"); }
<TURTLE_AT_AFTER_QUOTED>. { TTLYYERROR_OR_RECOVER ("Bad character instead of language id after '@' at the end of quoted literal"); }
<INITIAL>([""][^""\\\n]*[""])|([''][^''\\\n]*['']) {
ttlyylval.box = box_dv_short_nchars (yytext+1, yyleng - 2);
return TURTLE_STRING;
}
<INITIAL>(([""][^""\\\n]*[""])|([''][^''\\\n]*['']))"@" {
ttlyylval.box = box_dv_short_nchars (yytext+1, yyleng - 3);
BEGIN TURTLE_AT_AFTER_QUOTED;
return TURTLE_STRING;
}
<INITIAL>[''][''][''] { yymore(); BEGIN TURTLE_SSSQ; }
<INITIAL>[""][""][""] { yymore(); BEGIN TURTLE_DDDQ; }
<TURTLE_SSSQ>[''][''](['']+) { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_3QUOT, '\''); BEGIN INITIAL; return TURTLE_STRING; }
<TURTLE_DDDQ>[""][""]([""]+) { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_3QUOT, '\"'); BEGIN INITIAL; return TURTLE_STRING; }
<TURTLE_SSSQ>['']['']['']"@" { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_3QUOT_AT, '\''); BEGIN TURTLE_AT_AFTER_QUOTED; return TURTLE_STRING; }
<TURTLE_DDDQ>[""][""][""]"@" { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_3QUOT_AT, '\"'); BEGIN TURTLE_AT_AFTER_QUOTED; return TURTLE_STRING; }
<TURTLE_SSSQ>(([''](['']?))?{S_NL}) { ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_DDDQ>(([""]([""]?))?{S_NL}) { ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_SSSQ>((([''](['']?))?({SPAR_SQ_PLAIN}|{SPAR_ECHAR}))+) { yymore(); }
<TURTLE_DDDQ>((([""]([""]?))?({SPAR_DQ_PLAIN}|{SPAR_ECHAR}))+) { yymore(); }
<TURTLE_SSSQ>[\\] { ttlyyerror ("Bad escape sequence in a long single-quoted string"); }
<TURTLE_DDDQ>[\\] { ttlyyerror ("Bad escape sequence in a long double-quoted string"); }
<TURTLE_SSSQ>. { ttlyyerror ("Bad character in a long single-quoted string"); }
<TURTLE_DDDQ>. { ttlyyerror ("Bad character in a long double-quoted string"); }
<TURTLE_SSSQ><<EOF>> { ttlyyerror ("Unterminated long single-quoted string"); }
<TURTLE_DDDQ><<EOF>> { ttlyyerror ("Unterminated long double-quoted string"); }
<INITIAL>[''] { yymore(); BEGIN TURTLE_SQ; }
<INITIAL>[""] { yymore(); BEGIN TURTLE_DQ; }
<TURTLE_SQ>[''] { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_QUOT, '\''); BEGIN INITIAL; return TURTLE_STRING; }
<TURTLE_DQ>[""] { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_QUOT, '\"'); BEGIN INITIAL; return TURTLE_STRING; }
<TURTLE_SQ>['']"@" { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_QUOT_AT, '\''); BEGIN TURTLE_AT_AFTER_QUOTED; return TURTLE_STRING; }
<TURTLE_DQ>[""]"@" { ttlyylval.box = ttlp_strliteral (ttlp_arg, yytext, TTLP_STRLITERAL_QUOT_AT, '\"'); BEGIN TURTLE_AT_AFTER_QUOTED; return TURTLE_STRING; }
<TURTLE_SQ>{S_NL} {
if (!(TTLP_STRING_MAY_CONTAIN_CRLF & ttlp_arg[0].ttlp_flags))
{
TTLYYERROR_OR_REPORT ("End-of-line in a short single-quoted string; this error can be suppressed by parser flags");
BEGIN TURTLE_BAD_SQ;
}
else
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "End-of-line in a short single-quoted string");
ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_DQ>{S_NL} {
if (!(TTLP_STRING_MAY_CONTAIN_CRLF & ttlp_arg[0].ttlp_flags))
{
TTLYYERROR_OR_REPORT ("End-of-line in a short double-quoted string; this error can be suppressed by parser flags");
BEGIN TURTLE_BAD_DQ;
}
else
tf_report (ttlp_arg[0].ttlp_tf, 'W', NULL, NULL, "End-of-line in a short double-quoted string");
ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_SQ>[\\]{S_NL} {
if (!(TTLP_STRING_MAY_CONTAIN_CRLF & ttlp_arg[0].ttlp_flags))
{
TTLYYERROR_OR_REPORT ("End-of-line escape sequence in a single-quoted string is not permitted by strict TURTLE; this error can be suppressed by parser flags");
BEGIN TURTLE_BAD_SQ;
}
ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_DQ>[\\]{S_NL} {
if (!(TTLP_STRING_MAY_CONTAIN_CRLF & ttlp_arg[0].ttlp_flags))
{
TTLYYERROR_OR_REPORT ("End-of-line escape sequence in a double-quoted string is not permitted by strict TURTLE; this error can be suppressed by parser flags");
BEGIN TURTLE_BAD_DQ;
}
ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_SQ>(({SPAR_SQ_PLAIN}|{SPAR_ECHAR})+) { yymore(); }
<TURTLE_DQ>(({SPAR_DQ_PLAIN}|{SPAR_ECHAR})+) { yymore(); }
<TURTLE_SQ>[\\] {
TTLYYERROR_OR_REPORT ("Bad escape sequence in a short single-quoted string");
BEGIN (TURTLE_BAD_SQ); yymore(); }
<TURTLE_DQ>[\\] {
TTLYYERROR_OR_REPORT ("Bad escape sequence in a short double-quoted string");
BEGIN (TURTLE_BAD_DQ); yymore(); }
<TURTLE_BAD_SQ>(({SPAR_SQ_PLAIN}|{SPAR_ECHAR})+) {}
<TURTLE_BAD_DQ>(({SPAR_DQ_PLAIN}|{SPAR_ECHAR})+) {}
<TURTLE_BAD_SQ>{S_NL} { ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_BAD_DQ>{S_NL} { ttlp_arg[0].ttlp_lexlineno++; yymore(); }
<TURTLE_BAD_SQ>[^''] {}
<TURTLE_BAD_DQ>[^""] {}
<TURTLE_BAD_SQ>['']("@"{SPAR_LANGTAG})? { BEGIN INITIAL; return TTL_RECOVERABLE_ERROR; }
<TURTLE_BAD_DQ>[""]("@"{SPAR_LANGTAG})? { BEGIN INITIAL; return TTL_RECOVERABLE_ERROR; }
<TURTLE_SQ,TURTLE_BAD_SQ><<EOF>> { ttlyyerror ("Unterminated short single-quoted string"); }
<TURTLE_DQ,TURTLE_BAD_DQ><<EOF>> { ttlyyerror ("Unterminated short double-quoted string"); }
<INITIAL>[+-]?{INTEGER_LITERAL} { return ttlp_NUMBER_int (yylval, ttlp_arg, yyg); }
<INITIAL>[+-]?{DECIMAL_LITERAL} { return ttlp_NUMBER_decimal (yylval, ttlp_arg, yyg); }
<INITIAL>[+-]?{DOUBLE_LITERAL} { return ttlp_NUMBER_double (yylval, ttlp_arg, yyg); }
<INITIAL>("#"(.*))?{S_NL} { ttlp_arg[0].ttlp_lexlineno++; }
<INITIAL>"#"(.*) { }
<INITIAL>[ \t]+ { }
<INITIAL>. {
char buf[100]; sprintf (buf, "Unexpected character '%c'", yytext[yyleng-1]);
TTLYYERROR_OR_REPORT (buf);
BEGIN TURTLE_SKIP_TO_DOT_WS; yymore(); }
<TURTLE_SKIP_TO_DOT_WS>"."[ \t] { BEGIN INITIAL; yyless (yyleng-2); return _GARBAGE_BEFORE_DOT_WS; }
<TURTLE_SKIP_TO_DOT_WS>"."{S_NL} { BEGIN INITIAL; yyless (('.' == yytext[yyleng-2]) ? yyleng-2 : yyleng-3); ttlp_arg[0].ttlp_lexlineno++; return _GARBAGE_BEFORE_DOT_WS; }
<TURTLE_SKIP_TO_DOT_WS>. {
if (yyleng > 8000)
{ char buf[100]; sprintf (buf, "Failed to recover syntax error at \"%.50s...\"", yytext); ttlyyerror (buf); }
yymore(); }
%%
/*void ttlyy_reset (yyscan_t scanner)
{
ttlyyrestart (NULL, scanner);
BEGIN INITIAL;
}*/
int
ttlyyerror_if_long_qname (ttlp_t *ttlp_arg, int lexcode, caddr_t box, const char *lex_type_descr, struct yyguts_t * yyg)
{
size_t boxlen = box_length (box);
char buf[100];
char *colon;
if (boxlen > MAX_XML_QNAME_LENGTH)
{
snprintf (buf, sizeof (buf), "%.90s is too long", lex_type_descr);
goto err; /* see below */
}
colon = strrchr (box, ':');
if (NULL == colon)
{
if (boxlen > MAX_XML_LNAME_LENGTH)
{
snprintf (buf, sizeof (buf), "%.90s is too long", lex_type_descr);
goto err; /* see below */
}
return lexcode;
}
if (colon+1-box > MAX_XML_LNAME_LENGTH)
{
snprintf (buf, sizeof (buf), "%.90s contains abnormally long namespace prefix", lex_type_descr);
goto err; /* see below */
}
if (boxlen-(colon-box) > MAX_XML_LNAME_LENGTH)
{
snprintf (buf, sizeof (buf), "%.90s contains abnormally long 'local part' after the colon", lex_type_descr);
goto err; /* see below */
}
return lexcode;
err:
TTLYYERROR_OR_RECOVER (buf);
return 0; /* Never reached */
}
int
ttlp_NUMBER_int (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, struct yyguts_t * yyg)
{
int l = strlen (yytext);
int has_sign = ((('-' == yytext[0]) || ('+' == yytext[0])) ? 1 : 0);
int is_negative = ('-' == yytext[0]);
if (((10+has_sign) > l) ||
(((10+has_sign) == l) &&
(0 >= strcmp (yytext + has_sign,
(is_negative ? "2147483648" : "2147483647") ) ) ) )
{
ttlyylval.box = box_num_nonull (atol (yytext));
return TURTLE_INTEGER;
}
else
{
numeric_t num = numeric_allocate ();
int rc = numeric_from_string (num, yytext);
ttlyylval.box = (caddr_t) num;
if (NULL == ttlyylval.box)
ttlyylval.box = box_num_nonull (0);
if(rc != NUMERIC_STS_SUCCESS)
TTLYYERROR_OR_RECOVER ("The absolute value of numeric constant is too large");
return TURTLE_INTEGER;
}
}
int
ttlp_NUMBER_decimal (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, struct yyguts_t * yyg)
{
numeric_t num = numeric_allocate ();
int rc = numeric_from_string (num, yytext);
if (NUMERIC_STS_SUCCESS == rc)
{
ttlyylval.box = (caddr_t) num;
if (NULL == ttlyylval.box)
ttlyylval.box = box_num_nonull (0);
return TURTLE_DECIMAL;
}
numeric_free (num);
ttlyylval.box = box_double (atof (yytext));
return TURTLE_DECIMAL;
}
int
ttlp_NUMBER_double (YYSTYPE *yylval_param, ttlp_t *ttlp_arg, struct yyguts_t * yyg)
{
ttlyylval.box = box_double (atof (yytext));
return TURTLE_DOUBLE;
}
caddr_t
ttl_lex_analyze (caddr_t str, int mode_bits, wcharset_t *query_charset)
{
dk_set_t lexems = NULL;
caddr_t result_array;
ttlp_t *ttlp;
yyscan_t scanner;
if (!DV_STRINGP(str))
{
return list (1, list (3, (ptrlong)0, (ptrlong)0, box_dv_short_string ("TURTLE parser: input text is not a string")));
}
ttlp = ttlp_alloc ();
QR_RESET_CTX
{
ttlp->ttlp_text = str;
ttlp->ttlp_text_len = strlen (str);
ttlp->ttlp_err_hdr = ((mode_bits & TTLP_ALLOW_NQUAD) ? "NQuads lexical analyzer" : ((mode_bits & TTLP_ALLOW_TRIG) ? "TriG lexical analyzer" : "TURTLE lexical analyzer"));
if (NULL == query_charset)
query_charset = default_charset;
if (NULL == query_charset)
ttlp->ttlp_enc = &eh__ISO8859_1;
else
{
ttlp->ttlp_enc = eh_get_handler (CHARSET_NAME (query_charset, NULL));
if (NULL == ttlp->ttlp_enc)
ttlp->ttlp_enc = &eh__ISO8859_1;
}
ttlyylex_init (&scanner);
ttlyyset_extra (ttlp, scanner);
#ifdef MALLOC_DEBUG
yy_push_state (INITIAL, scanner);
#endif
for (;;)
{
caddr_t lex_value = NULL;
int lexem;
lexem = ttlyylex ((YYSTYPE *)(&lex_value), ttlp, scanner);
if (0 == lexem)
break;
dk_set_push (&lexems, list (5,
box_num (ttlp->ttlp_lexlineno),
ttlp->ttlp_lexdepth,
box_copy (ttlp->ttlp_raw_text),
lexem, lex_value ) );
}
}
QR_RESET_CODE
{
du_thread_t *self = THREAD_CURRENT_THREAD;
ttlp->ttlp_catched_error = thr_get_error_code (self);
thr_set_error_code (self, NULL);
/*no POP_QR_RESET*/;
}
END_QR_RESET
ttlyylex_destroy (scanner);
if (NULL != ttlp->ttlp_catched_error)
{
dk_set_push (&lexems, list (3,
box_num (ttlp->ttlp_lexlineno),
ttlp->ttlp_lexdepth,
box_copy (ERR_MESSAGE (ttlp->ttlp_catched_error)) ) );
}
ttlp_free (ttlp);
result_array = revlist_to_array (lexems);
return result_array;
}
caddr_t
rdf_load_turtle (
caddr_t text_or_filename, int arg1_is_filename, caddr_t base_uri, caddr_t graph_uri, long flags,
ccaddr_t *cbk_names, caddr_t *app_env,
query_instance_t *qi, wcharset_t *query_charset, caddr_t *err_ret )
{
FILE *srcfile = NULL;
bh_from_client_fwd_iter_t bcfi;
bh_from_disk_fwd_iter_t bdfi;
dk_session_fwd_iter_t dsfi;
/* !!!TBD: add wide support: int text_strg_is_wide = 0; */
dtp_t dtp_of_text = (arg1_is_filename ? 0 : DV_TYPE_OF (text_or_filename));
caddr_t res;
ttlp_t *ttlp;
yyscan_t scanner;
triple_feed_t *tf;
if (arg1_is_filename)
sec_check_dba (qi, "<creading TURTLE from local file>");
if (DV_BLOB_XPER_HANDLE == dtp_of_text)
sqlr_new_error ("42000", "SP036", "Unable to parse TURTLE from persistent XML object");
ttlp = ttlp_alloc ();
ttlp->ttlp_flags = flags;
tf = ttlp->ttlp_tf;
tf->tf_qi = qi;
tf->tf_app_env = app_env;
if (arg1_is_filename)
{
sec_check_dba (qi, "<read XML from URL of type file://...>");
tf->tf_boxed_input_name = file_native_name_from_iri_path_nchars (text_or_filename, strlen (text_or_filename));
file_path_assert (tf->tf_boxed_input_name, NULL, 1);
srcfile = fopen (tf->tf_boxed_input_name, "rb");
if (NULL == srcfile)
{
ttlp_free (ttlp);
sqlr_new_error ("42000", "SR598", "TURTLE parser has failed to open file '%s' for reading", tf->tf_boxed_input_name);
}
ttlp->ttlp_iter = file_read;
ttlp->ttlp_iter_data = srcfile;
goto iter_is_set;
}
if ((DV_BLOB_HANDLE == dtp_of_text) /* !!!TBD: add wide support: || (DV_BLOB_WIDE_HANDLE == dtp_of_text)*/ )
{
blob_handle_t *bh = (blob_handle_t *) text_or_filename;
#if 0 /* !!!TBD: add wide support: */
text_strg_is_wide = ((DV_BLOB_WIDE_HANDLE == dtp_of_text) ? 1 : 0);
#endif
if (bh->bh_ask_from_client)
{
bcfi_reset (&bcfi, bh, qi->qi_client);
ttlp->ttlp_iter = bcfi_read;
ttlp->ttlp_iter_abend = bcfi_abend;
ttlp->ttlp_iter_data = &bcfi;
goto iter_is_set;
}
bdfi_reset (&bdfi, bh, qi);
ttlp->ttlp_iter = bdfi_read;
ttlp->ttlp_iter_data = &bdfi;
goto iter_is_set;
}
if (DV_STRING_SESSION == dtp_of_text)
{
dk_session_t *ses = (dk_session_t *) text_or_filename;
dsfi_reset (&dsfi, ses);
ttlp->ttlp_iter = dsfi_read;
ttlp->ttlp_iter_data = &dsfi;
goto iter_is_set;
}
#if 0 /* !!!TBD: add wide support: */
if (IS_WIDE_STRING_DTP (dtp_of_text))
{
text_len = (s_size_t) (box_length(text_or_filename)-sizeof(wchar_t));
text_strg_is_wide = 1;
goto iter_is_set;
}
#endif
if (IS_STRING_DTP (dtp_of_text))
{
ttlp->ttlp_text = text_or_filename;
ttlp->ttlp_text_len = box_length(text_or_filename) - 1;
goto iter_is_set;
}
ttlp_free (ttlp);
sqlr_new_error ("42000", "SP037",
"Unable to parse TURTLE from data of type %s (%d)", dv_type_title (dtp_of_text), dtp_of_text);
iter_is_set:
tf->tf_default_graph_uri = box_copy (graph_uri);
ttlp->ttlp_err_hdr = ((flags & TTLP_ALLOW_NQUAD) ? "NQuads RDF loader" : ((flags & TTLP_ALLOW_TRIG) ? "TriG RDF loader" : "TURTLE RDF loader"));
if (NULL == query_charset)
query_charset = default_charset;
if (NULL == query_charset)
ttlp->ttlp_enc = &eh__ISO8859_1;
else
{
ttlp->ttlp_enc = eh_get_handler (CHARSET_NAME (query_charset, NULL));
if (NULL == ttlp->ttlp_enc)
ttlp->ttlp_enc = &eh__ISO8859_1;
}
if (box_length (base_uri) > 1)
tf->tf_base_uri = box_copy (base_uri);
tf->tf_line_no_ptr = &(ttlp->ttlp_lexlineno);
QR_RESET_CTX
{
ttlyylex_init (&scanner);
ttlyyset_extra (ttlp, scanner);
if ((ttlp->ttlp_iter == file_read) && !lite_mode && (YY_BUF_SIZE < 0x10000))
ttlyypush_buffer_state (ttlyy_create_buffer (NULL, (YY_BUF_SIZE > 0x2000) ? (YY_BUF_SIZE * 8) : 0x10000, scanner), scanner);
#ifdef MALLOC_DEBUG
yy_push_state (INITIAL, scanner);
#endif
tf_set_cbk_names (tf, (ccaddr_t *)cbk_names);
TF_CHANGE_GRAPH_TO_DEFAULT (tf);
/* ttlyydebug = 1; */
if (flags & TTLP_ALLOW_NQUAD)
nqyyparse (ttlp, scanner);
else
ttlyyparse (ttlp, scanner);
/* ttlyydebug = 0; */
tf_commit (tf);
}
QR_RESET_CODE
{
du_thread_t *self = THREAD_CURRENT_THREAD;
/* ttlyydebug = 0; */
ttlp->ttlp_catched_error = thr_get_error_code (self);
thr_set_error_code (self, NULL);
if (NULL != ttlp->ttlp_iter_abend)
{
ttlp->ttlp_iter_abend (ttlp->ttlp_iter_data);
ttlp->ttlp_iter_abend = NULL;
}
/*no POP_QR_RESET*/;
}
END_QR_RESET
if (NULL != srcfile)
fclose (srcfile);
#ifdef MALLOC_DEBUG_1
dbg_allows_free_nulls++;
#endif
ttlyylex_destroy (scanner);
#ifdef MALLOC_DEBUG_1
dbg_allows_free_nulls--;
#endif
err_ret[0] = ttlp->ttlp_catched_error;
ttlp->ttlp_catched_error = NULL;
res = box_copy (graph_uri);
ttlp_free (ttlp);
return res;
}
|