<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>HTML TIDY - Release Notes</title>
<meta name="keywords" content=
"HTML, validation, error correction, pretty-printing">
<meta name="author" content="Dave Raggett <dsr@w3.org>">
<style type="text/css">
body {
margin-left: 10%;
margin-right: 10%;
font-family: sans-serif
}
h1 { margin-left: -8% }
h2,h3,h4,h5,h6 { margin-left: -4% }
pre { color: green; font-weight: bold; font-size: 80%; font-family: monospace}
em { font-style: italic; font-weight: bold }
strong { text-transform: uppercase; font-weight: bold }
.note {font-style: italic; color: rgb(192, 101, 101) }
/* hr {text-align: center; width: 60% } */
blockquote {
color: navy;
margin-left: 1%;
margin-right: 1%;
text-align: center;
font-family: "Comic Sans MS", "Times New Roman", serif
}
table {
font-family: sans-serif;
font-size: 80%;
background: rgb(255,255,153)
}
td {
font-size: 80%
}
.people {font-family: "Lucida Calligraphy", serif}
:link { color: rgb(0, 0, 153) }
:visited { color: rgb(153, 0, 153) }
:active { color: rgb(255, 0, 102) }
:hover { color: rgb(0, 0, 255) }
</style>
<style type="text/css">
p.c1 {font-style: italic}
</style>
</head>
<body bgcolor="#FFFFFF" background="grid.gif" text="black" link=
"navy" vlink="black" alink="red">
<h1>HTML TIDY - Release Notes</h1>
<p><a href="http://www.w3.org/People/Raggett">Dave Raggett</a> <a
href="mailto:dsr@w3.org">dsr@w3.org</a></p>
<h4>Public Email List for Tidy: &lt;<a href=
"mailto:html-tidy@w3.org">html-tidy@w3.org</a>&gt;</h4>
<p>I have set up an archived mailing list devoted to Tidy. To
subscribe send an email to html-tidy-request@w3.org with the word
subscribe in the subject line (include the word unsubscribe if
you want to unsubscribe). The <a href=
"http://lists.w3.org/Archives/Public/html-tidy/">archive</a> for
this list is accessible online. Please use this list to report
errors or enhancement requests.</p>
<h2>Things awaiting further attention</h2>
<ul>
<li>Support for BIG5 and ShiftJIS (Rick Jelliffe)</li>
<li>Check doctype FPI for upper case DTD, EN etc.</li>
<li>Stronger checking on which attributes appear on what
elements</li>
<li>Sorting attributes in a canonical order</li>
<li>Version checking for HTML 4.01 vs 4.0</li>
<li>Léa Gris reports that Tidy doesn't know that map
isn't allowed as a direct child of body in HTML strict.</li>
<li>Converting &lt;font face="Symbol"&gt;a&lt;/font&gt;
etc. to the corresponding Unicode characters, when cleaning
HTML.</li>
</ul>
<p>I need to set up an index of precisely what attributes are
supported on each element. Right now, some elements check their
own attributes, whilst others are checked via default checks
defined for each attribute independently of the element. Until
this is done, you will sometimes find that validation services
discover errors unnoticed by Tidy itself.</p>
<p>Jelks Cabaniss asks: <i>Could Tidy be made to automatically
"clean" (FONTs to CSS) if the Strict DOCTYPE is
requested? An HTML or XHTML Strict document can't have FONT
tags according to the DTDs</i>. Jelks has a bunch of other good
ideas such as converting the bgcolor attribute over to CSS. I
hope to tackle these in the next release.</p>
<p>John Russel would like to see stronger checks on quote marks
for attribute values. Given his example:</p>
<pre>&lt;a href=m1776bat.htm"&gt;List of Battles&lt;/a&gt;</pre>
<p>he suggests a heuristic whereby a " followed by &gt; or whitespace
produces a warning when found in an attribute value without an initial
quote mark. Another idea would be to generate an error for this
provided the appropriate option has been set.</p>
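<p>A minimal C sketch of this heuristic follows. The function name
suspicious_unquoted_value() is hypothetical and not part of Tidy; it
assumes the caller passes the raw text that follows the "=" of an
attribute, and it only applies the check when that text does not begin
with a quote mark.</p>
<pre>
#include &lt;ctype.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical helper: returns true when an attribute value written
   without an initial quote mark contains a '"' that is immediately
   followed by '&gt;' or whitespace, suggesting a missing opening quote. */
bool suspicious_unquoted_value(const char *text)
{
    if (text == NULL || text[0] == '"' || text[0] == '\'')
        return false;          /* quoted value: heuristic does not apply */

    /* scan no further than the end of the tag */
    for (const char *p = text; *p != '\0' &amp;&amp; *p != '&gt;'; ++p) {
        if (*p == '"') {
            unsigned char next = (unsigned char)p[1];
            if (next == '&gt;' || isspace(next))
                return true;   /* stray closing quote: worth a warning */
        }
    }
    return false;
}
</pre>
<p>Applied to the text following href= in the example above, it returns
true, since the stray quote is immediately followed by &gt;. A caller
could report this as a warning by default and as an error when the
appropriate option is set.</p>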
<p>I would like to add an option to select slide transition effects, and
to provide an optional feature for sorting attribute values.</p>
<p>I am having problems with form elements as direct children of tr or
table. It is dangerous to create an implicit table cell, and what
is needed is a way to move the form element into the next cell. If this
can't be done an error needs to be raised since Tidy will be stuck. On
a separate note, Tidy is still breaking lines between &lt;img&gt; and
&lt;/a&gt;, which Netscape renders as an underlined space. It's fine
in IE.</p>
<p>Rick Parsons would like there to be a new wrap-attributes option
that can be used to suppress line wrapping within attributes. There is
already a similar option for JavaScript literals.</p>
<p>Armando Asantos would like to use Tidy to produce a list of URLs
for images or hypertext links according to a config option. This would
be straightforward, but is a lower priority than bug fixes etc.</p>
</blockquote>
<p>Tidy needs to check for text as a direct child of blockquote etc.,
which isn't allowed in HTML 4 strict. This could be implemented
as a special check which ORs transitional into the version vector
when appropriate.</p>
<p>Berend de Boer suggests that if enclose-text is set to yes, then it
should apply to div as well as to body. In fact, shouldn't this apply
to any block element that allows mixed content in HTML
transitional but not in HTML strict?</p>
<p>Omri Traub would like an option to wrap the contents of style and
script elements in CDATA marked sections when converting to XHTML. He is
also interested in direct support for 16 bit character file I/O.</p>
<p>A number of people were interested in having Tidied documents marked
as such using a meta element. Tidy will now add the following to the
head if not already present:</p>
<pre>&lt;meta name="generator" content="HTML Tidy, see www.w3.org"&gt;</pre>
<p>If you don't want this added, set the option tidy-mark to no.</p>
<h2>January 2000</h2>
<p>I have added a new function ApparentVersion() which takes the
doctype into account as well as other clues. This is now used to
report the apparent version of the html in use.</p>
<p>Thanks to the encouragement of Denis Barbier, I finally got around
to dealing with the extra bracketing needed to quiet gcc -Wall. This
involved the initialization of the tag, attribute and entity tables,
and miscellaneous side-effecting while and for loops.</p>
<p>PPrintXMLTree has been updated so that it only inserts line breaks
after start tags and before end tags for elements without mixed
content. This brings Tidy into line with current wisdom for XML
editors. My thanks to Eric Thorbjornsen for suggesting a fix to FindTag
that ensures that Tidy doesn't mistreat elements looking like html.</p>
<p>&lt;table border&gt; is now converted to &lt;table border="1"&gt;
when converting to XHTML.</p>
<p>I have added support for CDATA marked sections which are passed through
without change, e.g.</p>
<pre>&lt;![CDATA[ .. markup here has no effect .. ]]&gt;</pre>
<p>In the January 12th release, ParseXMLElement screwed up on doctypes
and top-level comments, causing a memory exception. This has now been fixed.
PPrintXMLTree now uses zero indent for comments to avoid progressive
indentation as an XML document is repeatedly tidied. I have added a blank
line after elements unless they are the last in the parent's content.</p>
<h2>December 1999</h2>
<p>Tidy now generates the XHTML namespace and system identifier as
specified by the current <a href="http://www.w3.org/TR/xhtml1/">XHTML
Proposed Recommendation</a>. In addition it now assumes the latest
version of HTML4 - HTML 4.01. This fixes an omission in 4.0 by
adding the name attribute to the img and form elements. This means
that documents with rollovers and smart forms will now validate!</p>
<p>James Pickering noticed that Tidy was omitting the xhtml- prefix
for the XHTML DTD file names in the system identifier on the doctype.
This was a recent change to XHTML. I have fixed lexer.c to deal with
this.</p>
<p>This release adds support for <a href=
"http://developer.netscape.com/viewsource/schroder_template/schroder_template.html">
JSTE</a> pseudo elements looking like: &lt;# #&gt;. Note that
Tidy can't distinguish between ASP and JSTE for pseudo elements
looking like: &lt;% %&gt;. Line wrapping of this syntax is
inhibited by setting either the wrap-asp or wrap-jste options to no.</p>
<p>Thanks to Jacek Niedziela, the Win32 executable for tidy
is now able to expand wildcards in filenames. This utilizes
the setargv library supplied with VC++.</p>
<p>Jonathan Adair asked for the hashtables to be cleared when emptied
to avoid problems when running Tidy a second time, when Tidy is
embedded in other code. I have applied this to FreeEntities(),
FreeAttrTable(), FreeConfig(), and FreeTags().</p>
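<p>The essential point is that freeing a table's entries must also reset
its bucket pointers, otherwise a second run inside the same process finds
dangling pointers. A minimal C sketch of a chained hash table with that
behavior follows; the names entry_install(), entry_lookup() and
entries_free() and the table size are hypothetical and do not correspond
to Tidy's own tables.</p>
<pre>
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

#define HASHSIZE 101   /* illustrative table size */

typedef struct Entry {
    struct Entry *next;   /* next entry in the same bucket */
    char *name;
} Entry;

static Entry *hashtab[HASHSIZE];

static unsigned hash_name(const char *s)
{
    unsigned h = 0;
    for (; *s != '\0'; ++s)
        h = (unsigned char)*s + 31 * h;
    return h % HASHSIZE;
}

Entry *entry_lookup(const char *name)
{
    for (Entry *e = hashtab[hash_name(name)]; e != NULL; e = e-&gt;next)
        if (strcmp(e-&gt;name, name) == 0)
            return e;
    return NULL;
}

Entry *entry_install(const char *name)
{
    Entry *e = entry_lookup(name);
    if (e != NULL)
        return e;                       /* already present */

    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e-&gt;name = malloc(strlen(name) + 1);
    if (e-&gt;name == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e-&gt;name, name);

    unsigned h = hash_name(name);
    e-&gt;next = hashtab[h];               /* push onto the bucket's chain */
    hashtab[h] = e;
    return e;
}

void entries_free(void)
{
    for (int i = 0; i &lt; HASHSIZE; ++i) {
        Entry *e = hashtab[i];
        while (e != NULL) {
            Entry *next = e-&gt;next;
            free(e-&gt;name);
            free(e);
            e = next;
        }
        hashtab[i] = NULL;   /* clear the bucket so the table can be reused */
    }
}
</pre>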
<p>Ian Davey spotted that Tidy wasn't deleting inline emphasis elements
when these only contained whitespace (other than non-breaking spaces).
This was due to an oversight in the CanPrune() function, now fixed.</p>
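<p>The rule amounts to this: an inline emphasis element may be pruned only
if its text consists entirely of ordinary whitespace, with the non-breaking
space counting as real content. A minimal C sketch, assuming ISO-8859-1
text in which 0xA0 is the non-breaking space; whitespace_only() is a
hypothetical name and not Tidy's CanPrune().</p>
<pre>
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

/* Returns true when `text` (ISO-8859-1, `len` bytes) holds only
   ordinary whitespace.  The non-breaking space 0xA0 counts as content,
   so an element containing it is kept. */
bool whitespace_only(const unsigned char *text, size_t len)
{
    for (size_t i = 0; i &lt; len; ++i) {
        switch (text[i]) {
        case ' ': case '\t': case '\n': case '\r': case '\f':
            continue;          /* ordinary whitespace */
        default:
            return false;      /* anything else, including 0xA0 */
        }
    }
    return true;
}
</pre>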
<p>Michel Lemay spotted some bugs in if statements and provided
some sample html files that caused Tidy to crash. On further study,
I found a bug in the code that moves font elements
inside anchors. I have fixed this and added a new method to test the tree
for internal consistency in its bidirectional links: CheckNodeIntegrity().</p>
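<p>A check of this kind walks each node's children and confirms that the
parent, prev and last links agree with the next links. The sketch below
assumes a simplified, hypothetical Node structure with parent, prev, next,
content (first child) and last fields; it illustrates the idea rather than
reproducing CheckNodeIntegrity().</p>
<pre>
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical, simplified document tree node. */
typedef struct Node {
    struct Node *parent;
    struct Node *prev, *next;      /* siblings */
    struct Node *content, *last;   /* first and last child */
} Node;

/* Recursively verifies the bidirectional links below `node`. */
bool check_node_integrity(const Node *node)
{
    const Node *prev = NULL;

    for (const Node *child = node-&gt;content; child != NULL; child = child-&gt;next) {
        if (child-&gt;parent != node || child-&gt;prev != prev)
            return false;          /* broken parent or back link */
        if (!check_node_integrity(child))
            return false;
        prev = child;
    }
    /* prev is now the final child reached via next links (or NULL) */
    return node-&gt;last == prev;
}
</pre>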
<p>I have also refined the code for handling noframes to make it more
robust. It will now handle noframes within a body within a noframes etc.
(something permitted by HTML4). It will also recover if the noframes
end tag is missing or is in the wrong place.</p>
<p>I have fleshed out the table for mapping characters in the Windows
Western character set into Unicode, see Win2Unicode[]. Yahoo was, for
example, using the Windows Western character for bullet, which in
Unicode is U+2022.</p>
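<p>Only the range 0x80 to 0x9F differs from ISO-8859-1, so the mapping
needs just a 32-entry table. The sketch below uses the published
Windows-1252 assignments; it is not Tidy's Win2Unicode[] table, and the
fallback to U+FFFD for the five undefined positions is an assumption of
this sketch.</p>
<pre>
#include &lt;stdint.h&gt;

/* Windows-1252 bytes 0x80-0x9F mapped to Unicode.  Zero marks the
   positions 0x81, 0x8D, 0x8F, 0x90 and 0x9D, which are undefined. */
static const uint16_t win1252_c1[32] = {
    0x20AC, 0x0000, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 80-87 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x0000, 0x017D, 0x0000, /* 88-8F */
    0x0000, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 90-97 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x0000, 0x017E, 0x0178  /* 98-9F */
};

/* Maps one Windows-1252 byte to a Unicode code point.  Bytes outside
   0x80-0x9F coincide with ISO-8859-1 and map to themselves. */
uint32_t win1252_to_unicode(unsigned char c)
{
    if (c &gt;= 0x80 &amp;&amp; c &lt;= 0x9F) {
        uint16_t u = win1252_c1[c - 0x80];
        return u ? u : 0xFFFD;   /* undefined position: replacement char */
    }
    return c;
}
</pre>
<p>With this table the Windows bullet, byte 0x95, maps to U+2022.</p>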
<p>David Halliday noticed that applets without any content between
the start and end tags were being pruned by Tidy. This is a bug and
has now been fixed.</p>
<p>I have changed the way Tidy handles empty paragraphs when the
drop-empty-paras option is set to no. HTML4 doesn't allow empty paragraphs
so I am now replacing them by a pair of br elements, so that the
formatting is preserved. When drop-empty-paras is set to yes, empty
paragraphs are simply removed.</p>
<p>Darren Forcier asked for a way to suppress fixing up of comments
when these include adjacent hyphens since this was screwing up Cold
Fusion's special comment syntax. The new option is called:
<i>fix-bad-comments</i> and defaults to yes.</p>
<p>Using Michel's examples I have improved the way the table parser
deals with unexpected content. This is now consistently moved before
the table, or to the head element as appropriate. Microsoft and Netscape
differ in how an unclosed blockquote renders when found at the table
or tr level. Netscape indents the table but Microsoft does not. This
is getting too tricky for me to deal with!</p>
<p>Using a sample page from Yahoo, I discovered that Netscape Navigator
doesn't implement the text-align style property on tr or table elements.
As a result I have added a special check for this in BlockStyle() to
avoid translating the align attribute on tr or table into a style rule.</p>
<p>Richard Allsebrook would like to be able to map b/i to strong/em
without the full clean process being invoked. I have therefore decoupled
these two options. Note that setting logical-emphasis is also decoupled
from drop-font-tags.</p>
<h2>30th November 1999</h2>
<p>This is an interim release to provide a bug fix for a bug
introduced earlier in the month. I have fixed a bug in the
emphasis code which looks for start tags which are most likely
intended as end tags. This bug only appeared in the November
release and could cause a crash or indefinite looping. My thanks
to a respondent calling himself "Michael" who provided a
collection of files that allowed me to track this down.</p>
<p>I have also added page transition effects for the slide
maker feature. The effects are currently only visible on IE4
and above, and take advantage of the meta element. I will provide
an option to select between a range of transition effects in
the next release.</p>
<h2>November 1999</h2>
<p>David Duffy found a case causing Tidy to loop indefinitely.
The problem occurred when a blocklevel element is found within a
list item that isn't enclosed in a ul or ol element. I have
added a check to ParseList to prevent this.</p>
<p>Takuya Asada tells me that in Raw mode Tidy is incorrectly
mapping 0xA0 to the entity &amp;nbsp;, causing problems for Shift_JIS
etc. Now fixed. Larry Virden reported a problem with ParseConfig
when one of the arguments was null. I have added a check for
this.</p>
<p>Thomas McGuigan notes that Tidy issues a warning for noframes
elements without a body element. HTML4 is defined so that the
content of the noframes element is restricted to a single body
element. However, it also allows you to omit the start and end
tags for body, something that isn't allowed for XHTML. I have
changed the code to only issue the warning when generating
XML.</p>
<p>Added new --version or -v option that reports the release date
to the error stream. ParseConfig() now returns false if it
doesn't use the parameter. This prevents the next argument on
the command line from being swallowed inadvertently, e.g. for
unknown options. Tidy now warns about unrecognized options.</p>
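<p>The point of the change is that an option handler reports whether it
consumed the following argument, so the command-line loop only skips that
argument when it was actually used. A minimal sketch, in which the
hypothetical parse_option() stands in for ParseConfig() and recognizes only
two sample options (wrap, which takes a value, and quiet, which does not);
scan_args() is likewise an illustrative name.</p>
<pre>
#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

/* Hypothetical stand-in for ParseConfig(): sets *known to false for an
   unrecognized option and returns true only if it used `value`. */
static bool parse_option(const char *name, const char *value, bool *known)
{
    *known = true;
    if (strcmp(name, "wrap") == 0)
        return value != NULL;      /* takes a value when one is present */
    if (strcmp(name, "quiet") == 0)
        return false;              /* a flag: the next argument is untouched */
    *known = false;                /* unknown option: use nothing */
    return false;
}

/* Scans argv, counting unrecognized options in *unknown, and returns
   the number of remaining arguments, which are treated as file names. */
int scan_args(int argc, char **argv, int *unknown)
{
    int files = 0;

    *unknown = 0;
    for (int i = 1; i &lt; argc; ++i) {
        if (strncmp(argv[i], "--", 2) == 0 &amp;&amp; argv[i][2] != '\0') {
            const char *value = (i + 1 &lt; argc) ? argv[i + 1] : NULL;
            bool known;
            if (parse_option(argv[i] + 2, value, &amp;known))
                ++i;               /* skip the value that was consumed */
            if (!known)
                ++*unknown;        /* the caller can warn about these */
        } else {
            ++files;
        }
    }
    return files;
}
</pre>
<p>With this arrangement, tidy --bogus page.html still treats page.html
as a file rather than losing it as the value of the unknown option.</p>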
<p>I have revised the way Tidy deals with comments to avoid
problems with repeated hyphens. First, "--" is illegal
in XML, and second, the comment syntax for SGML is very error
prone when it comes to when and where you can use hyphens. As a
result, Tidy will now replace repeated hyphens with "="
characters. My thanks to Yudong Yang and Randy Waki for their
input on this.</p>
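<p>A minimal sketch of such a rewrite, assuming it is given the text
between &lt;!-- and --&gt;. It keeps the first hyphen of each run and turns
the rest into "=", and also replaces a trailing hyphen, since XML forbids a
comment ending in "---&gt;". Whether Tidy applies exactly this rule is not
stated above, and fix_comment_hyphens() is a hypothetical name.</p>
<pre>
/* Rewrites a comment body in place so that it contains no "--" and
   does not end with '-'.  The first hyphen of each run is kept and
   the following ones become '='. */
void fix_comment_hyphens(char *body)
{
    char prev = '\0';      /* previous character before any rewriting */
    char *p;

    for (p = body; *p != '\0'; ++p) {
        char c = *p;
        if (c == '-' &amp;&amp; prev == '-')
            *p = '=';      /* second and later hyphens in a run */
        prev = c;
    }
    if (p != body &amp;&amp; p[-1] == '-')
        p[-1] = '=';       /* a trailing '-' would produce "---&gt;" */
}
</pre>
<p>For example, a body of "a---b" becomes "a-==b".</p>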
<p>Emphasis start tags will now be coerced to end tags when the
corresponding element is already open. For instance
&lt;u&gt;...&lt;u&gt;. This behavior doesn't apply to font
tags or start tags with attributes. My thanks to Luis M. Cruz for
suggesting this idea.</p>
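<p>The test amounts to a lookup in the stack of open inline elements. A
minimal sketch, assuming a hypothetical InlineStack that records the names
of open inline elements in lower case; coerce_to_end_tag() is likewise an
illustrative name, and its two exclusions mirror the ones stated above.</p>
<pre>
#include &lt;stdbool.h&gt;
#include &lt;string.h&gt;

#define MAX_OPEN 64

/* Hypothetical stack of currently open inline elements. */
typedef struct {
    const char *names[MAX_OPEN];
    int depth;
} InlineStack;

static bool is_open(const InlineStack *st, const char *name)
{
    for (int i = 0; i &lt; st-&gt;depth; ++i)
        if (strcmp(st-&gt;names[i], name) == 0)
            return true;
    return false;
}

/* True when a start tag should be treated as the matching end tag:
   the element is already open, the tag has no attributes, and it is
   not font. */
bool coerce_to_end_tag(const InlineStack *st, const char *name, int nattrs)
{
    return nattrs == 0 &amp;&amp; strcmp(name, "font") != 0 &amp;&amp; is_open(st, name);
}
</pre>
<p>For &lt;u&gt;...&lt;u&gt; with u already on the stack, the second tag is
treated as &lt;/u&gt;.</p>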
<p>Jonathan Adair would like Tidy to warn when the same attribute
appears more than once in the same element. This is an error for
both SGML and XML. The best way to make this check would be to
sort the attributes and look for duplicate entries. Other people
have asked for the attributes to be sorted, but I need further
input on the appropriate sort order. As an interim solution, Tidy
uses a simple test which generates n+1 warnings if an attribute
is repeated n times.</p>
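<p>The sort-based check mentioned above could look like the following
minimal sketch. It assumes attribute names have already been folded to
lower case and reports each repeated name once, however often it occurs;
count_duplicate_attributes() is a hypothetical name, and this is not the
interim test Tidy currently uses.</p>
<pre>
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* qsort comparator for an array of C strings */
static int cmp_names(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcmp(*x, *y);
}

/* Counts attribute names that occur more than once in `names` (n
   entries), reporting each repeated name once.  Names are assumed to
   be lower case already.  Returns -1 if memory runs out. */
int count_duplicate_attributes(const char *const *names, size_t n)
{
    const char **sorted;
    int dups = 0;

    if (n &lt; 2)
        return 0;
    sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    memcpy(sorted, names, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_names);

    /* duplicates are now adjacent; count each run once, at its start */
    for (size_t i = 1; i &lt; n; ++i)
        if (strcmp(sorted[i], sorted[i - 1]) == 0 &amp;&amp;
            (i &lt; 2 || strcmp(sorted[i - 1], sorted[i - 2]) != 0))
            ++dups;

    free(sorted);
    return dups;
}
</pre>
<p>Sorting a copy leaves the element's own attribute order untouched,
which matters if the sort order for output is still undecided.</p>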
<h2>October 1999</h2>
<p>On Unix systems you can get Tidy to look for a config file in
~/.tidyrc or ~your/.tidyrc etc. when the HTML_TIDY environment
variable isn't set. To enable this feature don't forget
to uncomment SUPPORT_GETPWNAM in the platform.h file. This
feature won't work on Windows. My thanks to Todd Lewis who
contributed the code.</p>
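<p>Finding ~your/.tidyrc means looking up another user's home directory,
which is what getpwnam() provides on POSIX systems. The following minimal
sketch expands a leading ~ or ~user in a path; expand_tilde() is a
hypothetical name, and using $HOME for a bare ~ is an assumption of the
sketch rather than a description of Todd's code.</p>
<pre>
#define _POSIX_C_SOURCE 200112L   /* request POSIX declarations such as getpwnam() */
#include &lt;pwd.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Expands a leading "~" or "~user" in `path` into `out` (of size
   `outsz`).  Returns 0 on success, or -1 if the home directory cannot
   be determined or the result does not fit. */
int expand_tilde(const char *path, char *out, size_t outsz)
{
    const char *home;
    const char *rest;
    int n;

    if (path[0] != '~') {
        home = "";                        /* nothing to expand */
        rest = path;
    } else {
        rest = strchr(path, '/');
        if (rest == NULL)
            rest = path + strlen(path);   /* "~" or "~user" alone */

        if (rest == path + 1) {           /* "~" or "~/..." */
            home = getenv("HOME");
        } else {                          /* "~user" or "~user/..." */
            char user[256];
            size_t len = (size_t)(rest - path - 1);
            struct passwd *pw;

            if (len &gt;= sizeof user)
                return -1;
            memcpy(user, path + 1, len);
            user[len] = '\0';
            pw = getpwnam(user);
            home = pw ? pw-&gt;pw_dir : NULL;
        }
        if (home == NULL)
            return -1;
    }

    n = snprintf(out, outsz, "%s%s", home, rest);
    return (n &lt; 0 || (size_t)n &gt;= outsz) ? -1 : 0;
}
</pre>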
<p>Darren Forcier reports that Cold Fusion uses the following
syntax:</p>
<pre>
&lt;CFIF True IS True&gt;
This should always be output
&lt;CFELSE&gt;
This will never output
&lt;/CFIF&gt;
</pre>
<p>After declaring the CFIF tag in the config file, Tidy was
screwing up the Cold Fusion expression syntax, mapping
'True' to 'True=""' etc. My fix was to
leave such pseudo attributes untouched if they occur on user
defined elements.</p>
<p>Jelks Cabaniss noticed that Tidy wasn't adding an id
attribute to the map element when converting to XHTML. I have
added routines to do this for both 'a' and 'map'.
The value of the id attribute is taken from the name
attribute.</p>
<p>Larry Cousin noted that Tidy is now screwing up on option
elements. This proved to be a recently introduced error, which I
have now fixed. Peter Ruevski forwarded an example that caused
Tidy to loop endlessly. The problem was caused by an ol start tag
followed by a b start tag and then an li element. I have solved
the problem with a fix to ParseBlock.</p>
<p>I have revised the way Tidy deals with unexpected content in
lists. Tidy now wraps such content in list items with the style
attribute set to "list-style: none" to suppress list
bullets. If an li element is found unexpectedly in the body or
block-level content, it is wrapped into a ul element with the
style attribute set to "margin-left: -2em". This
provides a closer match to the observed rendering on current
browsers. I use a couple of postprocessing steps (List2BQ and
BQ2Div) to further clean this up to use div elements. My thanks
to Thomas Ribbrock for sending me a challenging example that led
me to this solution.</p>
<p>A number of people have asked for a config option to set the
alt attribute for images when missing. The alt-text property can
now be used for this purpose. Please note that YOU are
responsible for making your documents accessible to people who
can't view the images!</p>
<p>Terry Teague spotted a bug in ParseConfigFile() that prevented
Tidy from parsing more than one file. This has been fixed by
setting the char buffer to zero in the call to InitConfig()
before parsing. Terry also noted a few places where I had slipped
back into using malloc and free rather than MemAlloc and MemFree,
now fixed.</p>
<p>Bjoern Hoehrmann notes that the September 27th release mapped
empty paragraphs to br elements, which introduces extra
whitespace in IE and Navigator. The former behavior to strip
empty paragraphs is as per HTML4 and works fine on most browsers
with the exception of Lynx. I have reverted to stripping empty
P's, but have added an option to leave them alone.</p>
<p>Bjoern also drew my attention to a bug in the September
release affecting table content that lacks a preceding td or th start
tag. Tidy moves such content to before the table element to match
the observed rendering. This is now working as planned. I have
tweaked the printing behavior when the omit end tags option is
set. It now omits the &lt;/html&gt; end tag as well as the optional start
tags for html, head and body.</p>
<p>Pao-Hsi Huang had problems with the contents of the option
element being discarded. I was unable to reproduce this problem,
but did notice that I was unintentionally preserving newlines within
option text. This is now fixed. Shane Harrelson spotted that,
when cleaned, table cells containing a single font element
dropped the font element without getting the corresponding style.
Now fixed via a tweak to InlineStyle().</p>
<p>Andre Hinrichs wanted Tidy to do a better job on font elements
with relative size changes. This is in fact rather tricky.
Currently, Tidy uses percentage scaling values for fonts rather
than the enumeration defined by CSS [xx-small | x-small | small |
medium | large | x-large | xx-large]. The first problem is to
match these 7 values onto the 6 define by the font element. The
next problem is caused by the fact that CSS doesn't provide
matching relative font size values that you could match to the
ones defined for the font element. I have done my best using
percentage values, base on tests with IE and Navigator. If anyone
can come up with a better approach, please let me know.</p>
<p>Tom Berger reported a problem when quote-marks was set to yes.
Using his test file everything is now working fine. Several
people asked for a way to turn off line wrapping. Tidy will now
interpret zero as meaning disable wrapping. Johannes Zellner
wants to include some tcl code in his XML markup and asks for a
way to define new tags that behave in the same way as HTML's pre
element. The new option is new-pre-tags.</p>
<h2>September 1999</h2>
<p>Tidy will now add a type attribute to the style and script
elements when this is missing. Tidy examines the language
attribute to determine what media type to use. I have also added
code to create an id attribute for anchors when a name attribute
is present, and to report a warning if id and name don't
match.</p>
<p>Added support for cleaning up HTML generated by Microsoft Word
2000 when you save as "Web Page". When you set
"word-2000: yes" Tidy makes a Herculean effort to clean
up the mess created when Word 2000 exports to HTML. Word bulks
out HTML with presentation information that allows it to
round-trip documents between HTML and Word without loss of
information. This makes the HTML hard to edit and can cause some
very popular browsers to crash! I haven't dealt with the VML
markup Word uses for line drawings.</p>
<p>Applied fix to InsertNodeAfterElement() to set
node->next->prev. My thanks to "Advocate" for
this. This was only encountered when dealing with PRE tags
containing content illegal for PRE. (ParsePre calls it twice: once
to move illegal PRE content to be a later sibling of PRE, and once
to open PRE again afterward.)</p>
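<p>The invariant being restored is that every sibling's prev link points
back at the node whose next link refers to it. A minimal sketch of inserting
a node after an element in a doubly linked tree follows, assuming a
simplified, hypothetical Node structure; it is not Tidy's
InsertNodeAfterElement(), but the line that sets node-&gt;next-&gt;prev
corresponds to the fix described above.</p>
<pre>
#include &lt;stddef.h&gt;

/* Hypothetical, simplified document tree node. */
typedef struct Node {
    struct Node *parent;
    struct Node *prev, *next;      /* siblings */
    struct Node *content, *last;   /* first and last child */
} Node;

/* Inserts `node` as the next sibling of `element`. */
void insert_node_after_element(Node *element, Node *node)
{
    Node *parent = element-&gt;parent;

    node-&gt;parent = parent;
    node-&gt;prev = element;
    node-&gt;next = element-&gt;next;

    if (node-&gt;next != NULL)
        node-&gt;next-&gt;prev = node;   /* keep the back link consistent */
    else if (parent != NULL)
        parent-&gt;last = node;       /* node is now the last child */

    element-&gt;next = node;
}
</pre>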
<p>Change to table row parser so that when Tidy comes across an
empty row, it inserts an empty cell rather than deleting it. This
is consistent with browser behavior and avoids problems with
cells that span rows.</p>
<p>Baruch Even sent extensive patches for improved support for
the PHP preprocessing pseudo tags. You can now use the
'wrap-php: no' to suppress line wrapping within PHP
instructions. In the process of this work, I have created a new
function InsertMisc() for dealing with comments, processing
instructions, ASP and PHP.</p>
<p>I have updated the table of tags to include additional
proprietary tags such as server, ilayer, layer, nolayer and
multicol. Using patches sent in by Edward Avis, Tidy now offers a
quiet mode which suppresses the initial welcome message and the
summary report on the number of errors or warnings. Jason
Tribbeck sent in patches to allow config options normally set in
the config file to be set on the command line, by preceding them
with a "--" (no intervening space), for example:</p>
<pre>
tidy --break-before-br true --show-warnings false
</pre>
<p>Kenichi Numata discovered that Tidy looped indefinitely for
examples similar to the following:</p>
<pre>
&lt;font size=+2&gt;Title
&lt;ol&gt;
&lt;/font&gt;Text
&lt;/ol&gt;
</pre>
<p>I have now cured this problem which used to occur when a
&lt;/font&gt; tag was placed at the beginning of a list element.
If the example included a list item before the &lt;/ol&gt;, Tidy
will now create the following markup:</p>
<pre>
&lt;font size=+2&gt;Title&lt;/font&gt;
&lt;blockquote&gt;Text &lt;/blockquote&gt;
&lt;ol&gt;
&lt;li&gt;list item&lt;/li&gt;
&lt;/ol&gt;
</pre>
<p>This uses blockquote to indent the text without the
bullet/number and switches back to the ol list for the first true
list item.</p>
<p>I have worked hard to improve support for server side
preprocessing instructions such as ASP, PHP and Tango. Tidy now
allows you to replace attribute values by such instructions and
is able to fix up the case where the instruction appears without
delimiting quote marks. Tidy supports ASP and PHP in element
content and also in place of attribute value pairs. Support for
Tango is limited to attribute values only.</p>
<p>John Love-Jensen contributed a table for mapping the MacRoman
character set into Unicode. I have added a new charset option
"mac" to support this. Note the translation is one way
and doesn't convert back to the Mac codes on output.</p>
<p>Some people place &lt;p&gt; at the end of their list items to
introduce whitespace before the next item. I have modified
TrimEmptyElement to coerce empty p elements to br elements to
reproduce this rendering. If a p start tag is found in dt
elements, I now coerce the p to a br. Satwinder Mangat has
alerted me to several such problems. First, text as a direct
child of dl should be wrapped in a dt and not a dd element.
Second, unlike other inline tags, browsers only close anchors on an
anchor start or end tag. Actually Navigator and IE differ in how
they handle this. Try the following example:</p>
<pre>
&lt;p&gt;&lt;b&gt;&lt;a href=foo&gt;some text&lt;/i&gt; which should be in the label&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;next para and guess what the emphasis will be?&lt;/p&gt;
</pre>
<p>Navigator 4 renders the second paragraph in normal text while
IE renders it in bold. If you substitute &lt;a&gt; for the
&lt;/i&gt;, once again the browsers differ. IE stops underlining
at the &lt;a&gt; text while Navigator continues until the
&lt;/a&gt;, although it realizes that you can't click
there.</p>
<p>Satwinder continues: browsers happily interpret center within
a heading. Tidy now moves the center element to be the parent of
the rest of the heading, splitting it as needed, rather than
prematurely ending the heading. The same applies to a div element
within a heading. Satwinder notes that Tidy inserts a ul when an
li is encountered as a direct child of body.</p>
<p>This is a case where you can't produce a legal HTML file
that renders the same way as browsers handle this. The same
applies to a dt or dd element without an enclosing dl element. I
can report that W3C's HTML working group was unwilling to
bless naked li's etc. A similar problem arises for dt
elements when they contain hr, center or div. The specs say this
is illegal, but browsers render it fine!</p>
<p>I have done my best for hr, splitting the dt as needed and
enclosing the hr within a dd. The hr doesn't look the same,
sadly, as it now starts at the left margin for the dd's
rather than the left margin for dt's. I wasn't sure how
to deal with center and div within dt, and chose to discard
them.</p>
<p>&lt;/br&gt; is now mapped to &lt;br&gt; to match observed
browser rendering. On the same basis, an unmatched &lt;/p&gt; is
mapped to &lt;br&gt;&lt;br&gt;. This should improve fidelity of
tidied files to the original rendering, subject to the
limitations in the HTML standards described above.</p>
<p>Vlad Harchev spotted that Tidy was swallowing the first and
last spaces within inline elements when in a pre element. Now
fixed. Zac Thompson spotted that Tidy didn't know that the
tags s, strike and u weren't allowed in HTML4 strict. I have
now fixed this.</p>
<p>Tidy now preserves the last modified time for the files it
writes back to. This was introduced on the suggestion of
René Fritz, who uses the SiteCopy utility to upload recently
modified files to his Web server. By preserving file timestamps
Tidy can be used on all files in a directory without impacting
which ones will be uploaded the next time SiteCopy runs. This is
implemented using the fstat and futime system calls. If your
platform doesn't support these calls, set PRESERVEFILETIMES
to 0 in platform.h.</p>
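<p>A minimal sketch of the timestamp-preserving write, assuming a POSIX system. It records the times with stat() before the file is rewritten and restores them with utime() afterwards. Tidy itself is described above as using fstat and futime, and the helper name write_preserving_mtime is hypothetical.</p>

```c
#include <stdio.h>
#include <sys/stat.h>
#include <utime.h>

/* Hypothetical helper: overwrite `path` with `data`, then restore the
   access and modification times the file had before it was rewritten.
   Returns 0 on success, -1 on any failure. */
int write_preserving_mtime(const char *path, const char *data)
{
    struct stat before;
    struct utimbuf times;
    FILE *fp;

    /* Remember the original timestamps. */
    if (stat(path, &before) != 0)
        return -1;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fputs(data, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    /* Put the old times back so upload tools see the file as unchanged. */
    times.actime = before.st_atime;
    times.modtime = before.st_mtime;
    return utime(path, &times) == 0 ? 0 : -1;
}
```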
<p>I have fixed a bug in lexer.c which screwed up the removal of
doctype elements. This bug was associated with the symptom of
printing an indefinite number of doctype elements.</p>
<h2>August 1999</h2>
<p>Added lowsrc and bgproperties attributes to attribute table.
Rob Clark tells me that bgproperties="fixed" on the
body elements causes NS and IE to fix the background relative to
the window rather that the document's content.</p>
<p>Terry Teague kindly drew my attention to several bugs
discovered by other people: My thanks to Randy Waki for
discovering a bug when an unexpected inline end-tag is found in a
ul or ol element. I have added new code to ParseList in parser.c
to pop the inline stack and discard the end tag. I am checking to
see whether a similar problem occurs elsewhere. Randy also
discovered a bug (now fixed) in TrimInitialSpace() in parser.c
which caused it to fail when the element was the first in the
content. John Cumming found that comments cause problems in table
row group elements such as tbody. I have fixed this oversight in
this release.</p>
<p>Bjoern Hoehrmann tells me that bgsound is only allowed in the
head and not in the body, according to the Microsoft
documentation. I have therefore updated the entry in tags.c. The
slide generation feature caused an exception when the original
document didn't include a document type declaration. The fix
involves setting the link to the parent node when creating the
doctype node.</p>
<h2>26th July 1999</h2>
<p>Jussi Vestman reported a bug in FixDocType in lexer.c which
caused tidy to corrupt the parse tree, leading to an infinite
loop. I independently spotted this and fixed it. Justin
Farnsworth spotted that Tidy wasn't handling XML processing
instructions which end in ?&gt; rather than just &gt; as
specified by SGML. I have added a new option,
assume-xml-procins, which when set to yes expects the
XML style of processing instruction. It defaults to no, but is
automatically set to yes for XML input. Justin notes that the XML
PIs are used for a server preprocessor format called PHP, which
will now be easy to handle with Tidy. Richard Allsebrook's
mail prompted me to make sure that the contents of processing
instructions are treated as CDATA so that &lt; and &gt; etc. are
passed through unescaped.</p>
<p>Bill Sowers asks for Tidy to support another server
preprocessor format called Tango which features syntax such
as:</p>
<pre>
<b>&lt;@include &lt;@cgi&gt;&lt;appfilepath&gt;includes/message.html&gt;</b>
</pre>
<p>I don't have time to add support for Tango in this
release, but would be happy if someone else were to mail in
appropriate changes. Darrell Bircsak reports problems when using
DOS on Win98. I am using Win95 and have been unable to reproduce
the problem. Jelks Cabaniss notes that Tidy doesn't support
XML document type subset declarations. This is a documented
shortcoming and needs to be fixed in the not too distant future.
Tidy focuses on HTML, so this hasn't been a priority
to date.</p>
<p>Jussi Vestman asks for an optional feature for mapping IP
addresses to DNS hostnames and back again in URLs. Sadly, I
don't expect to be able to do this for quite a while. Adding
network support to Tidy would also allow it to check for bad
URLs.</p>
<p>Ryan Youck reports that Tidy's behavior when finding a ul
element when it expects an li start tag doesn't match
Netscape or IE. I have confirmed this and have changed the code
for parsing lists to append misplaced lists to the end of the
previous list item. If a new list is found in place of the first
list item, I now place it into a blockquote and move it before
the start of the current list, so as to preserve the intended
rendering.</p>
<p>I have added a new option, enclose-text, which encloses any
text it finds at the body level within p elements. This is very
useful for curing problems with the margins when applying style
sheets.</p>
<h2>9th July 1999</h2>
<p>Added bgsound to tags.c. Added '_' to definition of
namechars to match html4.decl. My thanks to Craig Horman for
spotting this.</p>
<p>Jelks Cabaniss asked for the clean option to be automatically
set when the drop-font-tags option is set. Jelks also notes that
a lot of the authoring tools automatically generate, for example,
&lt;I&gt; and &lt;B&gt; in place of &lt;em&gt; and &lt;strong&gt;
(MS FrontPage 98 generated the latter, but FP2000 has reverted to
the former, with no option to change or set it). Jelks suggested
adding a general tag substitution mechanism. As a simpler measure
for now, I have added a new property called logical-emphasis to
the config file for replacing i by em and b by strong.</p>
<h2>7th July 1999</h2>
<p>Fixed recent bug with escaping ampersands and plugged memory
leaks following Terry Teague's suggestions. Changed
IsValidAttrName() in lexer.c to test for namechars to allow - and
: in names.</p>
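<p>The attribute-name test can be sketched as follows. The exact character set is an assumption based on these notes (letters and digits, plus '-', '.', '_' and ':'), and is_valid_attr_name is a hypothetical name rather than Tidy's IsValidAttrName(). A name containing a comma fails the check.</p>

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical stand-in for Tidy's namechar test: letters, digits and
   the punctuation '-', '.', '_' and ':' (an assumption for this sketch). */
static bool is_name_char(unsigned char c)
{
    return isalnum(c) || c == '-' || c == '.' || c == '_' || c == ':';
}

/* Returns true if `name` starts with a letter and contains only name
   characters, so "xml:lang" and "http-equiv" pass but "a,b" fails. */
bool is_valid_attr_name(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    if (!isalpha(*p))
        return false;
    for (++p; *p != '\0'; ++p) {
        if (!is_name_char(*p))
            return false;
    }
    return true;
}
```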
<h2>2nd July 1999</h2>
<p>Chami noticed that the definition for the marquee tag was
wrong. I have fixed the entry in tags.c and Tidy now works fine
on the example he sent. To support mixing MathML with HTML I have
added a new config option for declaring empty inline tags
"new-empty-tags". Philip Riebold noted that single
quote marks were being silently dropped unless quote-marks was
set to yes. This is an unfortunate bug recently introduced and
now fixed.</p>
<p>Paul Smith sent in an example of badly formed tables, where
paragraph elements occurred in table rows without enclosing table
cells. Tidy was handling this by inserting a table cell. After
comparison with Netscape and IE, I have revised the code for
parsing table rows to move unexpected content to just before the
table.</p>
<h2>26th June 1999</h2>
<p>Tony Leneis reports that Tidy incorrectly thinks the table
frame attribute is a transitional feature. Now fixed. Chami
reported a bug in ParseIndent in config.c and that onsubmit is
missing from the table of attributes. Both now fixed. Carsten
Allefeld reports that Tidy doesn't know that the valign
attribute was introduced in HTML 3.2 and is ok in HTML 4.0
strict, necessitating a trivial change to attrs.c.</p>
<p>Axel Kielhorn notes that Tidy wasn't checking that the preamble
for the DOCTYPE tag matches either "html PUBLIC" or
"html SYSTEM". Bill Homer spotted changes needed for
Tidy to compile with SGI MIPSpro C++. All of Bill's changes
have been incorporated, except for the include file
"unistd.h" (for the unlink call) which isn't
available on win32. To include this header, define NEEDS_UNISTD_H.</p>
<p>Bjoern Hoehrmann asked for information on how to use the
result returned by Tidy when it exits. I have included an example
using Perl that Bjoern sent in. Bodo Eing reported that Tidy gave a
misleading warning when title text is emphasized. It now reports
a missing &lt;/title&gt; before any unexpected markup.</p>
<p>Bruce Aron says that many WYSIWYG HTML editors place a font
element around a hypertext link enclosing the anchor element
rather than its contents. Unfortunately, the anchor element then
overrides the color change specified by the font element! I have
added an extra rule to ParseInline to move the font element
inside an anchor when the anchor is the only child of the font
element. Note CSS is a better long term solution, and Tidy can be
used to replace font elements by style rules using the clean
option.</p>
<p>Carsten Allefeld reported that valign on table cells caused
Tidy to mislabel content as HTML 4.0 transitional rather than
strict. Now fixed. A number of people said they expected the
quote-mark option to apply to all text and not just to attribute
values. I have obliged and changed the option accordingly.</p>
<p>Some people have wondered why "</" causes an
error when present within scripts. The reason is that this
substring is not permitted by the SGML and XML standards. Tidy
now fixes this by inserting a backslash, changing the substring
to "<\/". Note this is only done for JavaScript and
not for other scripting languages.</p>
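<p>A sketch of the "&lt;/" rewrite as a standalone C function. The name escape_script_etago is hypothetical. It copies the script text and inserts a backslash between every '&lt;' and a following '/', which JavaScript string literals treat as the same characters.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `script` in which every "</" is
   rewritten as "<\/", so the end-tag-open delimiter no longer appears
   inside the script content. The caller must free the result. */
char *escape_script_etago(const char *script)
{
    size_t len = strlen(script);
    /* Each "</" pair (at most len/2 of them) grows by one byte. */
    char *out = malloc(len + len / 2 + 1);
    char *q = out;

    if (out == NULL)
        return NULL;
    for (const char *p = script; *p != '\0'; ++p) {
        *q++ = *p;
        if (p[0] == '<' && p[1] == '/')
            *q++ = '\\';   /* insert the backslash between '<' and '/' */
    }
    *q = '\0';
    return out;
}
```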
<p>Chami reported that onsubmit wasn't recognized by Tidy -
now fixed. Chris Nappin drew my attention to the fact that script
string literals in attributes weren't being wrapped correctly
when QuoteMarks was set to no. Now fixed. Christian Zuckschwerdt
asked for support for the POSIX long options format e.g. --help.
I have modified tidy.c to support this for all the long options.
I have kept support for -help and -clean etc.</p>
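<p>One way to accept both spellings is to strip one dash from a "--" prefix before the usual option lookup. This is only a sketch: normalize_option is a hypothetical helper, not the code in tidy.c.</p>

```c
#include <string.h>

/* Hypothetical helper: map the POSIX-style "--name" spelling onto the
   older single-dash "-name" form, so "--help" and "-help" reach the
   same handler. Other arguments, including a bare "--", are returned
   unchanged. */
const char *normalize_option(const char *arg)
{
    if (strncmp(arg, "--", 2) == 0 && arg[2] != '\0')
        return arg + 1;          /* drop one of the two leading dashes */
    return arg;
}
```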
<p>Craig Horman sent in a routine for checking that attribute names
don't contain invalid characters, such as commas. I have used
this to avoid spurious attribute/value pairs when a quote mark is
misplaced. Darren Forcier is interested in wrapping Tidy up as a
Win32 DLL. Darren asked for Tidy to release its memory resources
for the various tables on exit. Now done; see DeInitTidy() in
tidy.c.</p>
<p>Darren also asks about the config file mechanism for declaring
additional tags, e.g. <b>new-blocklevel-tags: cfoutput,
cfquery</b> for use with Cold Fusion. You can add inline and
blocklevel elements but as yet you can't add empty elements
(similar to br or hr) or change the content model for the
table, ul, ol and dl elements. Note that the indent option
applies to new elements in the same way as it does for built-in
elements. Tidy will accept the following:</p>
<pre>
<cfquery name="MyQuery" datasource="Customer">
select CustomerName from foo where x > 1
</cfquery>
<cfoutput query="MyQuery">
<table>
<tr>
<td>#CustomerName#</TD>
</tr>
</table>
</cfoutput>
</pre>
<p>but the next example <b>won't</b> since you can't as
yet modify the content model for the table element:</p>
<pre>
<cfquery name="MyQuery" datasource="Customer">
select CustomerName from foo where x > 1
</cfquery>
<table>
<cfoutput query="MyQuery">
<tr>
<td>#CustomerName#</TD>
</tr>
</cfoutput>
</table>
</pre>
<p>I have been studying richer ways to support modular extensions
to html using assertions and a generalization of regular
expressions to trees. This work has led to a tool for generating
DTDs named <b>dtdgen</b> and I am in the process of creating a
further tool for verification. More information is available in
my note on <a href=
"http://www.w3.org/People/Raggett/dtdgen/Docs">Assertion
Grammars</a>. Please contact me if you are interested in helping
with this work.</p>
<p>David Fallon is interested in using Tidy to dynamically repair
markup in an HTML editor as people type. My recommendation is to
take advantage of the tables in tags.c and attrs.c for this, and
to defer applying the full range of heuristics until the file is
saved to disk or a repair is explicitly requested. The CM_OPT
property in the tags table indicates that the end tag is
optional, while CM_EMPTY indicates that an element is <i>
empty</i>, i.e. has no content.</p>
<p>Betsy Miller reports: <i>I tried printing the HTML Tidy page
for a class I am teaching tomorrow on HTML, and everything in the
"green" style (all of the examples) print in the
smallest font I have ever seen (in fact they look like tiny
little horizontal lines). Any explanation?</i>.</p>
<p>Yes. This is a problem with Internet Explorer and Style
Sheets. The Tidy page includes a CSS style sheet that tries to
make the font used for the examples 80% of the size used
for normal text. Internet Explorer gets this wrong, picking a
very much smaller font. I am hoping this bug is fixed in the IE
5.0 release. I have changed the style sheet to work around
this.</p>
<p>Francisco Guardiola writes that Tidy wasn't fixing
frameset documents with body elements unenclosed in noframes
elements. Now fixed. Frederik Fouvry found that comments after
the html end tag generated a warning for content after body. I
can't reproduce this symptom and assume it was fixed in an
earlier release.</p>
<p>Indrek Toom wants to know how to format tables so that tr
elements indent their content, but td tags do not. The solution
is to use <i>indent: auto</i>. Jelks Cabaniss noted that the
clean option created style rules with tag names in uppercase,
which would cause problems for Extensible HTML (xhtml). This
prompted me to overhaul Tidy to switch to lower case for the tag
tables and literals. I have adopted Jelks' suggestion for
adding support for a doctype property in config files. This
supports <em>omit, auto, strict, loose</em> or a string
specifying the fpi (formal public identifier).</p>
<p>Johannes Koch notes that Tidy doesn't fix up the doctype
correctly when bursting to slides. He says that if a document
contains the HTML 4.0 strict DT declaration, then the slides also
include the same strict DT declaration, but also contain the
center tag which does not appear in the strict DTD. I have
applied a simple work around, which is to remove the original
doctype when bursting to slides.</p>
<p>I have extended the support for the ASP preprocessing syntax
to cope with the use of ASP within tags for attributes. I have
also added a new option <tt>wrap-asp</tt> to the config file
support to allow you to turn off wrapping within ASP code. Thanks
to Ken Cox for this idea.</p>
<p>Larry Virden asked for a compile-time option for setting the
config file, he says "The reason it would be useful is to be
able to define a set of commonly used additional tags. For
instance, our site is starting to use a lot of ColdFusion. I
would love to be able to put the CF tags into a site wide file so
that users of tidy automatically get them defined". You can
now do this by defining CONFIG_FILE in platform.h.</p>
<p>Loïc Trégan asks: Is there a way to generate a
"light" xml, with no "&lt;!DOCTYPE...&gt;"
and "xmlns=..."? I have tweaked the code to allow the
doctype property to apply when outputting XML, and added a new
property "add-xml-pi" to control whether an
&lt;?xml?&gt; processing instruction is added or not. To generate
a minimal XML document, you can set the xml-out property to yes,
and the doctype and add-xml-pi properties to no.</p>
<p>Marc Jauvin has been using Windows applications to generate Web
pages and found that some of them generate very
"non-portable" HTML. One of the problems that is often
introduced is the use of "\" in URLs instead of
"/" which confuses Unix Web servers. To deal with this
I have introduced the "fix-backslash" property. This
has been set by default to yes, but can be set to no if that
causes problems.</p>
<p>The new property <tt>indent-attributes</tt> when set to yes
places each attribute on a new line. Note that the attributes are
only indented one space. Paul Ossenbruggen asked for something
slightly different, where the second and subsequent attributes
start on a new line and are indented to line up under the first
attribute. That proved to involve rather more work to implement
than I have time for right now. I plan to work some more on this
for a future release.</p>
<p>Peter Jeremy reported that when an error file is specified to
tidy (-f file), the error file is opened for every HTML file
specified on the command line, but not closed until all HTML
files have been processed. If a large number of files are
specified on the command line (e.g. processing the FreeBSD
handbook), this can overflow the process or system file
descriptor table. I have now fixed this so that the error file is
only opened once.</p>
<p>Rafi Stern notes: I have entered output-xml: yes in my config
file, not output-xhtml. Tidy second guesses me and adds the xmlns
attribute for XHTML at the head of my file, which I then have to
remove as this interferes with my XSLT parser. Fixed along with
the other bugs reported by Rafi.</p>
<p>Steffen Ullrich and Andy Quick both spotted a problem with
attribute values consisting of an empty string, e.g. <tt>
alt=""</tt>. This was caused by bugs in tidy.c and in
lexer.c, both now fixed. Jussi Vestman noted Tidy had problems
with hr elements within headings. This appears to be an old bug
that came back to life! Now fixed. Jussi also asked for a config
file option for fixing URLs where non-conforming tools have used
backslash instead of forward slash.</p>
<p>An example from Thomas Wolff gave me the idea of
inserting the appropriate container elements for naked list items
when these appear in block level elements. At the same time I
have fixed a bug in the table code to infer implicit table rows
for text occurring within row group elements such as thead and
tbody. An example sent in by Steve Lee allowed me to pinpoint an
endless loop when a head or body element is unexpectedly found in
a table cell.</p>
<h2>15th April 1999</h2>
<p>Another minor release. Jacob Sparre Andersen reports a bug
with &amp;quot; in attribute values. Now fixed. Francisco
Guardiola reports problems when a body element follows the
frameset end tag. I have fixed this with a patch to ParseHTML,
ParseNoFrames and ParseFrameset in parser.c. Chris Nappin wrote in
with the suggestion for a config file option for enabling
wrapping script attributes within embedded string literals. You
can now do this using
"wrap-script-strings: yes".</p>
<h2>14th April 1999</h2>
<p>Added check for Asp tags on line 2674 in parser.c so that Asp
tags are not forcibly moved inside an HTML element. My thanks to
Stuart Updegrave for this. Fixed problem with &amp; entities.
Bede McCall spotted that &amp;amp; was being written out as
&amp;amp;amp;. The fix alters ParseEntity() in lexer.c.</p>
<h2>12th April 1999</h2>
<p>Added a missing "else" on line 241 in config.c
(thanks to Keith Blakemore-Noble for spotting this). Added
config.c and .o to the Makefile (an oversight in the release on
the 8th April).</p>
<h2>8th April 1999</h2>
<h4>Localization:</h4>
<p>All the message text is now defined in localize.c which should
make it a tad easier to localize Tidy for different
languages.</p>
<h4>Config file support:</h4>
<p>I have added support for configuring tidy via a configuration
file. The new code is in config.h which provides a table driven
parser for RFC822 style headers. The new command line option
-config &lt;filename&gt; can be used to identify the config file.
The environment variable "HTML_TIDY" may be used to
name the config file. If defined, it is parsed before scanning
the command line. You are advised to use an absolute path for the
variable to avoid problems when running tidy in different
directories.</p>
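<p>A sketch of parsing one line of such a config file, assuming the simple "name: value" form used in the examples in these notes (e.g. "indent: auto"). The function parse_config_line is hypothetical, and RFC822 continuation lines are not handled.</p>

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns the start. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        ++s;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        --end;
    *end = '\0';
    return s;
}

/* Split one "name: value" line in place. Returns false for lines
   without a colon or with an empty name. */
bool parse_config_line(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');

    if (colon == NULL)
        return false;
    *colon = '\0';               /* terminate the name part */
    *name = trim(line);
    *value = trim(colon + 1);
    return **name != '\0';
}
```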
<h4>Allan Kuchinsky:</h4>
<p>Reports that the XML DOM parser by Eduard Derksen screws up on
&amp;nbsp;, naked &amp; and % in URLs as well as having problems with
newlines after the '=' before attribute values.</p>
<p>I have tweaked PrintChar when generating XML to output &amp;#160;
in place of &amp;nbsp; and &amp;amp; in place of &amp;. In
general XHTML when parsed as well-formed XML shouldn't use
named entities other than those defined in XML 1.0. Note that
this isn't a problem if the parser uses the XHTML DTDs which
import the entity definitions.</p>
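<p>The XML output rule can be sketched as a character writer. This assumes the caller passes Unicode code points. xml_put_char is a hypothetical name, and writing every non-ASCII character as a numeric reference is a simplification for the sketch (characters can also be written directly, depending on the output encoding).</p>

```c
#include <stdio.h>

/* Write one character of text content for XML output. Markup-significant
   characters use the entities XML 1.0 predefines, and anything above
   ASCII, including U+00A0 (no-break space), becomes a numeric
   character reference such as &#160; instead of &nbsp;. */
void xml_put_char(FILE *out, unsigned int ch)
{
    switch (ch) {
    case '&': fputs("&amp;", out); break;
    case '<': fputs("&lt;", out);  break;
    case '>': fputs("&gt;", out);  break;
    default:
        if (ch < 128)
            fputc((int)ch, out);
        else
            fprintf(out, "&#%u;", ch);   /* e.g. 160 -> "&#160;" */
        break;
    }
}
```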
<h4>Allan Odgaard:</h4>
<p>When tidy encounters entities without a terminating semicolon
(e.g. "&amp;copy") then it correctly outputs
"&amp;copy;", but it doesn't report an error.</p>
<p>I have added a ReportEntityError procedure to localize.c and
updated ParseEntity to call this for missing semicolons and
unknown entities.</p>
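<p>The missing-semicolon check amounts to scanning the entity name and looking at the character that follows it. In this sketch scan_entity and the EntityScan values are hypothetical, only named (not numeric) references are handled, and the table lookup that would detect unknown entities is left out.</p>

```c
#include <ctype.h>
#include <stddef.h>

/* Result of scanning a named entity reference. */
typedef enum {
    ENTITY_OK,
    ENTITY_MISSING_SEMICOLON,
    ENTITY_NOT_A_NAME
} EntityScan;

/* Scan the reference starting at `s`, which must point at '&'. On return
   *len holds the number of characters consumed, including the '&' and
   any terminating ';'. */
EntityScan scan_entity(const char *s, size_t *len)
{
    const char *p = s + 1;              /* skip the '&' */

    if (!isalpha((unsigned char)*p)) {
        *len = 1;                       /* a naked ampersand */
        return ENTITY_NOT_A_NAME;
    }
    while (isalnum((unsigned char)*p))
        ++p;
    if (*p == ';') {
        *len = (size_t)(p - s) + 1;
        return ENTITY_OK;
    }
    *len = (size_t)(p - s);             /* e.g. "&copy" then a space */
    return ENTITY_MISSING_SEMICOLON;
}
```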
<h4>Andreas Buchholz:</h4>
<p>Tidy warns if the table element is missing a summary attribute. This is incorrect for
HTML 3.2 which doesn't define this attribute.</p>
<p>The summary attribute was introduced in HTML 4.0 as an aid for
accessibility. I have modified CheckTABLE to suppress the warning
when the document type explicitly designates the document as
being HTML 2.0 or HTML 3.2.</p>
<h4>Andy Brown:</h4>
<p>I have renamed the field from class to tag_class as
"class" is a reserved word in C++ with the goal of
allowing tidy to be compiled as C++ e.g. when part of a larger
program.</p>
<p>I have switched to Bool and the values yes and no to avoid
problems with detecting which compilers define bool and those
that don't.</p>
<p>Andy would prefer a return code or C++ exception rather than
an exit. I have removed the calls to exit from pprint.c and used
a long jump from FatalError() back to main() followed by
returning 2. It should be easy to adapt this to generate a C++
exception.</p>
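<p>The pattern is the standard setjmp/longjmp idiom. In this sketch run_tidy() stands in for main() and process_document() for the parse-and-print work. Both names are hypothetical, and FatalError() here is a simplified version, not the one in Tidy.</p>

```c
#include <setjmp.h>
#include <stdio.h>

/* Jump target set up by the top-level driver. */
static jmp_buf fatal_exit;

/* Instead of calling exit(), report the problem and unwind straight
   back to the setjmp() in run_tidy(). */
static void FatalError(const char *msg)
{
    fprintf(stderr, "Fatal error: %s\n", msg);
    longjmp(fatal_exit, 1);
}

/* Hypothetical stand-in for the work done on one document. */
static void process_document(int simulate_failure)
{
    if (simulate_failure)
        FatalError("simulated failure");
}

/* Returns 0 on success or 2 after a fatal error, so an embedding
   program keeps control instead of being terminated. */
int run_tidy(int simulate_failure)
{
    if (setjmp(fatal_exit) != 0)
        return 2;
    process_document(simulate_failure);
    return 0;
}
```

<p>A C++ wrapper could replace the longjmp() with a throw and the setjmp() test with a catch block.</p>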
<p>Sometimes the prev links are inconsistent with next links. I
have fixed some tree operations which might have caused this. Let
me know if any inconsistencies remain.</p>
<h4>Ann Navarro:</h4>
<p>Would like to be able to use:</p>
<pre>
tidy file.html | more
</pre>
<p>to pause the screen output, and/or to pass the full output to a file,
as with</p>
<pre>
tidy file.html > output.txt
</pre>
<p>Tidy writes markup to stdout and errors to stderr.
'More' only works for stdout, so the errors fly by.
My compromise is to write errors to stdout when the markup is
suppressed using the command line option -e or "markup:
no" in the config file.</p>
<h4>html-kit@chamisplace.com</h4>
<p>Writes asking for a single output routine for Tidy. Acting on
his suggestion, I have added a new routine tidy_out() which
should make it easier to embed HTML Tidy in a GUI application
such as HTML-Kit. The new routine is in localize.c. All input
takes place via ReadCharFromStream() in tidy.c, excepting command
line arguments and the new config file mechanism.</p>
<p>Chami also asks for single routines for initializing and
de-initializing Tidy, something that happens often from the GUI
environment of HTML-Kit. I have added InitTidy() and DeInitTidy()
in tidy.c to try to satisfy this need. Chami now supports an
online interface for Tidy at the URL:</p>
<pre>
<a href=
"http://www.chamisplace.com/asp/hk.asp">http://www.chamisplace.com/asp/hk.asp</a>
</pre>
<p>He further asks for Tidy to optionally output a length
parameter whenever possible. This could represent the length of
the element, attribute or code block related to the error. An
online validator could then highlight the starting and ending
columns which may be easier for beginners to understand, rather
than pointing to a single character column. I will investigate
this for a future release.</p>
<h4>Chang Hyun Baek:</h4>
<p>Reports a problem when generating XML using -iso2022. Tidy
inserts ?/p&lt; rather than &lt;/p&gt;. I tried Chang's test
file but it worked fine, with &lt;/p&gt; in all the right places. Please let
me know if this problem persists.</p>
<h4>Christian Ruetgers:</h4>
<p>When using the -indent option, Tidy emits extra newlines inside table cells, which
alters the layout of some tables.</p>
<p>I note that browsers aren't conforming to the SGML spec on
generally ignoring a newline immediately after start tags and
immediately before end tags. Netscape does this for pre elements
but not for other tags! My work around is to avoid additional
newlines for the content of th and td elements, except where
their content starts with a block level element. This kind of
thing is getting really hairy!</p>
<h4>Christian Pantel:</h4>
<p>Would like the servlet tag added to tidy. This looks very
similar to applet and is used for preprocessing document content
before delivery. Servlet acts as a container for param elements
and fallback content to be shown if the server doesn't
support servlet. I have added it as a proprietary tag and parse
it in the same way as applet.</p>
<p>Christian also reports that &lt;td&gt;&lt;hr/&gt;&lt;/td&gt;
caused Tidy to discard the &lt;hr/&gt; element. I have fixed the
associated bug in ParseBlock.</p>
<h4>Chuck Baslock:</h4>
<p>Points out that an isolated &amp; is converted to &amp;amp; in
element content and in attribute values. This is in fact correct
and in agreement with the recommendations for HTML 2.0
onwards.</p>
<h4>Craig Horman:</h4>
<p>Reports that Tidy loops indefinitely if a naked LI is found in
a table cell. I have patched ParseBlock to fix this, and now
successfully deal with naked list items appearing in table cells,
clothing them in a ul.</p>
<h4>Craig Johnson:</h4>
<p>Reports that Tidy gets confused by &lt;/comment&gt; before the
doctype. This is apparently inserted by some authoring tool or
other. I have patched Tidy to safely recover from the
unrecognized and unexpected end tag without moving the parse
state into the head or body.</p>
<h4>Daniel Vogelheim:</h4>
<p>Asks for Tidy to recognize obsolete elements such as LISTING
and to replace them by more modern equivalents, in this case pre.
I have added code to issue a warning and replace such elements as
xmp, listing, plaintext by pre, and dir and menu by ul. Daniel
also asks for a means of suppressing warnings, i.e. to only
report errors. I have added the boolean "show-warnings"
to the config file support to deal with this and split off
warnings to ReportWarnings().</p>
<h4>Dan Rudman:</h4>
<p>Would love a version of Tidy written in Java. This is a big
job. I am working on a completely new implementation of Tidy,
this time using an object-oriented approach but I don't
expect to have this done until later this year. <b>
DEFERRED</b></p>
<h4>David Brooke:</h4>
<p>Reports that when tidying an XML file with characters above 127
Tidy is outputting the numeric entity followed by the character.
I have fixed this by a patch to PPrintChar() for XmlTags.</p>
<h4>David Getchell:</h4>
<p>Reports that Tidy thinks an ol list is HTML 4.0 when you use
the type attribute. I have fixed an error in attrs.c so that this
feature is recorded as first appearing in HTML 3.2.</p>
<h4>Drew Adams:</h4>
<p>Reported problems when using comments to hide the contents of
script elements from ancient browsers. I wasn't able to
reproduce the problem, and guess I fixed it earlier.</p>
<p>Drew also reported a problem which on further investigation is
caused by the very weird syntax for comments in SGML and XML. The
syntax for comments is really error prone:</p>
<pre>
&lt;!--[text excluding --]--[[whitespace]*--[text excluding --]--]*&gt;
</pre>
<p>This means that &lt;!----&gt; is a complete comment but
&lt;!------&gt; is not since the parser is expecting a matching
terminating -- and as it doesn't find the -- it ploughs on
and on, treating the rest of the markup as a comment unless it
finds another end comment. I have added a rule of thumb (a
heuristic) for detecting this situation. Basically I count the
number of comment groups without other characters and if the
count is &gt; 2 and a '&gt;' is seen, a warning is
generated.</p>
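<p>To see why &lt;!------&gt; runs on, here is a sketch of a strict checker for the comment grammar above. It is not Tidy's heuristic (which counts comment groups). sgml_comment_length is a hypothetical name, and it returns 0 when the declaration is unterminated or malformed.</p>

```c
#include <stddef.h>
#include <string.h>

/* Given text starting with "<!--", return the length of the complete
   comment declaration (up to and including the closing '>'), or 0 if
   the input ends first or the declaration is malformed. Each comment
   runs from "--" to the next "--", and only whitespace may separate
   comments before the final '>'. */
size_t sgml_comment_length(const char *s)
{
    const char *p = s + 2;                  /* skip "<!" */

    for (;;) {
        if (p[0] != '-' || p[1] != '-')
            return 0;                       /* expected a comment start */
        const char *end = strstr(p + 2, "--");
        if (end == NULL)
            return 0;                       /* comment never closed */
        p = end + 2;
        while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
            ++p;                            /* whitespace between comments */
        if (*p == '>')
            return (size_t)(p - s) + 1;
        if (*p == '\0')
            return 0;
    }
}
```

<p>With this checker "&lt;!----&gt;" yields 7 (the whole declaration), while "&lt;!------&gt;" yields 0 because the third "--" opens a comment that never closes.</p>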
<p>Drew goes on to comment on the -clean option. This made me
take another look at the relative font sizes I am using for the
absolute font sizes for 0 through 6. I have tweaked them to get a
reasonable match before/after applying -clean as viewed on NS4
and IE4. Font size=3 is taken as the normal body font size and as
such the font element is silently dropped unless it also defines
a color.</p>
<p>I have also added InlineStyle to deal with the cases where an
inline element has as its only child a font element. A further
possibility would be to promote style properties common to all
children of an element to the element. I will have to leave this
for future work.</p>
<p>Drew asks why &lt;/ is not allowed in script content. The
answer is that SGML treats &lt;/ as delimiting the end of CDATA
element content, so that it ends prematurely before the
&lt;/script&gt; end tag. Browsers tend not to follow the SGML
standard in this respect, but Tidy is designed to help you do
so.</p>
<h4>Guus Goos:</h4>
<p>Notes that tidy *.html doesn't work under DOS. This is
because DOS, unlike Unix, doesn't expand names with wildcards
to the list of matching file names. This is a right nuisance and
one more reason why Linux is gaining popularity. I plan to
provide a work around in a future release of Tidy. Are there any
free drop-in replacements for the DOS shell that fix this
problem?</p>
<h4>Jack Horsfield:</h4>
<p>Like a number of others, would like list items and table cells
to be output compactly where possible. I have added a flag to
tags.c that avoids further indentation when the content is
inline, e.g.</p>
<pre>
&lt;ul&gt;
&lt;li&gt;some text&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;
a new paragraph
&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</pre>
<p>This behavior is enabled via "smart-indent: yes" and
overrides "indent: no". Use "indent-spaces:
5" to set the number of spaces used for each level of
indentation.</p>
<h4>Jeff Young:</h4>
<p>Has a few suggestions that will make Tidy work with XSL.
Thanks, I have incorporated all of them into the new release.</p>
<h4>Jelks Cabaniss:</h4>
<p>Reports that Tidy thinks the end tag is missing if the
script element has no content. I have patched ParseScript to fix
this. Jelks also asks for a way to ask Tidy to hide the contents
of script and style elements; a way to avoid promoting inline
styles with -clean to style rules, as a work around for a bug in
IE with relative URLs; and finally, a way to avoid empty
elements being discarded, especially if they define an ID for
scripting. Very reasonable, but I would prefer to leave these to a
future release. (This release is big enough right now!).</p>
<p>One thing I can satisfy right away is a mailing list for Tidy.
html-tidy@w3.org has been created for discussing Tidy and I have
placed the details for subscribing and accessing the Web archive
on the Tidy overview page.</p>
<h4>Johannes Koch:</h4>
<p>Reports that Tidy isn't quite right about when it reports
the doctype as inconsistent or not. I have tweaked HTMLVersion()
to fix this. Let me know if any further problems arise.</p>
<h4>John Tobler:</h4>
<p>Wants to know how to get Tidy to preserve his explicit
entities e.g. " and  . Currently Tidy interprets all
entities as character values and as such has no way to
distinguish whether these were derived from entities or not. To
help John with this release you can use "quote-marks:
yes" in the config file if you want all " marks to
appear as " and "quote-nbsp: yes" if you want
non-breaking spaces to be shown as entities. Note that for XML in
general   is not-predeclared, so you should also use
"numeric-entities: yes". This doesn't apply to
XHTML though.</p>
<p>John also reports that the weirdly complex URLs using the
javascript: scheme as used by www.bookmarklets.com can cause Tidy
indigestion. I have made Tidy aware of which attributes are using
Javascript and disabled the missing quote mark heuristic for
these. I have also tweaked the way unknown entities are reported
to say that the markup may contain unescaped ampersands.</p>
<h4>Mathew Cepl:</h4>
<p>Notes that dir and menu are deprecated and not allowed in
HTML4 strict. I have updated the entry in the tags table for
these two. I also now coerce them automatically to ul when -clean
is set.</p>
<h4>Maurice Buxton:</h4>
<p>Reports that some implementations of gcc don't work with
the current compiler directive Tidy uses to avoid duplicate
typedefs for uint and ulong. I don't have a truly platform
independent solution for this, so you may need to edit platform.h
if the code doesn't compile out of the box on your
platform.</p>
<h4>Osma Ahvenlampi:</h4>
<p>Found that Tidy is confused by map elements in the head. Tidy
knows that map is only allowed in the body and thinks the author
has left out the body start tag. Thereafter elements which it knows
only belong in the head are moved to the head, so things should
work out ok. Osma also reports having difficulties with
non-breaking spaces, but I was unable to reproduce these with the
new release of Tidy, so perhaps the problems have been fixed.</p>
<h4>Paul Ward:</h4>
<p>Reports that Tidy caused Javascript errors when it introduced
linebreaks in Javascript attributes. Tidy goes to some efforts to
avoid this and I am interested in any reports of further problems
with the new release.</p>
<h4>Rafi Stern:</h4>
<p>Would like Tidy to warn when a tag has an extra quote mark, as
in <a href="xxxxxx"">. I have patched
ParseAttribute to do this.</p>
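<p>A minimal sketch of this kind of check, assuming the text of a single start tag has already been isolated and that attribute values are delimited by single or double quotes. The helper find_extra_quotes is hypothetical and is not Tidy's ParseAttribute; it simply reports any quote mark that immediately follows a closed quoted value.</p>

```python
def find_extra_quotes(tag):
    """Return the offsets of quote marks that directly follow a closed
    quoted attribute value, e.g. the final quote in href="xxxxxx"".

    Hypothetical helper for illustration; not Tidy's ParseAttribute.
    """
    stray = []
    n = len(tag)
    i = 0
    while i < n:
        if tag[i] == "=":
            # Skip any whitespace between '=' and the value.
            j = i + 1
            while j < n and tag[j].isspace():
                j += 1
            if j < n and tag[j] in "\"'":
                end = tag.find(tag[j], j + 1)
                if end == -1:
                    break  # unterminated value: a different error
                # Any quote right after the closing quote is extra.
                k = end + 1
                while k < n and tag[k] in "\"'":
                    stray.append(k)
                    k += 1
                i = k
                continue
        i += 1
    return stray
```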
<h4>Rene Fritz:</h4>
<p>Reported a space being inserted at the end of lines when the
text is wrapped at the start of hypertext links. This isn't
occurring with this release, so I guess the problem was solved a
while back. Rene also suggests that Tidy could be used to add and
remove metadata and attributes etc. for a group of files, e.g. to
add a link to a style sheet or to assert attribution. This sounds
like a good idea for work in the future.</p>
<h4>Shane McCarron:</h4>
<p>Reports that Tidy sometimes wraps text within markup that
occurs in the context of a pre element. I am only able to repeat
this when the markup wraps within start tags, e.g. between
attribute values. This is perfectly legitimate and doesn't
affect rendering.</p>
<h4>Steven Lobo:</h4>
<p>Notes that Tidy doesn't remove entities such as &amp;nbsp;
or &amp;copy; which aren't defined by XML 1.0. That is true, but
these entities <b>are</b> fine if you are using XHTML. If you
want to generate generic XML then you need to use the -n option
or to set "numeric-entities: yes" in the config file.
This will then output all such entities in their numeric form or
as direct character values according to the character encoding
flags.</p>
<h4>Steven Pemberton:</h4>
<p>Comments that he would like Tidy to replace naked &amp; in
URLs by &amp;amp;. You can now use "quote-ampersands: yes"
in the config file to ensure this. Note that this is always done
when outputting to XML, where naked '&amp;' characters are
illegal.</p>
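<p>Roughly, "quote-ampersands: yes" means that an &amp; which does not already begin a character or entity reference is rewritten as &amp;amp;. The sketch below assumes a reference has the form &amp;name;, &amp;#digits; or &amp;#xhex; with its terminating semicolon; quote_ampersands is a hypothetical name, not part of Tidy.</p>

```python
import re

# An '&' that does not already begin a reference. Assumption: a reference
# is a name, '#digits' or '#xhex' followed by the terminating ';'.
NAKED_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def quote_ampersands(value):
    """Escape naked ampersands in an attribute value such as a URL."""
    return NAKED_AMP.sub("&amp;", value)
```

<p>For example, q=tidy&amp;lang=en becomes q=tidy&amp;amp;lang=en, while a value that already contains &amp;amp; is left alone. Text such as a&amp;b;c is ambiguous and is assumed to contain a reference.</p>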
<p>Steven also asks for a way to allow Tidy to proceed after
finding unknown elements. The issue is how to parse them, e.g. to
treat them as inline or block level elements? The latter would
terminate the current paragraph whereas the former would not.</p>
<p>If unknown tags are treated as inline, they presumably need
special handling: normal inline end tags close the currently open
inline element, but this doesn't feel right for unknown tags.
What should the content model for unknown tags be: flow? Again,
it's far from obvious. One way to avoid these
difficulties would be to provide a means for authors to declare
unknown tags in the config file.</p>
<p>You can now declare new inline and block-level tags in the
config file, e.g.:</p>
<pre>
define-inline-tags: foo, bar
define-blocklevel-tags: blob
</pre>
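<p>One way a tool might read such declarations is sketched below. The function parse_tag_declarations is hypothetical; it assumes one option per line with comma-separated tag names, and folds names to lower case since HTML tag names are case-insensitive.</p>

```python
def parse_tag_declarations(config_text):
    """Collect tag names from define-inline-tags and
    define-blocklevel-tags lines (hypothetical helper)."""
    declared = {"define-inline-tags": set(), "define-blocklevel-tags": set()}
    for line in config_text.splitlines():
        option, sep, value = line.partition(":")
        option = option.strip().lower()
        if not sep or option not in declared:
            continue  # some other option, or not an option line at all
        for name in value.split(","):
            name = name.strip().lower()  # tag names are case-insensitive
            if name:
                declared[option].add(name)
    return declared
```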
<p>The content model for new tags allows for block or inline
content. Steven further comments that some authors use ul without
an li to indent content. Tidy currently coerces these to wrap the
content within an li which alters the rendering. He suggests
using blockquote instead. I have done this, and if you use the
-clean option at the same time, it gets replaced by a div element
with a class and style rule for indenting the content.</p>
<h4>Stuart Updegrave:</h4>
<p>Would like to be able to coerce attributes to uppercase. I
have added support for "uppercase-attributes: yes" for
this. Stuart also asks for Tidy to support Microsoft's ASP
tags. These are part of Microsoft's server-side scripting
model (similar to CGI). I have treated ASP tags in the same way
as processing instructions, and they don't affect the version
of HTML as they are assumed to have been interpreted before
delivery to the client.</p>
<p>Stuart is also interested in having Tidy read from and
write back to the Windows clipboard. This sounds interesting
but I have to leave this to a future release.</p>
<h4>Terry Cassidy:</h4>
<p>Points out that Tidy doesn't like "top" or
"bottom" for the align attribute on the caption
element. I have added a new routine to check the align attribute
for the caption element and cleaned up the code for checking the
document type.</p>
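<p>For reference, the HTML 4.0 Transitional DTD allows top, bottom, left and right as values of align on caption, while HTML 4.0 Strict omits the attribute altogether, which is why this check is tied to working out the document type. A minimal sketch of the value check; check_caption_align and its message text are hypothetical.</p>

```python
# Values allowed for align on caption in HTML 4.0 Transitional.
# (HTML 4.0 Strict does not allow the attribute at all.)
CAPTION_ALIGN = frozenset({"top", "bottom", "left", "right"})


def check_caption_align(value):
    """Return None for a valid value, else a warning message."""
    if value.strip().lower() in CAPTION_ALIGN:
        return None
    return 'caption attribute "align" has invalid value "%s"' % value
```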
<h4>Xavier Plantefeve:</h4>
<p>Suggests that I should ensure that the options are
self-consistent, e.g. if -asxml is set, then this should imply lower
case and override any instruction to omit optional end tags.
Accordingly, I have introduced a new routine AdjustConfig() that
is applied after reading the command line and config files and
before tidying any files.</p>
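<p>The idea behind AdjustConfig() can be sketched as a single pass over the option values after every source has been read. The Config fields and adjust_config below are hypothetical stand-ins, not Tidy's actual option table.</p>

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical option fields; Tidy's real option names may differ.
    as_xml: bool = False
    uppercase_tags: bool = False
    omit_optional_end_tags: bool = False


def adjust_config(cfg):
    """Make the options self-consistent before any file is tidied."""
    if cfg.as_xml:
        # XHTML element names are lower case, and XML does not allow
        # end tags to be omitted.
        cfg.uppercase_tags = False
        cfg.omit_optional_end_tags = False
    return cfg
```

<p>Running the adjustment once, after the command line and all config files have been read, means the result does not depend on the order in which the options were given.</p>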
<p>Xavier wonders whether name attributes should be replaced or
supplemented by id attributes when translating HTML anchors to
XHTML. This is something I am thinking about for a future release
along with supplementing lang attributes by xml:lang
attributes.</p>
<h4>Zdenek Kabelac:</h4>
<p>Asks for headings and paragraphs to be treated specially when
other tags are indented. I have dealt with this via the new
smart-indent mechanism.</p>
<h2>22nd February 1999</h2>
<p>Tidy can now fix up XML empty tags for which the attribute
values are unquoted, e.g. &lt;br clear=all/&gt;. Care is taken to
avoid this being applied to tags with URLs, e.g. &lt;a
href=http://acme.com/&gt; where the / is part of the attribute
value and doesn't signify an empty tag. Authors are advised
to always quote attribute values to avoid such problems!</p>
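<p>The note does not say how Tidy recognizes attributes whose values are URLs. The sketch below assumes a fixed list of such attributes and treats a trailing '/' on one of them as part of the value rather than as the empty-tag marker; is_empty_tag and URL_ATTRIBUTES are hypothetical.</p>

```python
import re

# Assumed list of URL-valued attributes, for illustration only.
URL_ATTRIBUTES = {"href", "src", "action", "background", "cite",
                  "longdesc", "usemap", "codebase"}

# Last attribute of a tag whose unquoted value runs right up to '/>'.
LAST_UNQUOTED = re.compile(r"([A-Za-z][\w.-]*)=([^\s\"'>]+)/>$")


def is_empty_tag(tag):
    """Decide whether a tag ending in '/>' is an XML empty tag."""
    if not tag.endswith("/>"):
        return False
    m = LAST_UNQUOTED.search(tag)
    if m and m.group(1).lower() in URL_ATTRIBUTES:
        return False  # the '/' belongs to the URL, as in href=http://acme.com/
    return True
```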
<h2>22nd January 1999</h2>
<p>Tidy no longer complains about a missing &lt;/tr&gt; before a
&lt;tbody&gt;. Added link to a free <a href=
"http://www.chami.com/free/html-kit/">win32 GUI for tidy</a>.</p>
<h2>11th January 1999</h2>
<p>Added a link to the OS/2 distribution of Tidy made available
by Kaz SHiMZ. No changes to Tidy's source code.</p>
<h2>7th January 1999</h2>
<p>Fixed bug in ParseBlock that resulted in nested table
cells.</p>
<p>Fixed clean.c to add the style property
"text-align:" rather than "align:".</p>
<p>Disabled line wrapping within HTML alt, content and value
attribute values. Wrapping will still occur when output as
XML.</p>
<h2>16th December 1998</h2>
<p>This release fixes a problem with missing quote marks in
attribute values introduced in the December 14th release. It also
fixes problems with parsing tables when the table cells include
naked list items and when unexpected end tags are encountered for
td and tr cells. Warnings are now generated for unknown entities
(those not defined by HTML 4.0). It may be worth thinking about a
new option to determine how to handle these, especially for
XML.</p>
<h2>14th December 1998</h2>
<p>Rewrote parser for elements with CDATA content to fix problems
with tags in script content.</p>
<p>New pretty printer for XML mode. I have also modified the XML
parser to recognize xml:space attributes appropriately. I have
yet to add support for CDATA marked sections though.</p>
<p>script and noscript are now allowed in inline content.</p>
<p>To make it easier to drive tidy from scripts, it now returns 2
if any errors are found, 1 if any warnings are found, otherwise
it returns 0. Note that tidy doesn't generate the cleaned-up
markup if it finds errors other than warnings.</p>
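<p>A minimal sketch of a driver script that relies on these exit codes, assuming a tidy executable on the PATH that is invoked with just a file name. The helpers tidy_file and describe_status are hypothetical.</p>

```python
import subprocess
import sys

# Exit status as described for this release of tidy.
STATUS = {0: "ok", 1: "warnings", 2: "errors"}


def describe_status(code):
    """Translate tidy's exit status into a label."""
    return STATUS.get(code, "unexpected status %d" % code)


def tidy_file(path):
    """Run tidy on one file and label the outcome.

    Assumes a `tidy` executable on PATH. Output and messages are captured
    rather than shown; with status 2 no cleaned-up markup is produced.
    """
    result = subprocess.run(["tidy", path], capture_output=True, text=True)
    return describe_status(result.returncode)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, tidy_file(name))
```

<p>Since tidy withholds the cleaned-up markup when it reports errors, a script would typically treat status 2 as a failure and keep the original file.</p>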
<p>Fixed bug causing the column to be reported incorrectly when
there are inline tags early on the same line.</p>
<p>Added -numeric option to force character entities to be
written as numeric rather than as named character entities.
Hexadecimal character entities are never generated since Netscape
4 doesn't support them.</p>
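<p>The effect of -numeric can be approximated with Python's html.entities.name2codepoint table, which maps the HTML 4 entity names to code points. This is a sketch rather than Tidy's implementation: to_numeric_entities is a hypothetical name, it leaves names it does not recognize untouched, and it only ever emits decimal references.</p>

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text):
    """Rewrite known named references as decimal character references.

    Names missing from the HTML 4 table are left as they are, and
    hexadecimal references are never generated.
    """
    def replace(match):
        code = name2codepoint.get(match.group(1))
        return "&#%d;" % code if code is not None else match.group(0)

    return NAMED_REF.sub(replace, text)
```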
<p>Entities which aren't part of HTML 4.0 are now passed
through unchanged, e.g. &amp;precompiler-entity;. This means that
an isolated &amp; will be passed through unchanged since there is
no way to distinguish this from an unknown entity.</p>
<p>Tidy now detects malformed comments, where something other
than whitespace or '--' is found when '>' is
expected at the end of a comment.</p>
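<p>In SGML a comment declaration may hold several comments, each delimited by '--', so after a closing '--' only whitespace or another opening '--' may come before the final '&gt;'. A minimal sketch of the check, where comment_decl_ok is hypothetical and takes the text that follows the leading '&lt;!'.</p>

```python
def comment_decl_ok(rest):
    """Check a comment declaration, given the text after its leading '<!'.

    Between a closing '--' and the final '>' only whitespace or the '--'
    that opens another comment may appear. Hypothetical helper.
    """
    if not rest.startswith("--"):
        return False
    n = len(rest)
    i = 0  # index of the '--' that opens the current comment
    while True:
        close = rest.find("--", i + 2)
        if close == -1:
            return False  # comment never closed
        i = close + 2
        while i < n and rest[i].isspace():
            i += 1  # whitespace is allowed between comments
        if i < n and rest[i] == ">":
            return i == n - 1  # '>' must end the declaration
        if rest.startswith("--", i):
            continue  # another comment starts here
        return False  # e.g. '-- a -- b -->': 'b' found where '>' expected
```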
<p>The &lt;br&gt; tags are now positioned at the start of a blank
line to make their presence easier to spot.</p>
<p>The -asxml mode now inserts the appropriate Voyager html
namespace on the html element and strips the doctype. The html
namespace will be usable for rigorous validation as soon as W3C
finishes work on formalizing the definition of document profiles,
see: <a href="http://www.w3.org/TR/WD-html-in-xml/">
WD-html-in-xml</a>.</p>
<h2>13th November 1998 and earlier releases</h2>
<p>Fixed bug wherein &lt;style type=text/css&gt; was written
out as &lt;style type="text/ss"&gt;.</p>
<p>Tidy now handles wrapping of attributes containing JavaScript
text strings, inserting the line continuation marker as needed,
for instance:</p>
<pre>
onmouseover="window.status='Mission Statement, \
Our goals and why they matter.'; return true"
</pre>
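<p>The wrapping rule can be pictured as a greedy word wrap that tracks whether the break point falls inside a quoted string literal; if it does, the line keeps its trailing space and ends with a backslash, which JavaScript treats as a line continuation inside a string. The helper wrap_js is hypothetical and simplified: it ignores escaped quotes and may overrun the margin by the two characters it appends.</p>

```python
def wrap_js(value, width):
    """Greedy word wrap for JavaScript attribute text.

    A break inside a quoted string literal ends the line with a space
    and a backslash continuation. Escaped quotes are ignored, and the
    appended characters may overrun width. Hypothetical helper.
    """
    lines = []
    line = ""
    quote = None  # delimiter of the string literal we are inside, if any
    for word in value.split(" "):
        candidate = line + " " + word if line else word
        if line and len(candidate) > width:
            # Inside a string the space must survive, so keep it before '\'.
            lines.append(line + " \\" if quote else line)
            line = word
        else:
            line = candidate
        # Update the string-literal state with the characters of this word.
        for ch in word:
            if quote is None and ch in "'\"":
                quote = ch
            elif ch == quote:
                quote = None
    lines.append(line)
    return lines
```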
<p>You can now set the wrap margin with the -wrap option.</p>
<p>When the output is XML, tidy now ensures the content starts
with &lt;?xml version="1.0"?&gt;.</p>
<p>The Document type for HTML 2.0 is now "-//IETF//DTD HTML
2.0//". In previous versions of tidy, it was incorrectly set
to "-//W3C//DTD HTML 2.0//".</p>
<p>When using the -clean option, isolated FONT elements are now
mapped to SPAN elements. Previously these FONT elements were
simply dropped.</p>
<p>NOFRAMES now works fine with the BODY element in frameset
documents.</p>
<h2>Future releases may address:</h2>
<ul>
<li>Recursion through subdirectories, so you can fix up your
entire web site at one go. This assumes I can find a way that is
portable across a wide range of platforms!</li>
<li>Support for W3C's <a href=
"http://www.w3.org/TR/REC-DOM-Level-1/">Document Object Model</a>
(DOM) level one.</li>
<li>Full validation of all attribute values.</li>
<li>Mapping Unicode bidi control characters to HTML tags.</li>
<li>Full support for parsing XML (still somewhat limited).</li>
<li>How to say which XML elements should be printed
"inline".</li>
<li>Acting on the XML encoding attribute, e.g.
&lt;?xml encoding="iso-8859-1"?&gt;</li>
<li>Improved mapping from HTML presentation attributes/elements
to CSS.</li>
</ul>
</body>
</html>