<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>Clean up your Web pages with HTML TIDY</title>
<meta name="keywords"
content="HTML, validation, error correction, pretty-printing" />
<meta name="author" content="Dave Raggett <dsr@w3.org>" />
<style type="text/css">
body {
margin-left: 10%;
margin-right: 10%;
font-family: sans-serif
}
h1 { margin-left: -8% }
h2,h3 { margin-left: -4% }
pre { color: green; font-weight: bold; font-family: monospace}
em { font-style: italic; color: rgb(0, 0, 153) }
strong { text-transform: uppercase; font-weight: bold }
.note {font-style: italic; color: rgb(192, 101, 101) }
/* hr {text-align: center; width: 60% } */
blockquote {
color: navy;
font-family: "Comic Sans MS", "Times New Roman", serif
}
blockquote.people { text-align: center; }
p.splash { color: maroon}
div h4 {margin-left: 3%}
div p {margin-left: 5%}
table {
font-family: sans-serif;
font-size: 80%;
background: rgb(255,255,153)
}
td {
font-size: 80%
}
.people {font-family: "Lucida Calligraphy", serif}
:link { color: rgb(0, 0, 153) }
:visited { color: rgb(153, 0, 153) }
:active { color: rgb(255, 0, 102) }
a:hover { color: rgb(0, 0, 255) }
</style>
<style type="text/css">
blockquote.c9 {font-style: italic}
span.c8 {color: maroon}
p.c7 {font-style: italic}
a.c6 {font-weight: bold}
div.c5 {text-align: center}
hr.c4 {text-align: center}
p.c3 {text-align: center}
p.c2 {font-weight: bold; text-align: center}
h1.c1 {text-align: center}
</style>
<style type="text/css">
p.c1 {font-weight: bold}
</style>
</head>
<body bgcolor="#FFFFFF" background="grid.gif" text="black"
link="navy" vlink="black" alink="red">
<h1 class="c1"><img src="tidy.gif" width="32" height="32"
align="top" alt="icon" /> Clean up your Web pages<br />
with HTML TIDY</h1>
<p class="c2">This version 4th August 2000</p>
<p class="c3"><small>Copyright © 1998-2000 <a
href="http://www.w3.org/">W3C</a>, see <a
href="tidy.c">tidy.c</a> for copyright notice.</small></p>
<blockquote>With many thanks to <a
href="http://www.hp.com/">Hewlett Packard</a> for financial
support during the development of this software!</blockquote>
<hr width="80%" class="c4" />
<p class="c3"><a href="#help">How to use Tidy</a> | <a
href="#download">Downloading Tidy</a> | <a
href="release-notes.html">Release Notes</a><br />
<a href="#quotes">Integration with other Software</a> | <a
href="#acks">Acknowledgements</a></p>
<hr width="80%" class="c4" />
<p>To get the latest version of Tidy please visit the original
version of this page at: <a
href="http://www.w3.org/People/Raggett/tidy/">http://www.w3.org/People/Raggett/tidy/</a>.
Courtesy of Netmind, you can register for email reminders when
new versions of tidy become available.</p>
<form method="get"
action="http://www.netmind.com/cgi-bin/uncgi/url-mind">
<div class="c5"><input type="submit"
value="Press Here to Register" /></div>
</form>
<p>The public email list devoted to HTML Tidy is: <<a
href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>>. To
subscribe send an email to html-tidy-request@w3.org with the word
subscribe in the subject line (include the word unsubscribe if
you want to unsubscribe). The <a
href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
for this list is accessible online. Please use this list to
report errors or enhancement requests. See the <a
href="release-notes.html" class="c6">release notes</a> for
information on recent changes. Your feedback is welcome!</p>
<p>If you find HTML Tidy useful and you would like to say thanks,
then please send me a (paper) postcard or other souvenir from the
area in which you live along with a few words on what you are
using Tidy for. It will be fun to map out where Tidy users are to
be found! My <a href="#address">postal address</a> is given at
the end of this file.</p>
<h3>Tutorials for HTML and CSS</h3>
<p>If you are just starting off and would like to know more about
how to author Web pages, you may find my <a
href="http://www.w3.org/MarkUp/Guide/">guide to HTML and CSS</a>
helpful. Please send me feedback on this, and I will do my best
to further improve it.</p>
<h4>Support for Word2000</h4>
<p>Tidy can now perform wonders on HTML saved from Microsoft Word
2000! Word bulks out HTML files with stuff for round-tripping
presentation between HTML and Word. If you are more concerned
about using HTML on the Web, check out Tidy's "<a
href="#word2000">Word-2000</a>" config option! Of course Tidy
does a good job on Word 97 files as well!</p>
<h3>Introduction to TIDY</h3>
<p>When editing HTML it's easy to make mistakes. Wouldn't it be
nice if there were a simple way to fix these mistakes
automatically and tidy up sloppy editing into nicely laid out
markup? Well now there is! Dave Raggett's HTML TIDY is a free
utility for doing just that. It also works great on the
atrociously hard to read markup generated by specialized HTML
editors and conversion tools, and can help you identify where you
need to pay further attention to making your pages more
accessible to people with disabilities.</p>
<p>Tidy is able to fix up a wide range of problems and to bring
to your attention things that you need to work on yourself. Each
item found is listed with the line number and column so that you
can see where the problem lies in your markup. Tidy won't
generate a cleaned up version when there are problems that it
can't be sure of how to handle. These are logged as "errors"
rather than "warnings".</p>
<p class="c7">Tidy features in a <a
href="http://webreview.com/wr/pub/1999/07/16/feature/index.html">recent
article on XHTML</a> by webreview.com.</p>
<!-- is the final "index.html" needed or appropriate? -->
<h3>Examples of TIDY at work</h3>
<p>Tidy corrects the markup in a way that matches where possible
the observed rendering in popular browsers from Netscape and
Microsoft. Here are just a few examples of how TIDY perfects your
HTML for you:</p>
<ul>
<li><b>Missing or mismatched end tags are detected and
corrected</b>
<pre>
<h1>heading
<h2>subheading</h3>
</pre>
<p>is mapped to</p>
<pre>
<h1>heading</h1>
<h2>subheading</h2>
</pre>
</li>
<li><b>End tags in the wrong order are corrected:</b>
<pre>
<p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?
</pre>
<p>is mapped to</p>
<pre>
<p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?
</pre>
</li>
<li><b>Fixes problems with heading emphasis</b>
<pre>
<h1><i>italic heading</h1>
<p>new paragraph
</pre>
<p>In Netscape and Internet Explorer this causes everything
following the heading to be in the heading font size, not the
desired effect at all!</p>
<p>Tidy maps the example to</p>
<pre>
<h1><i>italic heading</i></h1>
<p>new paragraph
</pre>
</li>
<li><b>Recovers from mixed up tags</b>
<pre>
<i><h1>heading</h1></i>
<p>new paragraph <b>bold text
<p>some more bold text
</pre>
<p>Tidy maps this to</p>
<pre>
<h1><i>heading</i></h1>
<p>new paragraph <b>bold text</b>
<p><b>some more bold text</b>
</pre>
</li>
<li><b>Getting the <hr> in the right place:</b>
<pre>
<h1><hr>heading</h1>
<h2>sub<hr>heading</h2>
</pre>
<p>Tidy maps this to</p>
<pre>
<hr>
<h1>heading</h1>
<h2>sub</h2>
<hr>
<h2>heading</h2>
</pre>
</li>
<li><b>Adding the missing "/" in end tags for anchors:</b>
<pre>
<a href="#refs">References<a>
</pre>
<p>Tidy maps this to</p>
<pre>
<a href="#refs">References</a>
</pre>
</li>
<li><b>Perfecting lists by putting in tags missed out:</b>
<pre>
<body>
<li>1st list item
<li>2nd list item
</pre>
<p>is mapped to</p>
<pre>
<body>
<ul>
<li>1st list item</li>
<li>2nd list item</li>
</ul>
</pre>
</li>
<li><b>Missing quotes around attribute values are added</b>
<p>Tidy inserts quote marks around all attribute values for you.
It can also detect when you have forgotten the closing quote
mark, although this is something you will have to fix
yourself.</p>
</li>
<li><b>Unknown/Proprietary attributes are reported</b>
<p>Tidy has a comprehensive knowledge of the attributes defined
in the HTML 4.0 recommendation from W3C. This often allows you to
spot where you have mistyped an attribute or value.</p>
</li>
<li><b>Proprietary elements are recognized and reported as
such.</b>
<p>Tidy will even work out which version of HTML you are using
and insert the appropriate DOCTYPE declaration, as per the W3C
recommendations.</p>
</li>
<li><b>Tags lacking a terminating '>' are spotted</b>
<p>This is something you then have to fix yourself as Tidy is
unsure of where the > should be inserted.</p>
</li>
</ul>
<h3>Layout style</h3>
<p>You can choose which style you want Tidy to use when it
generates the cleaned up markup: for instance whether you like
elements to indent their contents or not. Several people have
asked if Tidy could preserve the original layout. I am sorry to
say that this would be very hard to support due to the way Tidy
is implemented. Tidy starts by building a clean parse tree from
the source file. The parse tree doesn't contain any information
about the original layout. Tidy then pretty prints the parse tree
using the current layout options. Trying to preserve the original
layout would interact badly with the repair operations needed to
build a clean parse tree and considerably complicate the
code.</p>
<p>Some browsers can screw up the right alignment of text
depending on how you lay out headings. As an example,
consider:</p>
<pre>
<h1 align="right">
Heading
</h1>
<h1 align="right">Heading</h1>
</pre>
<p>Both of these should be rendered the same. Sadly a common
browser bug fails to trim trailing whitespace and misaligns the
first heading. HTML Tidy will protect you from this bug, except
when you set the indent option to "yes".</p>
<p>Setting the indent option to yes can also cause problems with
table layout for some browsers:</p>
<pre>
<td><img src="foo.gif"></td>
<td><img src="foo.gif"></td>
</pre>
<p>will look slightly different from:</p>
<pre>
<td>
<img src="foo.gif">
</td>
<td>
<img src="foo.gif">
</td>
</pre>
<p>You can avoid such quirks by using indent: no or
indent: auto in the config file.</p>
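<p>As a minimal sketch, the corresponding lines of a config file
might look like this. Both option names are described in the
configuration section of this page; the value given for
indent-spaces is simply the documented default, shown for
illustration.</p>
<pre>
indent: auto
indent-spaces: 2
</pre>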
<h3>Internationalization issues</h3>
<p>Tidy offers you a choice of character encodings: US ASCII, ISO
Latin-1, UTF-8 and the ISO 2022 family of 7 bit encodings. The
full set of HTML 4.0 entities is defined. Cleaned up output uses
HTML entity names for characters when appropriate. Otherwise
characters outside the normal range are output as numeric
character entities. Tidy defaults to assuming you want the output
to be in US ASCII. Tidy doesn't yet recognize the use of the HTML
meta element for specifying the character encoding.</p>
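<p>As a sketch, assuming your pages are stored in ISO Latin-1 and
you would rather see numeric entities than named ones, the
relevant lines of a config file could read as follows. Both
option names are described in the configuration section of this
page; the chosen values are only an illustration, not a
recommendation.</p>
<pre>
char-encoding: latin1
numeric-entities: yes
</pre>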
<h3>Accessibility</h3>
<p>Tidy offers advice on accessibility problems for people using
non-graphical browsers. The most common thing you will see is the
suggestion you add a summary attribute to table elements. The
idea is to provide a summary of the table's role and structure
suitable for use with aural browsers.</p>
<h3>Cleaning up presentational markup</h3>
<p>Many tools generate HTML with an excess of FONT, NOBR and
CENTER tags. Tidy's <em>-clean</em> option will replace them by
style properties and rules using CSS. This makes the markup
easier to read and maintain as well as reducing the file size!
Tidy is expected to get smarter at this in the future.</p>
<p>Some pages rely on the presentation effects of isolated
<p> or </p> tags. Tidy deletes empty paragraph and
heading elements etc. The use of empty paragraph elements is not
recommended for adding vertical whitespace. Instead use style
sheets, or the <br> element. Tidy won't discard paragraphs
only containing a nonbreaking space &nbsp;.</p>
<h3>Teaching Tidy about new tags!</h3>
<p>You can teach Tidy about new tags by declaring them in the
configuration file. The syntax is:</p>
<pre>
new-inline-tags: <em>tag1, tag2, tag3</em>
new-empty-tags: <em>tag1, tag2, tag3</em>
new-blocklevel-tags: <em>tag1, tag2, tag3</em>
new-pre-tags: <em>tag1, tag2, tag3</em>
</pre>
<p>These declarations can be combined: the same tag can be defined
as empty and as inline, or as empty and as block. You are advised
not to declare a tag as being both inline and block!</p>
<p>Note that the new tags can only appear where Tidy expects
inline or block-level tags respectively. This means you can't
(yet) place new tags within the document head or other contexts
with restricted content models. So far the most popular use of
this feature is to allow Tidy to be applied to Cold Fusion
files.</p>
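<p>A sketch of what such declarations might look like for Cold
Fusion pages is given below. The tag names are Cold Fusion tags
picked purely for illustration, and the choice of inline, block
or empty for each one is an assumption you should check against
the way the tags are used in your own files.</p>
<pre>
new-inline-tags: cfif, cfelse
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
</pre>
<p>Here cfelse is declared as both empty and inline, which is one
of the combinations described above.</p>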
<p class="c7">I am working on ways to make it easy to customize
the permitted document syntax using <a
href="http://www.w3.org/People/Raggett/dtdgen/Docs/">assertion
grammars</a>, and hope to apply this to a much smarter version of
Tidy for release later this year or early next year.</p>
<h3>Limited support for ASP, JSTE and PHP</h3>
<p>Tidy is somewhat aware of the preprocessing language called
ASP which uses a pseudo element syntax <% ... %>
to include preprocessor directives. ASP is normally interpreted
by the web server before delivery to the browser. JSTE shares the
same syntax, but sometimes also uses <# ... #>.
Tidy can also cope with another such language called PHP, which
uses the syntax <?php ... ?>.</p>
<p>Tidy will cope with ASP, JSTE and PHP pseudo elements within
element content and as replacements for attributes, for
example:</p>
<pre>
<option <% if rsSchool.Fields("ID").Value
= session("sessSchoolID")
then Response.Write("selected") %>
value='<%=rsSchool.Fields("ID").Value%>'>
<%=rsSchool.Fields("Name").Value%>
(<%=rsSchool.Fields("ID").Value%>)
</option>
</pre>
<p>Note that Tidy doesn't understand the scripting language used
within pseudo elements and attributes, and can easily get
confused. Tidy may report missing attributes when these are
hidden within preprocessor code. Tidy can also get things wrong
if the code includes quote marks, e.g. if the example above is
changed to:</p>
<pre>
value="<%=rsSchool.Fields("ID").Value%>"
</pre>
<p>Tidy will now see the quote mark preceding ID as ending the
attribute value, and proceed to complain about what follows. Note
you can choose whether to allow line wrapping on spaces within
pseudo elements or not using the <tt>wrap-asp</tt> option. If you
used ASP, JSTE or PHP to create a start tag, but placed the end
tag explicitly in the markup, Tidy won't be able to match them
up, and will delete the end tag for you. So in this case you are
advised to make the start tag explicit and to use ASP, JSTE or PHP
for just the attributes, e.g.</p>
<pre>
<a href="<%=random.site()%>">do you feel lucky?</a>
</pre>
<p>Tidy allows you to control whether line wrapping is enabled
for ASP, JSTE and PHP instructions, see the wrap-asp, wrap-jste
and wrap-php config options, respectively.</p>
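<p>For instance, if you would prefer Tidy to leave the line
breaks inside all three kinds of pseudo element alone, a config
file along these lines should do it. The option names are the
ones listed in the configuration section of this page; this is a
sketch rather than a recommended setup.</p>
<pre>
wrap-asp: no
wrap-jste: no
wrap-php: no
</pre>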
<p>I regret that Tidy does <b>not</b> support Tango preprocessing
instructions which look like:</p>
<pre>
<@if variable_1='a'>
do something
<@else>
do nothing
</@if>
<@include <@cgi><@appfilepath>includes/message.html>
</pre>
<p>Tidy does accept Tango syntax, but only within attribute
values. Adding support for pseudo elements written in Tango looks
as if it would be quite tough, so I would like to gauge the level
of interest before committing to this work.</p>
<h3>Limited support for XML</h3>
<p>XML processors compliant with W3C's XML 1.0 recommendation are
very picky about which files they will accept. Tidy can help you
to fix errors that cause your XML files to be rejected. Tidy
doesn't yet recognize all XML features though, e.g. it doesn't
understand CDATA sections or DTD subsets.</p>
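<p>As a sketch, assuming the input is meant to be well-formed XML
rather than HTML, the XML-related options described in the
configuration section of this page could be set as follows.
Setting output-xml alongside input-xml may turn out to be
redundant; it is shown here only to make the intent explicit.</p>
<pre>
input-xml: yes
output-xml: yes
add-xml-decl: yes
</pre>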
<h3>Creating Slides</h3>
<p>The <em>-slides</em> option allows you to burst a single HTML
file into a number of linked slides. Each H2 element in the input
file is treated as delimiting the start of the next slide. The
slides are named slide1.html, slide2.html, slide3.html etc. This
is a relatively new feature and ideas are welcomed as to how to
improve it. In particular, I plan to add support to the
configuration file for setting the style sheet for slides and for
customizing the slides via a template.</p>
<p>I would be interested in hearing from anyone who can offer
help with using JavaScript for adding dynamic effects to slides,
for instance similar to those available in Microsoft
PowerPoint.</p>
<h3>Indenting text for a better layout</h3>
<p>Indenting the content of elements makes the markup easier to
read. Tidy can do this for all elements or just for those where
it's needed. The auto-indent mode has been used below to avoid
indenting the content of title, p and li elements:</p>
<pre>
<html>
<head>
<title>Test document</title>
</head>
<body>
<p>para which has enough text to cause a line break,
and so test the wrapping mechanism for long lines.</p>
<pre>
This is
<em>genuine
preformatted</em>
text
</pre>
<ul>
<li>1st list item</li>
<li>2nd list item</li>
</ul>
<!-- end comment -->
</body>
</html>
</pre>
<p>Indenting the content does increase the size of the file, so
you may prefer Tidy's default style:</p>
<pre>
<html>
<head>
<title>Test document</title>
</head>
<body>
<p>para which has enough text to cause a line break,
and so test the wrapping mechanism for long lines.</p>
<pre>This is
<em>genuine
preformatted</em>
text
</pre>
<ul>
<li>1st list item </li>
<li>2nd list item</li>
</ul>
<!-- end comment -->
</body>
</html>
</pre>
<h3><a id="help" name="help">How to run tidy</a></h3>
<pre>
<span class="c8">tidy</span> <em>[[options] filename]*</em>
</pre>
<p>HTML Tidy is not (yet) a Windows program. If you run tidy
without any arguments, it will just sit there waiting to read
markup on the stdin stream. Tidy's input and output default to
stdin and stdout respectively. Errors are written to stderr but
can be redirected to a file with the -f <em>filename</em>
option.</p>
<p>I generally use the -m option to get tidy to update the
original file, and if the file is particularly bad I also use the
-f option to write the errors to a file to make it easier to
review them. Tidy supports a small set of character encoding
options. The default is ASCII, which makes it easy to edit markup
in regular text editors.</p>
<p>For instance:</p>
<pre>
tidy -f errs.txt -m index.html
</pre>
<p>which runs tidy on the file "index.html" updating it in place
and writing the error messages to the file "errs.txt". It's a good
idea to save your work before tidying it since, as with all
complex software, tidy may have bugs. If you find any please let
me know!</p>
<p>Thanks to Jacek Niedziela, the Win32 executable for tidy is
now able to expand wild cards in filenames. This utilizes the
setargv library supplied with VC++.</p>
<p>Tidy writes errors to stderr, so they won't be paused by the
more command. A workaround is to redirect stderr to stdout as
follows. This works on Unix and Windows NT, but not on other
platforms. My thanks to Markus Wolf for this tip!</p>
<pre>
tidy file.html 2>&1 | more
</pre>
<h4>Tidy's Options</h4>
<p>To get a list of available options use:</p>
<pre>
tidy -help
</pre>
<p>You may want to run it through more to view the help a page at
a time.</p>
<pre>
tidy -help | more
</pre>
<p>Input and Output default to stdin/stdout respectively. Single
letter options apart from -f may be combined as in: tidy -f
errs.txt -imu foo.html</p>
<p>Matej Vela <<a
href="mailto:vela@debian.org">vela@debian.org</a>> has written
a <a href="man_page.txt">Unix man page for Tidy</a>, but for the
latest details on config options and for the release notes please
visit this page: <a
href="http://www.w3.org/People/Raggett/tidy">http://www.w3.org/People/Raggett/tidy</a>.</p>
<h3><a id="config" name="config">Using a Configuration
File</a></h3>
<p>Tidy now supports a configuration file, and this is now by far
the most convenient way to configure Tidy. Assuming you have
created a config file named "config.txt" (the name doesn't
matter), you can instruct Tidy to use it via the command line
option <tt>-config config.txt</tt>, e.g.</p>
<pre>
tidy -config config.txt file1.html file2.html
</pre>
<p>Alternatively, you can name the default config file via the
environment variable named "HTML_TIDY". Note this should be the
absolute path since you are likely to want to run Tidy in
different directories. You can also set a config file at compile
time by defining TIDY_CONFIG_FILE as the path string, see
platform.h.</p>
<p>You can now set config options on the command line by
preceding the name of the option immediately (no intervening
space) by "--", for example:</p>
<pre>
tidy --break-before-br true --show-warnings false
</pre>
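<p>The config file itself is plain text. Judging from the
examples on this page, each line gives an option name, a colon
and a value. A minimal sketch of a "config.txt" is shown below;
the particular values are illustrative assumptions, and every
option used is described in the list that follows.</p>
<pre>
tidy-mark: no
markup: yes
wrap: 72
doctype: strict
hide-endtags: no
</pre>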
<p>The following options are supported:</p>
<dl>
<dt>tidy-mark: <em>bool</em></dt>
<dd>If set to <em>yes</em> (the default) Tidy will add a meta
element to the document head to indicate that the document has
been tidied. To suppress this, set tidy-mark to <em>no</em>. Tidy
won't add a meta element if one is already present.</dd>
<dt>markup: <em>bool</em></dt>
<dd>Determines whether Tidy generates a pretty printed version of
the markup. Bool values are either <em>yes</em> or <em>no</em>.
Note that Tidy won't generate a pretty printed version if it
finds unknown tags, or missing trailing quotes on attribute
values, or missing trailing '>' on tags. The default is
<em>yes</em>.</dd>
<dt>wrap: <em>number</em></dt>
<dd>Sets the right margin for line wrapping. Tidy tries to wrap
lines so that they do not exceed this length. The default is 66.
Set wrap to zero if you want to disable line wrapping.</dd>
<dt>wrap-attributes: <em>bool</em></dt>
<dd>If set to <em>yes</em>, attribute values may be wrapped
across lines for easier editing. The default is no. This option
can be set independently of wrap-script-literals.</dd>
<dt>wrap-script-literals: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this allows lines to be wrapped
within string literals that appear in script attributes. The
default is <em>no</em>. The example shows how Tidy wraps a really
really long script string literal inserting a backslash character
before the linebreak:
<pre>
<a href="somewhere.html" onmouseover="document.status = '...some \
really, really, really, really, really, really, really, really, \
really, really long string..';">test</a>
</pre>
</dd>
<dt>wrap-asp: <em>bool</em></dt>
<dd>If set to <em>no</em>, this prevents lines from being wrapped
within ASP pseudo elements, which look like:
<% ... %>. The default is <em>yes</em>.</dd>
<dt>wrap-jste: <em>bool</em></dt>
<dd>If set to <em>no</em>, this prevents lines from being wrapped
within JSTE pseudo elements, which look like:
<# ... #>. The default is <em>yes</em>.</dd>
<dt>wrap-php: <em>bool</em></dt>
<dd>If set to <em>no</em>, this prevents lines from being wrapped
within PHP pseudo elements. The default is <em>yes</em>.</dd>
<dt>literal-attributes: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this ensures that whitespace
characters within attribute values are passed through unchanged.
The default is <em>no</em>.</dd>
<dt>tab-size: <em>number</em></dt>
<dd>Sets the number of columns between successive tab stops. The
default is 4. It is used to map tabs to spaces when reading
files. Tidy never outputs files with tabs.</dd>
<dt>indent: <em>no, yes</em> or <em>auto</em></dt>
<dd>If set to <em>yes</em>, Tidy will indent block-level tags.
The default is <em>no</em>. If set to <em>auto</em> Tidy will
decide whether or not to indent the content of tags such as
title, h1-h6, li, td, th, or p depending on whether or not the
content includes a block-level element. You are advised to avoid
setting indent to yes as this can expose layout bugs in some
browsers.</dd>
<dt>indent-spaces: <em>number</em></dt>
<dd>Sets the number of spaces to indent content when indentation
is enabled. The default is 2 spaces.</dd>
<dt>indent-attributes: <em>bool</em></dt>
<dd>If set to <em>yes</em>, each attribute will begin on a new
line. The default is <em>no</em>.</dd>
<dt>hide-endtags: <em>bool</em></dt>
<dd>If set to <em>yes</em>, optional end-tags will be omitted
when generating the pretty printed markup. This option is ignored
if you are outputting to XML. The default is <em>no</em>.</dd>
<dt>input-xml: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy will use the XML parser rather
than the error correcting HTML parser. The default is
<em>no</em>.</dd>
<dt>output-xml: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy will generate the pretty
printed output writing it as well-formed XML. Any entities not
defined in XML 1.0 will be written as numeric entities to allow
them to be parsed by an XML parser. The tags and attributes will
be in the case used in the input document, regardless of other
options. The default is <em>no</em>.</dd>
<dt>add-xml-pi: <em>bool</em></dt>
<dt>add-xml-decl: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy will add the XML declaration
when outputting XML or XHTML. The default is <em>no</em>. Note
that if the input document includes an <?xml?> declaration
then it will appear in the output independent of the value of
this option.</dd>
<dt>output-xhtml: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy will generate the pretty printed
output writing it as extensible HTML. The default is <em>no</em>.
This option causes Tidy to set the doctype and default namespace
as appropriate to XHTML. If a doctype or namespace is given they
will be checked for consistency with the content of the document. In
the case of an inconsistency, the corrected values will appear in
the output. For XHTML, entities can be written as named or
numeric entities according to the value of the "numeric-entities"
property. The tags and attributes will be output in the case used
in the input document, regardless of other options.</dd>
<dt>doctype: <em>omit, auto, strict, loose</em> or
<<em>fpi</em>></dt>
<dd>This property controls the doctype declaration generated by
Tidy. If set to <em>omit</em> the output file won't contain a
doctype declaration. If set to <em>auto</em> (the default) Tidy
will use an educated guess based upon the contents of the
document. If set to <em>strict</em>, Tidy will set the doctype to
the strict DTD. If set to <em>loose</em>, the doctype is set to
the loose (transitional) DTD. Alternatively, you can supply a
string for the formal public identifier (fpi), for example:</dd>
<dd>
<pre>
doctype: "-//ACME//DTD HTML 3.14159//EN"
</pre>
</dd>
<dd>If you specify the fpi for an XHTML document, Tidy will set
the system identifier to the empty string. Tidy leaves the
document type for generic XML documents unchanged.</dd>
<dt>char-encoding: <em>raw, ascii, latin1, utf8</em> or
<em>iso2022</em></dt>
<dd>Determines how Tidy interprets character streams. For
<em>ascii</em>, Tidy will accept Latin-1 character values, but
will use entities for all characters whose value > 127. For
<em>raw</em>, Tidy will output values above 127 without
translating them into entities. For <em>latin1</em>, characters
above 255 will be written as entities. For <em>utf8</em>, Tidy
assumes that both input and output are encoded as UTF-8. You can
use <em>iso2022</em> for files encoded using the ISO2022 family
of encodings e.g. ISO 2022-JP. The default is
<em>ascii</em>.</dd>
<dt>numeric-entities: <em>bool</em></dt>
<dd>Causes entities other than the basic XML 1.0 named entities
to be written in the numeric rather than the named entity form.
The default is <em>no</em>.</dd>
<dt>quote-marks: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes " characters to be
written out as &amp;quot; as is preferred by some editing
environments. The apostrophe character ' is written out as
&amp;#39; since many web browsers don't yet support &amp;apos;.
The default is <em>no</em>.</dd>
<dt>quote-nbsp: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes non-breaking space
characters to be written out as entities, rather than as the
Unicode character value 160 (decimal). The default is
<em>yes</em>.</dd>
<dt>quote-ampersand: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes unadorned &amp;
characters to be written out as &amp;amp;. The default is
<em>yes</em>.</dd>
<dt>assume-xml-procins: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this changes the parsing of
processing instructions to require ?> as the terminator rather
than >. The default is <em>no</em>. This option is
automatically set if the input is in XML.</dd>
<dt>fix-backslash: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes backslash characters "\"
in URLs to be replaced by forward slashes "/". The default is
<em>yes</em>.</dd>
<dt>break-before-br: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy will output a line break before
each &lt;br&gt; element. The default is <em>no</em>.</dd>
<dt>uppercase-tags: <em>bool</em></dt>
<dd>Causes tag names to be output in upper case. The default is
<em>no</em> resulting in lowercase, except for XML input where
the original case is preserved.</dd>
<dt>uppercase-attributes: <em>bool</em></dt>
<dd>If set to <em>yes</em>, attribute names are output in upper
case. The default is <em>no</em>, resulting in lowercase, except
for XML where the original case is preserved.</dd>
<dt><a id="word2000" name="word2000">word-2000:
<em>bool</em></a></dt>
<dd>If set to <em>yes</em>, Tidy will go to great pains to strip
out all the surplus stuff Microsoft Word 2000 inserts when you
save Word documents as "Web pages". The default is <em>no</em>.
Note that Tidy doesn't yet know what to do with VML markup from
Word, but in future I hope to be able to map VML to SVG.<br />
<br />
Microsoft has developed its own optional filter for exporting to
HTML, and the 2.0 version is much improved. You can download the
filter free from the <a
href="http://officeupdate.microsoft.com/2000/downloadDetails/Msohtmf2.htm">
Microsoft Office Update site</a>.</dd>
<dt>clean: <em>bool</em></dt>
<dd>If set to <em>yes</em>, causes Tidy to strip out surplus
presentational tags and attributes replacing them by style rules
and structural markup as appropriate. It works well on the HTML
saved from Microsoft Office'97. The default is <em>no</em>.</dd>
<dt>logical-emphasis: <em>bool</em></dt>
<dd>If set to <em>yes</em>, causes Tidy to replace any occurrence
of i by em and any occurrence of b by strong. In both cases, the
attributes are preserved unchanged. The default is <em>no</em>.
This option can now be set independently of the clean and
drop-font-tags options.</dd>
<dt>drop-empty-paras: <em>bool</em></dt>
<dd>If set to <em>yes</em>, empty paragraphs will be discarded.
If set to <em>no</em>, empty paragraphs are replaced by a pair of
<code>br</code> elements as HTML4 precludes empty paragraphs. The
default is <em>yes</em>.</dd>
<dt>drop-font-tags: <em>bool</em></dt>
<dd>If set to <em>yes</em> together with the clean option (see
above), Tidy will discard font and center tags rather than
creating the corresponding style rules. The default is
<em>no</em>.</dd>
<dt>enclose-text: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes Tidy to enclose any text
it finds in the body element within a p element. This is useful
when you want to take an existing html file and use it with a
style sheet. Any text at the body level will screw up the
margins, but wrap the text within a p element and all is well!
The default is <em>no</em>.</dd>
<dt>enclose-block-text: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes Tidy to insert a p
element to enclose any text it finds in any element that allows
mixed content for HTML transitional but not HTML strict. The
default is <em>no</em>.</dd>
<dt>fix-bad-comments: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes Tidy to replace
unexpected hyphens with "=" characters when it comes across
adjacent hyphens. The default is <em>yes</em>. This option is
provided for users of Cold Fusion which uses the comment syntax:
&lt;!--- ---&gt;</dd>
<dt>add-xml-space: <em>bool</em></dt>
<dd>If set to <em>yes</em>, this causes Tidy to add
xml:space="preserve" to elements such as pre, style and script
when generating XML. This is needed if the whitespace in such
elements is to be parsed appropriately without having access to
the DTD. The default is <em>no</em>.</dd>
<dt>alt-text: <em>string</em></dt>
<dd>This allows you to set the default alt text for img
elements. This feature is dangerous as it suppresses further
accessibility warnings. <b>YOU ARE RESPONSIBLE FOR MAKING YOUR
DOCUMENTS ACCESSIBLE TO PEOPLE WHO CAN'T SEE THE
IMAGES!!!</b></dd>
<dt>write-back: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy will write back the tidied
markup to the same file it read from. The default is <em>no</em>.
You are advised to keep copies of important files before tidying
them, as on rare occasions the result may not be what you
expect.</dd>
<dt>keep-time: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy won't alter the last modified
time for files it writes back to. The default is <em>yes</em>.
This allows you to tidy files without affecting which ones will
be uploaded to the Web server when using a tool such as
'SiteCopy'. Note that this feature may not work on some
platforms.</dd>
<dt>error-file: <em>filename</em></dt>
<dd>Writes errors and warnings to the named file rather than to
stderr.</dd>
<dt>show-warnings: <em>bool</em></dt>
<dd>If set to <em>no</em>, warnings are suppressed. This can be
useful when a few errors are hidden in a flurry of warnings. The
default is <em>yes</em>.</dd>
<dt>quiet: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy won't output the welcome message
or the summary of the numbers of errors and warnings. The default
is <em>no</em>.</dd>
<dt>gnu-emacs: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy changes the format for reporting
errors and warnings to a format that is more easily parsed by GNU
Emacs. The default is <em>no</em>.</dd>
<dt>split: <em>bool</em></dt>
<dd>If set to <em>yes</em>, Tidy will use the input file to create
a sequence of slides, splitting the markup prior to each
successive &lt;h2&gt;. You can see an example of the results in a
<a
href="http://www.w3.org/Talks/1999/03/24-stockholm-xhtml/">recent
talk I made on XHTML</a>. The slides are written to
"slide1.html", "slide2.html" etc. The default is
<em>no</em>.</dd>
<dt>new-empty-tags: <em>tag1, tag2, tag3</em></dt>
<dd>Use this to declare new empty inline tags. The option takes a
space or comma separated list of tag names. Unless you declare
new tags, Tidy will refuse to generate a tidied file if the input
includes previously unknown tags. Remember to also declare empty
tags as either inline or blocklevel, see below.</dd>
<dt>new-inline-tags: <em>tag1, tag2, tag3</em></dt>
<dd>Use this to declare new non-empty inline tags. The option
takes a space or comma separated list of tag names. Unless you
declare new tags, Tidy will refuse to generate a tidied file if
the input includes previously unknown tags.</dd>
<dt>new-blocklevel-tags: <em>tag1, tag2, tag3</em></dt>
<dd>Use this to declare new block-level tags. The option takes a
space or comma separated list of tag names. Unless you declare
new tags, Tidy will refuse to generate a tidied file if the input
includes previously unknown tags. Note you can't change the
content model for elements such as table, ul, ol and dl. This is
explained in more detail in the <a
href="release-notes.html">release notes</a>.</dd>
<dt>new-pre-tags: <em>tag1, tag2, tag3</em></dt>
<dd>Use this to declare new tags that are to be processed in
exactly the same way as HTML's pre element. The option takes a
space or comma separated list of tag names. Unless you declare
new tags, Tidy will refuse to generate a tidied file if the input
includes previously unknown tags. Note you can't as yet add new
CDATA elements (similar to script).</dd>
</dl>
<h4>Sample Config File</h4>
<p>This is just an example to get you started.</p>
<pre>
// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
</pre>
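<p>As a further sketch, the shell fragment below writes a small
configuration file aimed at XHTML output, using only options described
above and the comment syntax of the sample. The function name
<em>write_tidy_config</em> and the file names in the comments are
hypothetical and chosen purely for illustration; the usage line assumes
a tidy executable on your PATH.</p>

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Tidy config file to the path given in $1.
# Every option used here is described in the option list above.
write_tidy_config() {
    cat > "$1" <<'EOF'
// sketch of a config file for XHTML output
output-xhtml: yes
add-xml-decl: yes
doctype: strict
indent: auto
indent-spaces: 2
char-encoding: utf8
EOF
}

# Assumed usage (file names are placeholders):
#   write_tidy_config tidy-xhtml.cfg
#   tidy -config tidy-xhtml.cfg page.html
```

<p>Keeping the options in a file rather than on the command line makes
it easier to reuse the same settings across many documents.</p>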
<h3><a id="scripts" name="scripts">Using Tidy from
scripts</a></h3>
<p>If you want to run Tidy from Perl or another scripting
language, you may find it of value to inspect the result returned
by Tidy when it exits: 0 if everything is fine, 1 if there were
warnings and 2 if there were errors. This is an example using
Perl:</p>
<pre>
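# TIDY is assumed to be a pipe to tidy opened earlier in the script;
# close() returns false when tidy exits with a non-zero status.
if (!close(TIDY)) {
    my $exitcode = $? >> 8;    # exit status is held in the high byte of $?
    if ($exitcode == 1) {
        print STDERR "tidy issued warning messages\n";
    } elsif ($exitcode == 2) {
        print STDERR "tidy issued error messages\n";
    } else {
        die "tidy exited with code: $exitcode\n";
    }
} else {
    print STDERR "tidy detected no errors\n";
}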
</pre>
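<p>The same check can be sketched for a POSIX shell. The helper name
<em>tidy_status_message</em> is hypothetical; the exit codes 0, 1 and 2
are the ones listed above, and the command line shown in the comments
assumes a tidy executable on your PATH and an input file called
page.html.</p>

```shell
#!/bin/sh
# Hypothetical helper: translate a Tidy exit status ($1) into a message.
# 0 = no problems, 1 = warnings, 2 = errors, as described in the text above.
tidy_status_message() {
    case "$1" in
        0) echo "tidy detected no errors" ;;
        1) echo "tidy issued warning messages" ;;
        2) echo "tidy issued error messages" ;;
        *) echo "tidy exited with code: $1" ;;
    esac
}

# Assumed usage (page.html is a placeholder file name):
#   tidy -errors page.html
#   tidy_status_message $?
```

<p>Note that $? must be read immediately after the tidy command, since
any later command overwrites it.</p>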
<h3><a id="download" name="download">Downloadable
Binaries</a></h3>
<p class="note">If you are prepared to maintain a public URL for
HTML Tidy compiled for a specific platform, please let me know so
that I can add a link to your page. This will avoid the need for
me to update this page whenever you recompile.</p>
<div class="platforms">
<h4>Windows 95/98/NT/2000</h4>
<p><b><a
href="http://www.w3.org/People/Raggett/tidy.exe">tidy.exe</a></b>.
Windows 95/98/NT/2000 executable (32-bit Windows console-mode
program). This is the executable that I maintain as part of the
HTML Tidy distribution. The command line parameters are described
above, along with the extensive configuration file options.</p>
<p><b><a
href="http://www.chami.com/free/html-kit/">HTML-Kit</a></b> - a
free HTML editor for Windows 95/98/NT/2000 with integrated
support for Tidy.</p>
<p><b><a
href="http://perso.wanadoo.fr/ablavier/TidyGUI/">TidyGUI</a></b>.
Windows front end for running Tidy, written by André
Blavier. André has also written a <b><a
href="http://perso.wanadoo.fr/ablavier/TidyCOM/">Windows COM
wrapper</a></b> for Tidy. He describes how to use this from
Visual Basic.</p>
<p><b><a href="http://www.evrsoft.com/">Evrsoft's 1st Page
2000</a></b> - a free HTML editor for Windows 95/98/NT/2000 with
integrated support for Tidy. 1st Page 2000 is a high-end
authoring tool that makes it easy to add effects based upon
scripting.</p>
<p><b><a href="http://www.notetab.com/">NoteTab</a></b> - an
award-winning text and HTML editor for Windows with built-in
support for running HTML Tidy. NoteTab is written by Eric
Fookes.</p>
<h4>Mac OS</h4>
<p>Several versions of <a
href="http://www.geocities.com/SiliconValley/1057/tidy.html">HTML
Tidy for Mac OS</a> are available, including a standalone
Macintosh application with a graphical user interface, a BBEdit
plugin, an MPW tool, and a FilterTop filter (<a
href="http://www.geocities.com/SiliconValley/1057/images/TidyHTML.GIF">
Screenshot</a>). My thanks to <a
href="mailto:teague@mailandnews.com">Terry Teague</a> for this
port.</p>
<h4>Atari</h4>
<p>Arnaud Bercegeay's site for the <a
href="http://tidy.atari.org">Atari binary for Tidy</a>.</p>
<h4>Amiga</h4>
<p>Keith Blakemore-Noble maintains a page for <a
href="http://www.amiga.u-net.com/MadDogSoftware/Tidy.html">Tidy
on Amiga</a>.</p>
<h4>BeOS</h4>
<p>Peter Enzerink is maintaining <a
href="http://www.bytepeople.com/beos/apps/htmltidy.html">HTML
Tidy</a> for BeOS. The link points to downloads for HTML Tidy as well
as HTML Tidy editor add-ons for BeOS.</p>
<h4>AIX</h4>
<p>Ciaran Deignan maintains an <a
href="http://www-frec.bull.com/cgi-bin/list_dir.cgi/download/">AIX
binary for Tidy</a>. The link is to a general download page. The
executable is available for AIX 4.3.2 and later.</p>
<h4>Linux</h4>
<p>Dimitri Papadopoulos maintains a <a
href="http://perso.club-internet.fr/dpo/rpm/">Tidy RPM package
for Red Hat Linux</a>. You may also be able to find Tidy on other
Linux distribution sites, e.g. <a
href="http://rpmfind.net/">http://rpmfind.net/</a>.</p>
<!-- no longer accessible :-(
<p><b><a href=
"http://www.astro.uni-bonn.de/~webstw/cm/w3c_tidy/index.html">
Linux users</a></b>! ochen M. Braun is maintaining Tidy binary
for Linux (ELF 32-bit LSB executable using '<tt>libc.so.5</tt>'
for Intel 80386): '<a href=
"ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy"><tt>tidy</tt></a>
'. Additionally a man page can be downloaded: <a href=
"ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy.1"><tt>
tidy.1</tt></a>.</p>
-->
<h4>UnixWare</h4>
<p>Simon Trimmer <<a
href="mailto:simon@ocston.org">simon@ocston.org</a>> maintains
a <a href="http://www.ocston.org/~simon/tidy/">Tidy binary for
Unixware</a>.</p>
<h4>HP-UX</h4>
<p>You can get precompiled versions of Tidy for HP-UX from <a
href="http://www.informatik.uni-stuttgart.de/ifi/gr/mitarbeiter/hopp/tidy/tidy.html">
Olaf Hopp</a>, and from <a
href="http://geocities.com/ian_springer/hpux_tidy.html">Ian
Springer</a>.</p>
<h4>MSDOS</h4>
<p>Nick B. maintains <a
href="http://members.xoom.com/nickbeee/tidy386/">Tidy386 for
DOS</a>. This exploits the DPMI mechanism for the memory
management.</p>
<h4>Solaris</h4>
<p>Stephen Fuqua maintains a page for <a
href="http://www.hep.utexas.edu/~sfuqua/unix">Tidy on
Solaris</a>.</p>
<h4>OS/2</h4>
<p>Kaz SHiMZ <<a
href="mailto:kshimz@sfc.co.jp">kshimz@sfc.co.jp</a>> maintains
an <a
href="http://www.dd.iij4u.or.jp/~kshimz/warp/tidy/index.html">OS/2
binary for Tidy</a>.</p>
<h4>FreeBSD</h4>
<p>Martin Fouts maintains <a
href="http://www.fogey.com/fouts/tidy.htm">Tidy on
FreeBSD</a>.</p>
<h4>RISC OS</h4>
<p><a href="mailto:archifishal@altavista.net">Alex Macfarlane
Smith</a> maintains a <a
href="http://www.toth.org.uk/~aardvark/programs/tidy.shtml">port
of Tidy to the RISC OS</a>.</p>
<h4>MiNT (Atari) OS</h4>
<p><a href="mailto:eaiching@t0.or.at">Edgar Aichinger</a>
maintains a <a
href="http://wh58-508.st.uni-magdeburg.de/sparemint/html/packages/tidy.html">
port of Tidy to the MiNT OS</a>. MiNT is a UNIX for m68k Atari
computers and is nearly FHS compliant (we don't use bootable OS
images nor have any mounting capabilities, so neither /boot nor
/mnt are used). The binary also runs on ordinary TOS, since the
MiNT libraries cover all GEMDOS/GEM functions.</p>
</div>
<h3><a id="quotes" name="quotes">Integrating Tidy as part of
other Software</a></h3>
<p>You can also incorporate Tidy as part of a larger program, for
instance in HTML editors or HTML transformation tools used for
import filters, or for when you want to customize Web content to
get the best out of different kinds of browsers. Imagine
authoring clean HTML with CSS and at the touch of a button
producing variants that look great and work reliably on a large
variety of different browsers, taking into account the quirks of
each. For instance, providing the ability to tune content for
different versions of Netscape and Internet Explorer, and for
browsers running on set-top boxes for televisions, handheld and
palmtop devices, cell phones, and voice browsers. I am happy to
quote for software development for such tools.</p>
<p>Sebastian Lange has contributed a Perl wrapper for calling
Tidy from your Perl scripts, see <a
href="sl-tidy.pl">sl-tidy.pl</a>.</p>
<h4>Using Tidy from Emacs</h4>
<p>Pete Gelbman emailed this <a
href="http://lists.w3.org/Archives/Public/html-tidy/2000AprJun/0047.html">
tip</a> for using Tidy with the Unix version of Emacs. It lets you
highlight a region of text and run Tidy on it. Tidy's "fixed"
output will replace your highlighted region right in place. The
error and warning output will be directed into a separate
mini-buffer below your main screen.</p>
<h3><a id="java" name="java">Java port of HTML Tidy</a></h3>
<p>Andy Quick <<a
href="mailto:ac.quick@sympatico.ca">ac.quick@sympatico.ca</a>>
maintains a Java port of Tidy, so you can now integrate Tidy into
your Java applications. Andy is tracking the releases of Tidy in
C (this page). More information is available on <a
href="http://www3.sympatico.ca/ac.quick/">Andy's home
page</a>.</p>
<h3><a id="implementation" name="implementation">Source
Code</a></h3>
<p>The code is in ANSI C and uses the C standard library for I/O.
The parser works top down, building a complete parse tree in
memory. Document text is held as Unicode represented as UTF-8 in
a character buffer that expands as needed. The code has so far
been tested on Windows'95, Windows'98, Windows NT, Windows 2000,
Linux, FreeBSD, NetBSD, Ultrix, OSF, OS/MP, IRIX, NeXtStep,
MacOS, BeOS, OS/2, AIX, Amiga, Atari, SunOS, Solaris and
HP-UX, amongst others.</p>
<p>Here is a link to the Open Source <a href="tidy.c">copyright
notice and license</a>.</p>
<dl>
<dt><a href="../tidy4aug00.tgz">tidy4aug00.tgz</a></dt>
<dd>gzipped tar file for source code (Unix line ends)</dd>
<dt><a href="../tidy4aug00.zip">tidy4aug00.zip</a></dt>
<dd>zipped source code (Windows line ends)</dd>
<dt><a href="platform.h">platform.h</a>, <a
href="html.h">html.h</a></dt>
<dd>the include files with common definitions</dd>
<dt><a href="config.c">config.c</a></dt>
<dd>support for customizing Tidy via config files</dd>
<dt><a href="lexer.c">lexer.c</a></dt>
<dd>lexical analysis and buffer management</dd>
<dt><a href="parser.c">parser.c</a></dt>
<dd>HTML and XML parsers</dd>
<dt><a href="tags.c">tags.c</a></dt>
<dd>dictionary of tags and their properties</dd>
<dt><a href="attrs.c">attrs.c</a></dt>
<dd>dictionary of attributes and their properties</dd>
<dt><a href="istack.c">istack.c</a></dt>
<dd>stack of active inline elements</dd>
<dt><a href="entities.c">entities.c</a></dt>
<dd>dictionary of entities</dd>
<dt><a href="clean.c">clean.c</a></dt>
<dd>smarts for cleaning up presentational markup</dd>
<dt><a href="pprint.c">pprint.c</a></dt>
<dd>pretty printing for HTML and XML</dd>
<dt><a href="localize.c">localize.c</a></dt>
<dd>Change this file to localize tidy's messages</dd>
<dt><a href="tidy.c">tidy.c</a></dt>
<dd>main() and error reporting routines</dd>
<dt><a href="Makefile">Makefile</a></dt>
<dd>Makefile for gcc</dd>
<dt><a href="man_page.txt">Unix Man page</a></dt>
<dd>Maintained by Matej Vela &lt;vela@debian.org&gt;</dd>
</dl>
<p>Conventions for whether lines end with CRLF, LF or CR vary
from one system to another. I have included the C source for a
utility <b>tab2space</b> which can be used to ensure that files
use the line end convention of your choice, and to expand tabs to
spaces.</p>
<pre>
tab2space -t4 -unix *.h *.c
tab2space -tabs -unix Makefile
</pre>
<p>Note use of "-tabs" to ensure that tabs are preserved in the
Makefile (it won't work without them!).</p>
<p>For those of you on Unix, here is a script you can use to
strip carriage returns:</p>
<pre>
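#!/bin/sh
# Strip carriage returns (octal 015) from each file named on the command line.
echo "Stripping Carriage Returns from files..."
for i
do
    # Only process regular files that are writable
    if [ -f "$i" ]
    then
        if [ -w "$i" ]
        then
            echo "$i"
            # strip CRs from input and output to temp file,
            # replacing the original only if tr succeeded
            tr -d '\015' < "$i" > toix.tmp && mv toix.tmp "$i"
        else
            echo "$i: write-protected"
        fi
    else
        echo "$i: not a file"
    fi
done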
</pre>
<p>Save this script to a file, e.g. "<em>stripcr</em>", and use
"<em>chmod +x stripcr</em>" to make it executable. You can then
run it as "<em>stripcr *.c *.h Overview.html Makefile</em>".</p>
<h2><a id="acks" name="acks">Acknowledgements</a></h2>
<p>I would like to thank the many people who have written to me
with suggestions for improvements or reporting bugs. Your help
has been invaluable.</p>
<blockquote class="people">Jonathan Adair, Drew Adams, Osma
Ahvenlampi, Carsten Allefeld, Richard Allsebrook, Jacob Sparre
Andersen, Joe D'Andrea, Jerry Andrews, Bruce Aron, Takuya Asada,
Edward Avis, Carlos Piqueres Ayela, Nick B, Chang Hyun Baek,
Denis Barbier, Chuck Baslock, Christer Bernerus, David J.
Biesack, John Bigby, Yu Jian Bin, Alexander Biron, Keith
Blakemore-Noble, Eric Blossom, Berend de Boer, Ochen M. Braun,
Dave Bryan, David Brooke, Andy Brown, Keith B. Brown, Andreas
Buchholz, Maurice Buxton, Jelks Cabaniss, John Cappelletti,
Trevor Carden, Terry Cassidy, Mathew Cepl, Kendall Clark, Rob
Clark, Jeremy Clulow, Dan Connolly, Larry Cousin, Ken Cox, Luis
M. Cruz, John Cumming, Ian Davey, Keith Davies, Ciaran Deignan,
David Duffy, Emma Duke-Williams, Tamminen Eero, Bodo Eing, Peter
Enzerink, Baruch Even, David Fallon, Claus André
Färber, Stephanie Foott, Darren Forcier, Martin Fouts,
Frederik Fouvry, Rene Fritz, Stephen Fuqua, Martin Gallwey, Pete
Gelbman, Francisco Guardiola, David Getchell, Michael Giroux,
Davor Golek, Guus Goos, Léa Gris, Rainer Gutsche, Kai
Hackemesser, Juha Häikiö, David Halliday,
Johann-Christian Hanke, Vlad Harchev, Shane Harrelson, Andre
Hinrichs, Bjoern Hoehrmann, G. Ken Holman, Bill Homer, Olaf Hopp,
Craig Horman, Jack Horsfield, Nigel Horspool, Pao-Hsi Huang,
Stuart Hungerford, Marc Jauvin, Rick Jelliffe, Peter Jeremy,
Craig Johnson, Charles LaFountain, Steven Lobo, Zdenek Kabelac,
Michael Kay, Jeffery Kendall, Axel Kielhorn, Konstantinos
Kleisouris, Johannes Koch, Daniel Kohn, Rudy Kohut, Allan
Kuchinsky, Volker Kuhlmann, Michael LaStella, Johnny Lee, Steve
Lee, Tony Leneis, Nick Leverton, Todd Lewis, Dietmar Lippold,
Gert-Jan C. Lokhorst, Murray Longmore, John Love-Jensen,
Satwinder Mangat, Carole Mah, Anton Marsden, Bede McCall, Shane
McCarron, Thomas McGuigan, Ian McKellar, Al Medeiros, Chris
Nappin, Ann Navarro, Jacek Niedziela, Morten Blinksbjerg Nielsen,
Kenichi Numata, Allan Odgaard, Matt Oshry, Gerald Oskoboiny, Paul
Ossenbruggen, Ernst Paalvast, Christian Pantel, Dimitri
Papadopoulos, Rick Parsons, Steven Pemberton, Daniel Persson, Lee
Anne Phillips, Xavier Plantefeve, Karl Prinz, Andy Quick, Jany
Quintard, Julian Reschke, Stephen Reynolds, Thomas Ribbrock, Ross
L. Richardson, Philip Riebold, Erik Rossen, Dan Rudman, Peter
Ruevski, Christian Ruetgers, Klaus Johannes Rusch, John Russell,
Eric Schindler, J. Schlauch, Christian Schüler, Klaus
Alexander Seistrup, Jim Seymour, Kazuyoshi Shimizu, Geoff
Sinclair, Jo Smith, Paul Smith, Steve Spilker, Rafi Stern,
Jacques Steyn, Michael J. Suzio, Zac Thompson, Eric Thorbjornsen,
Oren Tirosh, John Tobler, Omri Traub, Loïc Trégan,
Jason Tribbeck, Simon Trimmer, Steffen Ullrich, Stuart Updegrave,
Charles A. Upsdell, Jussi Vestman, Larry W. Virden, Daniel
Vogelheim, Nigel Wadsworth, Jez Wain, Randy Waki, Paul Ward, Neil
Weber, Bertilo Wennergren, Yudong Yang, Jeff Young, Edward Zalta,
Johannes Zellner, Christian Zuckschwerdt</blockquote>
<h3><a id="address" name="address">Dave's Address</a></h3>
<pre>
73b Ground Corner
Holt
Wiltshire
BA14 6RT
United Kingdom
</pre>
<p><small><a href="http://www.w3.org/People/Raggett">Dave
Raggett</a> <<a href="mailto:dsr@w3.org">dsr@w3.org</a>> is
an engineer from <a href="http://www.hp.com/">Hewlett
Packard</a>'s <a href="http://www.hpl.hp.co.uk">UK
Laboratories</a>, and works on assignment to the World Wide Web
Consortium, where he is the W3C lead for HTML, XForms, Voice
Browsers and Math.</small></p>
</body>
</html>