<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>HTML TIDY - Release Notes</title>
<meta name="keywords" content=
"HTML, validation, error correction, pretty-printing">
<meta name="author" content="Dave Raggett &lt;dsr@w3.org&gt;">
<style type="text/css">
  body { 
    margin-left: 10%; 
    margin-right: 10%; 
    font-family: sans-serif
  }
  h1 { margin-left: -8% }
  h2,h3,h4,h5,h6 { margin-left: -4% }
  pre { color: green; font-weight: bold; font-size: 80%; font-family: monospace}
  em { font-style: italic; font-weight: bold }
  strong { text-transform: uppercase; font-weight: bold }
  .note {font-style: italic; color: rgb(192, 101, 101) }
  /* hr {text-align: center; width: 60% } */
  blockquote {
    color: navy;
    margin-left: 1%;
    margin-right: 1%;
    text-align: center;
    font-family: "Comic Sans MS", "Times New Roman", serif
  }
  table {
    font-family: sans-serif;
    font-size: 80%;
    background: rgb(255,255,153)
  }
  td {
    font-size: 80%
  }
  .people {font-family: "Lucida Calligraphy", serif}
  :link { color: rgb(0, 0, 153) }
  :visited { color: rgb(153, 0, 153) }
  :active { color: rgb(255, 0, 102) }
  :hover { color: rgb(0, 0, 255) }
</style>

<style type="text/css">
 p.c1 {font-style: italic}
</style>
</head>
<body bgcolor="#FFFFFF" background="grid.gif" text="black" link= 
"navy" vlink="black" alink="red">
<h1>HTML TIDY - Release Notes</h1>

<p><a href="http://www.w3.org/People/Raggett">Dave Raggett</a> <a
href="mailto:dsr@w3.org">dsr@w3.org</a></p>

<h4>Public Email List for Tidy: &lt;<a href= 
"mailto:html-tidy@w3.org">html-tidy@w3.org</a>&gt;</h4>

<p>I have set up an archived mailing list devoted to Tidy. To
subscribe, send an email to html-tidy-request@w3.org with the word
subscribe in the subject line (use the word unsubscribe instead if
you want to leave the list). The <a href= 
"http://lists.w3.org/Archives/Public/html-tidy/">archive</a> for
this list is accessible online. Please use this list to report
errors or enhancement requests.</p>

<h2>Things awaiting further attention</h2>

<ul>
<li>Support for BIG5 and ShiftJIS (Rick Jelliffe)</li>

<li>Check doctype FPI for upper case DTD, EN etc.</li>

<li>Stronger checking on which attributes appear on what
elements</li>

<li>Sorting attributes in a canonical order</li>

<li>Version checking for HTML 4.01 vs 4.0</li>

<li>L&#233;a Gris reports that Tidy doesn&#39;t know that map
isn&#39;t allowed as a direct child of body in HTML strict.</li>

<li>Converting &lt;font face=&quot;Symbol&quot;&gt;a&lt;/font&gt;
etc. to the corresponding Unicode characters, when cleaning
HTML.</li>
</ul>

<p>I need to set up an index of precisely what attributes are
supported on each element. Right now, some elements check their
own attributes, whilst others are checked via default checks
defined for each attribute independently of the element. Until
this is done, validation services will sometimes find errors
that Tidy itself misses.</p>

<p>Jelks Cabaniss asks: <i>Could Tidy be made to automatically
&quot;clean&quot; (FONTs to CSS) if the Strict DOCTYPE is
requested? An HTML or XHTML Strict document can&#39;t have FONT
tags according to the DTDs</i>. Jelks has a bunch of other good
ideas such as converting the bgcolor attribute over to CSS. I
hope to tackle these in the next release.</p>

<p>John Russel would like to see stronger checks on quote marks
for attribute values. His example:</p>

<pre>&lt;a href=m1776bat.htm"&gt;List of Battles&lt;/a&gt;</pre>

<p>suggests a heuristic whereby a &quot; followed by &gt; or whitespace
produces a warning when found in an attribute value without an initial
quote mark. Another idea would be to report this as an error when
the appropriate option has been set.</p>
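<p>The heuristic could be sketched in C roughly as follows. This is an
illustration only: LooksLikeMissingOpenQuote() is a hypothetical function,
not part of Tidy, and it assumes the lexer has already collected an
attribute value that did not start with a quote mark. Running into the end
of the value is treated the same as running into the closing &gt;.</p>

<pre>
/* Hypothetical check, not Tidy's code. value is the text of an
   attribute value that did not begin with a quote mark. Returns 1
   when a '"' is followed by '>', whitespace or the end of the value,
   which suggests that the opening quote mark was forgotten. */
int LooksLikeMissingOpenQuote(const char *value)
{
    const char *p;

    for (p = value; *p != '\0'; ++p)
    {
        char next;

        if (*p != '"')
            continue;

        /* inspect the character right after the stray quote mark */
        next = p[1];
        if (next == '>' || next == '\0' || next == ' ' ||
            next == '\t' || next == '\n' || next == '\r')
            return 1;
    }
    return 0;
}
</pre>

<p>For John's example the value m1776bat.htm&quot; ends in a quote mark,
so the check would fire, while a value such as say&quot;hi would not.</p>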

<p>I also want to add an option to select slide transition effects,
and to provide an optional feature for sorting attributes.</p>

<p>I am having problems with form elements as direct children of tr or
table. It is dangerous to create an implicit table cell; what is
needed is a way to move the form element into the next cell. If this
can't be done, an error needs to be raised, since Tidy will otherwise be
stuck. On a separate note, Tidy is still breaking lines between &lt;img&gt;
and &lt;/a&gt;, which Netscape renders as an underlined space. IE renders
it fine.</p>

<p>Rick Parsons would like there to be a new wrap-attributes option
that can be used to suppress line wrapping within attributes. There is
already a similar option for JavaScript literals.</p>

<p>Armando Asantos would like to use Tidy to produce a list of URLs
for images or hypertext links according to a config option. This would
be straightforward, but is a lower priority than bug fixes etc.</p>

<p>Tidy needs to check for text as a direct child of blockquote etc.,
which isn't allowed in HTML 4 strict. This could be implemented
as a special check that ORs transitional into the version vector
when appropriate.</p>

<p>Berend de Boer suggests that if enclose-text is set to yes, then it
should apply to div as well as to body. In fact, shouldn't this be
handled for any block element that allows mixed content in HTML
transitional but not in HTML strict?</p>

<p>Omri Traub would like an option to wrap the contents of style and
script elements in CDATA marked sections when converting to XHTML. He is
also interested in direct support for 16 bit character file I/O.</p>

<p>A number of people asked for tidied documents to be marked
as such using a meta element. Tidy will now add the following to the
head if not already present:</p>

<pre>&lt;meta name="generator" content="HTML Tidy, see www.w3.org"&gt;</pre>

<p>If you don't want this added, set the option tidy-mark to no.</p>

<h2>January 2000</h2>

<p>I have added a new function, ApparentVersion(), which takes the
doctype into account as well as other clues. This is now used to
report the apparent version of the HTML in use.</p>

<p>Thanks to the encouragement of Denis Barbier, I finally got around
to dealing with the extra bracketing needed to quiet gcc -Wall. This
involved the initialization of the tag, attribute and entity tables,
and miscellaneous side-effecting while and for loops.</p>

<p>PPrintXMLTree has been updated so that it only inserts line breaks
after start tags and before end tags for elements without mixed
content. This brings Tidy into line with current wisdom for XML
editors. My thanks to Eric Thorbjornsen for suggesting a fix to FindTag
that ensures that Tidy doesn't mistreat elements looking like html.</p>

<p>&lt;table border&gt; is now converted to &lt;table&nbsp;border="1"&gt;
when converting to XHTML.</p>

<p>I have added support for CDATA marked sections which are passed through
without change, e.g.</p>

<pre>&lt;![CDATA[ .. markup here has no effect .. ]]&gt;</pre>

<p>In the January 12th release, ParseXMLElement screwed up on doctypes
and toplevel comments, causing a memory exception. This has now been fixed.
PPrintXMLTree now uses zero indent for comments to avoid progressive
indentation as an XML document is repeatedly tidied. I have added a blank
line after elements unless they are the last in the parent's content.</p>

<h2>December 1999</h2>

<p>Tidy now generates the XHTML namespace and system identifier as
specified by the current <a href="http://www.w3.org/TR/xhtml1/">XHTML
Proposed Recommendation</a>. In addition, it now assumes the latest
version of HTML 4, namely HTML 4.01. This fixes an omission in 4.0 by
adding the name attribute to the img and form elements. This means
that documents with rollovers and smart forms will now validate!</p>

<p>James Pickering noticed that Tidy was omitting the xhtml- prefix
for the XHTML DTD file names in the system identifier on the doctype.
This was a recent change to XHTML. I have fixed lexer.c to deal with
this.</p>

<p>This release adds support for <a href=
"http://developer.netscape.com/viewsource/schroder_template/schroder_template.html">
JSTE</a> pseudo elements looking like: &lt;#&nbsp;#&gt;. Note that 
Tidy can't distinguish between ASP and JSTE for pseudo elements
looking like: &lt;%&nbsp;%&gt;. Line wrapping of this syntax is
inhibited by setting either the wrap-asp or wrap-jste option to no.</p>
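<p>As an illustration, a config file for pages that use these server-side
syntaxes might contain the entries below; wrap-php, described under
September 1999, is the corresponding option for PHP instructions.</p>

<pre>
wrap-asp: no
wrap-jste: no
wrap-php: no
</pre>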

<p>Thanks to Jacek Niedziela, the Win32 executable for Tidy
is now able to expand wild cards in filenames. This utilizes
the setargv library supplied with VC++.</p>

<p>Jonathan Adair asked for the hashtables to be cleared when emptied
to avoid problems when running Tidy a second time, when Tidy is
embedded in other code. I have applied this to FreeEntities(),
FreeAttrTable(), FreeConfig(), and FreeTags().</p>

<p>Ian Davey spotted that Tidy wasn't deleting inline emphasis elements
when these only contained whitespace (other than non-breaking spaces).
This was due to an oversight in the CanPrune() function, now fixed.</p>

<p>Michel Lemay spotted some bugs in if statements and provided
some sample html files that caused Tidy to crash. On further study,
I found a bug in the code that moves font elements
inside anchors. I have fixed this and added a new method to test the tree
for internal consistency in its bidirectional links: CheckNodeIntegrity().</p>
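<p>The kind of consistency test described above can be sketched in C as
follows. The node layout (parent, prev, next, content for the first child
and last for the last child) and the name NodeLinksConsistent() are
assumptions made for illustration; this is not the code of
CheckNodeIntegrity(). The check walks the tree recursively and gives up at
the first link that does not point back the way it should.</p>

<pre>
/* Hypothetical node layout for illustration; a real parser node
   would carry more fields (tag, attributes, text and so on). */
typedef struct Node Node;
struct Node
{
    Node *parent;   /* enclosing element */
    Node *prev;     /* previous sibling */
    Node *next;     /* next sibling */
    Node *content;  /* first child */
    Node *last;     /* last child */
};

/* Returns 1 if the links around node and in its subtree agree, else 0. */
int NodeLinksConsistent(const Node *node)
{
    const Node *child;

    /* sibling links must point at each other in both directions */
    if (node->prev && node->prev->next != node)
        return 0;
    if (node->next && node->next->prev != node)
        return 0;

    /* with no children, the last-child pointer must be empty too */
    if (!node->content)
        return node->last == 0;

    /* the first child must not have a previous sibling */
    if (node->content->prev)
        return 0;

    for (child = node->content; child; child = child->next)
    {
        if (child->parent != node)
            return 0;
        /* the end of the sibling chain must be the recorded last child */
        if (!child->next && child != node->last)
            return 0;
        if (!NodeLinksConsistent(child))
            return 0;
    }
    return 1;
}
</pre>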

<p>I have also refined the code for handling noframes to make it more
robust. It will now handle noframes within a body within a noframes etc.
(something permitted by HTML4). It will also recover if the noframes
end tag is missing or is in the wrong place.</p>

<p>I have fleshed out the table for mapping characters in the Windows
Western character set into Unicode, see Win2Unicode[]. Yahoo was, for
example, using the Windows Western character for bullet, which in
Unicode is U+2022.</p>

<p>David Halliday noticed that applets without any content between
the start and end tags were being pruned by Tidy. This is a bug and
has now been fixed.</p>

<p>I have changed the way Tidy handles empty paragraphs when
drop-empty-paras is set to no. HTML4 doesn't allow empty paragraphs
so I am now replacing them by a pair of br elements, so that the
formatting is preserved. When drop-empty-paras is set to yes, empty
paragraphs are simply removed.</p>

<p>Darren Forcier asked for a way to suppress fixing up of comments
when these include adjacent hyphens since this was screwing up Cold
Fusion's special comment syntax. The new option is called:
<i>fix-bad-comments</i> and defaults to yes.</p>

<p>Using Michel's examples I have improved the way the table parser
deals with unexpected content. This is now consistently moved before
the table, or to the head element as appropriate. Microsoft and Netscape
differ in how an unclosed blockquote renders when found at the table
or tr level. Netscape indents the table but Microsoft does not. This
is getting too tricky for me to deal with!</p>

<p>Using a sample page from Yahoo, I discovered that Netscape Navigator
doesn't implement the text-align style property on tr or table elements.
As a result I have added a special check for this in BlockStyle() to
avoid translating the align attribute on tr or table into a style rule.</p>

<p>Richard Allsebrook would like to be able to map b/i to strong/em
without the full clean process being invoked. I have therefore  decoupled
these two options. Note that setting logical-emphasis is also decoupled
from drop-font-tags.</p>
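<p>For instance, a config file along the following lines should give the
b/i to strong/em mapping on its own, assuming that the clean option is
what invokes the full clean process:</p>

<pre>
clean: no
logical-emphasis: yes
drop-font-tags: no
</pre>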


<h2>30th November 1999</h2>

<p>This is an interim release to provide a fix for a bug
introduced earlier in the month. I have fixed a bug in the
emphasis code that looks for start tags which are most likely
intended as end tags. This bug only appeared in the November
release and could cause a crash or indefinite looping. My thanks
to a respondent calling himself "Michael" who provided a
collection of files that allowed me to track this down.</p>

<p>I have also added page  transition effects for the slide
maker feature. The effects are currently only visible on IE4
and above, and take advantage of the meta element. I will provide
an option to select between a range of transition effects in
the next release.</p>

<h2>November 1999</h2>

<p>David Duffy found a case causing Tidy to loop indefinitely.
The problem occurred when a blocklevel element is found within a
list item that isn&#39;t enclosed in a ul or ol element. I have
added a check to ParseList to prevent this.</p>

<p>Takuya Asada tells me that in Raw mode Tidy is incorrectly
mapping 0xA0 to the entity &amp;#160;, causing problems for Shift_JIS
etc. Now fixed. Larry Virden reported a problem with ParseConfig
when one of the arguments was null. I have added a check for
this.</p>

<p>Thomas McGuigan notes that Tidy issues a warning for noframes
elements without a body element. HTML4 is defined so that the
content of the noframes element is restricted to a single body
element. However, it also allows you to omit the start and end
tags for body, something that isn&#39;t allowed for XHTML. I have
changed the code to only issue the warning when generating
XML.</p>

<p>Added a new --version or -v option that reports the release date
to the error stream. ParseConfig() now returns false if it
doesn&#39;t use the parameter. This prevents the next argument on
the command line from being swallowed inadvertently, e.g. for
unknown options. Tidy now warns about unrecognized options.</p>

<p>I have revised the way Tidy deals with comments to avoid
problems with repeated hyphens. First, &quot;--&quot; is illegal
within XML comments, and second, the comment syntax for SGML is very
error prone when it comes to when and where you can use hyphens. As a
result, Tidy will now replace repeated hyphens with &quot;=&quot;
characters. My thanks to Yudong Yang and Randy Waki for their
input on this.</p>
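<p>A rough C sketch of this kind of rewrite follows. The function name is
hypothetical and the exact rule is an assumption: every hyphen in a run of
two or more becomes &quot;=&quot;, single hyphens are kept, and the text
passed in is the comment body without its delimiters.</p>

<pre>
/* Hypothetical helper, not Tidy's code. text is the body of a comment,
   without the surrounding comment delimiters. Assumption: each hyphen
   in a run of two or more is replaced by '=', single hyphens are kept. */
void ReplaceRepeatedHyphens(char *text)
{
    char *p = text;

    while (*p != '\0')
    {
        char *end;

        if (*p != '-')
        {
            ++p;
            continue;
        }

        /* find the end of this run of consecutive hyphens */
        for (end = p; *end == '-'; ++end)
            ;

        /* a run of two or more would form "--", so rewrite all of it */
        if (end - p >= 2)
        {
            char *q;

            for (q = p; q != end; ++q)
                *q = '=';
        }

        p = end;
    }
}
</pre>

<p>With this rule, a comment body of a--b---c-d would come out as
a==b===c-d.</p>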

<p>Emphasis start tags will now be coerced to end tags when the
corresponding element is already open. For instance
&lt;u&gt;...&lt;u&gt;. This behavior doesn&#39;t apply to font
tags or start tags with attributes. My thanks to Luis M. Cruz for
suggesting this idea.</p>

<p>Jonathan Adair would like Tidy to warn when the same attribute
appears more than once in the same element. This is an error for
both SGML and XML. The best way to make this check would be to
sort the attributes and look for duplicate entries. Other people
have asked for the attributes to be sorted, but I need further
input on the appropriate sort order. As an interim solution, Tidy
uses a simple test which generates n+1 warnings if an attribute
is repeated n times.</p>
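<p>The sort-then-compare check could be sketched in C as below. This is
not Tidy code: CountDuplicateAttrs() and CompareAttrNames() are
hypothetical names, and names are compared ignoring case since HTML
attribute names are case-insensitive. Because equal names end up next to
each other after sorting, an attribute repeated n times is counted n - 1
times, so each extra copy would earn exactly one warning rather than the
n+1 warnings of the interim test.</p>

<pre>
/* Hypothetical helpers, not Tidy's code. */

/* Case-insensitive comparison of two ASCII attribute names. */
static int CompareAttrNames(const char *a, const char *b)
{
    for (;;)
    {
        char ca = *a++;
        char cb = *b++;

        /* fold upper case letters to lower case */
        if (ca >= 'A' && 'Z' >= ca)
            ca = (char)(ca - 'A' + 'a');
        if (cb >= 'A' && 'Z' >= cb)
            cb = (char)(cb - 'A' + 'a');

        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Sorts names in place and returns how many entries repeat an earlier
   one: an attribute that appears n times contributes n - 1. */
int CountDuplicateAttrs(const char *names[], int count)
{
    int i, j, duplicates = 0;

    /* insertion sort: adequate for the few attributes on one element */
    for (i = 1; count > i; ++i)
    {
        const char *key = names[i];

        for (j = i; j > 0 && CompareAttrNames(names[j - 1], key) > 0; --j)
            names[j] = names[j - 1];
        names[j] = key;
    }

    /* once sorted, equal names are adjacent */
    for (i = 1; count > i; ++i)
        if (CompareAttrNames(names[i - 1], names[i]) == 0)
            ++duplicates;

    return duplicates;
}
</pre>

<p>Insertion sort is used only because an element rarely carries more than
a handful of attributes; any sort that groups equal names would do.</p>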

<h2>October 1999</h2>

<p>On Unix systems you can get Tidy to look for a config file in
~/.tidyrc or ~your/.tidyrc etc. when the HTML_TIDY environment
variable isn&#39;t set. To enable this feature don&#39;t forget
to uncomment SUPPORT_GETPWNAM in the platform.h file. This
feature won&#39;t work on Windows. My thanks to Todd Lewis who
contributed the code.</p>

<p>Darren Forcier reports that Cold Fusion uses the following
syntax:</p>

<pre>
&lt;CFIF True IS True&gt;
   This should always be output 
&lt;CFELSE&gt;
   This will never output 
&lt;/CFIF&gt;
</pre>

<p>After declaring the CFIF tag in the config file, Tidy was
screwing up the Cold Fusion expression syntax, mapping
&#39;True&#39; to &#39;True=&quot;&quot;&#39; etc. My fix was to
leave such pseudo attributes untouched if they occur on user
defined elements.</p>
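<p>For reference, the declaration could take a form like the config entries
below. This is an assumed setup rather than the file Darren used: cfif is
declared as a block-level tag, and cfelse, which has no end tag, as an
empty tag.</p>

<pre>
new-blocklevel-tags: cfif
new-empty-tags: cfelse
</pre>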

<p>Jelks Cabaniss noticed that Tidy wasn&#39;t adding an id
attribute to the map element when converting to XHTML. I have
added routines to do this for both &#39;a&#39; and &#39;map&#39;.
The value of the id attribute is taken from the name
attribute.</p>

<p>Larry Cousin noted that Tidy is now screwing up on option
elements. This proved to be a recently introduced error, which I
have now fixed. Peter Ruevski forwarded an example that caused
Tidy to loop endlessly. The problem was caused by an ol start tag
followed by a b start tag and then an li element. I have solved
the problem with a fix to ParseBlock.</p>

<p>I have revised the way Tidy deals with unexpected content in
lists. Tidy now wraps such content in list items with the style
attribute set to &quot;list-style: none&quot; to suppress list
bullets. If an li element is found unexpectedly in the body or
block-level content, it is wrapped into a ul element with the
style attribute set to &quot;margin-left: -2em&quot;. This
provides a closer match to the observed rendering on current
browsers. I use a couple of postprocessing steps (List2BQ and
BQ2Div) to further clean this up to use div elements. My thanks
to Thomas Ribbrock for sending me a challenging example that led
me to this solution.</p>

<p>A number of people have asked for a config option to set the
alt attribute for images when missing. The alt-text property can
now be used for this purpose. Please note that YOU are
responsible for making your documents accessible to people who
can&#39;t view the images!</p>

<p>Terry Teague spotted a bug in ParseConfigFile() that prevented
Tidy from parsing more than one file. This has been fixed by
setting the char buffer to zero in the call to InitConfig()
before parsing. Terry also noted a few places where I had slipped
back into using malloc and free rather than MemAlloc and MemFree,
now fixed.</p>

<p>Bjoern Hoehrmann notes that the September 27th release mapped
empty paragraphs to br elements, which introduces extra
whitespace in IE and Navigator. The former behavior to strip
empty paragraphs is as per HTML4 and works fine on most browsers
with the exception of Lynx. I have reverted to stripping empty
P&#39;s, but have added an option to leave them alone.</p>

<p>Bjoern also drew my attention to a bug in the September
release affecting table content that lacks a preceding td or th start
tag. Tidy moves such content to before the table element to match
the observed rendering. This is now working as planned. I have
tweaked the printing behavior when the omit end tags option is
set. It now omits the &lt;/html&gt; as well as the optional start
tags for html, head and body.</p>

<p>Pao-Hsi Huang had problems with the contents of the option
element being discarded. I was unable to reproduce this problem,
but did notice that I was unintentionally preserving newlines within
option text. This is now fixed. Shane Harrelson spotted that
table cells containing a single font element, when cleaned,
dropped the font element without getting the corresponding style.
Now fixed via a tweak to InlineStyle().</p>

<p>Andre Hinrichs wanted Tidy to do a better job on font elements
with relative size changes. This is in fact rather tricky.
Currently, Tidy uses percentage scaling values for fonts rather
than the enumeration defined by CSS [xx-small | x-small | small |
medium | large | x-large | xx-large]. The first problem is to
match these 7 values onto the 6 defined by the font element. The
next problem is caused by the fact that CSS doesn&#39;t provide
matching relative font size values that you could match to the
ones defined for the font element. I have done my best using
percentage values, based on tests with IE and Navigator. If anyone
can come up with a better approach, please let me know.</p>

<p>Tom Berger reported a problem when quote-marks was set to yes.
Using his test file everything is now working fine. Several
people asked for a way to turn off line wrapping. Tidy will now
interpret zero as meaning disable wrapping. Johannes Zellner
wants to include some tcl code in his XML markup and asks for a
way to define new tags that behave in the same way as HTML&#39;s pre
element. The new option is new-pre-tags.</p>
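<p>Both changes can be tried from a config file. In the sketch below, tcl
is a hypothetical element name standing in for whatever tag wraps the tcl
code, and wrap: 0 disables line wrapping:</p>

<pre>
wrap: 0
new-pre-tags: tcl
</pre>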

<h2>September 1999</h2>

<p>Tidy will now add a type attribute to the style and script
elements when this is missing. Tidy examines the language
attribute to determine what media type to use. I have also added
code to create an id attribute for anchors when a name attribute
is present, and to report a warning if id and name don&#39;t
match.</p>

<p>Added support for cleaning up HTML generated by Microsoft Word
2000 when you save as &quot;Web Page&quot;. When you set
&quot;word-2000: yes&quot; Tidy makes a Herculean effort to clean
up the mess created when Word 2000 exports to HTML. Word bulks
out HTML with presentation information that allows it to
round-trip documents between HTML and Word without loss of
information. This makes the HTML hard to edit and can cause some
very popular browsers to crash! I haven&#39;t dealt with the VML
markup Word uses for line drawings.</p>

<p>Applied fix to InsertNodeAfterElement() to set
node-&gt;next-&gt;prev. My thanks to &quot;Advocate&quot; for
this. This was only encountered when dealing with PRE tags
containing content illegal for PRE. (Called twice by ParsePre to
move illegal PRE content to be a later sibling of PRE, then open
PRE again afterward)</p>

<p>Change to table row parser so that when Tidy comes across an
empty row, it inserts an empty cell rather than deleting it. This
is consistent with browser behavior and avoids problems with
cells that span rows.</p>

<p>Baruch Even sent extensive patches for improved support for
the PHP preprocessing pseudo tags. You can now use
&#39;wrap-php: no&#39; to suppress line wrapping within PHP
instructions. In the process of this work, I have created a new
function InsertMisc() for dealing with comments, processing
instructions, ASP and PHP.</p>

<p>I have updated the table of tags to include additional
proprietary tags such as server, ilayer, layer, nolayer and
multicol. Using patches sent in by Edward Avis, Tidy now offers a
quiet mode which suppresses the initial welcome message and the
summary report on the number of errors or warnings. Jason
Tribbeck sent in patches to allow config options normally set in
the config file to be set on the command line, by preceding them
with a &quot;--&quot; (no intervening space), for example:</p>

<pre>
  tidy --break-before-br true --show-warnings false
</pre>
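<p>For comparison, here are the same two settings written as config file
entries, in the name: value form used elsewhere in these notes (for example
&quot;word-2000: yes&quot;). Going by the October 1999 notes, the
HTML_TIDY environment variable is one way to point Tidy at such a file.</p>

<pre>
break-before-br: true
show-warnings: false
</pre>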

<p>Kenichi Numata discovered that Tidy looped indefinitely for
examples similar to the following:</p>

<pre>
&lt;font size=+2&gt;Title
&lt;ol&gt;
&lt;/font&gt;Text
&lt;/ol&gt;
</pre>

<p>I have now cured this problem which used to occur when a
&lt;/font&gt; tag was placed at the beginning of a list element.
If the example included a list item before the &lt;/ol&gt; Tidy
will now create the following markup:</p>

<pre>
&lt;font size=+2&gt;Title&lt;/font&gt;
&lt;blockquote&gt;Text &lt;/blockquote&gt;
&lt;ol&gt;
&lt;li&gt;list item&lt;/li&gt;
&lt;/ol&gt;
</pre>

<p>This uses blockquote to indent the text without the
bullet/number and switches back to the ol list for the first true
list item.</p>

<p>I have worked hard to improve support for server side
preprocessing instructions such as ASP, PHP and Tango. Tidy now
allows you to replace attribute values by such instructions and
is able to fix up the case where the instruction appears without
delimiting quote marks. Tidy supports ASP and PHP in element
content and also in place of attribute value pairs. Support for
Tango is limited to attribute values only.</p>

<p>John Love-Jensen contributed a table for mapping the MacRoman
character set into Unicode. I have added a new charset option
&quot;mac&quot; to support this. Note the translation is one way
and doesn&#39;t convert back to the Mac codes on output.</p>

<p>Some people place &lt;p&gt; at the end of their list items to
introduce whitespace before the next item. I have modified
TrimEmptyElement to coerce empty p elements to br elements to
reproduce this rendering. If a p start tag is found in dt
elements, I now coerce the p to a br. Satwinder Mangat has
alerted me to several such problems. First, text as a direct
child of dl should be wrapped in a dt and not a dd element.
Second, unlike other inline tags, browsers only close anchors on an
anchor start or end tag. Actually, Navigator and IE differ in how
they handle this. Try the following example:</p>

<pre>
&lt;p&gt;&lt;b&gt;&lt;a href=foo&gt;some text&lt;/i&gt; which should be in the label&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;next para and guess what the emphasis will be?&lt;/p&gt;
</pre>

<p>Navigator 4 renders the second paragraph in normal text while
IE renders it in bold. If you substitute &lt;a&gt; for the
&lt;/i&gt;, once again the browsers differ. IE stops underlining
at the &lt;a&gt; text while Navigator continues until the
&lt;/a&gt;, although it realizes that you can&#39;t click
there.</p>

<p>Satwinder continues: browsers happily interpret center within
a heading. Tidy now moves the center element to be the parent of
the rest of the heading, splitting it as needed, rather than
prematurely ending the heading. The same applies to a div element
within a heading. Satwinder notes that Tidy inserts a ul when an
li is encountered as a direct child of body.</p>

<p>This is a case where you can&#39;t produce a legal HTML file
that renders the same way as browsers handle this. The same
applies to a dt or dd element without an enclosing dl element. I
can report that W3C&#39;s HTML working group was unwilling to
bless naked li&#39;s etc. A similar problem arises for dt
elements when they contain hr, center or div. The specs say this
is illegal, but browsers render it fine!</p>

<p>I have done my best for hr, splitting the dt as needed and
enclosing the hr within a dd. The hr doesn&#39;t look the same,
sadly, as it now starts at the left margin for the dd&#39;s
rather than the left margin for dt&#39;s. I wasn&#39;t sure how
to deal with center and div within dt, and chose to discard
them.</p>

<p>&lt;/br&gt; is now mapped to &lt;br&gt; to match observed
browser rendering. On the same basis, an unmatched &lt;/p&gt; is
mapped to &lt;br&gt;&lt;br&gt;. This should improve fidelity of
tidied files to the original rendering, subject to the
limitations in the HTML standards described above.</p>

<p>Vlad Harchev spotted that Tidy was swallowing the first and
last spaces within inline elements when in a pre element. Now
fixed. Zac Thompson spotted that Tidy didn&#39;t know that the
tags s, strike and u weren&#39;t allowed in HTML4 strict. I have
now fixed this.</p>

<p>Tidy now preserves the last modified time for the files it
writes back to. This was introduced on the suggestion of
Ren&#233; Fritz, who uses the SiteCopy utility to upload recently
modified files to his Web server. By preserving file timestamps
Tidy can be used on all files in a directory without impacting
which ones will be uploaded, the next time SiteCopy runs. This is
implemented using the fstat and futime system calls. If your
platform doesn&#39;t support these calls, set PRESERVEFILETIMES
to 0 in platform.h.</p>
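<p>The timestamp-preserving write can be sketched as follows. This is not Tidy&#39;s code: the function name <tt>write_preserving_mtime</tt> is hypothetical, and it uses the path-based POSIX <tt>stat</tt> and <tt>utime</tt> calls rather than the descriptor-based fstat and futime calls mentioned above, on the assumption that a POSIX system is available.</p>

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>

/* Hypothetical helper: overwrite `path` with `text`, then restore the
   access and modification times the file had before it was rewritten. */
int write_preserving_mtime(const char *path, const char *text)
{
    struct stat before;
    int have_times = (stat(path, &before) == 0);   /* remember old times */

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    if (have_times) {
        struct utimbuf times;
        times.actime = before.st_atime;
        times.modtime = before.st_mtime;            /* put the old mtime back */
        if (utime(path, &times) != 0)
            return -1;
    }
    return 0;
}
```

<p>Because the modification time is restored after the write, a tool such as SiteCopy that compares timestamps will not see the file as changed.</p>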

<p>I have fixed a bug in lexer.c which screwed up the removal of
doctype elements. This bug was associated with the symptom of
printing an indefinite number of doctype elements.</p>

<h2>August 1999</h2>

<p>Added lowsrc and bgproperties attributes to attribute table.
Rob Clark tells me that bgproperties=&quot;fixed&quot; on the
body element causes NS and IE to fix the background relative to
the window rather than the document&#39;s content.</p>

<p>Terry Teague kindly drew my attention to several bugs
discovered by other people: My thanks to Randy Waki for
discovering a bug when an unexpected inline end-tag is found in a
ul or ol element. I have added new code to ParseList in parser.c
to pop the inline stack and discard the end tag. I am checking to
see whether a similar problem occurs elsewhere. Randy also
discovered a bug (now fixed) in TrimInitialSpace() in parser.c
which caused it to fail when the element was the first in the
content. John Cumming found that comments cause problems in table
row group elements such as tbody. I have fixed this oversight in
this release.</p>

<p>Bjoern Hoehrmann tells me that bgsound is only allowed in the
head and not in the body, according to the Microsoft
documentation. I have therefore updated the entry in tags.c. The
slide generation feature caused an exception when the original
document didn&#39;t include a document type declaration. The fix
involves setting the link to the parent node when creating the
doctype node.</p>

<h2>26th July 1999</h2>

<p>Jussi Vestman reported a bug in FixDocType in lexer.c which
caused tidy to corrupt the parse tree, leading to an infinite
loop. I independently spotted this and fixed it. Justin
Farnsworth spotted that Tidy wasn&#39;t handling XML processing
instructions which end in ?&gt; rather than just &gt; as
specified by SGML. I have added a new option,
assume-xml-procins, which when set to yes makes Tidy expect the
XML style of processing instruction. It defaults to no, but is
automatically set to yes for XML input. Justin notes that the XML
PIs are used for a server preprocessor format called PHP, which
will now be easy to handle with Tidy. Richard Allsebrook&#39;s
mail prompted me to make sure that the contents of processing
instructions are treated as CDATA so that &lt; and &gt; etc. are
passed through unescaped.</p>
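<p>A minimal sketch of how the assume-xml-procins setting changes where a processing instruction ends, assuming the scanner is positioned just after &lt;?. The function <tt>pi_end</tt> is hypothetical. Note how a &gt; inside PHP code ends an SGML-style instruction early, which is why XML-style instructions need the option.</p>

```c
#include <string.h>

/* Hypothetical scanner helper. `s` points just after "<?". Returns the
   offset of the first character after the processing instruction, or -1
   if it is unterminated. xml_style selects "?>" (XML) or ">" (SGML). */
long pi_end(const char *s, int xml_style)
{
    const char *end = xml_style ? strstr(s, "?>") : strchr(s, '>');
    if (end == NULL)
        return -1;
    /* skip over the terminator itself */
    return (long)(end - s) + (xml_style ? 2 : 1);
}
```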

<p>Bill Sowers asks for Tidy to support another server
preprocessor format called Tango which features syntax such
as:</p>

<pre>
&lt;b&gt;&lt;@include &lt;@cgi&gt;&lt;appfilepath&gt;includes/message.html&gt;&lt;/b&gt;
</pre>

<p>I don&#39;t have time to add support for Tango in this
release, but would be happy if someone else were to mail in
appropriate changes. Darrell Bircsak reports problems when using
DOS on Win98. I am using Win95 and have been unable to reproduce
the problem. Jelks Cabaniss notes that Tidy doesn&#39;t support
XML document type subset declarations. This is a documented
shortcoming and needs to be fixed in the not too distant future.
Tidy focuses on HTML, so this hasn&#39;t been a priority
to date.</p>

<p>Jussi Vestman asks for an optional feature for mapping IP
addresses to DNS hostnames and back again in URLs. Sadly, I
don&#39;t expect to be able to do this for quite a while. Adding
network support to Tidy would also allow it to check for bad
URLs.</p>

<p>Ryan Youck reports that Tidy&#39;s behavior when finding a ul
element when it expects an li start tag doesn&#39;t match
Netscape or IE. I have confirmed this and have changed the code
for parsing lists to append misplaced lists to the end of the
previous list item. If a new list is found in place of the first
list item, I now place it into a blockquote and move it before
the start of the current list, so as to preserve the intended
rendering.</p>

<p>I have added a new option, enclose-text, which encloses any
text it finds at the body level within p elements. This is very
useful for curing problems with the margins when applying style
sheets.</p>

<h2>9th July 1999</h2>

<p>Added bgsound to tags.c. Added &#39;_&#39; to definition of
namechars to match html4.decl. My thanks to Craig Horman for
spotting this.</p>

<p>Jelks Cabaniss asked for the clean option to be automatically
set when the drop-font-tags option is set. Jelks also notes that
a lot of the authoring tools automatically generate, for example,
&lt;I&gt; and &lt;B&gt; in place of &lt;em&gt; and &lt;strong&gt;
(MS FrontPage 98 generated the latter, but FP2000 has reverted to
the former - with no option to change or set it). Jelks suggested
adding a general tag substitution mechanism. As a simpler measure
for now, I have added a new property called logical-emphasis to
the config file for replacing i by em and b by strong.</p>
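<p>The substitution performed by logical-emphasis can be pictured as a small lookup table, which also suggests how Jelks&#39; more general mechanism might be structured. This is a sketch with hypothetical names, and it assumes tag names have already been folded to lower case.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical substitution table: presentational tag -> logical tag. */
struct tag_subst { const char *from; const char *to; };

static const struct tag_subst emphasis_map[] = {
    { "i", "em" },
    { "b", "strong" },
};

/* Return the replacement tag name, or the name unchanged if no rule applies. */
const char *logical_emphasis(const char *tag)
{
    size_t i;
    for (i = 0; i < sizeof emphasis_map / sizeof emphasis_map[0]; ++i)
        if (strcmp(tag, emphasis_map[i].from) == 0)
            return emphasis_map[i].to;
    return tag;
}
```

<p>A general tag substitution mechanism would amount to letting the config file add rows to such a table.</p>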

<h2>7th July 1999</h2>

<p>Fixed recent bug with escaping ampersands and plugged memory
leaks following Terry Teague&#39;s suggestions. Changed
IsValidAttrName() in lexer.c to test for namechars to allow - and
: in names.</p>
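<p>A sketch of the attribute name test, assuming that names must start with a letter and may continue with letters, digits and the SGML name characters &#39;.&#39;, &#39;-&#39;, &#39;_&#39; and &#39;:&#39;. The functions below are hypothetical illustrations, not the actual IsValidAttrName() in lexer.c.</p>

```c
#include <ctype.h>

/* Assumed namechar set: letters, digits and ". - _ :" */
static int is_namechar(int c)
{
    return isalnum(c) || c == '-' || c == '.' || c == '_' || c == ':';
}

/* Returns 1 if `name` starts with a letter and continues with namechars.
   A comma, as produced by a misplaced quote mark, makes the name invalid. */
int is_valid_attr_name(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    if (!isalpha(*p))
        return 0;                 /* also rejects the empty string */
    for (++p; *p; ++p)
        if (!is_namechar(*p))
            return 0;
    return 1;
}
```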

<h2>2nd July 1999</h2>

<p>Chami noticed that the definition for the marquee tag was
wrong. I have fixed the entry in tags.c and Tidy now works fine
on the example he sent. To support mixing MathML with HTML I have
added a new config option for declaring empty inline tags
&quot;new-empty-tags&quot;. Philip Riebold noted that single
quote marks were being silently dropped unless quote-marks was
set to yes. This was an unfortunate bug, recently introduced and
now fixed.</p>

<p>Paul Smith sent in an example of badly formed tables, where
paragraph elements occurred in table rows without enclosing table
cells. Tidy was handling this by inserting a table cell. After
comparison with Netscape and IE, I have revised the code for
parsing table rows to move unexpected content to just before the
table.</p>

<h2>26th June 1999</h2>

<p>Tony Leneis reports that Tidy incorrectly thinks the table
frame attribute is a transitional feature. Now fixed. Chami
reported a bug in ParseIndent in config.c and that onsubmit is
missing from the table of attributes. Both now fixed. Carsten
Allefeld reports that Tidy doesn&#39;t know that the valign
attribute was introduced in HTML 3.2 and is ok in HTML 4.0
strict, necessitating a trivial change to attrs.c.</p>

<p>Axel Kielhorn notes that Tidy wasn&#39;t checking that the preamble
of the DOCTYPE matches either &quot;html PUBLIC&quot; or
&quot;html SYSTEM&quot;. Bill Homer spotted changes needed for
Tidy to compile with SGI MIPSpro C++. All of Bill&#39;s changes
have been incorporated, except for the include file
&quot;unistd.h&quot; (for the unlink call) which isn&#39;t
available on win32. To include it, define NEEDS_UNISTD_H.</p>

<p>Bjoern Hoehrmann asked for information on how to use the
result returned by Tidy when it exits. I have included an example
using Perl that Bjoern sent in. Bodo Eing reported that Tidy gave
a misleading warning when title text is emphasized. It now reports
a missing &lt;/title&gt; before any unexpected markup.</p>

<p>Bruce Aron says that many WYSIWYG HTML editors place a font
element around a hypertext link enclosing the anchor element
rather than its contents. Unfortunately, the anchor element then
overrides the color change specified by the font element! I have
added an extra rule to ParseInline to move the font element
inside an anchor when the anchor is the only child of the font
element. Note CSS is a better long term solution, and Tidy can be
used to replace font elements by style rules using the clean
option.</p>

<p>Carsten Allefeld reported that valign on table cells caused
Tidy to mislabel content as HTML 4.0 transitional rather than
strict. Now fixed. A number of people said they expected the
quote-mark option to apply to all text and not just to attribute
values. I have obliged and changed the option accordingly.</p>

<p>Some people have wondered why &quot;&lt;/&quot; causes an
error when present within scripts. The reason is that this
substring is not permitted by the SGML and XML standards. Tidy
now fixes this by inserting a backslash, changing the substring
to &quot;&lt;\/&quot;. Note this is only done for JavaScript and
not for other scripting languages.</p>
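<p>The rewrite of &lt;/ into &lt;\/ within script content amounts to inserting one backslash per occurrence. The sketch below works on a plain C string; the function name is hypothetical, and the caller is assumed to apply it only to JavaScript content, as described above.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of script text with
   every "</" rewritten as "<\/", so an SGML parser does not see an end tag
   delimiter. The caller frees the result; NULL on allocation failure. */
char *escape_script_etago(const char *src)
{
    size_t n = 0, len = strlen(src);
    const char *p;
    char *out, *q;

    /* count occurrences: each needs one extra byte for the backslash */
    for (p = src; (p = strstr(p, "</")) != NULL; p += 2)
        ++n;

    out = malloc(len + n + 1);
    if (out == NULL)
        return NULL;

    for (p = src, q = out; *p; ++p) {
        *q++ = *p;
        if (p[0] == '<' && p[1] == '/')
            *q++ = '\\';          /* '/' is copied on the next iteration */
    }
    *q = '\0';
    return out;
}
```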

<p>Chami reported that onsubmit wasn&#39;t recognized by Tidy -
now fixed. Chris Nappin drew my attention to the fact that script
string literals in attributes weren&#39;t being wrapped correctly
when QuoteMarks was set to no. Now fixed. Christian Zuckschwerdt
asked for support for the POSIX long options format e.g. --help.
I have modified tidy.c to support this for all the long options.
I have kept support for -help and -clean etc.</p>

<p>Craig Horman sent in a routine for checking attribute names
don&#39;t contain invalid characters, such as commas. I have used
this to avoid spurious attribute/value pairs when a quotemark is
misplaced. Darren Forcier is interested in wrapping Tidy up as a
Win32 DLL. Darren asked for Tidy to release its memory resources
for the various tables on exit. Now done; see DeInitTidy() in
tidy.c.</p>

<p>Darren also asks about the config file mechanism for declaring
additional tags, e.g. <b>new-blocklevel-tags: cfoutput,
cfquery</b> for use with Cold Fusion. You can add inline and
blocklevel elements, but as yet you can&#39;t add empty elements
(similar to br or hr) or change the content model for the
table, ul, ol and dl elements. Note that the indent option
applies to new elements in the same way as it does for built-in
elements. Tidy will accept the following:</p>

<pre>
&lt;cfquery name=&quot;MyQuery&quot; datasource=&quot;Customer&quot;&gt;
 select CustomerName from foo where x &gt; 1
&lt;/cfquery&gt;

&lt;cfoutput query=&quot;MyQuery&quot;&gt;
  &lt;table&gt;
    &lt;tr&gt;
    &lt;td&gt;#CustomerName#&lt;/TD&gt;
    &lt;/tr&gt;
  &lt;/table&gt;
&lt;/cfoutput&gt;
</pre>

<p>but the next example <b>won&#39;t</b> since you can&#39;t as
yet modify the content model for the table element:</p>

<pre>
&lt;cfquery name=&quot;MyQuery&quot; datasource=&quot;Customer&quot;&gt;
 select CustomerName from foo where x &gt; 1
&lt;/cfquery&gt;

&lt;table&gt;
  &lt;cfoutput query=&quot;MyQuery&quot;&gt;
    &lt;tr&gt;
    &lt;td&gt;#CustomerName#&lt;/TD&gt;
    &lt;/tr&gt;
  &lt;/cfoutput&gt;
&lt;/table&gt;
</pre>

<p>I have been studying richer ways to support modular extensions
to html using assertions and a generalization of regular
expressions to trees. This work has led to a tool for generating
DTDs named <b>dtdgen</b> and I am in the process of creating a
further tool for verification. More information is available in
my note on <a href=
"http://www.w3.org/People/Raggett/dtdgen/Docs">Assertion
Grammars</a>. Please contact me if you are interested in helping
with this work.</p>

<p>David Fallon is interested in using Tidy to dynamically repair
markup in an HTML editor as people type. My recommendation is to
take advantage of the tables in tags.c and attrs.c for this, and
to defer applying the full range of heuristics until the document
is saved to disk or a cleanup is explicitly requested. The CM_OPT
property in the tags table indicates that the end tag is
optional, while CM_EMPTY indicates that an element is <i>
empty</i>, i.e. has no content.</p>
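<p>The sketch below shows how an editor might consult a tags table like the one in tags.c while the user types. The CM_OPT and CM_EMPTY names come from the text, but their bit values, the table entries and the helper <tt>needs_end_tag</tt> are hypothetical; Tidy&#39;s real table carries much more information per element.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values; Tidy's own headers define the real ones. */
#define CM_EMPTY (1u << 0)   /* element has no content, e.g. br */
#define CM_OPT   (1u << 1)   /* end tag may be omitted, e.g. li */

struct tag_info { const char *name; unsigned model; };

static const struct tag_info tags[] = {
    { "br",  CM_EMPTY },
    { "hr",  CM_EMPTY },
    { "li",  CM_OPT },
    { "p",   CM_OPT },
    { "td",  CM_OPT },
    { "div", 0 },
};

static const struct tag_info *lookup_tag(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof tags / sizeof tags[0]; ++i)
        if (strcmp(tags[i].name, name) == 0)
            return &tags[i];
    return NULL;
}

/* Cheap check an editor could run on each keystroke: must this element
   be closed explicitly? Unknown elements are treated conservatively. */
int needs_end_tag(const char *name)
{
    const struct tag_info *t = lookup_tag(name);
    if (t == NULL)
        return 1;
    return !(t->model & (CM_EMPTY | CM_OPT));
}
```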

<p>Betsy Miller reports: <i>I tried printing the HTML Tidy page
for a class I am teaching tomorrow on HTML, and everything in the
&quot;green&quot; style (all of the examples) print in the
smallest font I have ever seen (in fact they look like tiny
little horizontal lines). Any explanation?</i>.</p>

<p>Yes. This is a problem with Internet Explorer and Style
Sheets. The Tidy page includes a CSS style sheet that tries to
make the font used for the examples 80% of the size used
for normal text. Internet Explorer gets this wrong, picking a
very much smaller font. I am hoping this bug is fixed in the IE
5.0 release. I have changed the style sheet to work around
this.</p>

<p>Francisco Guardiola writes that Tidy wasn&#39;t fixing
frameset documents with body elements unenclosed in noframes
elements. Now fixed. Frederik Fouvry found that comments after
the html end tag generated a warning for content after body. I
can&#39;t reproduce this symptom and assume it was fixed in an
earlier release.</p>

<p>Indrek Toom wants to know how to format tables so that tr
elements indent their content, but td tags do not. The solution
is to use <i>indent: auto</i>. Jelks Cabaniss noted that the
clean option created style rules with tag names in uppercase,
which would cause problems for Extensible HTML (xhtml). This
prompted me to overhaul Tidy to switch to lower case for the tag
tables and literals. I have adopted Jelks&#39; suggestion for
adding support for a doctype property in config files. This
supports <em>omit, auto, strict, loose</em> or a string
specifying the fpi (formal public identifier).</p>

<p>Johannes Koch notes that Tidy doesn&#39;t fix up the doctype
correctly when bursting to slides. He says that if a document
contains the HTML 4.0 strict DT declaration, then the slides also
include the same strict DT declaration, but also contain the
center tag which does not appear in the strict DTD. I have
applied a simple work around, which is to remove the original
doctype when bursting to slides.</p>

<p>I have extended the support for the ASP preprocessing syntax
to cope with the use of ASP within tags for attributes. I have
also added a new option <tt>wrap-asp</tt> to the config file
support to allow you to turn off wrapping within ASP code. Thanks
to Ken Cox for this idea.</p>

<p>Larry Virden asked for a compile-time option for setting the
config file, he says &quot;The reason it would be useful is to be
able to define a set of commonly used additional tags. For
instance, our site is starting to use a lot of ColdFusion. I
would love to be able to put the CF tags into a site wide file so
that users of tidy automatically get them defined&quot;. You can
now do this by defining CONFIG_FILE in platform.h.</p>

<p>Lo&#239;c Tr&#233;gan asks: Is there a way to generate a
&quot;light&quot; xml, with no &quot;&lt;!DOCTYPE...&gt;&quot;
and &quot;xmlns=...&quot;? I have tweaked the code to allow the
doctype property to apply when outputting XML, and added a new
property &quot;add-xml-pi&quot; to control whether an
&lt;?xml?&gt; processing instruction is added or not. To generate
a minimal XML document, you can set the xml-out property to yes,
and the doctype and add-xml-pi properties to no.</p>

<p>Marc Jauvin has been using a Windows application to generate Web
pages and found that some of them generate very
&quot;non-portable&quot; HTML. One of the problems that is often
introduced is the use of &quot;\&quot; in URLs instead of
&quot;/&quot; which confuses Unix Web servers. To deal with this
I have introduced the &quot;fix-backslash&quot; property. This
is set to yes by default, but can be set to no if that
causes problems.</p>
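<p>The core of the fix-backslash repair can be sketched as an in-place character substitution on a URL attribute value. The function name is hypothetical, and a real implementation would only apply it to attributes known to hold URLs, such as href and src.</p>

```c
/* Hypothetical helper: replace every backslash in a URL with a forward
   slash, in place. Returns the number of characters changed. */
int fix_backslash(char *url)
{
    int changed = 0;
    for (; *url; ++url) {
        if (*url == '\\') {
            *url = '/';
            ++changed;
        }
    }
    return changed;
}
```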

<p>The new property <tt>indent-attributes</tt> when set to yes
places each attribute on a new line. Note that the attributes are
only indented one space. Paul Ossenbruggen asked for something
slightly different, where the second and subsequent attributes
start on a new line and are indented to line up under the first
attribute. That proved to involve rather more work to implement
than I have time for right now. I plan to work some more on this
for a future release.</p>

<p>Peter Jeremy reported that when an error file is specified to
tidy (-f file), the error file is opened for every HTML file
specified on the command line, but not closed until all HTML
files have been processed. If a large number of files are
specified on the command line (e.g. processing the FreeBSD
handbook), this can overflow the process or system file
descriptor table. I have now fixed this so that the error file is
only opened once.</p>

<p>Rafi Stern notes: I have entered output-xml: yes in my config
file, not output-xhtml. Tidy second guesses me and adds the xmlns
attribute for XHTML at the head of my file, which I then have to
remove as this interferes with my XSLT parser. Fixed along with
the other bugs reported by Rafi.</p>

<p>Steffen Ullrich and Andy Quick both spotted a problem with
attribute values consisting of an empty string, e.g. <tt>
alt=&quot;&quot;</tt>. This was caused by bugs in tidy.c and in
lexer.c, both now fixed. Jussi Vestman noted Tidy had problems
with hr elements within headings. This appears to be an old bug
that came back to life! Now fixed. Jussi also asked for a config
file option for fixing URLs where non-conforming tools have used
backslash instead of forward slash.</p>

<p>An example from Thomas Wolff gave me the idea of
inserting the appropriate container elements for naked list items
when these appear in block level elements. At the same time I
have fixed a bug in the table code to infer implicit table rows
for text occurring within row group elements such as thead and
tbody. An example sent in by Steve Lee allowed me to pinpoint an
endless loop when a head or body element is unexpectedly found in
a table cell.</p>

<h2>15th April 1999</h2>

<p>Another minor release. Jacob Sparre Andersen reports a bug
with &amp;quot; in attribute values. Now fixed. Francisco
Guardiola reports problems when a body element follows the
frameset end tag. I have fixed this with a patch to ParseHTML,
ParseNoFrames and ParseFrameset in parser.c. Chris Nappin wrote in
with the suggestion for a config file option for enabling
wrapping script attributes within embedded string literals. You
can now do this using
&quot;wrap-script-strings:&#160;yes&quot;.</p>

<h2>14th April 1999</h2>

<p>Added check for ASP tags on line 2674 in parser.c so that ASP
tags are not forcibly moved inside an HTML element. My thanks to
Stuart Updegrave for this. Fixed problem with &amp; entities.
Bede McCall spotted that &amp;amp; was being written out as
&amp;amp;amp;. The fix alters ParseEntity() in lexer.c.</p>

<h2>12th April 1999</h2>

<p>Added a missing &quot;else&quot; on line 241 in config.c
(thanks to Keith Blakemore-Noble for spotting this). Added
config.c and .o to the Makefile (an oversight in the release on
the 8th April).</p>

<h2>8th April 1999</h2>

<h4>Localization:</h4>

<p>All the message text is now defined in localize.c which should
make it a tad easier to localize Tidy for different
languages.</p>

<h4>Config file support:</h4>

<p>I have added support for configuring tidy via a configuration
file. The new code is in config.c, which provides a table-driven
parser for RFC822 style headers. The new command line option
-config &lt;filename&gt; can be used to identify the config file.
The environment variable &quot;HTML_TIDY&quot; may be used to
name the config file. If defined, it is parsed before scanning
the command line. You are advised to use an absolute path for the
variable to avoid problems when running tidy in different
directories.</p>
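<p>The following sketch shows one way to split an RFC822-style &quot;name: value&quot; line and look the name up in a table of known options. The names <tt>split_config_line</tt> and <tt>find_option</tt> are hypothetical, the table holds only options mentioned in these notes, and continuation lines (a value folded onto a following line that starts with whitespace) are not handled.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table; the real one also records each value type. */
static const char *known_options[] = {
    "indent-spaces", "markup", "new-blocklevel-tags", "wrap-asp"
};

/* Index of `name` in the table, or -1 if it is not a known option. */
int find_option(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof known_options / sizeof known_options[0]; ++i)
        if (strcmp(known_options[i], name) == 0)
            return (int)i;
    return -1;
}

/* Split "name: value" in place into trimmed name and value.
   Returns 1 on success, 0 if there is no colon or the name is empty. */
int split_config_line(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');
    char *end;

    if (colon == NULL)
        return 0;
    *colon = '\0';

    while (isspace((unsigned char)*line))          /* trim the name */
        ++line;
    end = line + strlen(line);
    while (end > line && isspace((unsigned char)end[-1]))
        *--end = '\0';
    if (*line == '\0')
        return 0;

    *value = colon + 1;                            /* trim the value */
    while (isspace((unsigned char)**value))
        ++*value;
    end = *value + strlen(*value);
    while (end > *value && isspace((unsigned char)end[-1]))
        *--end = '\0';

    *name = line;
    return 1;
}
```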

<h4>Allan Kuchinsky:</h4>

<p>Reports that the XML DOM parser by Eduard Derksen screws up on
&amp;nbsp;, naked &amp; and % in URLs, as well as having problems with
newlines after the &#39;=&#39; before attribute values.</p>

<p>I have tweaked PrintChar when generating XML to output &amp;#160;
in place of &amp;nbsp; and &amp;amp; in place of &amp;. In
general XHTML when parsed as well-formed XML shouldn&#39;t use
named entities other than those defined in XML 1.0. Note that
this isn&#39;t a problem if the parser uses the XHTML DTDs which
import the entity definitions.</p>
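<p>A sketch of character output for XML along these lines, using only entities predefined by XML 1.0 by name and writing the non-breaking space as &amp;#160;. The function name is hypothetical, and writing other non-ASCII characters as numeric character references is an assumption of this sketch rather than a description of PrintChar.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the XML form of one text character into
   `out` (room for at least 16 bytes) and return its length. */
size_t xml_char(char *out, unsigned int c)
{
    switch (c) {
    case '&':  strcpy(out, "&amp;");  break;
    case '<':  strcpy(out, "&lt;");   break;
    case '>':  strcpy(out, "&gt;");   break;
    case 0xA0: strcpy(out, "&#160;"); break;   /* not &nbsp; */
    default:
        if (c < 0x80) {
            out[0] = (char)c;                  /* plain ASCII */
            out[1] = '\0';
        } else {
            sprintf(out, "&#%u;", c);          /* assumed: numeric reference */
        }
    }
    return strlen(out);
}
```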

<h4>Allan Odgaard:</h4>

<p>When Tidy encounters entities without a terminating semicolon
(e.g. &quot;&amp;copy&quot;), it correctly outputs
&quot;&amp;copy;&quot;, but it doesn&#39;t report an error.</p>

<p>I have added a ReportEntityError procedure to localize.c and
updated ParseEntity to call this for missing semicolons and
unknown entities.</p>
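<p>The two error cases can be told apart as sketched below, assuming the scanner is positioned just after the &amp; of a named entity. The tiny entity table and the function names are hypothetical; ReportEntityError itself is not shown.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum entity_status { ENTITY_OK, ENTITY_MISSING_SEMICOLON, ENTITY_UNKNOWN };

/* Hypothetical stand-in for Tidy's full entity table. */
static const char *known_entities[] = { "amp", "lt", "gt", "quot", "nbsp", "copy" };

static int is_known(const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof known_entities / sizeof known_entities[0]; ++i)
        if (strlen(known_entities[i]) == len &&
            strncmp(known_entities[i], name, len) == 0)
            return 1;
    return 0;
}

/* Classify a named entity reference; `s` points just after the '&'. */
enum entity_status check_entity(const char *s)
{
    size_t len = 0;
    while (isalnum((unsigned char)s[len]))   /* scan the entity name */
        ++len;
    if (!is_known(s, len))
        return ENTITY_UNKNOWN;
    return s[len] == ';' ? ENTITY_OK : ENTITY_MISSING_SEMICOLON;
}
```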

<h4>Andreas Buchholz:</h4>

<p>Tidy warns if the table element&#39;s summary attribute is missing.
This is incorrect for HTML 3.2, which doesn&#39;t define this attribute.</p>

<p>The summary attribute was introduced in HTML 4.0 as an aid for
accessibility. I have modified CheckTABLE to suppress the warning
when the document type explicitly designates the document as
being HTML 2.0 or HTML 3.2.</p>

<h4>Andy Brown:</h4>

<p>I have renamed the field from class to tag_class as
&quot;class&quot; is a reserved word in C++ with the goal of
allowing tidy to be compiled as C++ e.g. when part of a larger
program.</p>

<p>I have switched to Bool and the values yes and no to avoid
problems with detecting which compilers define bool and those
that don&#39;t.</p>

<p>Andy would prefer a return code or C++ exception rather than
an exit. I have removed the calls to exit from pprint.c and used
a long jump from FatalError() back to main() followed by
returning 2. It should be easy to adapt this to generate a C++
exception.</p>
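<p>The setjmp/longjmp pattern described above might look like this in outline. The names <tt>fatal_error</tt>, <tt>run_tidy</tt> and <tt>process_document</tt> are hypothetical stand-ins for FatalError(), main() and Tidy&#39;s parsing; an application wrapping Tidy in C++ could throw an exception where this sketch calls longjmp.</p>

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf fatal_exit;   /* set by the top-level entry point */

/* Report an unrecoverable error and unwind to the entry point
   instead of calling exit(). */
static void fatal_error(const char *msg)
{
    fprintf(stderr, "Fatal error: %s\n", msg);
    longjmp(fatal_exit, 1);
}

/* Hypothetical stand-in for parsing and printing one document. */
static void process_document(int fail)
{
    if (fail)
        fatal_error("out of memory");
}

/* Returns 0 on success and 2 after a fatal error. */
int run_tidy(int fail)
{
    if (setjmp(fatal_exit) != 0)
        return 2;             /* arrived here via longjmp */
    process_document(fail);
    return 0;
}
```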

<p>Sometimes the prev links are inconsistent with next links. I
have fixed some tree operations which might have caused this. Let
me know if any inconsistencies remain.</p>

<h4>Ann Navarro:</h4>

<p>Would like to be able to use:</p>

<pre>
   tidy file.html | more
</pre>

<p>to pause the screen output, and/or to pass the full output to a
file, as with</p>

<pre>
   tidy file.html &gt; output.txt
</pre>

<p>Tidy writes markup to stdout and errors to stderr.
&#39;More&#39; only works for stdout so that the errors fly by.
My compromise is to write errors to stdout when the markup is
suppressed using the command line option -e or &quot;markup:
no&quot; in the config file.</p>

<h4>html-kit@chamisplace.com</h4>

<p>Writes asking for a single output routine for Tidy. Acting on
his suggestion, I have added a new routine tidy_out() which
should make it easier to embed HTML Tidy in a GUI application
such as HTML-Kit. The new routine is in localize.c. All input
takes place via ReadCharFromStream() in tidy.c, excepting command
line arguments and the new config file mechanism.</p>

<p>Chami also asks for single routines for initializing and
de-initializing Tidy, something that happens often from the GUI
environment of HTML-Kit. I have added InitTidy() and DeInitTidy()
in tidy.c to try to satisfy this need. Chami now supports an
online interface for Tidy at the URL:</p>

<pre>
   <a href=
"http://www.chamisplace.com/asp/hk.asp">http://www.chamisplace.com/asp/hk.asp</a>
</pre>

<p>He further asks for Tidy to optionally output a length
parameter whenever possible. This could represent the length of
the element, attribute or code block related to the error. An
online validator could then highlight the starting and ending
columns which may be easier for beginners to understand, rather
than pointing to a single character column. I will investigate
this for a future release.</p>

<h4>Chang Hyun Baek:</h4>

<p>Reports a problem when generating XML using -iso2022. Tidy
inserts ?/p&lt; rather than &lt;/p&gt;. I tried Chang&#39;s test
file, but it worked fine, with &lt;/p&gt; in all the right places. Please let
me know if this problem persists.</p>

<h4>Christian Ruetgers:</h4>

<p>When using the -indent option, Tidy emits a newline before
&lt;/td&gt;, which alters the layout of some tables.</p>

<p>I note that browsers aren&#39;t conforming to the SGML spec on
generally ignoring a newline immediately after start tags and
immediately before end tags. Netscape does this for pre elements
but not for other tags! My work around is to avoid additional
newlines for the content of th and td elements, except where
their content starts with a block level element. This kind of
thing is getting really hairy!</p>

<h4>Christian Pantel:</h4>

<p>Would like the servlet tag added to tidy. This looks very
similar to applet and is used for preprocessing document content
before delivery. Servlet acts as a container for param elements
and fallback content to be shown if the server doesn&#39;t
support servlet. I have added it as a proprietary tag and parse
it in the same way as applet.</p>

<p>Christian also reports that &lt;td&gt;&lt;hr/&gt;&lt;/td&gt;
caused Tidy to discard the &lt;hr/&gt; element. I have fixed the
associated bug in ParseBlock.</p>

<h4>Chuck Baslock:</h4>

<p>Points out that an isolated &amp; is converted to &amp;amp; in
element content and in attribute values. This is in fact correct
and in agreement with the recommendations for HTML 2.0
onwards.</p>

<h4>Craig Horman:</h4>

<p>Reports that Tidy loops indefinitely if a naked LI is found in
a table cell. I have patched ParseBlock to fix this, and now
successfully deal with naked list items appearing in table cells,
clothing them in a ul.</p>

<h4>Craig Johnson:</h4>

<p>Reports that Tidy gets confused by &lt;/comment&gt; before the
doctype. This is apparently inserted by some authoring tool or
other. I have patched Tidy to safely recover from the
unrecognized and unexpected end tag without moving the parse
state into the head or body.</p>

<h4>Daniel Vogelheim:</h4>

<p>Asks for Tidy to recognize obsolete elements such as LISTING
and to replace them by more modern equivalents, in this case pre.
I have added code to issue a warning and replace such elements as
xmp, listing, plaintext by pre, and dir and menu by ul. Daniel
also asks for a means of suppressing warnings, i.e. to only
report errors. I have added the boolean &quot;show-warnings&quot;
to the config file support to deal with this and split off
warnings to ReportWarnings().</p>

<h4>Dan Rudman:</h4>

<p>Would love a version of Tidy written in Java. This is a big
job. I am working on a completely new implementation of Tidy,
this time using an object-oriented approach but I don&#39;t
expect to have this done until later this year. <b>
DEFERRED</b></p>

<h4>David Brooke:</h4>

<p>Reports that when tidying an XML file with characters above 127
Tidy is outputting the numeric entity followed by the character.
I have fixed this by a patch to PPrintChar() for XmlTags.</p>

<h4>David Getchell:</h4>

<p>Reports that Tidy thinks an ol list is HTML 4.0 when you use
the type attribute. I have fixed an error in attrs.c so that this
feature is correctly recorded as first appearing in HTML 3.2.</p>

<h4>Drew Adams:</h4>

<p>Reported problems when using comments to hide the contents of
script elements from ancient browsers. I wasn&#39;t able to
reproduce the problem, and I guess I fixed it earlier.</p>

<p>Drew also reported a problem which on further investigation is
caused by the very weird syntax for comments in SGML and XML. The
syntax for comments is really error prone:</p>

<pre>
 &lt;!--[text excluding --]--[[whitespace]*--[text excluding --]--]*&gt;
</pre>

<p>This means that &lt;!----&gt; is a complete comment but
&lt;!------&gt; is not since the parser is expecting a matching
terminating -- and as it doesn&#39;t find the -- it ploughs on
and on treating the rest of the markup as a comment unless it
finds another end comment. I have added a rule of thumb (a
heuristic) for detecting this situation. Basically I count the
number of comment groups without other characters and if the
count is &gt; 2 and a &#39;&gt;&#39; is seen, a warning is
generated.</p>
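<p>The comment rule above can be checked mechanically. The sketch below is a strict scanner, not Tidy&#39;s counting heuristic: given text starting at &lt;!, it follows each --...-- group, allowing whitespace between groups, and reports whether a closing &gt; is reached. The function name is hypothetical.</p>

```c
#include <string.h>

/* Hypothetical strict scanner for an SGML comment declaration.
   Returns the offset just past the closing '>', or -1 if the
   declaration is not properly closed. */
long sgml_comment_end(const char *s)
{
    const char *p;

    if (strncmp(s, "<!", 2) != 0)
        return -1;
    p = s + 2;

    while (strncmp(p, "--", 2) == 0) {
        p = strstr(p + 2, "--");     /* the "--" that closes this comment */
        if (p == NULL)
            return -1;               /* never closed: swallows the rest */
        p += 2;
        while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
            ++p;                     /* whitespace allowed between comments */
    }
    return *p == '>' ? (long)(p - s) + 1 : -1;
}
```

<p>With this scanner &lt;!----&gt; ends at its &gt;, while &lt;!------&gt; opens a third comment that is never closed, which is exactly the trap described above.</p>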

<p>Drew goes on to comment on the -clean option. This made me
take another look at the relative font sizes I am using for the
absolute font sizes for 0 through 6. I have tweaked them to get a
reasonable match before/after applying -clean as viewed on NS4
and IE4. Font size=3 is taken as the normal body font size and as
such the font element is silently dropped unless it also defines
a color.</p>

<p>I have also added InlineStyle to deal with the cases where an
inline element has as its only child a font element. A further
possibility would be to promote style properties common to all
children of an element to the element. I will have to leave this
for future work.</p>

<p>Drew asks why &lt;/ is not allowed in script content. The
answer is that SGML treats &lt;/ as delimiting the end of CDATA
element content, so that it ends prematurely before the
&lt;/script&gt; end tag. Browsers tend not to follow the SGML
standard in this respect, but Tidy is designed to help you do
so.</p>

<h4>Guus Goos:</h4>

<p>Notes that tidy *.html doesn&#39;t work under DOS. This is
because DOS unlike Unix doesn&#39;t expand names with wildcards
to the list of matching file names. This is a right nuisance and
one more reason why Linux is gaining popularity. I plan to
provide a work around in a future release of Tidy. Are there any
free drop-in replacements for the DOS shell that fix this
problem?</p>

<h4>Jack Horsfield:</h4>

<p>Like a number of others would like list items and table cells
to be output compactly where possible. I have added a flag to
tags.c that avoids further indentation of an element&#39;s content
when that content is inline, e.g.</p>

<pre>
 &lt;ul&gt;
   &lt;li&gt;some text&lt;/li&gt;
   &lt;li&gt;
     &lt;p&gt;
        a new paragraph
     &lt;/p&gt;
   &lt;/li&gt;
 &lt;/ul&gt;
</pre>

<p>This behavior is enabled via &quot;smart-indent: yes&quot; and
overrides &quot;indent: no&quot;. Use &quot;indent-spaces:
5&quot; to set the number of spaces used for each level of
indentation.</p>

<h4>Jeff Young:</h4>

<p>Has a few suggestions that will make Tidy work with XSL.
Thanks, I have incorporated all of them into the new release.</p>

<h4>Jelks Cabaniss:</h4>

<p>Reports that Tidy thinks the end tag is missing if the
script element has no content. I have patched ParseScript to fix
this. Jelks also asks for a way to ask Tidy to hide the contents
of script and style elements; a way to avoid promoting inline
styles with -clean to style rules as a work around for a bug in
IE with relative URLs; finally, a way to avoid empty
elements being discarded, especially if they define an ID for
scripting. Very reasonable, but I would prefer to leave these to a
future release. (This release is big enough right now!).</p>

<p>One thing I can satisfy right away is a mailing list for Tidy.
html-tidy@w3.org has been created for discussing Tidy and I have
placed the details for subscribing and accessing the Web archive
on the Tidy overview page.</p>

<h4>Johannes Koch:</h4>

<p>Reports that Tidy isn&#39;t quite right about when it reports
the doctype as inconsistent or not. I have tweaked HTMLVersion()
to fix this. Let me know if any further problems arise.</p>

<h4>John Tobler:</h4>

<p>Wants to know how to get Tidy to preserve his explicit
entities, e.g. &amp;quot; and &amp;nbsp;. Currently Tidy
interprets all entities as character values and so has no way to
tell whether a character was originally written as an entity. To
help with this, the new release lets you use &quot;quote-marks:
yes&quot; in the config file if you want all &quot; marks to
appear as &amp;quot;, and &quot;quote-nbsp: yes&quot; if you want
non-breaking spaces to be shown as entities. Note that in XML
generally &amp;nbsp; is not predeclared, so you should also use
&quot;numeric-entities: yes&quot;. This doesn&#39;t apply to
XHTML though.</p>
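<p>As a rough illustration of how these three options might interact on output, here is a minimal C sketch. The struct EntityOptions and the function entity_for_char are hypothetical names chosen for this example, not Tidy&#39;s own code, and the sketch covers only the two characters discussed above, leaving everything else to the caller.</p>

```c
#include <stdio.h>

/* Hypothetical flags mirroring the config options described above. */
struct EntityOptions {
    int quote_marks;      /* quote-marks: yes      */
    int quote_nbsp;       /* quote-nbsp: yes       */
    int numeric_entities; /* numeric-entities: yes */
};

/* If character c should be written as an entity, stores the entity in
   buf and returns 1; otherwise returns 0 and the caller writes c as is. */
static int entity_for_char(unsigned int c, const struct EntityOptions *opt,
                           char *buf, size_t size)
{
    if (c == '"' && opt->quote_marks) {
        /* &quot; is one of the five entities XML predeclares */
        snprintf(buf, size, "%s", "&quot;");
        return 1;
    }
    if (c == 160 && opt->quote_nbsp) {
        /* &nbsp; is not predeclared in generic XML, so use &#160; there */
        snprintf(buf, size, "%s",
                 opt->numeric_entities ? "&#160;" : "&nbsp;");
        return 1;
    }
    return 0;
}
```

<p>With quote-nbsp and numeric-entities both set, a non-breaking space comes out as &amp;#160;, which any XML parser accepts.</p>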

<p>John also reports that the weirdly complex URLs using the
javascript: scheme as used by www.bookmarklets.com can cause Tidy
indigestion. I have made Tidy aware of which attributes contain
JavaScript and disabled the missing quote mark heuristic for
these. I have also tweaked the way unknown entities are reported
to say that the markup may contain unescaped ampersands.</p>

<h4>Mathew Cepl:</h4>

<p>Notes that dir and menu are deprecated and not allowed in
HTML4 strict. I have updated the entry in the tags table for
these two. I also now coerce them automatically to ul when -clean
is set.</p>

<h4>Maurice Buxton:</h4>

<p>Reports that some implementations of gcc don&#39;t work with
the current compiler directive Tidy uses to avoid duplicate
typedefs for uint and ulong. I don&#39;t have a truly platform
independent solution for this, so you may need to edit platform.h
if the code doesn&#39;t compile out of the box on your
platform.</p>

<h4>Osma Ahvenlampi:</h4>

<p>Found that Tidy is confused by map elements in the head. Tidy
knows that map is only allowed in the body and thinks the author
has left out the &lt;body&gt; start tag. Thereafter, elements that
it knows only belong in the head are moved to the head, so things
should work out ok. Osma also reports having difficulties with
non-breaking spaces, but I was unable to reproduce these with the
new release of Tidy, so perhaps the problems have been fixed.</p>

<h4>Paul Ward:</h4>

<p>Reports that Tidy caused JavaScript errors when it introduced
line breaks in JavaScript attributes. Tidy goes to some lengths to
avoid this, and I am interested in any reports of further problems
with the new release.</p>

<h4>Rafi Stern:</h4>

<p>Would like Tidy to warn when a tag has an extra quote mark, as
in &lt;a href=&quot;xxxxxx&quot;&quot;&gt;. I have patched
ParseAttribute to do this.</p>
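<p>The check can be sketched as a scan over the text of a start tag that remembers which quote opened the current attribute value and flags another quote mark immediately after the closing one. This is a simplified C illustration, not the actual ParseAttribute code; find_stray_quote is a hypothetical name.</p>

```c
#include <ctype.h>

/* Scans the text of a start tag and returns the offset of a quote mark
   that directly follows the closing quote of an attribute value, as in
   <a href="xxxxxx"">, or -1 if there is none. */
static long find_stray_quote(const char *tag)
{
    char delim = 0;  /* quote that opened the current value, 0 outside */
    char last = 0;   /* last non-space character seen outside values */
    long i;

    for (i = 0; tag[i] != '\0'; ++i) {
        char c = tag[i];
        if (delim) {
            if (c == delim) {
                delim = 0;        /* value closed */
                last = c;
                if (tag[i + 1] == '"' || tag[i + 1] == '\'')
                    return i + 1; /* extra quote mark */
            }
        } else if ((c == '"' || c == '\'') && last == '=') {
            delim = c;            /* value opened after '=' */
        } else if (!isspace((unsigned char)c)) {
            last = c;
        }
    }
    return -1;
}
```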

<h4>Rene Fritz:</h4>

<p>Reported a space being inserted at the end of lines when the
text is wrapped at the start of hypertext links. This isn&#39;t
occurring with this release, so I guess the problem was solved a
while back. Rene also suggests that Tidy could be used to add and
remove metadata and attributes etc. for a group of files, e.g. to
add a link to a style sheet or to assert attribution. This sounds
like a good idea for work in the future.</p>

<h4>Shane McCarron:</h4>

<p>Reports that Tidy sometimes wraps text within markup that
occurs in the context of a pre element. I am only able to repeat
this when the markup wraps within start tags, e.g. between
attribute values. This is perfectly legitimate and doesn&#39;t
affect rendering.</p>

<h4>Steven Lobo:</h4>

<p>Notes that Tidy doesn&#39;t remove entities such as &amp;nbsp;
or &amp;copy;, which aren&#39;t predefined by XML 1.0. That is true,
but these entities <b>are</b> fine if you are using XHTML. If you
want to generate generic XML then you need to use the -n option
or to set &quot;numeric-entities: yes&quot; in the config file.
This will then output all such entities in their numeric form or
as direct character values according to the character encoding
flags.</p>

<h4>Steven Pemberton:</h4>

<p>Comments that he would like Tidy to replace naked &amp; in
URLs by &amp;amp;. You can now use &quot;quote-ampersands: yes&quot;
in the config file to ensure this. Note that this is always done
when outputting to XML where naked &#39;&amp;&#39; characters are
illegal.</p>
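<p>Because Tidy holds entities as character values internally, every &amp; left in an attribute value at output time is a naked one, so the quoting step can be pictured as a plain substitution. This is a minimal C sketch assuming the value has already been decoded; escape_ampersands is a hypothetical helper, not part of Tidy.</p>

```c
#include <string.h>

/* Copies an attribute value to out, writing each '&' as "&amp;".
   Assumes entities in value were already decoded to characters, so
   every '&' left is a naked one. Returns the output length, or
   (size_t)-1 if out is too small. */
static size_t escape_ampersands(const char *value, char *out, size_t size)
{
    size_t n = 0;

    if (size == 0)
        return (size_t)-1;
    for (; *value != '\0'; ++value) {
        if (*value == '&') {
            if (n + 5 >= size)          /* room for "&amp;" plus NUL */
                return (size_t)-1;
            memcpy(out + n, "&amp;", 5);
            n += 5;
        } else {
            if (n + 1 >= size)
                return (size_t)-1;
            out[n++] = *value;
        }
    }
    out[n] = '\0';
    return n;
}
```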

<p>Steven also asks for a way to allow Tidy to proceed after
finding unknown elements. The issue is how to parse them: should
they be treated as inline or block-level elements? The latter would
terminate the current paragraph whereas the former would not.</p>

<p>If treated as inline, unknown tags would presumably need
special handling: for instance, normal inline end tags close the
currently open inline element, but this doesn&#39;t feel right
for unknown tags. What should the content model for unknown tags
be? Flow? Again, it&#39;s far from obvious. One way to avoid these
difficulties would be to provide a means for authors to declare
unknown tags in the config file.</p>

<p>You can now declare new inline and block-level tags in the
config file, e.g.:</p>

<pre>
define-inline-tags: foo, bar
define-blocklevel-tags: blob
</pre>
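<p>A config value such as &quot;foo, bar&quot; has to be split into individual tag names before they can be added to the tags table. The following C sketch shows one way to do that; parse_tag_list, MAX_TAGS and MAX_TAG_LEN are hypothetical, and folding names to lower case is an assumption based on HTML tag names being case-insensitive.</p>

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_TAGS    16
#define MAX_TAG_LEN 32

/* Splits a value such as "foo, bar" into tag names, folded to lower
   case. Names longer than MAX_TAG_LEN - 1 are truncated. Returns the
   number of names stored. */
static int parse_tag_list(const char *value, char tags[][MAX_TAG_LEN],
                          int max_tags)
{
    int count = 0;

    while (*value != '\0' && count < max_tags) {
        size_t len = 0;

        /* skip separators: commas and whitespace */
        while (*value == ',' || isspace((unsigned char)*value))
            ++value;

        /* copy one name */
        while (*value != '\0' && *value != ','
               && !isspace((unsigned char)*value)) {
            if (len + 1 < MAX_TAG_LEN)
                tags[count][len++] = (char)tolower((unsigned char)*value);
            ++value;
        }
        if (len > 0) {
            tags[count][len] = '\0';
            ++count;
        }
    }
    return count;
}
```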

<p>The content model for new tags allows for block or inline
content. Steven further comments that some authors use ul without
an li to indent content. Tidy currently coerces these to wrap the
content within an li which alters the rendering. He suggests
using blockquote instead. I have done this, and if you use the
-clean option at the same time, it gets replaced by a div element
with a class and style rule for indenting the content.</p>

<h4>Stuart Updegrave:</h4>

<p>Would like to be able to coerce attribute names to uppercase. I
have added support for &quot;uppercase-attributes: yes&quot; for
this. Stuart also asks for Tidy to support Microsoft&#39;s ASP
tags. These are part of Microsoft&#39;s server-side scripting
model (similar to CGI). I have treated ASP tags in the same way
as processing instructions, and they don&#39;t affect the version
of HTML as they are assumed to have been interpreted before
delivery to the client.</p>

<p>Stuart is also interested in having Tidy read from and
write back to the Windows clipboard. This sounds interesting
but I have to leave this to a future release.</p>

<h4>Terry Cassidy:</h4>

<p>Points out that Tidy doesn&#39;t like &quot;top&quot; or
&quot;bottom&quot; for the align attribute on the caption
element. I have added a new routine to check the align attribute
for the caption element and cleaned up the code for checking the
document type.</p>
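<p>A check of this kind reduces to comparing the value against the values HTML 4.0 defines for caption: top, bottom, left and right. The C sketch below is illustrative only; caption_align_ok and same_word are hypothetical names, not the routine added to Tidy.</p>

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive ASCII comparison; returns 1 if a and b match. */
static int same_word(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* both strings must end together */
}

/* Returns 1 if value is one of the align values HTML 4.0 defines
   for the caption element, 0 otherwise. */
static int caption_align_ok(const char *value)
{
    static const char *const allowed[] = { "top", "bottom", "left", "right" };
    size_t i;

    for (i = 0; i < sizeof allowed / sizeof allowed[0]; ++i)
        if (same_word(value, allowed[i]))
            return 1;
    return 0;
}
```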

<h4>Xavier Plantefeve:</h4>

<p>Suggests that I should ensure that the options are
self-consistent, e.g. if -asxml is set, then this should imply lower
case and override any instruction to omit optional end tags.
Accordingly, I have introduced a new routine AdjustConfig() that
is applied after reading the command line and config files and
before tidying any files.</p>
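<p>The idea behind AdjustConfig() can be sketched as a single pass over the option settings that resolves conflicts once, before any file is processed. The struct Options and its field names are hypothetical stand-ins for Tidy&#39;s real option variables, and only the -asxml rule described above is shown.</p>

```c
/* Hypothetical subset of options; names are illustrative only. */
struct Options {
    int xml_out;          /* -asxml                          */
    int upper_case_tags;  /* write tag names in upper case   */
    int upper_case_attrs; /* uppercase-attributes: yes       */
    int hide_end_tags;    /* omit optional end tags          */
};

/* Run once after the command line and config files are read. */
static void adjust_config(struct Options *opt)
{
    if (opt->xml_out) {
        /* XHTML names are lower case and every end tag is required */
        opt->upper_case_tags = 0;
        opt->upper_case_attrs = 0;
        opt->hide_end_tags = 0;
    }
}
```

<p>Doing this in one place means the parser and pretty printer can trust the option values without re-checking combinations themselves.</p>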

<p>Xavier wonders whether name attributes should be replaced or
supplemented by id attributes when translating HTML anchors to
XHTML. This is something I am thinking about for a future release
along with supplementing lang attributes by xml:lang
attributes.</p>

<h4>Zdenek Kabelac:</h4>

<p>Asks for headings and paragraphs to be treated specially when
other tags are indented. I have dealt with this via the new
smart-indent mechanism.</p>

<h2>22nd February 1999</h2>

<p>Tidy can now fix up XML empty tags for which the attribute
values are unquoted, e.g. &lt;br clear=all/&gt;. Care is taken to
avoid this being applied to tags with URLs, e.g. &lt;a
href=http://acme.com/&gt; where the / is part of the attribute
value and doesn&#39;t signify an empty tag. Authors are advised
to always quote attribute values to avoid such problems!</p>
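<p>One way to picture the heuristic: when an unquoted value ends in / immediately before &gt;, drop the slash and treat the tag as empty unless the attribute normally holds a URL. This C sketch assumes attribute names are already lower case; the list of URL attributes and both function names are hypothetical, and Tidy&#39;s actual test may differ.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of attributes whose values are URLs. */
static int is_url_attribute(const char *name)
{
    static const char *const url_attrs[] = {
        "href", "src", "action", "cite", "background", "longdesc"
    };
    size_t i;

    for (i = 0; i < sizeof url_attrs / sizeof url_attrs[0]; ++i)
        if (strcmp(name, url_attrs[i]) == 0)
            return 1;
    return 0;
}

/* Handles an unquoted value that was followed directly by '>', e.g. the
   "all/" in <br clear=all/>. If the trailing '/' is taken as the XML
   empty-tag marker it is removed and *empty is set to 1; for URL
   attributes the slash stays in the value and *empty is 0. */
static void split_unquoted_value(const char *attr, char *value, int *empty)
{
    size_t len = strlen(value);

    *empty = 0;
    if (len > 0 && value[len - 1] == '/' && !is_url_attribute(attr)) {
        value[len - 1] = '\0';
        *empty = 1;
    }
}
```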

<h2>22nd January 1999</h2>

<p>Tidy no longer complains about a missing &lt;/tr&gt; before a
&lt;tbody&gt;. Added link to a free <a href= 
"http://www.chami.com/free/html-kit/">win32 GUI for tidy</a>.</p>

<h2>11th January 1999</h2>

<p>Added a link to the OS/2 distribution of Tidy made available
by Kaz SHiMZ. No changes to Tidy&#39;s source code.</p>

<h2>7th January 1999</h2>

<p>Fixed bug in ParseBlock that resulted in nested table
cells.</p>

<p>Fixed clean.c to add the style property
&quot;text-align:&quot; rather than &quot;align:&quot;.</p>

<p>Disabled line wrapping within HTML alt, content and value
attribute values. Wrapping will still occur when output as
XML.</p>
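<p>The rule amounts to a per-attribute test consulted by the pretty printer before it breaks a line. In this minimal C sketch, attr_value_may_wrap is a hypothetical name and the attribute names are assumed to be lower case already.</p>

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the printer may break a line inside this attribute's
   value, 0 if the value must stay on one line. */
static int attr_value_may_wrap(const char *name, int xml_out)
{
    static const char *const no_wrap[] = { "alt", "content", "value" };
    size_t i;

    if (xml_out)
        return 1;   /* wrapping still occurs when output as XML */
    for (i = 0; i < sizeof no_wrap / sizeof no_wrap[0]; ++i)
        if (strcmp(name, no_wrap[i]) == 0)
            return 0;
    return 1;
}
```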

<h2>16th December 1998</h2>

<p>This release fixes a problem with missing quote marks in
attribute values introduced in the December 14th release. It also
fixes problems with parsing tables when the table cells include
naked list items and when unexpected end tags are encountered for
td and tr elements. Warnings are now generated for unknown entities
(those not defined by HTML 4.0). It may be worth thinking about a
new option to determine how to handle these, especially for
XML.</p>

<h2>14th December 1998</h2>

<p>Rewrote parser for elements with CDATA content to fix problems
with tags in script content.</p>

<p>New pretty printer for XML mode. I have also modified the XML
parser to recognize xml:space attributes appropriately. I have
yet to add support for CDATA marked sections though.</p>

<p>script and noscript are now allowed in inline content.</p>

<p>To make it easier to drive tidy from scripts, it now returns 2
if any errors are found, 1 if any warnings are found, otherwise
it returns 0. Note that tidy doesn&#39;t generate the cleaned-up
markup if it finds errors, as opposed to just warnings.</p>
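<p>A calling program can act on these exit codes directly. The C sketch below assumes a POSIX system (system() together with the WIFEXITED and WEXITSTATUS macros from sys/wait.h) and a tidy executable on the PATH; tidy_status_text and run_and_get_status are hypothetical helpers.</p>

```c
#include <stdlib.h>
#include <sys/wait.h>  /* POSIX: WIFEXITED, WEXITSTATUS */

/* Describes the exit status documented above. */
static const char *tidy_status_text(int status)
{
    switch (status) {
    case 0:  return "no errors or warnings";
    case 1:  return "warnings only";
    case 2:  return "errors found, no tidied markup generated";
    default: return "unexpected status";
    }
}

/* Runs cmd through the shell and returns its exit status, or -1 if
   the command could not be run or did not exit normally. */
static int run_and_get_status(const char *cmd)
{
    int rc = system(cmd);

    if (rc == -1 || !WIFEXITED(rc))
        return -1;
    return WEXITSTATUS(rc);
}
```

<p>For example, if run_and_get_status(&quot;tidy page.html &gt; page.out&quot;) returns 2, page.out does not contain cleaned-up markup and should not replace the original.</p>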

<p>Fixed bug causing the column to be reported incorrectly when
there are inline tags early on the same line.</p>

<p>Added -numeric option to force character entities to be
written as numeric rather than as named character entities.
Hexadecimal character entities are never generated since Netscape
4 doesn&#39;t support them.</p>

<p>Entities which aren&#39;t part of HTML 4.0 are now passed
through unchanged, e.g. &amp;precompiler-entity;. This means that
an isolated &amp; will be passed through unchanged since there is
no way to distinguish this from an unknown entity.</p>
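<p>The pass-through rule can be sketched as follows: a name between &amp; and ; is looked up, and only a known name is replaced by its character; anything else, including an isolated &amp;, is copied unchanged. The tiny entity table and the functions lookup_entity and decode_entities are hypothetical (a real table lists every HTML 4.0 entity), and the output is an array of character codes to sidestep output encodings.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tiny hypothetical entity table; a real one lists every HTML 4.0 entity. */
static const struct {
    const char *name;
    unsigned int code;
} entity_table[] = {
    { "amp", 38 }, { "lt", 60 }, { "gt", 62 },
    { "quot", 34 }, { "nbsp", 160 }, { "copy", 169 }
};

/* Returns the character code for a known entity name, or 0 if unknown. */
static unsigned int lookup_entity(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof entity_table / sizeof entity_table[0]; ++i)
        if (strcmp(entity_table[i].name, name) == 0)
            return entity_table[i].code;
    return 0;
}

/* Decodes text into character codes. Known entities become one code;
   unknown entities such as &precompiler-entity; and isolated '&'
   characters are copied through unchanged. Returns the number of codes. */
static size_t decode_entities(const char *in, unsigned int *out, size_t max)
{
    size_t n = 0;

    while (*in != '\0' && n < max) {
        if (*in == '&') {
            char name[32];
            size_t len = 0;
            const char *p = in + 1;

            /* collect a candidate entity name */
            while ((isalnum((unsigned char)*p) || *p == '-' || *p == '.')
                   && len + 1 < sizeof name)
                name[len++] = *p++;
            name[len] = '\0';

            if (*p == ';') {
                unsigned int code = lookup_entity(name);
                if (code != 0) {
                    out[n++] = code;
                    in = p + 1;
                    continue;
                }
            }
        }
        out[n++] = (unsigned char)*in++;  /* pass through as is */
    }
    return n;
}
```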

<p>Tidy now detects malformed comments, where something other
than whitespace or &#39;--&#39; is found when &#39;&gt;&#39; is
expected at the end of a comment.</p>
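<p>Read literally, the check looks at what follows the -- that closes a comment&#39;s text. This C sketch is one interpretation of that rule, not Tidy&#39;s lexer code: comment_end_ok is a hypothetical name, and a further -- is accepted because SGML allows several comments within one comment declaration.</p>

```c
#include <ctype.h>

/* Called with p pointing just past the "--" that closes a comment's
   text. Returns 1 if what follows is optional whitespace and then '>'
   (end of the declaration) or another "--" (a further comment), and
   0 if anything else appears where '>' was expected. */
static int comment_end_ok(const char *p)
{
    while (isspace((unsigned char)*p))
        ++p;
    if (*p == '>')
        return 1;
    if (p[0] == '-' && p[1] == '-')
        return 1;
    return 0;
}
```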

<p>The &lt;br&gt; tags are now positioned at the start of a blank
line to make their presence easier to spot.</p>

<p>The -asxml mode now inserts the appropriate Voyager html
namespace on the html element and strips the doctype. The html
namespace will be usable for rigorous validation as soon as W3C
finishes work on formalizing the definition of document profiles,
see: <a href="http://www.w3.org/TR/WD-html-in-xml/">
WD-html-in-xml</a>.</p>

<h2>13th November 1998 and earlier releases</h2>

<p>Fixed bug wherein &lt;style&#160;type=text/css&gt; was written
out as &lt;style&#160;type=&quot;text/ss&quot;&gt;.</p>

<p>Tidy now handles wrapping of attributes containing JavaScript
text strings, inserting the line continuation marker as needed,
for instance:</p>

<pre>
onmouseover=&quot;window.status=&#39;Mission Statement, \
Our goals and why they matter.&#39;; return true&quot;
</pre>
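<p>The wrapping rule can be sketched as greedy line filling at spaces that also tracks whether the current position is inside a JavaScript string literal; only breaks inside a literal get the backslash. This C sketch works under simplifying assumptions (breaks only at spaces, a quote preceded by a backslash is treated as escaped); wrap_js_attr is a hypothetical name.</p>

```c
#include <stddef.h>

/* Copies value to out (capacity size), breaking lines at spaces so they
   stay within width where possible. A break inside a '...' or "..."
   literal is written as a backslash then newline (line continuation);
   elsewhere a plain newline is used. Returns 0, or -1 if out is too small. */
static int wrap_js_attr(const char *value, size_t width, char *out, size_t size)
{
    size_t n = 0, col = 0;
    char quote = 0;               /* open string delimiter, 0 if none */
    const char *p = value;

    if (size == 0)
        return -1;
    while (*p != '\0') {
        const char *end = p;
        size_t len;

        /* a "word" runs up to and including the next space */
        while (*end != '\0' && *end != ' ')
            ++end;
        if (*end == ' ')
            ++end;
        len = (size_t)(end - p);

        if (col > 0 && col + len > width) {
            if (quote) {
                if (n + 1 >= size) return -1;
                out[n++] = '\\';  /* continue the string literal */
            }
            if (n + 1 >= size) return -1;
            out[n++] = '\n';
            col = 0;
        }

        for (; p < end; ++p) {
            if (n + 1 >= size) return -1;
            if (quote == 0 && (*p == '\'' || *p == '"'))
                quote = *p;       /* literal opens */
            else if (*p == quote && p[-1] != '\\')
                quote = 0;        /* literal closes */
            out[n++] = *p;
            ++col;
        }
    }
    out[n] = '\0';
    return 0;
}
```

<p>With a width of 40, the onmouseover value above breaks after &quot;Our&quot;, which falls inside the quoted string, so that line ends with a backslash.</p>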

<p>You can now set the wrap margin with the -wrap option.</p>

<p>When the output is XML, tidy now ensures the content starts
with &lt;?xml version=&quot;1.0&quot;?&gt;.</p>

<p>The document type for HTML 2.0 is now &quot;-//IETF//DTD HTML
2.0//&quot;. In previous versions of tidy, it was incorrectly set
to &quot;-//W3C//DTD HTML 2.0//&quot;.</p>

<p>When using the -clean option, isolated FONT elements are now
mapped to SPAN elements. Previously these FONT elements were
simply dropped.</p>

<p>NOFRAMES now works fine with the BODY element in frameset
documents.</p>

<h2>Future releases may address:</h2>

<ul>
<li>Recursion through subdirectories, so you can fix up your
entire web site at one go. This assumes I can find a way that is
portable across a wide range of platforms!</li>

<li>Support for W3C&#39;s <a href= 
"http://www.w3.org/TR/REC-DOM-Level-1/">Document Object Model</a>
(DOM) level one.</li>

<li>Full validation of all attribute values.</li>

<li>Mapping Unicode bidi control characters to HTML tags.</li>

<li>Full support for parsing XML (still somewhat limited).</li>

<li>How to say which XML elements should be printed
&quot;inline&quot;.</li>

<li>Acting on the XML encoding attribute, e.g.
&lt;?xml&#160;encoding=&quot;iso-8859-1&quot;?&gt;</li>

<li>Improved mapping from HTML presentation attributes/elements
to CSS.</li>
</ul>
</body>
</html>