#LyX 1.6.2 created this file. For more info see http://www.lyx.org/
\lyxformat 345
\begin_document
\begin_header
\textclass article
\begin_preamble
\usepackage{a4wide}
\usepackage{times}
\end_preamble
\use_default_options false
\language english
\inputencoding auto
\font_roman default
\font_sans default
\font_typewriter default
\font_default_family default
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry false
\use_amsmath 1
\use_esint 0
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\leftmargin 1.5cm
\topmargin 1cm
\rightmargin 1.5cm
\bottommargin 1cm
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\author ""
\author ""
\end_header
\begin_body
\begin_layout Title
The new Unix RTL.
\end_layout
\begin_layout Author
Marco van de Voort (marco@freepascal.org)
\end_layout
\begin_layout Section*
Versions
\end_layout
\begin_layout Standard
Current version: 1.3
\end_layout
\begin_layout Description
1.0 The version of 2005.
It carries no version number, only the date in the PDF.
\end_layout
\begin_layout Description
1.1 Unversioned PDF with
\begin_inset Quotes eld
\end_inset
June 10th, 2008
\begin_inset Quotes erd
\end_inset
as date in it.
Mostly adds the
\begin_inset Quotes eld
\end_inset
prefix
\begin_inset Quotes erd
\end_inset
section.
\end_layout
\begin_layout Description
1.2 First numbered version.
\end_layout
\begin_layout Description
1.3 Minor changes, unixtype, libc wiki link
\end_layout
\begin_layout Section
Introduction
\end_layout
\begin_layout Standard
As many people have already noticed, the Unix rtl has been almost entirely
changed in the 1.1.x/1.9.x/2.0.x branch
\begin_inset Foot
status collapsed
\begin_layout Plain Layout
These versions are all the same series.
It was called 1.1.x when pre-beta, 1.9.x when in beta stage, and will be 2.0.x
when released.
\end_layout
\end_inset
.
There are still more things to change, but I think the basic restructuring
has been done.
This document tries to explain some of the design considerations behind
these changes.
Recently, the wiki article
\begin_inset CommandInset href
LatexCommand href
name "http://wiki.freepascal.org/libc_unit"
target "http://wiki.freepascal.org/libc_unit"
\end_inset
was written; it covers some of the same issues as this document (e.g.
Kylix libc unit issues) and is kept more up to date.
\end_layout
\begin_layout Section
History
\end_layout
\begin_layout Standard
The Unix rtl started life as the Linux rtl.
I don't have the exact date, but the design of the linux unit of the 0.9(9).x
and 1.0.x series dates back to 1996, and was made by Michael van Canneyt
based on the kernels of that era (1.1.x, pre-glibc).
This rtl was maintained and slightly expanded during the 1996-2000 period,
but no fundamental rearrangements were made.
\end_layout
\begin_layout Standard
Just before the 1.0 release Marco van de Voort started tinkering with FreeBSD,
and the unit linux was a major problem, at least for his skills then :-)
However, FPC was already too deep into the code freeze that was needed to
stabilize the upcoming 1.0 release to allow a junior member to fully redesign
the Unix rtl.
\end_layout
\begin_layout Standard
That's why 1.0 was released without formal FreeBSD support, and a 1.0 FreeBSD
beta release was delivered a few weeks after the formal release.
The minimal modifications for FreeBSD were merged into the CVS system before
the 1.0.2 release, and the FreeBSD platform was reasonably established and
regarded stable with the release of FPC 1.0.4.
\end_layout
\begin_layout Standard
In hindsight, however, it was a pity that more fundamental changes were
not forced at the 1.0.2 release point.
At least a few exotic linux functions should have been banned (sysinfo,
clone), unix typing should have been introduced, and preferably the
unit should have been renamed to Unix.
The syscall interface should have been changed as well.
However, besides the conservatism that resulted from the code freeze, I
had some doubts about the feasibility of the BSD ports then, and didn't
push it hard enough.
\end_layout
\begin_layout Standard
During the 1.0.x lifetime I regretted this deeply, especially after 1.0.6,
when other Unix ports appeared and the FreeBSD port turned out to be of
good quality.
Things that could be solved with a simple IFDEF when the FreeBSD port
was done turned out to be annoying, complicated and bug-prone with multiple
ports.
Small fixes done to the linux rtl by others constantly broke the BSD ports.
The 1.1 branch had already been created a year before 1.0.
Bugfixes and restructures were only partially ported to the 1.1.x branch,
and the OpenBSD/NetBSD/Solaris/QNX/BeOS ports were never ported to 1.1.x.
\end_layout
\begin_layout Standard
Because of these facts, there was a lot of maintenance work to do for 1.1.x,
and I decided to combine the needed maintenance and updating work with
the postponed restructure described above, and along the way tackle as
many other problems as possible.
\end_layout
\begin_layout Section
What's wrong with the situation in 1.0?
\end_layout
\begin_layout Standard
Well, there are a lot of reasons actually.
Some important ones:
\end_layout
\begin_layout Enumerate
The linux unit was originally targeted at linux only, 1.x kernels even.
Some details
\begin_inset Quotes eld
\end_inset
bled
\begin_inset Quotes erd
\end_inset
through into the unit's interface.
\end_layout
\begin_layout Enumerate
Many different groups of functions with different portability aspects
are stuffed together in one unit.
This makes porting the complete unit nearly impossible, and also makes
it hard to keep the unit compatible on Linux in the long term.
\end_layout
\begin_layout Enumerate
The linux unit doesn't have any form of (Unix) typing.
Parameter types are translated to TP integer or longint, using the size
in bytes they had in linux 1.0.x.
This creates portability problems to other Unices and to linux on other
architectures, and makes it harder to fix the unit for newer linux versions.
\end_layout
\begin_layout Enumerate
The errorhandling of the linux unit is a home-grown invention that is not
compatible with libc and brings no major benefits.
This complicates the situation when the base library is based on libc.
\end_layout
\begin_layout Enumerate
The name is wrong, at least for the current FPC.
It doesn't make sense to import a Linux unit under FreeBSD to access Unix
functions.
Also, in which unit do Linux specific functions end up?
\begin_inset Quotes eld
\end_inset
Linux
\begin_inset Quotes erd
\end_inset
would be logical, but is already taken.
\end_layout
\begin_layout Enumerate
Syscalls that are used both in unit linux and unix are duplicated.
Some other units also include these.
This adds only a small overhead (typically a few hundred bytes to several
kB) under linux, but it can increase dramatically if wrappers become complex.
See also the next point.
\end_layout
\begin_layout Enumerate
The then-current readdir situation on Linux was bad.
Each readdir to get an entry is a syscall, which can be slow.
This can be sped up by moving parts of the readdir call to userland, and
only calling the kernel once every so many blocks (calling getdents or
getdirentries).
Linux libc has done this too since the old libc->glibc change, but FPC
hadn't caught up yet.
The *BSD ports did this from the start, but the code is handwritten and not
a translated libc version, which might cause problems with unusual filesystem
drivers.
(Due to AMD64/Linux rtl work, I believe this has been remedied by Peter.)
\end_layout
\begin_layout Enumerate
(see also 6) The structure of the includefiles was quite linux centric,
and not very flexible.
System and linux/unix unit are too rigidly entangled.
\end_layout
\begin_layout Enumerate
Functions weren't named consistently.
Some have an fd- prefix, some none, and some have a name that differs
slightly from libc.
(In hindsight the naming was partially justified: fd* functions use the
C file type, while the functions without prefix are syscalls and take a
kernel handle as first argument.)
\end_layout
\begin_layout Enumerate
(minor) The parameter passing of the syscall interface was system dependent
(Linux: record, BSD: pseudo procedural).
This is bad because the syscall interface was exported too.
\end_layout
\begin_layout Standard
These reasons are made worse because 2.0 was supposed to support several
architectures, and probably more OSes.
During the 1.0 lifetime, the linux unit was ported to *BSD and BeOS, and
that already stretched the design to its limits.
2.0 was expected to grow beyond 20 OS-architecture combinations, which
made matters much worse.
Portability aspects became more important if we wanted to avoid having
two
\begin_inset Quotes eld
\end_inset
good
\begin_inset Quotes erd
\end_inset
platforms, and the rest outdated builds that are only partially implemented.
\end_layout
\begin_layout Standard
All of these reasons except number two could be fixed by a major but doable
refactoring of unit linux, and renaming it to unix (as was done originally).
However, reason two couldn't be fixed this way.
\end_layout
\begin_layout Standard
Since full compatibility would be broken by all those other changes anyway,
it was decided to do a full redesign, start from the bottom up, and
take care of all these issues, with special attention to ease of maintenance
and portability (read: separating portable from unportable code).
The design must also scale well enough to last for a while.
2.0 shipped in 2005, so the 1.0.x series had a lifespan of roughly 5 years.
A fundamental RTL design must be at least as durable.
\end_layout
\begin_layout Subsection
Why is it necessary to split up the unit?
\end_layout
\begin_layout Standard
The main reasons are related to portability and maintenance.
It's easier to do a new port (only the necessary units need to be implemented),
and units will less often be
\begin_inset Quotes eld
\end_inset
incomplete
\begin_inset Quotes erd
\end_inset
for some targets.
\end_layout
\begin_layout Standard
An important side effect is that future source will show more clearly in
the USES clause what unix functionality is actually used.
Using Termio or Syscall is clearer than
\begin_inset Quotes eld
\end_inset
linux
\begin_inset Quotes erd
\end_inset
.
People often think that a single unit is easier than many, but this is no
longer the case if it is stuffed from top to bottom with IFDEFs, and some
functions are not defined on some platforms.
\end_layout
\begin_layout Section
What are the basic ideas behind the new 1.1.x/1.9/2.0.x RTL?
\end_layout
\begin_layout Enumerate
Introduce Unix typing, so dev_t, off_t etc.
\end_layout
\begin_layout Enumerate
Fix the errorhandling to be compatible with normal Unix (posix) errno.
\series bold
\emph on
(
\emph default
Threadsafe)
\end_layout
\begin_layout Enumerate
At least keep a possible implementation on top of libc in mind while designing
the new RTL.
The libraries must be recompilable with a define to keep them syscall free.
\end_layout
\begin_layout Enumerate
No more duplication of code.
Currently code is duplicated between RTL and the unix/linux unit.
\end_layout
\begin_layout Enumerate
Split up and rename the unit into parts.
\end_layout
\begin_deeper
\begin_layout Enumerate
Baseunix which contains the reasonably portable calls (selection loosely
based on POSIX)
\end_layout
\begin_layout Enumerate
Termio which contains the
\begin_inset Quotes eld
\end_inset
termio
\begin_inset Quotes erd
\end_inset
calls.
\end_layout
\begin_layout Enumerate
The syscalls moved to the syscall unit.
\end_layout
\begin_layout Enumerate
The inport, outport calls move to the x86 unit.
\end_layout
\begin_layout Enumerate
Some very Linux specific calls move to unit Linux.
This includes calls like Clone and SysInfo
\end_layout
\begin_layout Enumerate
Unixutil which contains a few calls that are not Unix specific (usually
more general C interfacing).
A good place still has to be found for these
\end_layout
\begin_layout Enumerate
Unix pretty much contains a cleaned up version of the rest.
\end_layout
\begin_layout Enumerate
If the number of function categories expands, add new units instead of
extending an existing one, e.g.
users, sockets, netdb, cwstring etc.
\end_layout
\end_deeper
\begin_layout Enumerate
Functions that have an equivalent in libc are renamed to fp<libcname>.
All non-fp functions that were added to ease the transition were deprecated
in 2.2.
\end_layout
\begin_layout Enumerate
Introducing a modern readdir will be done too, but as one of the last things
to do, since it can be done
\begin_inset Quotes eld
\end_inset
under the hood
\begin_inset Quotes erd
\end_inset
.
\end_layout
\begin_layout Enumerate
Restructuring and disentangling the include files, and redividing their
contents into platform-specific and platform-independent parts.
\end_layout
\begin_layout Enumerate
The linux syscalls were changed to the BSD way: instead of something that
can only be expressed in assembler, the BSDs internally have a pseudo-procedural
syntax for syscalls (which is quite generic, probably NetBSD's influence).
This spells the end for the syscallreg record, which was both linux and
x86 centric.
\end_layout
\begin_layout Subsection
Phasing of the changes.
\end_layout
\begin_layout Standard
The restructuring of the code was done in several phases, because the 1.1
branch had to remain compilable, so that the compiler developers could
keep on working on it.
Usually after each phase there was a pause for stabilising and clean-ups.
Roughly these phases were followed:
\end_layout
\begin_layout Enumerate
Renaming the linux unit to unix was the first step.
This sounds trivial, but in practice it meant adding {$ifdef ver1_0} uses
linux{$else} uses unix{$endif} for two days.
(called
\emph on
Renamefest
\emph default
in cvs logs)
\end_layout
\begin_layout Enumerate
Restructuring of the syscall interface.
This affected both unit Unix and System.
All was changed to use the BSD structure as much as possible.
\end_layout
\begin_layout Enumerate
At roughly the same time, the unix typing was introduced.
\end_layout
\begin_layout Enumerate
All of these needed a lot of cleanup.
The BSD ports turned out to be so closely related that I roughly redivided
the BSD rtl into a generic Unix, a generic BSD and an OS specific part.
The BSD rtls share a lot more code now.
\end_layout
\begin_layout Enumerate
The system unit was cleaned of linuxisms (mainly sysunix.inc), and parts
were made more OS specific.
\end_layout
\begin_layout Enumerate
A rough first implementation of the baseunix unit was made, using syscalls
exported from system via
\emph on
external alias
\emph default
.
All the rearranging of the include files was quite a lot of work, first
for FreeBSD, then for Linux.
\end_layout
\begin_layout Enumerate
The complete CVS was checked, and changed to use functions from baseunix
instead of unit unix.
Again (for compiler, fcl, packages, ide) under $IFDEF VER1_0.
\emph on
(Renamefest II
\emph default
in CVS logs)
\end_layout
\begin_layout Enumerate
Functions both in baseunix and unix were removed from unit unix.
Unit baseunix was also extended a bit in this phase.
\end_layout
\begin_layout Enumerate
Unit unix was cleaned up and split up into multiple units (still in progress)
\end_layout
\begin_layout Enumerate
A possibility to recompile unix rtl using libc
\end_layout
\begin_layout Enumerate
Cleanup, redividing the unix unit over platform (in)dependent include files.
(mostly done)
\end_layout
\begin_layout Enumerate
Darwin port, beos port, more non x86 ports.
\end_layout
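\begin_layout Standard
The conditional uses clause from the first phase can be sketched as follows.
This is only an illustration under assumptions: the program name is made
up, the body is empty, and it is meant for a Unix-like target.
VER1_0 is the symbol that 1.0.x compilers define, so those still pull in
unit linux while newer compilers use unit unix.
\end_layout
\begin_layout LyX-Code
program renamefest; // hypothetical name
\end_layout
\begin_layout LyX-Code
{$ifdef ver1_0}
\end_layout
\begin_layout LyX-Code
uses linux; // 1.0.x compilers only know the old unit
\end_layout
\begin_layout LyX-Code
{$else}
\end_layout
\begin_layout LyX-Code
uses unix; // 1.1.x and later use the renamed unit
\end_layout
\begin_layout LyX-Code
{$endif}
\end_layout
\begin_layout LyX-Code
begin
\end_layout
\begin_layout LyX-Code
end.
\end_layout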
\begin_layout Subsection
Unix errorhandling
\end_layout
\begin_layout Standard
The rules of Unix errorhandling are quite easy:
\end_layout
\begin_layout Itemize
Each function call indicates somehow whether an error occurred, usually
by returning -1.
For other functions check the manpages.
(Typically these functions return a different type than a (C) integer.)
\end_layout
\begin_layout Itemize
You are only allowed to read the error variable (errno, cerrno, see below)
if the function indicates an error.
\end_layout
\begin_layout Standard
Besides compatibility there is another nice thing about this scheme: if an
error occurs, one can in some situations simply bail out of the function
with -1, like in the next example:
\end_layout
\begin_layout LyX-Code
Function somefunc:cint; // a
\begin_inset Quotes eld
\end_inset
unix
\begin_inset Quotes erd
\end_inset
function.
\end_layout
\begin_layout LyX-Code
Var st : Stat;
\end_layout
\begin_layout LyX-Code
Begin
\end_layout
\begin_layout LyX-Code
If FpStat('/',st)=-1 Then
\end_layout
\begin_layout LyX-Code
exit(-1); // exit, errno is already set by fpstat.
\end_layout
\begin_layout LyX-Code
...
more code...
\end_layout
\begin_layout LyX-Code
If FpRmdir('/')=-1 Then
\end_layout
\begin_layout LyX-Code
exit(-1); // exit, errno is already set by fprmdir.
\end_layout
\begin_layout LyX-Code
...
more code
\end_layout
\begin_layout LyX-Code
somefunc:=0;
\end_layout
\begin_layout LyX-Code
end;
\end_layout
\begin_layout LyX-Code
etc etc.
\end_layout
\begin_layout Subsubsection
The FPC errorhandling situation, errno and cerrno
\end_layout
\begin_layout Standard
FPC normally does its own system calls, and doesn't always link to libc,
which is why the FPC rtl needs its own, independent error variable.
However, when linking to libc or other libraries that use libc, it needs
access to the libc error variable too.
In theory, we could let FPC's syscalls write to libc's errno when libc is
(also) used, but since that could introduce subtle and hard to trace
compatibility problems, it was decided to keep both error variables separate
at all times, except when FPC doesn't do syscalls internally at all.
\end_layout
\begin_layout Standard
FPC's own error number is called errno and is accessible via unit baseunix;
libc's errno is accessible via unit initc, and is called cerrno.
If FPC uses libc for OS interfacing, then both errnos will point to the
libc errno.
\end_layout
\begin_layout Standard
What does this mean in practice? You need to know whether the function you
are calling is from a unit that is based on libc calls or one that can also
be based on (FPC internal) syscalls.
Then select the error code (errno, cerrno) accordingly.
So if you use unix, linux or similar units, you should get errno
(baseunix.fpgeterrno/fpseterrno); if you want it for e.g.
unit inet (a typical libc using unit), you need cerrno
(initc.fpgetCerrno/fpsetCerrno).
\end_layout
\begin_layout Standard
Don't worry about a syscall using unit that uses libc when compiled with
FPC_USE_LIBC; that is taken care of properly.
(With FPC_USE_LIBC, get/seterrno also update libc's errno.)
\end_layout
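\begin_layout Standard
As a minimal sketch of these rules, assuming a Unix target and a path that
does not exist (the path and program name are made up for illustration),
a failed call is detected through its return value, and only then is errno
read through unit baseunix:
\end_layout
\begin_layout LyX-Code
program errnodemo; // hypothetical name
\end_layout
\begin_layout LyX-Code
uses baseunix;
\end_layout
\begin_layout LyX-Code
var st : Stat;
\end_layout
\begin_layout LyX-Code
e : longint;
\end_layout
\begin_layout LyX-Code
begin
\end_layout
\begin_layout LyX-Code
if FpStat('/hypothetical/missing/path',st)=-1 then
\end_layout
\begin_layout LyX-Code
begin
\end_layout
\begin_layout LyX-Code
e:=fpgeterrno; // read errno only after the call reported failure
\end_layout
\begin_layout LyX-Code
if e=ESysENOENT then
\end_layout
\begin_layout LyX-Code
writeln('path does not exist')
\end_layout
\begin_layout LyX-Code
else
\end_layout
\begin_layout LyX-Code
writeln('fpstat failed, errno=',e);
\end_layout
\begin_layout LyX-Code
end
\end_layout
\begin_layout LyX-Code
else
\end_layout
\begin_layout LyX-Code
writeln('size: ',st.st_size);
\end_layout
\begin_layout LyX-Code
end.
\end_layout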
\begin_layout Subsection
Libc or syscall?
\end_layout
\begin_layout Standard
From time to time, people ask why FPC doesn't use libc and resorts to
syscalls instead.
\end_layout
\begin_layout Standard
There are several reasons for this, but the most important ones are the
constant small incompatibilities in (Linux) glibc, and the large number of
glibc versions in circulation (again, mainly for Linux).
This includes distributions that compile libc with special options and
then work around this in headers for C users.
\end_layout
\begin_layout Standard
Using libc by default would mean more than one binary distribution
per platform (mainly for Linux, but maybe also for other *nix OSes), without
much gain.
Contrary to what you would expect, the binaries would even become slightly
larger (about 10-40 kB) when dynamically linked with libc.
This is probably due to larger libc stubs and relocation data; if PIC is
used when linking to libc, the difference might be even larger.
Statically linked to libc, the binaries are huge.
\end_layout
\begin_layout Standard
Not being libc based also avoids some minor binary loader incompatibilities
that crop up, even if the libc is statically linked.
\end_layout
\begin_layout Standard
Another reason is that FPC programs have structures for use with certain
functions (like struct STAT) defined in the Pascal rtl, while one calls
the C function directly in libc.
A C program that calls the same libc function, always uses the right stat
because it uses headers supplied with the OS, at least as long as field
renaming is consistent
\begin_inset Foot
status collapsed
\begin_layout Plain Layout
And unfortunately automatic unattended conversion of C headers is not really
doable.
\end_layout
\end_inset
.
And you'll get a warning if it isn't.
But FPC always uses the same one in the RTL.
This can be problematic if there are multiple libc's in circulation that
use different versions of the structure.
(like a 64-bit filesystem version of STAT, and an ordinary one).
The kernel doesn't have this, since incompatible versions of the call always
get a different syscall number.
Sometimes this is possible for libc too (e.g.
by always using stat32 or so), but the libc situation is generally a bit
more difficult.
In the stat example, some distributions didn't support stat32 (to force
quicker migration to 64-bit fs)
\end_layout
\begin_layout Standard
However, all this doesn't mean that a compile option for a libc based rtl
wouldn't be nice, since linking to libc can be useful for
\end_layout
\begin_layout Itemize
porting purposes (to get the compiler working on a platform for the first
time), and platforms that are poorly maintained (QNX, BeOS).
\end_layout
\begin_layout Itemize
Darwin (Mac OS X), an OS where the syscalls are said to be a bit more in
a state of flux.
\end_layout
\begin_layout Itemize
debugging purposes: switch to libc and see if the problem disappears.
This works both ways (switch to syscalls to detect slight libc incompatibilities).
\end_layout
\begin_layout Itemize
saving space, e.g. programs like Lazarus will link to libc no matter what.
Having the RTL link to libc might save a few tens of kB per application.
The exact savings in such a case still have to be tested.
\end_layout
\begin_layout Itemize
Some functions can be
\begin_inset Quotes eld
\end_inset
enhanced
\begin_inset Quotes erd
\end_inset
in libc.
This applies especially to security and name resolving related functionality.
\end_layout
\begin_layout Standard
Moreover when done during a large restructure and considered during the
design (errno handling), introducing libc support isn't really a lot of
work.
(initial implementation, generic parts+FreeBSD, about 6-7 hours)
\end_layout
\begin_layout Standard
A solution would be a GUI installer (e.g.
in Lazarus) that bootstraps FPC, and allows configuring by simply toggling
switches.
However, such an app is a lot of work, and always keeps a bit of a DIY feel.
The FreeBSD ports system is also a natural fit, if somebody with enough
knowledge of it would step up.
\end_layout
\begin_layout Subsubsection
Basic libc Implementation
\end_layout
\begin_layout Standard
The units that are primarily affected by libc are system, baseunix and unix.
This is because these contain a lot of functions that are also in libc,
or access these via assembler aliases.
Secondary units are nearly all units that are based on syscall, like sockets,
ipc etc.
\end_layout
\begin_layout Standard
A global define FPC_USE_LIBC is introduced that signals
\begin_inset Quotes eld
\end_inset
use base functions from libc
\begin_inset Quotes erd
\end_inset
.
(-Ur might be necessary to avoid recompilation).
The syscall primitives remain available via unit syscall (and units other
than baseunix and unix should use unit syscall and not the aliases)
\end_layout
\begin_layout Standard
The 1.0.x compatibility unit
\begin_inset Quotes eld
\end_inset
oldlinux
\begin_inset Quotes erd
\end_inset
isn't touched, and always uses syscalls.
\end_layout
\begin_layout Subsubsection
pipe functions, popen/pclose, a problem?
\end_layout
\begin_layout Standard
At first it looked as if the pipe functions popen/pclose would become a
problem.
The FILE type used by these functions is the libc internal file structure.
Internally these are backed by plain files in libc, and the implementation
of these functions in FPC is trivial (using FPC's own internal file record).
\end_layout
\begin_layout Standard
A solution proposed by another core member could be to try keeping the
pointer type opaque and retrieve the kernel filehandle with fileno() to
be able to overload the popen functions with proper pascal filetypes,
at least on the platforms where fileno() is a function (and not only a
macro).
For the closing operation, the FILE pointer should be stored somewhere
in the pascal filerecord too.
\end_layout
\begin_layout Subsection
__errno, __error, _errno_location, h_errno etc.
\end_layout
\begin_layout Standard
C is an ancient language which is pretty much frozen due to the enormous
amounts of existing Unix code, and it doesn't have an in-language threadvar
system.
However, (c)errno is an important global variable that must be threadsafe.
This is solved in libc by using some form of macro that usually transforms
an errno access into a function call that returns a pointer to the actual
errno (of the right thread instance).
Macros don't exist after preprocessing, let alone compilation.
So when linking to libc we have to poke in the internals, and somehow use
the function that returns the pointer to errno directly.
This situation is far from ideal, but the problem is made worse by Unix
API designers who simply aren't aware of the existence of other languages.
\end_layout
\begin_layout Standard
The problem is that the name isn't uniform over platforms, even the free
ones.
FreeBSD calls it __error, NetBSD __errno and Linux __errno_location.
The initc.setcerrno/initc.getcerrno routines wrap this difference.
\end_layout
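\begin_layout Standard
The wrapping can be sketched roughly as below.
This is not the actual initc source but an illustration under assumptions:
the name errnolocation and the program are made up, only the three platforms
named above are covered (it will not compile elsewhere), and the symbol
names are the ones listed in the previous paragraph.
\end_layout
\begin_layout LyX-Code
program cerrnodemo; // hypothetical name
\end_layout
\begin_layout LyX-Code
uses ctypes;
\end_layout
\begin_layout LyX-Code
{$ifdef linux}
\end_layout
\begin_layout LyX-Code
function errnolocation : pcint; cdecl; external 'c' name '__errno_location';
\end_layout
\begin_layout LyX-Code
{$endif}
\end_layout
\begin_layout LyX-Code
{$ifdef freebsd}
\end_layout
\begin_layout LyX-Code
function errnolocation : pcint; cdecl; external 'c' name '__error';
\end_layout
\begin_layout LyX-Code
{$endif}
\end_layout
\begin_layout LyX-Code
{$ifdef netbsd}
\end_layout
\begin_layout LyX-Code
function errnolocation : pcint; cdecl; external 'c' name '__errno';
\end_layout
\begin_layout LyX-Code
{$endif}
\end_layout
\begin_layout LyX-Code
begin
\end_layout
\begin_layout LyX-Code
errnolocation^:=0; // write the calling thread's libc errno
\end_layout
\begin_layout LyX-Code
writeln('libc errno is now ',errnolocation^);
\end_layout
\begin_layout LyX-Code
end.
\end_layout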
\begin_layout Standard
h_errno is the symbol in the libc library for the non threadsafe variant,
and was used in 1.0.x.
Because it isn't threadsafe, newer glibc libraries seem to omit it.
By default, 1.1.x will use the threadsafe variants, but support for h_errno
is still available under an {$IFDEF } in the initc unit in case you need
to work with older libc's.
\end_layout
\begin_layout Standard
In general, avoid updating C-style error variables directly; always use
 either set/getcerrno or get/seterrno.
(The symbols errno and cerrno are OK; these call get/set(c)errno internally.)
\end_layout
\begin_layout Subsection
Exec() functions
\end_layout
\begin_layout Standard
The exec() functions have been replaced by the fpexec functions.
Moreover, platform-independent alternatives like TProcess and ExecuteProcess()
 are more mature now.
The old 1.0.x linux.exec() functions remained in 2.0.x as legacy functions,
 but were removed starting with 2.2.0.
\end_layout
\begin_layout Standard
The main idea behind all new functions is the use of
\begin_inset Quotes eld
\end_inset
array of ansistring
\begin_inset Quotes erd
\end_inset
for the arguments of the execl functions.
This means a programmer can specify each argument separately, and the arguments
 are then passed to the OS without copying.
The new way decreases the number of string operations, and avoids the problems
 that the old functions had with arguments and filenames that contain spaces.
The old functions have been fixed for the most basic quoting problems, though.
\end_layout
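\begin_layout Standard
Why an argument array sidesteps the quoting problems is easiest to see at
 the level of the underlying C call.
The following is a minimal sketch, assuming a POSIX system; run_argv is
 a hypothetical helper written for this illustration and is not an FPC or
 libc function.
\end_layout
\begin_layout LyX-Code
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` with a NULL-terminated argument vector and wait for it.
   Returns the child's exit code, 127 when execv() itself failed, or -1
   when fork()/waitpid() failed or the child died from a signal. */
int run_argv(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: every array element reaches the program as exactly one
           argument, so no quoting or splitting on spaces is involved. */
        execv(path, argv);
        _exit(127);   /* only reached when execv() failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
\end_layout
\begin_layout Standard
An argument such as a filename with a space in it is simply one element
 of the array.
A single command line string, by contrast, first has to be split into
 arguments, and that is where quoting goes wrong.
\end_layout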
\begin_layout Subsubsection
SysUtils.ExecuteProcess
\end_layout
\begin_layout Standard
The new execute functions are used in SysUtils.ExecuteProcess, which is the
 new platform-independent way of running a program
 (comparable to dos.exec, but without the 255 char limit).
\end_layout
\begin_layout Standard
The interface of the Dos unit is slowly becoming more uncomfortable because
 of shortstrings and DOSisms.
In general, it is currently recommended to use SysUtils as much as possible.
\end_layout
\begin_layout Subsection
The
\begin_inset Quotes eld
\end_inset
FP
\begin_inset Quotes erd
\end_inset
prefix
\end_layout
\begin_layout Standard
During the past years, I've been pestered again and again about both the
 need for and the choice of a prefix.
The new Unix rtl was written pretty much from scratch, but due to similar
 design requirements it resembled Carl's POSIX unit to a high degree.
The improved Unix typing was the biggest difference, as well as the splitting
 up into .inc files that resulted from supporting multiple platforms (posix
 only supported BeOS afaik).
\end_layout
\begin_layout Standard
IIRC, a prefix was already used in the predecessor, Carl's POSIX unit, but
 there it was
\begin_inset Quotes eld
\end_inset
POSIX_
\begin_inset Quotes erd
\end_inset
.
The reason for the prefix was pretty much to have one uniform rule to transform
the
\begin_inset Quotes eld
\end_inset
C
\begin_inset Quotes erd
\end_inset
name to the FPC one.
Having no prefix was dangerous because of clashes with the (then still supported)
 Linux unit, and in the long term also with other functions like write and
 read and with OS-specific units.
\end_layout
\begin_layout Standard
I didn't want anything with
\begin_inset Quotes eld
\end_inset
POSIX
\begin_inset Quotes erd
\end_inset
in the Unix rtl, because I didn't want to commit outright to POSIX compatibility
 due to the possible issues with macroed functionality, and with definition
 on the libc level (vs kernel level).
IOW, be close to POSIX, but never guarantee it.
\end_layout
\begin_layout Standard
In early versions of the new Unix rtl, the prefix was
\begin_inset Quotes eld
\end_inset
unx_
\begin_inset Quotes erd
\end_inset
, but this was considered ugly (and not Pascal-like due to the underscore).
\end_layout
\begin_layout Standard
Who actually decided on
\begin_inset Quotes eld
\end_inset
fp
\begin_inset Quotes erd
\end_inset
, and when, I don't know.
Probably it happened during the BBQ at Rosa and Joerg's place in France,
 where Carl, Michael and I shouted a bit about the Unix RTL, or in the correspondence
 to work out the details afterwards.
It could even have been somebody else on IRC who suggested it
 (Florian, Peter or Jonas being the main possibilities).
\end_layout
\begin_layout Standard
However, in retrospect I still stand by the need to add a prefix, and a
 short prefix is fine.
People often bang on about the
\begin_inset Quotes eld
\end_inset
confusion
\begin_inset Quotes erd
\end_inset
it causes, but IMHO the confusion would have been much worse if one had
 to explain the non-prefixed case: the exceptions, and the name clashes with
 the libc, linux and other platform-specific units.
Yes, the transition
\begin_inset Foot
status open
\begin_layout Plain Layout
The pains mostly were the
\begin_inset Quotes eld
\end_inset
renamefests
\begin_inset Quotes erd
\end_inset
and a good year of dual maintenance to keep code working with both the
 1.0.x and the 1.9.x branches until 2.0 came out.
Not only for the FPC project, but also for Lazarus.
However, with over 5 years between the 1.0 and 2.0 releases, there was no
 way to do it more gradually without compromising 1.0.x internal compatibility
 (though IMHO that wouldn't have been a bad thing).
\end_layout
\end_inset
hurt, but IMHO we are in a much more supportable place now.
\end_layout
\begin_layout Standard
In my opinion, the only way to do without a prefix would be a Modula-2-like
 extension (EXPORT QUALIFIED) that forces all relevant identifiers from
 the baseunix, unix and socket units to be qualified with the unit name.
\end_layout
\begin_layout Section
RTL layout
\end_layout
\begin_layout Standard
The following picture tries to explain some of the unit dependencies in
 the Unix rtl.
Of course, like all documentation, it is probably already outdated :-)
\end_layout
\begin_layout Standard
\begin_inset Graphics
filename deeperrtl.png
scale 40
\end_inset
\end_layout
\begin_layout Subsection
Includefiles
\end_layout
\begin_layout Subsubsection
Why so many includefiles and ifdefs?
\end_layout
\begin_layout Standard
There are many reasons why the FPC rtl is organised as it is.
Some of the reasons are:
\end_layout
\begin_layout Enumerate
The includefiles allow sharing of code used in multiple places, which eases
 maintenance.
\end_layout
\begin_layout Enumerate
A higher granularity of the source helps somewhat when working with CVS/SVN.
There is less chance that two people work on the same file and have to
 merge their changes, which is a problem with units that are thousands of
 lines long.
\end_layout
\begin_layout Enumerate
The implementation of a system unit uses a lot of OS-dependent types, records
 and functions that are also used in other units.
Includefiles and some tricks allow reusing the declarations, usually without
 increasing the size of the binaries.
In particular, the Unix system unit exports syscalls via an external alias
 mechanism.
See the separate paragraph about this subject.
The main reason for this is to control precisely how many and which symbols
 the system unit exports, since these are always visible.
\end_layout
\begin_layout Enumerate
Exactly which includefiles are OS-dependent, and which are not, is susceptible
 to change in the long run.
Moving an inc file is easier than totally revising the ifdef system of
 a huge unit.
\end_layout
\begin_layout Enumerate
The system must allow for exceptions.
This is why key units (like System, Baseunix, and in the future unix) are
 system-dependent, but include generic parts.
The idea is that a porter can say
\begin_inset Quotes eld
\end_inset
I want to implement these parts in a generic way by including the generic
includefiles
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
I want to override this functionality with my own code
\begin_inset Quotes erd
\end_inset
\end_layout
\begin_layout Enumerate
It accommodates the Pascal/Delphi tradition of stuffing all related headers
 into one unit, while still having one file per C header file, which eases
 header maintenance.
\end_layout
\begin_layout Subsubsection
\begin_inset Quotes eld
\end_inset
\noun on
improper
\noun default
\begin_inset Quotes erd
\end_inset
exporting from the Unix system unit.
\end_layout
\begin_layout Standard
As said in one of the previous paragraphs, the Unix system unit exports
 some OS-dependent functions via the [public, alias: 'xxx']; construct.
This construct is used to declare names without mangling.
This means that some OS-specific functions in system get a name in a namespace
 outside the Pascal realm, so that other units can import them, as you would
 from a DLL or from external code.
The functions are not exported by normal Pascal declarations, thus keeping
 the interface of the system unit OS-independent.
The
\begin_inset Quotes eld
\end_inset
client
\begin_inset Quotes erd
\end_inset
unit is mainly BaseUnix, but unit Unix also reuses a few functions from
system, and exports them.
\end_layout
\begin_layout Standard
This is all done to avoid duplication of functions between system on one
 side and baseunix/unix on the other side.
This saves a few tens of kilobytes.
The types (in ptypes.inc/ctypes.inc) are still imported twice, once in the
 implementation of system, once in the unixtype unit.
\end_layout
\begin_layout Subsection
Unixtype
\end_layout
\begin_layout Standard
The unit unixtype was introduced pretty late in the rearchitecting process.
Initially baseunix imported ptypes.inc and ctypes, but some platforms needed
 base Unix types below this level (e.g.
 in header units that were used to implement baseunix).
At first these units simply also included ptypes.inc, but this led to type
 incompatibility problems once more platforms were implemented
\begin_inset Foot
status open
\begin_layout Plain Layout
The restructuring was mostly carried out on FreeBSD, which pretty much only
 has 32-bit types in the kernel interface, in contrast to linux/x86, which
 also has 16-bit types.
\end_layout
\end_inset
.
The only solution was to move all types to a separate unit.
Since we still wanted to export all unix symbols from baseunix, after some
heated discussion, all types in ptypes.inc were aliased.
(see aliasptp.inc that aliases ptypes.inc and aliasctp.inc that aliases ctypes.inc)
\end_layout
\begin_layout Section
Remaining problems
\end_layout
\begin_layout Standard
Besides the problems already named (e.g.
 popen), there are some todos left.
Most of these surfaced while porting Kylix apps, in situations where an
 FPC substitute wasn't easily found:
\end_layout
\begin_layout Enumerate
64-bit file access.
The best way to do this is to have only a 64-bit interface, and to translate
 it internally on the few platforms that don't do 64-bit access.
\end_layout
\begin_layout Enumerate
Access to security data (the /etc/passwd and /etc/group files etc.).
This should be extracted and abstracted into a separate unit, with an FPC_USE_LIBC
 option, so that users can make sure their apps access this data via libc
 and tie in with all kinds of authentication systems.
\end_layout
\begin_layout Enumerate
Improve DNS resolving and accessing.
Netdb is not perfect yet, and needs a FPC_USE_LIBC option.
\end_layout
\begin_layout Enumerate
Kylixcomp unit for
\begin_inset Quotes eld
\end_inset
easy
\begin_inset Quotes erd
\end_inset
substitutes for certain constants, to ease libc->baseunix porting, that
 we don't want to expose in the proper RTL
 (these are hardly used in practice).
\end_layout
\begin_layout Enumerate
The Unicode primitives are also among the most used functions in unit libc.
The widestring manager has some support for these.
\end_layout
\begin_layout Enumerate
A lot of the transitional functionality still has to be phased out.
See separate paragraph.
\end_layout
\begin_layout Subsection
Solved problems
\end_layout
\begin_layout Enumerate
unit libc is no longer needed for dynamic loading of libraries (dynlibs)
\end_layout
\begin_layout Enumerate
unit libc is no longer needed for basic user/group querying (v2.2.2
\begin_inset Quotes eld
\end_inset
users
\begin_inset Quotes erd
\end_inset
package)
\end_layout
\begin_layout Enumerate
unit libc is no longer needed for Iconv calls (v2.2.4 iconvenc)
\end_layout
\begin_layout Subsection
Deprecating transitional functionality
\end_layout
\begin_layout Standard
A start has been made on removing the helpers and transitional functionality.
Some calls are marked and documented as deprecated in 2.2 and 2.2.2 (mantis
 #0011119), and will be removed in 2.3/2.4.
This is a bit hampered by the fact that not all symbols can be marked as
 deprecated yet.
\end_layout
\begin_layout Standard
Some of the deprecated functionality is listed below:
\end_layout
\begin_layout Enumerate
1.0.x fields of the Linux stat record that require an ifdef.
\end_layout
\begin_layout Enumerate
Non-fp socket functions.
These were buggy in some cases (formal parameter bug?).
\end_layout
\begin_layout Enumerate
Some non-fp functions in the Unix rtl.
\end_layout
\end_body
\end_document