\documentclass[11pt,final]{article}
\usepackage{a4wide}
\usepackage{graphicx}              % figures
\usepackage{listings}              % source code
\usepackage{hyperref}              % navigation
\usepackage{booktabs}
\usepackage[small]{caption}
\usepackage{color}
\usepackage{xspace}
\usepackage{mathptmx}
\setcounter{tocdepth}{1}

\parindent0mm                      % this looks better in short paragraphs
\definecolor{lightgray}{rgb}{.95,.95,.95}
\definecolor{middlegray}{rgb}{.55,.55,.55}
\lstset{
  basicstyle=\footnotesize\ttfamily,
  keywordstyle=\bfseries\ttfamily,
  commentstyle=\color{middlegray}\ttfamily,
  tabsize=2,
  numbers=none,
  numberstyle=\tiny,
  numberblanklines=false,
  stepnumber=1,
  numbersep=10pt,
  language=C,
  xleftmargin=0pt,
  backgroundcolor=\color{lightgray}
}

\newcommand{\classname}[1]{\emph{#1}}
\newcommand{\keyword}[1]{\lstinline{#1}}
\newcommand{\Gt}[0]{\emph{GenomeTools}\xspace}

\title{The \Gt Developer's Guide}
\author{Sascha Steinbiss, Gordon Gremme and Stefan
        Kurtz\thanks{please send comments to:
        \texttt{sascha@steinbiss.name}}}
\date{18/02/2016}

\pdftrailerid{}
\begin{document}
\maketitle
\tableofcontents


\section{Introduction}
This document describes design properties and coding guidelines for the \Gt
genome analysis system. The goal of the \Gt environment is to provide an
easily understandable, comprehensive and, most importantly, reusable set of
classes and modules to aid in the development of C-based bioinformatics
applications.

The expected gain in productivity can only be achieved if all components of
\Gt behave in a similar way, that is, in the way which is least surprising to
the user (who is, in this case, a programmer).
Thus we ask all developers contributing code to \Gt to adhere to a common
set of rules which make it easier for others to reuse the products of
everyone's hard work.


\section{Object-oriented design}
\subsection{Classes}

The central component type in \Gt is the \emph{class}. Structuring the C code
into classes and modules gives us a unified design approach which simplifies
thinking about design issues and avoids the code base becoming monolithic, a
problem often encountered in C programs.

\subsubsection{Simple classes}

For most classes, a simple class suffices. A simple class is a class which does
not inherit from other classes and from which no other classes inherit. Using
mostly simple classes avoids the problems of large class hierarchies, namely
the interdependence of classes which inherit from one another. The major
advantage of simple classes over simple C structs is information hiding.

\subsubsection{Implementing simple classes}
\label{simpleclasses}

We now describe how to implement a simple class, using the string class
\keyword{str.[ch]} of \Gt as an example. The interface to a class is always
given in the \keyword{.h} header file (\keyword{str.h} in our example). To
achieve information hiding, the header file must not contain implementation
details of the class. The implementation can always be found in the
corresponding \keyword{.c} file (\keyword{str.c} in our example). Therefore, we
start with the following C construct to define our \keyword{GtStr} class in
\keyword{str.h}:

\begin{lstlisting}
typedef struct GtStr GtStr;
\end{lstlisting}

This seldom-used feature of C introduces a new data type named
\classname{GtStr} which is a synonym for the \keyword{struct GtStr} data type,
whose definition need not be known at this point. In the scope of the header
file, objects of the new data type \keyword{GtStr} cannot be defined, since its
size is unknown to the compiler at this point. Nevertheless, pointers to
\keyword{GtStr} can still be declared, because C allows pointers to incomplete
structure types and all pointers to structure types have the same size and
representation. Using this fact, we can declare a constructor function:

\begin{lstlisting}
GtStr*          gt_str_new(void);
\end{lstlisting}

which returns a new string object, and a destructor function

\begin{lstlisting}
void          gt_str_delete(GtStr*);
\end{lstlisting}

which destroys a given string object. This gives us the basic structure of the
string class header file: A new data type (which represents the class and its
objects), a constructor function, and a destructor function.

\begin{lstlisting}
#ifndef STR_H
#define STR_H

/* the string class, string objects are strings which grow on demand */
typedef struct GtStr GtStr;

GtStr*        gt_str_new(void);
void          gt_str_delete(GtStr*);

#endif
\end{lstlisting}
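
To see why the incomplete type is sufficient for client code, consider the
following single-file sketch. The names \keyword{DemoCounter} and
\keyword{demo_counter_*} are hypothetical and not part of \Gt, and the standard
\keyword{malloc()} stands in for the \Gt allocation functions. The first part
plays the role of the header file, the last part that of the implementation
file:

\begin{lstlisting}
#include <assert.h>
#include <stdlib.h>

/* --- what a header would contain: the type stays incomplete --- */
typedef struct DemoCounter DemoCounter;

DemoCounter*  demo_counter_new(void);
void          demo_counter_increment(DemoCounter*);
unsigned long demo_counter_value(const DemoCounter*);
void          demo_counter_delete(DemoCounter*);

/* client code: compiles although the struct members are still unknown */
unsigned long demo_count_to(unsigned long n)
{
  unsigned long i, result;
  DemoCounter *c = demo_counter_new();   /* pointers to DemoCounter are fine */
  /* 'sizeof (DemoCounter)' or 'c->value' would be compile errors here */
  for (i = 0; i < n; i++)
    demo_counter_increment(c);
  result = demo_counter_value(c);
  demo_counter_delete(c);
  return result;
}

/* --- what the .c file would contain: the struct is completed --- */
struct DemoCounter {
  unsigned long value;
};

DemoCounter* demo_counter_new(void)
{
  DemoCounter *c = malloc(sizeof (DemoCounter));
  assert(c);                             /* sketch: abort if out of memory */
  c->value = 0;
  return c;
}

void demo_counter_increment(DemoCounter *c)
{
  assert(c);
  c->value++;
}

unsigned long demo_counter_value(const DemoCounter *c)
{
  assert(c);
  return c->value;
}

void demo_counter_delete(DemoCounter *c)
{
  if (!c) return;
  free(c);
}
\end{lstlisting}

Since the client function never sees the members of the struct, they can be
changed later without touching any code outside the implementation file.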

Now we look at the implementation side of the story, which can be found in the
\keyword{str.c} file. First, we include the \keyword{str.h} header file to
make sure that the newly defined data type is known:

\begin{lstlisting}
#include "str.h"
\end{lstlisting}

Then we define \keyword{struct GtStr} which contains the actual data of a
string object (the \emph{member variables} in object orientation lingo).

\begin{lstlisting}
struct GtStr {
  char *cstr;           /* the actual string (always '\0' terminated) */
  GtUword length; /* currently used length (without trailing '\0') */
  size_t allocated;     /* currently allocated memory */
};
\end{lstlisting}

Finally, we code the constructor

\begin{lstlisting}
GtStr* gt_str_new(void)
{
  GtStr *s = gt_malloc(sizeof (GtStr));    /* create new string object */
  s->cstr = gt_calloc(1, sizeof (char));   /* init the string with '\0' */
  s->length = 0;                           /* set the initial length */
  s->allocated = 1;                        /* set the initially
                                              allocated space */
  return s;                                /* return the new string object */
}
\end{lstlisting}

and the destructor

\begin{lstlisting}
void gt_str_delete(GtStr *s)
{
  if (!s) return;           /* return without action if 's' is NULL */
  gt_free(s->cstr);         /* free the stored C string */
  gt_free(s);               /* free the actual string object */
}
\end{lstlisting}

Our string class implementation so far looks like this:

\begin{lstlisting}
#include "core/ma_api.h"
#include "core/str_api.h"

struct GtStr {
  char *cstr;           /* the actual string (always '\0' terminated) */
  GtUword length; /* currently used length (without trailing '\0') */
  size_t allocated;     /* currently allocated memory */
};

GtStr* gt_str_new(void)
{
  GtStr *s = gt_malloc(sizeof (GtStr));    /* create new string object */
  s->cstr = gt_calloc(1, sizeof (char));   /* init the string with '\0' */
  s->length = 0;                           /* set the initial length */
  s->allocated = 1;                        /* set the initially
                                              allocated space */
  return s;                                /* return the new string object */
}

void gt_str_delete(GtStr *s)
{
  if (!s) return;           /* return without action if 's' is NULL */
  gt_free(s->cstr);         /* free the stored C string */
  gt_free(s);               /* free the actual string object */
}
\end{lstlisting}

Since these string objects are pretty much useless so far, we define a couple
more (object) methods in the header file \keyword{str.h} and add the respective
implementations to \keyword{str.c}.

Because C does not allow the traditional \keyword{object.methodname()} syntax
often used in object-oriented programming, we adopt the convention of always
passing the object as the first argument to the function
(\keyword{methodname(object, ...)}).

To make it clear that a function is a method of a particular class
\emph{classname}, we prefix the method name with \keyword{gt_<classname>_}.
That is, we get \keyword{gt_<classname>_methodname(object, ...)} as the generic
form of method names in C. The constructor is always called
\keyword{gt_<classname>_new()} and the destructor
\keyword{gt_<classname>_delete()}. See \keyword{str.c} for examples.
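
As an illustration of this naming convention, the following sketch adds an
append method and two accessors to a string class. It is a simplified stand-in
and not the actual \keyword{str.c}: the names \keyword{DemoStr} and
\keyword{demo_str_*} are hypothetical, the standard \keyword{malloc()} family
replaces the \Gt allocation functions, and the doubling growth strategy is an
assumption made for this example.

\begin{lstlisting}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoStr DemoStr;

struct DemoStr {
  char *cstr;        /* the actual string (always '\0' terminated) */
  size_t length;     /* currently used length (without trailing '\0') */
  size_t allocated;  /* currently allocated memory */
};

DemoStr* demo_str_new(void)
{
  DemoStr *s = malloc(sizeof (DemoStr));
  assert(s);
  s->cstr = calloc(1, sizeof (char));
  assert(s->cstr);
  s->length = 0;
  s->allocated = 1;
  return s;
}

/* method: the object is always the first argument */
void demo_str_append_cstr(DemoStr *s, const char *cstr)
{
  size_t cstrlen, needed;
  assert(s && cstr);
  cstrlen = strlen(cstr);
  needed = s->length + cstrlen + 1;          /* + 1 for the trailing '\0' */
  if (needed > s->allocated) {
    size_t new_size = s->allocated;
    char *p;
    while (new_size < needed)
      new_size *= 2;                         /* grow on demand */
    p = realloc(s->cstr, new_size);
    assert(p);
    s->cstr = p;
    s->allocated = new_size;
  }
  memcpy(s->cstr + s->length, cstr, cstrlen + 1);  /* copies the '\0', too */
  s->length += cstrlen;
}

const char* demo_str_get(const DemoStr *s)
{
  assert(s);
  return s->cstr;
}

size_t demo_str_length(const DemoStr *s)
{
  assert(s);
  return s->length;
}

void demo_str_delete(DemoStr *s)
{
  if (!s) return;
  free(s->cstr);
  free(s);
}
\end{lstlisting}

Callers only ever use the \keyword{demo_str_*} functions, so the growth
strategy remains an implementation detail which can be changed at any time.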

\subsubsection{Class scaffold code generation}
The boilerplate code needed to create the structure of a new class (header and
C source) can be generated automatically to avoid typical copy-and-paste errors.
In the \keyword{scripts/} subdirectory of the \Gt directory tree, there is a
helper script to create header files and a C source file with an implementation
scaffold for a given class name. Run \keyword{scripts/codegen help} to get more
information about its usage.

\subsection{Interfaces}

Interfaces allow several classes with possibly different implementations to
share a common set of methods that can be called independently of the actual
implementing class. Each implementing class must adhere to the interface method
signature (that is, the return type and the number and types of parameters) but
is otherwise free to implement the method as it sees fit.

In addition to the common interface functions, a class can also have its own
specific functions. To call an interface function, the object can simply be
cast to the interface type, and to call an implementation-specific function,
we cast it to the implementing type.
The following section describes the technique used to implement interfaces in C
such that objects can be cast from the interface type to the specific type
without problems.

Let's imagine we want an interface called \keyword{GtExample} which has a
method \keyword{gt_example_run(GtExample*)}. This corresponds to the following
header file \keyword{example.h} (source files are also available in the \Gt
source distribution):

\begin{lstlisting}
#ifndef EXAMPLE_H
#define EXAMPLE_H

typedef struct GtExample GtExample;

int  gt_example_run(GtExample*);
void gt_example_delete(GtExample*);

#endif
\end{lstlisting}

Note that there is no \keyword{gt_example_new()} constructor function, as the
constructors will be specific to the implementing classes.
Otherwise, this header is not much different from the header files for a simple
class. To make the methods implementable by more than one class,
we need a \emph{class object} describing the interface-to-implementation
mappings, that is, the specific functions to be called in the implementing
class. This class definition is given in an \keyword{example_rep.h} header
file, where ``rep'' stands for ``representation'':

\begin{lstlisting}
#ifndef EXAMPLE_REP_H
#define EXAMPLE_REP_H

#include <string.h>
#include "core/example.h"

typedef struct GtExampleClass GtExampleClass;

struct GtExampleClass {
  size_t size;
  int  (*run)(GtExample*);
  void (*delete)(GtExample*);
};

struct GtExample {
  const GtExampleClass *c_class;
};

GtExample* gt_example_create(const GtExampleClass*);
void*      gt_example_cast(const GtExampleClass*, GtExample*);

#endif
\end{lstlisting}

The \keyword{GtExampleClass} stores a function pointer to the specific function
implementing the \keyword{gt_example_run()} interface method. We also define a
\keyword{delete} function which is called when the implementing class needs to
do additional cleanup when an object of it is deleted. Given a
\keyword{GtExampleClass} filled with appropriate function pointers which
match the signatures, the \keyword{gt_example_create()} function then creates
an object which can be cast to both the interface type \keyword{GtExample*} as
well as the implementing type. To accomplish this, the size of the
implementing class is needed. The reason behind this will be explained below.

Note that this header file is meant to be private, that is, it should only be
included by code files which need to know about the interface-to-implementation
mappings. It is then straightforward to write the \keyword{example.c} file
which both
\begin{itemize}
\item
creates an object of the interface type, allocating memory to hold both a
pointer to a \keyword{GtExampleClass} object (needed for calling methods in the
interface context) and the members of the implementing class, and
\item
implements the interface methods by wrapping the implementation-specific
function pointers given in the \keyword{GtExampleClass} object:
\end{itemize}

\begin{lstlisting}
#include "example_rep.h"       /* we need access to the class struct */
#include "core/ma_api.h"           /* we need to allocate memory */

GtExample* gt_example_create(const GtExampleClass *ec)
{
  GtExample *e = gt_calloc(1, ec->size);   /* allocate memory */
  e->c_class = ec;                         /* assign interface */
  return e;
}

int gt_example_run(GtExample *e)
{
  gt_assert(e && e->c_class && e->c_class->run);
  return e->c_class->run(e);               /* call implementation-specific
                                              function */
}

void gt_example_delete(GtExample *e)
{
  if (!e) return;
  gt_assert(e && e->c_class);
  if (e->c_class->delete != NULL) {
    e->c_class->delete(e);                 /* delete implementation-specific
                                              members */
  }
  gt_free(e);                              /* delete interface */
}
\end{lstlisting}
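
The header \keyword{example_rep.h} also declares \keyword{gt_example_cast()},
whose definition is not shown above. One plausible way to implement such a
checked downcast is to compare the class pointer stored in the object with the
expected one. The following self-contained sketch uses hypothetical
\keyword{DemoIface} names and the standard \keyword{assert()}; it is an
assumption about how such a function could look, not the code from the \Gt
distribution.

\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

typedef struct DemoIface DemoIface;
typedef struct DemoIfaceClass DemoIfaceClass;

struct DemoIfaceClass {
  size_t size;                 /* size of the implementing struct */
  int (*run)(DemoIface*);
};

struct DemoIface {
  const DemoIfaceClass *c_class;
};

/* returns 1 if 'e' was created for the class 'ec', 0 otherwise */
int demo_iface_is_a(const DemoIfaceClass *ec, const DemoIface *e)
{
  assert(ec && e);
  return e->c_class == ec;
}

/* checked downcast: fails loudly instead of reinterpreting foreign memory */
void* demo_iface_cast(const DemoIfaceClass *ec, DemoIface *e)
{
  assert(demo_iface_is_a(ec, e));
  return e;
}

/* two class objects without behaviour, just to have distinct addresses */
static const DemoIfaceClass demo_class_a = { sizeof (DemoIface), NULL };
static const DemoIfaceClass demo_class_b = { sizeof (DemoIface), NULL };

const DemoIfaceClass* demo_class_a_get(void) { return &demo_class_a; }
const DemoIfaceClass* demo_class_b_get(void) { return &demo_class_b; }
\end{lstlisting}

Because every implementing class has exactly one class object, comparing
addresses is enough to identify the implementation of a given object.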

Now, let us have a look at how the implementing classes are written. Let's
imagine we want class \keyword{GtExampleA} to implement the \keyword{GtExample}
interface. As described in section~\ref{simpleclasses}, we need a class header
file \keyword{example_a.h}. It declares only a constructor, because objects are
deleted via the interface method \keyword{gt_example_delete()}:

\begin{lstlisting}
#ifndef EXAMPLE_A_H
#define EXAMPLE_A_H

#include "example.h"           /* declares the GtExample interface type */

typedef struct GtExampleA GtExampleA;

GtExample*  gt_example_a_new(void);

#endif
\end{lstlisting}

An implementation of the \keyword{GtExampleA} class in the \keyword{example_a.c}
source file then contains the code for the specific methods and their assignment
to the interface mapping. First, we need to include the headers to be able to
register our implementation-specific functions in the mapping struct:

\begin{lstlisting}
#include "example_a.h"
#include "example_rep.h"
\end{lstlisting}

Then, we define our \keyword{GtExampleA} class as usual, but leave enough space
for an instance of the interface class at the beginning of our definition:

\begin{lstlisting}
struct GtExampleA {
  GtExample parent_instance;
  GtUword my_property;
};
\end{lstlisting}

By placing an instance of the interface at the beginning of our implementation,
we allow the same pointer (to the beginning of the data structure) to be cast to
\begin{enumerate}
\item
a pointer to a \keyword{GtExample} interface implementation, so it can be used
safely with the \keyword{gt_example_*()} interface methods, restricting access
to the interface members only, and
\item
a pointer to the \keyword{GtExampleA} data structure, which can safely be used
with the \keyword{gt_example_a_*()} methods, ignoring the interface part and
allowing access to the implementation member variables only.
\end{enumerate}
Figure~\ref{fig:interfacememlayout} illustrates this concept.

\begin{figure}
\begin{center}
\includegraphics[width=.7\textwidth]{mlayout}
\end{center}
\caption{Memory layout used in the \keyword{GtExampleA} object starting at the
 memory location \keyword{e} implementing the \keyword{GtExample} interface.}
\label{fig:interfacememlayout}
\end{figure}

In the rest of \keyword{example_a.c}, we then code our implementation of the
\keyword{run} interface method:

\begin{lstlisting}
static int gt_example_a_run(GtExample *e) /* hidden from outside  */
{
  GtExampleA *ea = (GtExampleA*) e;       /* downcast to specific type */
  printf("%lu", ea->my_property);         /* run functionality */
  return 0;
}
\end{lstlisting}

Note that we cast our generic \keyword{GtExample*} pointer into a more specific
\keyword{GtExampleA*} pointer. We can do this because we can now be sure that
this function has been called on an object of the \keyword{GtExampleA} class.

We can be sure because we have registered this method as an implementation of
the \keyword{run} interface method by assigning it to the function pointer
variable in the \keyword{GtExampleClass} structure:

\begin{lstlisting}
/* map static local method to interface */
const GtExampleClass* gt_example_a_class(void)
{
  static const GtExampleClass ec = { sizeof (GtExampleA),
                                     gt_example_a_run,
                                     NULL };
  return &ec;
}
\end{lstlisting}

Note that we assign \keyword{NULL} to the \keyword{delete} function slot,
because we do not allocate any memory inside the implementing class that we
would need to free later.
Have a look at the \keyword{example_b.*} files in the \Gt source distribution
for an alternative implementation which allocates additional memory.

We can then use the \keyword{GtExampleClass} returned by this function to write
the \keyword{GtExampleA} constructor, which uses \keyword{gt_example_create()}
to allocate the needed space, initializes the private members and returns the
object:

\begin{lstlisting}
GtExample* gt_example_a_new(void)
{
  GtExample *e = gt_example_create(gt_example_a_class());
  GtExampleA *ea = (GtExampleA*) e;       /* downcast to specific type */
  ea->my_property = 3;                    /* access private implementation
                                             member */
  return e;
}
\end{lstlisting}

Now consider another implementation, \keyword{GtExampleB}, which also
implements this interface by creating a \keyword{GtExampleClass} with different
implementation-specific function pointers (see the \keyword{example_b.*} files
in the distribution).

Combining these implementations with the interface headers now allows us to do
the following:

\begin{lstlisting}
#include "example.h"       /* include the interface header */
#include "example_a.h"     /* include the implementation header */
#include "example_b.h"     /* include another implementation header */

int main(int argc, char *argv[])
{
  GtExample *my_e = gt_example_a_new();       /* create GtExampleA object, but
                                                 with interface type */
  gt_example_run(my_e);                       /* call an interface method */
  gt_example_delete(my_e);

  my_e = gt_example_b_new();                 /* reuse the variable for a
                                                GtExampleB object */
  gt_example_run(my_e);                       /* call an interface method */
  gt_example_delete(my_e);

  return 0;
}
\end{lstlisting}

That is, we can access two implementations via a common set of interface
methods.
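
The \keyword{delete} slot becomes relevant as soon as an implementation owns
additional memory. The following single-file sketch condenses the whole pattern
for such a case. All \keyword{DemoExample} and \keyword{demo_example_*} names
are hypothetical, the standard allocation functions replace the \Gt ones, and
the code is not taken from the \keyword{example_b.*} files.

\begin{lstlisting}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoExample DemoExample;

typedef struct {
  size_t size;
  int  (*run)(DemoExample*);
  void (*delete)(DemoExample*);
} DemoExampleClass;

struct DemoExample {
  const DemoExampleClass *c_class;
};

static DemoExample* demo_example_create(const DemoExampleClass *ec)
{
  DemoExample *e;
  assert(ec && ec->size >= sizeof (DemoExample));
  e = calloc(1, ec->size);          /* room for interface and implementation */
  assert(e);
  e->c_class = ec;
  return e;
}

int demo_example_run(DemoExample *e)
{
  assert(e && e->c_class && e->c_class->run);
  return e->c_class->run(e);
}

void demo_example_delete(DemoExample *e)
{
  if (!e) return;
  if (e->c_class->delete)
    e->c_class->delete(e);          /* free implementation-specific members */
  free(e);                          /* free the object itself */
}

/* an implementation which owns a heap-allocated buffer */
typedef struct {
  DemoExample parent_instance;      /* must be the first member */
  char *label;
} DemoExampleB;

static int demo_example_b_run(DemoExample *e)
{
  DemoExampleB *eb = (DemoExampleB*) e;
  return (int) strlen(eb->label);   /* stand-in for real functionality */
}

static void demo_example_b_delete(DemoExample *e)
{
  DemoExampleB *eb = (DemoExampleB*) e;
  free(eb->label);
}

DemoExample* demo_example_b_new(const char *label)
{
  static const DemoExampleClass ec = { sizeof (DemoExampleB),
                                       demo_example_b_run,
                                       demo_example_b_delete };
  DemoExample *e = demo_example_create(&ec);
  DemoExampleB *eb = (DemoExampleB*) e;
  assert(label);
  eb->label = malloc(strlen(label) + 1);
  assert(eb->label);
  strcpy(eb->label, label);
  return e;
}
\end{lstlisting}

The caller does not need to know about the buffer: a single call to
\keyword{demo_example_delete()} releases both the implementation-specific
memory and the object itself.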

\subsection{Modules}

Modules bundle related functions which do not belong to a class. Examples:
\begin{itemize}
\item
\keyword{dynalloc.h}, the low level module for dynamic allocation,
e.g.\@ used to implement arrays in \keyword{array.c} and the
above-mentioned strings
\item
\keyword{sig.h}, bundles signal related functions (high level)
\item
\keyword{xansi.h}, contains wrappers for the standard ANSI C library
\item
\keyword{xposix.h}, contains wrappers for POSIX functions we use
\end{itemize}

When designing new code, it is rarely necessary to introduce new modules.
Usually, defining a new class is the better approach.

\subsection{Unit tests}

Classes and modules should contain a \keyword{gt_<classname>_unit_test} function
which performs a unit test of the class/module and returns 0 in case of success
and -1 in case of failure. More information about how to write unit tests can
be found in section~\ref{unittests}.
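
The return value convention can be sketched as follows for a small hypothetical
module. The function \keyword{demo_reverse()} and the \keyword{DEMO_ENSURE}
macro are made up for this illustration; the facilities which \Gt itself
provides for writing unit tests are described in section~\ref{unittests} and
are not used here.

\begin{lstlisting}
#include <assert.h>
#include <string.h>

/* module function under test: reverse a string in place */
void demo_reverse(char *s)
{
  size_t i, len;
  assert(s);
  len = strlen(s);
  for (i = 0; i < len / 2; i++) {
    char tmp = s[i];
    s[i] = s[len - 1 - i];
    s[len - 1 - i] = tmp;
  }
}

/* sets 'had_err' to -1 if 'expr' does not hold;
   'expr' is not evaluated anymore after the first failure */
#define DEMO_ENSURE(had_err, expr) \
        do { if (!(had_err) && !(expr)) (had_err) = -1; } while (0)

/* returns 0 in case of success and -1 in case of failure */
int demo_reverse_unit_test(void)
{
  int had_err = 0;
  char empty[] = "", odd[] = "abc", even[] = "abcd";

  demo_reverse(empty);
  DEMO_ENSURE(had_err, strcmp(empty, "") == 0);

  demo_reverse(odd);
  DEMO_ENSURE(had_err, strcmp(odd, "cba") == 0);

  demo_reverse(even);
  DEMO_ENSURE(had_err, strcmp(even, "dcba") == 0);

  return had_err;
}
\end{lstlisting}

Collecting the result in a single \keyword{had_err} variable keeps the test
function free of early returns, so cleanup code can be placed at its end.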

\subsection{Tools}

A \emph{tool} is the most high-level type of component \Gt has to offer. Tools
are command line interface (CLI) applications linked into the single \keyword{gt}
binary. They make use of helper classes like the \keyword{GtOptionParser} to
make development of command line tools easier. Having a common interface for
option parsing and error reporting ensures a consistent user experience across
all \Gt tools, as they behave the same way when invoked from the command line.

There are two possible code paths for defining and implementing a tool; only
the newer approach will be described here.
Simply put, a tool is just another object which needs to implement a special
interface, providing callbacks for the \Gt runtime to call at predefined times
during the tool's invocation. An example of a simple tool can
be found in the \keyword{tools} subdirectory in the \keyword{gt_template.[ch]}
files.

First, a tool needs to define a structure to store its arguments. That is,
every command line parameter needs to be represented by a member in the struct
to store its value. For example, for a tool taking a boolean and a string
parameter, we would need the following:

\begin{lstlisting}
typedef struct {
  bool bool_option;
  GtStr  *str_option;
} ExampleToolArguments;
\end{lstlisting}

The tool also requires an initializer function which prepares the argument
structure for value assignment. For example, the \keyword{GtStr} in the above
example must be instantiated:

\begin{lstlisting}
static void* gt_example_tool_arguments_new(void)
{
  ExampleToolArguments *arguments = gt_calloc(1, sizeof *arguments);
  arguments->str_option = gt_str_new();
  return arguments;
}
\end{lstlisting}

as well as a destructor function which deletes the argument objects, if
necessary, and then frees the memory used for the argument struct:

\begin{lstlisting}
static void gt_example_tool_arguments_delete(void *tool_arguments)
{
  ExampleToolArguments *arguments = tool_arguments;
  if (!arguments) return;
  gt_str_delete(arguments->str_option);
  gt_free(arguments);
}
\end{lstlisting}

The argument structure is filled by an \emph{option parser}. An option parser
is an object which is passed an argument list, identifies parameter names
and values, and assigns them to the correct variables. It also handles
the creation of a convenient help output by documenting the purpose of each
option and its valid value range. See the interface documentation in
\keyword{src/core/option.h} for a list of possible option types.
A tool must contain a function returning an option parser object:

\begin{lstlisting}
static GtOptionParser* gt_example_tool_option_parser_new(void *tool_arguments)
{
  ExampleToolArguments *arguments = tool_arguments;
  GtOptionParser *op;
  GtOption *option;
  gt_assert(arguments);

  /* initialize with one-liner */
  op = gt_option_parser_new("[option ...] [file]",
                            "This is an example tool for demonstration "
                            "purposes.");

  /* -bool */
  option = gt_option_new_bool("bool",
                              "this is the boolean option",
                              &arguments->bool_option,
                              false);        /* default value */
  gt_option_parser_add_option(op, option);

  /* -string */
  option = gt_option_new_string("string",
                                "pass any string here",
                                arguments->str_option,
                                NULL);       /* default value */
  gt_option_parser_add_option(op, option);

  return op;
}
\end{lstlisting}

The option parser already performs initial validation of the parameters. For
example, it makes sure that a numeric parameter is not given a string value,
that unsigned values are never negative, or that probabilities stay between 0
and 1.
In an error case, tool invocation is stopped and the appropriate error message
is printed to \keyword{stderr}.

For more sophisticated error checking, for example involving several parameters
and their values at once, it is possible to write an argument checking function,
which can set an error message in a \keyword{GtError} object (see~\ref{errors})
and return a non-zero return value if an error was found:

\begin{lstlisting}
static int gt_example_tool_arguments_check(GT_UNUSED int rest_argc,
                                           void *tool_arguments,
                                           GT_UNUSED GtError *err)
{
  ExampleToolArguments *arguments = tool_arguments;
  int had_err = 0;
  gt_error_check(err);
  gt_assert(arguments);

  /* we assume that the string parameter must not be empty */
  if (gt_str_length(arguments->str_option) == 0) {
    gt_error_set(err, "parameter 'string' must not be empty!");
    had_err = -1;
  }

  return had_err;
}
\end{lstlisting}

In most cases, however, this function is not necessary and need not be
implemented.
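
To illustrate a check that involves more than one parameter, the following
standalone sketch rejects an inconsistent combination of two option values. It
does not use the \Gt API: \keyword{SketchArguments} and
\keyword{sketch_arguments_check()} are hypothetical names, and the message is
written into a plain character buffer instead of a \keyword{GtError} object.
The rule itself (a non-empty string is required whenever the boolean option is
set) is an assumption made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the argument struct of a tool. */
typedef struct {
  bool bool_option;
  const char *str_option;
} SketchArguments;

/* Returns 0 if the combination of values is consistent. Otherwise, a
   message is written to <msg> (of size <msg_size>) and -1 is returned. */
int sketch_arguments_check(const SketchArguments *arguments,
                           char *msg, size_t msg_size)
{
  int had_err = 0;
  assert(arguments && arguments->str_option && msg);

  /* assumed rule: 'string' must be given whenever 'bool' is set */
  if (arguments->bool_option && strlen(arguments->str_option) == 0) {
    snprintf(msg, msg_size, "option 'string' is required if 'bool' is set");
    had_err = -1;
  }

  return had_err;
}
```

In an actual tool, the same condition would be tested inside the argument
checking function, with \keyword{gt_error_set()} reporting the problem.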

The most important function which must be implemented in a tool is the runner.
The runner calls the code that actually performs the tool's function and is the
equivalent of the \keyword{main} function in a traditional C program.
Its signature is very similar to that of a typical C \keyword{main} function as
well, being passed the number of arguments \keyword{argc} and an array of
argument strings \keyword{argv}:

\begin{lstlisting}
static int gt_example_tool_runner(int argc, const char **argv,
                                 int parsed_args,
                                 void *tool_arguments,
                                 GT_UNUSED GtError *err)
\end{lstlisting}

In addition, it receives the number of arguments (\keyword{parsed_args}) which
were already parsed by the option parser, thus specifying an offset in the
argument array from which the rest of the arguments begin.
That is, if the parameter string was

\begin{lstlisting}
-bool true -string foo bar baz
\end{lstlisting}

then \keyword{parsed_args} would be 4, as the \keyword{-bool} and
\keyword{-string} options and their values have already been parsed,
leaving \keyword{argv[parsed_args] = 'bar'} and
\keyword{argv[parsed_args+1] = 'baz'} to be handled by the runner.
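
A runner typically loops over the remaining arguments, starting at index
\keyword{parsed_args}. The following is a minimal sketch of such a loop using
only the C standard library; the function name \keyword{sketch_handle_rest()}
is hypothetical and the loop body merely prints each argument.

```c
#include <assert.h>
#include <stdio.h>

/* Prints all arguments which were not consumed by option parsing and
   returns their number. <parsed_args> is the index of the first
   unparsed argument in <argv>. */
int sketch_handle_rest(int argc, const char **argv, int parsed_args)
{
  int i, rest = 0;
  assert(argv);
  assert(parsed_args >= 0 && parsed_args <= argc);

  for (i = parsed_args; i < argc; i++) {
    printf("remaining argument: %s\n", argv[i]);
    rest++;
  }

  return rest;
}
```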

The rest of the runner function could look like this:

\begin{lstlisting}
static int gt_example_tool_runner(int argc, const char **argv,
                                 int parsed_args,
                                 void *tool_arguments,
                                 GT_UNUSED GtError *err)
{
  ExampleToolArguments *arguments = tool_arguments;
  int had_err = 0;
  gt_error_check(err);
  gt_assert(arguments);

  if (arguments->bool_option)
    printf("the bool option was set\n");
  printf("the string was '%s'\n", gt_str_get(arguments->str_option));

  return had_err;
}
\end{lstlisting}

Finally, the functions described above are registered in the new tool object
by using \keyword{gt_tool_new()} to create a new \keyword{GtTool} instance,
passing pointers to all the static callback functions:

\begin{lstlisting}
GtTool* gt_example_tool(void)
{
  return gt_tool_new(gt_example_tool_arguments_new,
                     gt_example_tool_arguments_delete,
                     gt_example_tool_option_parser_new,
                     gt_example_tool_arguments_check,
                     gt_example_tool_runner);
}
\end{lstlisting}

Let's assume that we have saved the implementation above in
\keyword{tools/gt_example_tool.c}. We then make the \keyword{gt_example_tool()}
function public by adding a header file \keyword{tools/gt_example_tool.h}:

\begin{lstlisting}
#ifndef GT_EXAMPLE_TOOL_H
#define GT_EXAMPLE_TOOL_H

#include "core/tool_api.h"

/* the example tool */
GtTool* gt_example_tool(void);

#endif
\end{lstlisting}

This function can then be added to the \Gt toolbox by adding the following lines
to \keyword{gtt.c}:

\begin{lstlisting}
...
#include "tools/gt_example_tool.h"
...
GtToolbox* gtt_tools(void)
{
  ...
  gt_toolbox_add_tool(tools, "example", gt_example_tool());
  ...
}
\end{lstlisting}

After compilation, we can then run our tool by calling

\begin{lstlisting}[language=sh]
$ gt example -bool true -string foo bar baz
\end{lstlisting}%$

\section{Directory structure}
All directories listed below are subdirectories of the root directory of
the \Gt source distribution.
\begin{itemize}
\item[\texttt{bin/}] This subdirectory contains the \Gt binary executable \texttt{gt}
     as dynamic and static variants as well as the example executables built
     from \texttt{src/examples}. This directory is only populated after a
     \texttt{make} run. Running \texttt{make cleanup} will remove its contents.
\item[\texttt{doc/}] This subdirectory contains documentation such as this
     developer's guide, license information, format specifications, and the
     user manuals for the software tools included with the \Gt .
\item[\texttt{gtdata/}]
     This subdirectory contains data needed for the \Gt to run which are not
     compiled into the \Gt binary itself, such as
     \begin{itemize}
       \item texts for the tool on-line help (in \texttt{gtdata/doc}),
       \item Lua code for documentation generation (in \texttt{gtdata/modules}),
       \item ontology definition files (in \texttt{gtdata/obo\_files}),
       \item \emph{AnnotationSketch} defaults (e.g.\@ a default style file, in
              \texttt{gtdata/sketch}), and
       \item alphabet definition files for character mappings
             (in \texttt{gtdata/trans}).
     \end{itemize}
\item[\texttt{gtpython/}]
     This subdirectory contains the Python bindings to selected parts of the \Gt
     library, as well as the Python test suite. See the \texttt{README} file in
     this directory for installation instructions.
\item[\texttt{gtruby/}]
     This subdirectory contains the Ruby bindings to selected parts of the \Gt
     library. See the \texttt{README} file in this directory for installation
     instructions.
\item[\texttt{gtscripts/}]
     This subdirectory contains a number of Lua scripts written using the \Gt
     Lua bindings; most prominently the \texttt{gtdoc.lua} script to generate
     the documentation. These scripts can be run using the \texttt{gt}
     executable, which is a Lua interpreter as well, by giving the script name
     instead of a tool name.
\item[\texttt{lib/}]
     This subdirectory contains the \Gt static and dynamic libraries when built.
\item[\texttt{obj/}]
     This subdirectory contains object files as they are created during \Gt
     compilation.
\item[\texttt{scripts/}]
     This subdirectory contains useful scripts for \Gt developers.
\item[\texttt{src/}]
     This subdirectory contains the main \Gt source tree. In particular, there
     are a number of subdirectories:
     \begin{itemize}
       \item the \texttt{src/annotationsketch} subdir contains
             \emph{AnnotationSketch} code for genome annotation drawing,
       \item the \texttt{src/core} subdir contains general code, i.e.\
             basic data structures, memory management, file access, encoded
             sequences, sequence parsers, tool runtime, option parser,
             multithreading, etc.,
       \item the \texttt{src/examples} subdir contains simple example
             applications built on \Gt (streams, or a GUI app),
       \item the \texttt{src/extended} subdir contains code for annotation
             handling and parsing, stream processing, alignment, chaining, etc.,
       \item the \texttt{src/external} subdir contains third-party source code
             which is distributed with the \Gt source and built alongside the
             \Gt ,
       \item the \texttt{src/gth} subdir with \emph{GenomeThreader} code,
       \item the \texttt{src/gtlua} subdir with Lua bindings for some of the \Gt
             classes and modules,
       \item the \texttt{src/ltr} subdir with LTR retrotransposon prediction and
             annotation code,
       \item the \texttt{src/match} subdir with code for index structure
             construction and access, short read mapping, matching algorithms
             etc.,
       \item the \texttt{src/mgth} subdir contains \emph{MetaGenomeThreader}
             code,
       \item the \texttt{src/patches} subdirectory with platform-specific
             patches, and
       \item the \texttt{src/tools} subdir with code for all the tools included
             with \Gt .
     \end{itemize}
\item[\texttt{testdata/}]
     This subdirectory contains test data used in the testsuite. Please refrain
     from storing large files ($>1$MB) in this directory, but use the
     \keyword{gttestdata} repo instead (see~\ref{gttestdata}). Special
     subdirectories:
     \begin{itemize}
       \item the \texttt{testdata/gtscripts} subdir contains test scripts used
       in the Lua test cases,
       \item the \texttt{testdata/gtruby} subdir contains test scripts used
       in the Ruby test cases, and
       \item the \texttt{testdata/gtpython} subdir contains test scripts used
       in the Python test cases.
     \end{itemize}
\item[\texttt{testsuite/}]
     This subdirectory contains the test suite definitions as Ruby files as
     well as the test engine and temporary data created during test runs.
     After starting a test suite run, the \keyword{testsuite/stest_testsuite}
     subdirectory then contains a directory named \keyword{test<n>} for each
     test, where \keyword{<n>} is the test number. See~\ref{testdefinitions}
     for more details.
\item[\texttt{www/}]
     This subdirectory contains the content of the \Gt website.
\end{itemize}


\section{Public APIs}

In \Gt , we distinguish between \emph{public} and \emph{non-public} application
programming interfaces (APIs). The API describes the classes and modules
belonging to the \Gt and their methods and functions, in particular their
signatures; that is, their name, return value, and number and types of their
parameters.

The public API is a subset of the \Gt library which is intended to be used by
developers who do not belong to the \Gt core development team, and is fairly
high-level at this point. To ensure compatibility with future versions of the
\Gt library, the public API is supposed to be subject to as little change as
possible. That is, interface changes should be made very sparingly, and
interface design should `look forward' to make such changes unnecessary. For
example,
interface functions which could fail in theory should receive error handling
facilities in their signature (such as a return code and a \keyword{GtError}
object), even if their current (and maybe only) implementation cannot fail. This
leaves room for implementations that \emph{may} fail without having to break the
API when the new implementation finds its way into the \Gt .

All public API functions for a given class must be declared in a prototype
header file named \keyword{<class>_api.h}. This header file must only include
other public API headers. All of the public API header files are packaged to be
distributed with the \Gt tarball and are installed into the given include path.
That is, the functions declared in them are later accessible by including
\keyword{genometools.h} only.

It is important to note that all functions in the public header files must be
properly documented (see section~\ref{documentation}).

\section{Coding style}

\subsection{General rules}
\begin{itemize}
\item
No line in the source code may be longer than 80 characters.
This allows proper formatting of the code.
\item
There must not be more than one consecutive empty line in the source code.
\item
Trailing spaces are disallowed.
\item
There must not be a comma at the beginning of a line.
\item
Unless it is at the end of a line, a comma should be followed by a space.
Example:

\begin{lstlisting}
cmpfunc(gt_array_get(a, idx), gt_array_get(b, idx));
\end{lstlisting}

instead of

\begin{lstlisting}
cmpfunc(gt_array_get(a,idx),gt_array_get(b,idx));
\end{lstlisting}

\item
The symbols `\keyword{=}', `\keyword{==}' and `\keyword{!=}' should be
enclosed by spaces. That is, write

\begin{lstlisting}
i = 0;
\end{lstlisting}

instead of

\begin{lstlisting}
i=0;
\end{lstlisting}
\item
There must be a space between the keywords \keyword{for}, \keyword{if},
\keyword{sizeof}, \keyword{switch}, \keyword{while} and \keyword{do} and the
following parenthesis.
\item
The opening braces (\keyword{\{}) after the keywords \keyword{if},
\keyword{else}, \keyword{for}, \keyword{do}, and \keyword{while} should be on
the same line as the keyword.
\item
The curly braces following an \keyword{if} or \keyword{else} expression should
be omitted if the expression and the (single) following statement both fit on
a single line.
\item
The keyword \keyword{else} should be placed on a separate line.
\item
Semantic blocks (statements inside loops, function definitions, etc.) must be
indented by exactly two spaces w.r.t.\@ the enclosing block.
This explicitly means \emph{no tabs}; configure your editor accordingly!

Here is an example:
\begin{lstlisting}
bool
gt_array_equal(const GtArray *a, const GtArray *b, GtCompare cmpfunc)
{
  GtUword idx, size_a, size_b;
  int cmp;
  gt_assert(gt_array_elem_size(a) == gt_array_elem_size(b));
  size_a = gt_array_size(a);
  size_b = gt_array_size(b);
  if (size_a < size_b)
    return false;
  if (size_a > size_b)
    return false;
  for (idx = 0; idx < size_a; idx++) {
    cmp = cmpfunc(gt_array_get(a, idx), gt_array_get(b, idx));
    if (cmp != 0)
      return false;
  }
  return true;
}
\end{lstlisting}
\item
Use the \keyword{scripts/src_check} and \keyword{scripts/src_clean} scripts
regularly to check your source code for style violations.
\item
Use the \keyword{scripts/pre-commit} git hook to automatically run a
\keyword{src_check}
before each commit. The commit will be canceled if errors are found.\\
To enable the git hook, copy the file \keyword{scripts/pre-commit} into the
\keyword{.git/hooks} subdirectory of your \Gt repository.
\item
Static variables inside functions are not allowed. Class structs, which must
be static, are an exception.
\item
All functions except those which should be callable publicly should be declared
as \keyword{static}. All non-static functions must be documented in a
header file.
Think twice before making a function public. Its interface should be clean
enough to be understood by someone who does not know implementation details!
\item
If a \Gt module or class exists for your particular need, use it instead of
using more low-level means (e.g.\@ try to use \keyword{GtFile} and friends for
file access instead of \keyword{fopen()}/\keyword{fclose()}/\dots\@ directly).
Consult the documentation and the header files!
\end{itemize}

\subsection{Global variables}
\begin{itemize}
\item
Generally, global variables are not allowed.
\item
There are exceptions in very rare cases, in which it must be ensured that the
content in question is initialized, synchronized for multithreaded use and
properly cleaned up. Do not add global variables without talking to one of the
core developers!
\end{itemize}

\subsection{Types}
\begin{itemize}
\item
Use unsigned types wherever possible. We need to process large amounts
of data and we may need every bit to process it.
\item
Use \keyword{GtUword} for sequence positions/lengths/offsets/\dots .
The \keyword{GtUword} type equals the word size on all common systems
(i.e., it is 32 bits wide on 32-bit systems and 64 bits wide on 64-bit systems)
which makes it ideal for most use cases.
\item
Use \keyword{GtStr} and \keyword{GtArray} instead of manipulating byte arrays
directly if possible, especially when returning strings or item collections
from a function.
\end{itemize}

\subsection{Naming rules}
\begin{itemize}
\item
Class names must begin with \keyword{Gt}, e.g.\@ \keyword{GtArray},
\keyword{GtNodeStream}.
\item
Class names may use camel
case\footnote{\url{http://en.wikipedia.org/wiki/CamelCase}} if multiple words
are required, e.g.\@ \keyword{GtNodeStream}.
\item
Source and header files for a class must start with the class name in
lowercase, without the \keyword{Gt} prefix. If camel case is used in the class
name, use underscores in the respective file name, e.g.\
\keyword{GtNodeStream} $\to$ \keyword{node_stream.[ch]}.
\item
Variable names and function names must be all lower-case.
\item
Variable names and function names should use underscores to separate words
(e.g.\@ use the function name \keyword{get_first_five_chars} instead of
\keyword{getfirstfivechars}).
\item
The names of public (i.e.\@ non-static) functions must be prefixed by the string
`\keyword{gt_}' to avoid namespace clashes when linking the \Gt library with
third-party code.
\item
Class or module names must follow the `\keyword{gt_}' part in the function name.
\item
In method signatures, the object on which the method is called must always be
the first argument of the method.
That is, method \keyword{method} in class \keyword{GtClass} must be defined as:
\begin{lstlisting}
<type> gt_class_method(GtClass*, <params>);
\end{lstlisting}
\end{itemize}

\subsection{Copyright lines}
\label{copyright}
\begin{itemize}
\item
Every header and C source file must begin with a comment containing author and
license information:
\begin{lstlisting}
/*
Copyright (c) 2007-2010 Gordon Gremme <gordon@gremme.org>
Copyright (c) 2007-2008 Center for Bioinformatics, University of Hamburg

Permission to use, copy, modify, and distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
\end{lstlisting}
\item
Each developer with substantial contributions (for example, by implementing new
features, refactoring, or fixing bugs which require large rewrites) should
give his/her name and email address, updating the given year ranges in the
process.
\item The ``Center for Bioinformatics'' copyright line is only required for
developers employed by it, because only in this case does the University gain
any copyright.
\end{itemize}

\subsection{Header files}
\begin{itemize}
\item
Use conditional inclusion to avoid including the same header file multiple
times. Between the copyright header and the beginning of the declarations, put
the following:
\begin{lstlisting}
#ifndef FILENAME_H
#define FILENAME_H
\end{lstlisting}
and put an
\begin{lstlisting}
#endif
\end{lstlisting}
at the end of the header file.
This causes the preprocessor to include the section between the
\keyword{#ifndef} and \keyword{#endif} only once.
\item
The identifier to be \keyword{#define}d must be the filename of the header
file in uppercase with all non-alphanumeric characters replaced by underscores.
\end{itemize}
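
The naming rule for the guard identifier can be expressed as a small function.
The following sketch is only meant to make the rule concrete;
\keyword{sketch_guard_name()} is a hypothetical helper and not part of \Gt .

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Writes the include guard identifier for <filename> to <guard>, which
   must provide space for at least strlen(filename) + 1 characters.
   Letters are converted to uppercase, all characters which are not
   alphanumeric are replaced by underscores. */
void sketch_guard_name(char *guard, const char *filename)
{
  size_t i, len;
  assert(guard && filename);

  len = strlen(filename);
  for (i = 0; i < len; i++) {
    unsigned char c = (unsigned char) filename[i];
    guard[i] = isalnum(c) ? (char) toupper(c) : '_';
  }
  guard[len] = '\0';
}
```

For the file name \keyword{node_stream.h}, this yields the identifier
\keyword{NODE_STREAM_H}.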

\subsection{Comments and documentation}
\label{documentation}
\begin{itemize}
\item
Comments are written in plain C style (\keyword{/* ... */}). C++-style
comments (\keyword{// ...}) are disallowed.
\item
The public API files (header files ending in \keyword{*_api.h}) are examined for
automatic generation of API documentation. The following annotations are
supported:
\begin{itemize}
\item
Each type (e.g.\@ class) should be directly preceded by a comment describing the
purpose of the class. Example:

\begin{lstlisting}
/* <GtArray*> objects are generic arrays for elements of a certain
   size which grow on demand. */
typedef struct GtArray GtArray;
\end{lstlisting}

\item
Each function (method) should be directly preceded by a comment describing the
purpose of the function, its preconditions, return value, and parameters.
Example:

\begin{lstlisting}
/* Add element <elem> to <array>. The size of <elem> must equal the
   given element size when the <array> was created and is determined
   automatically with the <sizeof> operator. */
#define       gt_array_add(array, elem) \
              gt_array_add_elem(array, &(elem), sizeof (elem))
/* Add element <elem> with size <size_of_elem> to <array>.
   <size_of_elem> must equal the given element size when the <array>
   was created. Usually, this method is not used directly and the
   macro <gt_array_add()> is used instead. */
void          gt_array_add_elem(GtArray *array, void *elem,
                                size_t size_of_elem);
\end{lstlisting}
\item
Code keywords (parameters, class names, references to other functions) can be
marked by putting them between angle brackets (\keyword{<}\dots\keyword{>}).
Also, keywords can be marked as strong (bold) by putting them between
three underscores (\keyword{___}\dots\keyword{___}) or emphasized (italic) by
using two underscores (\keyword{__}\dots\keyword{__}).
\end{itemize}
\item
The comment line should briefly describe
\begin{itemize}
\item
what the method does (e.g.\@ ``Calculates and returns X\dots'',
``Delivers the next element in order\dots'', ``Adds element X\dots''),
\item
what \emph{all} parameters are supposed to be (ideally given in the context of
what the method does),
\item
what the return value is,
\item
potential side-effects, and
\item
if the function receives or returns a pointer, whether ownership is taken or
retained for the accepted or returned memory (i.e.\@ ``takes ownership of
parameter X'' or ``X must be deleted by the caller'').
\end{itemize}
\item
Do not group multiple functions beneath one comment line, even if they are
largely similar and differ only in minor details. If required, repeat the
comment above each function to ensure that each of them gets an entry in the
generated documentation.
\item
Stars (\keyword{*}) are \emph{not} to be continued on every line of the comment,
as many editors like to do by default. Doing so anyway will lead to interspersed
star symbols in the generated output.
\item
Please refrain from using other markup formats such as Doxygen or Javadoc style.
It may look alright in the plain text, but will most certainly come out weird in
the generated LaTeX or HTML documents.
\item
Use the \keyword{docs} target in the \Gt Makefile to build the API
documentation, which is then found in the \keyword{www/genometools.org/htdocs}
subdirectory (\keyword{libgenometools.html}).
\end{itemize}

\subsection{Function pointers}
\begin{itemize}
\item
Function pointer declarations used in public headers (e.g.\@ to be used as method
arguments) should always be \keyword{typedef}'ed to an identifier prefixed with
\keyword{Gt}. An example:
\begin{lstlisting}
typedef int  (*GtCompareWithData)(const void*, const void*, void *data);
\end{lstlisting}

The function pointer type can then be used in a sort function this way:

\begin{lstlisting}
void gt_qsort_r(void *a, size_t n, size_t es, void *data,
                GtCompareWithData cmp);
\end{lstlisting}

instead of:

\begin{lstlisting}
void gt_qsort_r(void *a, size_t n, size_t es, void *data,
                int (*cmp)(const void*, const void*, void *data));
\end{lstlisting}

which makes the headers much harder to read.
\item
If callback functions need additional data to work, provide an additional
\keyword{void*} pointer to pass external data to the callback function
(as, for example, done with the \keyword{data} parameter in the function
\keyword{gt_qsort_r()} above). Do not use global variables for that purpose!
\end{itemize}
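
The following standalone sketch shows both recommendations together: a
\keyword{typedef}'ed function pointer type and a \keyword{void*} pointer which
carries external data into the callback. All names
(\keyword{SketchCompareWithData}, \keyword{sketch_int_compare()},
\keyword{sketch_sort_ints()}) are hypothetical, and a simple insertion sort
stands in for \keyword{gt_qsort_r()}, whose implementation is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical comparator type, modelled on GtCompareWithData. */
typedef int (*SketchCompareWithData)(const void*, const void*, void *data);

/* Compares two ints. <data> must point to a size_t which counts the
   number of comparisons made; no global variable is needed for that. */
int sketch_int_compare(const void *a, const void *b, void *data)
{
  const int *ia = a, *ib = b;
  size_t *calls = data;
  assert(ia && ib && calls);
  (*calls)++;
  return (*ia > *ib) - (*ia < *ib);
}

/* Sorts the <n> elements of <arr> in ascending order (insertion sort),
   passing <data> unchanged to every call of <cmp>. */
void sketch_sort_ints(int *arr, size_t n, void *data,
                      SketchCompareWithData cmp)
{
  size_t i, j;
  assert(arr && cmp);
  for (i = 1; i < n; i++) {
    int key = arr[i];
    for (j = i; j > 0 && cmp(&arr[j - 1], &key, data) > 0; j--)
      arr[j] = arr[j - 1];
    arr[j] = key;
  }
}
```

The caller owns the counter and passes its address as \keyword{data}, so the
callback stays free of global state and can be used from several places at
once.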

\subsection{\texttt{\#define}s}
\begin{itemize}
\item
Identifiers introduced by a \keyword{\#define} statement should be completely
written in upper case letters. This also holds for the arguments of
macros.
\item
Identifiers introduced by public \keyword{\#define}s should be prefixed with
\keyword{GT_}.
\item
In public headers, \keyword{\#define} statements should only set constants.
Macros defining code parts should only be used -- very sparingly -- locally
within a class or module implementation.
\end{itemize}

\subsection{Information hiding}
In \Gt, we try to adhere to object-oriented design guidelines. An integral
part of these is the enforcement of \emph{information hiding} or
\emph{encapsulation}, that is, making access to an object's internal state
only possible through accessor functions.

From this, it follows directly that the use of `open' structs, that is, C
structures defined in public headers, is strongly discouraged! Structures (and
structure member accesses) should only be implemented within a C file. This
applies to both structures used to implement classes as well as structures used
as auxiliary data structures.

Notable exceptions are structs implementing classes which only act as
containers, i.e.\@ in which the only methods manipulating the members would be
setter and getter methods (see \keyword{core/range_api.h}).
However, even those structures should provide accessor, constructor and
destructor methods so they can safely be created, accessed and deleted from
scripting language bindings lacking support for proper C structure access.
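
As a minimal sketch of this principle, the following code declares an opaque
type together with its constructor, accessors and destructor, and defines the
structure only in the implementation part. For brevity both parts are shown in
one block; in practice they would live in a header and a C file. The class
\keyword{SketchCounter} and its methods are hypothetical and use plain
\keyword{calloc(3)}/\keyword{free(3)} to stay independent of the \Gt library.

```c
#include <assert.h>
#include <stdlib.h>

/* -- header part (hypothetical sketch_counter.h) -- */
typedef struct SketchCounter SketchCounter;

SketchCounter* sketch_counter_new(void);
void           sketch_counter_increment(SketchCounter *counter);
unsigned long  sketch_counter_value(const SketchCounter *counter);
void           sketch_counter_delete(SketchCounter *counter);

/* -- implementation part (hypothetical sketch_counter.c) -- */
struct SketchCounter {
  unsigned long value;
};

SketchCounter* sketch_counter_new(void)
{
  SketchCounter *counter = calloc(1, sizeof *counter);
  if (!counter)
    abort(); /* keeps the sketch simple */
  return counter;
}

void sketch_counter_increment(SketchCounter *counter)
{
  assert(counter);
  counter->value++;
}

unsigned long sketch_counter_value(const SketchCounter *counter)
{
  assert(counter);
  return counter->value;
}

void sketch_counter_delete(SketchCounter *counter)
{
  if (!counter) return;
  free(counter);
}
```

Code which only sees the header part cannot access \keyword{value} directly,
so the representation can change without affecting callers.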



\section{Error handling}
We distinguish between \emph{programming errors} and \emph{run-time errors}.
Programming errors occur when the developer uses resources in an inherently
erroneous way, e.g.\@ passing a null pointer where a valid pointer is expected,
passing incorrect length values to string comparison functions, or accessing
uninitialised memory. This often happens because of incomplete documentation,
or by running into ``corner cases'' which were overlooked. If undetected, they
can lead to nasty bugs that are hard to track down.

Other special cases are expected to come up sooner or later. For example, a user
may specify a file name for input or output which is not readable or writable,
or which does not exist. Typically, it is not intended to terminate program
execution at that point, but to react in a graceful way (e.g.\@ by creating a new
file, reporting a proper error message, or asking for a different file name).
These are run-time errors.

\subsection{Programming errors}
\begin{itemize}
\item
Use \keyword{gt_assert()} to check invariants in your program, that is, any
conditions that you hold to be true for the following code to work properly.
For example, in a function getting a pointer as a parameter which is to be
dereferenced later, you should assert that the pointer is not \keyword{NULL}
prior to dereferencing.
Similarly, in a function that is supposed to never return a negative number,
one should use \keyword{gt_assert()} to check this condition directly before the
\keyword{return} statement.
\item
If the expression given to \keyword{gt_assert()} as a parameter evaluates to
\keyword{false} (that is, 0, \keyword{false}, or \keyword{NULL}), the program
will abort.
\item
Assertions can be disabled at compile time by passing the \keyword{assert=no}
switch to the \keyword{make} call. If variables or parameters are only used in
an assertion, disabling assertions may trigger this error message:
\begin{lstlisting}
error: unused parameter 'x'
\end{lstlisting}
or
\begin{lstlisting}
error: unused variable 'y'
\end{lstlisting}
Include \keyword{core/unused_api.h} and prefix the declaration of the offending
identifier with \keyword{GT_UNUSED} to inform the compiler that this variable is
intentionally unused.
\end{itemize}
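
The following sketch shows a parameter which is used in an assertion only. To
remain standalone, it uses the standard \keyword{assert()} macro (which is
disabled by defining \keyword{NDEBUG}) in place of \keyword{gt_assert()}, and a
hypothetical macro \keyword{SKETCH_UNUSED} in place of \keyword{GT_UNUSED}. The
attribute syntax used for GCC-compatible compilers is an assumption about how
such a macro could be defined, not a copy of the \Gt definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GT_UNUSED. */
#if defined(__GNUC__)
#define SKETCH_UNUSED __attribute__ ((unused))
#else
#define SKETCH_UNUSED
#endif

/* Returns the sum of the first <n> elements of <values>. <capacity> is
   the allocated length of <values>; it is only used in an assertion
   and hence unused if assertions are disabled. */
unsigned long sketch_sum(const unsigned long *values, size_t n,
                         SKETCH_UNUSED size_t capacity)
{
  unsigned long sum = 0;
  size_t i;
  assert(values);        /* pointer is dereferenced below */
  assert(n <= capacity); /* no read beyond the end of the array */
  for (i = 0; i < n; i++)
    sum += values[i];
  return sum;
}
```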

\subsection{Run-time errors}
\label{errors}
\begin{itemize}
\item
Functions that are allowed to fail at run-time must return a negative error
code or \texttt{NULL}. The code for successful execution should be 0 or a
pointer different from \texttt{NULL}. Positive return values may be used as
results.
Such functions should also receive a \keyword{GtError} object as their last
parameter, in order to store and propagate error messages.
If a function may return an error code or \texttt{NULL}, \emph{always} check for
this and handle the case accordingly.
\item
Use the \keyword{GtError} class (see \keyword{core/error_api.h}) for storing
error messages and error status:
\begin{lstlisting}
char* get_first_five_chars(const char *str, GtError *err)
{
  char *ret;
  if (strlen(str) < 5) {
    gt_error_set(err, "string '%s' is shorter than 5 characters", str);
    return NULL;
  }
  ret = gt_calloc(6, sizeof (char));
  strncpy(ret, str, 5);
  return ret;
}
\end{lstlisting}
\item
Use a variable called \keyword{had_err} which is initialized to 0 and then
assigned an error status such that an error is set when \keyword{had_err}$\neq 0$:
\begin{lstlisting}
int had_err = 0;
const char *prefix;
GtError *err = gt_error_new();
if ((prefix = get_first_five_chars("foo", err))) {
  /* go on with next step */
}
else
  had_err = -1;
if (!had_err) {
  ...
}
\end{lstlisting}
\keyword{-1} should be used to store an error in the \keyword{had_err} variable.
It is very idiomatic to write \keyword{if (had_err)...} or
\keyword{if (!had_err)...}.
\item
Catch run-time errors and create error messages as close to their source as possible.
\item
The error object should always be the last parameter and should be named
\keyword{err}.
\end{itemize}
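
The rules above can be combined into a small standalone sketch which propagates
an error through two consecutive steps. \keyword{SketchError} is a deliberately
simple, hypothetical stand-in for \keyword{GtError} (a fixed-size message
buffer), and all function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for GtError. */
typedef struct {
  char msg[128];
  int is_set;
} SketchError;

void sketch_error_set(SketchError *err, const char *msg)
{
  assert(err && msg);
  snprintf(err->msg, sizeof err->msg, "%s", msg);
  err->is_set = 1;
}

/* Copies the first five characters of <str> to <out>, which must hold
   at least 6 characters. Returns 0 on success. If <str> is too short,
   <err> is set and -1 is returned. */
int sketch_first_five(char *out, const char *str, SketchError *err)
{
  assert(out && str && err);
  if (strlen(str) < 5) {
    sketch_error_set(err, "string is shorter than 5 characters");
    return -1;
  }
  memcpy(out, str, 5);
  out[5] = '\0';
  return 0;
}

/* Processes <a> and then <b>; the second step is skipped if the first
   one failed. Returns 0 on success and -1 otherwise. */
int sketch_process(const char *a, const char *b, SketchError *err)
{
  char buf[6];
  int had_err = 0;
  had_err = sketch_first_five(buf, a, err);
  if (!had_err)
    had_err = sketch_first_five(buf, b, err);
  return had_err;
}
```

The error message is created where the problem is detected, while the callers
only check \keyword{had_err} and pass the error object along.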

\section{Memory management}
\subsection{Allocation/deallocation}
\begin{itemize}
\item
Space allocation is only allowed using the \keyword{gt_malloc()},
\keyword{gt_calloc()}, \keyword{gt_realloc()} functions in
\keyword{core/ma_api.h}. Use \keyword{gt_free()} to deallocate memory.
These functions are analogous to \keyword{malloc(3)}, \keyword{calloc(3)} and
\keyword{realloc(3)} from the C standard library, except that they never return
\keyword{NULL} upon failure.
\item
The \keyword{gt_ma_get_space_peak()}, \keyword{gt_ma_show_space_peak()} and
\keyword{gt_ma_check_space_leak()} functions in \keyword{core/ma_api.h} can be
used to evaluate memory usage and check for memory leaks.
\end{itemize}
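
One common way to provide an allocation function which never returns
\keyword{NULL} upon failure is to terminate the program if the underlying
allocation fails. The sketch below illustrates this idea with a hypothetical
wrapper \keyword{sketch_calloc()}; it is an assumption about the general
technique and not a description of how \keyword{gt_calloc()} is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Like calloc(3), but terminates the program instead of returning NULL
   if a request for a non-zero amount of memory cannot be fulfilled. */
void* sketch_calloc(size_t nmemb, size_t size)
{
  void *p = calloc(nmemb, size);
  if (!p && nmemb && size) {
    fprintf(stderr, "cannot allocate %lu elements of size %lu\n",
            (unsigned long) nmemb, (unsigned long) size);
    exit(EXIT_FAILURE);
  }
  assert(p || !nmemb || !size); /* never NULL for non-zero requests */
  return p;
}
```

With such a wrapper, callers do not need to check every allocation result,
which keeps the calling code short.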

\subsection{Reference counting}
Sometimes it is desirable to have an object referenced by more than one other
object, without freeing the object's memory until the last
reference to the object has been dropped. In \Gt, \emph{reference counting} is
used to implement this behaviour. That is, each object stores the number of
objects still holding a reference to it in a local member variable. Each
referencing object calls the object's \keyword{ref()} method to
announce that it now keeps a reference, thus increasing the reference count.
When the object reference is no longer needed, the usual \keyword{delete()}
method is used. The \keyword{delete()} method checks and decreases the
reference count and defers freeing the object's memory until the object is not
referenced by any other object any more. This makes reference counting a simple
form of garbage collection.

To add reference counting to a class, perform the following steps:
\begin{itemize}
\item Add an \keyword{unsigned int} counter variable called
\keyword{reference_count} to the private member variables of the class; this
variable must be initialized to 0 in the constructor.
\item Consider a class \emph{GtFoo} to which we want to add reference counting
capabilities. Then add a method \keyword{gt_<classname>_ref()}, in this case
\keyword{gt_foo_ref()} to the interface of the class:

\begin{lstlisting}
GtFoo* gt_foo_ref(GtFoo *f)
{
  gt_assert(f);
  f->reference_count++;
  return f;
}
\end{lstlisting}

\item
In the destructor, check the reference count and only free the memory when
necessary:

\begin{lstlisting}
void gt_foo_delete(GtFoo *f)
{
  if (!f) return;
  if (f->reference_count) {
    f->reference_count--;
    return;
  }
  gt_free(f);
}
\end{lstlisting}

\end{itemize}

With reference counted classes, always use the \keyword{ref()} method when
storing a reference to an object. That is, instead of writing

\begin{lstlisting}
void gt_bar_set_foo(GtBar *b, GtFoo *f)
{
  b->value = f;
}
\end{lstlisting}

use

\begin{lstlisting}
void gt_bar_set_foo(GtBar *b, GtFoo *f)
{
  b->value = gt_foo_ref(f);
}
\end{lstlisting}

Always remember to call \keyword{gt_foo_delete()} when the reference is
no longer needed! For example, in the assignment above, a good place to do this
is the destructor of the \keyword{GtBar} class.
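
The following sketch puts the pieces above together and shows the complete
life cycle of a shared object. It is a minimal illustration under stated
assumptions, not code from the \Gt source tree: the classes \keyword{Foo} and
\keyword{Bar} and all of their functions are hypothetical, the standard
\keyword{assert()}, \keyword{calloc()} and \keyword{free()} are used in place of
\keyword{gt_assert()}, \keyword{gt_malloc()} and \keyword{gt_free()} so that the
code compiles without the \Gt headers, and the destructors return an
\keyword{int} only to make the moment of deallocation observable.

\begin{lstlisting}
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted class, modelled on the GtFoo example. */
typedef struct {
  unsigned int reference_count; /* references in addition to the creator's */
  int payload;
} Foo;

/* Hypothetical class which stores a reference to a Foo object. */
typedef struct {
  Foo *value;
} Bar;

Foo* foo_new(int payload)
{
  Foo *f = calloc(1, sizeof *f); /* reference_count starts at 0 */
  assert(f);
  f->payload = payload;
  return f;
}

Foo* foo_ref(Foo *f)
{
  assert(f);
  f->reference_count++;
  return f;
}

/* Returns 1 if the memory was actually freed, 0 otherwise (sketch only). */
int foo_delete(Foo *f)
{
  if (!f) return 0;
  if (f->reference_count) {
    f->reference_count--;
    return 0;
  }
  free(f);
  return 1;
}

Bar* bar_new(void)
{
  Bar *b = calloc(1, sizeof *b); /* value starts as NULL */
  assert(b);
  return b;
}

void bar_set_foo(Bar *b, Foo *f)
{
  Foo *old;
  assert(b && f);
  old = b->value;
  b->value = foo_ref(f); /* take the new reference first ... */
  foo_delete(old);       /* ... then drop the previously stored one, if any */
}

/* Returns 1 if the stored Foo object was freed together with the Bar. */
int bar_delete(Bar *b)
{
  int freed;
  if (!b) return 0;
  freed = foo_delete(b->value); /* release reference from bar_set_foo() */
  free(b);
  return freed;
}
\end{lstlisting}

In this sketch, a caller that creates a \keyword{Foo} object, stores it in a
\keyword{Bar} object and then calls \keyword{foo_delete()} only decreases the
counter. The memory is released later, when \keyword{bar_delete()} drops the
last remaining reference.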

\subsection{Library initialization/finalization}
Within the \Gt code, there are a number of global or static data structures
which must be properly initialized before any \Gt functionality is used, and
whose memory must be properly freed when the \Gt are no longer needed. This is
usually done by the runtime, which calls the initializers when a tool is run
using the \keyword{gt} binary.

Now consider that \Gt can also be used as a library, called
\keyword{libgenometools}. That means it is possible to link external code
with the static or shared object file and call functions from there, without
going through the tool runtime. It is then crucial that the necessary
initializations have taken place before any functions are used, and that the
required cleanup is done at the end.

There are two functions in the \keyword{core/init.[ch]} module in \Gt used to
accomplish this:
\begin{itemize}
\item
\keyword{gt_lib_init()}, which initializes all static data and should be called
before any other \Gt function, and
\item
\keyword{gt_lib_clean()}, which frees all static data. It returns 0 if no
memory map, file pointer, or memory has been leaked and a value other than 0
otherwise.
\end{itemize}
It is also possible to make the cleanup happen automatically when the program
using  the library exits. This is done by calling
\keyword{gt_lib_reg_atexit_func()} which registers an exit handler with the OS
which will call \keyword{gt_lib_clean()} automatically.
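
The init/clean/exit-handler pattern described above can be sketched with the
standard \keyword{atexit(3)} function. This is an illustration of the pattern
only and not the code in \keyword{core/init.c}: all \keyword{my_*} names are
hypothetical, and the static table merely stands in for whatever static data a
library has to manage.

\begin{lstlisting}
#include <assert.h>
#include <stdlib.h>

/* Hypothetical static data and leak bookkeeping of a library. */
static int *my_static_table = NULL;
static int  my_leaked_resources = 0; /* never changed in this sketch */

void my_lib_init(void)
{
  assert(!my_static_table); /* must not be initialized twice */
  my_static_table = calloc(16, sizeof *my_static_table);
  assert(my_static_table);
}

/* Frees the static data; returns 0 if nothing was leaked, -1 otherwise.
   Calling it more than once is harmless, because free(NULL) is a no-op. */
int my_lib_clean(void)
{
  free(my_static_table);
  my_static_table = NULL;
  return my_leaked_resources ? -1 : 0;
}

static void my_lib_atexit_func(void)
{
  (void) my_lib_clean();
}

/* Registers the cleanup as an exit handler; returns 0 on success. */
int my_lib_reg_atexit_func(void)
{
  return atexit(my_lib_atexit_func);
}
\end{lstlisting}

A client of such a library would call the initializer once at program start and
then either call the cleanup function explicitly and inspect its return value,
or register the exit handler and let the cleanup run when the program exits.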

\section{Threads}
The \Gt contain functions allowing developers to write their programs in a
multi-threaded way, by wrapping the POSIX threads library \keyword{libpthread}.
See \keyword{core/thread\_api.h} for more information.

Any function with this signature:

\begin{lstlisting}
void* (*GtThreadFunc)(void *data);
\end{lstlisting}

can be run concurrently by passing it to \keyword{gt_multithread()}:

\begin{lstlisting}
void *mythread(void *data)
{
 ...
}
gt_multithread(mythread, NULL, err);
\end{lstlisting}

Some more useful information:
\begin{itemize}
\item
For synchronization during parallel execution of multiple threads, \Gt provides
classes for mutexes and read-write-locks. See \keyword{core/thread\_api.h} for a
description of the interface of the \keyword{GtMutex} and \keyword{GtRWLock}
classes.
\item
If code must be conditionally compiled depending on thread support, use
\keyword{#ifdef} and friends with the \keyword{GT_THREADS_ENABLED} flag, which
is set by the compiler via a \keyword{-D} option when threading support is
enabled.
\item
Threading support is enabled at compile time by passing \keyword{threads=yes} to
the \keyword{make} call. If threading support is not enabled, all calls to
\keyword{gt_multithread()} will run the thread function sequentially.
\item
If threading support is enabled, the number of concurrent jobs can be given
using the \keyword{-j} parameter to the \keyword{gt} binary. That is, to have
all multithreaded parts of a tool use three threads at once, call the tool
with

\begin{lstlisting}
$ gt -j 3 <toolname> ...
\end{lstlisting}%$

\end{itemize}
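
As a small illustration of conditional compilation with the
\keyword{GT_THREADS_ENABLED} flag, the following sketch selects its behaviour
at compile time. The function \keyword{my_effective_jobs()} is hypothetical and
not part of the \Gt API; it only mirrors the rule stated above that work runs
sequentially when thread support is not compiled in.

\begin{lstlisting}
#include <assert.h>

/* Hypothetical helper: number of jobs that can actually run at once. */
unsigned int my_effective_jobs(unsigned int jobs)
{
  assert(jobs > 0);
#ifdef GT_THREADS_ENABLED
  return jobs; /* thread support compiled in: honour the request */
#else
  return 1;    /* no thread support: everything runs sequentially */
#endif
}
\end{lstlisting}

Compiling this code with \keyword{-DGT_THREADS_ENABLED} selects the first
branch, while a build without the flag always falls back to a single job.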


\section{Testing and Debugging}
\subsection{Testing on the code level -- the unit tests}
\label{unittests}

Unit tests check whether classes and their methods behave correctly when used in
a correct manner. The corresponding functions must be defined in the class
implementation file (so they get access to the private member variables of the
tested class) and adhere to the following interface:

\begin{lstlisting}
int  (*UnitTestFunc)(GtError *err);
\end{lstlisting}

They must return 0 if the test was successful and -1 if the test has failed.
The \keyword{gt_ensure} helper macro makes writing unit tests easier. To use it,
\keyword{#include} the file \keyword{core/ensure_api.h}. Then write your unit test:

\begin{lstlisting}
int gt_class_unit_test(GtError *err) /* must be called 'err'! */
{
  int had_err = 0;                   /* must be called 'had_err'! */
  gt_error_check(err);               /* will abort if error was already set */

  gt_ensure(1 + 1 == 2);    /* will succeed */
  gt_ensure(1 + 1 == 3);    /* will fail */

  return had_err;
}
\end{lstlisting}

Similarly to \keyword{gt_assert()}, if the expression given to
\keyword{gt_ensure()} as a parameter evaluates to false, the test will fail with
an error message, giving the location at which the first condition failed.
Because \keyword{gt_ensure()} is implemented as a macro relying on certain
naming conventions, it is mandatory that the \keyword{int} error indicator
variable and the \keyword{GtError} object within the test function are called
\keyword{had_err} and \keyword{err}, respectively.
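
To illustrate why these names are fixed, the following sketch shows how a macro
of this kind can be written. It is not the actual definition from
\keyword{core/ensure_api.h}: the macro \keyword{my_ensure}, the type
\keyword{MyError} and the function \keyword{my_unit_test()} are hypothetical and
only use the C standard library.

\begin{lstlisting}
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for GtError: it just holds a message. */
typedef struct {
  char msg[256];
} MyError;

/* The macro body refers to 'had_err' and 'err' directly, so both names
   must exist in the scope of the calling function. Once had_err is set,
   all following checks are skipped, so the first failure is reported. */
#define my_ensure(expr)                                      \
  do {                                                       \
    if (!had_err && !(expr)) {                               \
      snprintf(err->msg, sizeof err->msg,                    \
               "my_ensure(%s) failed: file %s, line %d",     \
               #expr, __FILE__, __LINE__);                   \
      had_err = -1;                                          \
    }                                                        \
  } while (0)

int my_unit_test(MyError *err)
{
  int had_err = 0;
  assert(err);
  my_ensure(1 + 1 == 2); /* succeeds, had_err stays 0 */
  my_ensure(1 + 1 == 3); /* fails, sets had_err and the message */
  my_ensure(2 + 2 == 5); /* skipped, because had_err is already set */
  return had_err;
}
\end{lstlisting}

Renaming either variable in \keyword{my_unit_test()} would make the macro
expansion refer to an undeclared identifier, so the test function would no
longer compile.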

The unit tests are added to the test suite in the function
\keyword{gtt_unit_tests()} in \keyword{gtt.c} and loaded into the \Gt runtime
in the function \keyword{gtr_register_components()} in \keyword{gtr.c}.
That is, if your unit test function is \keyword{gt_class_unit_test()}, then you
should \keyword{#include} the header \keyword{class.h} (which contains the
function prototype) in \keyword{gtt.c} and add the following line in
\keyword{gtt_unit_tests()}:
\begin{lstlisting}
gt_hashmap_add(unit_tests, "example class", gt_class_unit_test);
\end{lstlisting}

The tests registered in this hash table can be executed on the command line
with:

\begin{lstlisting}[language=sh]
$ gt -test
\end{lstlisting}%$

It is also possible to run a single test from the test suite by using the
\keyword{-only} option:

\begin{lstlisting}[language=sh]
$ gt -test -only 'example class'
\end{lstlisting}%$

\subsection{Testing on the tool level -- the test suite}

While the unit tests check the correctness of the classes and modules on the
code level, the Ruby-based test suite is used to run tests on the tools
themselves. That means that they run tools with example data or invalid
parameters and check whether they behave correctly by looking at error levels,
error messages, and comparing output with reference data.

Test data and reference data are stored in the \keyword{testdata/} directory
of the \Gt source tree. Note that this directory is for smaller test data only.
Large files, such as whole chromosome annotations go into another repository
(see~\ref{gttestdata}).

\subsubsection{Test definitions}
\label{testdefinitions}

Tests are defined in the \keyword{gt_<toolname>_include.rb} files in the
\keyword{testsuite/} directory. They contain test definitions written in a
Ruby-based domain specific language. Here is an example of a simple test:

\begin{lstlisting}[language=Ruby]
Name "gt cds test (description range)"
Keywords "gt_cds usedesc"
Test do
  run_test "#{$bin}gt cds -usedesc -seqfile " +
           "#{$testdata}gt_cds_test_descrange.fas " +
           "#{$testdata}gt_cds_test_descrange.in"
  run "diff #{$last_stdout} #{$testdata}/gt_cds_test_descrange.out"
end
\end{lstlisting}%$

This test runs the \keyword{gt cds} command with example data and compares its
output on stdout with a reference file.

Every test case must have a \keyword{name} and can have a set of \emph{keywords}
associated with it, allowing for selective running of a subset of tests from
all testsuites. Keywords are separated by spaces.
The actual test code is given in the \keyword{Test} environment.
Within this environment, one may use the following constructs to define test
conditions:

\begin{itemize}
\item
\keyword{run_test(runstring, options)}, where
\begin{itemize}
\item
\keyword{runstring} is the tool commandline to run, and
\item
\keyword{options} are a hash specifying test constraints.
The key \keyword{:retval} specifies the expected error code for this run (0 is
the default). The option \keyword{:maxtime} specifies the maximal time in
seconds that the started program may run before it is killed, resulting in a
failed test (60 is the default). This allows one to detect infinite loops
without stopping the testing progress.
\end{itemize}
This command runs the command specified in \keyword{runstring}, and causes the
test to fail if the returned error code does not equal the expected one.
\item
\keyword{grep(file, pattern)} which searches for \keyword{pattern} in the
file \keyword{file}, failing the test if there is no match. The pattern can be
given as a regular expression.
\item
\keyword{run_ruby(rubyscript, options)} and
\keyword{run_python(pythonscript, options)} can be used to run tests on
external Ruby and Python scripts.
\item
Any Ruby code, such as custom functions, can be run inside the \keyword{Test}
environment. To fail a test case manually, use the \keyword{failtest(msg)}
command, where \keyword{msg} is the error message to fail with.
\end{itemize}
Inside test suite definitions, some useful paths are predefined to be
conveniently used in test runs (like \keyword{$bin} and \keyword{$testdata} in
the example above):
\begin{itemize}
\item
\keyword{$testdata}, the path to the \keyword{testdata/} directory,
\item
\keyword{$gttestdata}, the path to the location of the \keyword{gttestdata}
repository (see~\ref{gttestdata}),
\item
\keyword{$bin}, the path to the \Gt \keyword{bin/} directory,
\item
\keyword{$cur}, the path to the working directory of the test suite, i.e.\@ the
directory from which \keyword{testsuite/testsuite.rb} was run,
\item
\keyword{$transdir}, the path to the \keyword{gtdata/trans} directory,
\item
\keyword{$obodir}, the path to the \keyword{gtdata/obo_files} directory,
\item
\keyword{$gtruby}, the path to the \keyword{gtruby/} directory,
\item
\keyword{$gtpython}, the path to the \keyword{gtpython/} directory.
\end{itemize}
It is also possible to get the standard output and standard error contents of
the last command run by referring to the files specified by
\keyword{$last_stdout} and \keyword{$last_stderr}.

Furthermore, each test commandline (let's say the $i$-th one in the test)
creates a set of files in the test directory: \keyword{run_}$i$ (contains the
actual command which was run), \keyword{stdout_}$i$ (contains the standard
output) and \keyword{stderr_}$i$ (contains the standard error output). The test
directory is \keyword{testsuite/stest_testsuite/test}$n$\keyword{/}, where $n$
is the test number (printed in front of each test name).

\subsubsection{The \keyword{gttestdata} repository}
\label{gttestdata}

Large test or reference data must not be placed into the \Gt \keyword{testdata/}
directory because they would increase the size of the \Gt distribution too much.
For such data there is a separate repository, which is available via Git:

\begin{lstlisting}[language=sh]
$ git clone git://genometools.org/gttestdata.git
\end{lstlisting}%$

The location of the \keyword{gttestdata} repository must be given when running
the testsuite (see below). If it is not given, make sure that tests which depend
on large test data are disabled (e.g.\@ by placing them in an
`\keyword{if $gttestdata}' clause).%$

\subsubsection{Running the testsuite}
A comprehensive \Gt test run, containing both the unit tests and the tool tests,
can be initiated by issuing \keyword{make test} in the \Gt directory.
The following \keyword{make} switches influence the test runs:
\begin{itemize}
\item
\keyword{memcheck=yes} enables memory access checking via \emph{valgrind},
\item
\keyword{testthreads=<n>} enables multithreaded testing with \keyword{<n>}
threads in parallel to speed up test runs,
\item
\keyword{testrange=<range>} only runs tests with numbers within the given range,
which has to be given in Ruby syntax, i.e.\@ \keyword{i..j}. It is also possible
to provide a list of numbers separated by spaces; in this case, enclose the
list in double quotes.
\item
\keyword{gttestdata=<path>} tells the test suite to look for large test data in
\keyword{<path>}. This must be where a copy of the \keyword{gttestdata}
repository is installed.

\end{itemize}

The tool tests can also be run using the \keyword{testsuite/testsuite.rb}
script. Use the \keyword{-keywords <keywords>} parameter to only run those
tests tagged with the given keywords. OR and AND operators can be used to
specify the tests in a more detailed way. The \keyword{-select <n>} parameter
can be used to run only the one test with number \keyword{n} (use
\keyword{-select <m..n>} for ranges), and the \keyword{-threads <n>} parameter
will run the testsuite with \keyword{n} threads in parallel.

To make tests depending on randomized values reproducible, the test suite will
pick a RNG seed before starting any tests and will run all \keyword{gt}
invocations with the environment variable \keyword{GT\_SEED} set to this seed
value. This seed value is also output by \keyword{testsuite/testsuite.rb}. This
makes sure that all tests are run with a common random seed, instead of picking
a new one each time \keyword{gt} is run in a test.

\subsection{Header inclusion dependencies}
Often function prototypes in the \Gt header files use types declared in
other header files. It is easy to forget to \keyword{#include} the header file
defining such a type in the header that uses it. Note that this problem may
never surface if the forgotten header
is included in every C source file which includes the header file with the
missing include statement. To address this, the script
\keyword{scripts/src_check_header.rb} tries to include each header file given
as a command line argument by itself in a C file and compile it.
If dependencies are missing, the check will abort and output the compiler error
message so the problem can be fixed.

\subsection{Debug symbols}
Compilation with debug symbols is enabled by default.
To make sure that line numbers are correct when using a debugger, e.g.
\keyword{gdb}, use the \keyword{opt=no} option in the \keyword{make} call to
disable compile-time optimization. The \keyword{opt} option is enabled by
default.

\subsection{Profiling}
To enable the generation of profiling output in the compiled binaries, use the
\keyword{prof=yes} option in the \keyword{make} call. The \keyword{prof} option
is disabled by default. Enabling this option makes the \Gt binary create a
\keyword{gmon.out} file during each run, which can then be used for analysis
using \emph{gprof}\footnote{See
\url{http://sourceware.org/binutils/docs/gprof/index.html} for further
information.}.

\subsection{Logging}

Use the \keyword{gt_log_*()} functions in \keyword{core/log_api.h} to log debug
messages to the screen or files. Output of debugging information defined using
these functions can then be enabled or disabled via the \keyword{-debug} option
of the \keyword{gt} binary. That is, to run tool \keyword{mytool} with debug
output enabled, run

\begin{lstlisting}
$ gt -debug mytool
\end{lstlisting}%$

\section{Additional \keyword{make} parameters}

\subsection{Additional targets}
Simply running \emph{make} builds both the \keyword{gt} executable and the
\Gt shared library as 64 bit binaries. However, there are also other targets
(besides the ones mentioned in the respective sections above) which can be
built using the \Gt Makefile:
\begin{itemize}
\item
\keyword{docs}, which builds API documentation as web pages in \keyword{www} and
as \LaTeX\@ source in \keyword{doc/},
\item
\keyword{manuals}, which, in addition to the files created by \keyword{docs},
also creates manuals for some published tools in \Gt (see
\keyword{doc/manuals}),
\item
\keyword{install}, which installs the compiled \Gt binaries, libraries and
headers into the directory specified by the \keyword{prefix=<path>} option,
\item
\keyword{dist}, which creates a tarball with a binary \Gt distribution, which
will then reside in the \keyword{dist/} subdirectory of the \Gt root,
\item
\keyword{srcdist}, which creates a tarball with a source \Gt distribution, which
will then reside in the working directory,
\item
\keyword{spgt}, which checks selected files in the \keyword{core/} and
\keyword{match/} subdirectories using the splint static
checker\footnote{\url{http://www.splint.org}}, using the rule set
\keyword{testdata/SKsplintoptions},
\item
\keyword{clean}, which removes all files created during the build and test
processes, except the \keyword{lib} and \keyword{bin} directories, and
\item
\keyword{cleanup}, which even removes these.
\end{itemize}

\subsection{Additional options}
There are additional \keyword{make} options which are also mentioned in the
README file and which influence how the \Gt binaries are built:
\begin{itemize}
\item
Use \keyword{amalgamation=yes} to compile \Gt as an \emph{amalgamation}. That
means, all \Gt C source files are concatenated into a big source file, which is
then compiled. This approach allows the compiler to perform more extensive
optimizations during the compilation and may result in better performance.
It is encouraged to check regularly whether compiling \Gt as an amalgamation
still works, as name clashes in static functions can sometimes occur which
compile fine when in separate files, but lead to errors in the amalgamation.
This option is disabled by default.
\item
Use \keyword{errorcheck=no} to make the compilation process not stop when a
warning is encountered. This option should only be used if necessary (e.g.\@
when building \Gt on Windows). This option is enabled by default.
\item
Use \keyword{cairo=no} to disable Cairo support in the \emph{AnnotationSketch}
component of \Gt . This is useful on systems on which there is no Cairo library
present, and \emph{AnnotationSketch} is not needed. This option is enabled by
default.
\item
The option \keyword{sharedlib=no} disables building of a \Gt shared library.
This option is enabled by default.
\item
The option \keyword{static=yes} tries to link all dependencies of \Gt
statically. This option is disabled by default.
\item
The option \keyword{useshared=yes} ensures the \Gt build process does not use
the copies of the external \Gt dependencies included with the \Gt distribution
but rather relies on them being available system-wide on the build system.
This is a recommended option for building on a system where the building user
controls package management and can install/update system-wide libraries at will.
This option is disabled by default.
\item
The option \keyword{32bit=yes} (or likewise \keyword{64bit=no}) makes the build
system create a 32-bit version of the \Gt binaries. This option is disabled by
default.
\end{itemize}

\section{Contributing code}

For \Gt development, we use the distributed versioning system Git\footnote{For
a good introduction to the use of the Git software itself, see the Git web site
(\url{http://git-scm.com}) or read the following guide:
Travis Swicegood. \emph{Pragmatic Version Control Using Git}.
Pragmatic Bookshelf, ISBN 1934356158. We \emph{strongly} encourage future \Gt
developers to familiarize themselves with Git before developing with the intent
of submission!} to track changes and
exchange new code. Thus a Git repository is necessary to both:

\begin{itemize}
\item
obtain the latest development version of the \Gt , and
\item
contribute to the \Gt by submitting new code to the maintainers.
\end{itemize}

Be aware that, in this guide, we will not explain basic Git concepts, or how
individual Git commands work in detail. Instead, we will briefly state what
strategy is most effective when working in \Gt development.

\subsection{Getting started}
To get started with \Gt development, we recommend the following:
\begin{enumerate}
  \item Familiarize yourself with the \Gt development process at \url{http://genometools.org/contract.html}.
  \item Install the Git version control system.
  \item Read the Git documentation.
  \item Register a user account on GitHub (\url{https://github.com}).
  \item Fork the \Gt Git repository on GitHub at \url{https://github.com/genometools/genometools}.
  \item Clone a local version of your forked repo:
    \begin{lstlisting}[language=sh]
$ git clone https://github.com/<YourName>/genometools.git
    \end{lstlisting}%$
  \item Start hacking on your own feature branch:
    \begin{lstlisting}[language=sh]
$ cd genometools
$ git checkout -b my_feature_branch_name
    \end{lstlisting}
 \item Have fun!
\end{enumerate}

\subsection{Basic Git configuration}

Please set your username and email address correctly. If unconfigured, they are
often based on the hostname of the workstation where a commit is done. This may
not be -- and almost never is -- correct in typical development environments
(e.g.\@ \keyword{user@workstation.zbh.uni-hamburg.de} instead of
\keyword{user@maildomain.org}).

Use the \keyword{git config} commands while in your \Gt Git repository to set
them to a correct value:

\begin{lstlisting}
$ git config user.name "Hans Mustermann"
$ git config user.email "mustermann@maildomain.org"
\end{lstlisting}

\subsection{Tips for successful source management}

\begin{itemize}
\item
Develop each major feature or try out bigger changes in a separate branch
(the so-called \emph{feature branch})
dedicated only to that aspect. That makes it easier to combine or discard
branches later on, without having to meddle with individual commits too much
if something goes wrong. Creating, merging and deleting branches is cheap in
Git!
\item
Always leave your \emph{master} branch untouched so code pulled from upstream
(e.g.\@ the official \Gt repository) does not get merged by accident.
\item
Branch off new feature branches from the \emph{master} branch only. That makes
it easy to chain branches later via \keyword{git rebase} in any order.
\item
Try to keep commits atomic. Every commit should either add a single feature or
fix a single bug. That makes two things easier:
\begin{enumerate}
  \item
  Locating the exact commit which introduces a bug, e.g.\@ using
  \keyword{git bisect}. If there are too many changes in one commit, bugs
  become more tedious to track down.
  \item
  Reverting single commits if new features introduce bugs.
\end{enumerate}
If you made several incomplete commits and want to reorder or combine them into
one afterwards, use interactive rebasing via \keyword{git rebase
-i}\footnote{See \url{http://book.git-scm.com/4_interactive_rebasing.html} for
an explanation.}.
\item
Needless to say, every commit should compile cleanly. Again, bisecting can become
very tedious if the code has to be fixed at each stop to get it to even compile.
\item
In the first line of the commit message, give a short description of the
change contained in the commit.  Please use active, present tense,
e.g.\@ ``add feature X'' or ``allow X to do Y''.
Commit messages for commits that touch scripting
language bindings should be prefixed with the language in question,
e.g.\@ ``gtpython: add bindings for GtFoo class''.
\end{itemize}

\subsection{Submission of contributions}

This section describes how to get your contributions noticed, reviewed and
integrated into the main \Gt codebase.

\subsubsection{Source code submission}

To get your source code (which we assume to reside in your personal forked
GitHub repository) to be considered for inclusion into the \Gt official source
tree, file an issue in the \Gt issue
tracker\footnote{\url{https://github.com/genometools/genometools/issues}}
describing your proposed changes. Then issue a pull request from your
repository against the official \Gt repository. A maintainer will review your
contribution and merge it. After providing a working patch or feature, you will
eventually obtain maintainer status yourself and will be able (but not required)
to review and merge pull requests from other contributors.

\textbf{Important:} Always rebase your code against the current
official \Gt master before requesting a pull (see above). Also, please check
whether your code compiles cleanly, even with the \keyword{amalgamation=yes}
and \keyword{assert=no} parameters enabled which may influence compilation
success.

\subsubsection{Test data submission}

For submissions to the \keyword{gttestdata} repository, the same rules apply
as for source code. Please provide a repository from which to pull a branch
which has already been rebased against the current \keyword{gttestdata} master.
Before adding any more test data to the repository, please make sure that
the new data is absolutely necessary. That is, existing large sequences should
be reused where possible, for example when testing a sequence parser or the like.

\subsubsection{Licensing}

Note that the \Gt are free software, i.e.\@ an open-source project.
All code distributed with the \Gt is published under the ISC license,
which can be viewed at \url{http://genometools.org/license.html}. Submission of
code for inclusion into the \Gt implies your permission to publish your code
under this license. We will not accept contributions lacking proper
copyright information at the top of each source file (see~\ref{copyright})!

\end{document}