\documentclass[a4paper,titlepage,12pt]{article}
\usepackage{amssymb}
\usepackage{pslatex}
\usepackage[dvips]{graphicx}
\usepackage{wrapfig}
\usepackage{url}
\date{}

\begin{document}

    \begin{center}
	\huge Creating signatures for ClamAV\\
	\vspace{2cm}
    \end{center}

    \noindent
    \section{Introduction}
    CVD (ClamAV Virus Database) is a digitally signed container that
    includes signature databases in various text formats. The header
    of the container is a 512-byte string with colon-separated fields:
    \begin{verbatim}
ClamAV-VDB:build time:version:number of signatures:functionality
level required:MD5 checksum:digital signature:builder name:build
time (sec)
    \end{verbatim}
    \verb+sigtool --info+ displays detailed information about a given CVD file:
    \begin{verbatim}
zolw@localhost:/usr/local/share/clamav$ sigtool -i main.cvd
File: main.cvd
Build time: 09 Dec 2007 15:50 +0000
Version: 45
Signatures: 169676
Functionality level: 21
Builder: sven
MD5: b35429d8d5d60368eea9630062f7c75a
Digital signature: dxsusO/HWP3/GAA7VuZpxYwVsE9b+tCk+tPN6OyjVF/U8
JVh4vYmW8mZ62ZHYMlM903TMZFg5hZIxcjQB3SX0TapdF1SFNzoWjsyH53eXvMDY
eaPVNe2ccXLfEegoda4xU2TezbGfbSEGoU1qolyQYLX674sNA2Ni6l6/CEKYYh
Verification OK.
    \end{verbatim}
    The ClamAV project distributes a number of CVD files, including
    \emph{main.cvd} and \emph{daily.cvd}.
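
    The header layout described above can be read with a few lines of
    Python. This is only a minimal sketch, not the way libclamav or
    sigtool parse the header: the names \verb+CvdHeader+ and
    \verb+parse_cvd_header+ are hypothetical, and the code assumes that
    the unused tail of the 512 bytes is padding and that no field
    contains a colon of its own.

```python
from typing import NamedTuple

CVD_HEADER_SIZE = 512  # header length given in the text above


class CvdHeader(NamedTuple):
    build_time: str
    version: int
    signatures: int
    functionality_level: int
    md5: str
    digital_signature: str
    builder: str
    build_time_sec: int


def parse_cvd_header(raw: bytes) -> CvdHeader:
    """Split the first 512 bytes of a CVD file into the header fields."""
    if len(raw) < CVD_HEADER_SIZE:
        raise ValueError("expected at least 512 bytes")
    # Assumption: unused space at the end of the header is whitespace/NUL padding.
    text = raw[:CVD_HEADER_SIZE].decode("ascii", errors="replace")
    fields = text.rstrip(" \t\r\n\0").split(":")
    # Assumption: no field (including the build time) contains a colon itself.
    if len(fields) != 9 or fields[0] != "ClamAV-VDB":
        raise ValueError("not a CVD header")
    return CvdHeader(
        build_time=fields[1],
        version=int(fields[2]),
        signatures=int(fields[3]),
        functionality_level=int(fields[4]),
        md5=fields[5],
        digital_signature=fields[6],
        builder=fields[7],
        build_time_sec=int(fields[8]),
    )
```

    Reading the first 512 bytes of a CVD file in binary mode and passing
    them to the function should yield the same values that
    \verb+sigtool --info+ prints, provided the assumptions hold.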

    \section{Debug information from libclamav}
    In order to create efficient signatures for ClamAV it's important
    to understand how the engine handles input files. The best way
    to see how it works is having a look at the debug information from
    libclamav. You can do it by calling \verb+clamscan+ with the
    \verb+--debug+ and \verb+--leave-temps+ flags. The first switch
    makes clamscan display all the interesting information from
    libclamav and the second one avoids deleting temporary files so
    they can be analyzed further. The relevant part of the output
    is:
    \begin{verbatim}
$ clamscan --debug attachment.exe
[...]
LibClamAV debug: Recognized MS-EXE/DLL file
LibClamAV debug: Matched signature for file type PE
LibClamAV debug: File type: Executable
    \end{verbatim}
    The engine recognized a Windows executable.
    \begin{verbatim}
LibClamAV debug: Machine type: 80386
LibClamAV debug: NumberOfSections: 3
LibClamAV debug: TimeDateStamp: Fri Jan 10 04:57:55 2003
LibClamAV debug: SizeOfOptionalHeader: e0
LibClamAV debug: File format: PE
LibClamAV debug: MajorLinkerVersion: 6
LibClamAV debug: MinorLinkerVersion: 0
LibClamAV debug: SizeOfCode: 0x9000
LibClamAV debug: SizeOfInitializedData: 0x1000
LibClamAV debug: SizeOfUninitializedData: 0x1e000
LibClamAV debug: AddressOfEntryPoint: 0x27070
LibClamAV debug: BaseOfCode: 0x1f000
LibClamAV debug: SectionAlignment: 0x1000
LibClamAV debug: FileAlignment: 0x200
LibClamAV debug: MajorSubsystemVersion: 4
LibClamAV debug: MinorSubsystemVersion: 0
LibClamAV debug: SizeOfImage: 0x29000
LibClamAV debug: SizeOfHeaders: 0x400
LibClamAV debug: NumberOfRvaAndSizes: 16
LibClamAV debug: Subsystem: Win32 GUI
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 0
LibClamAV debug: Section name: UPX0
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x1e000 0x1e000
LibClamAV debug: VirtualAddress: 0x1000 0x1000
LibClamAV debug: SizeOfRawData: 0x0 0x0
LibClamAV debug: PointerToRawData: 0x400 0x400
LibClamAV debug: Section's memory is executable
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 1
LibClamAV debug: Section name: UPX1
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x9000 0x9000
LibClamAV debug: VirtualAddress: 0x1f000 0x1f000
LibClamAV debug: SizeOfRawData: 0x8200 0x8200
LibClamAV debug: PointerToRawData: 0x400 0x400
LibClamAV debug: Section's memory is executable
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 2
LibClamAV debug: Section name: UPX2
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x1000 0x1000
LibClamAV debug: VirtualAddress: 0x28000 0x28000
LibClamAV debug: SizeOfRawData: 0x200 0x1ff
LibClamAV debug: PointerToRawData: 0x8600 0x8600
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: EntryPoint offset: 0x8470 (33904)
    \end{verbatim}
    The section structure displayed above suggests the executable is
    packed with UPX.
    \begin{verbatim}
LibClamAV debug: ------------------------------------
LibClamAV debug: EntryPoint offset: 0x8470 (33904)
LibClamAV debug: UPX/FSG/MEW: empty section found - assuming
                 compression
LibClamAV debug: UPX: bad magic - scanning for imports
LibClamAV debug: UPX: PE structure rebuilt from compressed file
LibClamAV debug: UPX: Successfully decompressed with NRV2B
LibClamAV debug: UPX/FSG: Decompressed data saved in
                 /tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede
LibClamAV debug: ***** Scanning decompressed file *****
LibClamAV debug: Recognized MS-EXE/DLL file
LibClamAV debug: Matched signature for file type PE
    \end{verbatim}
    Indeed, libclamav recognizes the UPX data and saves the decompressed
    (and rebuilt) executable into \verb+/tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede+.
    Then it continues by scanning this new file:
    \begin{verbatim}
LibClamAV debug: File type: Executable
LibClamAV debug: Machine type: 80386
LibClamAV debug: NumberOfSections: 3
LibClamAV debug: TimeDateStamp: Thu Jan 27 11:43:15 2011
LibClamAV debug: SizeOfOptionalHeader: e0
LibClamAV debug: File format: PE
LibClamAV debug: MajorLinkerVersion: 6
LibClamAV debug: MinorLinkerVersion: 0
LibClamAV debug: SizeOfCode: 0xc000
LibClamAV debug: SizeOfInitializedData: 0x19000
LibClamAV debug: SizeOfUninitializedData: 0x0
LibClamAV debug: AddressOfEntryPoint: 0x7b9f
LibClamAV debug: BaseOfCode: 0x1000
LibClamAV debug: SectionAlignment: 0x1000
LibClamAV debug: FileAlignment: 0x1000
LibClamAV debug: MajorSubsystemVersion: 4
LibClamAV debug: MinorSubsystemVersion: 0
LibClamAV debug: SizeOfImage: 0x26000
LibClamAV debug: SizeOfHeaders: 0x1000
LibClamAV debug: NumberOfRvaAndSizes: 16
LibClamAV debug: Subsystem: Win32 GUI
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 0
LibClamAV debug: Section name: .text
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0xc000 0xc000
LibClamAV debug: VirtualAddress: 0x1000 0x1000
LibClamAV debug: SizeOfRawData: 0xc000 0xc000
LibClamAV debug: PointerToRawData: 0x1000 0x1000
LibClamAV debug: Section contains executable code
LibClamAV debug: Section's memory is executable
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 1
LibClamAV debug: Section name: .rdata
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x2000 0x2000
LibClamAV debug: VirtualAddress: 0xd000 0xd000
LibClamAV debug: SizeOfRawData: 0x2000 0x2000
LibClamAV debug: PointerToRawData: 0xd000 0xd000
LibClamAV debug: ------------------------------------
LibClamAV debug: Section 2
LibClamAV debug: Section name: .data
LibClamAV debug: Section data (from headers - in memory)
LibClamAV debug: VirtualSize: 0x17000 0x17000
LibClamAV debug: VirtualAddress: 0xf000 0xf000
LibClamAV debug: SizeOfRawData: 0x17000 0x17000
LibClamAV debug: PointerToRawData: 0xf000 0xf000
LibClamAV debug: Section's memory is writeable
LibClamAV debug: ------------------------------------
LibClamAV debug: EntryPoint offset: 0x7b9f (31647)
LibClamAV debug: Bytecode executing hook id 257 (0 hooks)
attachment.exe: OK
[...]
    \end{verbatim}
    No additional files get created by libclamav. A signature written
    for the decompressed file has a better chance of detecting the
    target data even when it gets compressed with another packer.

    This method should be applied to all files for which you want
    to create signatures. By analyzing the debug information you
    can quickly see how the engine recognizes and preprocesses
    the data and what additional files get created. Signatures
    created for bottom-level temporary files are usually more
    generic and should help detect the same malware in
    different forms.
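
    When many samples have to be reviewed, the section names can be
    pulled out of the debug output automatically. The following sketch
    only relies on the \verb+Section name:+ lines shown in the log
    above. The function names are hypothetical, and checking for a
    \verb+UPX+ prefix is merely a heuristic, because section names can
    be renamed at will.

```python
import re

# Matches lines such as "LibClamAV debug: Section name: UPX0"
SECTION_NAME = re.compile(r"^LibClamAV debug: Section name: (\S+)\s*$")


def section_names(debug_output):
    """Return the PE section names in the order libclamav reports them."""
    names = []
    for line in debug_output.splitlines():
        match = SECTION_NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def looks_upx_packed(names):
    """Heuristic only: true if any section name starts with 'UPX'."""
    return any(name.upper().startswith("UPX") for name in names)
```

    Note that the log of a packed file contains the sections twice:
    first those of the original file and then those of the decompressed
    copy.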

    \section{Signature formats}

    \subsection{Hash-based signatures}
    The easiest way to create signatures for ClamAV is to use file hash
    checksums; however, this method can only be used against static malware.
    \subsubsection{MD5 hash-based signatures}
    To create an MD5 signature for \verb+test.exe+ use the \verb+--md5+
    option of sigtool:
    \begin{verbatim}
zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
zolw@localhost:/tmp/test$ cat test.hdb 
48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
    \end{verbatim}
    That's it! The signature is ready for use:
    \begin{verbatim}
zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe 
test.exe: test.exe FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Scanned directories: 0
Engine version: 0.92.1
Scanned files: 1
Infected files: 1
Data scanned: 0.02 MB
Time: 0.024 sec (0 m 0 s)
    \end{verbatim}
    You can change the name (by default sigtool uses the name of the file)
    and place it inside a \verb+*.hdb+ file. A single database file can
    include any number of signatures. To get them automatically loaded
    each time clamscan/clamd starts, just copy the database file(s) into
    the local virus database directory (e.g. \verb+/usr/local/share/clamav+).
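
    The line printed by \verb+sigtool --md5+ consists of the MD5
    checksum, the file size in bytes and the name. As an illustration,
    the same layout can be produced with Python's standard
    \verb+hashlib+ module. The helper name \verb+md5_signature+ is
    hypothetical and this is not how sigtool is implemented.

```python
import hashlib


def md5_signature(data, name):
    """Build an .hdb style line: MD5 checksum, size in bytes, name."""
    digest = hashlib.md5(data).hexdigest()
    return "%s:%d:%s" % (digest, len(data), name)


def md5_signature_for_file(path, name):
    """Convenience wrapper that reads the whole file into memory first."""
    with open(path, "rb") as handle:
        return md5_signature(handle.read(), name)
```

    Choosing a descriptive name here instead of the file name has the
    same effect as editing the name field by hand afterwards.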

    \emph{The hash-based signatures shall not be used for text files,
    HTML and any other data that gets internally preprocessed before
    pattern matching. If you really want to use a hash signature in
    such a case, run clamscan with --debug and --leave-temps flags
    as described above and create a signature for a preprocessed file
    left in /tmp. Please keep in mind that a hash signature will stop
    matching as soon as a single byte changes in the target file.}

    \subsubsection{SHA1 and SHA256 hash-based signatures}
    ClamAV 0.98 added support for SHA1 and SHA256 file checksums.
    The format is the same as for MD5 file checksums.
    ClamAV differentiates between the hash types based on the length of
    the hash string in the signature. For best backwards compatibility,
    these should be placed inside a \verb+*.hsb+ file. The format is:
    \begin{verbatim}
HashString:FileSize:MalwareName
    \end{verbatim}
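
    Telling the hash types apart by length works because the three
    digests have different sizes when written in hex: 32 characters
    for MD5, 40 for SHA1 and 64 for SHA256. The sketch below
    illustrates the idea; the function names are hypothetical and the
    code is not taken from libclamav.

```python
import hashlib
import string

# Length of the hex string -> hash algorithm
HASH_TYPES = {32: "md5", 40: "sha1", 64: "sha256"}


def hash_type(hash_string):
    """Guess the hash algorithm from the length of the hex string."""
    if not hash_string or any(c not in string.hexdigits for c in hash_string):
        raise ValueError("not a hexadecimal string")
    try:
        return HASH_TYPES[len(hash_string)]
    except KeyError:
        raise ValueError("unexpected hash length") from None


def hsb_signature(data, name, algorithm="sha256"):
    """Build a HashString:FileSize:MalwareName line for an .hsb file."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return "%s:%d:%s" % (digest, len(data), name)
```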

    \subsubsection{PE section based hash signatures}
    You can create a hash signature for a specific section in a PE file.
    Such signatures shall be stored inside \verb+.mdb+ files in the
    following format:
    \begin{verbatim}
PESectionSize:PESectionHash:MalwareName
    \end{verbatim}
    The easiest way to generate MD5 based section signatures is to extract
    target PE sections into separate files and then run sigtool with the
    option \verb+--mdb+.

    ClamAV 0.98 also added support for SHA1 and SHA256 section based
    signatures. The format is the same as for MD5 PE section based signatures.
    ClamAV differentiates between the hash types based on the length of
    the hash string in the signature. For best backwards compatibility,
    these should be placed inside a \verb+*.msb+ file.

    \subsubsection{Hash signatures with unknown size}
    ClamAV 0.98 also added support for hash signatures where the hash
    is known but the size is not. Signatures with specific sizes are much
    more performance-efficient, so be cautious when using this
    feature. For these cases, the '*' character can be used in the size
    field. To ensure proper backwards compatibility with older versions of
    ClamAV, these signatures must have a minimum functionality level of 73 or
    higher. Signatures that use the wildcard size without this level set
    will be rejected as malformed.
    \begin{verbatim}
Sample .hsb signature matching any size
HashString:*:MalwareName:73

Sample .msb signature matching any size
*:PESectionHash:MalwareName:73
    \end{verbatim}
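
    The rule for wildcard sizes can be checked before a database is
    deployed. The following sketch is a hypothetical validator (the name
    \verb+parse_hash_signature+ is not part of ClamAV) that applies the
    rule stated above: a \verb+*+ size is only accepted together with a
    functionality level of 73 or higher. It assumes that the malware
    name contains no colon.

```python
def parse_hash_signature(line, section_based=False):
    """Parse one .hsb line, or one .msb line if section_based is true."""
    fields = line.strip().split(":")
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 colon-separated fields")
    if section_based:
        # .msb order: PESectionSize:PESectionHash:MalwareName
        size, hash_string, name = fields[:3]
    else:
        # .hsb order: HashString:FileSize:MalwareName
        hash_string, size, name = fields[:3]
    min_fl = int(fields[3]) if len(fields) == 4 else None
    if size == "*":
        if min_fl is None or min_fl < 73:
            raise ValueError("wildcard size needs functionality level 73 or higher")
        size = None  # None stands for "any size"
    else:
        size = int(size)
    return {"hash": hash_string, "size": size, "name": name, "min_fl": min_fl}
```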

    \subsection{Body-based signatures}
    ClamAV stores all body-based signatures in a hexadecimal format. In this
    section, a hex-signature means a fragment of the malware's body converted
    into a hexadecimal string, which can additionally be extended using various
    wildcards.

    \subsubsection{Hexadecimal format}
    You can use \verb+sigtool --hex-dump+ to convert any data into a hex-string:
    \begin{verbatim}
zolw@localhost:/tmp/test$ sigtool --hex-dump
How do I look in hex?
486f7720646f2049206c6f6f6b20696e206865783f0a
    \end{verbatim}
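
    The same conversion is a one-liner in most scripting languages,
    which is handy when hex-strings are generated in bulk. A minimal
    Python sketch (the function names are hypothetical):

```python
def hex_dump(data):
    """Convert raw bytes into a lowercase hex-string."""
    return data.hex()


def from_hex(hex_string):
    """Convert a plain hex-string (no wildcards) back into raw bytes."""
    return bytes.fromhex(hex_string)


# The input from the sigtool example, including the trailing newline
print(hex_dump(b"How do I look in hex?\n"))
# prints 486f7720646f2049206c6f6f6b20696e206865783f0a
```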

    \subsubsection{Wildcards}
    ClamAV supports the following extensions for hex-signatures:
    \begin{itemize}
	\item \verb+??+\\
	Match any byte.
	\item \verb+a?+\\
	Match a high nibble (the four high bits).\\ \textbf{IMPORTANT NOTE:}
	Nibble matching is only available in libclamav with
	functionality level 17 and higher; therefore, please only use it with
	.ndb signatures followed by ":17" (MinEngineFunctionalityLevel,
	see \ref{ndb}).
	\item \verb+?a+\\
	Match a low nibble (the four low bits).
	\item \verb+*+\\
	Match any number of bytes.
	\item \verb+{n}+\\
	Match $n$ bytes.
	\item \verb+{-n}+\\
	Match $n$ or fewer bytes.
	\item \verb+{n-}+\\
	Match $n$ or more bytes.
	\item \verb+{n-m}+\\
	Match between $n$ and $m$ bytes ($m > n$).
	\item \verb+(aa|bb|cc|..)+\\
	Match aa or bb or cc..
	\item \verb+!(aa|bb|cc|..)+\\
	Match any byte except aa and bb and cc.. (ClamAV$\ge$0.96)
	\item \verb+(aaaa|bbbb|cccc|..)+\\
	Match alternative strings aaaa or bbbb or cccc. Alternative strings must have identical lengths.
	\item \verb+!(aaaa|bbbb|cccc|..)+\\
	Match any string except aaaa and bbbb and cccc. Alternative strings must have identical lengths.
        (ClamAV$\ge$0.98.2)
	\item \verb+HEXSIG[x-y]aa+ or \verb+aa[x-y]HEXSIG+\\
	Match aa anchored to a hex-signature, see
	\url{https://bugzilla.clamav.net/show_bug.cgi?id=776} for
	discussion and examples.
	\item \verb+(B)+\\
	Match word boundary (including file boundaries).
	\item \verb+(L)+\\
	Match CR, CRLF or file boundaries.
    \end{itemize}
    The range wildcards \verb+*+ and \verb+{}+ virtually separate
    a hex-signature into two parts, e.g. \verb+aabbcc*bbaacc+ is treated
    as two sub-signatures \verb+aabbcc+ and \verb+bbaacc+ with any number
    of bytes between them. Each sub-signature is required to include
    a block of two static characters somewhere in its body.
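
    To get a feeling for the wildcards it can help to try them on test
    data outside of ClamAV. The sketch below translates a subset of the
    syntax into a Python regular expression. It is an approximation for
    experiments only and not the matcher used by libclamav: the
    function names are hypothetical, only the byte, nibble, range and
    positive alternative wildcards are handled, and negated
    alternatives, anchored bytes, \verb+(B)+ and \verb+(L)+ are
    rejected.

```python
import re


def _literal(data):
    """Regex that matches exactly these bytes, written as \\xhh escapes."""
    return b"".join(b"\\x%02x" % value for value in data)


def _byte_pattern(pair):
    """Translate one two-character unit: literal byte, ??, a? or ?a."""
    if len(pair) != 2:
        raise ValueError("dangling character in signature")
    high, low = pair
    if pair == "??":
        return b"."
    if low == "?":                      # high nibble fixed
        base = int(high, 16) << 4
        values = range(base, base + 16)
    elif high == "?":                   # low nibble fixed
        values = range(int(low, 16), 256, 16)
    else:
        return _literal(bytes.fromhex(pair))
    return b"[" + _literal(bytes(values)) + b"]"


def hexsig_to_regex(sig):
    """Translate a subset of the hex-signature syntax into a bytes regex."""
    parts = []
    i = 0
    while i < len(sig):
        ch = sig[i]
        if ch in "![":
            raise ValueError("construct not supported by this sketch")
        if ch == "*":                   # any number of bytes
            parts.append(b".*")
            i += 1
        elif ch == "{":                 # {n}, {-n}, {n-} and {n-m}
            end = sig.index("}", i)
            body = sig[i + 1:end]
            if "-" in body:
                low, high = body.split("-")
                quantifier = "{%s,%s}" % (low or "0", high)
            else:
                quantifier = "{%s}" % body
            parts.append(b"." + quantifier.encode("ascii"))
            i = end + 1
        elif ch == "(":                 # (aa|bb|cc) alternatives
            end = sig.index(")", i)
            alternatives = sig[i + 1:end].split("|")
            parts.append(b"(?:" + b"|".join(
                _literal(bytes.fromhex(alt)) for alt in alternatives) + b")")
            i = end + 1
        else:
            parts.append(_byte_pattern(sig[i:i + 2]))
            i += 2
    return re.compile(b"".join(parts), re.DOTALL)
```

    For example, \verb+hexsig_to_regex("aabbcc*bbaacc")+ returns a
    pattern whose \verb+search+ method finds the two parts with any
    number of bytes between them.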

    \subsubsection{Basic signature format}
    The simplest (and now deprecated) signature format is:
    \begin{verbatim}
MalwareName=HexSignature
    \end{verbatim}
    ClamAV will scan the entire file looking for HexSignature. All
    signatures of this type must be placed inside \verb+*.db+ files.

    \subsubsection{Extended signature format}\label{ndb}
    The extended signature format allows for specification of additional
    information such as a target file type, virus offset or engine version,
    making the detection more reliable. The format is:
    \begin{verbatim}
MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]
    \end{verbatim}
    where \verb+TargetType+ is one of the following numbers specifying
    the type of the target file:
    \begin{itemize}
	\item 0 = any file
	\item 1 = Portable Executable, both 32- and 64-bit.
	\item 2 = file inside OLE2 container (e.g. image, embedded executable,
	VBA script). The OLE2 format is primarily used by MS Office and MSI
	installation files.
	\item 3 = HTML (normalized: whitespace transformed to spaces, tags/tag
	attributes normalized, all lowercase), Javascript is normalized too:
	all strings are normalized (hex encoding is decoded), numbers are
	parsed and normalized, local variables/function names are normalized
	to 'n001' format, argument to eval() is parsed as JS again,
	unescape() is handled, some simple JS packers are handled,
	output is whitespace normalized.
	\item 4 = Mail file
	\item 5 = Graphics
	\item 6 = ELF
	\item 7 = ASCII text file (normalized)
	\item 8 = Unused
	\item 9 = Mach-O files
	\item 10 = PDF files
	\item 11 = Flash files
	\item 12 = Java class files
    \end{itemize}
    and \verb+Offset+ is an asterisk or a decimal number \verb+n+ possibly
    combined with a special modifier:
    \begin{itemize}
	\item \verb+*+ = any
	\item \verb+n+ = absolute offset
	\item \verb+EOF-n+ = end of file minus \verb+n+ bytes
    \end{itemize}
    Signatures for PE, ELF and Mach-O files additionally support:
    \begin{itemize}
	\item \verb#EP+n# = entry point plus n bytes (\verb#EP+0# for \verb+EP+)
	\item \verb#EP-n# = entry point minus n bytes
	\item \verb#Sx+n# = start of section \verb+x+'s (counted from 0)
	data plus \verb+n+ bytes
	\item \verb#SEx# = entire section \verb+x+ (offset must lie within section
	boundaries)
	\item \verb#SL+n# = start of last section plus \verb+n+ bytes
    \end{itemize}
    All the above offsets except \verb+*+ can be turned into
    \textbf{floating offsets} and represented as \verb+Offset,MaxShift+ where
    \verb+MaxShift+ is an unsigned integer. A floating offset will match every
    offset between \verb+Offset+ and \verb#Offset+MaxShift#, e.g. \verb+10,5+
    will match all offsets from 10 to 15 and \verb#EP+n,y# will match all
    offsets from \verb#EP+n# to \verb#EP+n+y#. Versions of ClamAV older than
    0.91 will silently ignore the \verb+MaxShift+ extension and only use
    \verb+Offset+.\\

    \noindent
    Optional \verb+MinFL+ and \verb+MaxFL+ parameters can restrict the signature
    to specific engine releases. All signatures in the extended format must be
    placed inside \verb+*.ndb+ files.
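
    The field layout of the extended format, including the optional
    \verb+MaxShift+ and functionality level fields, can be illustrated
    with a small parser. This is a sketch under the assumption that the
    malware name contains no colon; the names \verb+NdbSignature+,
    \verb+parse_ndb_line+ and \verb+candidate_offsets+ are hypothetical
    and the code does not validate the hex-signature itself.

```python
from typing import NamedTuple, Optional


class NdbSignature(NamedTuple):
    name: str
    target_type: int
    offset: str                 # "*", "n", "EOF-n", "EP+n", "Sx+n", ...
    max_shift: Optional[int]    # set for floating offsets only
    hex_signature: str
    min_fl: Optional[int]
    max_fl: Optional[int]


def parse_ndb_line(line):
    """Split MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]."""
    fields = line.strip().split(":")
    if not 4 <= len(fields) <= 6:
        raise ValueError("expected 4 to 6 colon-separated fields")
    name, target, offset, hex_signature = fields[:4]
    min_fl = int(fields[4]) if len(fields) > 4 and fields[4] else None
    max_fl = int(fields[5]) if len(fields) > 5 and fields[5] else None
    max_shift = None
    if "," in offset:           # floating offset: Offset,MaxShift
        offset, shift = offset.split(",", 1)
        if offset == "*":
            raise ValueError("'*' cannot be turned into a floating offset")
        max_shift = int(shift)
    return NdbSignature(name, int(target), offset, max_shift,
                        hex_signature, min_fl, max_fl)


def candidate_offsets(base, max_shift):
    """All offsets covered by a floating offset, both ends included."""
    return range(base, base + max_shift + 1)
```

    With these helpers, \verb+list(candidate_offsets(10, 5))+ covers the
    offsets 10 to 15, matching the \verb+10,5+ example above.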

    \subsubsection{Logical signatures}\label{ldb}
    Logical signatures allow combining multiple signatures in extended
    format using logical operators. They can provide both more detailed and
    more flexible pattern matching. Logical signatures are stored inside
    \verb+*.ldb+ files in the following format:
    \begin{verbatim}
SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;
Subsig1;Subsig2;...
    \end{verbatim}
    where:
    \begin{itemize}
	\item \verb+TargetDescriptionBlock+ provides information about the
	engine and target file with comma separated \verb+Arg:Val+ pairs.
	As of 0.95.1 only \verb+Target:X+ and \verb+Engine:X-Y+ were
	supported; the keywords available in later releases are listed below.
	\item \verb+LogicalExpression+ specifies the logical expression
	describing the relationship between \verb+Subsig0...SubsigN+.\\
	\textbf{Basis clause:} 0,1,...,N decimal indexes are SUB-EXPRESSIONS
	representing \verb+Subsig0, Subsig1,...,SubsigN+ respectively.\\
	\textbf{Inductive clause:} if \verb+A+ and \verb+B+ are
	SUB-EXPRESSIONS and \verb+X, Y+ are decimal numbers then
	\verb+(A&B)+, \verb+(A|B)+, \verb+A=X+, \verb+A=X,Y+, \verb+A>X+,
	\verb+A>X,Y+, \verb+A<X+ and \verb+A<X,Y+ are SUB-EXPRESSIONS
	\item \verb+SubsigN+ is the n-th subsignature in extended format, possibly
	preceded by an offset. Up to 64 subsigs can be specified.
    \end{itemize}
    Keywords used in \verb+TargetDescriptionBlock+:
    \begin{itemize}
	\item \verb+Target:X+: Target file type
	\item \verb+Engine:X-Y+: Required engine functionality (range; 0.96)
	\item \verb+FileSize:X-Y+: Required file size (range in bytes; 0.96)
	\item \verb+EntryPoint+: Entry point offset (range in bytes; 0.96)
	\item \verb+NumberOfSections+: Required number of sections in executable (range; 0.96)
	\item \verb+Container:CL_TYPE_*+: File type of the container which stores the scanned file
	\item \verb+IconGroup1+: Icon group name 1 from .idb signature (0.96)
	\item \verb+IconGroup2+: Icon group name 2 from .idb signature (0.96)
    \end{itemize}
    Modifiers for subexpressions:
    \begin{itemize}
	\item \verb+A=X+: If the SUB-EXPRESSION A refers to a single signature
	then this signature must get matched exactly X times; if it refers to
	a (logical) block of signatures then this block must generate exactly
	X matches (with any of its sigs).
	\item \verb+A=0+ specifies negation (signature or block of signatures
	cannot be matched)
	\item \verb+A=X,Y+: If the SUB-EXPRESSION A refers to a single signature
	then this signature must be matched exactly X times; if it refers to
	a (logical) block of signatures then this block must generate X matches
	and at least Y different signatures must get matched.
	\item \verb+A>X+: If the SUB-EXPRESSION A refers to a single signature
	then this signature must get matched more than X times; if it refers to
	a (logical) block of signatures then this block must generate more
	than X matches (with any of its sigs).
	\item \verb+A>X,Y+: If the SUB-EXPRESSION A refers to a single signature
	then this signature must get matched more than X times; if it refers to
	a (logical) block of signatures then this block must generate more than
	X matches and at least Y different signatures must be matched.
	\item \verb+A<X+ and \verb+A<X,Y+ as above with the change of "more"
	to "fewer".
    \end{itemize}
    Examples:
    \begin{verbatim}
Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;7374656
6616e;deadbeef

Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;737
46566616e  

Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;737
46566616e;deadbeef

Sig4;Target:1,Engine:18-20;((0|1)&(2|3))&4;EP+123:33c06834f04100
f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573
(63|64)61706528;S3+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58
dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900
    \end{verbatim}
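
    The structure of a logical signature line can be made visible by
    splitting it into its parts. The sketch below is a hypothetical
    helper (\verb+parse_ldb_line+ is not part of ClamAV) that follows
    the format given above: fields are separated by semicolons, the
    target description block holds comma separated \verb+Arg:Val+
    pairs, and a subsignature may be preceded by an offset and a colon.
    It does not evaluate the logical expression.

```python
def parse_ldb_line(line):
    """Split a logical signature into name, target block, expression, subsigs."""
    fields = line.strip().split(";")
    if len(fields) < 4:
        raise ValueError("expected a name, a target block, an expression "
                         "and at least one subsignature")
    name, target_block, expression = fields[:3]
    subsigs = fields[3:]
    if len(subsigs) > 64:
        raise ValueError("at most 64 subsignatures can be specified")
    # "Target:1,Engine:18-20" -> {"Target": "1", "Engine": "18-20"}
    target = dict(pair.split(":", 1) for pair in target_block.split(","))
    parsed = []
    for subsig in subsigs:
        # "EP+123:33c0..." -> ("EP+123", "33c0..."); no offset -> (None, "...")
        offset, separator, body = subsig.rpartition(":")
        parsed.append((offset if separator else None, body))
    return {"name": name, "target": target,
            "expression": expression, "subsigs": parsed}
```

    Joining the wrapped lines of an example into a single line before
    calling the function is required, since a signature must not span
    several lines in the database file.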
    Macro subsignatures (ClamAV 0.96): \verb+${min-max}MACROID$+:
    \begin{itemize}
      \item Macro subsignatures are used to combine a number of existing extended
      signatures (\verb+.ndb+) into an on-the-fly generated alternate string logical
      signature (\verb+.ldb+).
    \end{itemize}
    Example:
    \begin{verbatim}
      test.ldb:
        TestMacro;Target:0;0&1;616161;${6-7}12$

      test.ndb:
        D1:0:$12:626262
        D2:0:$12:636363
        D3:0:$30:626264
    \end{verbatim}
    The example logical signature \verb+TestMacro+ is functionally equivalent to:\\
    \verb+TestMacro;Target:0;0;616161{3-4}(626262|636363)+
    \begin{itemize}
	\item \verb+MACROID+ points to a group of signatures; there can be at most 32 macro groups.
      \begin{itemize}
      \item In the example, \verb+MACROID+ is \verb+12+ and both \verb+D1+ and \verb+D2+ are members 
        of macro group \verb+12+. \verb+D3+ is a member of separate macro group \verb+30+.
      \end{itemize}
    \item \verb+{min-max}+ specifies the offset range at which one of the group signatures should match;
      the offset range is relative to the starting offset of the preceding subsignature. This means a
      macro subsignature cannot be the first subsignature.
      \begin{itemize}
      \item In the example, \verb+{min-max}+ is \verb+{6-7}+ and it is relative to the start of a \verb+616161+ match.
      \end{itemize}
	\item For more information and examples please see \url{https://wwws.clamav.net/bugzilla/show_bug.cgi?id=164}.
    \end{itemize}
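
    The step from \verb+{6-7}+ in the macro to \verb+{3-4}+ in the
    equivalent signature is plain arithmetic: the macro range is counted
    from the start of the preceding match, so the length of that match
    (3 bytes for \verb+616161+) has to be subtracted. A small sketch of
    the calculation follows. The function name \verb+expand_macro+ is
    hypothetical, and the sketch assumes that the preceding
    subsignature is a plain hex-string without wildcards.

```python
def expand_macro(preceding_hex, min_offset, max_offset, members):
    """Rewrite 'preceding;${min-max}ID$' as one hex-signature with a gap."""
    length = len(bytes.fromhex(preceding_hex))  # bytes in the preceding match
    if min_offset < length:
        raise ValueError("macro range starts inside the preceding subsignature")
    gap = "{%d-%d}" % (min_offset - length, max_offset - length)
    return preceding_hex + gap + "(" + "|".join(members) + ")"


# Members of macro group 12 in the example: D1 and D2
print(expand_macro("616161", 6, 7, ["626262", "636363"]))
# prints 616161{3-4}(626262|636363)
```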

    \subsection{Icon signatures for PE files}
    ClamAV 0.96 includes an approximate/fuzzy icon matcher to help
    detect malicious executables disguising themselves as innocent
    looking image files, office documents and the like.

    Icon matching is only triggered via .ldb signatures using the special
    attribute tokens \verb+IconGroup1+ or \verb+IconGroup2+. These identify
    two (optional) groups of icons defined in a .idb database file. The
    format of the .idb file is:
    \begin{verbatim}
ICON_NAME:GROUP1:GROUP2:ICON_HASH
    \end{verbatim}
    where:
    \begin{itemize}
	\item \verb+ICON_NAME+ is a unique string identifier for a specific
	icon,
	\item \verb+GROUP1+ is a string identifier for the first group of
	icons (\verb+IconGroup1+)
	\item \verb+GROUP2+ is a string identifier for the second group of
	icons (\verb+IconGroup2+),
	\item \verb+ICON_HASH+ is a fuzzy hash of the icon image
    \end{itemize}
    The \verb+ICON_HASH+ field can be obtained from the debug output of
    libclamav. For example:
    \begin{verbatim}
LibClamAV debug: ICO SIGNATURE:
ICON_NAME:GROUP1:GROUP2:18e2e0304ce60a0cc3a09053a30000414100057e
000afe0000e 80006e510078b0a08910d11ad04105e0811510f084e01040c080
a1d0b0021000a39002a41
    \end{verbatim}

    \subsection{Signatures for Version Information metadata in PE files}
    Starting with ClamAV 0.96 it is possible to easily match certain
    information built into PE files (executables and dynamic link libraries).
    Whenever you look up the properties of a PE executable file in Windows,
    you are presented with a number of details about the file itself.

    This information is stored in a special area of the file resources which goes
    under the name of \verb+VS_VERSION_INFORMATION+ (or versioninfo for short).
    It is divided into 2 parts. The first part (which is rather uninteresting)
    is really a set of numbers and flags indicating the product and file
    version. It was originally intended for use with installers which, after
    parsing it, should be able to determine whether a certain executable or
    library is to be upgraded/overwritten or is already up to date. Suffice
    to say, this approach never really worked and is generally never used.

    The second block is much more interesting: it is a simple list of key/value
    strings, intended for user information and completely ignored by the OS.
    For example, if you look at ping.exe you can see the company being \emph{"Microsoft
    Corporation"}, the description \emph{"TCP/IP Ping command"}, the internal name
    \emph{"ping.exe"} and so on... Depending on the OS version, some keys may be given
    peculiar visibility in the file properties dialog, however they are internally
    all the same.

    To match a versioninfo key/value pair, the special file offset anchor \verb+VI+ was
    introduced.  This is similar to the other anchors (like \verb+EP+ and \verb+SL+)
    except that, instead of matching the hex pattern against a single offset, it checks
    it against each and every key/value pair in the file. The \verb+VI+ token neither
    needs nor accepts a \verb#+/-# offset like e.g. \verb#EP+1#. As for the hex signature
    itself, it's just the UTF-16 dump of the key and value. Only the \verb+??+ and
    \verb+(aa|bb)+ wildcards are allowed in the signature. Usually, you don't need to
    bother figuring it out: each key/value pair together with the corresponding VI-based
    signature is printed by \verb+clamscan+ when the \verb+--debug+ option is given.

    For example \verb+clamscan --debug freecell.exe+ produces:
    \begin{verbatim}
[...]
Recognized MS-EXE/DLL file
in cli_peheader
versioninfo_cb: type: 10, name: 1, lang: 410, rva: 9608
cli_peheader: parsing version info @ rva 9608 (1/1)
VersionInfo (d2de): 'CompanyName'='Microsoft Corporation' -
VI:43006f006d00700061006e0079004e0061006d006500000000004d006900
630072006f0073006f0066007400200043006f00720070006f0072006100740
069006f006e000000
VersionInfo (d32a): 'FileDescription'='Entertainment Pack
FreeCell Game' - VI:460069006c006500440065007300630072006900700
0740069006f006e000000000045006e007400650072007400610069006e006d
0065006e00740020005000610063006b0020004600720065006500430065006
c006c002000470061006d0065000000
VersionInfo (d396): 'FileVersion'='5.1.2600.0 (xpclient.010817
-1148)' - VI:460069006c006500560065007200730069006f006e00000000
0035002e0031002e0032003600300030002e003000200028007800700063006
c00690065006e0074002e003000310030003800310037002d00310031003400
380029000000
VersionInfo (d3fa): 'InternalName'='freecell' - VI:49006e007400
650072006e0061006c004e0061006d006500000066007200650065006300650
06c006c000000
VersionInfo (d4ba): 'OriginalFilename'='freecell' - VI:4f007200
6900670069006e0061006c00460069006c0065006e0061006d0065000000660
0720065006500630065006c006c000000
VersionInfo (d4f6): 'ProductName'='Sistema operativo Microsoft
Windows' - VI:500072006f0064007500630074004e0061006d00650000000
000530069007300740065006d00610020006f00700065007200610074006900
76006f0020004d006900630072006f0073006f0066007400ae0020005700690
06e0064006f0077007300ae000000
VersionInfo (d562): 'ProductVersion'='5.1.2600.0' - VI:50007200
6f006400750063007400560065007200730069006f006e00000035002e00310
02e0032003600300030002e0030000000
[...]
    \end{verbatim}
Although VI-based signatures are intended for use in logical signatures, you can test them
using ordinary \verb+.ndb+ files. For example:
    \begin{verbatim}
my_test_vi_sig:1:VI:paste_your_hex_sig_here
    \end{verbatim}
Finally, to decode a VI-based signature into human-readable form, you can use:
    \begin{verbatim}
echo hex_string | xxd -r -p | strings -el
    \end{verbatim}
For example:
    \begin{verbatim}
$ echo 460069006c0065004400650073006300720069007000740069006f006e
000000000045006e007400650072007400610069006e006d0065006e007400200
05000610063006b0020004600720065006500430065006c006c00200047006100
6d0065000000 | xxd -r -p | strings -el
FileDescription
Entertainment Pack FreeCell Game
    \end{verbatim}
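    Where \verb+xxd+ or \verb+strings+ is not available, the same decoding can be
    sketched in Python. The helper name \verb+decode_vi+ is hypothetical, and the
    sketch assumes the signature contains no \verb+??+ or \verb+(aa|bb)+ wildcards.
    It is fed the \verb+InternalName+ signature from the debug output above:
\begin{verbatim}
```python
def decode_vi(hex_sig):
    """Decode a wildcard-free VI hex signature into its key and value."""
    # Tolerate line breaks or spaces copied along with the debug output.
    compact = "".join(hex_sig.split())
    text = bytes.fromhex(compact).decode("utf-16-le")
    # Key and value are null-terminated; drop the empty pieces between them.
    return [part for part in text.split("\x00") if part]


# The 'InternalName' signature printed by clamscan --debug above.
sig = (
    "49006e007400650072006e0061006c004e0061006d006500"  # InternalName
    "0000"                                              # terminator
    "6600720065006500630065006c006c00"                  # freecell
    "0000"                                              # terminator
)
print(decode_vi(sig))
```
\end{verbatim}
    Some entries in the debug output carry more than one null between key and
    value, which is why the sketch discards empty pieces instead of splitting
    on a single terminator.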

    \subsection{Trusted and Revoked Certificates}
    ClamAV 0.98 checks signed PE files for certificates and verifies each
    certificate in the chain against a database of trusted and revoked
    certificates. The signature format is
\begin{verbatim}
Name;Trusted;Subject;Serial;Pubkey;Exponent;CodeSign;TimeSign;CertSign;
NotBefore;Comment[;minFL[;maxFL]]
\end{verbatim}
    where the corresponding fields are:
    \begin{itemize}
        \item \verb+Name:+ name of the entry
        \item \verb+Trusted:+ bit field specifying whether the cert is
            trusted: 1 for trusted, 0 for revoked
        \item \verb+Subject:+ SHA-1 of the Subject field, in hex
        \item \verb+Serial:+ the serial number as reported by
            \verb+clamscan --debug --verbose+
        \item \verb+Pubkey:+ the public key in hex
        \item \verb+Exponent:+ the exponent in hex. Currently ignored and
            hardcoded to 010001 (in hex)
        \item \verb+CodeSign:+ bit field, specifying whether this cert
            can sign code. 1 for true, 0 for false
        \item \verb+TimeSign:+ bit field. 1 for true, 0 for false
        \item \verb+CertSign:+ bit field, specifying whether this cert
            can sign other certs. 1 for true, 0 for false
        \item \verb+NotBefore:+ integer; the cert should not be added before
            this value. Defaults to 0 if left empty
        \item \verb+Comment:+ comments for this entry
    \end{itemize}
    The signatures for certs are stored inside \verb+.crb+ files.
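    As a rough illustration of the field layout, the following Python sketch
    splits one entry into named fields. The function name \verb+parse_crb_line+
    and the sample entry are hypothetical, and the sketch assumes that no
    field, including the comment, contains a semicolon:
\begin{verbatim}
```python
CRB_FIELDS = [
    "Name", "Trusted", "Subject", "Serial", "Pubkey", "Exponent",
    "CodeSign", "TimeSign", "CertSign", "NotBefore", "Comment",
]
OPTIONAL_FIELDS = ["minFL", "maxFL"]


def parse_crb_line(line):
    """Split one .crb entry into a dict keyed by the field names above."""
    parts = line.rstrip("\n").split(";")
    lowest = len(CRB_FIELDS)
    highest = lowest + len(OPTIONAL_FIELDS)
    if not lowest <= len(parts) <= highest:
        raise ValueError("expected 11 to 13 fields, got %d" % len(parts))
    entry = dict(zip(CRB_FIELDS + OPTIONAL_FIELDS, parts))
    # NotBefore defaults to 0 when left empty.
    entry["NotBefore"] = int(entry["NotBefore"] or 0)
    for flag in ("Trusted", "CodeSign", "TimeSign", "CertSign"):
        if entry[flag] not in ("0", "1"):
            raise ValueError("%s must be 0 or 1" % flag)
    return entry


# Hypothetical entry; the hex values are placeholders, not a real cert.
example = "Example-Cert;1;aabb;01;ccdd;010001;1;0;0;;made-up entry"
entry = parse_crb_line(example)
print(entry["Name"], entry["Trusted"], entry["NotBefore"])
```
\end{verbatim}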

    \subsection{Signatures based on container metadata}
    ClamAV 0.96 allows creating generic signatures matching files stored
    inside different container types which meet specific conditions.
    The signature format is
\begin{verbatim}
VirusName:ContainerType:ContainerSize:FileNameREGEX:
FileSizeInContainer:FileSizeReal:IsEncrypted:FilePos:
Res1:Res2[:MinFL[:MaxFL]]
\end{verbatim}
    where the corresponding fields are:
    \begin{itemize}
	\item \verb+VirusName:+ Virus name to be displayed when signature matches
	\item \verb+ContainerType:+ one of \verb+CL_TYPE_ZIP+, \verb+CL_TYPE_RAR+,
	\verb+CL_TYPE_ARJ+,\\
	\verb+CL_TYPE_CAB+, \verb+CL_TYPE_7Z+, \verb+CL_TYPE_MAIL+, \verb+CL_TYPE_(POSIX|OLD)_TAR+,\\
	\verb+CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC)+ or \verb+*+ to match
	any of the container types listed here
	\item \verb+ContainerSize:+ size of the container file itself (e.g. the size of
	the zip archive), specified in bytes as an absolute value or a range \verb+x-y+
	\item \verb+FileNameREGEX:+ regular expression describing the name of the target file
	\item \verb+FileSizeInContainer:+ usually the compressed size; for MAIL, TAR and CPIO
	it equals \verb+FileSizeReal+; specified in bytes as an absolute value or a range
	\item \verb+FileSizeReal:+ usually the uncompressed size; for MAIL, TAR and CPIO
	it equals \verb+FileSizeInContainer+; absolute value or range
	\item \verb+IsEncrypted:+ 1 if the target file is encrypted, 0 if it is not, and
	\verb+*+ to ignore
	\item \verb+FilePos:+ file position in the container (counting from 1); absolute value
	or range
	\item \verb+Res1:+ when \verb+ContainerType+ is \verb+CL_TYPE_ZIP+ or
	\verb+CL_TYPE_RAR+ this field is treated as a CRC sum of the target file
	specified in hexadecimal format; for other container types it is ignored
	\item \verb+Res2:+ not used as of ClamAV 0.96
    \end{itemize}
    The signatures for container files are stored inside \verb+.cdb+ files.
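    To make the field order concrete, here is a minimal Python sketch that
    assembles one such signature line. The helper names \verb+size_spec+ and
    \verb+build_cdb_sig+ and the sample values are hypothetical. The sketch
    also assumes that \verb+*+ is accepted as an ignore value in the size and
    reserved fields, as it is for \verb+ContainerType+ and \verb+IsEncrypted+:
\begin{verbatim}
```python
def size_spec(low, high=None):
    """Format a size or position as an absolute value or an x-y range."""
    return str(low) if high is None else "%d-%d" % (low, high)


def build_cdb_sig(virus_name, container_type, container_size, name_regex,
                  size_in_container, size_real, is_encrypted, file_pos,
                  res1="*", res2="*"):
    """Join the ten mandatory .cdb fields with colons, in format order."""
    fields = [virus_name, container_type, container_size, name_regex,
              size_in_container, size_real, is_encrypted, file_pos,
              res1, res2]
    return ":".join(str(field) for field in fields)


# Hypothetical signature: an encrypted .exe stored first in a ZIP archive
# whose total size is between 1000 and 2000 bytes.
line = build_cdb_sig("Test.Example", "CL_TYPE_ZIP", size_spec(1000, 2000),
                     r"\.exe$", "*", "*", 1, 1)
print(line)
```
\end{verbatim}
    The sketch only builds lines. Splitting an existing line on colons is less
    straightforward, because the file name regular expression may itself
    contain a colon.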

    \subsection{Signatures based on ZIP/RAR metadata (obsolete)}
    The (now obsolete) archive metadata signatures can only be applied
    to ZIP and RAR files and have the following format:
\begin{verbatim}
virname:encrypted:filename:normal size:csize:crc32:cmethod:
fileno:max depth
\end{verbatim}
    where the corresponding fields are:
    \begin{itemize}
	\item Virus name
	\item Encryption flag (1 -- encrypted, 0 -- not encrypted)
	\item File name (this is a regular expression - * to ignore)
	\item Normal (uncompressed) size (* to ignore)
	\item Compressed size (* to ignore)
	\item CRC32 (* to ignore)
	\item Compression method (* to ignore)
	\item File position in archive (* to ignore)
	\item Maximum number of nested archives (* to ignore)
    \end{itemize}
    The database file should use the extension \verb+.zmd+ for ZIP
    metadata or \verb+.rmd+ for RAR metadata.

    \subsection{Whitelist databases}
    To whitelist a specific file, create a signature in the MD5 signature format
    and place it inside a database file with the \verb+.fp+ extension.
    To whitelist a file by its SHA1 or SHA256 hash instead, place the hash
    signature inside a database file with the \verb+.sfp+ extension.\\

    \noindent
    To whitelist a specific signature from the database, add
    its name to a local file called \verb+local.ign2+ stored inside the
    database directory. You can additionally follow the signature name
    with the MD5 of the entire database entry for this signature, e.g.:
\begin{verbatim}
Eicar-Test-Signature:bc356bae4c42f19a3de16e333ba3569c
\end{verbatim}
    In that case, the signature will no longer be whitelisted once
    its entry in the database is modified (e.g. when the signature is
    updated to avoid false alerts).
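    A \verb+local.ign2+ line of this second form could be produced with a
    short Python sketch like the one below. The function name
    \verb+ign2_line+ and the sample entry are hypothetical. Whether the MD5
    covers the trailing newline of the entry is an assumption here (it is
    stripped), so check the result against a known entry before relying on it:
\begin{verbatim}
```python
import hashlib


def ign2_line(sig_name, db_entry):
    """Build a local.ign2 line pinned to one exact database entry."""
    # Assumption: the MD5 covers the entry text without its line ending.
    entry_text = db_entry.rstrip("\r\n")
    digest = hashlib.md5(entry_text.encode("utf-8")).hexdigest()
    return "%s:%s" % (sig_name, digest)


# Hypothetical entry text; paste the real line from the database instead.
print(ign2_line("My-Test-Signature", "My-Test-Signature:0:*:deadbeef"))
```
\end{verbatim}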

    \subsection{Signature names}
    ClamAV uses the following prefixes for signature names:
    \begin{itemize}
	\item \emph{Worm} for Internet worms
	\item \emph{Trojan} for backdoor programs
	\item \emph{Adware} for adware
	\item \emph{Flooder} for flooders
        \item \emph{HTML} for HTML files
        \item \emph{Email} for email messages
        \item \emph{IRC} for IRC trojans
	\item \emph{JS} for JavaScript malware
	\item \emph{PHP} for PHP malware
	\item \emph{ASP} for ASP malware
	\item \emph{VBS} for VBS malware
	\item \emph{BAT} for BAT malware
	\item \emph{W97M}, \emph{W2000M} for Word macro viruses
	\item \emph{X97M}, \emph{X2000M} for Excel macro viruses
	\item \emph{O97M}, \emph{O2000M} for generic Office macro viruses
	\item \emph{DoS} for Denial of Service attack software
	\item \emph{DOS} for old DOS malware
	\item \emph{Exploit} for popular exploits
	\item \emph{VirTool} for virus construction kits
	\item \emph{Dialer} for dialers
	\item \emph{Joke} for hoaxes
    \end{itemize}
    Important rules of the naming convention:
    \begin{itemize}
	\item always use a -zippwd suffix in the malware name for signatures
	      of type zmd,
	\item always use a -rarpwd suffix in the malware name for signatures
	      of type rmd,
	\item only use alphanumeric characters, dash (-), dot (.) and underscore
	      (\_) in malware names; never use a space, apostrophe or quote mark.
    \end{itemize}
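    The rules above lend themselves to a mechanical check. The following
    Python sketch is one possible reading of them: the function name
    \verb+check_sig_name+ is hypothetical, ``alphanumeric'' is taken to mean
    ASCII letters and digits, and ``suffix'' is taken to mean that the name
    ends with the given string:
\begin{verbatim}
```python
import re

# ASCII alphanumerics, dash, dot and underscore only.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9._-]+")

# Suffix required for each of the obsolete metadata database types.
REQUIRED_SUFFIX = {"zmd": "-zippwd", "rmd": "-rarpwd"}


def check_sig_name(name, db_type=None):
    """Return a list of naming-rule violations (empty if the name is fine)."""
    problems = []
    if not ALLOWED_NAME.fullmatch(name):
        problems.append("name contains a character outside [A-Za-z0-9._-]")
    suffix = REQUIRED_SUFFIX.get(db_type)
    if suffix and not name.endswith(suffix):
        problems.append("%s signatures need the %s suffix" % (db_type, suffix))
    return problems


print(check_sig_name("Worm.Example-zippwd", "zmd"))
print(check_sig_name("Worm.Bad Name", "zmd"))
```
\end{verbatim}
    The first call returns an empty list, while the second reports both the
    space in the name and the missing suffix.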

    \section{Special files}

    \subsection{HTML}
    ClamAV contains special HTML normalisation code which helps to detect
    HTML exploits. Running \verb+sigtool --html-normalise+ on an HTML file
    should generate the following files:
    \begin{itemize}
	\item nocomment.html - the file is normalized and converted to lower case, with all
	comments and superfluous white space removed
	\item notags.html - as above but with all HTML tags removed
    \end{itemize}
    The code automatically decodes JScript.encode parts and character references (e.g.
    \verb+&#102;+). You need to create a signature against one of the generated
    files. To eliminate potential false positive alerts the target type should
    be set to 3.

    \subsection{Text files}
    Similarly to HTML, all ASCII text files get normalized (converted
    to lower case, all superfluous white space and control characters removed,
    etc.) before scanning. Use \verb+clamscan --leave-temps+ to obtain
    a normalized file, then create a signature with the target type 7.

    \subsection{Compressed Portable Executable files}
    If the file is compressed with UPX, FSG, Petite or another PE packer
    supported by libclamav, run \verb+clamscan+ with
    \verb+--debug --leave-temps+. Example output for an FSG-compressed file:
    \begin{verbatim}
LibClamAV debug: UPX/FSG/MEW: empty section found - assuming compression
LibClamAV debug: FSG: found old EP @119e0
LibClamAV debug: FSG: Unpacked and rebuilt executable saved in
/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c
    \end{verbatim}
    Next, create a type 1 signature for \verb+/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c+.

\end{document}