# How to write a parser

## Introduction

This page gives you an introduction to developing a parser for Plaso.

* First, a step-by-step example shows how to create a simple binary parser
for the Safari Cookies.binarycookies file.
* At the bottom are troubleshooting tips for common problems that others have
run into before you.

This page assumes you have at least a basic understanding of programming in
Python and use of git.

## Format

Before you can write a binary file parser, you need a good understanding of
the file format. The Safari Cookies.binarycookies format is described in the
[libyal dtformats documentation](https://github.com/libyal/dtformats/blob/master/documentation/Safari%20Cookies.asciidoc).
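
To make the role of the format description concrete, here is a minimal sketch of reading the file header with Python's `struct` module. It assumes the header layout given in the linked documentation: a 4-byte signature `cook`, a big-endian 32-bit page count, and then one big-endian 32-bit size per page. The function name `read_file_header` is hypothetical and is not part of Plaso.

```python
import io
import struct


def read_file_header(file_object):
  """Reads the page sizes from a binarycookies-style file header.

  Assumed layout: 4-byte signature b'cook', a big-endian 32-bit page count,
  then one big-endian 32-bit size per page.
  """
  header = file_object.read(8)
  if len(header) != 8:
    raise ValueError('File too small to contain a header.')

  signature, number_of_pages = struct.unpack('>4sI', header)
  if signature != b'cook':
    raise ValueError('Unsupported signature: {0!r}'.format(signature))

  sizes_data = file_object.read(4 * number_of_pages)
  if len(sizes_data) != 4 * number_of_pages:
    raise ValueError('Truncated page sizes array.')

  page_sizes = struct.unpack('>{0:d}I'.format(number_of_pages), sizes_data)
  return list(page_sizes)


# Synthetic header with two pages of 16 and 32 bytes; no real cookie data.
sample = io.BytesIO(b'cook' + struct.pack('>III', 2, 16, 32))
page_sizes = read_file_header(sample)
```

Rejecting an unexpected signature early mirrors the idea that a parser should give up quickly on files it does not support, so that other parsers can be tried.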

## Parsers vs. Plugins

Before starting work on a parser, check if Plaso already has a parser that
handles the underlying format of the file you're parsing. Plaso currently
supports plugins for the following file formats:

* Bencode
* Compound zip files
* Web Browser Cookies
* ESEDB
* OLECF
* Plist
* SQLite
* [Syslog](How-to-write-a-Syslog-plugin.md)
* Windows Registry

If the artifact you're trying to parse is in one of these formats, you need to
write a plugin of the appropriate type, rather than a parser.

For our example, however, the Safari Cookies.binarycookies file is in its own
binary format, so a separate parser is appropriate.
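
One quick way to check the underlying format of an artifact is to look at its leading bytes. The sketch below is illustrative only and is not how Plaso itself decides. The table and the function `guess_container_format` are hypothetical, and only a few container formats with well-known signatures are covered (text-based formats such as syslog or XML plists have no such signature).

```python
# Well-known signatures of some container formats, as (name, offset, bytes).
# This table is illustrative and far from complete.
CONTAINER_SIGNATURES = [
    ('SQLite', 0, b'SQLite format 3\x00'),
    ('OLECF', 0, b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'),
    ('ESEDB', 4, b'\xef\xcd\xab\x89'),
    ('Binary plist', 0, b'bplist'),
    ('Windows Registry', 0, b'regf'),
]


def guess_container_format(data):
  """Returns the name of a known container format or None if no match."""
  for name, offset, signature in CONTAINER_SIGNATURES:
    if data[offset:offset + len(signature)] == signature:
      return name
  return None


# A file starting with b'cook' matches none of the containers above, which
# suggests a separate parser rather than a plugin.
sqlite_guess = guess_container_format(b'SQLite format 3\x00' + b'\x00' * 84)
cookies_guess = guess_container_format(b'cook\x00\x00\x00\x01')
```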

## Test data

First we make a representative test file and add it to the `test_data/`
directory, in our example:
```
test_data/Cookies.binarycookies
```

**Make sure that the test file does not contain sensitive or copyrighted
material.**

## Parsers, formatters, events and event data

* parser: a subclass of [FileObjectParser](../api/plaso.parsers.html#plaso.parsers.interface.FileObjectParser)
 that extracts events from the content of a file.
* formatter (or event formatter): a subclass of
[EventFormatter](../api/plaso.formatters.html#plaso.formatters.interface.EventFormatter) that generates a human-readable
description of the event data.
* event: a subclass of [EventObject](../api/plaso.containers.html#plaso.containers.events.EventObject) that represents
[an event](Scribbles-about-events.md#what-is-an-event).
* event data: a subclass of [EventData](../api/plaso.containers.html#plaso.containers.events.EventData) that represents
data related to the event.
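
The relationship between events and event data can be sketched with simplified stand-in classes. The classes below are hypothetical and only mimic the roles described above; they are not Plaso's `EventObject` or `EventData`, and the data type and attribute names are made up for illustration.

```python
class EventData(object):
  """Stand-in for an event data container (not Plaso's real class)."""

  def __init__(self, data_type=None):
    self.data_type = data_type


class CookieEventData(EventData):
  """Hypothetical event data describing one cookie."""

  DATA_TYPE = 'example:cookie:entry'

  def __init__(self):
    super(CookieEventData, self).__init__(data_type=self.DATA_TYPE)
    self.cookie_name = None
    self.url = None


class Event(object):
  """Stand-in for an event: a timestamp and a description of its meaning."""

  def __init__(self, timestamp, timestamp_desc):
    self.timestamp = timestamp
    self.timestamp_desc = timestamp_desc


event_data = CookieEventData()
event_data.cookie_name = 'session'
event_data.url = 'example.com'

# The same event data can describe several events, for example the creation
# and expiration of one cookie. Timestamps are arbitrary microsecond values.
events = [
    (Event(1601856000000000, 'Creation Time'), event_data),
    (Event(1633392000000000, 'Expiration Time'), event_data),
]
```

Keeping the descriptive attributes in the event data and only the timestamp in the event avoids duplicating the cookie details for every timestamp.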

### Writing the parser

#### Registering the parser

Add an import for the parser to:

```
plaso/parsers/__init__.py
```

It should look like this:

~~~~python
from plaso.parsers import safari_cookies
~~~~

When `plaso.parsers` is imported, this loads the `safari_cookies` module
(`safari_cookies.py`).

The parser class `BinaryCookieParser` is registered by calling
`manager.ParsersManager.RegisterParser(BinaryCookieParser)` at the end of the
module:

```
plaso/parsers/safari_cookies.py
```

~~~~python
# -*- coding: utf-8 -*-
"""Parser for Safari Binary Cookie files."""

from plaso.parsers import interface
from plaso.parsers import manager


class BinaryCookieParser(interface.FileObjectParser):
  """Parser for Safari Binary Cookie files."""

  NAME = 'binary_cookies'
  DATA_FORMAT = 'Safari Binary Cookie file'

  def ParseFileObject(self, parser_mediator, file_object, **kwargs):
    """Parses a Safari binary cookie file-like object.

    Args:
      parser_mediator (ParserMediator): parser mediator.
      file_object (dfvfs.FileIO): file-like object to be parsed.

    Raises:
      UnableToParseFile: when the file cannot be parsed, this will signal
          the event extractor to apply other parsers.
    """
    ...


manager.ParsersManager.RegisterParser(BinaryCookieParser)
~~~~
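
To illustrate what registration at import time achieves, here is a minimal sketch of a class-level registry. It is a simplified stand-in and not Plaso's `ParsersManager`. The `GetParserNames` helper and `ExampleParser` class are hypothetical, and keying the registry on the lowercased `NAME` is an assumption made for this example.

```python
class ParsersManager(object):
  """Simplified stand-in for a parser registry (not Plaso's implementation)."""

  _parser_classes = {}

  @classmethod
  def RegisterParser(cls, parser_class):
    """Registers a parser class under its lowercased NAME."""
    parser_name = parser_class.NAME.lower()
    if parser_name in cls._parser_classes:
      raise KeyError('Parser class already set for name: {0:s}.'.format(
          parser_class.NAME))
    cls._parser_classes[parser_name] = parser_class

  @classmethod
  def GetParserNames(cls):
    """Returns the registered names (hypothetical helper for this sketch)."""
    return sorted(cls._parser_classes)


class ExampleParser(object):
  """Hypothetical parser class; only NAME matters for registration."""

  NAME = 'example_parser'


# Runs when the module is imported, which is why the import in
# plaso/parsers/__init__.py is needed for the parser to become available.
ParsersManager.RegisterParser(ExampleParser)
```

Because the registry is keyed by name, registering a second class with the same `NAME` raises an error in this sketch, so each parser needs a unique name.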

### Writing the message formatter

The event message format is defined in `data/formatters/*.yaml`.

For more information about the configuration file format see:
[message formatting](../user/Output-and-formatting.html#message-formatting)
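
As a rough illustration of what a message format does, the sketch below builds a message from format string pieces and the attribute values of some event data. It assumes a formatter that omits pieces whose attributes are missing. The function `format_message`, the pieces, and the values are hypothetical and do not reproduce Plaso's formatter code or its YAML configuration.

```python
def format_message(pieces, values, separator=' '):
  """Joins the format string pieces whose placeholders all have values."""
  formatted = []
  for piece in pieces:
    try:
      formatted.append(piece.format(**values))
    except KeyError:
      # A referenced attribute is missing, so this piece is skipped.
      continue
  return separator.join(formatted)


# Hypothetical message pieces and event values, for illustration only.
MESSAGE_PIECES = ['{url}', '<{path}>', '({cookie_name})', 'Flags: {flags}']

values = {'url': 'example.com', 'path': '/', 'cookie_name': 'session'}
message = format_message(MESSAGE_PIECES, values)
# message is 'example.com </> (session)'; the flags piece is omitted.
```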