# grokky

[![GoDoc](https://godoc.org/github.com/logrusorgru/grokky?status.svg)](https://godoc.org/github.com/logrusorgru/grokky)
[![WTFPL License](https://img.shields.io/badge/license-wtfpl-blue.svg)](http://www.wtfpl.net/about/)
[![Build Status](https://travis-ci.org/logrusorgru/grokky.svg)](https://travis-ci.org/logrusorgru/grokky)
[![Coverage Status](https://coveralls.io/repos/logrusorgru/grokky/badge.svg?branch=master)](https://coveralls.io/r/logrusorgru/grokky?branch=master)
[![GoReportCard](https://goreportcard.com/badge/logrusorgru/grokky)](https://goreportcard.com/report/logrusorgru/grokky)
[![Gitter](https://img.shields.io/badge/chat-on_gitter-46bc99.svg?logo=data:image%2Fsvg%2Bxml%3Bbase64%2CPHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGhlaWdodD0iMTQiIHdpZHRoPSIxNCI%2BPGcgZmlsbD0iI2ZmZiI%2BPHJlY3QgeD0iMCIgeT0iMyIgd2lkdGg9IjEiIGhlaWdodD0iNSIvPjxyZWN0IHg9IjIiIHk9IjQiIHdpZHRoPSIxIiBoZWlnaHQ9IjciLz48cmVjdCB4PSI0IiB5PSI0IiB3aWR0aD0iMSIgaGVpZ2h0PSI3Ii8%2BPHJlY3QgeD0iNiIgeT0iNCIgd2lkdGg9IjEiIGhlaWdodD0iNCIvPjwvZz48L3N2Zz4%3D&logoWidth=10)](https://gitter.im/logrusorgru/grokky?utm_source=share-link&utm_medium=link&utm_campaign=share-link)

Package grokky is a pure Go library of Grok-like patterns that helps
you parse log files and other text. It is based on
[RE2](https://en.wikipedia.org/wiki/RE2_%28software%29)
regexp, which is
[much faster](https://swtch.com/~rsc/regexp/regexp1.html)
than
[Oniguruma](https://en.wikipedia.org/wiki/Oniguruma) in some cases.
Check out the "much faster" article to understand the difference.

The library was designed for creating many patterns and using them many
times. Its behavior and capabilities are slightly different from the
original library. The goals of the library are:
1. simplicity,
2. speed,
3. ease of use.

# Also

See also another Go implementation,
[vjeantet/grok](https://github.com/vjeantet/grok), which
is closer to the
[original](https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html)
library.

The difference:

1. grokky allows named captures only. A pattern name is just a name
  and nothing more; you can treat it as an alias for a regexp. It's
  impossible to use a pattern name as a capture group. In this respect
  grokky is similar to a grok instance created with
  `g, err := grok.NewWithConfig(&grok.Config{NamedCapturesOnly: true})`.

2. grokky prefers the top-level named group. If you have two patterns,
  and the second one is nested in the first and has a named group with
  the same name, then the named group of the first (outer) pattern is
  used. grok uses the last group (closer to the tail) in all cases,
  but it also has a `ParseToMultiMap` method. To see the difference
  explained, get the package (using `go get -t`) and run
  `go test -v -run the_difference github.com/logrusorgru/grokky`, or check
  out the [source code of the test](https://github.com/logrusorgru/grokky/blob/master/bench_test.go#L134).

3. grokky was designed as a factory of patterns, i.e. compile once and use
  many times.

# Get it

```
go get -u -t github.com/logrusorgru/grokky
```

Run the tests

```
go test github.com/logrusorgru/grokky
```

Run the benchmark comparison with vjeantet/grok

```
go test -bench=.* github.com/logrusorgru/grokky
```


# Example


```go

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/logrusorgru/grokky"
)

func createHost() grokky.Host {
	h := grokky.New()
	// add patterns to the Host
	h.Must("YEAR", `(?:\d\d){1,2}`)
	h.Must("MONTHNUM2", `0[1-9]|1[0-2]`)
	h.Must("MONTHDAY", `(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]`)
	h.Must("HOUR", `2[0123]|[01]?[0-9]`)
	h.Must("MINUTE", `[0-5][0-9]`)
	h.Must("SECOND", `(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?`)
	// RFC3339 ends with "Z" for UTC or with a numeric offset such as +03:00
	h.Must("TIMEZONE", `(?:Z|[+-]%{HOUR}:%{MINUTE})`)
	h.Must("DATE", "%{YEAR:year}-%{MONTHNUM2:month}-%{MONTHDAY:day}")
	h.Must("TIME", "%{HOUR:hour}:%{MINUTE:min}:%{SECOND:sec}")
	return h
}

func main() {
	h := createHost()
	// compile the pattern for RFC3339 time
	p, err := h.Compile("%{DATE:date}T%{TIME:time}%{TIMEZONE:tz}")
	if err != nil {
		log.Fatal(err)
	}
	for k, v := range p.Parse(time.Now().Format(time.RFC3339)) {
		fmt.Printf("%s: %v\n", k, v)
	}
	//
	// Yes, it's better to use time.Parse for time values,
	// but this is just an example.
	//
}

```

# Performance note

Don't overcomplicate regular expressions; use the simplest ones possible.
Here is an example for the Nginx access log in the combined format:

```go
h := grokky.New()

h.Must("NSS", `[^\s]*`) // not a space *
h.Must("NS", `[^\s]+`)  // not a space +
h.Must("NLB", `[^\]]+`) // not a right bracket +
h.Must("NQS", `[^"]*`)  // not a double quote *
h.Must("NQ", `[^"]+`)   // not a double quote +

h.Must("nginx", `%{NS:remote_addr}\s\-\s`+
	`%{NSS:remote_user}\s*\-\s\[`+
	`%{NLB:time_local}\]\s\"`+
	`%{NQ:request}\"\s`+
	`%{NS:status}\s`+
	`%{NS:body_bytes_sent}\s\"`+
	`%{NQ:http_referer}\"\s\"`+
	`%{NQ:user_agent}\"`)

nginx, err := h.Get("nginx")
if err != nil {
	panic(err)
}

for logLine := range catLogFileLineByLineChannel {
	values := nginx.Parse(logLine)

	// stuff

}
```

Or here is another version (thanks to __@nanjj__):

```go
h := grokky.New()

h.Must("NSS", `[^\s]*`) // not a space *
h.Must("NS", `[^\s]+`)  // not a space +
h.Must("NLB", `[^\]]+`) // not a right bracket +
h.Must("NQS", `[^"]*`)  // not a double quote *
h.Must("NQ", `[^"]+`)   // not a double quote +
h.Must("A", `.*`)       // all (get tail)

h.Must("nginx", `%{NS:clientip}\s%{NSS:ident}\s%{NSS:auth}`+
	`\s\[`+
	`%{NLB:timestamp}\]\s\"`+
	`%{NS:verb}\s`+
	`%{NSS:request}\s`+
	`HTTP/%{NS:httpversion}\"\s`+
	`%{NS:response}\s`+
	`%{NS:bytes}\s\"`+
	`%{NQ:referrer}\"\s\"`+
	`%{NQ:agent}\"`+
	`%{A:blob}`)

// [...]
```
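
For comparison, the same idea can be written by hand with the standard `regexp` package: only negated character classes and literal separators, with no nested alternations. This is a sketch and not output produced by grokky. The `combined` variable, the `parseLine` helper and the sample log line are made up for the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// combined uses only negated character classes and literal separators.
var combined = regexp.MustCompile(
	`^(?P<remote_addr>[^\s]+)\s-\s(?P<remote_user>[^\s]+)\s\[` +
		`(?P<time_local>[^\]]+)\]\s"` +
		`(?P<request>[^"]+)"\s` +
		`(?P<status>[^\s]+)\s` +
		`(?P<body_bytes_sent>[^\s]+)\s"` +
		`(?P<http_referer>[^"]+)"\s"` +
		`(?P<user_agent>[^"]+)"`)

// parseLine returns the named fields of one log line,
// or nil if the line does not match.
func parseLine(line string) map[string]string {
	m := combined.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := make(map[string]string, len(m)-1)
	for i, name := range combined.SubexpNames() {
		if name != "" {
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	line := `127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] ` +
		`"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0.1"`
	f := parseLine(line)
	fmt.Println(f["remote_addr"], f["status"], f["request"])
	// prints 127.0.0.1 200 GET /index.html HTTP/1.1
}
```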

## More performance

Since
[`grokky.Pattern`](https://godoc.org/github.com/logrusorgru/grokky#Pattern)
embeds [`regexp.Regexp`](https://godoc.org/regexp#Regexp), all methods of
`regexp.Regexp` are available on it. For example, you can use
[`FindStringSubmatch`](https://godoc.org/regexp#Regexp.FindStringSubmatch)
instead of `(grokky.Pattern).Parse`, or any other method of
`regexp.Regexp`.

Check out
[Benchmark_parse_vs_findStringSubmatch](https://github.com/logrusorgru/grokky/blob/master/bench_test.go#L409)
for example.

On my machine, the result of this benchmark is as follows (`map` is `Parse`,
and `slice` is `FindStringSubmatch`):

```
map-4      200000    9980 ns/op    1370 B/op    5 allocs/op
slice-4    200000    7508 ns/op     416 B/op    2 allocs/op
```
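
The slice-based approach can be sketched with the standard library alone: resolve the group positions once, then index into the slice returned by `FindStringSubmatch` instead of building a map for every line. The `clock` regexp and the `hourMinute` helper below are hypothetical. `SubexpIndex` needs Go 1.15 or newer, and the same calls should work on a grokky pattern only on the assumption stated above, that it exposes the `regexp.Regexp` methods.

```go
package main

import (
	"fmt"
	"regexp"
)

var clock = regexp.MustCompile(`(?P<hour>\d{2}):(?P<min>\d{2}):(?P<sec>\d{2})`)

// Group positions are looked up once, not on every parsed line.
var (
	hourIdx = clock.SubexpIndex("hour")
	minIdx  = clock.SubexpIndex("min")
)

// hourMinute extracts two fields straight from the submatch slice,
// without allocating a map.
func hourMinute(s string) (hour, minute string, ok bool) {
	m := clock.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[hourIdx], m[minIdx], true
}

func main() {
	h, m, ok := hourMinute("at 23:59:07 sharp")
	fmt.Println(h, m, ok) // prints 23 59 true
}
```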

# Licensing

Copyright © 2016-2018 Konstantin Ivanov <kostyarin.ivanov@gmail.com>  
This work is free. It comes without any warranty, to the extent
permitted by applicable law. You can redistribute it and/or modify
it under the terms of the Do What The Fuck You Want To Public License,
Version 2, as published by Sam Hocevar. See the LICENSE file for
more details.