# grokky
[](https://godoc.org/github.com/logrusorgru/grokky)
[](http://www.wtfpl.net/about/)
[](https://travis-ci.org/logrusorgru/grokky)
[](https://coveralls.io/r/logrusorgru/grokky?branch=master)
[](https://goreportcard.com/report/logrusorgru/grokky)
[](https://gitter.im/logrusorgru/grokky?utm_source=share-link&utm_medium=link&utm_campaign=share-link)
Package grokky is a pure Go library of Grok-like patterns that helps
you parse log files and other text. It is based on the
[RE2](https://en.wikipedia.org/wiki/RE2_%28software%29)
regexp engine, which can be
[much faster](https://swtch.com/~rsc/regexp/regexp1.html)
than
[Oniguruma](https://en.wikipedia.org/wiki/Oniguruma) in some cases.
Check out the "much faster" article to understand the difference.

The library was designed for creating many patterns and using them many
times. Its behavior and capabilities differ slightly from the
original library. The goals of the library are:
1. simplicity,
2. speed,
3. ease of use.
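
To make the speed claim concrete, here is a minimal sketch that uses only the standard `regexp` package (the RE2-style engine grokky builds on), not grokky itself. It runs the pathological pattern from the linked article, `a?` repeated n times followed by `a` repeated n times, against a string of n letters `a`. The helper names `pathological` and `matchesPathological` are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// pathological writes out the pattern a?{n}a{n} literally,
// for example n=3 gives `a?a?a?aaa`.
func pathological(n int) string {
	return strings.Repeat("a?", n) + strings.Repeat("a", n)
}

// matchesPathological reports whether the anchored pattern for n
// matches a string made of n letters "a".
func matchesPathological(n int) bool {
	re := regexp.MustCompile("^" + pathological(n) + "$")
	return re.MatchString(strings.Repeat("a", n))
}

func main() {
	start := time.Now()
	ok := matchesPathological(30)
	// The elapsed time depends on the machine, so no figure is claimed here.
	fmt.Println("matched:", ok, "in", time.Since(start))
}
```

According to the article, backtracking engines need exponential time on this input, while an automaton-based engine stays linear in the input length.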
# Also
See also another Go implementation,
[vjeantet/grok](https://github.com/vjeantet/grok), which
is closer to the
[original](https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html)
library.

The differences:
1. grokky allows named captures only. The name of a pattern is
just a name and nothing more; you can treat it as an
alias for a regexp. It's impossible to use the name of a pattern as a
capture group. In this respect grokky is similar to a grok instance
created as `g, err := grok.NewWithConfig(&grok.Config{NamedCapturesOnly: true})`.
2. grokky prefers the top-level named group. If you have two patterns,
and the second is nested in the first and contains a named group with the
same name, the named group of the first (outer) pattern is used. grok uses
the last (closest to the tail) group in all cases, although grok also has a
`ParseToMultiMap` method. To see the difference explained, get the
package (using `go get -t`) and run
`go test -v -run the_difference github.com/logrusorgru/grokky`, or check
out the [source code of the test](https://github.com/logrusorgru/grokky/blob/master/bench_test.go#L134).
3. grokky was designed as a factory of patterns, i.e. compile once and use
many times.
# Get it
```
go get -u -t github.com/logrusorgru/grokky
```
Run the tests:
```
go test github.com/logrusorgru/grokky
```
Run the benchmark comparison with vjeantet/grok:
```
go test -bench=.* github.com/logrusorgru/grokky
```
# Example
```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/logrusorgru/grokky"
)

func createHost() grokky.Host {
	h := grokky.New()
	// add patterns to the Host
	h.Must("YEAR", `(?:\d\d){1,2}`)
	h.Must("MONTHNUM2", `0[1-9]|1[0-2]`)
	h.Must("MONTHDAY", `(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]`)
	h.Must("HOUR", `2[0123]|[01]?[0-9]`)
	h.Must("MINUTE", `[0-5][0-9]`)
	h.Must("SECOND", `(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?`)
	// RFC3339 zone: "Z" for UTC, otherwise a numeric offset such as +03:00
	h.Must("TIMEZONE", `(?:Z|[+-]%{HOUR}:%{MINUTE})`)
	h.Must("DATE", "%{YEAR:year}-%{MONTHNUM2:month}-%{MONTHDAY:day}")
	h.Must("TIME", "%{HOUR:hour}:%{MINUTE:min}:%{SECOND:sec}")
	return h
}

func main() {
	h := createHost()
	// compile the pattern for RFC3339 time
	p, err := h.Compile("%{DATE:date}T%{TIME:time}%{TIMEZONE:tz}")
	if err != nil {
		log.Fatal(err)
	}
	// print every named group found in the current time
	for k, v := range p.Parse(time.Now().Format(time.RFC3339)) {
		fmt.Printf("%s: %v\n", k, v)
	}
	//
	// Yes, it's better to use time.Parse for time values,
	// but this is just an example.
	//
}
```
# Performance note
Don't overcomplicate regular expressions; use the simplest ones possible.
Here is an example for an Nginx access log in the combined format:
```go
h := grokky.New()
h.Must("NSS", `[^\s]*`) // not a space *
h.Must("NS", `[^\s]+`)  // not a space +
h.Must("NLB", `[^\]]+`) // not a right bracket +
h.Must("NQS", `[^"]*`)  // not a double quote *
h.Must("NQ", `[^"]+`)   // not a double quote +
h.Must("nginx", `%{NS:remote_addr}\s\-\s`+
	`%{NSS:remote_user}\s*\-\s\[`+
	`%{NLB:time_local}\]\s\"`+
	`%{NQ:request}\"\s`+
	`%{NS:status}\s`+
	`%{NS:body_bytes_sent}\s\"`+
	`%{NQ:http_referer}\"\s\"`+
	`%{NQ:user_agent}\"`)
// get the compiled pattern once, then reuse it for every line
nginx, err := h.Get("nginx")
if err != nil {
	panic(err)
}
for logLine := range catLogFileLineByLineChannel {
	values := nginx.Parse(logLine)
	// stuff
}
```
Or there is another version (thanks to __@nanjj__):
```go
h := grokky.New()
h.Must("NSS", `[^\s]*`) // not a space *
h.Must("NS", `[^\s]+`)  // not a space +
h.Must("NLB", `[^\]]+`) // not a right bracket +
h.Must("NQS", `[^"]*`)  // not a double quote *
h.Must("NQ", `[^"]+`)   // not a double quote +
h.Must("A", `.*`)       // all (get tail)
h.Must("nginx", `%{NS:clientip}\s%{NSS:ident}\s%{NSS:auth}`+
	`\s\[`+
	`%{NLB:timestamp}\]\s\"`+
	`%{NS:verb}\s`+
	`%{NSS:request}\s`+
	`HTTP/%{NS:httpversion}\"\s`+
	`%{NS:response}\s`+
	`%{NS:bytes}\s\"`+
	`%{NQ:referrer}\"\s\"`+
	`%{NQ:agent}\"`+
	`%{A:blob}`)
// [...]
```
## More performance
Since
[`grokky.Pattern`](https://godoc.org/github.com/logrusorgru/grokky#Pattern)
embeds [`regexp.Regexp`](https://godoc.org/regexp#Regexp), you can call
any method of `regexp.Regexp` on it. For example, you can use
[`FindStringSubmatch`](https://godoc.org/regexp#Regexp.FindStringSubmatch)
instead of `(grokky.Pattern).Parse`.
Check out
[Benchmark_parse_vs_findStringSubmatch](https://github.com/logrusorgru/grokky/blob/master/bench_test.go#L409)
for an example.
On my machine the result of this benchmark is (the map is `Parse`, and the slice is
`FindStringSubmatch`):
```
map-4 200000 9980 ns/op 1370 B/op 5 allocs/op
slice-4 200000 7508 ns/op 416 B/op 2 allocs/op
```
# Licensing
Copyright © 2016-2018 Konstantin Ivanov <kostyarin.ivanov@gmail.com>
This work is free. It comes without any warranty, to the extent
permitted by applicable law. You can redistribute it and/or modify
it under the terms of the Do What The Fuck You Want To Public License,
Version 2, as published by Sam Hocevar. See the LICENSE file for
more details.