1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
|
-- |
-- Module : Data.ByteString.Lazy.Search.KMP
-- Copyright : Justin Bailey
-- Chris Kuklewicz
-- Daniel Fischer
-- Licence : BSD3
-- Maintainer : Daniel Fischer <daniel.is.fischer@googlemail.com>
-- Stability : Provisional
-- Portability : non-portable (BangPatterns)
--
-- Fast search of lazy 'L.ByteString' values using the
-- Knuth-Morris-Pratt algorithm.
--
-- A description of the algorithm can be found at
-- <http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm>.
--
-- Original authors: Justin Bailey (jgbailey at gmail.com) and
-- Chris Kuklewicz (haskell at list.mightyreason.com).
module Data.ByteString.Lazy.Search.KMP (-- * Overview
-- $overview
-- ** Complexity and Performance
-- $complexity
-- ** Partial application
-- $partial
-- * Functions
indices
, nonOverlappingIndices
-- ** Convenience
, strictify
) where
import Data.ByteString.Search.Internal.KnuthMorrisPratt (matchSL, indicesL)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)
-- $overview
--
-- This module provides two functions for finding the occurrences of a
-- pattern in a target string using the Knuth-Morris-Pratt algorithm.
-- It exists mostly for systematic reasons, the functions from
-- "Data.ByteString.Lazy.Search" are much faster, except for very short
-- patterns or long patterns with a short period if overlap is allowed.
-- In the latter case, 'indices' from this module may be the best choice
-- since the Boyer-Moore function's performance degrades if there are many
-- matches and the DFA function's automaton needs much space for long
-- patterns.
-- In the former case, for some pattern\/target combinations DFA has better
-- performance, for others KMP, usually the difference is small.
-- $complexity
--
-- The preprocessing of the pattern is /O/(@patternLength@) in time and space.
-- The time complexity of the searching phase is /O/(@targetLength@) for both
-- functions.
--
-- In most cases, these functions are considerably slower than the
-- Boyer-Moore variants, performance is close to that of those from
-- "Data.ByteString.Search.DFA".
-- $partial
--
-- Both functions can be usefully partially applied. Given only a
-- pattern, the auxiliary data will be computed only once, allowing for
-- efficient re-use.
-- | @'indices'@ finds the starting indices of all possibly overlapping
-- occurrences of the pattern in the target string.
-- If the pattern is empty, the result is @[0 .. 'length' target]@.
{-# INLINE indices #-}
indices :: S.ByteString -- ^ Strict pattern to find
-> L.ByteString -- ^ Lazy string to search
-> [Int64] -- ^ Offsets of matches
indices = indicesL
-- | @'nonOverlappingIndices'@ finds the starting indices of all
-- non-overlapping occurrences of the pattern in the target string.
-- It is more efficient than removing indices from the list produced
-- by 'indices'.
{-# INLINE nonOverlappingIndices #-}
nonOverlappingIndices :: S.ByteString -- ^ Strict pattern to find
-> L.ByteString -- ^ Lazy string to search
-> [Int64] -- ^ Offsets of matches
nonOverlappingIndices = matchSL
-- | @'strictify'@ transforms a lazy 'L.ByteString' into a strict
-- 'S.ByteString', to make it a suitable pattern for the searching
-- functions.
strictify :: L.ByteString -> S.ByteString
strictify = S.concat . L.toChunks
|