1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
|
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances, PatternGuards #-}
{-|
This module is for working with HTML/XML. It deals with both well-formed XML and
malformed HTML from the web. It features:
* A lazy parser, based on the HTML 5 specification - see 'parseTags'.
* A renderer that can write out HTML/XML - see 'renderTags'.
* Utilities for extracting information from a document - see '~==', 'sections' and 'partitions'.
The standard practice is to parse a 'String' to @[@'Tag' 'String'@]@ using 'parseTags',
then operate upon it to extract the necessary information.
-}
module Text.HTML.TagSoup(
-- * Data structures and parsing
Tag(..), Row, Column, Attribute,
module Text.HTML.TagSoup.Parser,
module Text.HTML.TagSoup.Render,
canonicalizeTags,
-- * Tag identification
isTagOpen, isTagClose, isTagText, isTagWarning, isTagPosition,
isTagOpenName, isTagCloseName,
-- * Extraction
fromTagText, fromAttrib,
maybeTagText, maybeTagWarning,
innerText,
-- * Utility
sections, partitions,
-- * Combinators
TagRep(..), (~==),(~/=)
) where
import Text.HTML.TagSoup.Type
import Text.HTML.TagSoup.Parser
import Text.HTML.TagSoup.Render
import Data.Char
import Data.List
import Text.StringLike
-- | Turns all tag names and attributes to lower case and
-- converts DOCTYPE to upper case.
canonicalizeTags :: StringLike str => [Tag str] -> [Tag str]
canonicalizeTags = map f
where
f (TagOpen tag attrs) | Just ('!',name) <- uncons tag = TagOpen ('!' `cons` ucase name) attrs
f (TagOpen name attrs) = TagOpen (lcase name) [(lcase k, v) | (k,v) <- attrs]
f (TagClose name) = TagClose (lcase name)
f a = a
ucase = fromString . map toUpper . toString
lcase = fromString . map toLower . toString
-- | Define a class to allow String's or Tag str's to be used as matches
class TagRep a where
toTagRep :: StringLike str => a -> Tag str
instance StringLike str => TagRep (Tag str) where toTagRep = fmap castString
instance TagRep String where
toTagRep x = case parseTags x of
[a] -> toTagRep a
_ -> error $ "When using a TagRep it must be exactly one tag, you gave: " ++ x
-- | Performs an inexact match, the first item should be the thing to match.
-- If the second item is a blank string, that is considered to match anything.
-- For example:
--
-- > (TagText "test" ~== TagText "" ) == True
-- > (TagText "test" ~== TagText "test") == True
-- > (TagText "test" ~== TagText "soup") == False
--
-- For 'TagOpen' missing attributes on the right are allowed.
(~==) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
(~==) a b = f a (toTagRep b)
where
f (TagText y) (TagText x) = strNull x || x == y
f (TagClose y) (TagClose x) = strNull x || x == y
f (TagOpen y ys) (TagOpen x xs) = (strNull x || x == y) && all g xs
where
g (name,val) | strNull name = val `elem` map snd ys
| strNull val = name `elem` map fst ys
g nameval = nameval `elem` ys
f (TagComment x) (TagComment y) = strNull x || x == y
f (TagWarning x) (TagWarning y) = strNull x || x == y
f (TagPosition x1 x2) (TagPosition y1 y2) = x1 == y1 && x2 == y2
f _ _ = False
-- | Negation of '~=='
(~/=) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
(~/=) a b = not (a ~== b)
-- | This function takes a list, and returns all suffixes whose
-- first item matches the predicate.
sections :: (a -> Bool) -> [a] -> [[a]]
sections p = filter (p . head) . init . tails
-- | This function is similar to 'sections', but splits the list
-- so no element appears in any two partitions.
partitions :: (a -> Bool) -> [a] -> [[a]]
partitions p =
let notp = not . p
in groupBy (const notp) . dropWhile notp
|