{-# OPTIONS_GHC -fno-implicit-prelude #-}
-----------------------------------------------------------------------------
-- |
-- Module : Data.Char
-- Copyright : (c) The University of Glasgow 2001
-- License : BSD-style (see the file libraries/base/LICENSE)
--
-- Maintainer : libraries@haskell.org
-- Stability : stable
-- Portability : portable
--
-- The Char type and associated operations.
--
-----------------------------------------------------------------------------
module Data.Char
    (
      Char
    , String
    -- * Character classification
    -- | Unicode characters are divided into letters, numbers, marks,
    -- punctuation, symbols, separators (including spaces) and others
    -- (including control characters).
    , isControl, isSpace
    , isLower, isUpper, isAlpha, isAlphaNum, isPrint
    , isDigit, isOctDigit, isHexDigit
    , isLetter, isMark, isNumber, isPunctuation, isSymbol, isSeparator
    -- ** Subranges
    , isAscii, isLatin1
    , isAsciiUpper, isAsciiLower
    -- ** Unicode general categories
    , GeneralCategory(..), generalCategory
    -- * Case conversion
    , toUpper, toLower, toTitle  -- :: Char -> Char
    -- * Single digit characters
    , digitToInt        -- :: Char -> Int
    , intToDigit        -- :: Int  -> Char
    -- * Numeric representations
    , ord               -- :: Char -> Int
    , chr               -- :: Int  -> Char
    -- * String representations
    , showLitChar       -- :: Char -> ShowS
    , lexLitChar        -- :: ReadS String
    , readLitChar       -- :: ReadS Char
    -- Implementation checked wrt. Haskell 98 lib report, 1/99.
    ) where
#ifdef __GLASGOW_HASKELL__
import GHC.Base
import GHC.Arr (Ix)
import GHC.Real (fromIntegral)
import GHC.Show
import GHC.Read (Read, readLitChar, lexLitChar)
import GHC.Unicode
import GHC.Num
import GHC.Enum
#endif
#ifdef __HUGS__
import Hugs.Prelude (Ix)
import Hugs.Char
#endif
#ifdef __NHC__
import Prelude
import Prelude(Char,String)
import Char
import Ix
import NHC.FFI (CInt)
foreign import ccall unsafe "WCsubst.h u_gencat" wgencat :: CInt -> CInt
#endif
-- | Convert a single digit 'Char' to the corresponding 'Int'.
-- This function fails unless its argument satisfies 'isHexDigit',
-- but recognises both upper and lower-case hexadecimal digits
-- (i.e. @\'0\'@..@\'9\'@, @\'a\'@..@\'f\'@, @\'A\'@..@\'F\'@).
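--
-- A few illustrative results, assuming the definition below:
--
-- > digitToInt '7' == 7
-- > digitToInt 'a' == 10
-- > digitToInt 'F' == 15
--
-- Any other argument (for instance @digitToInt 'g'@) calls 'error'.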
digitToInt :: Char -> Int
digitToInt c
  | isDigit c            = ord c - ord '0'
  | c >= 'a' && c <= 'f' = ord c - ord 'a' + 10
  | c >= 'A' && c <= 'F' = ord c - ord 'A' + 10
  | otherwise            = error ("Char.digitToInt: not a digit " ++ show c) -- sigh
#ifndef __GLASGOW_HASKELL__
isAsciiUpper, isAsciiLower :: Char -> Bool
isAsciiLower c = c >= 'a' && c <= 'z'
isAsciiUpper c = c >= 'A' && c <= 'Z'
#endif
-- | Unicode General Categories (column 2 of the UnicodeData table)
-- in the order they are listed in the Unicode standard.
data GeneralCategory
  = UppercaseLetter       -- ^ Lu: Letter, Uppercase
  | LowercaseLetter       -- ^ Ll: Letter, Lowercase
  | TitlecaseLetter       -- ^ Lt: Letter, Titlecase
  | ModifierLetter        -- ^ Lm: Letter, Modifier
  | OtherLetter           -- ^ Lo: Letter, Other
  | NonSpacingMark        -- ^ Mn: Mark, Non-Spacing
  | SpacingCombiningMark  -- ^ Mc: Mark, Spacing Combining
  | EnclosingMark         -- ^ Me: Mark, Enclosing
  | DecimalNumber         -- ^ Nd: Number, Decimal
  | LetterNumber          -- ^ Nl: Number, Letter
  | OtherNumber           -- ^ No: Number, Other
  | ConnectorPunctuation  -- ^ Pc: Punctuation, Connector
  | DashPunctuation       -- ^ Pd: Punctuation, Dash
  | OpenPunctuation       -- ^ Ps: Punctuation, Open
  | ClosePunctuation      -- ^ Pe: Punctuation, Close
  | InitialQuote          -- ^ Pi: Punctuation, Initial quote
  | FinalQuote            -- ^ Pf: Punctuation, Final quote
  | OtherPunctuation      -- ^ Po: Punctuation, Other
  | MathSymbol            -- ^ Sm: Symbol, Math
  | CurrencySymbol        -- ^ Sc: Symbol, Currency
  | ModifierSymbol        -- ^ Sk: Symbol, Modifier
  | OtherSymbol           -- ^ So: Symbol, Other
  | Space                 -- ^ Zs: Separator, Space
  | LineSeparator         -- ^ Zl: Separator, Line
  | ParagraphSeparator    -- ^ Zp: Separator, Paragraph
  | Control               -- ^ Cc: Other, Control
  | Format                -- ^ Cf: Other, Format
  | Surrogate             -- ^ Cs: Other, Surrogate
  | PrivateUse            -- ^ Co: Other, Private Use
  | NotAssigned           -- ^ Cn: Other, Not Assigned
  deriving (Eq, Ord, Enum, Read, Show, Bounded, Ix)
-- | The Unicode general category of the character.
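--
-- Some illustrative results, assuming the underlying character table
-- follows the standard Unicode category assignments:
--
-- > generalCategory 'a'  == LowercaseLetter
-- > generalCategory 'A'  == UppercaseLetter
-- > generalCategory '0'  == DecimalNumber
-- > generalCategory ' '  == Space
-- > generalCategory '\n' == Control
-- > generalCategory '+'  == MathSymbol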
generalCategory :: Char -> GeneralCategory
#if defined(__GLASGOW_HASKELL__) || defined(__NHC__)
generalCategory c = toEnum $ fromIntegral $ wgencat $ fromIntegral $ ord c
#endif
#ifdef __HUGS__
generalCategory c = toEnum (primUniGenCat c)
#endif
-- derived character classifiers
-- | Selects alphabetic Unicode characters (lower-case, upper-case and
-- title-case letters, plus letters of caseless scripts and modifier letters).
-- This function is equivalent to 'Data.Char.isAlpha'.
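--
-- For illustration, assuming standard Unicode category assignments:
--
-- > isLetter 'a'      == True   -- Ll
-- > isLetter '\x03B1' == True   -- Greek small alpha, Ll
-- > isLetter '\x4E2D' == True   -- CJK ideograph, Lo
-- > isLetter '1'      == False  -- Nd
-- > isLetter '_'      == False  -- Pc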
isLetter :: Char -> Bool
isLetter c = case generalCategory c of
        UppercaseLetter         -> True
        LowercaseLetter         -> True
        TitlecaseLetter         -> True
        ModifierLetter          -> True
        OtherLetter             -> True
        _                       -> False
-- | Selects Unicode mark characters, e.g. accents and the like, which
-- combine with preceding letters.
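--
-- For illustration, assuming standard Unicode category assignments.
-- Note that a precomposed accented letter is a letter, not a mark:
--
-- > isMark '\x0301' == True   -- combining acute accent, Mn
-- > isMark '\x00E9' == False  -- precomposed e with acute, Ll
-- > isMark 'e'      == False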
isMark :: Char -> Bool
isMark c = case generalCategory c of
        NonSpacingMark          -> True
        SpacingCombiningMark    -> True
        EnclosingMark           -> True
        _                       -> False
-- | Selects Unicode numeric characters, including digits from various
-- scripts, Roman numerals, etc.
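--
-- For illustration, assuming standard Unicode category assignments:
--
-- > isNumber '5'      == True   -- Nd
-- > isNumber '\x2167' == True   -- Roman numeral eight, Nl
-- > isNumber '\x00BD' == True   -- vulgar fraction one half, No
-- > isNumber 'a'      == False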
isNumber :: Char -> Bool
isNumber c = case generalCategory c of
        DecimalNumber           -> True
        LetterNumber            -> True
        OtherNumber             -> True
        _                       -> False
-- | Selects Unicode punctuation characters, including various kinds
-- of connectors, brackets and quotes.
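--
-- For illustration, assuming standard Unicode category assignments:
--
-- > isPunctuation '!' == True   -- Po
-- > isPunctuation '(' == True   -- Ps
-- > isPunctuation '_' == True   -- Pc
-- > isPunctuation '+' == False  -- Sm, a symbol rather than punctuation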
isPunctuation :: Char -> Bool
isPunctuation c = case generalCategory c of
        ConnectorPunctuation    -> True
        DashPunctuation         -> True
        OpenPunctuation         -> True
        ClosePunctuation        -> True
        InitialQuote            -> True
        FinalQuote              -> True
        OtherPunctuation        -> True
        _                       -> False
-- | Selects Unicode symbol characters, including mathematical and
-- currency symbols.
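--
-- For illustration, assuming standard Unicode category assignments:
--
-- > isSymbol '+' == True   -- Sm
-- > isSymbol '$' == True   -- Sc
-- > isSymbol '^' == True   -- Sk
-- > isSymbol '!' == False  -- Po, punctuation rather than a symbol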
isSymbol :: Char -> Bool
isSymbol c = case generalCategory c of
        MathSymbol              -> True
        CurrencySymbol          -> True
        ModifierSymbol          -> True
        OtherSymbol             -> True
        _                       -> False
-- | Selects Unicode space and separator characters.
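--
-- For illustration, assuming standard Unicode category assignments.
-- Newline and tab are control characters (Cc), so they are not
-- separators in this sense even though 'isSpace' accepts them:
--
-- > isSeparator ' '      == True   -- Zs
-- > isSeparator '\x00A0' == True   -- no-break space, Zs
-- > isSeparator '\x2028' == True   -- line separator, Zl
-- > isSeparator '\n'     == False  -- Cc
-- > isSeparator '\t'     == False  -- Cc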
isSeparator :: Char -> Bool
isSeparator c = case generalCategory c of
        Space                   -> True
        LineSeparator           -> True
        ParagraphSeparator      -> True
        _                       -> False
#ifdef __NHC__
-- dummy implementation
toTitle :: Char -> Char
toTitle = toUpper
#endif