1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260
|
-- --
-- package Strings_Edit.UTF8 Copyright (c) Dmitry A. Kazakov --
-- Implementation Luebeck --
-- Spring, 2005 --
-- --
-- Last revision : 10:11 25 Jun 2005 --
-- --
-- This library is free software; you can redistribute it and/or --
-- modify it under the terms of the GNU General Public License as --
-- published by the Free Software Foundation; either version 2 of --
-- the License, or (at your option) any later version. This library --
-- is distributed in the hope that it will be useful, but WITHOUT --
-- ANY WARRANTY; without even the implied warranty of --
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU --
-- General Public License for more details. You should have --
-- received a copy of the GNU General Public License along with --
-- this library; if not, write to the Free Software Foundation, --
-- Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. --
-- --
-- --
-- --
-- --
-- --
-- --
-- --
--____________________________________________________________________--
with Ada.IO_Exceptions; use Ada.IO_Exceptions;
package body Strings_Edit.UTF8 is
procedure Get
( Source : String;
Pointer : in out Integer;
Value : out UTF8_Code_Point
) is
Accum : UTF8_Code_Point'Base;
Code : UTF8_Code_Point'Base;
Index : Integer := Pointer;
begin
if Index < Source'First then
raise Layout_Error;
end if;
if Index > Source'Last then
if Index = Source'Last + 1 then
raise End_Error;
else
raise Layout_Error;
end if;
end if;
Code := UTF8_Code_Point (Character'Pos (Source (Index)));
case Code is
when 0..16#7F# => -- 1 byte (ASCII)
Value := Code;
Pointer := Index + 1;
return;
when 16#C2#..16#DF# => -- 2 bytes
Accum := (Code and 16#1F#) * 2**6;
goto Last;
when 16#E0# => -- 3 bytes
Index := Index + 1;
if Index >= Source'Last then
raise Data_Error;
end if;
Code := UTF8_Code_Point (Character'Pos (Source (Index)));
if Code not in 16#A0#..16#BF# then
raise Data_Error;
end if;
Accum := (Code and 16#3F#) * 2**6;
goto Last;
when 16#E1#..16#EF# => -- 3 bytes
Accum := (Code and 16#0F#) * 2**12;
when 16#F0# => -- 4 bytes
Index := Index + 1;
if Index >= Source'Last then
raise Data_Error;
end if;
Code := UTF8_Code_Point (Character'Pos (Source (Index)));
if Code not in 16#90#..16#BF# then
raise Data_Error;
end if;
Accum := (Code and 16#3F#) * 2**12;
when 16#F1#..16#F3# => -- 4 bytes
Accum := (Code and 16#07#) * 2**18;
Index := Index + 1;
if Index >= Source'Last then
raise Data_Error;
end if;
Code := UTF8_Code_Point (Character'Pos (Source (Index)));
if Code not in 16#80#..16#BF# then
raise Data_Error;
end if;
Accum := Accum or (Code and 16#3F#) * 2**12;
when 16#F4# => -- 4 bytes
Accum := (Code and 16#07#) * 2**18;
Index := Index + 1;
if Index >= Source'Last then
raise Data_Error;
end if;
Code := UTF8_Code_Point (Character'Pos (Source (Index)));
if Code not in 16#80#..16#8F# then
raise Data_Error;
end if;
Accum := Accum or (Code and 16#3F#) * 2**12;
when others =>
raise Data_Error;
end case;
Index := Index + 1;
if Index >= Source'Last then
raise Data_Error;
end if;
Code := UTF8_Code_Point (Character'Pos (Source (Index)));
if Code not in 16#80#..16#BF# then
raise Data_Error;
end if;
Accum := Accum or (Code and 16#3F#) * 2**6;
<<Last>>
Index := Index + 1;
if Index > Source'Last then
raise Data_Error;
end if;
Code := UTF8_Code_Point (Character'Pos (Source (Index)));
if Code not in 16#80#..16#BF# then
raise Data_Error;
end if;
Value := Accum or (Code and 16#3F#);
Pointer := Index + 1;
end Get;
function Image (Value : UTF8_Code_Point) return String is
Result : String (1..4);
Pointer : Integer := Result'First;
begin
Put (Result, Pointer, Value);
return Result (1..Pointer - 1);
end Image;
function Length (Source : String) return Natural is
Count : Natural := 0;
Accum : UTF8_Code_Point;
Index : Integer := Source'First;
begin
while Index <= Source'Last loop
Get (Source, Index, Accum);
Count := Count + 1;
end loop;
return Count;
end Length;
procedure Put
( Destination : in out String;
Pointer : in out Integer;
Value : UTF8_Code_Point
) is
begin
if Pointer not in Destination'Range then
raise Layout_Error;
end if;
if Value <= 16#7F# then
Destination (Pointer) := Character'Val (Value);
Pointer := Pointer + 1;
elsif Value <= 16#7FF# then
if Pointer >= Destination'Last then
raise Layout_Error;
end if;
Destination (Pointer) :=
Character'Val (16#C0# or Value / 2**6);
Destination (Pointer + 1) :=
Character'Val (16#80# or (Value and 16#3F#));
Pointer := Pointer + 2;
elsif Value <= 16#FFFF# then
if 1 >= Destination'Last - Pointer then
raise Layout_Error;
end if;
Destination (Pointer) :=
Character'Val (16#E0# or Value / 2**12);
Destination (Pointer + 1) :=
Character'Val (16#80# or (Value / 2**6 and 16#3F#));
Destination (Pointer + 2) :=
Character'Val (16#80# or (Value and 16#3F#));
Pointer := Pointer + 3;
else
if 2 >= Destination'Last - Pointer then
raise Layout_Error;
end if;
Destination (Pointer) :=
Character'Val (16#F0# or Value / 2**18);
Destination (Pointer + 1) :=
Character'Val (16#80# or (Value / 2**12 and 16#3F#));
Destination (Pointer + 2) :=
Character'Val (16#80# or (Value / 2**6 and 16#3F#));
Destination (Pointer + 3) :=
Character'Val (16#80# or (Value and 16#3F#));
Pointer := Pointer + 4;
end if;
end Put;
procedure Reverse_Put
( Destination : in out String;
Pointer : in out Integer;
Prefix : String;
Position : Natural
) is
begin
if ( Pointer not in Destination'Range
or else
Pointer - Destination'First < Prefix'Length
)
then
raise Layout_Error;
end if;
Destination (Pointer) := Character'Val (Position);
Pointer := Pointer - 1;
for Index in reverse Prefix'Range loop
Destination (Pointer) := Prefix (Index);
Pointer := Pointer - 1;
end loop;
end Reverse_Put;
procedure Skip
( Source : String;
Pointer : in out Integer;
Count : Natural := 1
) is
begin
if Count > 0 then
declare
Accum : UTF8_Code_Point;
Index : Integer := Source'First;
begin
for Character in 1..Count loop
Get (Source, Index, Accum);
end loop;
Pointer := Index;
end;
elsif ( Pointer < Source'First
or else
( Pointer > Source'Last
and then
Pointer - 1 > Source'Last
) )
then
raise Layout_Error;
end if;
end Skip;
function Value (Source : String) return UTF8_Code_Point is
Pointer : Integer := Source'First;
Result : UTF8_Code_Point;
begin
Get (Source, Pointer, Result);
if Pointer /= Source'Last + 1 then
raise Data_Error;
end if;
return Result;
end Value;
end Strings_Edit.UTF8;
|