1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344
|
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2023 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
// See https://swift.org/CONTRIBUTORS.txt for the list of Swift project authors
//
//===----------------------------------------------------------------------===//
#if FOUNDATION_FRAMEWORK
@_spi(_Unicode) import Swift
internal import Foundation_Private.NSString
#endif
#if canImport(Darwin)
import Darwin
#endif
#if os(Windows)
import WinSDK
extension String {
package func withNTPathRepresentation<Result>(_ body: (UnsafePointer<WCHAR>) throws -> Result) throws -> Result {
guard !isEmpty else {
throw CocoaError.errorWithFilePath(.fileReadInvalidFileName, "")
}
var iter = self.utf8.makeIterator()
let bLeadingSlash = if [._slash, ._backslash].contains(iter.next()), iter.next()?.isLetter ?? false, iter.next() == ._colon { true } else { false }
// Strip the leading `/` on a RFC8089 path (`/[drive-letter]:/...` ). A
// leading slash indicates a rooted path on the drive for the current
// working directory.
return try Substring(self.utf8.dropFirst(bLeadingSlash ? 1 : 0)).withCString(encodedAs: UTF16.self) { pwszPath in
// 1. Normalize the path first.
let dwLength: DWORD = GetFullPathNameW(pwszPath, 0, nil, nil)
return try withUnsafeTemporaryAllocation(of: WCHAR.self, capacity: Int(dwLength)) {
guard GetFullPathNameW(pwszPath, DWORD($0.count), $0.baseAddress, nil) > 0 else {
throw CocoaError.errorWithFilePath(self, win32: GetLastError(), reading: true)
}
// 2. Perform the operation on the normalized path.
return try body($0.baseAddress!)
}
}
}
}
#endif
extension String {
package func _trimmingWhitespace() -> String {
String(unicodeScalars._trimmingCharacters {
$0.properties.isWhitespace
})
}
package init?(_utf16 input: UnsafeBufferPointer<UInt16>) {
// Allocate input.count * 3 code points since one UTF16 code point may require up to three UTF8 code points when transcoded
let str = withUnsafeTemporaryAllocation(of: UTF8.CodeUnit.self, capacity: input.count * 3) { contents in
var count = 0
let error = transcode(input.makeIterator(), from: UTF16.self, to: UTF8.self, stoppingOnError: true) { codeUnit in
contents[count] = codeUnit
count += 1
}
guard !error else {
return nil as String?
}
return String._tryFromUTF8(UnsafeBufferPointer(rebasing: contents[..<count]))
}
guard let str else {
return nil
}
self = str
}
package init?(_utf16 input: UnsafeMutableBufferPointer<UInt16>, count: Int) {
guard let str = String(_utf16: UnsafeBufferPointer(rebasing: input[..<count])) else {
return nil
}
self = str
}
package init?(_utf16 input: UnsafePointer<UInt16>, count: Int) {
guard let str = String(_utf16: UnsafeBufferPointer(start: input, count: count)) else {
return nil
}
self = str
}
enum _NormalizationType {
case canonical
case hfsPlus
fileprivate var setType: BuiltInUnicodeScalarSet.SetType {
switch self {
case .canonical: .canonicalDecomposable
case .hfsPlus: .hfsPlusDecomposable
}
}
}
private func _decomposed(_ type: String._NormalizationType, into buffer: UnsafeMutableBufferPointer<UInt8>, nullTerminated: Bool = false) -> Int? {
var copy = self
return copy.withUTF8 {
try? $0._decomposed(type, as: Unicode.UTF8.self, into: buffer, nullTerminated: nullTerminated)
}
}
#if canImport(Darwin) || FOUNDATION_FRAMEWORK
fileprivate func _fileSystemRepresentation(into buffer: UnsafeMutableBufferPointer<CChar>) -> Bool {
let result = buffer.withMemoryRebound(to: UInt8.self) { rebound in
_decomposed(.hfsPlus, into: rebound, nullTerminated: true)
}
return result != nil
}
private var maxFileSystemRepresentationSize: Int {
// The Darwin file system representation expands the UTF-8 contents to decomposed UTF-8 contents (only decomposing specific scalars)
// For any given scalar that we decompose, we will increase its UTF-8 length by at most a factor of 3 during decomposition
// (ex. U+0390 expands from 2 to 6 UTF-8 code-units, U+1D160 expands from 4 to 12 UTF-8 code-units)
// Therefore in the worst case scenario, the result will be the UTF-8 length multiplied by a factor of 3 plus an additional byte for the null byte
self.utf8.count * 3 + 1
}
#endif
package func withFileSystemRepresentation<R>(_ block: (UnsafePointer<CChar>?) throws -> R) rethrows -> R {
#if canImport(Darwin) || FOUNDATION_FRAMEWORK
try withUnsafeTemporaryAllocation(of: CChar.self, capacity: maxFileSystemRepresentationSize) { buffer in
guard _fileSystemRepresentation(into: buffer) else {
return try block(nil)
}
return try block(buffer.baseAddress!)
}
#else
#if os(Windows)
var iter = self.utf8.makeIterator()
let bLeadingSlash = if iter.next() == ._slash, iter.next()?.isLetter ?? false, iter.next() == ._colon { true } else { false }
// Strip the leading `/` on a RFC8089 path (`/[drive-letter]:/...` ). A
// leading slash indicates a rooted path on the drive for the current
// working directory.
return try Substring(self.utf8.dropFirst(bLeadingSlash ? 1 : 0)).replacing(._slash, with: ._backslash).withCString {
try block($0)
}
#else
return try withCString {
try block($0)
}
#endif
#endif
}
package func withMutableFileSystemRepresentation<R>(_ block: (UnsafeMutablePointer<CChar>?) throws -> R) rethrows -> R {
#if canImport(Darwin) || FOUNDATION_FRAMEWORK
try withUnsafeTemporaryAllocation(of: CChar.self, capacity: maxFileSystemRepresentationSize) { buffer in
guard _fileSystemRepresentation(into: buffer) else {
return try block(nil)
}
return try block(buffer.baseAddress!)
}
#else
#if os(Windows)
var iter = self.utf8.makeIterator()
let bLeadingSlash = if iter.next() == ._slash, iter.next()?.isLetter ?? false, iter.next() == ._colon { true } else { false }
var mut: String =
Substring(self.utf8[self.utf8.index(self.utf8.startIndex, offsetBy: bLeadingSlash ? 1 : 0)...])
.replacing(._slash, with: ._backslash)
#else
var mut: String = self
#endif
return try mut.withUTF8 { utf8Buffer in
// Leave space for a null byte at the end
try withUnsafeTemporaryAllocation(of: CChar.self, capacity: utf8Buffer.count + 1) { temporaryBuffer in
try utf8Buffer.withMemoryRebound(to: CChar.self) { utf8CCharBuffer in
let nullByteIndex = temporaryBuffer.initialize(fromContentsOf: utf8CCharBuffer)
// Null-terminate
temporaryBuffer.initializeElement(at: nullByteIndex, to: CChar(0))
let result = try block(temporaryBuffer.baseAddress)
temporaryBuffer.prefix(through: nullByteIndex).deinitialize()
return result
}
}
}
#endif
}
}
extension UnsafeBufferPointer {
private enum DecompositionError : Error {
case insufficientSpace
case illegalScalar
case decodingError
}
fileprivate func _decomposedRebinding<T: UnicodeCodec, InputElement>(_ type: String._NormalizationType, as codec: T.Type, into buffer: UnsafeMutableBufferPointer<InputElement>, nullTerminated: Bool = false) throws -> Int {
try self.withMemoryRebound(to: T.CodeUnit.self) { reboundSelf in
try buffer.withMemoryRebound(to: Unicode.UTF8.CodeUnit.self) { reboundBuffer in
try reboundSelf._decomposed(type, as: codec, into: reboundBuffer, nullTerminated: nullTerminated)
}
}
}
fileprivate func _decomposed<T: UnicodeCodec>(_ type: String._NormalizationType, as codec: T.Type, into buffer: UnsafeMutableBufferPointer<UInt8>, nullTerminated: Bool = false) throws -> Int where Element == T.CodeUnit {
let scalarSet = BuiltInUnicodeScalarSet(type: type.setType)
var bufferIdx = 0
let bufferLength = buffer.count
var sortBuffer: [UnicodeScalar] = []
var seenNullIdx: Int? = nil
var decoder = T()
var iterator = self.makeIterator()
func appendOutput(_ values: some Collection<UInt8>) throws {
let bufferPortion = UnsafeMutableBufferPointer(start: buffer.baseAddress!.advanced(by: bufferIdx), count: bufferLength - bufferIdx)
guard bufferPortion.count >= values.count else {
throw DecompositionError.insufficientSpace
}
bufferIdx += bufferPortion.initialize(fromContentsOf: values)
}
func appendOutput(_ value: UInt8) throws {
guard bufferIdx < bufferLength else {
throw DecompositionError.insufficientSpace
}
buffer.initializeElement(at: bufferIdx, to: value)
bufferIdx += 1
}
func encodedScalar(_ scalar: UnicodeScalar) throws -> some Collection<UInt8> {
guard let encoded = UTF8.encode(scalar) else {
throw DecompositionError.illegalScalar
}
return encoded
}
func fillFromSortBuffer() throws {
guard !sortBuffer.isEmpty else { return }
sortBuffer.sort {
$0.properties.canonicalCombiningClass.rawValue < $1.properties.canonicalCombiningClass.rawValue
}
for scalar in sortBuffer {
try appendOutput(encodedScalar(scalar))
}
sortBuffer.removeAll(keepingCapacity: true)
}
decodingLoop: while bufferIdx < bufferLength {
var scalar: UnicodeScalar
switch decoder.decode(&iterator) {
// We've finished the input, return the index
case .emptyInput: break decodingLoop
case .error: throw DecompositionError.decodingError
case .scalarValue(let v): scalar = v
}
if scalar.value == 0 {
// Null bytes within the string are fine as long as they are at the end
seenNullIdx = bufferIdx
} else if seenNullIdx != nil {
// File system representations are c-strings that do not support embedded null bytes
throw DecompositionError.illegalScalar
}
let isASCII = scalar.isASCII
if isASCII || scalar.properties.canonicalCombiningClass == .notReordered {
try fillFromSortBuffer()
}
if isASCII {
try appendOutput(UInt8(scalar.value))
} else {
#if FOUNDATION_FRAMEWORK
// Only decompose scalars present in the declared set
if scalarSet.contains(scalar) {
sortBuffer.append(contentsOf: String(scalar)._nfd)
} else {
// Even if a scalar isn't decomposed, it may still need to be re-ordered
sortBuffer.append(scalar)
}
#else
// TODO: Implement Unicode decomposition in swift-foundation
sortBuffer.append(scalar)
#endif
}
}
try fillFromSortBuffer()
if iterator.next() != nil {
throw DecompositionError.insufficientSpace
} else {
if let seenNullIdx {
return seenNullIdx + 1
}
if nullTerminated {
try appendOutput(0)
}
return bufferIdx
}
}
}
#if FOUNDATION_FRAMEWORK
@objc
extension NSString {
@objc
func __swiftFillFileSystemRepresentation(pointer: UnsafeMutablePointer<CChar>, maxLength: Int) -> Bool {
autoreleasepool {
let buffer = UnsafeMutableBufferPointer(start: pointer, count: maxLength)
// See if we have a quick-access buffer we can just convert directly
if let fastCharacters = self._fastCharacterContents() {
// If we have quick access to UTF-16 contents, decompose from UTF-16
let charsBuffer = UnsafeBufferPointer(start: fastCharacters, count: self.length)
return (try? charsBuffer._decomposedRebinding(.hfsPlus, as: Unicode.UTF16.self, into: buffer, nullTerminated: true)) != nil
} else if self.fastestEncoding == NSASCIIStringEncoding, let fastUTF8 = self._fastCStringContents(false) {
// If we have quick access to ASCII contents, no need to decompose
let utf8Buffer = UnsafeBufferPointer(start: fastUTF8, count: self.length)
// We only allow embedded nulls if there are no non-null characters following the first null character
if let embeddedNullIdx = utf8Buffer.firstIndex(of: 0) {
if !utf8Buffer[embeddedNullIdx...].allSatisfy({ $0 == 0 }) {
return false
}
}
var (leftoverIterator, next) = buffer.initialize(from: utf8Buffer)
guard leftoverIterator.next() == nil && next < buffer.endIndex else {
return false
}
buffer[next] = 0
return true
} else {
// Otherwise, bridge to a String which will create a UTF-8 buffer
return String(self)._fileSystemRepresentation(into: buffer)
}
}
}
}
#endif
|