# `Predicate` `Regex` Support
* Proposal: SF-0004
* Author(s): [Jeremy Schonfeld](https://github.com/jmschonfeld)
* Review Manager: [Charles Hu](https://github.com/iCharlesHu)
* Status: **Accepted**
* Implementation: [apple/swift-foundation#380](https://github.com/apple/swift-foundation/pull/380)
## Introduction/Motivation
`NSPredicate` supports complex string pattern matching via regular expressions. For example, you could write an `NSPredicate` such as `NSPredicate(format: "zipcode MATCHES %@", "\\d{5}(-\\d{4})?")` to match records where the `zipcode` string property is a valid US postal code. The new Swift `Predicate` type supports basic string matching using functions/operators such as `==`, `contains`, `localizedStandardContains`, `localizedCompare`, and `caseInsensitiveCompare`. However, `Predicate` does not currently support complex pattern matching operations such as regular expression matching.
## Proposed solution and example
To continue working toward feature parity with `NSPredicate` and to ease `Predicate` adoption for developers with complex matching logic, we'd like to add regex support to `Predicate`. We propose adding new APIs that allow developers to use the Swift `Regex` type within a `Predicate`. For example:
```swift
let regex = Regex {
    Anchor.startOfSubject
    Repeat(.digit, count: 5)
    Optionally {
        "-"
        Repeat(.digit, count: 4)
    }
    Anchor.endOfSubject
}
let predicate = #Predicate<Address> {
    $0.zipcode.contains(regex)
}
// - OR -
let predicate = #Predicate<Address> {
    $0.zipcode.contains(/^\d{5}(-\d{4})?$/)
}
```
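Once built, a predicate like the ones above can be evaluated directly against a value. The following is a minimal sketch, assuming a toolchain and OS whose Foundation includes the APIs from this proposal. The `Address` struct and the `hasValidZipcode` helper are hypothetical names used only for illustration.

```swift
import Foundation

// Hypothetical model type; the proposal does not define `Address`.
struct Address {
    var zipcode: String
}

// Hypothetical helper: returns true when `zipcode` is five digits,
// optionally followed by "-" and four more digits.
func hasValidZipcode(_ address: Address) throws -> Bool {
    // The extended `#/.../#` delimiters are used here so the sketch does not
    // depend on bare-slash regex literals being enabled.
    let predicate = #Predicate<Address> {
        $0.zipcode.contains(#/^\d{5}(-\d{4})?$/#)
    }
    // `evaluate` can throw, so the error is propagated to the caller.
    return try predicate.evaluate(address)
}
```

Because the pattern is anchored at both ends, `contains` here behaves like a full-string match rather than a substring search.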
## Detailed design
We propose adding the following APIs to support using regular expressions within predicates:
```swift
extension PredicateExpressions {
    @available(FoundationPreview 0.4, *)
    public struct StringContainsRegex<
        Subject : PredicateExpression,
        Regex : PredicateExpression
    > : PredicateExpression, CustomStringConvertible
    where
        Subject.Output : BidirectionalCollection,
        Subject.Output.SubSequence == Substring,
        Regex.Output : RegexComponent
    {
        public typealias Output = Bool

        public let subject: Subject
        public let regex: Regex

        public init(subject: Subject, regex: Regex)
    }

    @available(FoundationPreview 0.4, *)
    public func build_contains<Subject, Regex>(_ subject: Subject, _ regex: Regex) -> StringContainsRegex<Subject, Regex>
}

@available(FoundationPreview 0.4, *)
extension PredicateExpressions.StringContainsRegex : Sendable where Subject : Sendable, Regex : Sendable {}

@available(FoundationPreview 0.4, *)
extension PredicateExpressions.StringContainsRegex : Codable where Subject : Codable, Regex : Codable {}

@available(FoundationPreview 0.4, *)
extension PredicateExpressions.StringContainsRegex : StandardPredicateExpression where Subject : StandardPredicateExpression, Regex : StandardPredicateExpression {}
```
Additionally, we will add the following APIs to support storing a predicate-supported regex constant value:
```swift
extension PredicateExpressions {
    @available(FoundationPreview 0.4, *)
    public struct PredicateRegex : Sendable, Codable, RegexComponent, CustomStringConvertible {
        var regex: Regex<AnyRegexOutput> { get }
        var stringRepresentation: String { get }

        public init?(_ component: some RegexComponent)
    }

    @available(FoundationPreview 0.4, *)
    public func build_Arg(_ component: some RegexComponent) -> Value<PredicateRegex>
}
```
This `PredicateRegex` type will be the `Codable & Sendable` storage for an underlying `RegexComponent`. Rather than storing the `RegexComponent` (which is not `Codable & Sendable`) directly in a `PredicateExpressions.Value`, this `build_Arg` overload allows us to store it inside our wrapper type. We cannot catch all cases of unsupported regular expressions at compile time, so the `build_Arg` overload will `fatalError` at runtime when the developer has constructed a non-representable regex. The `PredicateRegex` initializer is failable, which lets developers performing manual predicate construction decide how to handle non-representable regular expressions. We support all regular expressions that can be transformed to a textual representation; unsupported expressions include those built with capture transform closures or custom parsers.
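The failable initializer can be used during manual predicate construction roughly as follows. This is a sketch under stated assumptions rather than a definitive implementation: `Address` and `makeZipcodePredicate` are hypothetical names, and the code assumes a Foundation that ships this proposal's APIs alongside the existing `Predicate` builder initializer, `PredicateExpressions.KeyPath`, and `PredicateExpressions.Value`.

```swift
import Foundation

// Hypothetical model type; the proposal does not define `Address`.
struct Address {
    var zipcode: String
}

// Hypothetical helper: builds a predicate by hand and returns nil when the
// regex has no textual representation (for example, a regex built with
// capture transform closures), instead of trapping as `build_Arg` would.
func makeZipcodePredicate(matching pattern: some RegexComponent) -> Predicate<Address>? {
    guard let stored = PredicateExpressions.PredicateRegex(pattern) else {
        return nil
    }
    return Predicate<Address> { address in
        PredicateExpressions.StringContainsRegex(
            subject: PredicateExpressions.KeyPath(root: address, keyPath: \Address.zipcode),
            regex: PredicateExpressions.Value(stored)
        )
    }
}
```

Returning `nil` is only one possible policy; a caller could equally fall back to a simpler predicate or surface an error of its own.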
_Note: The syntax returned by the `stringRepresentation` property will follow the Swift regex literal syntax as defined by [SE-0355](https://github.com/apple/swift-evolution/blob/main/proposals/0355-regex-syntax-run-time-construction.md#syntax) which is a syntactic "superset" of a set of popular regular expression engines._
## Source compatibility
The proposed changes are additive and there is no impact expected on existing source.
## Implications on adoption
The new API has an availability of FoundationPreview 0.4 or later.
## Alternatives considered
### Separate `Codable & Sendable` `Regex` type
Currently, all regular expressions are represented by the `Regex` type (and/or the `RegexComponent` protocol), which are neither `Codable` nor `Sendable`. We could introduce a separate type/protocol that has a `Codable & Sendable` requirement (as well as a requirement to convert to a textual representation / to be introspectable). However, this would require a considerable amount of new, duplicated API and would raise a number of questions about which type hierarchy a given regex construction should produce. Given the amount of effort involved and the uncertainty about whether we can establish a fully statically-checked approach, we've decided the best option is to validate at runtime whether a regular expression is supported. We expect the overwhelming majority of expressions used in predicates to be supported, which should minimize the impact on the developer experience. If we decide in the future to create a new `Regex` type that meets these requirements, we can add new APIs to `Predicate` to support that type as well.
### Supporting a whole match in addition to `contains`
With this API, determining whether a string fully matches a regular expression requires adding start/end anchors (`^`/`$`) to the regex passed to the `contains` function. As it stands today, there is no API that returns a `Bool` indicating whether a string has a whole match. We could instead choose to support the existing `wholeMatch` API, which returns a `Regex.Match?`, for example:
```swift
let predicate = #Predicate<Address> {
    $0.zipcode.wholeMatch(of: /\d{5}(-\d{4})?/) != nil
}
```
However, this API would be rather difficult for developers to discover and use. The `Regex.Match` type is neither `Codable` nor `Sendable`, so developers would only be able to compare against a `nil` value. It may also be tempting for developers to access various properties on `Regex.Match` such as `output`, `range`, or its captures, none of which would be supported in any SwiftData query or `NSPredicate` conversion. For this reason, I've only proposed support for the `contains` function, to which developers can add start/end anchors in order to get the behavior of a whole match. If a `Bool`-returning whole match function were added to the standard library in the future, we could choose to support it in addition to the `contains` function.
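The equivalence between an anchored `contains` and an unanchored whole match can be checked outside of `Predicate` with the standard library alone. The sketch below uses two hypothetical helper functions and made-up sample strings; it assumes no newlines in the input.

```swift
// Hypothetical helper: full-string match expressed with `contains` plus
// explicit start/end anchors, as this proposal recommends.
func isZipcodeViaContains(_ text: String) -> Bool {
    text.contains(#/^\d{5}(-\d{4})?$/#)
}

// Hypothetical helper: the same check expressed with `wholeMatch(of:)`,
// which needs no anchors but returns an optional match instead of a `Bool`.
func isZipcodeViaWholeMatch(_ text: String) -> Bool {
    text.wholeMatch(of: #/\d{5}(-\d{4})?/#) != nil
}

// Sample inputs: the first two are valid, the last two are not.
let samples = ["12345", "12345-6789", "zip: 12345", "1234"]
for sample in samples {
    // Both approaches should agree on every sample.
    print(sample, isZipcodeViaContains(sample), isZipcodeViaWholeMatch(sample))
}
```

Without the anchors, `contains` would also accept strings such as `"zip: 12345"`, which is why the anchors are required for whole-match semantics.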
### Alternatives to `fatalError` in the new `build_Arg` overload
In the new `build_Arg` overload for regex constants, we will `fatalError` if provided a regex that is not supported by `Predicate` (see details above). Throwing an `Error` here is not an option because `Predicate` construction is non-throwing, and thus `build_Arg` must also be non-throwing. A possible alternative to this `fatalError` would be to allow any regex to be included in a predicate, but `throw` during evaluation of the predicate. However, this approach has a handful of downsides, detailed below:
1. Non-supported regex components may not be `Sendable`, and including them could allow non-`Sendable` information to be stored within a `Predicate`. While we could take care to check for support before ever using or exposing the value, this could be prone to violating the `Sendable` contract for Swift concurrency support.
2. `Predicate` evaluation may take place far from where the `Predicate` was constructed (potentially even in a different library or process). While throwing an error may be more resilient than a `fatalError`, increasing the distance between where the mistake (the invalid regex) was made and where it is reported makes the issue harder to debug, and the library/process that encounters the failure may not be the best placed to address it.
3. Doing so would also add more hoops to jump through for predicate inspection. With the current proposal, every regular expression that a `Predicate` can contain will always be able to produce a `String` of its contents for inspection/usage. Pushing this error to evaluation time would also require pushing it to each predicate conversion routine, which may be unfavorable.
For these reasons, I've chosen the approach of a `fatalError` during construction to call out the developer error in the best way we're able.