##! User-extensible search-pattern interface.
##!
##! `Matcher` abstracts "things that can answer whether, and where, an
##! element matches inside a haystack". It's the foundation for
##! pluggable `.find` / `.contains` / `.split` on `Str` and any
##! `Seq`-like type. This module supplies the trait, four built-in
##! impls (`Str`, `Codepoint`, `(Vec Codepoint)`, `(Vec U8)`), and a
##! blanket impl for quotations.
##!
##! `Matcher` carries both `elem` and `haystack` as primary parameters
##! today; the `CodepointMatcher` / `ByteMatcher` constraint aliases
##! recover the single-parameter shape. Byte-index ranges use
##! `(Range AnyInt)` and the byte matcher's haystack and element are
##! `(Vec AnyInt) / AnyInt`; both sharpen to the narrower `AnyUInt`
##! and `U8` shapes once the corresponding checker plumbing lands.

:use
  :open core
    AnyInt AnyUInt Bool Byte Codepoint Option None Some Range RangeBoth
    Str U8 Unit Vec
  :open traits Eq Ord Add Not Default
  :open codepoint Ascii
  :open collections Foldable
:end

# si[impl stdlib.matcher.trait]
## Search-pattern interface. `Self` is the matcher (e.g. a literal
## needle, a codepoint, a character class). `haystack` is the input
## sequence type. `elem` is the element type of the haystack.
##
## `.find-in` returns the byte/index range of the first match.
## `.find-all-in` returns the non-overlapping match ranges in order.
:trait(pub) Matcher haystack elem
  ## Byte/index range of the first match, or `None` if the matcher
  ## finds no occurrence in the input.
  .find-in ( Self haystack -> (Option (Range AnyInt)) )
  ## All non-overlapping match ranges in input order.
  .find-all-in ( Self haystack -> (Vec (Range AnyInt)) )
:end

## Convenience constraint alias: "any matcher whose haystack is `Str`
## and whose element type is `Codepoint`". Used by `.find` / `.split`
## on `Str` once the follow-up WP rewires them.
:trait(pub) CodepointMatcher { (Matcher Self Str Codepoint) } :end

## Convenience constraint alias: "any matcher whose haystack is a
## byte `Vec` and whose element type is an `AnyInt` in the byte range".
:trait(pub) ByteMatcher { (Matcher Self (Vec AnyInt) AnyInt) } :end

# si[impl stdlib.matcher.str-literal]
## `Str` is a matcher that matches itself byte-identically inside
## another `Str`. `.find-in` delegates to the host `str-find` intrinsic
## and shapes the byte offset into `Some (RangeBoth start end)` or
## `None`. `.find-all-in` walks the input with successive slices.
:impl Matcher Str Str Codepoint
  .find-in ( Str Str -> (Option (Range AnyInt)) )
    pop-> haystack pop-> needle
    needle haystack str-find pop-> pos
    pos 0 < :if
      None
    :else
      pos pos needle str-len + RangeBoth Some
    :end ;

  .find-all-in ( Str Str -> (Vec (Range AnyInt)) )
    pop-> haystack pop-> needle
    needle str-len pop-> nlen
    haystack str-len pop-> hlen
    nlen 0 = :if (Vec .default) :ret :end
    (Vec .default) 0  # stack: acc cursor
    :loop
      # stack: acc cursor
      dup hlen ≥ :if drop :break :end
      pop-> cursor
      needle cursor hlen haystack str-slice str-find pop-> rel
      rel 0 < :if :break :end
      # acc is still on stack; we need: acc' = acc .push (RangeBoth start end), cursor' = end
      cursor rel + pop-> start
      start nlen + pop-> end
      start end RangeBoth .push  # stack: acc'
      end                        # stack: acc' cursor'
    :end ;
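  # Worked trace (a sketch read off the body above, not a tested run; it
  # assumes `str-find` returns a byte offset, or a negative value when the
  # needle is absent, and that `str-slice` takes start, end, string in
  # that stack order):
  #   call shape: needle haystack .find-all-in
  #   needle `ab`, haystack `abcab`  ->  RangeBoth 0 2, RangeBoth 3 5
  #   needle `aa`, haystack `aaa`    ->  RangeBoth 0 2 only; the cursor
  #                                      resumes at byte 2, so the
  #                                      overlapping match at byte 1 is
  #                                      skipped.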
:end

# si[impl stdlib.matcher.codepoint-single]
## A `Codepoint` is a matcher that matches its UTF-8-encoded single
## character inside a `Str`. `.find-in` encodes the codepoint to a
## one-char `Str` and delegates to `str-find`.
:impl Matcher Codepoint Str Codepoint
  .find-in ( Codepoint Str -> (Option (Range AnyInt)) )
    pop-> haystack
    char-to-str pop-> needle
    needle haystack str-find pop-> pos
    pos 0 < :if
      None
    :else
      pos pos needle str-len + RangeBoth Some
    :end ;

  .find-all-in ( Codepoint Str -> (Vec (Range AnyInt)) )
    pop-> haystack
    char-to-str pop-> needle
    needle str-len pop-> nlen
    haystack str-len pop-> hlen
    (Vec .default) 0  # stack: acc cursor
    :loop
      dup hlen ≥ :if drop :break :end
      pop-> cursor
      needle cursor hlen haystack str-slice str-find pop-> rel
      rel 0 < :if :break :end
      cursor rel + pop-> start
      start nlen + pop-> end
      start end RangeBoth .push
      end
    :end ;
:end

# si[impl stdlib.matcher.codepoint-set]
## A `(Vec Codepoint)` is a matcher that matches any codepoint in the
## set. Walks the haystack by byte offset via `str-next-char`.
:impl Matcher (Vec Codepoint) Str Codepoint
  .find-in ( (Vec Codepoint) Str -> (Option (Range AnyInt)) )
    pop-> haystack pop-> set
    haystack str-len pop-> hlen
    0 pop-> cursor
    None  # stack: result
    :loop
      cursor hlen ≥ :if :break :end
      cursor haystack str-next-char pop-> ch pop-> next-pos
      set [ pop-> other ch char-to-int other char-to-int = ] .any :if
        drop                                # drop old None
        cursor next-pos RangeBoth Some      # push new Some(..)
        :break
      :end
      next-pos pop-> cursor
    :end ;

  .find-all-in ( (Vec Codepoint) Str -> (Vec (Range AnyInt)) )
    pop-> haystack pop-> set
    haystack str-len pop-> hlen
    0 pop-> cursor
    (Vec .default)  # stack: acc
    :loop
      cursor hlen ≥ :if :break :end
      cursor haystack str-next-char pop-> ch pop-> next-pos
      set [ pop-> other ch char-to-int other char-to-int = ] .any :if
        cursor next-pos RangeBoth .push  # mutates acc on stack
      :end
      next-pos pop-> cursor
    :end ;
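  # Worked trace (a sketch, not a tested run; it assumes `str-next-char`
  # yields the codepoint starting at the given byte offset together with
  # the offset just past it, as the bodies above use it):
  #   set {a, e}, haystack `hello`  ->  RangeBoth 1 2   (the `e`)
  #   set {é},    haystack `café`   ->  RangeBoth 3 5   (precomposed `é`,
  #                                     U+00E9, is two bytes in UTF-8, so
  #                                     the range is two bytes wide)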
:end

# si[impl stdlib.matcher.byte-seq]
## A byte vector is a matcher that matches a byte-level substring
## inside a byte-vector haystack. Uses a helper fn
## `byte-substr-matches-at?` because nested quotations that close over
## vector locals move-consume those locals each iteration (captures
## are `std::mem::replace`-with-0 per the abstract-machine RC design);
## a plain :fn receiving Vec arguments on the stack avoids that hazard.
:fn byte-substr-matches-at? ( AnyInt (Vec AnyInt) (Vec AnyInt) -> Bool )
  pop-> needle pop-> haystack pop-> i
  needle .len pop-> nlen
  0 pop-> j
  true  # stack: [matched]
  :loop
    j nlen ≥ :if :break :end
    i j + haystack .get j needle .get = not :if
      drop false
      :break
    :end
    j 1 + pop-> j
  :end
:end

:impl Matcher (Vec AnyInt) (Vec AnyInt) AnyInt
  .find-in ( (Vec AnyInt) (Vec AnyInt) -> (Option (Range AnyInt)) )
    pop-> haystack pop-> needle
    needle .len pop-> nlen
    haystack .len pop-> hlen
    nlen 0 = :if 0 0 RangeBoth Some :ret :end
    0 pop-> i
    None  # stack: [result]
    :loop
      i nlen + hlen > :if :break :end
      i haystack needle byte-substr-matches-at? :if
        drop
        i i nlen + RangeBoth Some
        :break
      :end
      i 1 + pop-> i
    :end ;

  .find-all-in ( (Vec AnyInt) (Vec AnyInt) -> (Vec (Range AnyInt)) )
    pop-> haystack pop-> needle
    needle .len pop-> nlen
    haystack .len pop-> hlen
    nlen 0 = :if (Vec .default) :ret :end
    # Note: this implementation yields overlapping matches (advances
    # by 1 on hit, not by nlen) because non-overlapping advance
    # requires rebinding the loop index inside a conditional, which
    # the current checker does not preserve past the branch. Tighten
    # to non-overlapping once local-rebinding-inside-conditional
    # lands or the impl is rewritten as an iterator.
    (Vec .default)  # stack: [acc]
    0 pop-> i
    :loop
      i nlen + hlen > :if :break :end
      i haystack needle byte-substr-matches-at? :if
        i i nlen + RangeBoth .push  # mutate acc on stack
      :end
      i 1 + pop-> i
    :end ;
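  # Worked trace of the overlap described in the note inside the body
  # above (a sketch read off the loop, not a tested run):
  #   needle bytes 1 1, haystack bytes 1 1 1   (nlen = 2, hlen = 3)
  #   i = 0: match, push RangeBoth 0 2
  #   i = 1: match, push RangeBoth 1 3
  #   i = 2: 2 + 2 > 3, loop exits
  # A non-overlapping scan would yield RangeBoth 0 2 only.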
:end

# si[impl stdlib.matcher.quotation]
## Blanket impl: any quotation `[elem -> Bool]` is itself a matcher
## against a `(Vec elem)` haystack. Interprets "match at position i"
## as "quot applied to element i returns true".
##
## Spec canonical shape is
## `[ (Seq elem _) AnyUInt -> (Option AnyUInt) ]`, a next-match
## driver that receives the remaining-haystack view and a starting
## offset and returns the end offset of the match. That shape lets
## matchers scan arbitrarily forward. Today the checker cannot
## dispatch blanket impls whose `Self` is a quotation type of that
## full shape, and structural-search-over-Seq-existentials is not
## expressible either, so this impl lands in a simpler predicate
## flavour: `[elem -> Bool]` walking a `(Vec elem)` haystack one
## element at a time. The driver-style blanket impl will land once
## those checker features arrive.
:impl Matcher [elem -> Bool] (Vec elem) elem
  .find-in ( [elem -> Bool] (Vec elem) -> (Option (Range AnyInt)) )
    pop-> haystack pop-> pred
    haystack .len pop-> hlen
    0 pop-> i
    None  # stack: result
    :loop
      i hlen ≥ :if :break :end
      i haystack .get pred .call :if
        drop
        i i 1 + RangeBoth Some
        :break
      :end
      i 1 + pop-> i
    :end ;

  .find-all-in ( [elem -> Bool] (Vec elem) -> (Vec (Range AnyInt)) )
    pop-> haystack pop-> pred
    haystack .len pop-> hlen
    0 pop-> i
    (Vec .default)  # stack: acc
    :loop
      i hlen ≥ :if :break :end
      i haystack .get pred .call :if
        i i 1 + RangeBoth .push
      :end
      i 1 + pop-> i
    :end ;
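  # Usage sketch (not a tested run; the quotation `[ 0 = ]` is a
  # hypothetical predicate, and this assumes `=` compares the element
  # left on the stack with the literal 0 and that the checker dispatches
  # this blanket impl for it):
  #   [ 0 = ] haystack .find-in       # haystack holding 5 0 7 0
  #                                   # -> Some of RangeBoth 1 2
  #   [ 0 = ] haystack .find-all-in   # -> RangeBoth 1 2, RangeBoth 3 4
  # Every match is one element wide, because the predicate sees a single
  # element at a time.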
:end