-
Notifications
You must be signed in to change notification settings - Fork 504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specifying names for capture groups #30
Comments
I started working on this, too. I'll take a look at your code in a second, but I recommend digging into the source for XRegExp (https://github.com/slevithan/xregexp). It augments js regexes with a lot of functionality found in other languages, including named captures. |
I think the way xregexp handles named captures makes sense, specifically:
The way it's implemented is by replacing the native replace & exec functions. I was going to start adapting that code for use here tomorrow night, but you're on the right track, so feel free to keep at it! I'll fetch any new commits made to your fork & pick it up where ever you leave off, unless you knock it all out. |
Also, I'm not sure if requiring a group name in endCapture() is necessary or worth doing. It feels like it might add a little more cognitive load to anyone using the library, since it requires them to track which group is being closed, though I can also see it being helpful seeing the group names when examining the code. It might be worth creating a separate pull request just for that aspect of the named captures that can be held on to while verex gains more traction/usage. If the need/desire for supporting names in endCapture emerges, the PR can be merged in at that point. |
@crawfordcomeaux I wasn't planning to require a group name in endCapture()--just allow it if someone wants the additional sanity check in their code. (I've found that in more complex regexes with nested groups, it's easy to get something wrong.) I'll look at xregexp as you suggested--thanks! GitHub newbie question: when you said "I'll take a look at your code", do you mean you already found it in my fork? Or do I need to do something else first? |
I already cloned your fork & took a peek at things, though I haven't tried using it yet. In the future, as a courtesy, you could link to what you want feedback on. I'm not sure if there's a magical autocomplete/auto-link syntax for doing that, but check out the links I added in #32 for git/Github protips. As for optionally passing group names to endCapture, you've convinced me 👍 |
I think having distinct chainable methods for 'start' and 'end' is a bad idea — there's a potential for missing the 'end', or introducing one too many, that is the kind of problem that RegEx suffers from in its illegibility and VerEx supposedly solves. How about This is more flexible (you can predefine VerEx / RegExs for groups), more legible (the group is cleanly separated), and makes invalid RegEx construction impossible (throws a JS syntax error on parse instead of on execution). |
Makes a lot more sense to me. +1 On Wed, Aug 7, 2013 at 3:15 AM, Barney Carroll notifications@github.com
|
I agree, and that's something that occurred to me also. The same thinking could apply to other functions, particularly "or". Writing a regex for something like /a.*|b[012]c|de/ looks something like
which I guess is OK, but I think something like this might be clearer, since the different alternatives are grouped together clearly: VerEx().oneOf (VerEx().find("a").anything(), Or not--I don't know. It's annoying that every argument would have to be prefixed with VerEx(). I tried to think of a way to set up a scope (perhaps in a nested function) in which the functions like "find", "anyOf", etc., could be used at the top level without a prefix, but I couldn't come up with one. (Maybe it's possible with eval()...) Anyway, doing things without chaining may be necessary to write certain kinds of regex's. I'm not sure it's currently possible with VerbalExpressions to write something like /(?:[A-Za-z][A-Za-z0-9]+, *)+/, i.e. putting a quantifier on a group, even with a non-capture group. Maybe there is a way, and I just haven't figured it out. |
Instead of assigning string names to capture groups, how about just defining them as separate VerEx objects and allow them to be passed around as arguments? Any update on this issue? |
This is 6 years later, but support for ES2018 named capture groups is coming in |
Following up on go-oleg's comment on issue #17: I tried adding his suggestion to add names to capture groups. I've got something working but wanted to run my ideas past others first. Please note that this would be the first time I've ever contributed anything on github. Here's what I came up with:
beginCapture(group): The group parameter is optional. If it's an integer (or a string of all digits), the value must match the actual group number that would be assigned to the group (in JavaScript this would be the index into the array returned by regexp.exec()) or else it's an error. If it's some other string, it's a group name that's assigned to the capture group.
endCapture(group): The group parameter is optional; if present it must match the parameter of the corresponding beginCapture().
groupNumberFor(group): would be used after the expression is built to return the index number. E.g.
var tester = VerEx()
.startOfLine()
.beginCapture( "one" )
.then( "http" )
.maybe( "s" )
.endCapture( "one" )
.then( "://" )
.maybe( "www." )
.anythingBut( " " )
.endOfLine();
var testMe = "https://www.google.com";
var a = tester.exec(testMe);
a [tester.groupNumberFor( "one" )]) -- returns "https"
I'm not sure how to handle the errors, though (mismatched group name or number). I've made them throw exceptions, but if this is the wrong way to go I can change it. Again, I'm really new at this so any suggestions are appreciated.
The text was updated successfully, but these errors were encountered: