Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

No support for lookbehind #571

Closed
ghost opened this issue Oct 22, 2015 · 61 comments · Fixed by atom/atom#19615
Closed

No support for lookbehind #571

ghost opened this issue Oct 22, 2015 · 61 comments · Fixed by atom/atom#19615

Comments

@ghost
Copy link

ghost commented Oct 22, 2015

Implementing JavaScript's native RegExp engine for a find/replace tool in a text editor seems like a dramatic oversight.

As find-and-replace leans on JS's RegExp, of course the following find expressions is invalid

(?<=hello)world
(?<!hello)world

Errors

Invalid regular expression: /(?<=hello)world/: Invalid group
Invalid regular expression: /(?<!hello)world/: Invalid group

This module is unusable without such features

@Epigene
Copy link

Epigene commented Oct 29, 2015

I agree, lookaround is crucial!

Using JS's RegExp is also weird considering Oniguruma is already used within Atom to parse grammar files as mentioned in this exchange.

@abe33
Copy link
Contributor

abe33 commented Nov 18, 2015

I as already mentioned on Discuss, using oniguruma for grammar made sense as it allows to reuse TextMate and SublimeText grammars out of the box, but AFAIK that's the only context where oniguruma is used.
OTOH I agree it's a shame that lookbehind is not part of JS regexp (while loohahead is).

@jesseleite
Copy link

I don't know much about the subject, apart from lookbehinds not being part of JS regex. However, as a PHP dev who regular-ily (pun intended) regexes via PHP regex engine with lookbehinds, it sucks not being able to lookbehind in Atom's find and replace. Curious about this Oniguruma?

@alexchandel

This comment has been minimized.

@jesseleite
Copy link

@alexchandel I'm unfamiliar with Oniguruma. I'm all for more features, but is the syntax standard and testable on a site like regex101.com?

@50Wliu
Copy link
Contributor

50Wliu commented Apr 7, 2016

@jesseleite It's what Atom uses for languages, and I've never had a problem testing any of those regexes on regex101.com.

@blackbaud-jakespirek

This comment has been minimized.

@alexchandel
Copy link

@bb-jakespirek It would if #698 happened

@lee-dohm lee-dohm changed the title positive and negative lookbehind (lookaround) Use a better regular expression engine May 3, 2016
@alexchandel
Copy link

Since #698 is closed as a duplicate, is this issue now for using Oniguruma for regexes?

@ghost
Copy link
Author

ghost commented May 19, 2016

relevant: #570

Oniguruma would fix this too.

@bric3
Copy link

bric3 commented Jun 16, 2016

@jesseleite https://regex101.com/r/dO0yG2/1

It depends on the regex engine, but yes it's almost de facto standard

@lzkelley
Copy link

Would this also allow better handling of indentation errors (e.g. atom/language-python#22)? Which really shouldn't need special packages to solve (e.g. https://atom.io/packages/python-indent)...

@50Wliu
Copy link
Contributor

50Wliu commented Jun 23, 2016

@lzkelley No, as find-and-replace has nothing to do with language grammars. Those are already using Oniguruma.

@MattSturgeon
Copy link

Those are already using Oniguruma.

Wait.. Atom already uses Oniguruma, but not for Find? Surely it should be a simple fix then? ;)

@ErikCorryGoogle
Copy link

Latest V8 releases have lookbehind available. It's behind the experimental JavaScript flag.

@lee-dohm
Copy link
Contributor

@alexchandel your comment was deleted as a violation of the Atom Code of Conduct as it is insulting or derogatory. You may consider this an official warning.

@MattSturgeon
Copy link

For those interested, @alexchandel was saying that while V8 has added experimental lookbehind support, it is still inferior to many alternative regex libraries in his opinion:

V8 still doesn't have have async callbacks, and catastrophic backtracking and misspelt regexes are the primary source of hangs in find-and-replace, for example #557 and #856.

He then went on to suggest using an alternative like Oniguruma (and rather rudely implied that the atom team should be prioritising this issue higher — hence the deleted comment).

@MartinBonner
Copy link

I was trying to find the documentation of the Atom regex syntax (given that they are all different), and was pleased to find that it used Oniguruma (with the lookbehind I wanted), and then was gutted to discover this bug.

It does seem rather extraordinary that Atom uses Oniguruma, but not for actual search-and-replace!

@Stanzilla
Copy link

VS Code recently switched to ripgrep for search and it might be worth investigating doing the same in Atom. It is really fast. ⚡️

@MartinBonner
Copy link

ripgrep is great for a "find in files" solution, but it achieves that speed (at least in part) by not supporting certain functions (the other part is achieved by not looking in .gitignore'd files - which is also only relevant to "find in files". In particular, I think it doesn't support look-around (which is what I was after). I don't think raw speed is nearly so important for a complex search-and-replace (which is what I wanted this feature for).

@ai-danno
Copy link

As a regex nerd looking for the ultimate editor, I was drawn to this conversation.

Question- does Atom now support lookbehinds? Thanks in advance!

@MartinBonner
Copy link

@ai-danno : No it doesn't. The lack of lookbehind is what drew me to this thread in the first place.

My comment was because switching to ripgrep would give us speed, but not lookaround (and I think that would be a poor tradeoff).

@ai-danno
Copy link

@MartinBonner 👍 Thanks for the fast reply. It seems like it might bubble to the surface here in not too long, so I'll keep checking back.

@MartinBonner
Copy link

@MattSturgeon The obvious reason not to use Oniguruma for find and replace is that it involves work to stop using the Javascript regex and use Oniguruma instead. In particular, if the Javascript regex is going to acquire the requested features anyway, it may be better to just wait for that to happen.

I'm not convinced that is a good enough reason, but I want to remind people it exists.

@lee-dohm
Copy link
Contributor

@MattSturgeon Here's what I see happening if we just accept Oniguruma:

  1. Atom maintainers do a bunch of work to migrate everything to Oniguruma
  2. People involved in this conversation all cheer
  3. Some other group comes along and says it is too slow or doesn't have X feature
  4. The whole argument starts all over again

We're not going to play Whack-a-Mole. I want to see a list of features and performance targets. I want to get broad-based community consensus that the list of features and performance targets is acceptable. I want us to find a regex engine that meets those features and targets. Then when someone comes along saying that isn't good enough I can point to all the work we did and the rationales everyone had for the choices made. "We couldn't think of a reason not to at the time," isn't a convincing rationale.

@Stanzilla
Copy link

What is wrong with ripgrep then? grep is a pretty accepted standard usually?

@jeancroy
Copy link
Contributor

jeancroy commented Mar 25, 2017

There's two engine to consider. One for search/replace in buffer, one for search/replace on whole project. Ripgrep apply to second case. Atom use scandal.

In general "what's wrong with new solution" is not a very useful question if we cannot pinpoint what's wrong with current solution.

@bric3
Copy link

bric3 commented Mar 25, 2017

I have been monitoring this issue for a while and I'm sad of the state of the progress on this issue. I'm trying to make an honest feedback and I am trying to be constructive. I am myself a developer of an opensource framework, I know there is features wanted by some users that are not in the priority list for various reasons. While priorities are important let's not ignore features that are useful for the users. As a library developer several time I never imagined some usage that users had, and limitations in my library hit them, so I am enclined to listen about usage and help them if it is possible, sometime it can take a long time but we do it. Regarding the information this issue has, it is very limited it only have the enhancement tag, whatever the Atom team chose it should at least indicate the state of this issue be it delayed, stopped, re-scoped, low-priority, never, to be redefined...

Feedback :

I am a strong Regex users, and I am usually disappointed by tools that don't support powerful enough edition. Especialy when there is complicated pattern to find (wiht big data it matter). Regarding regex I usually expects the features found in perlRE. Now I usually code with JetBrains tools that are very very good as IDEs. This lack of search support drives me and my colleague to use JetBrains IDEs and as such away from Atom. So we have a solution that works for us, but it is disrupting, and instead of starting with Atom we avoid it.

Some ideas :

  • Why not the silver searcher (ag) which seems to follow the fairly common ack standard. Now I don't know if this approach can fit in the atom design as it requires to run a separate process. But the benefit of this approach allows the UI to not hang.

  • Regex engines varies in features and performance, yet I think a fairly good scope (for a better regex engine No support for lookbehind #571 (comment)) would be the perl regex which has become quite popular outside the perl language (Java for example follows this engine in features).

@jeancroy
Copy link
Contributor

jeancroy commented Mar 25, 2017

I am a strong Regex users, and I am usually disappointed by tools that don't support powerful enough edition. Especially when there is complicated pattern to find (with big data it matter).

Can you share real life example where javascript engine fail your need ?
I think at this point this would be the most useful input to tip the balance.

IMO big data is a separate problem. With large enough data, it become more practical to write your own small tool (or use specialized big data tools) than try to fit within the limitation of a general purpose text editor.

Why not the silver searcher (ag).

In general grep like tool lack the ability to do multiline matches. This is because speed-usability compromise is different. See for example BurntSushi/ripgrep#176 or atom/scandal#5.

Ag in particular is free of this issue, but seems to be less compatible with window (?).
Also atom has an instant preview feature that need to test regexp on only some part of file, or files that are not yet saved to disk. (Finally the above issue of real life limitation remains).

kisg pushed a commit to paul99/v8mips that referenced this issue Mar 27, 2017
Previously the Boyer-Moore-Horspool optimization gave up in the presence of a
submatch.  A submatch is where we record the current position so that we can go
back to it, which is an essential part of the semantics of lookarounds
(lookaheads and lookbehinds).  This has been the case since
Boyer-Moore-Horspool was implemented, but it was overly cautious.

* For positive lookahead it is OK to use the patterns inside the lookahead to
  guide the BMS optimization.
* For positive lookbehind we harmlessly fail to optimize when the patterns
  inside the lookbehind go backwards because TextNode::EatsAtLeast returns 0.
* For negative lookarounds, the NegativeLookaroundChoiceNode::FillInBMInfo method
  (in jsregexp.h) knows to only look at the following pattern.

This is in response to disappointing lookbehind performance in Atom.
See atom/find-and-replace#571

R=yangguo@chromium.org
BUG=

Review-Url: https://codereview.chromium.org/2777583003
Cr-Commit-Position: refs/heads/master@{#44139}
@ErikCorryGoogle
Copy link

My performance fix for lookbehinds (and lookaheads) in V8 has landed. Of course it will take some time to percolate through to Atom.

@bric3
Copy link

bric3 commented Apr 4, 2017

@jeancroy Sorry for delayed answer.

Can you share real life example where javascript engine fail your need ?
I think at this point this would be the most useful input to tip the balance.

When dealing with a lot of heterogenous data, it is useful to inspect data. There are powerful tool on the comand line. But when you have to discover the data, and something is wrong or you want to just look at the structure of the data, I use quite often regular expressions, and more than rarely I use (positive / negative) look-ahead or look-behind expressions to identify specific parts of these files.
Actions may vary for me, e.g. sometime it is just looking up in a file. Sometime I really want to fix it.
What is disrupting is when you have to switch tools. Disclaimer : I understand what it is to use the right tool for the job, but for exploring data for any reason I'd like to stay in my editor.


Ah yes your are correct it seems complicated to have ag on windows. There's yet another alternative to ag, it's written in go and called the platinium searcher (pt) : https://github.com/monochromegane/the_platinum_searcher
As it is writen in go it is multipatform.

@jinglesthula
Copy link

I tried the atom --js-flags="--harmony_regexp_lookbehind" approach and now using lookbehind doesn't show the error, but it does sit and spin forever with "0 paths searched". Searching w/o lookbehind in the regex still works. Anyone else try this with better results?

More on topic, does anyone know why lookbehind didn't land in ES6? That seems like the root problem. Tried googling but didn't find much discussion.

@ErikCorryGoogle
Copy link

If it spins forever you may have written a regexp that takes forever. This is quite easy to do. Or you may have hit a bug. Can you share the regexp in question? There are people in this very thread who have tried the flag and it worked for them.

As for why it was not part of ES6, this is because there was not agreement on what form it should take. Perl-style fixed width lookbehinds had been proposed and I was not very interested in that happening. In the end I helped push the current full lookbehind proposal through so that we would not end up stuck with the fixed width version which has caused much gnashing of teeth in perl. To get something on the TC39 standard you need a champion who wants to write the spec and argue for it in committee, and these days you also need to actually implement it. Sometimes this takes time.

How do people feel about inline flags in regexps? eg
/(foo(?i)bar)/
would match foobar and fooBAR but not Foobar because the /i flag only goes from the (?i) to the close parenthesis. Is this something people miss?

@jeancroy
Copy link
Contributor

How do people feel about inline flags in regexps? eg
/(foo(?i)bar)/
would match foobar and fooBAR but not Foobar because the /i flag only goes from the (?i) to the close parenthesis. Is this something people miss?

I'm not sure I'd have much use for it. But if a single syntax is adopted I like the modifier on non-capturing group syntax. /(foo(?i:bar))/ . It seems .Net handle it (actually both) and I like how the scope is well defined.

@whoinoi
Copy link

whoinoi commented Jan 20, 2018

Is there any update on this? I found, that the following regular expression also doesn't work in Find in Projects: (?<!\$)\$(?!\$)

It says "Invalid regular expression /(?"

@fiapps
Copy link

fiapps commented Apr 2, 2018

In the interest of producing a concrete requirement, I (like many) require lookbehind, and in particular I want them to work for searches of the whole project. I would also point out that this was the requirement expressed in the original title of this issue: "positive and negative lookbehind (lookaround)." It only became a vague "better" due to a later edit.

Using Atom 1.25.0, with atom --js-flags="--harmony_regexp_lookbehind" to enable the experimental feature, I can search a single file quickly for (?<!{)foobar\b, but searching the entire project (23 files totaling less than 13000 lines) either takes an extremely long time or hangs, because after more than an hour, the spinner still said 0 files had been searched.

Over two years have passed since the announcement that lookbehind support was coming to V8, and we still don't have a way to reliably use lookbehind to search a whole project. Therefore, I do not think waiting for V8 to solve the problem is an adequate solution.

@ghost ghost changed the title Use a better regular expression engine No support for lookaround (lookahead, lookbehind) Apr 6, 2018
@ghost
Copy link
Author

ghost commented Apr 6, 2018

Sorry for the subjective title in the first place! Still waiting on lookaround in Atom...

@ErikCorryGoogle
Copy link

Is this really the regexp you used to test? If searching 13000 lines takes more than an hour then there's no way that's a V8 regexp performance issue. You probably need to file a new bug. Any way to work out which version of V8 this is?

I just tested this on V8 version 6.2 and the two regexps

/(?<!})foobar\b/

and

/foobar\b/

are exactly the same speed, about 14 million lines per second.

There is no reason for Atom to delay switching this on, that I can see. The proposal is finished: https://github.com/tc39/proposals/blob/master/finished-proposals.md

@ErikCorryGoogle
Copy link

Atom has had lookahead since forever, so the new title is not very accurate.

@fiapps
Copy link

fiapps commented Apr 9, 2018

I actually used a very similar regex with another word in place of foobar. However, testing (?<!{)foobar\b I get the same results (now with Atom 1.25.1): I can search a single file almost instantly, but a search of the whole project hangs or takes an extremely long time (update: after 4 hours it had still searched 0 files). I don't know whether the bug is in V8 or Atom. I haven't done any Atom development, so I don't know how to find out where it hangs, but it looks like jinglesthula reported the same issue above.

As for the version of V8, I see that the Atom 1.25.0 release notes report that Electron was upgraded to 1.7.11, and the Electron 1.7.0 release notes report that V8 was upgraded to 5.8.283.38.

@ErikCorryGoogle
Copy link

V8 version 5.8 is now over a year old, from March 20 2017: https://v8project.blogspot.dk/2017/03/v8-release-58.html

Unfortunately the performance fix was done on March 28 2017, so it has not got through to Atom yet.

Nevertheless there must be a different issue here, since the improvement was only by a factor of 2-3, and you are seeing a factor of at least 1 million slowdown.

@ghost ghost changed the title No support for lookaround (lookahead, lookbehind) No support for lookbehind Apr 13, 2018
@mrmass
Copy link

mrmass commented Jan 30, 2019

In the meantime I still get invalid regular expression on lookbehind syntax... Were there any plans to fix this issue?

@billybooth
Copy link

Right, is there any traction on this in the release version? I skimmed the thread, but I'd prefer to have negative lookbehind with poor performance than not at all.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.