String manipulation vs Regexps

bets26

New Member
We are often told that regexps are slow and should be avoided whenever possible. However, taking into account the overhead of doing the string manipulation oneself (leaving aside algorithm mistakes, which are a different matter), especially in \[code\]PHP\[/code\] or \[code\]Perl\[/code\] (maybe \[code\]Java\[/code\]), where is the limit? In which cases can we consider string manipulation to be the better alternative? Which regexps are particularly CPU-greedy?
For instance, for the following, in \[code\]C++\[/code\], \[code\]Java\[/code\], \[code\]PHP\[/code\] or \[code\]Perl\[/code\], what would you recommend? Here the regexps would probably be faster:
  • \[code\]s/abc/def/g\[/code\] or a \[code\]... while (($i = index($x, "abc")) >= 0) ... $y .= substr() ...\[/code\] based solution?
  • \[code\]s/(\d)+/N/g\[/code\] or a scanning algorithm
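To make the two cases above concrete, here is a minimal Java sketch of what I mean by "regexp" versus "string manipulation". The class and method names (`ReplaceSketch`, `replaceManual`, `digitsScan`, ...) are made up for illustration, and the scanning version assumes only ASCII digits count, which matches Java's default `\d` but not necessarily Perl's on Unicode strings. It shows only that both approaches give the same result; it says nothing about which one is faster.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceSketch {
    // Compiled once; recompiling on every call would distort any timing.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Regex counterpart of s/abc/def/g, with the search text matched literally.
    static String replaceRegex(String input, String from, String to) {
        return Pattern.compile(Pattern.quote(from))
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(to));
    }

    // Hand-written counterpart: indexOf loop copying chunks into a builder.
    static String replaceManual(String input, String from, String to) {
        if (from.isEmpty()) {
            return input; // an empty search string would never advance
        }
        StringBuilder out = new StringBuilder(input.length());
        int start = 0;
        int hit;
        while ((hit = input.indexOf(from, start)) >= 0) {
            out.append(input, start, hit).append(to);
            start = hit + from.length();
        }
        return out.append(input, start, input.length()).toString();
    }

    // Regex counterpart of s/(\d)+/N/g (same result as replacing \d+).
    static String digitsRegex(String input) {
        return DIGITS.matcher(input).replaceAll("N");
    }

    // Scanning counterpart: each run of ASCII digits becomes a single 'N'.
    static String digitsScan(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean inRun = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '0' && c <= '9') {
                if (!inRun) {
                    out.append('N');
                    inRun = true;
                }
            } else {
                out.append(c);
                inRun = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "abc 12 xabcx 345";
        System.out.println(replaceRegex(text, "abc", "def"));
        System.out.println(replaceManual(text, "abc", "def"));
        System.out.println(digitsRegex(text));
        System.out.println(digitsScan(text));
    }
}
```

Even for these trivial patterns the manual versions are several times longer and have their own edge cases (the empty search string, what counts as a digit), which is part of the overhead I am asking about.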
But what about
  • an email validation regexp?
  • \[code\]s/((0|\w)+?[xy]*[^xy]){2,7}/u/g\[/code\]
Wouldn't a handmade, specific algorithm be faster (though longer to write)?
Edit: The point of the question is to determine what kind of regexp would be better rewritten for a given problem using string manipulation.
Edit 2: A common implementation is the Perl regexp engine. In Perl, for instance (this requires knowing how regexps are implemented), what kind of regexp should be avoided because the implementation makes the processing lengthy and inefficient? It may not even be a complex regexp...
Edit July 2011 (based on comments): I'm not saying all regexps are slow. Some particular regexp patterns are known to be slow, due to the particular processing they require and due to their implementation.
In recent Perl / PHP implementations for instance, what is known to be rather slow - and should be avoided?
The answer is expected from people who have already done their own research (with a profiler...) and who are able to provide general guidelines about what is recommended and what is to be avoided.
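As an illustration of the kind of measurement I have in mind, here is a small Java timing sketch. It compares a nested-quantifier pattern, a shape often cited as costly on backtracking engines, with a flat pattern that accepts the same strings, on inputs that cannot match. The class name `BacktrackProbe`, the two patterns, and the input sizes are my own choices for illustration. Whether and how fast the time grows depends on the engine and its version, so I am not claiming any particular numbers.

```java
import java.util.regex.Pattern;

public class BacktrackProbe {
    // Nested quantifier: many ways to split a run of 'a' between the groups.
    static final Pattern NESTED = Pattern.compile("(a+)+b");
    // Accepts the same strings (one or more 'a', then 'b') without nesting.
    static final Pattern FLAT = Pattern.compile("a+b");

    // Builds n copies of 'a' with no trailing 'b', so neither pattern matches.
    static String subject(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    // Wall-clock time of one full-match attempt, in nanoseconds.
    // A single run is noisy; a real measurement would repeat and warm up.
    static long timeNanos(Pattern p, String s) {
        long t0 = System.nanoTime();
        p.matcher(s).matches();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 20; n += 2) {
            String s = subject(n);
            System.out.println("n=" + n
                    + " nested=" + timeNanos(NESTED, s) + "ns"
                    + " flat=" + timeNanos(FLAT, s) + "ns");
        }
    }
}
```

If the nested column grows much faster than the flat one as n increases, that is the sort of pattern I would like to see covered by general guidelines.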
 