Regular expression crashes Apache due to PCRE limitations

Andross01

I am currently writing a BBCode parsing engine and have run into a situation that I can't figure out on my own.

I have hit exactly the problem described here: Apache / PHP on Windows crashes with regular expression

That is, if I run something like the example below, Apache crashes because the recursion count reaches 690 (1 MB memory limit for PCRE):

\[code\]
$txt = ''.str_repeat('a', 338).''; // if I change repeat count to lower value it's ok
$regex = '#\[(?P<attributes>(?P<tag>[a-z0-9_]*?)(?:=.*?|\s.*?|))](?P<content>(?:[^[]|\[(?!/?(?P=tag)])|(?R))+?)\[/(?P=tag)]#mi';
echo preg_replace_callback($regex, function($matches) { return $matches['content']; }, $txt);
\[/code\]

So I need to somehow reduce the use of \[code\]*\[/code\] and \[code\]+\[/code\] in my regex, but that is where I have run out of ideas, so I thought maybe you could suggest something.

Other approaches for parsing BBCode (that can handle nested tags) are welcome. However, I would prefer not to use an already built class. I like to do things on my own!

I have also looked into the PECL extension and PEAR HTML_BBCodeParser, but I don't want my application to depend on extensions. More likely I will write a script that checks for the extension and, if it doesn't exist, falls back to the BBCode parser I am trying to write here.

Sorry if my description is unclear, my English is not great ^^

EDIT: the regex explained:

\[code\]\[(?P<attributes>(?P<tag>[a-z0-9_]*?)(?:=.*?|\s.*?|))]\[/code\]

This is my opening tag. I have used named groups: 'tag' identifies the tag and 'attributes' identifies the tag's attributes. Think of the tag itself as an attribute too. So what is happening here? I try to match a tag; once a tag is matched, I try to match anything after an \[code\]=\[/code\] sign or anything after \[code\]\s\[/code\] (whitespace) until the closing bracket \[code\]]\[/code\] of the tag is reached.

\[code\](?P<content>(?:[^[]|\[(?!/?(?P=tag)])|(?R))+?)\[/code\]

Now here I am trying to match the content. This is the tricky part. I look for any character that is not [. If I do find a [, I check that it does not start my own tag (opening or closing); otherwise I try recursion. I tell the regex engine to keep doing this until...

\[code\]\[/(?P=tag)]\[/code\]

... the ending tag is found.
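One direction that avoids deep regex recursion altogether is to tokenize the input with a flat pattern and track nesting with an explicit stack. The following is only a minimal sketch under stated assumptions, not a finished parser: `strip_bbcode` is a hypothetical name, the tag pattern is simplified compared to the regex above (it requires a non-empty tag name), and, like the callback above, it only returns the content of matched tags.

```php
<?php
// Hypothetical helper (not from the original post or any library):
// strips matched [tag]...[/tag] pairs using a flat tokenizer and an
// explicit stack, so the regex itself never recurses.
function strip_bbcode(string $text): string
{
    // Split into plain-text and tag tokens; the captured delimiters are the tags.
    $tokens = preg_split(
        '#(\[/?[a-z0-9_]+(?:[=\s][^\]]*)?\])#i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // Each frame holds the tag name, its raw opening tag, and the content so far.
    // Frame 0 is the document root.
    $stack = [['tag' => null, 'raw' => '', 'content' => '']];

    foreach ($tokens as $token) {
        $top = count($stack) - 1;
        if (preg_match('#^\[([a-z0-9_]+)(?:[=\s][^\]]*)?\]$#i', $token, $m)) {
            // Opening tag: start a new frame.
            $stack[] = ['tag' => strtolower($m[1]), 'raw' => $token, 'content' => ''];
        } elseif (preg_match('#^\[/([a-z0-9_]+)\]$#i', $token, $m)
            && $top > 0 && $stack[$top]['tag'] === strtolower($m[1])) {
            // Closing tag that matches the innermost open tag: keep only its content.
            $frame = array_pop($stack);
            $stack[$top - 1]['content'] .= $frame['content'];
        } else {
            // Plain text or a closing tag that matches nothing: keep it literally.
            $stack[$top]['content'] .= $token;
        }
    }

    // Tags that were never closed are unbalanced: restore them as plain text.
    while (count($stack) > 1) {
        $frame = array_pop($stack);
        $stack[count($stack) - 1]['content'] .= $frame['raw'] . $frame['content'];
    }

    return $stack[0]['content'];
}
```

Called as `strip_bbcode('[quote=Bob][b]hi[/b] there[/quote]')`, this returns the inner text with both tag pairs removed. Nesting depth here costs one array entry per open tag rather than a level of regex recursion, and a real parser would replace the "keep only the content" step with per-tag rendering.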
 