Pattern matching text in the body of a PDF and adding hyperlinks with PHP

pead87

New Member
The situation is as follows: I have a series of big, fat PDF files, full of imagery and randomly distributed text. These are the sections of a huge promotional pricelist for a vast array of products. What I need is to pattern-match all the catalogue codes in the text of each PDF file and wrap each one in a hyperlink that points to the respective page in an online store.

So the task is very simple: scan a PDF file for all plain-text 10-digit sequences and convert them into links whose href is \[code\]http://something?code=[match]\[/code\].

I would prefer to put this together in a PHP script if possible, but any language would do. I have a gut feeling that maybe even Flash could be an option.

Any ideas? Thanks in advance.

EDIT: Some of the answers coming in are teaching me PCRE syntax. The problem here is that I need to search and replace in a PDF file, so the problem is twofold. Say we do this in PHP:
  • How do you read / write to a PDF in PHP?
  • Since PDFs aren't plain-text files, I can't just run a regex against them. I also believe that PDF links are not stored together with the text but separately, as regions on the page. That means I could perhaps overlay an active rectangle on the coordinates of the catalogue code's characters, if only I knew where a matched code sits on a page.
What do you think? Other languages are also an option. Thanks.
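To make the second point concrete, here is a minimal Python sketch of the overlay idea, using only the standard library. It assumes some PDF text extractor has already produced words with bounding boxes. The `Word` and `LinkRegion` shapes, the `find_code_links` name and the equal-width-character approximation are all hypothetical and not taken from any particular library. Reading the PDF and writing the link annotations back would still need a real PDF library, which is not shown.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlencode

# Exactly 10 digits, not part of a longer run of digits.
CODE_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")


@dataclass
class Word:
    """A text fragment plus its bounding box on the page (hypothetical shape)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class LinkRegion:
    """A rectangle to make clickable and the URL it should open."""
    code: str
    rect: tuple  # (x0, y0, x1, y1) in page coordinates
    url: str


def find_code_links(words, base_url="http://something"):
    """Return one LinkRegion per 10-digit code found in the given words."""
    links = []
    for w in words:
        for m in CODE_RE.finditer(w.text):
            # Assumption: characters are equally wide, so the match's
            # horizontal extent is proportional to its character offsets.
            char_width = (w.x1 - w.x0) / len(w.text)
            x0 = w.x0 + m.start() * char_width
            x1 = w.x0 + m.end() * char_width
            url = f"{base_url}?{urlencode({'code': m.group()})}"
            links.append(LinkRegion(m.group(), (x0, w.y0, x1, w.y1), url))
    return links


if __name__ == "__main__":
    # Made-up sample data standing in for real extractor output.
    sample = [
        Word("Item", 10.0, 100.0, 40.0, 112.0),
        Word("0123456789,", 50.0, 100.0, 160.0, 112.0),
    ]
    for link in find_code_links(sample):
        print(link)
```

Each resulting rectangle would then be handed to whatever PDF library is used, to be added as a link annotation on the page. Proportional fonts make the equal-width guess only approximate, so per-character boxes from the extractor would be better where available.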
 