Process optimization for large sets of data

smart_666

New Member
I currently have a project where we are dealing with 30 million+ keywords for PPC advertising. We maintain these lists in Oracle. There are times when we need to remove certain keywords from the list. The process includes various match-type policies that determine whether a keyword should be removed (a worked expansion of each policy follows the list):
  • EXACT: \[code\]WHERE keyword = '{term}'\[/code\]
  • CONTAINS: \[code\]WHERE keyword LIKE '%{term}%'\[/code\]
  • TOKEN: \[code\]WHERE keyword LIKE '% {term} %' OR keyword LIKE '{term} %' OR keyword LIKE '% {term}'\[/code\]
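For concreteness, here is how each policy expands once a term is substituted. This is only a sketch, assuming the term 'holiday inn' and the keyword_list table with its text column from the UPDATE below:
\[code\]-- EXACT: the keyword must equal the term
SELECT * FROM keyword_list WHERE lower(text) = 'holiday inn';

-- CONTAINS: the term may appear anywhere, even inside another word
SELECT * FROM keyword_list WHERE lower(text) LIKE '%holiday inn%';

-- TOKEN: the term must appear as a whole word at the start, middle, or end
SELECT * FROM keyword_list
WHERE lower(text) LIKE '% holiday inn %'
   OR lower(text) LIKE 'holiday inn %'
   OR lower(text) LIKE '% holiday inn';\[/code\]
Note that none of the three TOKEN patterns match a keyword that equals the term exactly, which is presumably why the query in the UPDATE below also includes the bare LIKE 'holiday inn' branch.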
Now, when a list is processed, it can only use one of the match-types listed above. But all 30 million+ keywords must be scanned for matches, returning the results for the matches. Currently, this process can take hours or days, depending on the number of keywords in the list being searched for. Do you have any suggestions on how to optimize the process so it will run much faster?

UPDATE:

Here is an example query to search for "holiday inn":
\[code\]SELECT * FROM keyword_list
WHERE (lower(text) LIKE 'holiday inn'
    OR lower(text) LIKE '% holiday inn %'
    OR lower(text) LIKE 'holiday inn %');\[/code\]
Here is the pastebin for the output of EXPLAIN: http://pastebin.com/tk74uhP4

Some additional information that may be useful. A keyword can consist of multiple words, like the following (a sketch after this list shows how that interacts with the match types):
  • this is a sample keyword
  • i like my keywords
  • keywords are great
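To make the multi-word behaviour concrete, here is a minimal, self-contained sketch (keyword_demo is a hypothetical stand-in for the real keyword_list) showing that CONTAINS and TOKEN return different rows for the term 'keyword':
\[code\]CREATE TABLE keyword_demo (text VARCHAR2(100));
INSERT INTO keyword_demo VALUES ('this is a sample keyword');
INSERT INTO keyword_demo VALUES ('i like my keywords');
INSERT INTO keyword_demo VALUES ('keywords are great');

-- CONTAINS matches all three rows, because 'keyword' is a substring of 'keywords'
SELECT * FROM keyword_demo WHERE lower(text) LIKE '%keyword%';

-- TOKEN matches only the first row, because 'keyword' must stand alone as a word
SELECT * FROM keyword_demo
WHERE lower(text) LIKE '% keyword %'
   OR lower(text) LIKE 'keyword %'
   OR lower(text) LIKE '% keyword';\[/code\]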
 