Hey, I was wondering if some of you guys could help me with a problem I am having. I am working on a search engine that will primarily do wildcard searching on a set of at least 33 million URLs. The problem I am having is figuring out an underlying architecture that can support wildcard searching on this set in, hopefully, less than one second.
Some setups I am experimenting with include agrep/glimpse and a database. Grep searches have worked on smaller sets, but I doubt they will be efficient on a much larger corpus. I am hoping that indexing with glimpse, and perhaps keeping the data in memory, might help, but I haven't tried it yet. I have also tried MySQL, but its performance is even worse than grep's. I was thinking about trying another, more powerful database, but MySQL's performance has been so poor that I am not sure other databases would help either.
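To make the comparison concrete, here is roughly what I am doing in each case, sketched in Python. This is just an illustration: sqlite3 stands in for my actual MySQL setup, and the sample URLs and table/column names are made up.

    import fnmatch
    import sqlite3

    # A few made-up sample URLs standing in for the real 33M-row set.
    urls = [
        "http://www.example.com/index.html",
        "http://ftp.example.org/pub/file.tar.gz",
        "http://www.example.net/images/logo.gif",
    ]

    # Approach 1: brute-force scan in memory, like grep/agrep over a
    # flat file. Every query touches every URL, which is what worries
    # me at 33 million rows.
    pattern = "*example.org*"
    matches = [u for u in urls if fnmatch.fnmatch(u, pattern)]
    print(matches)

    # Approach 2: a database LIKE query. A leading wildcard ('%...')
    # defeats an ordinary B-tree index, so the database falls back to
    # a full table scan -- which would explain why MySQL was no faster
    # than grep for me.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE urls (url TEXT)")
    conn.executemany("INSERT INTO urls VALUES (?)", [(u,) for u in urls])
    rows = conn.execute(
        "SELECT url FROM urls WHERE url LIKE ?",
        ("%example.org%",),
    ).fetchall()
    print(rows)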
Anyone's advice would be much appreciated. Perhaps someone could give me some tips on other paths to try, or some assurance that one of the above ideas is worth pursuing.
Thank you very much,
Ryan Sit