Building Topic-Specific Search Engines: A Data Mining Approach

Topic-specific search engines are becoming popular with the phenomenal growth of the World Wide Web. They have a higher accuracy rate than general-purpose search engines and offer functions that general-purpose engines cannot provide. But the topic-specific search engines available today are not cost-efficient, because they require intensive human labor, and thus enormous cost, to maintain as well as to build. Efficient processing of the exploding volume of information on the World Wide Web seems to call for smarter search engines: topic-specific search engines that require far less human labor while performing almost as well as those built and maintained by humans. This dissertation is a contribution towards meeting this demand. Building and maintaining topic-specific search engines with minimal human labor requires an automatic or semi-automatic information gathering system whose output can be fed to the search engines. In the dissertation, I discuss techniques for four major components of the requisite information gathering system:
(1) Domain information extraction
(2) Topic expansion
(3) Topic-driven information gathering
(4) Text-classification system for web documents
I also discuss the performance of the prototype system, a search engine for XML, that I built to test the techniques.
