Topic model-based temporal analysis method to enhance the integrative analysis of multiple resources
Publication: Seoul : Graduate School, Yonsei University, 2014
Dissertation note: Doctoral dissertation -- Graduate School, Yonsei University : Dept. of Library and Information Science, 2014.2
Year of publication: 2014
Language: English
Place of publication: Seoul
Physical description: xiii, 144 leaves : illustrations ; 26 cm
General note: Advisor: Min Song
Information resources such as academic papers, patents, and Web news articles are valuable materials for information analysis, and applying them appropriately makes trend analysis and prediction more effective. Although many analytical studies have been conducted, their methodologies share several limitations. First, most studies have analyzed only a single information resource. Second, although some studies have attempted to link heterogeneous resources, most have connected patents and papers using citation information alone. Third, no study has comprehensively analyzed Web news articles in relation to papers or patents. This study therefore proposes a temporal analysis method that uses papers, patents, and Web news articles in an integrated manner. To this end, the time gaps between the resources were analyzed through text mining-based content analysis. On the premise that Web news articles are the most up-to-date resource, three objectives were set. The first was to analyze the characteristics of the main topics in each resource and their time-series trends using topic modeling. The second was to determine whether distinct time gaps exist between the resources. The third was to reveal whether the time gaps differ across academic fields: a shorter time gap in a given field indicates that its subject matter changes faster than that of other fields. For data collection, papers and patents covering more than 14 and 12 years, respectively, were collected from S&T portal sites, and Web news articles covering more than 12 years were collected with a Web crawler. In addition to these three representative resources, about three years of proceedings were collected.
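The time gap idea described above can be made concrete as a lag search. The sketch below is one plausible formulation, not the dissertation's documented procedure. It assumes each resource has already been reduced to one topic-proportion vector per year (for example, by averaging LDA document-topic proportions over that year's documents), with topic indices aligned across resources, and it picks the lag in years at which the lagging resource best matches the leading one. Cosine similarity, the function names, and the `max_lag` parameter are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_lagged_similarity(leader, follower, lag):
    """Average similarity between leader[year] and follower[year + lag],
    taken over every year for which both vectors exist."""
    scores = [
        cosine(vec, follower[year + lag])
        for year, vec in leader.items()
        if year + lag in follower
    ]
    # No overlapping years: make this lag lose any comparison.
    return sum(scores) / len(scores) if scores else float("-inf")


def estimate_time_gap(leader, follower, max_lag=5):
    """Hypothetical helper: return (best_lag, score), the lag in years
    (0..max_lag) that maximizes mean topic similarity between the
    leading resource and the lagging one shifted back by that lag.
    Ties go to the smallest lag."""
    best = max(
        range(max_lag + 1),
        key=lambda lag: mean_lagged_similarity(leader, follower, lag),
    )
    return best, mean_lagged_similarity(leader, follower, best)
```

For example, if `papers` reproduces the yearly topic mix of `news` two years later, `estimate_time_gap(news, papers)` should return a lag of 2, provided the topic mix actually varies from year to year. The abstract states only that topic similarity was measured and compared, so the similarity measure and the averaging over years are assumptions here.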
The proceedings were used only for pre-validation, to evaluate the performance of the proposed methods. To compare characteristics between subject areas, a total of 1.443 million documents were collected from the fields of computer science and medical science. Latent Dirichlet Allocation (LDA), a topic modeling technique that has recently drawn attention in text mining, was used to estimate the time gaps between the resources. Each resource was represented by topics derived from its latent semantic content, and the time gap analysis was performed by measuring and comparing topic similarity between pairs of heterogeneous resources. The time gaps were then optimized and estimated experimentally. The interpreted results were validated with two evaluation methods: first, statistically, by measuring the clustering tendency of terms; and second, through content-centered analysis of the topic models. A topic model evaluation method was also used to identify how the subjects of topics changed over time. The results of the time gap analysis are summarized as follows. First, topic modeling was effective in identifying the content characteristics and time-series trends of papers, patents, and Web news. Second, the time gap analysis based on topic modeling revealed a notable time gap phenomenon. Third, comparing patents with papers showed that patents were more up-to-date than papers; that is, Web news articles were the most up-to-date, followed by patents and then papers. Fourth, the time gap optimization and estimation showed that time gap intervals were shorter in medical science than in computer science, suggesting that medical science resources are more up-to-date and are disclosed to the public more promptly.
Fifth, to evaluate the performance of topic modeling, the analysis was assessed computationally by measuring power-law exponents. Sixth, content analysis of representative topics in Web news articles explained the time difference between computer science and medical science well. The contributions of this study to information analysis can be summarized as follows. First, by interpreting content characteristics and time-series trends, the proposed method contributes to an in-depth understanding of diverse resources. Second, the representative resources for information analysis can be analyzed in an integrated manner, so the proposed method can be used for information analysis involving multiple resources. Third, understanding the time gap characteristics of different academic fields enables more precise, domain-specific information analysis. In future work, multifaceted analyses can be built on the proposed method by combining quantitative analysis methods with text mining-based approaches. This temporal analysis method can also be used to improve the performance of trend analysis and future prediction in practical studies.
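The abstract says topic modeling was evaluated by measuring a power-law exponent but does not state the estimator or the data it was applied to. As an assumption, the sketch below uses the standard maximum-likelihood estimator described by Clauset, Shalizi and Newman (2009), alpha = 1 + n / sum(ln(x_i / x_min)), over observations x_i >= x_min (for instance, term frequencies within a topic). For integer counts it applies their approximation of replacing x_min with x_min - 0.5. The function name and the choice of inputs are hypothetical.

```python
import math


def power_law_exponent(values, x_min, discrete=True):
    """Hypothetical helper: maximum-likelihood estimate of the power-law
    exponent alpha for the observations >= x_min.

    discrete=True applies the x_min - 0.5 approximation commonly used
    for integer data such as term frequencies."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    shift = x_min - 0.5 if discrete else x_min
    log_sum = sum(math.log(x / shift) for x in tail)
    if log_sum == 0:
        # Continuous case with every observation equal to x_min.
        raise ValueError("exponent undefined: all observations equal x_min")
    return 1 + len(tail) / log_sum
```

The result depends heavily on x_min. Clauset et al. choose it by minimizing the Kolmogorov-Smirnov distance between the data and the fitted model, a step this sketch leaves to the caller.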