Characteristics of Keywords in Scientific Papers and Their Impact on Co-word Analysis
  
Keywords: co-word analysis, keyword characteristics, word frequency distribution, information science
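Funding: National Natural Science Foundation of China project "Research on Knowledge Aggregation and Services in Digital Library Communities" (Grant No. 71273197)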
Authors and affiliations
Hu Changping (胡昌平), Center for Studies of Information Resources, Wuhan University
Chen Guo (陈果), Center for Studies of Information Resources, Wuhan University
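Chinese abstract (translated):
      This paper raises several questions about how high-frequency keyword co-occurrence matrices are constructed in traditional co-word analysis: how reliable it is to extract high-frequency keywords as the objects of analysis, how well the high-frequency keyword matrix retains the important co-occurrence relations in a field, and what effects the semantic types of keywords and missing keywords may have. Using empirical data, it reveals the distributions of keyword frequency, co-occurrence relations, and semantic types in scientific papers, and analyzes how they affect co-word analysis: keyword-based co-word analysis can only examine popular knowledge nodes; the co-word network is essentially built on unstable associations that occur only once; and the high-frequency keyword matrix loses a large number of important co-occurrence relations. These problems stem from the semantic-type characteristics of keywords, which are an important entry point for differentiating keywords and, further, for processing them semantically. In addition, a comparison of the co-word matrices before and after keyword supplementation shows that supplementing keywords does not in practice improve how well the high-frequency keyword matrix represents the field under analysis. The paper closes by proposing two approaches worth trying: first, selecting the objects of analysis by combining keyword frequency with co-occurrence strength; second, building a multi-dimensional co-occurrence matrix with keyword semantic types as its dimensions, to better mine multiple kinds of semantic association.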
English abstract:
      This paper raises some doubts about traditional co-word analysis methods, including the reliability of extracting high-frequency keywords, the retention rate of important co-occurrence relations in the high-frequency keyword matrix, and the possible impact of keywords' semantic features and of missing keywords. Through the analysis of a real scientific publication dataset, we reveal its word frequency distribution, co-occurrence distribution, and semantic features. We also identify their impacts on co-word analysis: keyword-based co-word analysis can only show research hotspots and the relations among them; nearly half of the important co-occurrence relations are lost if only high-frequency keywords are used to generate the matrix; and the semantic information of keywords could be an important feature for differentiating keywords and processing them semantically. Adding keywords drawn from publication titles does not improve how well the high-frequency keyword matrix represents the whole knowledge network. In the conclusion, we propose two possible methods for improvement: one is to select keywords by combining their frequency with the intensity of their co-occurrence relationships; the other is to construct a multi-dimensional co-occurrence matrix in order to differentiate multiple semantic associations.
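The retention question raised in the abstract can be made concrete with a small sketch. The Python below counts keyword frequencies and pairwise co-occurrences across papers, then measures what share of the stronger co-occurrence pairs survives when only the top-N keywords are kept. It is an illustration under stated assumptions, not the authors' procedure: the function names, the toy data, and the `min_strength` threshold used to call a pair "important" are all hypothetical, since the abstract does not state the criterion the paper used.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count keyword frequencies and unordered keyword-pair co-occurrences."""
    freq = Counter()
    pairs = Counter()
    for keywords in papers:
        # Deduplicate within a paper; sorting gives each pair one canonical order.
        unique = sorted(set(keywords))
        freq.update(unique)
        pairs.update(combinations(unique, 2))
    return freq, pairs


def retention_rate(papers, top_n, min_strength=2):
    """Share of pairs co-occurring at least min_strength times whose two
    keywords are both among the top_n most frequent keywords.

    min_strength is an assumed stand-in for "important" relations.
    """
    freq, pairs = cooccurrence_counts(papers)
    high_freq = {kw for kw, _ in freq.most_common(top_n)}
    important = [p for p, count in pairs.items() if count >= min_strength]
    if not important:
        return 0.0
    kept = [p for p in important if p[0] in high_freq and p[1] in high_freq]
    return len(kept) / len(important)


# Toy keyword lists, invented purely for illustration.
papers = [
    ["co-word analysis", "clustering", "information science"],
    ["co-word analysis", "clustering"],
    ["co-word analysis", "word frequency"],
    ["ontology", "semantic web"],
    ["ontology", "semantic web"],
    ["clustering", "co-word analysis"],
]

rate = retention_rate(papers, top_n=2)
print(rate)  # 0.5: the ontology / semantic web pair falls outside the top 2
```

In this toy data the pair "ontology" and "semantic web" co-occurs twice but neither keyword ranks in the top two by frequency, so a frequency-only cut drops it. That is the kind of loss the abstract describes, and it is why the authors suggest taking co-occurrence strength into account when selecting keywords.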
Hu Changping, Chen Guo. Characteristics of Keywords in Scientific Papers and Their Impact on Co-word Analysis [J]. 情报学报, 2014, (1): 23-32