10.5593/sgemsocial2017/32/S14.115

ON THE DIFFERENCES BETWEEN ASSOCIATION MEASURES FOR AUTOMATIC COLLOCATION EXTRACTION: EVALUATION AGAINST DICTIONARIES

M. Khokhlova
Wednesday 18 October 2017 by Libadmin2017

References: 4th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2017, www.sgemsocial.org, SGEM2017 Conference Proceedings, ISBN 978-619-7408-19-5 / ISSN 2367-5659, 24 - 30 August, 2017, Book 3, Vol 2, 887-892 pp, DOI: 10.5593/sgemsocial2017/32/S14.115

ABSTRACT

Collocation extraction is a prominent task in natural language processing, and its results are important in various areas of applied linguistics. This research compares seven statistical association measures on the large Russian ruTenTen corpus. The paper examines bigrams containing a number of high-frequency Russian nouns that were extracted by these measures. The analysis is organized in two parts. First, we ranked the lists of bigrams obtained by each measure and evaluated the results against Russian dictionaries, identifying both automatically extracted and manually collected collocations. Second, we compared each pair of measures to determine to what extent they produce the same results. The results show that dictionary collocations rank higher in the lists. In addition, the extracted bigrams can be considered collocations by experts and may thus enrich dictionaries. Several measures prove relatively interchangeable, yielding overlapping results.
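The two experiments described above can be illustrated with a minimal sketch. The abstract does not name the seven measures, so the three used here (MI, t-score, logDice, in their commonly cited forms) are an assumption chosen for illustration, not necessarily those evaluated in the paper. The frequency counts, the example bigrams, the dictionary set, and all function names are hypothetical.

```python
import math
from itertools import combinations

# Three widely used association measures (assumed for illustration; the
# abstract does not list which seven measures the paper compares).
# f_xy: bigram frequency, f_x / f_y: word frequencies, n: corpus size.

def mi(f_xy, f_x, f_y, n):
    """Mutual information: log2 of observed over expected frequency."""
    return math.log2(f_xy * n / (f_x * f_y))

def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    return (f_xy - f_x * f_y / n) / math.sqrt(f_xy)

def log_dice(f_xy, f_x, f_y, n):
    """logDice: 14 + log2 of the Dice coefficient (n is unused)."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

MEASURES = {"MI": mi, "t-score": t_score, "logDice": log_dice}

def top_k(counts, n, measure, k):
    """Rank bigrams by a measure (highest first) and keep the first k."""
    ranked = sorted(counts, key=lambda b: measure(*counts[b], n), reverse=True)
    return ranked[:k]

def precision_at_k(ranked, dictionary):
    """Share of the ranked bigrams that are attested in the dictionary."""
    if not ranked:
        return 0.0
    return sum(b in dictionary for b in ranked) / len(ranked)

def overlap(a, b):
    """Share of items common to two equally long top-k lists."""
    if not a:
        return 0.0
    return len(set(a) & set(b)) / len(a)

if __name__ == "__main__":
    # Hypothetical toy counts: bigram -> (f_xy, f_x, f_y).
    N = 1_000_000
    counts = {
        ("strong", "tea"): (120, 4000, 900),
        ("hot", "tea"): (200, 9000, 900),
        ("the", "tea"): (600, 60000, 900),
        ("green", "tea"): (150, 3000, 900),
        ("my", "tea"): (90, 20000, 900),
    }
    # Hypothetical dictionary of attested collocations.
    dictionary = {("strong", "tea"), ("green", "tea")}

    tops = {name: top_k(counts, N, fn, k=3) for name, fn in MEASURES.items()}

    # Experiment 1: evaluate each ranked list against the dictionary.
    for name, ranked in tops.items():
        print(name, ranked, precision_at_k(ranked, dictionary))

    # Experiment 2: pairwise overlap between the measures' top-k lists.
    for m1, m2 in combinations(tops, 2):
        print(m1, m2, overlap(tops[m1], tops[m2]))
```

Overlap is computed here as the shared fraction of two top-k lists; rank correlation or another similarity score would serve the same purpose, and the abstract does not specify which was used.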

Keywords: collocation, corpora, statistics, association measures, evaluation
