Issues of POS Tagging of the (Diachronic) Corpus of Czech : Preparing a Morphological Dictionary

Anna Řehořková
2017 Jazykovedný Časopis  
Many important decisions concerning the part-of-speech categorization remain unexplained in the current practice, only reported in corpus manuals.  ...  Focused mainly on function words in Czech, we discuss the possibilities of the POS tagging of the inherently ambiguous category of particles and we introduce criteria for distinguishing particles from  ...  Therefore, to estimate the ambiguity rate of the items in the P-list and to map the approach to tagging particles in the corpus of present-day Czech, we tested the P-list against the CNC SYN2015 corpus  ... 
doi:10.1515/jazcas-2017-0041 fatcat:abrltrydkjgaxddlvpsxfocfua

Gender-Specific Adjectives in Czech Newspapers and Magazines

Adrian Jan Zasina
2019 Jazykovedný Časopis  
This study is one of the few studies dealing with gender in the Czech language using corpus methods. It focuses on the issue of gender in Czech journalistic texts from the years 2010–2014.  ...  This analysis is based on adjectival collocations of the lexemes muž 'man' and žena 'woman' and their semantic categorization. The research uses a journalistic part of the SYN2015 corpus.  ...  Data and Research Questions: My research was provided on the material of the SYN2015 corpus ([12], [13]), a collection of contemporary written Czech texts of the last five-year period (2010-2014).  ... 
doi:10.2478/jazcas-2019-0060 fatcat:oeegbkiqyzetphmydzvh7ykd7q

Colloquiality in the style of contemporary Czech journalistic writing

2019 Media Linguistics  
The main part of the article describes the occurrence of two types of colloquial lexical items in contemporary Czech journalistic texts.  ...  Contemporary Czech journalistic writing increasingly makes use of language means which are primarily associated with spoken, spontaneous and private communication.  ...  Some quotations were obtained from the representative synchronous corpus of Czech language SYN2015 [Křen et al. 2015], which comprises of 100 million tokens (from texts published between 2010 and 2014  ... 
doi:10.21638/spbu22.2019.104 fatcat:2pa4zfpvkvhkpmzlvlggyj235m

Still having a conflict potential? German and Hungarian toponyms in the Czech and Slovak national corpora texts

Jaroslav David, Tereza Klemensová
2019 Miscellanea Geographica: Regional Studies on Development  
This concerns their thematization, which is illustrated on the Czech National Corpus and the Slovak National Corpus materials, and on the 1990s discussions about their restoration.  ...  The paper focuses on German forms of place names in Czechia and Slovakia, and Hungarian forms of place names in Slovakia, especially on their revitalization and perception after 1989.  ...  [Reality Creation through Language - A Qualitative Analysis of Modern Czech Texts].  ... 
doi:10.2478/mgrsd-2019-0005 fatcat:jln265fby5dmpaqksw5nkmzaam

Merging Professional and Collaborative Lexicography: The Case of Czech Neology

Michal Škrabal, Martin Kavka
2021 International Journal of Lexicography  
The objective data from a monitor corpus of Czech is used in contrast with the initial dataset and thereby leads to some open questions, especially with regards to the extent to which amateur and professional  ...  A pair of case studies is presented concerning two thematically defined groups of recent Czech neologisms: those abusing the Czech ex-president V.  ...  Acknowledgements This study was supported by the programme Progres Q08 "Czech National Corpus" implemented at the Faculty of Arts, Charles University and by the European Regional Development Fund-Project  ... 
doi:10.1093/ijl/ecab003 fatcat:wm5yiunkn5ajzgvmkkrbfwx5ky

Selected constructions with nouns denoting emotions and metaphors of emotions in Czech

Lucie Saicová Římalová
2020 Prace Filologiczne  
The article combines valency theory with a cognitive approach to language and is based on data from the synchronic corpus of written Czech SYN2015.  ...  'to get oneself into frenzy'. SYN2015 is a representative referential corpus of contemporary written Czech.  ...  The data for the analysis were retrieved from the SYN2015 corpus of the Czech National Corpus (Křen et al. 2015), which contains the following numbers of instances of each lemma: vztek 3,375; hněv  ... 
doi:10.32798/pf.671 fatcat:qezwuqqdszcqbmxifiw4idikvq

No Keyword is an Island: In search of covert associations [article]

Václav Cvrček, Masako Ueda Fidler
2021 arXiv   pre-print
To showcase the advantages of MBA in "re-contextualizing" keywords within the discourse, a pilot study on the topic of migration was conducted contrasting anti-system and center-right Czech internet media  ...  MBA is a data mining technique used originally in marketing that can reveal consistent associations between items in a shopping cart, but also between keywords in a corpus of many texts.  ...  Any errors and inconsistencies that remain are of course the authors' responsibility.  ... 
arXiv:2103.17114v2 fatcat:hs2xcfqpuvbhtav5fw4afepxky

Structure of second-grade diminutives in Czech and Slovak. A corpus-based synchronic-diachronic analysis

Renáta Gregová
2021 Vestnik of Samara University. History, pedagogics, philology  
The aim of this paper is to verify this idea on the basis of the analysis of data from Czech and Slovak. The DIM2 for the analysis were excerpted from the corpora.  ...  Neither Czech nor Slovak current sources apprehend diminutive markers as combinations of primary and secondary diminutive suffixes.  ...  Consequently, the Czech second-grade diminutives analyzed in this paper were extracted from the Czech National Corpus, version SYN2015 (Křen et al. 2015) and the analyzed Slovak second-grade diminutives  ... 
doi:10.18287/2542-0445-2021-27-3-111-117 fatcat:qaztg4qs2vaidlbl43spgkq2pm

El corpus paralelo como herramienta para explorar los elementos únicos en el checo [The parallel corpus as a tool for exploring unique items in Czech]

Michaela Martinková, Markéta Janebová
2019 CLINA Revista Interdisciplinaria de Traducción Interpretación y Comunicación Intercultural  
According to Chlumská and Richterová (2014a, 17), 34 % of the books published in the Czech Republic in 2012 were translations.  ...  Having previously investigated the correspondences of the Czech polyfunctional particle prý in English (Martinková and Janebová 2017), we now turn to Spanish.  ...  Translated texts have, for example, traditionally been included in the SYN corpora (corpora of contemporary written Czech) of the Czech National Corpus.  ... 
doi:10.14201/clina2019527798 fatcat:23e7icapa5bynbtcrvg6woykwi

Event and degree numerals: Evidence from Czech [chapter]

Mojmír Dočekal, Marcin Wągiel
2019 Zenodo  
In this paper, we bring in novel data concerning the distribution and semantic properties of two classes of adverbs of quantification in Czech, i.e., event numerals such as dvakrát 'twice/two times' as  ...  We propose that degree numerals target values on a provided scale and are, hence, best analyzed as predicates of degrees whereas event numerals have a more general semantics which primarily allows for  ...  We gratefully acknowledge that the research was supported by a Czech Science Foundation (GAČR) grant to the Department of Linguistics and Baltic Languages at the Masaryk University in Brno (GA17-16111S  ... 
doi:10.5281/zenodo.2554021 fatcat:px7gduz5dnb37gm65icmsltg3m

Modelling Morphographemic Alternations in Derivation of Czech

Magda Ševčíková
2018 Prague Bulletin of Mathematical Linguistics  
The present paper deals with morphographemic alternations in Czech derivation with regard to the build-up of a large-coverage lexical resource specialized in derivational morphology of contemporary Czech  ...  After a summary of available descriptions in the Czech linguistic literature and Natural Language Processing, an extensive list of alternations is provided in the first part of the paper with a focus on  ...  GA16-18177S of the Czech Science Foundation.  ... 
doi:10.2478/pralin-2018-0001 fatcat:aohhezgcvbe5fl3pklgax2n33u

On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers

Klára Jágrová, Michael Hedderich, Marius Mosbach, Tania Avgustinova, Dietrich Klakow
2021 Frontiers in Psychology  
To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words at the final position of Polish sentences.  ...  However, the role of context for the intelligibility of target words in sentences was subject to very few studies.  ...  The CS LMs were trained on the SYN v4 version of the Czech National Corpus (Křen et al., 2016), a collection of contemporary written CS containing ∼4.3 billion tokens.  ... 
doi:10.3389/fpsyg.2021.662277 fatcat:gzifgxmjkvfjte6riozeuajnu4

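Vliv subjektivních stylotvorných faktorů na styl současných českých odborných textů: k vývojovým proměnám stylu současných českých teoreticky odborných textů [The influence of subjective style-forming factors on the style of contemporary Czech academic texts: on developmental changes in the style of contemporary Czech theoretical academic texts]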

Martin Schacherl
2021 Stylistyka  
of contemporary Czech; the frequency and typical usage of the excerpted linguistic phenomena were verified in the genre-balanced corpus SYN2010, in which texts from 2005-2009 predominate, and in the representative corpus of written Czech SYN2015  ...  The paper presents the quantitative characteristics of selected expressional means used in contemporary Czech written scientific discourse through analysing representative material from selected monographs  ...  discourse written in Czech. The aim of this study is to demonstrate the impact of subjective stylistic factors on the style of contemporary theoretical professional communication written in the Czech language  ... 
doi:10.25167/stylistyka30.2021.14 fatcat:6gmq2hoxbbgp5pu3govxocmt54

Proceedings of the workshop on challenges in the management of large corpora and big data and natural language processing (CMLC-5+BigNLP) 2017 including the papers from the web-as-corpus (WAC-XI) guest section. Birmingham, 24 July 2017

S.N.
2017
This publication was written with the support of the Specific University Research provided by the Ministry of Education, Youth and Sports of the Czech Republic. 6 The tokenisation of the BNC had to be  ...  changed to the same way the web corpus was tokenised in order to make the counts of tokens in both corpora comparable. 7 The comparison with the BNC also revealed there are words related to the modern  ...  The IPR-cleared Corpus of Contemporary Written and Spoken Romanian Language. In: Calzolari, Nicoletta et al.  ... 
doi:10.5167/uzh-139700 fatcat:dpzmrcslt5gvlk2myhh5db626e