@prefix prodottidellaricerca: . @prefix istituto: . @prefix prodotto: . istituto:CDS048 prodottidellaricerca:prodotto prodotto:ID112965 . @prefix pubblicazioni: . @prefix unitaDiPersonaleEsterno: . unitaDiPersonaleEsterno:ID17591 pubblicazioni:autoreCNRDi prodotto:ID112965 . @prefix modulo: . modulo:ID2074 prodottidellaricerca:prodotto prodotto:ID112965 . @prefix rdf: . @prefix retescientifica: . prodotto:ID112965 rdf:type retescientifica:ProdottoDellaRicerca , prodotto:TIPO1303 . @prefix rdfs: . prodotto:ID112965 rdfs:label "Statistical profiling of Italian L2 texts: competence and native language (Conference presentation)"@en . @prefix xsd: . prodotto:ID112965 pubblicazioni:anno "2010"^^xsd:gYear . @prefix skos: . prodotto:ID112965 skos:altLabel "
Frontini F. (2010)
Statistical profiling of Italian L2 texts: competence and native language
in 20th Annual Conference of the European Second Language Association, Reggio Emilia
"^^rdf:HTML ; pubblicazioni:autori "Frontini F."^^xsd:string ; pubblicazioni:note "In: Eurosla 20 - 20th Annual Conference of the European Second Language Association (Reggio Emilia, 1-4 September 2010)."^^xsd:string ; pubblicazioni:descrizioneSinteticaDelProdotto "The idea of applying text categorization techniques to the analysis of interlanguage was first introduced in Granger & Rayson (1998) and Aarts & Granger (1998). Given the hypothesis that each sample of interlanguage is characterized by a unique matrix of frequencies of various forms, profiling techniques can be applied to L2 corpora to extract groups with consistently similar behaviour. In Aarts & Granger (1998), PoS n-gram-based profiling was applied to L2 English samples to identify distributional correlates for the different L1s of each informant. In Nerbonne & Wiersma (2006), by contrast, PoS tri-gram distribution was used to capture syntactic deviations among informants with the same L1. This paper addresses the question of whether PoS tri-gram profiling in L2 texts is more sensitive to the L1 of informants or to their level of competence. Three collections of L2 Italian texts are used, produced by informants with Chinese, French and German L1s respectively; the texts are homogeneous with respect to mode and theme of elicitation and are grouped by level of competence using developmental sequences of morpho-syntactic features such as those produced by Bartning and Schlyter (2004) for French. Using the statistical methodology described in Frontini, Lynch & Vogel (2008), the texts are clustered by similarity of PoS distribution, and a statistical evaluation is performed to compare the homogeneity of the texts when grouped by L1 with their homogeneity when grouped by level.
Subsequently, Multidimensional Scaling and Principal Component Analysis are used to visualize the homogeneity of the samples and to extract the tri-grams most responsible for the differences among groups. In this way, it is possible to assess whether PoS distribution is more sensitive to L1 or to competence level, and to identify PoS sequences that may constitute a fingerprint for a specific L1 or level."^^xsd:string ; pubblicazioni:titolo "Statistical profiling of Italian L2 texts: competence and native language"^^xsd:string ; prodottidellaricerca:prodottoDi modulo:ID2074 , istituto:CDS048 ; pubblicazioni:autoreCNR unitaDiPersonaleEsterno:ID17591 . @prefix parolechiave: . prodotto:ID112965 parolechiave:insiemeDiParoleChiave . parolechiave:insiemeDiParoleChiaveDi prodotto:ID112965 .