Marina Fedotova
April 16, 2013
Sometimes we come across a text written in a language we do not know. A commonplace example is imported goods: the label is in an incomprehensible language, and you want to know what the product is made of. Well-known languages such as English or German are easy to recognize. But how do you determine the language of a text when you are seeing its characters for the first time?

Of course, you can ask specialists who know the various languages of the world, but there is little need when everyone now has Internet access and can find automatic language identifiers online. These are special programs that determine the language of a text. So how do you determine a text's language with such programs? We will try to explain the algorithm, that is, how these programs identify a particular language.

Any language identifier can name a language from just a few words of input. It does this by matching the words against the dictionaries built into the program. More specifically, the text you enter in the program's input field is broken down into words, each word is checked against the words of different languages, and in the end you get a list of one or more languages that fit. Of course, such a program is not as simple as it seems at first glance: a full analysis would also have to take into account the lexical content of the text and the construction of its sentences, so these programs are suitable only for an approximate analysis of the text. The most frequently used programs are Polyglot 3000, Xerox, and TextCat.
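The word-matching idea described above can be sketched in a few lines of Python. This is only an illustration, not how Polyglot 3000, Xerox, or TextCat actually work internally: the tiny word lists and the names `DICTIONARIES`, `split_into_words`, and `guess_languages` are assumptions made up for this example, and a real program would embed far larger dictionaries.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists standing in for the dictionaries
# that a real language identifier would have built in.
DICTIONARIES = {
    "English": {"the", "and", "is", "of", "to", "in", "that", "it", "with", "this"},
    "German": {"der", "die", "das", "und", "ist", "nicht", "ein", "mit", "zu", "ich"},
    "French": {"le", "la", "les", "et", "est", "un", "une", "dans", "pas", "que"},
}


def split_into_words(text):
    """Break the text into lowercase words, ignoring digits and punctuation."""
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_languages(text):
    """Return the languages whose dictionaries match, best match first."""
    scores = Counter()
    for word in split_into_words(text):
        for language, vocabulary in DICTIONARIES.items():
            if word in vocabulary:
                scores[language] += 1
    # Keep only languages with at least one matching word.
    return [language for language, count in scores.most_common() if count > 0]


if __name__ == "__main__":
    print(guess_languages("Das ist nicht die Zusammensetzung und der Name"))
    print(guess_languages("The composition of this product is written in the label"))
```

Because the sketch only counts dictionary hits, a very short text or one whose words appear in several languages can yield more than one candidate or none at all, which is why the result is a list and not a single answer.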

