The main purpose of this work is to provide data for research on the Uyghur, Kazakh and Kirghiz languages in fields such as natural language processing (NLP), speech recognition, speech synthesis, machine translation, information retrieval, Uyghur intelligent monitoring and Uyghur public opinion analysis. The software was designed and implemented with reference to the syntax rules of Uyghur, Kazakh and Kirghiz, and it adopts the international standard character encoding for these three languages. In addition, by analyzing the structure of current web pages and identifying their text content, we developed a data collector that gathers plain multilingual Uyghur, Kazakh and Kirghiz text from the web. Finally, this collector was used to build a large corpus for minority-language NLP research.
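The abstract does not say how the collector decides which parts of a page are Uyghur, Kazakh or Kirghiz text. As a minimal sketch, assume that "identifying the text content" means keeping the visible text blocks of a page that are written mostly in the Arabic-based scripts these three languages use in Xinjiang. The Python below shows that idea. The Unicode ranges, the threshold `MIN_ARABIC_RATIO` and the helper names (`TextBlockParser`, `arabic_ratio`, `extract_pure_text`) are hypothetical illustrations, not the paper's method. The sketch also does not tell the three languages apart from each other or from Arabic or Persian text.

```python
from html.parser import HTMLParser

# Hypothetical threshold (not from the paper): minimum fraction of letters
# in a text block that must be Arabic-script for the block to be kept.
MIN_ARABIC_RATIO = 0.6

# Unicode blocks covering the Arabic-based Uyghur, Kazakh and Kirghiz alphabets.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
]

# Tags whose contents are never visible page text.
SKIP_TAGS = {"script", "style", "noscript"}


def is_arabic_script(ch: str) -> bool:
    """True if the character falls in one of the Arabic-script blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Arabic-script (0.0 if none)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(is_arabic_script(c) for c in letters) / len(letters)


class TextBlockParser(HTMLParser):
    """Collects non-empty visible text nodes, skipping script/style content."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def extract_pure_text(html: str) -> list[str]:
    """Return the text blocks of a page that are mainly Arabic-script."""
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    return [b for b in parser.blocks if arabic_ratio(b) >= MIN_ARABIC_RATIO]


if __name__ == "__main__":
    page = (
        "<html><head><title>Home</title><script>var x = 1;</script></head>"
        "<body><div>Home | News</div><p>ئۇيغۇر تىلى</p></body></html>"
    )
    print(extract_pure_text(page))
```

In this demo, the navigation text, the title and the script are discarded, and only the Uyghur paragraph is returned. A real collector would also need the page-structure analysis the abstract mentions, for example removing repeated navigation or footer blocks across pages of the same site.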