Outcome: To analyse the frequency distribution of characters, compound characters, vowels, consonants, part vowels, and part consonants in Kannada text documents such as press reports, general fiction, and Kannada poems. To identify the most prominent event and the most popular person of a particular period in press reports, and to determine the words that writers use most frequently in their works.
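The character-level analysis named in the outcome could be sketched roughly as below. This is only an illustrative sketch, not the project's actual implementation: the function names `classify` and `kannada_char_stats` are hypothetical, the classes are derived from Unicode Kannada block ranges, and it assumes that "part vowels" correspond to dependent vowel signs and that "compound characters" correspond to consonants joined by a virama.

```python
from collections import Counter

VIRAMA = "\u0CCD"  # Kannada sign virama (halant), joins consonants into conjuncts


def classify(ch):
    """Return a coarse class for a Kannada code point, or None otherwise.

    Ranges follow the Unicode Kannada block (U+0C80-U+0CFF). A few code
    points inside these ranges are unassigned; they do not occur in real text.
    """
    cp = ord(ch)
    if 0x0C85 <= cp <= 0x0C94:
        return "vowel"        # independent vowels
    if 0x0C95 <= cp <= 0x0CB9:
        return "consonant"
    if 0x0CBE <= cp <= 0x0CCC:
        return "vowel_sign"   # dependent vowel signs (assumed "part vowels")
    if ch == VIRAMA:
        return "virama"
    return None


def kannada_char_stats(text):
    """Count individual Kannada characters and their classes in text."""
    chars = Counter()
    classes = Counter()
    for ch in text:
        kind = classify(ch)
        if kind is None:
            continue  # skip spaces, punctuation, digits, non-Kannada text
        chars[ch] += 1
        classes[kind] += 1
    # Assumption: a consonant + virama + consonant sequence marks one join
    # inside a compound character (conjunct).
    classes["conjunct_join"] = sum(
        1
        for i in range(1, len(text) - 1)
        if text[i] == VIRAMA
        and classify(text[i - 1]) == "consonant"
        and classify(text[i + 1]) == "consonant"
    )
    return chars, classes


chars, classes = kannada_char_stats("ಕನ್ನಡ")
print(classes["consonant"], classes["conjunct_join"])
```

Counting by code point keeps the sketch short; a fuller analysis would probably group code points into syllable clusters before counting.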
Description: Kannada is one of the classical languages of India, and many efforts have been made to adopt Kannada in the digital world and to develop it as a tech-savvy language. The structure of the Kannada script is distinct from that of any other language, as the characters in a Kannada word are isolated. Hence, processing and summarizing a Kannada document requires several steps. Summaries of Kannada documents can help readers find the right information, and they are particularly effective when the document base is very large and keywords are closely associated with a document. In this project, we propose a novel approach to summarizing Kannada documents such as press reports and fictional works. The input document, taken from a web corpus, is disassembled into its constituent words, which allows us to search for well-defined patterns and determine the most frequently used words. After the words are obtained and categorized, the document can be summarized.
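The word-splitting and frequency step described above might look something like the following minimal sketch. It assumes that a word is any unbroken run of characters from the Unicode Kannada block; the names `KANNADA_WORD`, `top_words`, and the optional `stopwords` parameter are hypothetical and not taken from the project.

```python
import re
from collections import Counter

# Assumption: a word is a maximal run of Kannada-block code points.
# Zero-width joiners inside a word are not handled in this sketch.
KANNADA_WORD = re.compile(r"[\u0C80-\u0CFF]+")


def top_words(text, n=10, stopwords=frozenset()):
    """Return the n most frequent Kannada words in text as (word, count) pairs."""
    words = KANNADA_WORD.findall(text)
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


sample = "ಕನ್ನಡ ಭಾಷೆ ಕನ್ನಡ ನಾಡು, ಕನ್ನಡ!"
print(top_words(sample, n=1))
```

Punctuation, digits, and Latin text act as separators here, so no separate cleaning pass is needed. A stopword set could be passed in to keep common function words out of the ranking.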