Genres on the Web
The description, automatic identification and further processing of web genres is a novel field of research in computational linguistics, NLP and related areas such as text-technology, digital humanities and web mining. One of the driving forces behind this research is the idea of genre-enabled search engines, which let users additionally specify the web genres that retrieved documents should comply with (e.g., personal homepage, weblog, scientific article). This book offers a thorough foundation for this upcoming field of research on web genres and document types in web-based social networking. It provides theoretical foundations of web genres and presents corpus linguistic approaches to their analysis and computational models for their classification. This includes research in the areas of web genre identification, web genre modelling and related fields such as: genres and registers in web-based communication; social software-based document networks; web genre ontologies and classification schemes; text-technological models of web genres; web content, structure and usage mining; web genre classification; and web as corpus.
The book addresses researchers who want to become acquainted with theoretical developments, computational models and their empirical evaluation in this field of research. It also addresses researchers who are interested in standards for the creation of corpora of web documents. Thus, the book concerns readers from many disciplines such as corpus linguistics, computational linguistics, text-technology and computer science.
The first comprehensive collection in the upcoming, challenging field of natural language processing and information retrieval research
Integrates text-technology, corpus linguistics and machine learning
Provides a methodological bridge between many related disciplines in the area of text and language technology
The wide range of research areas such as computational linguistics, text-technology and data mining will appeal to a broad audience beyond any of these fields in isolation