Abstract
While linguistic skill is a hallmark of humanity, the increasing volume of linguistic data each of us faces is causing individual and societal problems
— ‘information overload’ is a commonly discussed condition. Tasks such
as finding the most appropriate information online, understanding the contents of a personal email repository, and translating documents from
another language are now commonplace. These tasks need not cause
stress and feelings of overload: the human intellectual capacity is not
the problem. Rather, the computational interfaces to linguistic data are
problematic — there exists a Linguistic Visualization Divide in the current
state of the art. Through five design studies, this dissertation combines
sophisticated natural language processing algorithms with information
visualization techniques grounded in evidence of human visuospatial
capabilities.
The first design study, Uncertainty Lattices, augments real-time computer-mediated communication, such as cross-language instant messaging chat
and automatic speech recognition. By providing explicit indications of
algorithmic confidence, the visualization enables informed decisions about
the quality of computational outputs.
Two design studies explore the space of content analysis. DocuBurst
is an interactive visualization of document content, which spatially organizes
words using an expert-created ontology. Broadening from single
documents to document collections, Parallel Tag Clouds combine keyword
extraction and coordinated visualizations to provide comparative
overviews across subsets of a faceted text corpus.
Finally, two studies address visualization for natural language processing
research. The Bubble Sets visualization draws secondary set relations
around arbitrary collections of items, such as a linguistic parse tree. From this design study we propose a theory of spatial rights to consider when
assigning visual encodings to data. Expanding considerations of spatial
rights, we present a formalism to organize the variety of approaches
to coordinated and linked visualization, and introduce VisLink, a new
method to relate and explore multiple 2D visualizations in 3D space. Inter-visualization connections allow for cross-visualization queries and support
high-level comparison between visualizations.
From the design studies we distill challenges common to visualizing
language data, including maintaining legibility, supporting detailed reading,
addressing data scale challenges, and managing problems arising
from semantic ambiguity.