Introduction
This paper focuses on the Genealogies of Knowledge (GoK) project, which explores the role of translation and mediation in shaping the historical evolution of scientific and political concepts. Traditional scholarly work in this area is often 'bottom-up,' relying heavily on individual scholars' memory and interpretation. The GoK project instead uses a 'top-down' corpus-based approach that applies computational tools and visualization to large volumes of text. The process is iterative: researchers move between broad overviews and detailed analysis, and visualization tools help them identify patterns and present visual explanations. The paper documents the co-design and development of text visualization tools for the GoK project, with two goals: establishing general methods for tool development in interdisciplinary contexts and bridging the gap between 'developers' and 'users' of such tools.
Literature Review
The paper reviews existing digital humanities visualization tools, particularly those related to concordance analysis. Many visualization techniques exist, but concordance-based visualization, which encodes lexical and grammatical co-occurrence patterns around a keyword, is relatively rare in digital humanities despite its extensive use in corpus linguistics and translation studies. The authors examine tree-based visualizations (Word Tree, Double Tree), which maintain linear text structure but struggle to represent quantitative information effectively. Other methods, such as interHist, Corpus Clouds, and Structured Parallel Coordinates, are evaluated on their ability to represent the attributes of a Keyword-in-Context (KWIC) concordance list. The review highlights the difficulty of visualizing the qualitative aspects of concordance analysis (readability of text fragments) and its quantitative aspects (positional frequencies and collocation patterns) at the same time.
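For readers unfamiliar with the KWIC format discussed above, the following is a minimal sketch of how a concordance list can be built from raw text. It is an illustration only, not the GoK software: the function name `kwic`, the `width` parameter, the simple regular-expression tokenizer, and the sample sentences are all assumptions made for this example.

```python
import re

def kwic(text, keyword, width=4):
    """Return (left, keyword, right) tuples for each occurrence of keyword.

    Tokenization is a deliberately naive lowercase word split (an assumption
    for this sketch); a real concordancer would use a proper tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]      # context before the keyword
            right = tokens[i + 1:i + 1 + width]     # context after the keyword
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text, invented for illustration.
sample = ("Democracy requires the consent of the people. "
          "The people elect those who govern, and the people may remove them.")

# Print the concordance with the keyword aligned in a central column.
for left, kw, right in kwic(sample, "people", width=3):
    print(f"{' '.join(left):>25} | {kw} | {' '.join(right)}")
```

Aligning the keyword in a central column is what makes the text fragments readable line by line, while the columns to its left and right are what positional frequency counts are computed over.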
Methodology
The development of GoK visualization tools followed an iterative co-design process. This involved:
1. **Analysis of Published Methodology:** A hierarchical task analysis of John Sinclair's work on concordance analysis identified key actions (e.g., estimating frequency, reading context, identifying collocations). This informed the design of the tools.
2. **Conceptual Data Model:** A conceptual model of the KWIC concordance list was developed to formalize data structures and relationships, guiding the design of visualizations. This model incorporated qualitative (word order, readability) and quantitative (positional frequency, collocation strength) aspects of the data.
3. **Analysis of Existing Visualizations:** Existing visualizations were evaluated based on their ability to represent the attributes of the KWIC conceptual model. The authors used Mackinlay's ranking of visual variables to guide the selection and justification of visual encodings.
4. **Establishing Initial Requirements:** Two GoK researchers provided lists of questions they wanted to answer about a corpus. This domain characterization, along with follow-up discussions, helped identify key requirements for the software. The questions highlighted needs for metadata integration, efficient collocation analysis, frequency comparison across subcorpora, and methods for visualizing temporal spread.
5. **Software Prototyping:** Low-fidelity prototyping and user interface sketching were used to communicate design ideas with the research team. This iterative process led to the development of the GoK tools.
6. **Observational Research:** The authors observed GoK researchers using the software and conducted interviews to understand their workflows and challenges. Two detailed case studies (one on the concept of democracy, another on 'the people') are presented, illustrating how researchers use the tools.
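The conceptual data model in step 2 can be made concrete with a small sketch. The code below is one possible reading of such a model, not the authors' formalization: the class `ConcordanceLine`, its fields, the helper `positional_frequencies`, and the sample lines are hypothetical names and data chosen for illustration. It keeps word order (the qualitative side) and derives per-position token counts (the quantitative side).

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ConcordanceLine:
    """One KWIC line: ordered context tokens plus document metadata (assumed fields)."""
    left: list                      # tokens before the keyword, in reading order
    keyword: str
    right: list                     # tokens after the keyword, in reading order
    metadata: dict = field(default_factory=dict)

    def token_at(self, offset):
        """Token at a signed offset from the keyword (-1 is immediately left)."""
        if offset == 0:
            return self.keyword
        if offset < 0:
            idx = len(self.left) + offset
            return self.left[idx] if idx >= 0 else None
        idx = offset - 1
        return self.right[idx] if idx < len(self.right) else None

def positional_frequencies(lines, span=2):
    """Map each offset in [-span, span], excluding 0, to a Counter of tokens."""
    table = {}
    for offset in range(-span, span + 1):
        if offset == 0:
            continue
        counts = Counter()
        for line in lines:
            token = line.token_at(offset)
            if token is not None:   # skip lines whose context is too short
                counts[token] += 1
        table[offset] = counts
    return table

# Hypothetical concordance lines, invented for illustration.
lines = [
    ConcordanceLine(["power", "of", "the"], "people", ["is", "sovereign"], {"period": "modern"}),
    ConcordanceLine(["will", "of", "the"], "people", ["is", "law"], {"period": "modern"}),
    ConcordanceLine(["rule", "by", "the"], "people", ["was", "feared"], {"period": "ancient"}),
]

freq = positional_frequencies(lines, span=2)
for offset in sorted(freq):
    print(offset, freq[offset].most_common())
```

Each offset in the resulting table corresponds to one column of word positions around the keyword, which is the kind of quantitative attribute the visual encodings in step 3 have to represent alongside the readable text.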
Key Findings
The GoK software consists of:
1. **A basic concordancer:** Provides standard KWIC display, frequency lists, and metadata management.
2. **Concordance Mosaic:** Summarizes a KWIC display in a space-filling tabular format, showing positional frequencies or collocation statistics. Allows interactive filtering and exploration.
3. **Concordance Tree:** Displays the left or right context of a concordance as a tree, preserving sentence structure and showing positional frequencies.
4. **Metafacet:** Provides faceted summaries of metadata, allowing for interactive filtering of the concordance and Mosaic. Visualizes keyword distribution across different metadata attributes.
5. **Frequency Comparison Tool:** Allows visual comparison of frequency lists across different subcorpora, enabling statistically valid comparisons even with corpora of different sizes.
The case studies reveal the importance of:
* Visualizations that identify overall patterns and guide qualitative analysis.
* Combining qualitative and quantitative methods.
* Tools that support the analysis of concordance lists through the lens of metadata.
* Detailed documentation written collaboratively by developers and users.
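The summary does not say how the Frequency Comparison Tool makes subcorpora of different sizes comparable. A common approach, assumed here purely for illustration, is to convert raw counts into relative frequencies per fixed number of tokens. The function names, the `per` parameter, and the two toy subcorpora below are hypothetical and are not taken from the GoK software.

```python
from collections import Counter

def relative_frequencies(tokens, per=1_000_000):
    """Occurrences of each word per `per` tokens (assumed normalization)."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count * per / total for word, count in counts.items()}

def compare(sub_a, sub_b, words, per=1_000_000):
    """Pair the normalized frequency of each word in two subcorpora."""
    freq_a = relative_frequencies(sub_a, per)
    freq_b = relative_frequencies(sub_b, per)
    return {w: (freq_a.get(w, 0.0), freq_b.get(w, 0.0)) for w in words}

# Two toy subcorpora of different sizes (4 and 10 tokens), invented for illustration.
sub_a = "the people rule here".split()
sub_b = "the senate and the people of rome hold the city".split()

# Use per=100 so the toy numbers read as percentages of tokens.
result = compare(sub_a, sub_b, ["people", "the", "senate"], per=100)
for word, (a, b) in result.items():
    print(f"{word:>8}: {a:6.1f} vs {b:6.1f} per 100 tokens")
```

In this toy data "people" occurs once in each subcorpus, so raw counts suggest no difference, while the normalized values show it is relatively more frequent in the smaller subcorpus. A fuller comparison would typically add a significance or effect-size measure, which this sketch leaves out.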
Discussion
The iterative design process highlighted the blending of data representation, statistical elements, and qualitative interpretation inherent in corpus-based research. The GoK tools, particularly Mosaic, help researchers identify patterns that may be missed in traditional sequential reading. Although some statistical measures are employed, the analysis remains fundamentally qualitative and interpretative. The GoK methodology diverges from traditional corpus linguistics methods in its emphasis on comparative analysis across subcorpora defined by metadata. Bias in the corpus is acknowledged as unavoidable, and the tools help researchers identify and address potential sources of bias. The visualizations also play a crucial role in communicating research findings. The need for clear, user-friendly documentation, written collaboratively by developers and users, is emphasized.
Conclusion
The co-design process resulted in visualization tools that effectively support corpus-based research in the humanities. The tools address the challenges of integrating qualitative and quantitative analysis and enhance the exploration of large text corpora. Future research should focus on expanding the range of statistical measures and developing more advanced methods for visualizing complex relationships within large datasets. Improved collaborative practices between developers and humanities scholars are essential for fostering innovation and wider adoption of such tools.
Limitations
The study is limited to the specific context of the GoK project. While the methods and tools are generally applicable, their effectiveness in other research contexts might vary. The case studies presented offer a snapshot of the researchers' workflows and may not fully capture the complexities of their entire analytical process. Further investigation is needed to explore the generalizability of the findings and the tools' applicability across different research questions and corpora.