Interdisciplinary Studies
Multi-scale methods for reconstructing collective shapes of digital diasporas
Q. Lobbé
The paper addresses how to meaningfully characterize an emerging, interdisciplinary research community—digital diasporas—and its representative publications, and whether a family of methods can reconstruct both the field’s landscape and its core research objects. It situates the study in the context of the ICT-driven transformation of migrant communication, identity, and organization, noting that digital traces from everyday life and crises have reshaped diasporic practices and research approaches. The purpose is to apply a macro-to-micro, multi-scale investigation: (1) map the scientific landscape of digital diasporas; (2) trace the historical evolution of a socio-technical sub-branch focused on online diasporic representations; and (3) demonstrate a micro-level reconstruction of an extinct online collective (Moroccan migrants). The importance lies in promoting multi-scale reconstruction methods that bridge social and computational sciences to understand complex, stigmergic, digitally mediated diasporic phenomena.
While not presented as a standalone section, the article reviews key strands in migration and digital diaspora scholarship. It references foundational concepts on connected migrants, collective intelligence, stigmergy, and the role of ICTs in shaping diasporic practices. It engages prior cartographic work such as the e-Diasporas Atlas (mapping migrant websites via hyperlinks) and situates digital diasporas within broader interdisciplinary trends identified in migration studies. Through a semantic mapping of 1,562 publications (WoS and HAL), it synthesizes six research clusters spanning social network theory, youth and identity, definitions and theorization of digital diasporas, digital communication and multimodality, social media and political mobilization/humanitarian data, and socio-technical, web-native methods (e.g., e-Diasporas). This review emphasizes the hybridization of methods (qualitative and computational), cross-language/platform concerns, and evolving digital-born sources (web archives, social media), citing extensive prior work across anthropology, media studies, sociology, and computer science.
The study introduces and applies three multi-scale reconstruction methods following an O → R → V workflow (Object → Reconstruction → Visualization):
- Semantic maps: From a curated corpus, extract terms and compute distributional similarity to build a paradigmatic landscape; visualize with Gephi to reveal clusters of related concepts and communities.
- Phylomemies: Within Gargantext, semi-manually create a map list of terms, then use text-mining to find term groups per time slice and intertemporal matching to reconstruct kinship relations (foliation), yielding an evolutionary representation of topics/branches over time.
- Web fragments: Define patterns over archived web pages to extract coherent semantic/syntactic units with metadata (author, title, date), reconstruct temporal structures from heterogeneous web archives, and query/visualize subsets via maps or phylomemies.
Data and procedures by section:
- Scientific landscape (semantic map):
- Corpus construction: Query Web of Science (primary) and English-language section of HAL using a broad digital-diasporas query; harvested 2,106 records (titles/abstracts), manually pruned to 1,562 documents (1980s–April 2020, English only).
- Term selection: 325 terms/expressions selected in Gargantext from metadata; computed distributional similarity (paradigmatic proximity) to build the semantic network; visualized in Gephi with PageRank sizing and community detection, yielding six clusters.
- Emergence of a socio-technical branch (phylomemy):
- Reused the 1,562-paper corpus and term list; distributed documents into constant time periods in Gargantext.
- Algorithms extracted frequently co-used term groups per period; intertemporal matching built kinship links across periods.
- Result: a phylomemy with 222 terms, 156 groups, 26 branches; visualized and manually annotated to interpret the diachronic evolution (from 1980s to present).
- Reconstruction of an extinct online migrant collective (web fragments):
- Object: Moroccan e-Diaspora blogosphere mapped in 2008 (156 sites; 47 blogs).
- Archives: Extracted INA web archives (2010–2014, DAFF format) for targeted blogs; supplemented with Internet Archive captures (2014–2018).
- Fragmentation: Applied a dedicated engine to segment hundreds of thousands of records into web fragments; timestamped fragments (enabling queries back to 2006 by reconstructing continuity).
- Pattern matching: Used HTML/CSS pattern definitions and regular expressions to detect embedded social media traces (Facebook comments, Twitter widgets, YouTube links) within archived blog pages and stylesheets.
- Analysis: Temporally sorted fragments; combined automated extraction with manual validation and network reconstruction (hyperlinks and Twitter follower/following ties) to compare the 2008 blog network with the 2018 social media network (primarily Twitter). Computed network density and inferred follower geographies via Twitter API tweet-location data.
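The pattern-matching step above can be illustrated with a minimal sketch. The summary does not give the study's actual pattern definitions, so the regular expressions, the `PATTERNS` table, and the `find_social_traces` helper below are all hypothetical; they only show how account handles might be pulled from the HTML of an archived blog page.

```python
import re

# Hypothetical URL patterns (assumptions, not the study's own definitions).
# The Twitter pattern skips the generic "share" and "intent" widget endpoints.
PATTERNS = {
    "twitter": re.compile(
        r"https?://(?:www\.)?twitter\.com/(?!share\b|intent\b)([A-Za-z0-9_]{1,15})",
        re.IGNORECASE,
    ),
    "facebook": re.compile(
        r"https?://(?:www\.)?facebook\.com/([A-Za-z0-9.\-]+)",
        re.IGNORECASE,
    ),
    "youtube": re.compile(
        r"https?://(?:www\.)?youtube\.com/(?:user|channel)/([A-Za-z0-9_\-]+)",
        re.IGNORECASE,
    ),
}


def find_social_traces(html):
    """Return {platform: sorted list of distinct handles found in html}."""
    found = {}
    for platform, pattern in PATTERNS.items():
        handles = {match.group(1) for match in pattern.finditer(html)}
        if handles:
            found[platform] = sorted(handles)
    return found


# Usage on a made-up archived page fragment.
sample = (
    '<a href="https://twitter.com/example_user">Follow</a> '
    '<a href="https://twitter.com/share?url=x">Share</a> '
    '<a href="http://www.youtube.com/user/exampleChannel">Videos</a>'
)
print(find_social_traces(sample))
```

Matches of this kind would still need the manual validation described above, since a link on a blog page does not by itself prove that the account belongs to the blog's author.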
- The field of digital diasporas is cross-disciplinary and fragmented: The semantic map identified six major clusters spanning social network analysis of migrants (No. 1), youth/identity/self-spaces and smartphones (No. 2), theorization/definition of digital diasporas (No. 3), digital communication and multimodal semiotics (No. 4), social media and political mobilization/humanitarian data (No. 5), and socio-technical, web-native methods around the e-Diasporas Atlas (No. 6).
- Diachronic evolution (phylomemy): Progressive integration of ICT data into migration studies from the 1980s–1990s; 2000s digital shift enriched analyses of social networks and trajectories; late-2000s growth in cross-language topics; recent surge in digital self-spaces/identities; emergence and consolidation of a socio-technical branch focused on the diasporic web and web-native methods (e-Diasporas).
- Moroccan e-Diaspora case (web fragments):
  - 2008 blogosphere: 156 websites (47 blogs; mostly French; categories: associations/NGOs, institutional/governmental, individual blogs).
  - By 2018: 19 dead blogs, 23 abandoned, 5 still alive among the 47 blogs, indicating an extinct blog-based collective.
  - Digital migration: Identified 33 active social media accounts in 2018 linked to 20 dead blogs (mainly Twitter and Facebook; some YouTube and Pinterest), evidencing a shift of activity from blogs to social platforms.
  - Network structure: Overlaying 2008 hyperlinks with 2018 Twitter follower/following ties showed increased internal connectivity; density rose from 0.16 (blogs) to 0.24 (Twitter).
  - Diasporic continuity: Twitter follower geographies remained diasporic (Morocco and at least 10 other nationalities). For larbi.org, at least 15% of 2008 commenters (covering 26% of posts) followed the 2018 Twitter account, indicating community carryover and preservation of identities (pseudonyms, avatars, content).
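To make the density comparison concrete (0.16 for the blog network, 0.24 for the Twitter network), here is a minimal sketch of one common definition. The summary does not state which formula was used, so this assumes a directed graph without self-loops, where density is the number of distinct ties divided by n(n - 1). The function name and the toy data are illustrative only.

```python
def directed_density(nodes, edges):
    """Density of a directed graph without self-loops: m / (n * (n - 1)).

    Assumption: ties are directed (a hyperlink or a follow goes one way).
    Duplicate edges and self-loops are ignored.
    """
    n = len(set(nodes))
    if n < 2:
        return 0.0
    unique_edges = {(u, v) for u, v in edges if u != v}
    return len(unique_edges) / (n * (n - 1))


# Toy example: 4 accounts and 3 follow ties, so 3 / (4 * 3) = 0.25.
accounts = ["a", "b", "c", "d"]
follows = [("a", "b"), ("b", "c"), ("c", "a")]
print(directed_density(accounts, follows))
```

Under this definition, a rise from 0.16 to 0.24 means that the share of possible directed ties actually present grew by half. The two values are only directly comparable if both networks are measured with the same formula, ideally over the same set of actors.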
Multi-scale reconstruction methods serve as quali-quantitative instruments that bridge social and computational sciences to study complex, stigmergic, digitally mediated collectives. Implemented within socio-technical protocols, they support iterative analysis (O → R → V) that maintains traceability to original sources, enabling both macro-level overviews and micro-level inspections. The study demonstrates applicability across domains (semantic maps, phylomemies, web fragments) while warning against black-box effects and emphasizing transparency, comparable measures, and reproducibility. The Moroccan case shows that apparent extinctions of online collectives (blogs) can conceal migrations to new platforms (Twitter), with communities and connectivity preserved or even strengthened. The methods can complement qualitative approaches (e.g., interviews) and suggest broader relevance to contemporary phenomena such as post-COVID-19 digital presence, highlighting the need for cross-disciplinary translation and shared methodological vocabularies.
The paper charts the rise of digital diasporas as a cross-disciplinary domain and demonstrates a multi-scale methodological toolkit—semantic maps, phylomemies, and web fragments—to reconstruct both the field’s scientific landscape and the evolution of a specific online collective. It shows interdisciplinarity and dynamic topic shifts within the field, and, through the Moroccan case, reveals a platform migration from blogs to social media with preserved diasporic connectivity. These methods extend beyond digital diasporas, offering scalable, iterative, and transparent ways to analyze large textual and web-archival corpora. Future work should deepen socio-technical protocols, foster cross-disciplinary translation, improve handling of multimedia and multilingual data, and integrate qualitative validation (e.g., interviews) to explain drivers of digital migrations and transformations.
- Corpus constraints: English-only publications; reliance on WoS and HAL (access limitations, potential coverage bias); poorer metadata before 1990; the corpus is therefore not fully representative.
- Method constraints: Gargantext’s limited cross-language handling; reliance on semi-manual term selection and expert annotations; potential software black-box issues.
- Web archives: INA/DAFF crawling and storage may introduce quality, coherence, and timestamping issues; web fragments framework is a prototype requiring advanced engineering; images and videos not analyzed.
- Exhaustiveness: Social media linkage detection depends on blogs having embedded or linked accounts; accounts not referenced in archived pages may be missed; thus findings on platform migration are indicative rather than exhaustive.
- Causality: The study cannot definitively attribute causes for the blog-to-social migration (e.g., Arab Spring, platform trends) without further qualitative inquiry.

