Situated data analysis: a new method for analysing encoded power relationships in social media platforms and apps

J. W. Rettberg

This summary introduces situated data analysis, a method proposed by Jill Walker Rettberg for examining how social media platforms and apps such as Strava represent and process personal data, and how those choices encode power relationships between platforms, users, and society.

Introduction
The paper addresses how personal data collected by platforms and apps is constructed, presented, and used to shape behavior and power relations. It introduces “situated data analysis” as a method grounded in Haraway’s notion of situated knowledges to examine how the same data is differently framed for different audiences and purposes. The author situates this within broader concerns about surveillance capitalism and environmentality: while disciplinary power is internalized by individuals through self-tracking norms, environmental power is embedded in infrastructures and interfaces that make some behaviors easy and others hard. Using Strava as a case, the paper asks how data moves along a representational–operational continuum and how corresponding power relations shift from technologies of the self (discipline) to environmental power enacted through infrastructures and algorithmic systems.
Literature Review
The paper synthesizes several bodies of scholarship: (1) Feminist STS and digital humanities on the partiality and constructed nature of data (Haraway’s situated knowledges; Gitelman’s “Raw Data is an Oxymoron”; intersectional digital humanities). (2) Algorithmic bias literature showing encoded racial and gender biases in datasets and models (Bolukbasi et al.; Buolamwini & Gebru; Eubanks; O’Neil) and artistic critiques of ML training sets (Crawford & Paglen; Kronman). (3) Theories distinguishing representation vs operation/operational images and operative media (Drucker; Farocki; Paglen; Hoel; Andrejevic). (4) Foucauldian biopolitics, technologies of the self, and environmentality, with contemporary elaborations in smart city and media environments (Foucault; Gabrys; Hörl; Andrejevic). (5) Self-tracking and quantified self research addressing motivation, discipline, and social comparison (Ajana; Lupton; Kristensen & Ruckenstein; Sanders; Smith & Treem). (6) Platform, infrastructure, and methodology literatures relevant to studying apps and data (Light et al.’s walkthrough method; platform/infrastructure studies; Brock’s CTDA). This corpus underpins the conceptualization of situated data and the representational–operational and disciplinary–environmental spectra.
Methodology
The paper develops a conceptual-analytical method—situated data analysis—and demonstrates it via a qualitative case study of Strava. The method proceeds by: (1) identifying levels at which the same underlying data is situated differently according to source (individual vs aggregate) and intended audience (individual user, humans in general, or machines); and (2) analyzing how each level emphasizes representation (human-oriented visualizations) vs operation (machine processing) and corresponding power dynamics (disciplinary vs environmental). For Strava, four levels are examined: (1) personal data visualized for the individual (and nearby/friends), (2) aggregate data visualized for humans (e.g., Global Heatmap), (3) aggregate data as an operational dataset for human operators via dashboards (Strava Metro), and (4) aggregate data processed primarily by machines (trajectory mining for routing, infrastructure optimization). Methods employed include: semiotic and rhetorical analysis of user interfaces and data visualizations; close reading of platform materials and public demos (e.g., Strava Metro dashboard video); synthesis of existing ethnographic/user studies on Strava; review of technical and urban analytics literature using Strava data (e.g., validation studies and bias critiques); examination of media reports and public incidents (e.g., deanonymization via Global Heatmap); and conceptual mapping of power relations across levels. The author also outlines complementary approaches (walkthrough studies, interviews with stakeholders, reverse engineering, critical code studies, audience studies, and patent analysis) that can operationalize the method on other platforms or deepen access to machine-level operations.
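The four analytical levels described above can be summarized as a small lookup structure. This is an illustrative sketch only, not part of the paper's method; the class and field names are my own, and the "mixed" label for level 2 reflects the paper's placement of the Global Heatmap between representational and operational uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SituatedDataLevel:
    """One level at which the same underlying data is situated."""
    level: int
    source: str    # "individual" or "aggregate"
    audience: str  # whom the data is framed for
    mode: str      # representational vs operational emphasis
    power: str     # dominant power relation at this level

# The four Strava levels examined in the paper, encoded as records.
STRAVA_LEVELS = [
    SituatedDataLevel(1, "individual", "the user (and friends)",
                      "representational", "disciplinary"),
    SituatedDataLevel(2, "aggregate", "humans in general (Global Heatmap)",
                      "representational", "mixed"),
    SituatedDataLevel(3, "aggregate", "human operators (Strava Metro dashboards)",
                      "operational", "environmental"),
    SituatedDataLevel(4, "aggregate", "machines (trajectory mining, routing)",
                      "operational", "environmental"),
]

def power_mode(level: int) -> str:
    """Look up the dominant power relation at a given level."""
    return next(l.power for l in STRAVA_LEVELS if l.level == level)
```

For example, `power_mode(1)` returns `"disciplinary"` while `power_mode(4)` returns `"environmental"`, tracing the paper's shift from technologies of the self toward environmental governance as data moves away from the individual user.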
Key Findings
- Data is inherently situated: the same Strava data is constructed, framed, and used differently across audiences and contexts. Representational and operational uses form a continuum rather than a binary.
- Four levels of situated data are identified: (1) personal data visualized for the user (and local social comparisons); (2) aggregate data visualized for a general human audience (e.g., the Strava Global Heatmap); (3) aggregate data operationalized via dashboards for human decision-makers (Strava Metro for city planning); and (4) aggregate data processed by machines with minimal human-facing representation (trajectory mining for routing and optimization).
- Power relations shift along these levels: level 1 primarily enacts disciplinary power via self-tracking, social comparison, and normative goals (leaderboards, badges), aligning with technologies of the self; levels 3–4 increasingly enact environmental power by shaping infrastructures, routes, and recommendations that nudge behavior without requiring internalized norms.
- Aggregate visualizations are rhetorical and biased representations: the Global Heatmap aesthetically evokes a "God's eye" view, yet reflects demographic and usage biases (e.g., wealthier Western users), reproducing uneven visibility (e.g., darker regions such as parts of Harlem or Africa).
- Re-identification risks persist despite aggregation: the 2017–2018 incident in which the Global Heatmap revealed sensitive sites (e.g., US military bases) and potentially individual homes demonstrates how aggregate representations can undermine user privacy when resituated across levels.
- Operational datasets can be useful yet biased: Strava Metro supports urban planning but is not a representative sample (e.g., users skew male and affluent). Still, validation studies found linear relationships with ground counts: in Victoria, Canada, each Strava rider corresponded to roughly 51 actual cyclists, and in Glasgow, Strava flows aligned with roadside surveys despite an 82% male user base. Other uses risk ecological fallacies (e.g., a Johannesburg study inferring commuting rates from Strava users).
- Data brokerage and third-party sharing underscore broader environmental control and opacity (e.g., the Norwegian Consumer Council found that 10 tested apps shared data with at least 135 adtech/behavioral-profiling third parties).
- Methodologically, situated data analysis provides a transferable framework for other platforms (YouTube, TikTok, Instagram, search and ad ecosystems), clarifying how metrics and recommendations move users from disciplinary self-management toward environmental governance embedded in interfaces and infrastructures.
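The validation studies cited above rest on a simple linear scaling: an observed Strava count is multiplied by a locally validated factor to estimate total traffic. A minimal sketch, with the function name and default factor chosen for illustration (the ~51 figure is the Victoria, Canada estimate and does not transfer to other locales):

```python
def estimate_total_riders(strava_count: int, scaling_factor: float = 51.0) -> int:
    """Estimate total cyclists from an observed Strava rider count,
    assuming a locally validated linear relationship with ground counts.
    The default factor (~51 cyclists per Strava rider) comes from the
    Victoria, Canada study and must be re-validated for each locale.
    """
    if strava_count < 0:
        raise ValueError("count must be non-negative")
    return round(strava_count * scaling_factor)
```

For example, `estimate_total_riders(120)` yields `6120` under the Victoria factor; applying the same factor elsewhere without validation would reproduce exactly the sampling bias the paper warns about.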
Discussion
The analysis demonstrates that treating data as situated clarifies how platforms encode and enact power through both what data shows to humans and what data does via computation. At the personal level, Strava’s representational dashboards cultivate self-discipline through visibility, comparison, and goal-setting, addressing the research aim of linking representations to disciplinary power. As data is aggregated and operationalized, it becomes an instrument of environmental power, shaping infrastructures, routes, and recommendations often without users’ awareness—thus “changing the rules of the game.” The re-identification episode illustrates cross-level tensions: data framed as communal and anonymous at one level can expose individuals when resituated. The practical relevance is twofold: (1) scholars gain a portable analytic lens to interrogate platforms’ representational rhetoric, data provenance, omissions, and power effects; and (2) practitioners (planners, policymakers) are cautioned to account for demographic biases and privacy risks when using operational datasets, validating against ground truths and combining with complementary data. Extending the model to other platforms clarifies how engagement metrics, recommendation systems, and ad-tech infrastructures convert personal traces into environmental governance that organizes attention, mobility, and opportunity.
Conclusion
The paper contributes a method—situated data analysis—that maps how the same data is differently situated across four levels (personal visualization; aggregate visualization; operational dashboards; machine processing), and how representation–operation and disciplinary–environmental spectra align across these levels. Representation is present both in human-facing visuals and in the construction of data itself; operationality intensifies with distance from the individual user, shifting power from technologies of the self to environmental governance enacted through infrastructures and algorithms. The Strava case illustrates methodological pathways (semiotic/rhetorical analysis, ethnographic synthesis, technical literature review, validation studies, media incident analysis) and generalizes to other platforms (social video, search, ad-tech). Future work should deepen access to operational layers (e.g., audits, code/algorithm studies, patent analyses), triangulate with ground-truth and demographic data to mitigate bias, and investigate stakeholder practices and impacts on different populations. Researchers should also explore additional power constellations beyond the disciplinary–environmental dyad and develop ethical guidelines for cross-level analyses that minimize privacy risks.
Limitations
- Conceptual and qualitative emphasis: limited direct access to proprietary operational systems and machine-level processing; relies on public demos, secondary studies, and technical literature.
- The four levels are heuristic; their boundaries are fluid and may differ across platforms.
- Case specificity: Strava's user base skews demographically (e.g., male, affluent), limiting generalizability and potentially biasing inferences about cycling and running behavior.
- Privacy and ethical constraints limit the extent of reverse engineering or deanonymization analyses.
- Validation evidence (e.g., one Strava rider to ~51 actual cyclists) is context-dependent and may not transfer across locales or modalities.
- No single study can exhaustively cover all levels, methods, and stakeholders.