A text mining approach to elicit public perception of bike-sharing systems

Transportation

B. Kutela, N. Langa, et al.

This study by Boniphace Kutela, Neema Langa, Sia Mwende, Emmanuel Kidando, Angela E. Kitali, and Prateek Bansal applies a text network approach to open-ended survey responses to analyze public perceptions of bike-sharing systems in Seattle. It identifies the perceived advantages and disadvantages of docked and dockless systems, compares views across user-experience groups, and outlines policy recommendations relevant to the future of urban transport.

Introduction
Bike-sharing programs address first- and last-mile challenges and have evolved from station-based (docked) to free-floating (dockless) models. Prior research typically relies on actual ridership data or structured, closed-ended surveys to infer public perceptions and usage determinants. Ridership data provide utilization estimates, but they are often unavailable at the planning stage and can be biased when transferred across cities with different topologies and weather. Closed-ended surveys constrain expression and depend on design choices (e.g., Likert scales), and regression analyses of their results primarily reveal associations that are subject to endogeneity. The study argues that open-ended questionnaires can better capture opinions, especially for emerging services, yet they are underused because they are difficult to analyze. This paper applies text network analysis to open-ended responses from Seattle residents to elicit perceptions of docked (Pronto) and dockless (Spin, LimeBike, Ofo) systems, identify central themes (e.g., stations, access, parking, helmets), and explore heterogeneity by user experience. The overarching research question is: What do open-ended text responses reveal about the perceived positive and negative characteristics of docked and dockless bike-sharing systems, and how do these perceptions vary by user experience?
Literature Review
- Empirical ridership and survey studies have linked usage to supply-side factors such as system scale, station density, geographic coverage, and pricing. However, ridership data are not always available for planning and may not transfer well between cities (topology, weather). Closed-ended surveys constrain respondents and introduce measurement sensitivities (scale points, midpoint presence, visual representation) as well as endogeneity issues in econometric models.
- Text mining on social media has been used to study bike-sharing sentiments and keyword frequencies (e.g., tweets in Washington DC; public opinion on dockless systems in DC, San Francisco, and Seattle; regional sentiment comparisons). These approaches often provide aggregate sentiment or frequency counts without mapping connections among concepts.
- Text network analysis, widely used in the social sciences, addresses the limitations of sentiment- or frequency-only methods by uncovering frequently co-occurring keywords and communities, enabling visualization of connections and of how meaning circulates (e.g., Paranyushkin 2011; Drieger 2013; Kim and Jang 2018).
- The paper positions text network analysis as a complement to traditional econometrics: it is suited to analyzing open-ended responses to derive policy-relevant insights and can potentially be integrated with quantitative models in a hybrid framework.
Methodology
Design: Apply text network analysis to open-ended survey responses from Seattle residents about docked (Pronto) and dockless (Spin, LimeBike, Ofo) systems. Four open-ended questions were analyzed: (1) What worked well with Pronto? (2) What did not work well with Pronto? (3) What are dockless companies doing better than Pronto? (4) What are dockless companies doing worse than Pronto?

Data: Peters and MacKenzie (2019a) collected 783 responses (February–March 2018) via UW online channels and social media; 701 (89.5%) answered at least one open-ended question. Incentive: a lottery for a premium bag.

Seattle context: Pronto (docked) operated from 2014 to 2017; dockless systems launched in 2017 and continued operating.

Sample characteristics: user-experience segments were only dockless users (39.9%), only docked (3.0%), both (25.4%), and neither (31.7%); 55.6% male; 85.3% White; incomes spread across four brackets; generations primarily Millennials and Gen X.

Text network procedure:
- Components: nodes represent keywords, and edges represent co-occurrence within a sentence, with edge thickness proportional to co-occurrence frequency. Communities are groups of keywords with similar co-occurrence patterns. Degree centrality counts a keyword's connections; betweenness centrality measures how often a keyword lies on shortest paths between other keywords, capturing its role as a bridge across communities.
- Steps:
  1) Text normalization: remove connecting words and symbols, and convert to lowercase.
  2) Transform the text into a structured corpus of keywords.
  3) Network creation using the two-word gap algorithm (preferred over the five-word gap for consistency): scan each sentence and add each word as a node the first time it appears; for each pair of consecutive words, create a weighted edge or increment its weight; no links are made across sentences.
  4) Quantitative indexes: visualize the co-occurrence networks and conduct collocation analysis (adjacency). Association measures for collocations were computed and tested via ANOVA following Blaheta and Johnson (2011). Degree and betweenness centralities indicate keyword influence and thematic bridges.
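The normalization, two-word-gap network, and centrality steps above can be sketched in plain Python as follows. This is an illustrative from-scratch sketch, not the study's quanteda code. The stop-word list, sentence splitting on `.`, `!`, and `?`, the way the `window` parameter generalizes the gap, and the choice to ignore edge weights when computing betweenness are all assumptions; the responses are hypothetical.

```python
import re
from collections import Counter, defaultdict, deque

# Illustrative stop-word ("connecting word") list; an assumption, not the study's list.
STOPWORDS = {"a", "an", "and", "at", "be", "for", "in", "is", "it", "of",
             "on", "or", "the", "to", "was", "were", "with"}


def normalize(sentence):
    """Step 1: lowercase, drop symbols, and remove connecting words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_network(responses, window=2):
    """Step 3: co-occurrence network with a sliding word window.

    window=2 links each keyword only to the next one (the two-word gap);
    sentences are split on . ! ? so no edge crosses a sentence boundary.
    """
    freq = Counter()   # keyword -> frequency (node size)
    edges = Counter()  # sorted (word_a, word_b) -> co-occurrence weight
    for text in responses:
        for sentence in re.split(r"[.!?]+", text):
            words = normalize(sentence)
            freq.update(words)
            for i, word in enumerate(words):
                for other in words[i + 1:i + window]:
                    if other != word:  # skip self-loops such as "easy easy"
                        edges[tuple(sorted((word, other)))] += 1
    return freq, edges


def adjacency(edges):
    """Undirected neighbor sets built from the weighted edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree_centrality(adj):
    """Number of distinct keywords each keyword is connected to."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm on the unweighted graph (edge weights ignored)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)  # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was visited from both ends.
    return {node: value / 2 for node, value in bc.items()}


responses = [
    "Stations were in good locations. Helmets at stations were easy to use.",
    "Easy to use, easy to find.",
]
freq, edges = build_network(responses)
adj = adjacency(edges)
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
print(freq.most_common(3))
print(max(betweenness, key=betweenness.get))  # 'stations' bridges the two themes
```

In this toy example, 'stations' and 'easy' have the same degree (3), but 'stations' has the higher betweenness because it links the location/helmet words with the ease-of-use words. This illustrates why the two indexes are reported separately.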
Implementation: R 4.0.0 with the quanteda package; networks were visualized directly, without filtering or generalization. Interpretation focused on the top 40 keywords because of network size, following prior text-mining practice.

Heterogeneity: separate networks were generated for each user-experience subgroup (only docked, only dockless, both, non-users) for each question.
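The subgroup analysis repeats the pipeline for each user-experience segment. A minimal Python sketch of the partition-and-truncate part follows. It assumes that keywords are ranked by raw frequency (the ranking criterion behind the top 40 is an assumption here) and uses an illustrative stop-word list. The segment labels follow the study's groups, but the responses are hypothetical. In a full pipeline, each segment's tokens would feed its own co-occurrence network.

```python
import re
from collections import Counter, defaultdict

# Illustrative stop-word list; an assumption, not the study's list.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "it", "of", "the", "to", "was", "were"}


def top_keywords_by_segment(records, k=40):
    """Rank keywords separately for each user-experience segment.

    records: iterable of (segment, response_text) pairs.
    Returns {segment: [(keyword, count), ...]} with at most k entries each.
    Ranking by raw frequency is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for segment, text in records:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[segment].update(t for t in tokens if t not in STOPWORDS)
    return {segment: c.most_common(k) for segment, c in counts.items()}


records = [
    ("only docked", "Stations were in good locations. Stations near home."),
    ("only dockless", "Easy to find and easy to use."),
    ("only dockless", "Easy to leave anywhere."),
]
top = top_keywords_by_segment(records, k=40)
for segment, keywords in top.items():
    print(segment, keywords[:3])
```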
Key Findings
Response rates to the open-ended questions: (1) positive aspects of docked: 418 responses (53%); (2) negative aspects of docked: 518 (66%); (3) dockless better than docked: 672 (86%); (4) dockless worse than docked: 605 (76%).

Positive aspects of docked (Pronto):
- Stations were central to positive perceptions and were linked with locations, availability, and helmets. Themes included good or great station locations and the availability of helmets at stations.
- Co-occurrence and collocation analyses identified easy use as the top positive factor, ahead of ‘helmet’ and ‘stations’, which lead on raw frequency. Table 2: easy use (co-occurrence 15; collocation count 13; z = 10.06), never used (collocation count 9; z = 9.05), annual membership (9; z = 8.53), worked well (8; z = 8.43), provided helmet (7; z = 7.34).
- Subgroup insights: docked users valued station locations and ease of use. Dockless-only users often said they didn't or never used Pronto but still cited station locations as a positive. Users of both systems also praised annual memberships, and non-users focused on never having used it and on the provided helmets.

Negative aspects of docked (Pronto):
- Core criticisms were too few stations, poor station locations, and a limited service area. ‘Stations’ had the highest frequency (285). Table 3: docking stations (collocation count 48; z = 13.99), service area (23; z = 14.10), enough stations (25; z = 9.61). Other collocations included wanted go (11; z = 11.27) and the pricing scheme (the $8/day rate), although pricing appeared with thin edges, indicating a minor concern.
- Beyond the station-related core, the network was sparse. This suggests diverse, fragmented negative views, and such heterogeneous complaints make system improvements challenging.
- All subgroups centered on ‘stations’. Dockless users emphasized stations that were not nearby or were inconvenient, as well as a smaller coverage area; users of both systems and non-users raised similar themes.

Dockless better than docked:
- Central theme: ease. ‘Easy’ had the highest frequency (174) and connected to find, access, and use. Table 4 collocations: easy use (count 49; z = 15.81), easy find (27; z = 11.79), easy access (24; z = 11.18).
- Flexibility and coverage: clusters formed around destination, leave, everywhere/anywhere, and available; service area (18; z = 11.68) and leave anywhere (20; z = 13.26) were among the top collocations. Respondents appreciated not needing docking stations.
- Subgroups: all reported easy find, and most also reported easy use. Themes of leaving or finding bikes anywhere were widespread, with minor heterogeneity.

Dockless worse than docked:
- Parking and sidewalk obstruction: a strong edge linked leave and locations, and block sidewalk was the top collocation (Table 5: collocation count 21; z = 13.13). Other co-occurrences: leave locations (24), park sidewalk (15), leave sidewalk (14), leave everywhere (11).
- Helmets: respondents were strongly concerned about low helmet use and availability. ‘Helmet’ had a frequency of 123; co-occurrences included people–helmet (16), don't–helmet (15), helmet–use (14), and without helmet (11). Collocations included lack helmet, helmet available, and without helmet (z ≈ 7.6–7.8). Prior literature corroborates low helmet use among dockless riders (about 20%).
- Subgroup nuances: docked users cited pedestrian safety and a perceived poor use of data for planning. Dockless users noted helmet non-use and maintenance, and users of both mentioned app quality, broken bikes, and Lime bike quality. Non-users disliked helmet sharing (people-use-one-helmet; people-use-share-helmet).

Cross-cutting insights:
- Stations dominate both positive and negative perceptions of docked systems.
- Dockless systems are praised for ease, access, and flexibility but criticized for sidewalk blocking and low helmet use.
- The sparse network of docked negatives indicates many small issues rather than a single dominant one.
- Findings align with prior econometric and text-mining studies, supporting the validity of the text network approach.
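The z-values in Tables 2–5 come from the collocation analysis. For a two-word collocation, the Blaheta and Johnson λ reduces to the log odds ratio of a 2×2 table of bigram counts, and z = λ/σ, where σ is the square root of the sum of the reciprocal cell counts. The Python sketch below is minimal and makes several assumptions. It adds 0.5 smoothing to each cell (we understand this to be the default in quanteda's `textstat_collocations`). It also uses an illustrative stop-word list and hypothetical sentences. It is not a reproduction of the study's exact computation.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list; an assumption, not the study's list.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "it", "of", "the", "to", "was", "were"}


def bigram_collocations(sentences, smoothing=0.5):
    """Score adjacent keyword pairs with a smoothed log odds ratio (lambda) and its Wald z.

    For each bigram (w1, w2), the 2x2 table classifies every bigram by whether
    its first word is w1 and whether its second word is w2.
    Returns {(w1, w2): (count, lambda, z)}.
    """
    bigrams = Counter()
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z']+", sentence.lower()) if t not in STOPWORDS]
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    first, second = Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        first[w1] += n
        second[w2] += n
    results = {}
    for (w1, w2), n11 in bigrams.items():
        n10 = first[w1] - n11          # w1 followed by some other word
        n01 = second[w2] - n11         # some other word followed by w2
        n00 = total - n11 - n10 - n01  # neither word in its position
        cells = [n + smoothing for n in (n11, n10, n01, n00)]
        lam = math.log(cells[0] * cells[3] / (cells[1] * cells[2]))
        sigma = math.sqrt(sum(1.0 / c for c in cells))
        results[(w1, w2)] = (n11, lam, lam / sigma)
    return results


sentences = [
    "Easy to use.",
    "Easy to use the app.",
    "Easy to find bikes.",
    "Bikes block the sidewalk.",
]
scores = bigram_collocations(sentences)
for pair, (count, lam, z) in sorted(scores.items(), key=lambda kv: -kv[1][2]):
    print(pair, count, round(lam, 2), round(z, 2))
```

Because z scales λ by its standard error, a pair that co-occurs less often can still outrank a more frequent one. This is consistent with the summary's observation that collocation rankings can differ from raw frequency rankings.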
Discussion
The text network analysis of open-ended responses directly addresses the research question by revealing connected, central themes in public perceptions that closed-ended surveys or keyword frequencies alone might miss. For docked systems, station density, placement, and coverage are the pivotal determinants of both praise and criticism, emphasizing the need for optimized station location and expansion strategies. The sparse set of additional negatives suggests varied, individualized pain points, complicating one-size-fits-all fixes. For dockless systems, the core value proposition—ease of finding, accessing, and using bikes, and flexible travel/parking—explains their success. Yet the same flexibility generates externalities: sidewalk obstruction and low helmet use raise pedestrian safety and rider risk concerns. Policy relevance includes: increasing station coverage and relocating poorly sited docks for docked systems; implementing designated parking, geofencing, or incentives to curb sidewalk clutter for dockless; and strengthening helmet availability/usage policies or campaigns. Subgroup analyses show broadly consistent themes with nuanced differences, indicating that interventions such as parking controls and helmet initiatives would be broadly acceptable, while app quality, maintenance, and data-driven planning improvements could target experienced users’ specific concerns. The concordance with prior econometric findings suggests these text-derived insights are robust, and motivates a hybrid framework wherein co-occurrence-based features inform or complement regression models for more reliable recommendations.
Conclusion
This paper demonstrates the first use of text network analysis to elicit public perceptions of docked and dockless bike-sharing from open-ended survey responses in Seattle. Methodologically, the approach transforms unstructured opinions into networks of co-occurring and collocated keywords, revealing central themes and their connections without constraining respondents. Substantively, perceptions hinge on: (i) station locations and coverage (key for docked); (ii) ease of access/use and flexibility (key strengths of dockless); and (iii) externalities of dockless systems—sidewalk blockage and low helmet use—as primary concerns. The negative perceptions of docked systems are diverse (sparse network), implying nontrivial challenges for system improvement. Future directions proposed include: employing a hybrid approach that combines structured/econometric analyses with text networks; segmenting surveys into closed- and open-ended components tailored to user experience; and using top co-occurrence-derived dummies as explanatory variables in regression models. With larger samples, hierarchical segmentation can explore demographic heterogeneity (e.g., age–gender cohorts) to profile groups with specific positive or negative perceptions and support targeted policy and design interventions.
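The proposal to use top co-occurrence-derived dummies as explanatory variables could start from a feature matrix like the one sketched below. This is a minimal sketch under stated assumptions. A dummy is set to 1 when the two keywords appear next to each other after stop-word removal, which mirrors the two-word-gap adjacency. The stop-word list is illustrative, the pairs are taken from Tables 4 and 5, and the responses are hypothetical.

```python
import re

# Illustrative stop-word list; an assumption, not the study's list.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "it", "of", "the", "to", "was", "were"}


def cooccurrence_dummies(responses, pairs):
    """Return one row of 0/1 indicators per response.

    A dummy is 1 when the two keywords of a pair appear next to each other
    (after stop-word removal) in any sentence of the response; the order of
    the two words does not matter.
    """
    targets = [frozenset(p) for p in pairs]
    rows = []
    for text in responses:
        adjacent = set()
        for sentence in re.split(r"[.!?]+", text):
            tokens = [t for t in re.findall(r"[a-z']+", sentence.lower()) if t not in STOPWORDS]
            adjacent.update(frozenset(p) for p in zip(tokens, tokens[1:]))
        rows.append([int(t in adjacent) for t in targets])
    return rows


# Pairs from Tables 4 and 5; the responses are hypothetical.
pairs = [("block", "sidewalk"), ("easy", "use")]
responses = ["Bikes block the sidewalk.", "Easy to use!", "Helmets missing."]
X = cooccurrence_dummies(responses, pairs)
print(X)  # [[1, 0], [0, 1], [0, 0]]
```

Each row could then be joined with closed-ended survey variables and used as covariates in, for example, an ordered logit model. That modeling step is not shown here.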
Limitations
- The text network provides qualitative, point-estimate-like relationships among keywords. By itself, it cannot establish clear statistical relationships or causal links between perceptions and factors (unlike ordered logit or factor analysis on structured data).
- Although collocation significance is tested, overall inference remains associative and descriptive.
- Open-ended responses can be sparse or heterogeneous (especially among non-users), limiting interpretability for some groups.
- The study focuses on Seattle during a specific period and may reflect local system history and demographics (e.g., a high share of White respondents), which can limit generalizability.