Introduction
Bike-sharing programs aim to solve first- and last-mile transportation problems. Originating in the 1970s in the Netherlands, early systems used docked bikes at stations, with station locations optimized for demand and rebalancing costs. More recently, dockless systems have emerged that complement, substitute for, or compete with docked systems. Existing research relies mainly on ridership data or on surveys with structured questionnaires (e.g., Likert scales). Ridership data, while valuable, is not always available during planning stages, and structured questionnaires restrict what respondents can express. Likert scales in particular can lead to biased conclusions because results are sensitive to the number of scale points, the treatment of the midpoint, and the visual presentation of the scale. Econometric models, while frequently used, often reveal associations rather than causal relationships, and endogeneity biases can be challenging to address. This study proposes that open-ended questionnaires allow for more nuanced and comprehensive expression of public opinion, particularly regarding emerging technologies. Such data, however, is difficult to analyze with traditional methods. Advances in text mining offer tools for analyzing unstructured open-ended responses. Previous text mining applications in bike-sharing studies mostly used sentiment analysis, which provides limited information, or keyword frequency analysis, which ignores connections between words. This research addresses these limitations by applying a state-of-the-art text network approach to open-ended responses from a survey of Seattle residents on docked and dockless bike-sharing systems. The text network approach identifies frequently co-occurring keywords, visualizes the connections between them, and supports policy-relevant conclusions.
Literature Review
Several studies have investigated public perception and ridership of bike-sharing systems using various methods. Some have used actual ridership data, which gives accurate estimates of service utilization but is often unavailable during the planning phase. Others have employed surveys with structured questionnaires and analyzed the responses with regression models. These studies have identified associations between ridership and factors such as system scale, station density, geographic coverage, and pricing. However, these closed-ended approaches have limitations: respondents cannot fully express their opinions, Likert scales carry inherent biases, and the models capture associations rather than causal relationships. Previous text mining applications in bike-sharing studies focused on simpler approaches such as sentiment analysis or keyword frequency, neglecting contextual information and connections between words. In contrast, this study uses text network analysis, a more sophisticated method commonly used in the social sciences, to overcome these limitations and provide a richer understanding of public perception.
Methodology
This study employs text network analysis, a branch of text mining that reveals patterns in how words are used together. The analysis involves four main steps: text normalization (removing connecting words and symbols, converting to lowercase), transformation from unstructured to structured data (creating a corpus of keywords), generation of nodes and links (creating a network), and the development of quantitative indices for in-depth analysis. The text network consists of nodes (keywords) and edges (co-occurrences of keywords), with edge thickness reflecting co-occurrence frequency. Degree centrality (number of edges) and betweenness centrality (ability to connect different clusters) measure keyword influence. This study uses the two-word gap method to build the network, identifying consecutive words that co-occur in a sentence. The algorithm creates a node for each unique word, and each co-occurrence of a pair of words adds weight to the edge connecting the two corresponding nodes. Co-occurrence and collocation (adjacency of keywords) are analyzed, and statistical significance is assessed using ANOVA. The analysis is performed in the R 4.0.0 environment with the quanteda package. For interpretation, the top 40 most frequently co-occurring keywords are selected, a common practice in text mining.
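The network-building steps described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in R with quanteda as used in the study, so it should be read as an approximation of the idea rather than the authors' implementation. The stop-word list, the function names, and the two sample responses are all hypothetical and were made up for illustration. The sketch assumes that connecting words are removed before adjacent pairs are scanned, and that edges are undirected.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption); a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "is", "are",
             "in", "at", "it", "was", "were", "for", "on"}


def normalize(text):
    """Lowercase the text, keep alphabetic tokens only, drop connecting words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_network(responses):
    """Scan adjacent keyword pairs within each sentence.

    Each unique keyword is a node; each adjacent pair adds 1 to the
    weight of the undirected edge between the two keywords.
    """
    edges = Counter()
    for response in responses:
        for sentence in re.split(r"[.!?]+", response):
            tokens = normalize(sentence)
            for left, right in zip(tokens, tokens[1:]):
                if left != right:  # skip self-loops from repeated words
                    edges[tuple(sorted((left, right)))] += 1
    return edges


def degree_centrality(edges):
    """Count the number of distinct edges attached to each keyword."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Made-up responses, for illustration only (not survey data).
responses = [
    "Stations were easy to find. Bikes were easy to use.",
    "Easy to use, and the stations are in good locations.",
]

edges = build_network(responses)
degree = degree_centrality(edges)

# Heaviest edges first, analogous to keeping only the top co-occurrences.
for (a, b), weight in edges.most_common(3):
    print(f"{a} -- {b}: weight {weight}")

# Keywords ranked by degree centrality.
for word, d in degree.most_common(3):
    print(f"{word}: degree {d}")
```

In this toy input, "easy" ends up with the highest degree because it is adjacent to several different keywords, which mirrors how a central keyword emerges in the network. Betweenness centrality is omitted here because it requires shortest-path computation, which a graph library would normally handle.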
Key Findings
The study analyzed open-ended responses from 701 Seattle residents (out of 783 surveyed) regarding their perceptions of docked (Pronto) and dockless (Spin, LimeBike, Ofo) bike-sharing systems. The analysis focused on four open-ended questions: positive and negative aspects of Pronto, and aspects where dockless systems were better or worse than Pronto. For positive aspects of Pronto, ‘stations’ was central, with positive associations to location, helmet availability, ease of use, and annual memberships. However, co-occurrence and collocation analyses revealed ‘easy use’ as the most positively perceived aspect, highlighting the importance of considering keyword relationships beyond simple frequency. For negative aspects of Pronto, ‘stations’ was again central, with negative associations to insufficient numbers, poor locations, and limited service area. The sparse network indicated diverse negative opinions, which makes system improvements challenging to prioritize. For the advantages of dockless systems, ‘easy’ was central, indicating ease of finding, accessing, and using the bikes, along with convenience and an extensive coverage area. For the disadvantages of dockless systems, concerns focused on bikes blocking sidewalks and on low helmet usage. Responses were also segmented by user experience. Users of both systems offered the most detailed and consistent feedback, whereas those with no bike-sharing experience provided sparse information. Regardless of experience, sidewalk blockage and low helmet use were common concerns about dockless bikes. Docked bike users emphasized poor data use for planning by the dockless companies.
Discussion
The study's findings directly address the research question of how text network analysis can be used to understand public perceptions of bike-sharing systems. The results demonstrate the value of this approach in capturing nuances of opinion that traditional survey methods miss. The identification of ‘easy use’ as a key positive aspect of Pronto, together with the varied negative perceptions of Pronto, highlights the limitations of relying solely on keyword frequency analysis. The consistently identified concerns about sidewalk blockage and low helmet use with dockless systems underscore the need for targeted policy interventions. The observation that users of both systems provide the most detailed feedback suggests that surveying experienced users might be a more efficient strategy for future studies. The consistency of these qualitative findings with those from previous econometric studies supports the use of a hybrid approach that combines quantitative and qualitative methods.
Conclusion
This study presents the first application of text network analysis to investigate public perceptions of bike-sharing systems. It demonstrates the method’s value in providing rich insights into user opinions, revealing subtle relationships between keywords and addressing limitations of previous text mining approaches. Key findings highlight the importance of station location, ease of use, and helmet usage. The study shows that negative perceptions are diverse, especially for docked systems. The consistent concerns regarding sidewalk blockage and helmet use for dockless systems call for policy interventions. Future research should explore hybrid approaches that combine text network analysis with econometric methods and demographic segmentation for a more comprehensive understanding.
Limitations
The study’s reliance on a single city (Seattle) limits generalizability. The survey’s sampling method might not fully represent the entire population of Seattle. The subjective nature of text analysis necessitates careful interpretation of results. Despite statistical significance testing for collocations, the qualitative nature of the findings means no direct statistical relationships can be established between keywords and respondents' perceptions. Finally, the choice to analyze only the top 40 keywords may have omitted valuable insights from less frequent words.