Self-organizing maps of typhoon tracks allow for flood forecasts up to two days in advance
L. Chang, F. Chang, et al.
Taiwan is highly vulnerable to western North Pacific tropical cyclones due to its mountainous topography and short, steep rivers that respond rapidly to typhoon rainfall. Climate change is expected to increase the frequency and intensity of damaging typhoons, and orographic effects cause intense, localized rainfall and rapid runoff. Existing flood forecasts based on rainfall–runoff models typically provide only short lead times (on the order of hours), which are insufficient for reservoir drawdown and flood defense planning that require days. Improved forecasting is needed that links typhoon track predictions to expected watershed inflows. This study aims to digitize analog typhoon tracks, cluster similar tracks using self-organizing maps (SOM), derive event-specific flow characteristic curves (FCCs), and use these to predict reservoir inflow hydrographs up to two days before landfall with continual updates, thereby improving early warning for flood control and water supply management.
Prior work has clustered typhoon tracks using methods such as K-means and fuzzy clustering, and has used AI and remote sensing to improve rainfall–runoff forecasting. However, many approaches rely on limited trajectory points and do not fully integrate track–terrain interactions for site-specific rainfall and inflow prediction. Improved sensor networks in Taiwan have facilitated access to remote sensing data, enabling machine learning applications. Traditional rainfall–runoff models have demonstrated reliable real-time forecasts up to about six hours before events, which is inadequate for reservoir operations. Recent AI studies have effectively handled high-dimensional hydro-meteorological data, but there remains a need to convert full analog typhoon tracks into digital vectors to better couple track patterns with watershed responses, and to bridge track prediction with longer-lead flood hydrograph forecasting.
Study area and data: The Shihmen Reservoir watershed (763 km²; capacity 197 million m³; annual precipitation ~2500 mm) experiences about three typhoons per year and requires pre-typhoon drawdown for flood control. Data from 97 typhoons (1965–2019) that produced reservoir inflow >600 m³/s were compiled, including total rainfall, typhoon tracks, Central Weather Bureau (CWB) warning timestamps, and hourly reservoir inflows from the Water Resources Agency (WRA). The maximum recorded flow was 8594 m³/s. Of the 97 events, 87 were used for training and 10 for testing.
Pipeline (four modules):
- Typhoon track vectorization: Analog tracks were projected onto a base 5×5 grid spanning 116–126°E and 20–30°N, refined near the watershed into variable-size cells (huge 2×2°, large 1×1°, medium 0.5×0.5°, small 0.25×0.25°) to capture local sensitivity. Across the domain, 277 grid cells were defined. For each event, grid cells traversed by the track were assigned base weights by grid size (1 huge, 2 large, 3 medium, 4 small). A diffusion process accounted for near-track uncertainty by assigning nonzero weights to adjacent zero-weight cells: for large and medium cells, adjacent zero-weight neighbors received 0.8 and 1.5, respectively; for small cells, two layers received 2.5 (radius 1) and 1 (radius 2). The result was a 277-dimensional weighted vector per typhoon, preserving direction, speed, and duration characteristics.
- SOM clustering: The 87 training vectors were clustered using self-organizing maps. Candidate SOM sizes (3×3, 4×4, 5×5) were evaluated; 4×4 neurons provided the best balance between distinct classification and adequate samples per cluster. The SOM topological map grouped similar-shaped tracks into neighboring neurons, reflecting spatially coherent rainfall impacts on the Shihmen watershed. Tracks approaching the northern coast with greater watershed rainfall tended to fall in clusters #9, #10, #13, and #14, which produced larger inflows, while tracks missing the watershed fell in clusters such as #7, #8, and #12 with low inflows.
- FCC extraction: For each event, an FCC (normalized cumulative inflow versus normalized duration) was derived from the observed inflow hydrograph. Normalization mapped cumulative inflow volume and typhoon duration (TD) to [0,1]. Three TD schemes were tested to align FCC shapes within clusters: (i) arrival–departure of the typhoon across the gridded zone; (ii) onset of rising limb to departure; (iii) onset of rising limb to cessation of rainfall. The latter two schemes reduced effective TD and improved shape similarity within clusters by excluding inconsistent pre-rise and post-departure tails. FCCs were tagged to their SOM clusters. For operational prediction, total flow volume is estimated from forecast total rainfall, allowing use of cluster-representative FCCs without needing full rainfall-runoff histograms.
- Flood hydrograph prediction: For an approaching typhoon, its forecast track is matched to the best SOM cluster (best-matched cluster). Two FCC selection strategies were used: (a) best-matched track FCC (the historical event within the cluster whose track most closely matches the forecast); (b) ensemble average FCC of all events in the best-matched cluster. The predicted hydrograph is obtained by scaling the selected FCC by forecast total inflow volume (from forecast total rainfall) and stretching over the chosen TD scheme.
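The FCC extraction and hydrograph prediction steps above can be illustrated with a short sketch. This is not the authors' code: the function names, the hourly time step, the 101-point resampling grid, and the linear interpolation are all assumptions made for illustration. It shows an FCC built from an observed hourly hydrograph, an ensemble average over several FCCs, and a predicted hydrograph obtained by scaling an FCC by a forecast total volume and stretching it over a chosen duration.

```python
import numpy as np

def flow_characteristic_curve(inflow_m3s, n_points=101):
    """Normalized cumulative inflow sampled on a normalized duration grid [0, 1].

    Assumes an evenly spaced (e.g. hourly) inflow series; n_points is an
    illustrative choice, not a value taken from the study.
    """
    q = np.asarray(inflow_m3s, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(q)))  # curve starts at (0, 0)
    cum /= cum[-1]                               # normalize volume to [0, 1]
    t_obs = np.linspace(0.0, 1.0, len(cum))      # normalize duration to [0, 1]
    t = np.linspace(0.0, 1.0, n_points)
    return np.interp(t, t_obs, cum)

def ensemble_fcc(curves):
    """Point-wise mean of FCCs that share the same normalized time grid."""
    return np.mean(np.vstack(curves), axis=0)

def predict_hydrograph(fcc, total_volume_m3, duration_h):
    """Scale an FCC by total volume and stretch it over duration_h hours.

    Returns the mean inflow (m³/s) in each hour of the event.
    """
    t = np.linspace(0.0, 1.0, len(fcc))
    hours = np.arange(duration_h + 1)
    cum_volume = total_volume_m3 * np.interp(hours / duration_h, t, fcc)
    return np.diff(cum_volume) / 3600.0          # m³ per hour -> m³/s

# Hypothetical five-hour event, used only to exercise the functions.
observed = [100.0, 300.0, 800.0, 400.0, 200.0]
fcc = flow_characteristic_curve(observed)
volume = sum(observed) * 3600.0
predicted = predict_hydrograph(fcc, volume, duration_h=10)
```

Stretching the same FCC over a longer duration lowers the peak while conserving the total volume, which is why the choice of TD scheme matters for both peak magnitude and timing.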
Benchmarking: A conceptual storage function model (SFM) was calibrated on the 87 training events and applied to the 10 test events using historical rainfall patterns (simulations). Performance comparisons used RMS error and R², summarized in Supplementary Table 1.
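For reference, the two comparison metrics can be computed as in the sketch below. The summary does not say which definition of R² the study used; the coefficient of determination (1 minus the ratio of residual to total sum of squares) is assumed here, and the function names are illustrative.

```python
import numpy as np

def rmse(observed, predicted):
    """Root-mean-square error between two hydrographs of equal length."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)     # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```

If the study instead used the squared Pearson correlation, a constant bias in the prediction would not lower R², so the two definitions can rank models differently.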
Implementation details: The approach supports continual updates as track forecasts evolve during the event. Track forecast errors are partially mitigated by SOM topology, where neighboring clusters have similar behavior. Data and custom codes are available from the authors per the paper.
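Because the authors' code is not reproduced here, the clustering and cluster-matching steps can only be sketched. The following is a minimal self-organizing map in NumPy, not the study's implementation: the learning-rate and neighborhood schedules, iteration count, sample-based initialization, and zero-based row-major neuron numbering are all assumptions. It trains a 4×4 map on 277-dimensional track vectors and returns the best-matching neuron for a new forecast vector.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns a (rows*cols, n_features) weight matrix."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialize neuron weights from randomly chosen training vectors.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    # Position of each neuron on the 2-D map, in row-major order.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for i in range(n_iter):
        frac = i / n_iter
        lr = lr0 * (1.0 - frac)                       # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac    # shrinking neighborhood
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)
    return weights

def best_matching_unit(weights, x):
    """Index of the neuron whose weights are closest to vector x."""
    x = np.asarray(x, dtype=float)
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

# Synthetic stand-in for the 87 training vectors (277 grid-cell weights each).
rng = np.random.default_rng(1)
tracks = rng.random((87, 277))
som = train_som(tracks)
cluster = best_matching_unit(som, tracks[0])
```

The neighborhood term is what pulls adjacent neurons toward similar track shapes during training, which is the property the text relies on when a forecast track with moderate error lands in a neighboring cluster.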
- The SOM-based clustering of 87 training events produced 16 track clusters with clear hydrologic signatures: clusters #9, #10, #13, and #14 were associated with larger peak inflows (>5000 m³/s in highest cases), while clusters #7, #8, #12 produced lower inflows.
- Across 87 events, time to peak (TP) ranged 7–55 h, with mean TP per cluster generally <24 h, indicating rapid watershed response that complicates reservoir operations without long lead times.
- Peak flow (QP) among events ranged from 662 to 8594 m³/s. Clusters #1–#8 generally had highest QP <3000 m³/s, while clusters #9, #10, #13, #14 included highest QP values >5000 m³/s.
- Redefining typhoon duration improved intra-cluster FCC consistency: for example, cluster #5 TD range narrowed from 48–167 h (scheme 1) to 35–120 h (scheme 2) and to 23–66 h (scheme 3).
- Predictive performance on 10 independent test typhoons (2013–2019) showed that predicted hydrographs generally matched observed hydrographs with acceptable peak timing and magnitude errors, supporting forecasts issued up to two days before landfall, early enough for actionable reservoir management.
- Case studies: Typhoon Fitow (cluster #15) was predicted almost perfectly using the best-matched track FCC (Typhoon Cora) under TD schemes 2 and 3; ensemble FCC also performed well under scheme 3. Typhoon Soulik (cluster #13) was suitably predicted using both best-match and ensemble strategies under scheme 3. Typhoon Dujuan (cluster #3) was well predicted using best-match (Typhoon Talim) and ensemble under schemes 2–3.
- Compared to the storage function model (SFM), the AI method achieved smaller RMS errors and larger R² across all 10 test events, especially for high-peak events (e.g., Typhoon Aere), where SFM underestimated peaks while the AI best-match strategy reproduced them closely.
- Robustness to track forecast error: An 80 km forecast error for Typhoon Lekima mapped the forecast to cluster #15 while the actual was cluster #16; resulting peak and timing differences remained within acceptable management ranges due to SOM neighborhood continuity.
- Operationally, the method provides early warnings up to two days before landfall and supports real-time updates as tracks evolve, aligning with reservoir drawdown needs (e.g., reducing from 245 m to 240 m requires ~11 h at 1000 m³/s or ~37 h at 300 m³/s).
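The drawdown figures in the last bullet can be checked with simple arithmetic. Both quoted cases imply roughly the same storage between 245 m and 240 m: 1000 m³/s × 11 h × 3600 s/h ≈ 39.6 million m³, and 300 m³/s × 37 h × 3600 s/h ≈ 40.0 million m³. The sketch below assumes a round 40 million m³ (a value inferred from those figures, not stated in the source) and treats the quoted rates as constant net release rates; the function name is illustrative.

```python
def drawdown_hours(volume_m3, net_release_m3s):
    """Hours needed to release volume_m3 at a constant net rate (m³/s)."""
    return volume_m3 / (net_release_m3s * 3600.0)

# Storage between 245 m and 240 m, inferred from the quoted figures (assumption).
STORAGE_245_TO_240_M3 = 40e6

fast = drawdown_hours(STORAGE_245_TO_240_M3, 1000.0)  # about 11 h
slow = drawdown_hours(STORAGE_245_TO_240_M3, 300.0)   # about 37 h
```

At the lower release rate the drawdown takes about a day and a half, which is why a forecast issued only a few hours ahead leaves too little time to act.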
The study demonstrates that digitizing full typhoon tracks and clustering them via a SOM, then associating clusters with normalized flow characteristic curves, effectively links forecasted track patterns to watershed inflow responses. This addresses the challenge of providing longer lead times than traditional hourly-based rainfall–runoff forecasts by bypassing the need for precise rainfall time series and instead using track-informed FCCs scaled by forecast total rainfall. The approach captures key topographic and trajectory influences, explaining why certain track types (e.g., northern-coast approaches) yield higher inflows. Results on independent events show that both peak magnitude and timing can be anticipated with sufficient accuracy to inform reservoir operations days in advance, improving flood defense and water supply reliability. The SOM topology lends error tolerance when track forecasts deviate moderately, maintaining reasonable predictions by mapping to neighboring clusters. Compared with a conceptual SFM, the AI approach better reproduces high peaks and overall hydrograph shapes, underscoring its suitability for typhoon-driven floods where nonlinearity and spatial heterogeneity challenge traditional models. Incorporating continued track updates enhances situational awareness during events.
This work introduces a practical AI framework that digitizes typhoon tracks, clusters them with a 4×4 self-organizing map, and employs cluster-specific flow characteristic curves to forecast reservoir inflow hydrographs up to two days before landfall with real-time updates. Applied to the Shihmen Reservoir watershed using 97 historical typhoons, the method outperforms a conceptual storage function model, especially for high-peak events, and exhibits robustness to moderate track forecast errors. The framework provides actionable early warnings for reservoir drawdown and flood defense while maintaining water supply. Future work should incorporate additional predictors such as total rainfall amount (operationally already used) and tropical cyclone velocity to further refine FCC selection and timing, extend the forecast horizon beyond two days as track and rainfall prediction skill improves, and generalize and validate the approach across diverse watersheds and storm regimes.
- Dependence on track and rainfall forecasts: Accuracy relies on the quality of forecasted typhoon tracks and total rainfall; large errors can lead to misclassification and degraded predictions.
- Timing uncertainties: Exact times of rainfall cessation and typhoon departure cannot be known a priori, requiring estimation from historical analogs or simple kinematic calculations.
- Subjective weighting in vectorization: Grid size and diffusion weights were selected via trial and error, which may influence clustering outcomes.
- Site specificity: The model is trained on Shihmen Reservoir events; transferability to other watersheds requires retraining and local calibration of grids and FCCs.
- Limited benchmarking: Comparative evaluation used SFM with historical rainfall patterns; broader comparisons with other modern models and true forecast inputs would strengthen validation.
- Data constraints: FCC scaling assumes reliable conversion from forecast total rainfall to total inflow volume, which may vary with antecedent conditions and storm structure.