Clustering-based adaptive ground motion selection algorithm for efficient estimation of structural fragilities
T. Kim, O. Kwon, et al.
The study addresses a challenge in performance-based earthquake engineering (PBEE): estimating structural fragilities when ground motions and nonlinear structural responses are highly uncertain and vary from record to record. Traditional approaches require many nonlinear time-history analyses, which makes them computationally expensive. Prior simplifications (e.g., single-degree-of-freedom (SDOF) proxies or modal pushover) can miss higher-mode effects and complex hysteresis. The research question is whether a smaller, adaptively chosen subset of records can reproduce the fragility obtained from the entire ground-motion set representing the site hazard. The paper proposes an algorithm that identifies the ground-motion features most influential to seismic demand and uses hierarchical clustering with adaptive selection to estimate fragility efficiently without sacrificing accuracy.
Within the PBEE framework, fragility is the conditional probability of failure given an intensity measure (IM). Numerous ground-motion selection and scaling procedures exist (e.g., spectrum-compatible and conditional spectrum methods), but they often still require large record sets. Incremental Dynamic Analysis (IDA) is a common tool for demand estimation. More efficient alternatives include SDOF idealizations with empirical equations (e.g., Vamvatsikos and Cornell) and modal pushover methods, but these may neglect higher-mode and complex hysteretic behaviors. Fragility can be estimated by second-moment methods (assuming a lognormal distribution of the IM at the limit state) or by regression-based cloud analysis using the full dataset of IM versus engineering demand parameter (EDP). The literature indicates that selection procedures based on a single IM may not capture near-collapse nonlinear behavior, which motivates multi-feature characterizations of ground motions and data-driven feature selection (e.g., Lasso) combined with clustering.
Overview: The method comprises (1) identifying ground-motion critical features via Lasso regression using extensive IDA data from a large set of bilinear SDOF systems and (2) an adaptive hierarchical clustering-based selection of records using those features, iterating until fragility convergence.
Dynamic analysis and fragility estimation: IDA is performed with a chosen IM (e.g., Sa(T1)) and EDP (e.g., drift). For second-moment fragility construction, the minimum IM at limit-state per record is assumed lognormal; parameters are estimated by sample mean and standard deviation of ln(IM) at failure.
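The second-moment construction described above can be illustrated with a short sketch. It assumes that each record contributes one IM value at which the limit state is first reached, and that these values are lognormally distributed; the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist, mean, stdev

def fit_lognormal_fragility(im_at_limit_state):
    """Return (mu_ln, beta): sample mean and standard deviation of ln(IM)."""
    logs = [math.log(x) for x in im_at_limit_state]
    return mean(logs), stdev(logs)

def fragility(im, mu_ln, beta):
    """P(limit state reached | IM = im) under the lognormal assumption."""
    return NormalDist(mu_ln, beta).cdf(math.log(im))

# Hypothetical Sa(T1) values (g) at which each record first reaches the limit state.
im_ls = [0.42, 0.55, 0.61, 0.70, 0.88, 1.05]
mu_ln, beta = fit_lognormal_fragility(im_ls)
median_im = math.exp(mu_ln)  # IM with 50% probability of reaching the limit state

for im in (0.4, median_im, 1.0):
    print(f"Sa(T1) = {im:.2f} g -> P = {fragility(im, mu_ln, beta):.3f}")
```

By construction the fragility equals 0.5 at the median IM, exp(mu_ln), and the dispersion beta controls how steeply the curve rises around it.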
Candidate features and preprocessing: Twenty-eight ground-motion features from basic, peak, cumulative, and mixed categories are considered (e.g., duration tstrong, zero-crossing during strong motion fstrong, magnitude, epicentral distance, soil class; peak measures PGA, PGV, PGD, spectral measures Sa/Sv/Sd at T1; integrated/geometric spectral intensities, standardized spectra; cumulative indices CAA/CAV/CAD/SCAA/Arias intensity; mixed Fajfar index). For feature calculation, all records are scaled to a common IM level (e.g., Sa(T1)=1g), then standardized (zero mean, unit variance). Features unaffected by scaling (e.g., magnitude, soil class) are included as-is.
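The preprocessing step can be sketched as follows, under two assumptions that the summary does not spell out: records are scaled by a single amplitude factor, so a feature that is linear in amplitude (such as PGV) is multiplied by the same factor, and standardization uses the population standard deviation. The record values and helper names are hypothetical.

```python
from statistics import mean, pstdev

def scale_factor(sa_t1, target=1.0):
    """Amplitude factor that brings a record's Sa(T1) to the target level (g)."""
    return target / sa_t1

def standardize(values):
    """Shift and scale a feature to zero mean and unit variance across records."""
    m, s = mean(values), pstdev(values)  # assumption: population std
    return [(v - m) / s for v in values]

# Hypothetical records with unscaled Sa(T1) in g and PGV in cm/s.
records = [
    {"sa_t1": 0.25, "pgv": 20.0},
    {"sa_t1": 0.50, "pgv": 30.0},
    {"sa_t1": 0.80, "pgv": 64.0},
]

# PGV is linear in amplitude, so it is multiplied by the record's scale factor.
scaled_pgv = [r["pgv"] * scale_factor(r["sa_t1"]) for r in records]
z_pgv = standardize(scaled_pgv)
print(scaled_pgv, z_pgv)
```

Features that are not linear in amplitude would need their own scaling rule or recomputation from the scaled record, and scale-invariant features such as magnitude skip the scaling step entirely.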
Identification of critical features with Lasso: A comprehensive dataset is created from 1,692 bilinear kinematic-hardening SDOF systems (47 periods over 0.1–2.5 s, 9 normalized yield strengths over 0.1–1.2 g, and 4 post-yield stiffness ratios over 0–0.3, giving 47 × 9 × 4 = 1,692 combinations) and 135 NGA-West records (M>6, 15<R<35 km, soil classes A–D). For each system and each of three displacement limit states (2dy, 4dy, 6dy), IDA is performed and Sa(T1) at the limit state is recorded. Lasso regression is applied to ln Sa(T1) at the limit state versus the logarithms of the features, using a multiplicative base model transformed to additive form. Feature importance is evaluated by (a) sensitivity (squared coefficient magnitude) and (b) frequency (number of systems in which the coefficient is nonzero). Features are ranked by their distance from the origin in the sensitivity–frequency plane. OLS models with progressively more features quantify the reduction in residual standard deviation; the smallest set achieving 85% of the total reduction is designated critical.
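The ranking and cutoff logic can be sketched as below. This is an illustration only: it assumes the Lasso coefficients have already been fitted, averages squared coefficients across systems as the sensitivity measure, and normalizes sensitivity by its maximum so that both axes span [0, 1]. Those aggregation choices, the function names, and the numbers are assumptions, not details from the paper.

```python
import math

def rank_features(coefs):
    """coefs: {feature: [Lasso coefficient for each SDOF system]}."""
    sens = {f: sum(c * c for c in cs) / len(cs) for f, cs in coefs.items()}
    freq = {f: sum(c != 0.0 for c in cs) / len(cs) for f, cs in coefs.items()}
    s_max = max(sens.values())
    # Assumption: sensitivity is normalized by its maximum so both axes span [0, 1].
    dist = {f: math.hypot(sens[f] / s_max, freq[f]) for f in coefs}
    return sorted(coefs, key=dist.get, reverse=True)

def count_critical(residual_std, fraction=0.85):
    """residual_std[k]: OLS residual std using the top-k ranked features (k = 0..n)."""
    total = residual_std[0] - residual_std[-1]
    for k, s in enumerate(residual_std):
        if residual_std[0] - s >= fraction * total:
            return k
    return len(residual_std) - 1

# Hypothetical Lasso coefficients for three features across three systems.
coefs = {
    "PGV": [0.8, 0.6, 0.7],
    "PGD": [0.0, 0.2, 0.0],
    "Magnitude": [0.0, 0.0, 0.1],
}
ranking = rank_features(coefs)

# Hypothetical residual standard deviations for OLS models with 0..3 features.
residual_std = [0.50, 0.35, 0.30, 0.29]
n_critical = count_critical(residual_std)
print(ranking, ranking[:n_critical])
```

Here the first two ranked features already deliver more than 85% of the achievable reduction in residual standard deviation, so the third is not designated critical.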
Critical features: Nine features are identified as critical for Sa(T1)-based IDA and drift-based EDP: Sveff(T), Spectral shape, Sdeff(T), Soil class, fstrong, Sageo(T), PGV, CAA, and PGD.
Adaptive hierarchical clustering algorithm: In the space of the nine critical features, hierarchical clustering with Ward linkage is performed. Starting with K=2 clusters, one record per cluster is randomly selected for analysis. Fragility is estimated assuming non-selected records in a cluster share the representative record’s result at the limit-state, so the effective sample count equals the original set size. Convergence is checked using a fragility-difference (FD) metric based on differences in quantiles normalized by the base fragility’s mean IM; two criteria must be below prescribed tolerances (ε1, ε2). If not converged, increment K and repeat, reusing previous analyses. Guidance on ε: to achieve FD<0.03 broadly, ε≈0.001–0.0015 is recommended.
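The fragility estimate inside each iteration can be sketched as a cluster-weighted second-moment fit. The example below covers only that step and takes the cluster labels as given; the clustering itself, the FD metric, and the tolerance checks are omitted. The function name, the use of the n-1 sample variance, and the data are assumptions.

```python
import math
from collections import Counter
from statistics import mean, stdev

def cluster_weighted_lognormal(labels, rep_im):
    """Lognormal parameters when every record inherits its cluster representative's result.

    labels: cluster id for each record in the full set.
    rep_im: {cluster id: IM at limit state of the analysed representative}.
    """
    sizes = Counter(labels)
    n = len(labels)  # effective sample count equals the full set size
    mu = sum(sizes[c] * math.log(rep_im[c]) for c in sizes) / n
    var = sum(sizes[c] * (math.log(rep_im[c]) - mu) ** 2 for c in sizes) / (n - 1)
    return mu, math.sqrt(var)

# Hypothetical example: 6 records in K = 3 clusters, one analysed record per cluster.
labels = [0, 0, 0, 1, 1, 2]
rep_im = {0: 0.5, 1: 0.8, 2: 1.2}
mu, beta = cluster_weighted_lognormal(labels, rep_im)

# Equivalent check: copy each representative's result to every record in its cluster.
expanded = [math.log(rep_im[c]) for c in labels]
print(mu, mean(expanded), beta, stdev(expanded))
```

Weighting by cluster size means that only K nonlinear analyses are run while the estimate still reflects how the full record set is distributed over the feature space.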
Variant for cloud analysis: When using regression-based fragility from the IM–EDP cloud (ln EDP = α ln IM + β with homoscedastic residuals), clustering still uses the critical features but all IDA points inform the regression. Convergence is defined by relative changes in α and β between iterations.
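A minimal sketch of the cloud-analysis variant follows, assuming an ordinary least-squares fit in log space and a constant residual standard deviation. The tolerance value in `converged`, the helper names, and the IM–EDP pairs are hypothetical.

```python
import math
from statistics import NormalDist

def fit_cloud(im, edp):
    """OLS fit of ln(EDP) = alpha * ln(IM) + beta; returns (alpha, beta, sigma)."""
    x = [math.log(v) for v in im]
    y = [math.log(v) for v in edp]
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    alpha = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    beta = ym - alpha * xm
    sse = sum((yi - alpha * xi - beta) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(sse / (n - 2))  # homoscedastic residual standard deviation
    return alpha, beta, sigma

def cloud_fragility(im, edp_limit, alpha, beta, sigma):
    """P(EDP >= edp_limit | IM = im) from the fitted regression."""
    z = (math.log(edp_limit) - alpha * math.log(im) - beta) / sigma
    return 1.0 - NormalDist().cdf(z)

def converged(prev, curr, tol=0.01):
    """Relative change in (alpha, beta) between iterations; tol is hypothetical."""
    return all(abs(c - p) <= tol * abs(p) for p, c in zip(prev, curr))

# Hypothetical cloud: Sa(T1) in g versus peak drift ratio.
im = [0.2, 0.4, 0.6, 0.9, 1.3]
edp = [0.004, 0.009, 0.012, 0.021, 0.028]
alpha, beta, sigma = fit_cloud(im, edp)
print(alpha, beta, sigma, cloud_fragility(0.8, 0.015, alpha, beta, sigma))
```

In an adaptive loop, `fit_cloud` would be rerun each time new records are analysed, and `converged` would compare the latest (alpha, beta) pair with the previous one.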
- Nine critical ground-motion features were identified (Sveff(T), Spectral shape, Sdeff(T), Soil class, fstrong, Sageo(T), PGV, CAA, PGD) that capture demand variability across many SDOF systems and limit-states.
- Using the adaptive clustering algorithm with these features significantly reduces the number of records needed while maintaining near-optimal fragilities:
  - Bilinear SDOF example (135 NGA-West records): convergence with 49, 50, and 46 records for limit-states 2dy, 4dy, and 6dy, respectively; estimated curves closely match the optimal (full-set) fragilities, whereas random subsets of the same size show large variability.
  - Three-story RC frame (135 recorded motions): convergence with 55 (serviceability), 54 (damage control), and 46 (collapse prevention) records; about 40% of the full set sufficed with good agreement to the optimal fragilities.
  - Three-story steel MRF (200 synthetic motions): convergence with 52 records; performance robust to change in full-set composition and limit-state definition (IM- vs DM-based).
  - Nine-story steel MRF (135 recorded motions): with ε=1e-3, about 75 records were required and the match was weaker due to higher-mode effects; tightening ε to 7e-4 yielded near-optimal fragility using 115 records.
- Sensitivity to feature choice: using non-critical (random) features degraded performance, showed larger FD fluctuations, and risked local-optimum convergence.
- Sensitivity to tolerances: with looser ε, the average iteration count decreases and the final FD increases; to achieve FD<0.03 broadly across the 1,692 SDOF systems, ε should be ≤0.0015 (the study adopts 0.001).
- Cloud-analysis variant on the nine-story steel MRF achieved close agreement with only 51 records and rapid convergence of regression coefficients, indicating that using the full IM–EDP relationship can further improve efficiency.
- Overall, the algorithm is robust across structural types, ground-motion sets, and limit-state definitions for low- to mid-rise systems; performance is comparatively weaker for systems with strong higher-mode effects unless stricter tolerances (and thus more records) are used.
The proposed approach directly addresses the computational burden of fragility estimation by leveraging data-driven identification of demand-influential ground-motion features and clustering records in that feature space. This enables selection of a minimal yet representative subset while preserving the statistical characteristics of the full hazard-consistent set. The FD-based convergence effectively quantifies differences in both mean and dispersion of fragilities, ensuring the estimated curve approaches the full-set baseline as clusters are refined. Numerical studies show that the method yields near-optimal fragilities with roughly 30–60% of the records for SDOF and low-rise MDOF systems and remains effective across different record sets and limit-state definitions. For taller buildings where higher modes contribute significantly, the SDOF-derived features are less predictive, necessitating either tighter convergence tolerances (bringing the result closer at the cost of more analyses) or enriched features that account for multi-modal behavior. The cloud-analysis variant mitigates some limitations by exploiting the full IM–EDP data, further enhancing efficiency. These findings suggest the method offers a practical, broadly applicable pathway to efficient, accurate fragility assessment within PBEE workflows.
The paper introduces a clustering-based adaptive ground motion selection algorithm that identifies and uses critical ground-motion features to adaptively select representative records for efficient fragility estimation. Key contributions include: (1) a systematic Lasso-based procedure to prioritize and select nine critical features that broadly capture demand variability; (2) a hierarchical clustering and adaptive selection framework with an FD-based convergence criterion; and (3) demonstrations on SDOF and multiple MDOF buildings showing substantial reductions in required analyses while achieving fragilities compatible with full-set results. The approach is applicable to both second-moment and cloud-based fragility estimation. Future work should develop feature sets that explicitly account for higher-mode participation and complex hysteretic behaviors (e.g., deterioration, pinching), extend the methodology to near-fault motions (e.g., directivity pulses), and explore data-driven or physics-informed features that could generalize across assessment methods and potentially lead to new, more predictive IMs.
- Critical features were identified using idealized bilinear SDOF systems; performance degrades for taller systems with significant higher-mode effects unless stricter convergence tolerances and more records are used.
- The selected features may not fully represent behaviors with complex hysteresis (e.g., strength deterioration, pinching), potentially limiting accuracy near collapse.
- The feature set and algorithm were demonstrated on far-field records; near-fault motions may require different or additional features.
- Results depend on the chosen IM–EDP pairing and fragility estimation method; differences were observed between second-moment and cloud approaches.
- Assumptions such as lognormality of IM at limit-state (for second-moment) and homoscedastic regression residuals (for cloud) may not always hold.

