
A fragment-based approach for computing the long-term visual evolution of historical maps
R. Petitpierre, J. H. Uhl, et al.
Discover an innovative methodology in cartographic stylometry that treats maps as visual languages. By analyzing 10,000 French and Swiss maps from 1600 to 1950, this research by Remi Petitpierre, Johannes H. Uhl, Isabella di Lenardo, and Frédéric Kaplan reveals evolving abstraction processes and cultural shifts in mapping practices.
Introduction
The study investigates maps as culturally constructed artifacts and proposes a quantitative, corpus-based methodology—cartographic stylometry—to analyze their visual language over time. Motivated by digitized historical map collections and inspired by corpus linguistics and structuralist/semiotic concepts, the authors aim to identify formal, distinctive features of cartographic figuration. They view maps as visual language systems and define a fragment-based approach using small image units (mapels) to operationalize visual features. The research questions include: how to characterize cartographic figuration computationally; how to measure figurative distance between maps; and how time and scale relate to figurative similarity. The methodology contrasts higher-level coherent structures (map series) to learn pertinent features. Hypotheses: an optimized feature space will (1) distinguish series depicting the same place and (2) distinguish series at similar scales; conversely, highly standardized Napoleonic cadastral series should be largely indistinguishable due to uniform conventions. Figuration is used here to denote the graphical configuration of features on map images (not their semantic meaning).
Literature Review
The paper situates its work at the intersection of history of cartography, semiotics, corpus linguistics, and computer vision. It builds on views of maps as social constructions and power/knowledge practices (Harley; Crampton; Foucault), the role of cartography in state formation (Branch; Schulten), and cartographic semiology (Bertin). From corpus linguistics and distributional approaches (Davies; Hardie & McEnery; Hilpert & Gries), the authors adapt tokenization concepts to visual analysis (“distant viewing,” Arnold & Tilton). Prior work on map symbol catalogs (Dainville) is noted as limited and subjective. In computer vision for historical maps, prior efforts include semantic segmentation and feature extraction (Hosseini et al.; Uhl et al.; Petitpierre et al.). The authors draw on semiotic principles (pertinence), contrastive learning ideas, and interpretable features from image analysis (HOG, LBP, Otsu thresholding, edge density) to design a transparent, explainable stylometric framework.
Methodology
Corpus: 10,046 digitized French and Swiss maps (1600–1950) from 32 institutions (e.g., BnF: 4,817; University Library of Zurich: 1,452), restricted to European France (75%) and Switzerland (25%). Maps of France held in Swiss collections, and vice versa, were excluded to avoid conflating cultural contexts. Maps were selected by period, scale (1:200 to 1:53,500), and coverage; some scales were estimated by expert review. Large series were subsampled at 2–5%, with a minimum of 5 maps per series. The metadata covers more than 1,800 publishers and 2,700 creators; geocoding applied Nominatim to extracted place names, followed by manual checks.
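The subsampling rule for large series can be made concrete with a short Python sketch. The helper name `sample_series`, its default rate, and the use of uniform random sampling without replacement are assumptions for illustration; the summary only states the 2–5% rate and the 5-map minimum.

```python
import math
import random


def sample_series(map_ids, rate=0.03, min_per_series=5, seed=0):
    """Hypothetical helper: subsample one large map series.

    Keeps ceil(rate * N) maps, but never fewer than min_per_series
    (and never more than the series actually contains).
    """
    map_ids = list(map_ids)
    n_keep = max(min_per_series, math.ceil(rate * len(map_ids)))
    n_keep = min(n_keep, len(map_ids))
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return rng.sample(map_ids, n_keep)


print(len(sample_series(range(1000), rate=0.05)))  # → 50 (5% of 1,000 maps)
print(len(sample_series(range(40), rate=0.05)))    # → 5 (the minimum applies)
```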
Series for contrastive learning: 11 coherent series (1,063 maps, about 10% of the corpus) spanning different topics, scales, periods, and places: three Lausanne cadastral series (Melotte 1721–27; Berney 1827–31; Deluz 1880–86), three Napoleonic cadastral series (Rhône 1808–68; Côtes d’Armor 1800–66; Haute-Marne 1800–50), two Paris city atlases (Jacoubet 1825–40; Alphand 1860–89), and three national topographic series (French Etat-Major 1:40,000, 1831–65; IGN 1:50,000, 1926–63; Swisstopo maps including the Siegfried map, 1870–1949).
Preprocessing and segmentation: A supervised semantic segmentation model (OCRNet with an HRNet-V2p-W48 backbone, pretrained on Cityscapes) separates geographic content from non-geographic map areas (legends, frames, scanner margins). The model was trained on 1,061 manually annotated maps (70/20/10% split) and reaches an IoU of 87.7% (pixel accuracy 90.5%). Low-information areas are then filtered with the ED2 edge-density metric: tiles with ED2 < 2.75 are excluded.
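A minimal sketch of the low-information filter, assuming tiles are 2-D grayscale arrays scaled to [0, 1]. The exact ED2 definition is not given here, so `edge_density_percent` is a hypothetical stand-in that reports the percentage of pixels whose finite-difference gradient magnitude exceeds a fixed threshold; only the 2.75 cut-off comes from the text, and it is meaningful only if ED2 is expressed on a comparable percentage scale.

```python
import numpy as np


def edge_density_percent(gray, grad_threshold=0.1):
    """Hypothetical ED2 stand-in: % of pixels with gradient magnitude > grad_threshold.

    `gray` is a 2-D array scaled to [0, 1].
    """
    gy, gx = np.gradient(gray.astype(float))  # derivatives along axis 0 (y) and axis 1 (x)
    magnitude = np.hypot(gx, gy)
    return 100.0 * float(np.mean(magnitude > grad_threshold))


def keep_informative_tiles(tiles, min_density=2.75):
    """Drop tiles whose edge density falls below the cut-off."""
    return [t for t in tiles if edge_density_percent(t) >= min_density]


blank = np.zeros((50, 50))
striped = np.zeros((50, 50))
striped[:, ::5] = 1.0  # a thin vertical line every 5 pixels
kept = keep_informative_tiles([blank, striped])
print(len(kept))  # → 1: only the striped tile survives
```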
Fragmentation into mapels: Each map is sampled as 800 mapels (50×50 px tiles), drawn only from geographic regions that are neither empty nor low-load. Tile positions are initialized at random; each tile is then recentered on the most salient local feature. The dominant orientation, computed from the Histogram of Oriented Gradients (HOG), is used to rotate the mapel to a neutral orientation, and the original orientation value is stored as an explicit feature.
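The recentering and orientation steps might look like the following numpy sketch. Both choices are assumptions: salience is approximated by gradient magnitude, and the dominant orientation is taken from a single magnitude-weighted, 12-bin histogram over the whole tile rather than a full cell-and-block HOG descriptor. The helper names `recenter_tile` and `dominant_orientation` are hypothetical.

```python
import numpy as np


def recenter_tile(image, cy, cx, size=50, search=10):
    """Cut a size x size tile around the strongest-gradient pixel near (cy, cx).

    Salience is approximated by gradient magnitude (an assumption).
    """
    h, w = image.shape
    y0, y1 = max(cy - search, 0), min(cy + search + 1, h)
    x0, x1 = max(cx - search, 0), min(cx + search + 1, w)
    gy, gx = np.gradient(image.astype(float))
    window = np.hypot(gx, gy)[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(window), window.shape)
    half = size // 2
    # Clip so the tile stays inside the image.
    top = int(np.clip(y0 + dy - half, 0, h - size))
    left = int(np.clip(x0 + dx - half, 0, w - size))
    return image[top:top + size, left:left + size]


def dominant_orientation(tile, n_bins=12):
    """Peak of a magnitude-weighted histogram of unsigned gradient angles, in degrees.

    Bins are centred on 0, 15, 30, ... so that axis-aligned strokes fall
    in the middle of a bin rather than on a bin edge.
    """
    gy, gx = np.gradient(tile.astype(float))
    magnitude = np.hypot(gx, gy)
    bin_width = 180.0 / n_bins
    angle = (np.degrees(np.arctan2(gy, gx)) + bin_width / 2) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    return float(np.argmax(hist) * bin_width)


image = np.zeros((200, 200))
image[100, 105] = 1.0  # a single salient dot
tile = recenter_tile(image, 100, 100)
print(tile.shape, tile.sum())  # (50, 50) 1.0

vertical = np.zeros((50, 50))
vertical[:, ::5] = 1.0
print(dominant_orientation(vertical))    # 0.0: gradients point across the vertical lines
print(dominant_orientation(vertical.T))  # 90.0
```

The returned angle is the gradient orientation, which is perpendicular to the strokes themselves. The tile could then be rotated by the negative of this angle with an image-rotation routine such as `scipy.ndimage.rotate`, while the angle itself is kept as the orientation feature.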
Candidate features (13 total) computed per mapel: color distributions (RGB histogram variants; peak, std, skew, kurtosis per channel), morphology (HOG variants), texture (LBP), graphical load (Otsu dark pixel proportion, ED2 edge density, number of connected components), line width (skeletonization ratio), and orientation (raw HOG peak; binned HOG pattern). Multiple parameterizations were tested for HOG (12 bins; cell sizes 5/10/25) and LBP (radii 2/3/4). Color features used per-channel histograms (6 or 9 bins) and 256-bin statistics.
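Two of the graphical-load features can be sketched in numpy: the proportion of dark pixels under an Otsu threshold, and the number of connected components in the resulting dark mask. This assumes 8-bit grayscale mapels and 4-connectivity, and the function names are hypothetical; libraries such as scikit-image provide tested implementations of Otsu thresholding, skeletonization, HOG, and LBP.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu threshold for an 8-bit grayscale array; pixels <= t count as dark."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # weight of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0       # thresholds that leave one class empty
    return int(np.argmax(between))


def graphical_load(gray):
    """Proportion of dark pixels under the Otsu threshold."""
    return float(np.mean(gray <= otsu_threshold(gray)))


def count_components(mask):
    """Number of 4-connected foreground regions (iterative flood fill)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            count += 1
            seen[y, x] = True
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return count


mapel = np.full((50, 50), 255, dtype=np.uint8)
mapel[5:10, 5:10] = 0     # two separate dark blots
mapel[20:25, 20:40] = 0
dark = mapel <= otsu_threshold(mapel)
print(graphical_load(mapel), count_components(dark))
```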
Stylometric distance and optimization: Mapel analogy is defined by a radius of free variation k: two mapels are analogous if their distance in feature space is d < k. The stylometric map distance D(A,B) is derived from the proportion of mapels in each map that have at least one analogous counterpart in the other map, a pointwise criterion reminiscent of the Hausdorff distance. The optimization seeks a feature space that maximizes inter-series distances and minimizes intra-series distances, while constraining dispersion to avoid sparse or inhomogeneous spaces. A genetic algorithm (GA) explores the inclusion or exclusion of features, their weights (0.05–1), and their internal parameters; for each candidate, k is optimized by line search. Features are normalized before optimization. The GA uses a population of 20 with a parent ratio of 4/20 and runs until the objective plateaus, after about 80 generations.
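The analogy criterion and a contrastive objective can be sketched as follows. Several details are assumptions rather than the paper's definitions: feature weights scale a Euclidean distance, the two directional hit rates are averaged, the map distance is taken as one minus that shared proportion, and the objective is simply the mean inter-series distance minus the mean intra-series distance, without the dispersion constraint (so its values are not comparable to the reported ~3.59). The GA over feature subsets and weights is omitted; only a grid line search over k is shown, and all function names are hypothetical.

```python
import itertools

import numpy as np


def shared_proportion(A, B, k, weights):
    """Mean share of mapels in A and in B with an analogous mapel (distance < k) in the other map.

    A, B: arrays of shape (n_mapels, n_features); weights scale each feature (assumption).
    """
    diff = (A[:, None, :] - B[None, :, :]) * weights
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise weighted Euclidean distances
    a_hits = np.mean(d.min(axis=1) < k)    # mapels of A with a counterpart in B
    b_hits = np.mean(d.min(axis=0) < k)    # mapels of B with a counterpart in A
    return 0.5 * (a_hits + b_hits)


def stylometric_distance(A, B, k, weights):
    # Assumption: distance = 1 - shared proportion.
    return 1.0 - shared_proportion(A, B, k, weights)


def contrastive_objective(maps, series, k, weights):
    """Mean inter-series distance minus mean intra-series distance (simplified)."""
    inter, intra = [], []
    for i, j in itertools.combinations(range(len(maps)), 2):
        d = stylometric_distance(maps[i], maps[j], k, weights)
        (intra if series[i] == series[j] else inter).append(d)
    return float(np.mean(inter) - np.mean(intra))


def best_radius(maps, series, weights, candidates):
    """Grid line search over candidate radii k for one fixed weighting."""
    scores = [contrastive_objective(maps, series, k, weights) for k in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]


maps = [
    np.array([[0.0, 0.0], [0.1, 0.0]]),       # series "a"
    np.array([[0.05, 0.0], [0.0, 0.1]]),      # series "a"
    np.array([[10.0, 10.0], [10.1, 10.0]]),   # series "b"
    np.array([[10.0, 10.1], [10.05, 10.0]]),  # series "b"
]
series = ["a", "a", "b", "b"]
weights = np.ones(2)
print(best_radius(maps, series, weights, [0.01, 1.0, 100.0]))  # (1.0, 1.0)
```

A radius that is too small makes every pair of maps look unrelated, and one that is too large makes every pair look identical; both give an objective of 0 in this toy example, which is why k has to be tuned jointly with the feature weights.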
Mapotypes: After optimization, the feature space is subdivided iteratively with KMeans splits (K=4) until every cell has a radius r < k. Each resulting cell defines a canonical type (mapotype) that groups a set of variant mapels. For visualization, the mapel closest to each cell’s geometric center serves as its representative; these representatives are projected with t-SNE and arranged on a grid to form a “mapotypic mosaic.”
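The recursive subdivision into mapotypes can be sketched with a small numpy implementation of Lloyd's k-means applied to a feature matrix `X` (one row per mapel). Here a cell's radius is assumed to be the largest Euclidean distance from its centroid, the "geometric center" is assumed to be the centroid, and the helper names are hypothetical; a production version would more likely call an existing k-means implementation such as scikit-learn's `KMeans`.

```python
import numpy as np


def kmeans_labels(X, K, n_iter=25, seed=0):
    """Plain Lloyd's k-means; initial centres are K distinct points of X."""
    rng = np.random.default_rng(seed)
    uniq = np.unique(X, axis=0)
    K = min(K, len(uniq))
    centers = uniq[rng.choice(len(uniq), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        for c in range(K):
            if np.any(labels == c):          # leave empty clusters where they are
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def cell_radius(points):
    """Largest distance from the cell centroid (assumed definition of r)."""
    return float(np.sqrt(((points - points.mean(axis=0)) ** 2).sum(axis=1)).max())


def split_into_mapotypes(X, k, K=4, seed=0):
    """Recursively split the feature space with k-means until every cell has r < k."""
    cells, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        if cell_radius(X[idx]) < k:
            cells.append(idx)
            continue
        labels = kmeans_labels(X[idx], K, seed=seed)
        parts = [idx[labels == c] for c in np.unique(labels)]
        if len(parts) == 1:                  # cannot split further (e.g. duplicates)
            cells.append(idx)
        else:
            stack.extend(parts)
    return cells


def representative(X, idx):
    """Index of the mapel closest to the cell centroid."""
    pts = X[idx]
    return int(idx[np.argmin(((pts - pts.mean(axis=0)) ** 2).sum(axis=1))])


offsets = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1]])
X = np.vstack([np.array(c) + offsets for c in [(0, 0), (10, 0), (0, 10)]])
cells = split_into_mapotypes(X, k=1.0)
print(len(cells), [representative(X, c) for c in cells])
```

Because every split produces at least two non-empty parts, each recursion step shrinks the cells, so the loop always terminates; cells that cannot be split further are kept as they are.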
Statistical analyses: The distribution of mapotypes is stratified by time (nine uniform strata spanning 1605–1950) and by scale (six strata from 1:200 to 1:53,500). For each stratum, the Shapiro–Wilk W statistic measures departure from normality; a low W indicates a characteristic figuration. Kendall’s tau measures how similar the mapotype distributions of different scale strata are over time. Regional Kendall tests evaluate monotonic trends in interpretable features (graphical load, line width, number of components) across three scale-based regions, controlling for scale effects.
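The trend statistics can be illustrated in pure Python. The sketch computes the Mann–Kendall S statistic, a Kendall tau-a between two sequences, and a regional Kendall test that sums S and its no-ties variance across independent regions (here standing in for scale strata) before a two-sided normal approximation. Tie corrections are omitted, the function names are hypothetical, and the input values are toy numbers; `scipy.stats.kendalltau` and `scipy.stats.shapiro` are established library alternatives for tau and the Shapiro–Wilk W.

```python
import itertools
import math


def _sign(v):
    return int(v > 0) - int(v < 0)


def mann_kendall_s(values):
    """Mann-Kendall S: sum of sign(x_j - x_i) over all pairs i < j in time order."""
    return sum(_sign(values[j] - values[i])
               for i, j in itertools.combinations(range(len(values)), 2))


def mann_kendall_var(n):
    """Variance of S under the no-trend hypothesis (no tie correction)."""
    return n * (n - 1) * (2 * n + 5) / 18.0


def regional_kendall(regions):
    """Sum S and Var(S) over independent regions, then a two-sided normal test."""
    s = sum(mann_kendall_s(r) for r in regions)
    var = sum(mann_kendall_var(len(r)) for r in regions)
    if s > 0:
        z = (s - 1) / math.sqrt(var)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


def kendall_tau_a(x, y):
    """Kendall tau-a between two equally long sequences (no tie handling)."""
    pairs = list(itertools.combinations(range(len(x)), 2))
    concordance = sum(_sign(x[j] - x[i]) * _sign(y[j] - y[i]) for i, j in pairs)
    return concordance / len(pairs)


# Toy values for two regions, each in time order.
print(regional_kendall([[1, 2, 3], [2, 3, 4, 5]]))
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))  # -1.0: perfectly reversed rankings
```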
Key Findings
- Optimization outcome and feature selection: The contrastive optimization achieved a strong separation between series (objective ~3.59). The proximity matrix shows clear within-series similarity (diagonal) and between-series distinction, except for the Napoleonic cadasters, which remain similar, as hypothesized, because of their standardized conventions. Six of the 13 candidate features were retained: morphology (HOG) received the highest weight (1.0), followed by two graphical-load features (0.25 and 0.15), line width (0.25), texture (LBP, 0.1), and orientation, with a minimal weight (0.05). No color-distribution features were selected, indicating that color shade is not a strong stylometric discriminator in this setting.
- Mapotypes: 55,599 mapotypes were identified. Only 0.1% have fewer than 5 occurrences; over half (53%) have more than 100 occurrences and together account for 85% of all mapels. The mosaic is sensitive to linework, textures (hatching, grids), and textual elements.
- Temporal evolution and characteristic periods (SW W statistic): The early period (1605–1730) shows handcrafted shading, irregular hatching, thick dark lines, and iconography. Later (1731–1817), textures become more abstract (pastel-colored hatching) and vegetation and relief are represented symbolically; W rises to 0.92 by 1771–1817, indicating a more diffuse figuration. The 1818–1834 period is the most characteristic (W=0.81), dominated by thin lines and colored borders, coinciding with Napoleonic cadastral practices. The mid-19th century (1835–1866) shows regularized hatching and the rise of color printing; 1867–1884 remains characteristic, with dashed lines and more text; 1885–1950 features prevalent color, contour lines, and halftone dot patterns.
- Scale-related divergence: Large-scale (1:200–1:2,500) and small-scale (1:25,001–1:53,500) maps occupy distinct figurative spaces. Kendall’s tau across scale strata shifts from weakly positive association early to markedly negative by late 18th/early 19th century, indicating a durable bifurcation of styles by scale.
- Macroscopic trends (Regional Kendall tests): Graphical load shows no trend 1600–1807 (p>0.05), then increases 1808–1950 (n=28, p≈5×10⁻⁸, S=479). Line width decreases 1600–1834 (n=13, p≈8×10⁻¹⁰, S=−175), then increases 1835–1950 (n=22, p≈3×10⁻⁹, S=365). Number of components increases over 1600–1950 (n=35, p≈1×10⁻³, S=404). These patterns persist after controlling for scale strata.
- Qualitative validation: The method highlights technological and cultural shifts: copperplate engraving legacies (thick lines) in 17th century; abstraction and symbolic notation by 18th century; Napoleonic cadastral standardization with fine ink lines and regulated color borders early 19th; industrial innovations (mechanical rulings, lithography, chromolithography) enabling fine regular hatching and later widespread color; national topographic map production with precise contouring; photomechanical processes introducing halftone screens by early 20th century.
Discussion
Findings support the feasibility of a fragment-based, interpretable stylometric framework that captures diachronic shifts in cartographic figuration. Contrasting coherent series yields a feature space where morphology (linework structure) dominates stylistic discrimination, aligning with expert knowledge about print techniques and tools. The approach validates hypothesized behaviors: series depicting the same places at similar scales are distinguishable (e.g., Paris atlases), while highly standardized Napoleonic cadasters are not, reflecting production constraints. The diachronic distribution of mapotypes reveals an epistemological transition from iconographic, handcrafted aesthetics toward symbolic, systematized representations, and later a rise in overlaid informational layers (topography, ownership, infrastructure), consistent with administrative, military, and nation-state developments. The method quantitatively identifies a durable bifurcation in figuration between large- and small-scale maps beginning around the turn of the 19th century, offering a measurable account of genre differentiation. These results demonstrate that cartographic stylometry can bridge quantitative analysis with historical interpretation, scaling beyond close reading to reveal macroscopic cultural and technological dynamics.
Conclusion
The paper introduces a transparent, fragment-based methodology for cartographic stylometry that decomposes maps into mapels, learns an optimized interpretable feature space by contrasting coherent series, and aggregates into mapotypes for diachronic analysis. The method effectively distinguishes series and surfaces long-term evolutions: a 17th–18th century abstraction process, a 19th century surge of fine linework and increased graphical load associated with cadastral expansion and printing innovations, and the widespread adoption of color, contour lines, and photomechanical patterns by late 19th/early 20th centuries. A notable contribution is the quantitative detection of a lasting divergence in figuration between large- and small-scale maps around the early 19th century.
Future directions include: refining the scale and number of mapels; allowing overlapping mapotypes; expanding or redefining series; accounting for authorship; incorporating alternative color spaces and additional interpretable features; exploring differentiable optimization; integrating semantics to connect style and meaning; and replicating across broader corpora to study cultural diffusion, evolutionary complexity, and links between visual signs and environmental perception.
Limitations
The study is a proof of concept with design choices that may influence outcomes: fixed tile size (50×50 px) and number (800 per map); reliance on 11 series to optimize feature selection; exclusion of color-distribution features may reflect current color encoding choices (RGB) rather than the irrelevance of color; t-SNE visualization is non-linear and not globally metric-preserving; sampling of large series (2–5%) may affect representativeness; segmentation and low-load filtering can influence which content is analyzed; the analogy radius k and clustering parameters (KMeans K=4) are design decisions. Some statistical tests (e.g., SW) have caveats for very large n (p-value validity), though W statistics themselves remain informative. The approach emphasizes style over semantics; meaning-making is not directly modeled.