Elite polarization, measured from parliamentary speech

An interactive map of how legislators talk about their own and rival parties, built from millions of political mentions extracted across national parliaments. It includes the world map, country trajectories, cross-national comparisons and the underlying data.

How the map is built

This project aggregates parliamentary records from around the world to measure elite polarization by analyzing what politicians say about one another in parliament. An agentic AI pipeline, orchestrated through the OpenClaw framework, handles data collection, harmonization, and analysis.

The system collects transcripts from academic archives covering many countries. When a corpus exists only on a parliamentary website, it is acquired by web scraping. Because many parliaments publish proceedings as PDFs, the pipeline separates born-digital PDFs, where text is extracted directly, from scanned-image PDFs, where optical character recognition is applied. For video-only records, it extracts official captions or subtitles; where none exist, it tests automated speech recognition on a sample before transcription.
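The split between born-digital and scanned PDFs can be sketched as a simple routing rule. This is a minimal illustration, not the project's actual code: it assumes the text layer of each page has already been read by some PDF library, and the function name `route_pdf` and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PdfRoute:
    method: str             # "text-layer" (born-digital) or "ocr" (scanned image)
    text_page_share: float  # share of pages with a usable text layer


def route_pdf(page_texts: Sequence[str],
              min_chars: int = 50,
              min_share: float = 0.8) -> PdfRoute:
    """Pick an extraction path from per-page text-layer output.

    page_texts holds whatever text a PDF reader returned for each page.
    Both thresholds are illustrative assumptions, not values from the project.
    """
    if not page_texts:
        # Nothing could be read directly, so treat the file as a scan.
        return PdfRoute("ocr", 0.0)
    pages_with_text = sum(1 for text in page_texts
                          if len(text.strip()) >= min_chars)
    share = pages_with_text / len(page_texts)
    method = "text-layer" if share >= min_share else "ocr"
    return PdfRoute(method, share)
```

A per-page share is used rather than a whole-file character count so that a scan carrying only a short text-layer header or page number is still sent to OCR.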

OpenClaw coordinates the workflow across collection, normalization, affiliation recovery, extraction, validation, and escalation. Collected transcripts are normalized into a unified format in which each row is one speaker’s turn, with country, date, speaker name, party, and speech text. Since speaker–party affiliation is often missing, a recovery algorithm uses registry and metadata joins, deterministic date-window rules for known affiliations over time, and fuzzy matching against biographical and parliamentary records. Unresolved cases then fall back to language-model lookup.
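The date-window and fuzzy-matching steps of affiliation recovery might look like the sketch below. It is an assumption-laden illustration: the `Spell` record, the `recover_party` function, the name normalization, and the 0.85 similarity cutoff are all hypothetical, and the standard-library `difflib` stands in for whatever matcher the pipeline really uses.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Iterable, Optional


@dataclass(frozen=True)
class Spell:
    """One known affiliation of a speaker over a closed date window."""
    speaker: str
    party: str
    start: date
    end: date


def _norm(name: str) -> str:
    # Lowercase, drop periods, collapse whitespace.
    return " ".join(name.lower().replace(".", " ").split())


def recover_party(speaker: str, day: date, spells: Iterable[Spell],
                  min_ratio: float = 0.85) -> Optional[str]:
    """Return the party of the best-matching spell active on `day`.

    Returns None when no spell clears the similarity cutoff, which is
    where a later fallback (such as a language-model lookup) would start.
    """
    target = _norm(speaker)
    best_party, best_score = None, 0.0
    for spell in spells:
        if not (spell.start <= day <= spell.end):
            continue  # deterministic date-window rule
        score = SequenceMatcher(None, target, _norm(spell.speaker)).ratio()
        if score > best_score:
            best_party, best_score = spell.party, score
    return best_party if best_score >= min_ratio else None


# Hypothetical registry entries, for illustration only.
registry = [
    Spell("Maria Example", "Party A", date(2010, 1, 1), date(2014, 12, 31)),
    Spell("Maria Example", "Party B", date(2015, 1, 1), date(2019, 12, 31)),
]
party = recover_party("Maria Exampel", date(2016, 3, 1), registry)
```

Filtering by date window before scoring names means a speaker who switched parties is matched to the affiliation held on the day of the speech, not to the most similar spelling overall.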

The extraction stage uses the Google Gemma 4 26B LLM to identify passages where a speaker evaluates another named politician, distinguishing attacks or praise from neutral and procedural references. Each mention is coded as in-group or out-group using recovered coalition affiliation. The output is a cross-national dataset of elite-hostility relationships, aggregated by party-pair and time period, covering over 42 million speeches across 82 countries.
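The in-group/out-group coding and the party-pair aggregation can be illustrated with a short sketch. The `Mention` record, the stance labels, and the representation of coalitions as a list of party sets are assumptions made for this example; treating a mention of one's own party as in-group is also an assumption, and the project's actual indicators are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Mention:
    year: int
    speaker_party: str
    target_party: str
    stance: str  # "attack", "praise", or "neutral" (hypothetical labels)


def group_of(speaker_party: str, target_party: str,
             coalitions: List[Set[str]]) -> str:
    """Code a mention as in-group when both parties are the same
    or sit in the same coalition; otherwise out-group."""
    if speaker_party == target_party:
        return "in-group"
    same_bloc = any(speaker_party in bloc and target_party in bloc
                    for bloc in coalitions)
    return "in-group" if same_bloc else "out-group"


def aggregate(mentions: Iterable[Mention],
              coalitions: List[Set[str]]) -> Dict[Tuple[str, str, int], dict]:
    """Count evaluative mentions per (speaker party, target party, year)."""
    cells = defaultdict(lambda: {"group": None, "attack": 0, "praise": 0, "n": 0})
    for m in mentions:
        if m.stance not in ("attack", "praise"):
            continue  # neutral and procedural references are not counted
        cell = cells[(m.speaker_party, m.target_party, m.year)]
        cell["group"] = group_of(m.speaker_party, m.target_party, coalitions)
        cell[m.stance] += 1
        cell["n"] += 1
    return dict(cells)
```

A static coalition list keeps the example short; a fuller version would make coalition membership depend on the date, as party affiliation does.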

How the corpus was acquired

82 corpora · 42.0 million speeches, grouped by how each parliament’s record was obtained.

Ready-download (incl. ParlaMint releases): 56 corpora, 37.2M speeches

Web-scrape (custom site scrapers & APIs): 13 corpora, 3.7M speeches

PDF / OCR (born-digital + scanned-image text): 13 corpora, 1.0M speeches

World map

Choropleth of any indicator, pooled across all years (mention-weighted) or for a single year.
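As a rough sketch of what mention-weighted pooling could mean, assuming each year's indicator value is weighted by that year's number of mentions (the page does not spell out the exact weights, and `pooled_value` is a hypothetical name):

```python
from typing import Iterable, Optional, Tuple


def pooled_value(yearly: Iterable[Tuple[float, int]]) -> Optional[float]:
    """Mention-weighted mean of an indicator across years.

    yearly yields (indicator_value, n_mentions) pairs, one per year.
    Returns None when there are no mentions to weight by.
    """
    rows = list(yearly)
    total_mentions = sum(n for _, n in rows)
    if total_mentions == 0:
        return None
    return sum(value * n for value, n in rows) / total_mentions
```

Under this weighting, years with few recorded mentions move the pooled figure less than years with many, which keeps sparse early years from dominating a country's colour on the map.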

Trends over time

One indicator over a year window — line per country, or switch to mention-weighted regional averages.

Country profile

Each party's trajectory within a country — out-party evaluation by default. Countries without a party breakdown show the national aggregate.

Compare indicators

Scatter two indicators against each other for a given year. Bubble size = corpus volume; colour = region.

Heat map

Country × year grid for one indicator. Grey cells have no data.

Data table

The underlying panel. Switch between country-level and party-level rows; filter, search and sort.
