Last revised: 30 May 2026 · Reviewed by the Baby Name Finder editorial team

The five-step pipeline

Every name that ends up in our 15-language database moves through the same five-step pipeline. We publish it openly so that readers, researchers and reviewers can reproduce or challenge any individual ranking.

Pull from a primary source

A candidate name must first appear in one of the primary sources listed on our Data sources page — typically the most recent annual baby-name release from a national statistics office, or a recognised cultural name dictionary for languages without a central registry. Where the source publishes a CSV or downloadable list, we record the exact file URL and download date. Where the source publishes only a web page, we record the URL and the date the page was last accessed.
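The provenance rule above can be sketched as a small record type. This is a minimal illustration, not our production schema: the class name `SourceRecord`, its field names and the example URL are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """Provenance for one primary source (hypothetical field names)."""
    url: str
    retrieved: date
    kind: str  # "file" for a CSV or downloadable list, "page" for a web page

    def citation(self) -> str:
        # Files are "downloaded"; pages are "accessed".
        verb = "downloaded" if self.kind == "file" else "accessed"
        return f"{self.url} ({verb} {self.retrieved.isoformat()})"


# Illustrative values only; the URL is a placeholder, not a real source.
record = SourceRecord("https://example.org/names-2025.csv", date(2026, 1, 15), "file")
print(record.citation())
```

Keeping the retrieval date next to the URL means a later reader can tell which release of a source a given entry was drawn from.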

Normalise the spelling

We record the name in its authoritative native form first (Arabic in Arabic script, Hindi in Devanagari, etc.), then a romanised form using the transliteration standard most widely accepted for that language (e.g. Hepburn for Japanese, Pinyin for Chinese, Hunterian for Hindi). Diacritics are preserved — Sofía and Sofia are treated as distinct entries when both appear in the source.
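One way to keep diacritic variants apart while still merging byte-level duplicates is to compare names in Unicode NFC form. The sketch below assumes NFC as the canonical form; the function names `canonical` and `distinct_entries` are hypothetical and chosen for illustration.

```python
import unicodedata


def canonical(name: str) -> str:
    """Return the NFC form so that equivalent encodings compare equal."""
    return unicodedata.normalize("NFC", name.strip())


def distinct_entries(names):
    """Deduplicate by canonical form while keeping diacritic variants apart."""
    seen = {}
    for raw in names:
        seen.setdefault(canonical(raw), raw)
    return list(seen)


# "Sof\u00eda" is Sofía with a precomposed i-acute;
# "Sofi\u0301a" is the same name written with a combining acute accent.
entries = distinct_entries(["Sof\u00eda", "Sofi\u0301a", "Sofia"])
print(entries)  # two entries: the accented form and the plain form
```

The two encodings of Sofía collapse into one entry, while the unaccented Sofia stays separate, which matches the rule that diacritics are preserved.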

Assign style, length and tier

Deterministic labels are applied for style, tier and length, along with a gender weighting. The rules are listed in the table below. There is no editorial judgement at this step: the same input always produces the same output.

Label | Rule
Modern | Name has risen in the relevant national registry in the past 10–15 years, OR first appeared in the registry within that window.
Traditional | Name has been continuously present in the registry for 50+ consecutive years, OR is documented in a cultural / religious / historical source for that language community.
Top 100 | Ranked 1–100 in the most recent published year of the relevant registry.
Trending | Ranked 101–300 in the most recent published year.
Under the radar | Ranked 301+ in the most recent year, OR present in a historical / cultural source but not in the most recent top 300.
Short / medium / long | 1–4 / 5–7 / 8+ characters of the romanised form. Diacritics count as one character.
Boy / girl / unisex | Gender weighting from the most recent registry. Unisex requires ≥10% of births in each direction.
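
The tier, length and gender rules in the table can be written as three small pure functions. This is a sketch under stated assumptions: the function names are hypothetical, a missing rank is represented as `None`, and "diacritics count as one character" is read as "a letter plus its diacritic counts once", which is approximated here with NFC normalisation.

```python
import unicodedata


def tier(rank):
    """Map a rank in the most recent registry year to a tier label.

    rank is None when the name is absent from that year's registry
    and is known only from a historical or cultural source.
    """
    if rank is None or rank >= 301:
        return "Under the radar"
    return "Top 100" if rank <= 100 else "Trending"


def length_label(romanised):
    """Short (1-4), medium (5-7) or long (8+) characters."""
    # NFC folds a base letter plus combining mark into one code point
    # where a precomposed form exists, so an accented letter counts once.
    n = len(unicodedata.normalize("NFC", romanised))
    if n <= 4:
        return "short"
    return "medium" if n <= 7 else "long"


def gender_label(boy_births, girl_births):
    """Unisex needs at least 10% of births in each direction."""
    total = boy_births + girl_births
    if total == 0:
        raise ValueError("no births recorded")
    # Integer comparison avoids floating-point edge cases at exactly 10%.
    if 10 * boy_births >= total and 10 * girl_births >= total:
        return "unisex"
    return "boy" if boy_births > girl_births else "girl"
```

Because each function depends only on its arguments, the same input always produces the same label.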

A name can be both modern and traditional (for example Olivia, present for decades and currently at a modern peak). Both labels are applied where both rules fire.
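
The style rules can be sketched as a function that returns every label whose rule fires. The parameter names are hypothetical, the 10–15 year window is taken at its upper bound of 15 as an assumption, and "has risen" is passed in as a ready-made boolean because the rule above does not fix a numeric threshold for it.

```python
def style_labels(risen_in_window, first_year, current_year,
                 consecutive_years, in_cultural_source, window=15):
    """Return every style label whose rule fires (possibly both, possibly none)."""
    labels = []
    # first_year is None for names that never appeared in the registry.
    first_appeared_recently = (
        first_year is not None and current_year - first_year <= window
    )
    if risen_in_window or first_appeared_recently:
        labels.append("Modern")
    if consecutive_years >= 50 or in_cultural_source:
        labels.append("Traditional")
    return labels


# Made-up inputs for a long-established name that is currently rising.
print(style_labels(True, 1950, 2025, 75, False))
```

With those illustrative inputs both rules fire, so the name carries both labels.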

Source the meaning and origin

For each name we record:

  • Origin language — the language community in which the name originated, not just where it is popular now.
  • Etymology root — the underlying lexical root and its literal sense (e.g. Sofía → Greek sophia, "wisdom").
  • Cultural context — where relevant, the religious, mythological or historical figure most associated with the name.
  • Notable namesakes — where well-documented and public.

Etymologies are sourced from peer-reviewed onomastic references (academic surveys of personal-name etymology) and national-language dictionaries. Where two reputable sources disagree, we cite both and pick the more conservative reading.
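
The fields listed above, together with the rule that disagreeing sources are both cited, could be held in a record like the following. This is an illustrative sketch only: the class and field names are hypothetical, and the citation strings are placeholders rather than real references.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    gloss: str   # literal sense of the root, e.g. "wisdom"
    source: str  # citation for this reading


@dataclass
class NameEntry:
    name: str
    origin_language: str
    root: str
    readings: list = field(default_factory=list)
    cultural_context: str = ""
    namesakes: list = field(default_factory=list)

    def disputed(self):
        """True when the cited sources give more than one distinct gloss."""
        return len({r.gloss for r in self.readings}) > 1


entry = NameEntry(
    name="Sof\u00eda",
    origin_language="Greek",
    root="sophia",
    readings=[Reading("wisdom", "placeholder reference A")],
)
print(entry.disputed())
```

Storing each reading with its own citation keeps a disagreement visible in the data instead of silently discarding one source.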

Native-script verification

For Arabic, Hindi, Japanese, Chinese, Korean and Russian names, a second editor with reading competence in the relevant script verifies that the native-script form is correctly entered and matches the romanisation. Common pitfalls we explicitly check for: combining-character vs precomposed Unicode, RTL/LTR isolates, Han variant characters, and the difference between similar Cyrillic and Latin glyphs (е vs e, а vs a).
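
Some of these pitfalls can be flagged automatically before the second editor looks at an entry. The sketch below is a coarse pre-check and not a replacement for a human reader: the function names are hypothetical, the script test relies on the first word of each letter's Unicode character name, and Han variant characters are not covered. Japanese names legitimately mix kanji and kana, so a "mixed scripts" flag there is a prompt for review, not an error.

```python
import unicodedata

# Directional isolate controls: LRI, RLI, FSI, PDI.
BIDI_ISOLATES = {"\u2066", "\u2067", "\u2068", "\u2069"}


def scripts_used(text):
    """Collect the first word of each letter's Unicode name, e.g. 'CYRILLIC'."""
    found = set()
    for ch in text:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def check_native_form(text):
    """Return a list of human-readable warnings; empty means nothing flagged."""
    problems = []
    if text != unicodedata.normalize("NFC", text):
        problems.append("not in NFC form")
    if any(ch in BIDI_ISOLATES for ch in text):
        problems.append("contains directional isolate characters")
    scripts = scripts_used(text)
    if len(scripts) > 1:
        problems.append("mixed scripts: " + ", ".join(sorted(scripts)))
    return problems


# All-Cyrillic "Анна" versus a look-alike with Latin "A" and "a" at the ends.
print(check_native_form("\u0410\u043d\u043d\u0430"))
print(check_native_form("A\u043d\u043da"))
```

The first call reports nothing, while the second reports a Cyrillic and Latin mix that is invisible to the eye.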

How popularity is scored over time

Popularity is not a single number. We treat it as three layered facts:

  1. Current rank — directly from the most recent published registry year.
  2. Trajectory — change in rank over the previous five years. A name is "rising" if it has moved up by ≥30 positions in five years, and "falling" if it has moved down by ≥30 positions over the same period.
  3. Historical span — first and last year the name appears in the registry's top 1,000.
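
The trajectory and span rules above can be sketched as follows. The function names are hypothetical, and the "stable" label for names that move fewer than 30 positions is an assumption added for the example; names with no rank five years ago are not handled here.

```python
def trajectory(rank_now, rank_five_years_ago, threshold=30):
    """Classify movement; a smaller rank number is a better position."""
    moved_up = rank_five_years_ago - rank_now
    if moved_up >= threshold:
        return "rising"
    if moved_up <= -threshold:
        return "falling"
    return "stable"


def historical_span(years_in_top_1000):
    """First and last year the name appears in the top 1,000, or None."""
    if not years_in_top_1000:
        return None
    return min(years_in_top_1000), max(years_in_top_1000)


print(trajectory(70, 100))                   # moved up 30 places
print(historical_span([1999, 2003, 2001]))   # earliest and latest year
```

Note the sign convention: moving from rank 100 to rank 70 is a move up of 30 positions, which meets the threshold exactly.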

Charts on individual name pages are drawn directly from the cleaned annual registry CSVs. The cleaned per-year data files are available on request via the contact form.

How we compute the cross-language statistics

The statistics published on the homepage — average name length by style, count of names appearing across multiple languages, letter distribution diversity — are computed from the same names.js dataset that powers the generator. Anyone can reproduce them.

Reproducibility: the dataset is published as a single JavaScript file and is open for inspection. The aggregate statistics are deterministic functions of that file. If the published numbers ever fail to match the file's actual contents, that is a bug; please tell us.
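
As an illustration of what "deterministic functions of that file" means, two of the homepage statistics can be sketched in a few lines. The example assumes the dataset has already been loaded into a list of records with hypothetical keys `name`, `language` and `styles`; the real structure of names.js may differ, and the rows below are made up.

```python
from collections import defaultdict


def average_length_by_style(rows):
    """Mean name length for each style label."""
    lengths = defaultdict(list)
    for row in rows:
        for style in row["styles"]:
            lengths[style].append(len(row["name"]))
    return {style: sum(v) / len(v) for style, v in lengths.items()}


def names_in_multiple_languages(rows):
    """Count names that appear under more than one language."""
    languages = defaultdict(set)
    for row in rows:
        languages[row["name"]].add(row["language"])
    return sum(1 for langs in languages.values() if len(langs) > 1)


# Made-up rows for illustration; not taken from the published dataset.
rows = [
    {"name": "Anna", "language": "ru", "styles": ["Traditional"]},
    {"name": "Anna", "language": "de", "styles": ["Traditional"]},
    {"name": "Olivia", "language": "en", "styles": ["Modern", "Traditional"]},
    {"name": "Mia", "language": "en", "styles": ["Modern"]},
]
print(average_length_by_style(rows))
print(names_in_multiple_languages(rows))
```

Running the same functions over the same file always yields the same numbers, which is what makes a mismatch with the published figures detectable.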

What we deliberately do not do

Limitations we acknowledge openly

Questions or challenges

If you can show that a name is mis-classified, a rank is wrong, or an etymology has a better source, please write to us through the contact form. The fastest path to a correction is to include the page URL, a description of what is wrong, and a citation to the correct source. Our corrections policy describes what happens next.