Methodology — How Baby Name Finder Classifies Names

Last revised: 30 May 2026 · Reviewed by the Baby Name Finder editorial team

The five-step pipeline

Every name in our 15-language database moves through the same five-step pipeline. We publish it openly so that readers, researchers and reviewers can reproduce or challenge any individual classification or rank.

Pull from a primary source

A candidate name must first appear in one of the primary sources listed on our Data sources page — typically the most recent annual baby-name release from a national statistics office, or a recognised cultural name dictionary for languages without a central registry. Where the source publishes a CSV or downloadable list, we record the exact file URL and download date. Where the source publishes only a web page, we record the URL and the date the page was last accessed.
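
As an illustration only, the provenance fields described above could be held in a record like the one below. This is a sketch under stated assumptions: the class name SourceRecord, its field names and the example.org URL are hypothetical and are not taken from the site's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical provenance record for one primary source."""
    source_name: str            # e.g. a national statistics office
    url: str                    # exact file URL, or the page URL if no file exists
    is_downloadable_file: bool  # True for a CSV or list, False for a web page
    retrieved_on: date          # download date, or date the page was last accessed

    def date_label(self) -> str:
        # The methodology distinguishes a download date from a last-accessed date.
        return "downloaded" if self.is_downloadable_file else "last accessed"


record = SourceRecord(
    source_name="Example statistics office",        # placeholder, not a real source
    url="https://example.org/baby-names-2025.csv",  # placeholder URL
    is_downloadable_file=True,
    retrieved_on=date(2026, 5, 1),
)
print(f"{record.source_name}: {record.url} ({record.date_label()} {record.retrieved_on})")
```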

Normalise the spelling

We record the name in its authoritative native form first (Arabic in Arabic script, Hindi in Devanagari, etc.), then a romanised form using the transliteration standard most widely accepted for that language (e.g. Hepburn for Japanese, Pinyin for Chinese, Hunterian for Hindi). Diacritics are preserved — Sofía and Sofia are treated as distinct entries when both appear in the source.
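
One way to preserve diacritics while still collapsing duplicate encodings of the same spelling is Unicode NFC normalisation. The sketch below assumes that approach; the function name canonical_form is hypothetical, and the text above does not say which normalisation form the pipeline actually uses.

```python
import unicodedata


def canonical_form(name: str) -> str:
    """Trim whitespace and compose to NFC without stripping diacritics."""
    return unicodedata.normalize("NFC", name.strip())


precomposed = "Sof\u00eda"   # "Sofía" with a single precomposed í
decomposed = "Sofi\u0301a"   # "Sofía" as i + combining acute accent
plain = "Sofia"              # no diacritic

# Two encodings of the same spelling collapse to one entry...
assert canonical_form(precomposed) == canonical_form(decomposed)
# ...while the spelling without the diacritic stays distinct.
assert canonical_form(precomposed) != canonical_form(plain)

entries = {canonical_form(n) for n in (precomposed, decomposed, plain)}
print(sorted(entries))
```

Stripping accents (for example by dropping combining marks after NFD decomposition) would merge Sofía into Sofia, which is the opposite of the rule stated above.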

Assign style, length and tier

Four kinds of deterministic label are applied: style, popularity tier, length and gender. The rules are listed in the table below. There is no editorial judgement at this step; the same input always produces the same output.

Label	Rule
Modern	Name has risen in the relevant national registry in the past 10–15 years, OR first appeared in the registry within that window.
Traditional	Name has been continuously present in the registry for 50+ consecutive years, OR is documented in a cultural / religious / historical source for that language community.
Top 100	Ranked 1–100 in the most recent published year of the relevant registry.
Trending	Ranked 101–300 in the most recent published year.
Under the radar	Ranked 301+ in the most recent year, OR present in a historical / cultural source but not in the most recent top 300.
Short / medium / long	1–4 / 5–7 / 8+ characters in the romanised form. A letter with a diacritic counts as one character.
Boy / girl / unisex	Gender weighting from the most recent registry. Unisex requires at least 10% of the name's registered births for each sex.

A name can be both modern and traditional (for example Olivia, present for decades and currently at a modern peak). Both labels are applied where both rules fire.
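
The rules in the table can be sketched as pure functions, which is what makes the step deterministic. The code below is an illustration, not the production implementation: all function and parameter names are hypothetical, the "risen" and "first appeared" tests are passed in as booleans because the exact window calculation is not specified here, and counting spaces or hyphens toward length is an assumption.

```python
import unicodedata
from typing import List, Optional


def tier_label(rank: Optional[int]) -> str:
    """Popularity tier from the most recent published rank (None = unranked)."""
    if rank is not None and rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank is not None and rank <= 100:
        return "Top 100"
    if rank is not None and rank <= 300:
        return "Trending"
    return "Under the radar"  # ranked 301+ or known only from cultural sources


def length_label(romanised: str) -> str:
    """Short / medium / long, counting a letter plus its diacritic once."""
    decomposed = unicodedata.normalize("NFD", romanised)
    n = sum(1 for ch in decomposed if not unicodedata.combining(ch))
    if n <= 4:
        return "short"
    return "medium" if n <= 7 else "long"


def gender_label(boy_births: int, girl_births: int) -> str:
    """Unisex when each sex accounts for at least 10% of births."""
    total = boy_births + girl_births
    if total <= 0:
        raise ValueError("no births recorded")
    # Integer arithmetic avoids floating-point edge cases at exactly 10%.
    if 10 * boy_births >= total and 10 * girl_births >= total:
        return "unisex"
    return "boy" if boy_births > girl_births else "girl"


def style_labels(risen_in_window: bool, first_seen_in_window: bool,
                 consecutive_years: int, in_cultural_source: bool) -> List[str]:
    """Apply both style rules independently; a name can receive both labels."""
    labels = []
    if risen_in_window or first_seen_in_window:
        labels.append("Modern")
    if consecutive_years >= 50 or in_cultural_source:
        labels.append("Traditional")
    return labels


# Made-up inputs for illustration; not real registry figures.
print(tier_label(42), length_label("Sofía"), gender_label(120, 880),
      style_labels(True, False, 60, False))
```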

Source the meaning and origin

For each name we record:

Origin language — the language community in which the name originated, not just where it is popular now.
Etymology root — the underlying lexical root and its literal sense (e.g. Sofía → Greek sophia, "wisdom").
Cultural context — where relevant, the religious, mythological or historical figure most associated with the name.
Notable namesakes — where well-documented and public.

Etymologies are sourced from peer-reviewed onomastic references (academic surveys of personal-name etymology) and national-language dictionaries. Where two reputable sources disagree, we cite both and pick the more conservative reading.

Native-script verification

For Arabic, Hindi, Japanese, Chinese, Korean and Russian names, a second editor with reading competence in the relevant script verifies that the native-script form is correctly entered and matches the romanisation. Common pitfalls we explicitly check for: combining-character versus precomposed Unicode sequences, stray RTL/LTR directional isolates, Han variant characters, and Cyrillic and Latin glyphs that look identical (Cyrillic е vs Latin e, Cyrillic а vs Latin a).
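
Some of these pitfalls can be pre-screened automatically before the second editor looks at an entry. The sketch below is one possible approach using only Python's standard library; the function names are hypothetical, script detection by Unicode character name is a rough heuristic, and Han variant checking is left out because it needs variant data the standard library does not provide.

```python
import unicodedata

# Bidirectional controls: marks (U+200E/F), embeddings and overrides
# (U+202A..U+202E), and isolates (U+2066..U+2069).
BIDI_CONTROLS = {"\u200e", "\u200f",
                 *map(chr, range(0x202A, 0x202F)),
                 *map(chr, range(0x2066, 0x206A))}


def scripts_used(text: str) -> set:
    """Rough script detection from the first word of each letter's Unicode name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def check_native_form(text: str) -> list:
    """Return a list of human-readable problems; an empty list means no flags."""
    problems = []
    if unicodedata.normalize("NFC", text) != text:
        problems.append("not in NFC form")
    if any(ch in BIDI_CONTROLS for ch in text):
        problems.append("contains bidirectional control characters")
    if {"CYRILLIC", "LATIN"} <= scripts_used(text):
        problems.append("mixes Cyrillic and Latin letters")
    return problems


clean = "\u0415\u043b\u0435\u043d\u0430"   # Елена, all Cyrillic
mixed = "\u0415\u043be\u043d\u0430"        # same look, but the third letter is Latin e

print(check_native_form(clean))
print(check_native_form(mixed))
```

A check like this only narrows down what the human verifier has to inspect; it cannot confirm that the native form matches the romanisation.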

How popularity is scored over time

Popularity is not a single number. We treat it as three layered facts:

Current rank: taken directly from the most recent published registry year.
Trajectory: change in rank over the previous five years. A name is "rising" if it has moved up by 30 or more positions in five years (for example from rank 140 to rank 110 or better), and "falling" if it has moved down by 30 or more positions.
Historical span: the first and last year in which the name appears in the registry's top 1,000.
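
The trajectory and span rules can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the "steady" and "unknown" outcomes are labels introduced here for cases the rules above do not name, and the sample ranks are made up.

```python
from typing import Dict, Optional, Tuple

RISE_FALL_THRESHOLD = 30  # positions, as stated in the methodology
WINDOW_YEARS = 5


def trajectory(ranks_by_year: Dict[int, int], latest_year: int) -> str:
    """Classify the rank change between latest_year and five years earlier."""
    now = ranks_by_year.get(latest_year)
    then = ranks_by_year.get(latest_year - WINDOW_YEARS)
    if now is None or then is None:
        return "unknown"  # assumption: no label when either year is missing
    improvement = then - now  # positive = moved up (smaller rank number)
    if improvement >= RISE_FALL_THRESHOLD:
        return "rising"
    if improvement <= -RISE_FALL_THRESHOLD:
        return "falling"
    return "steady"


def historical_span(ranks_by_year: Dict[int, int],
                    top_n: int = 1000) -> Optional[Tuple[int, int]]:
    """First and last year the name appears within the top_n ranks."""
    years = [year for year, rank in ranks_by_year.items() if rank <= top_n]
    return (min(years), max(years)) if years else None


# Made-up ranks for illustration; not real registry data.
sample = {2019: 140, 2020: 120, 2021: 115, 2022: 100, 2023: 95, 2024: 90}
print(trajectory(sample, 2024), historical_span(sample))
```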

Charts on individual name pages are drawn directly from the cleaned annual registry CSVs. The cleaned per-year data files are available on request via the contact form.

How we compute the cross-language statistics

The statistics published on the homepage — average name length by style, count of names appearing across multiple languages, letter distribution diversity — are computed from the same names.js dataset that powers the generator. Anyone can reproduce them.

Reproducibility: the dataset is published as a single JavaScript file and is open for inspection. The aggregate statistics are deterministic functions of that file. If the published numbers ever fail to match the file's actual contents, that is a bug; please tell us.
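
To show what "deterministic functions of that file" can mean in practice, here is a sketch of two of the homepage statistics. It assumes the records from names.js have already been loaded into Python dictionaries; the field names (name, language, styles) and the sample rows are hypothetical, since the file's schema is not described on this page. Letter distribution diversity is omitted because its formula is not defined here.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows only; values are not taken from the real dataset.
records = [
    {"name": "Sofia", "language": "it", "styles": ["Modern"]},
    {"name": "Sofia", "language": "pt", "styles": ["Modern", "Traditional"]},
    {"name": "Mia", "language": "de", "styles": ["Modern"]},
    {"name": "Alexander", "language": "en", "styles": ["Traditional"]},
]


def average_length_by_style(rows):
    """Mean name length per style label; a row counts once for each label it has."""
    lengths = defaultdict(list)
    for row in rows:
        for style in row["styles"]:
            lengths[style].append(len(row["name"]))
    return {style: mean(values) for style, values in lengths.items()}


def names_in_multiple_languages(rows):
    """Number of distinct spellings listed under more than one language."""
    languages = defaultdict(set)
    for row in rows:
        languages[row["name"]].add(row["language"])
    return sum(1 for langs in languages.values() if len(langs) > 1)


print(average_length_by_style(records))
print(names_in_multiple_languages(records))
```

Because both functions depend only on their input rows, running them against the published file should always give the same numbers.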

What we deliberately do not do

We do not invent names. Every entry must trace back to a public, named source listed on the Data sources page.
We do not "score" names against subjective criteria like "beauty" or "uniqueness rating" — those aren't measurable and would mislead readers.
We do not bury name pages behind clickbait ranking lists. Names are presented with their facts; readers do their own ranking.
We do not republish meanings sourced only from commercial baby-name forums without a primary onomastic reference.

Limitations we acknowledge openly

Registry lag. National registries publish 12–18 months in arrears. The "2026 popularity" you see generally reflects 2024 or 2025 data, whichever is the most recent fully published year.
Coverage asymmetry. Languages served by strong central registries (such as those of the US, UK, France and Germany) have richer data than languages without one (e.g. Arabic, which spans multiple jurisdictions). We compensate by using cultural and historical sources, which we disclose per language on the Data sources page.
Dataset scope. 5,000+ names is a sample, not an exhaustive inventory of every name in every language. For languages with millennia of naming history (Arabic, Hindi, Chinese) the underlying corpus is far larger; we have prioritised names currently in use.
Etymology disagreement. Onomasts disagree. Where two reputable sources differ, we say so on the relevant name page.

Questions or challenges

If you can show that a name is misclassified, a rank is wrong, or an etymology has a better source, please write to us through the contact form. The fastest route to a correction is to include the page URL, a description of what is wrong, and a citation to the correct source. Our corrections policy describes what happens next.