A graphical analysis of national anthem lyrics

With attention to religious expression, Olympic performance,
and general bloodthirstiness

One of my 2010 New Year’s resolutions was simple: I wanted to learn the words to the French national anthem. My reasons for memorizing “La Marseillaise” were twofold: first, I’d always wanted to sing along with that climactic scene in Casablanca where Bogart, Bergman, and the whole gang at Rick’s Café Américain join together to drown out an annoying chorus of Nazi officers. And second, for the past few years I’ve undertaken an unsuccessful effort to teach myself the language of Voltaire and Hulot, largely by watching Le 20 Heures, the French national broadcaster’s nightly newscast.

I’ve seen a lot of Les 20 Heures over the years, enough to notice how certain stories cycle through every year or two: transit strikes, theater festivals, cheese fairs, Johnny Hallyday. One of these is a story I like to call “Are our students French enough?” The inciting topic is usually something to do with education policy, but the climax inevitably features clips of giggling high schoolers trying, and failing, to get through the first verse of “La Marseillaise.” Thus was my inspiration born: I know I’ll never be fluent in French, but darn it, at least I can out-Frenchify those French kids on one count. Le jour de gloire est arrivé!

But a funny thing happened: in the midst of my memorizing, I realized that the anthem’s lyrics were far stranger and more disturbing than I’d imagined. The last lines of every refrain go thus:

Qu’un sang impur
Abreuve nos sillons!

May an impure blood
Water our furrows!

In essence, kill the foreigner! It’s hardly the sort of thing you’d expect to hear from Ingrid Bergman, or a grinning high-schooler, let alone the French soccer team as they line up to lose the World Cup. I started to wonder: were other national anthems like this too? Was my own?

Soon I was clicking every link on Wikipedia’s List of national anthems, reading and copying and pasting all sorts of odd stuff from various national ditties. I made a giant spreadsheet of the full translated lyrics of every anthem I could find, then hopped over to wordle.net to generate frequency clouds of all the anthems’ not-so-common words (we’ll look at the untranslated and unsifted lyrics data further down). What follows are some findings and some thoughts, all to be taken lightly, of course. Whatever country you’re from, let me start by assuring you that its national anthem is the very best one.

A couple of notes about my technique: many of the countries with longer anthems (more on that below) have wisely designated only a stanza or two for their official versions. It is a wisdom that I admire, commend, and ignore: I’m an anthem maximalist. Who cares if the last stanza of “Il Canto degli Italiani,” which is a celebration of Poland’s independence struggle, isn’t in the version they sing before soccer games? And for that matter, who cares if “Il Canto degli Italiani” is supposed to be sung in Italian? Wikipedia’s provided English translations vary in quality and archaic diction, but for the most part facilitate decent cross-cultural comparison. (The only anthem whose full translation proved elusive was the Somali hymn: its page tantalized with an English-rendered first verse, but the rest was impervious to my internet searches and auto-translations. Fed snippets of the later stanzas, Google Translate guesses that it was written, depending on the excerpt, in Finnish, Spanish, Dutch, Malay, Estonian, Basque, Hungarian, English, or possibly Quechua.)

So what do the combined lyrics tell us? When it comes to national anthems, it’s all about the Land, in all its Mother-, Father-, and Home- varieties. It’s a concreteness that makes sense: this is where we are, and here’s a song about it. More surprising is the popularity of the less-substantial verbs may and let which, as a friend pointed out, speak to an anthemic tendency towards hopefulness and singing-into-being. If you squint your eyes and pretend some of the words rhyme, you can almost make out a one-size-fits-all global hymn (imagine it playing as a population-weighted composite world flag runs up the pole).

The trouble with this arrangement, though, is that the residents of the world’s most populous countries are underrepresented: “March Forward Dear Mother Ethiopia” gets the same emphasis as “March On, Bahamaland,” which doesn’t seem quite fair.

To even things out, I made a population-weighted text file, which contained 98 copies of China’s anthem for every one copy of “My Kazakhstan” (countries below a certain Kazakhstan-ish population were left out entirely). Suddenly the word-cloud gets a lot more Asian:
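For the curious, here is a minimal Python sketch of how a population-weighted file like this could be put together. It is not the method actually used for these clouds: the `build_weighted_corpus` helper, the placeholder lyrics, and the round-number populations are all invented for illustration. The idea is simply to repeat each anthem once for every cutoff-sized chunk of population.

```python
def build_weighted_corpus(anthems, populations, min_population):
    """Repeat each anthem once per `min_population` residents (rounded).

    Countries with fewer than `min_population` residents are left out.
    """
    parts = []
    for country, lyrics in anthems.items():
        population = populations[country]
        if population < min_population:
            continue  # below the cutoff: omitted entirely
        copies = round(population / min_population)
        parts.extend([lyrics] * copies)
    return "\n".join(parts)


# Hypothetical placeholder lyrics and made-up populations (assumptions).
anthems = {
    "Bigland": "march march on",
    "Midland": "my land",
    "Tinyland": "god bless",
}
populations = {"Bigland": 9_800_000, "Midland": 100_000, "Tinyland": 5_000}

corpus = build_weighted_corpus(anthems, populations, min_population=100_000)
print(corpus.count("march march on"), corpus.count("my land"))
```

With these made-up figures, Bigland's anthem is repeated 98 times for Midland's one, and Tinyland drops out. The resulting text can be pasted into any word-cloud tool that sizes words by raw frequency.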

New words jump to the fore: March, in particular, owes its clout to the Chinese anthem, “March of the Volunteers” which, true to title, contains a fourfold exhortation to march. The miracle of national branding that is the Indonesian anthem punches above its hefty population-weight by including the name of the country both in the anthem’s title (“Indonesia Raya” aka “Great Indonesia”) and in just about every other line thereafter:

Indonesia, my nationality
My nation and my homeland
Let us exclaim
“Indonesia unites!”

Perhaps a nation of a thousand-plus islands needs more than the occasional reminder what country it is they’re singing about. “God defend…—wait, what country are we again?”

By way of comparison, here’s a cloud of the anthems of the twenty-five least populated countries. God definitely gets a higher billing, and not just thanks to the Pope. Apparently smallness turns a nation’s mind towards higher things. (During communist times, the Bulgarian Anthem added a line that said “Moscow is with us at peace and at war” just to be on the safe side, higher-power-wise.)

Next, I split up all the anthems by region, and then performed my same word-weighting trick. Here is a world map with each region’s anthem-word-cloud scaled according to population and placed, roughly at least, over the region it represents. To make things a bit more legible, I made the following table of word-clouds by region; the word clouds from the map version are in the left column, while on the right there are clouds made from the full, unweighted lyrics from every nation in the region. Often this makes a pretty big difference. For instance the population-weighted lyrics from Oceania are all from “Advance Australia Fair,” while the unweighted cloud shows words from a diversity of (apparently far more God-exhorting) islands, many of which have adopted the Indonesian trick of featuring the country’s name prominently in the lyrics. Indeed, most of the time a country’s name shows up in these word clouds, an island is involved.

It is interesting to note which regions’ word clouds are most swayed by population-weighting. The Central American and Caribbean weighted cloud bears the heavy imprint of Mexico’s battle-saturated Himno Nacional (“War, war without quarter to any who dare to tarnish the coat of arms!”). In Europe, giving the mini-nations’ anthems equal footing increases the relative presence of both God and the German (and Liechtensteinian) concept of fatherland. Africa, meanwhile, looks about the same either way, which would suggest that population heavyweights Nigeria, Ethiopia, and South Africa have anthems that aren’t much different from the pan-African average, emphasizing people, oneness, and, of course, Africa itself.

Now let’s step back, in turn, from our two starting hedges in this whole word-analysis game: translation and common-word filtering. The first allows the fiction of a more universal commerce in anthem-words; the second provides us the happy delusion that the words that stand out are especially interesting and meaningful. To do this, I created a new data set of untranslated and unweighted variants of the anthem corpus. Here’s what we get:

I tried to include all the official translations of a given anthem on the Wikipedia list (thus some countries with multiple official languages got to increase their word count accordingly). When we weight by population, a number of lovely non-Latin characters and scripts jump to the fore, most noticeably Chinese (simplified), Hindi and Bengali. Both India’s and Bangladesh’s official anthems are by the Bengali poet and Nobel Laureate Rabindranath Tagore, and I couldn’t quite tell if the Hindi version of the anthem is considered first among equals of India’s twenty-two official languages, or if any translation is official at all. I’d have loved to have added Tamil, Gujarati, Kannada, and lots of others, but I wouldn’t have known when to stop. So I added Hindi in the above chart, but not in the by-language breakdown table below.

My attempt at linguistic inclusiveness pushed the limits of Wordle’s admirable text-handling; often it took a few renderings to get the Chinese characters to appear in the cloud, and I know that several significant anthem-languages (notably Sri Lanka’s Sinhalese) weren’t rendered at all. Preparing the common-words-removed versions of the corpus created graver doubts. Wordle will only remove common words from one language at a time, so I wound up having to delete articles, particles, and short prepositions by hand to generate the bottom two clouds. That was not so difficult with the languages I was familiar with, but much more challenging for the ones with non-Roman scripts. Then there was the problem of homonyms, which allowed disparate languages to combine forces to rank certain words higher; this explains much of the dominance of the pan-Romance de and la and en in the upper-left column. Then there were words like die, which is a feminine article in German and a morbid verb in English.
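The hand-deletion step could, in principle, be scripted. The following is a rough Python sketch under stated assumptions: the three stop-word lists are tiny hand-made stand-ins, the sample lines are invented rather than real anthem lyrics, and `count_words` is a hypothetical helper, not part of Wordle. It filters each text against its own language's list before the counts are merged, which is what keeps the English verb die while dropping the German article.

```python
from collections import Counter

# Tiny hand-made stop-word lists (assumptions); real lists are far longer.
STOP_WORDS = {
    "en": {"the", "of", "and", "to", "for"},
    "de": {"die", "der", "und"},
    "es": {"de", "la", "en", "y"},
}


def count_words(lyrics_by_language, filter_stop_words=True):
    """Count words across languages, filtering each text with its own list."""
    counts = Counter()
    for language, text in lyrics_by_language.items():
        stop = STOP_WORDS.get(language, set()) if filter_stop_words else set()
        for word in text.lower().split():
            if word not in stop:
                counts[word] += 1
    return counts


# Invented sample lines, not real anthem lyrics.
sample = {
    "en": "to die for the land",
    "de": "die heimat und die freiheit",
    "es": "la gloria de la patria",
}

raw = count_words(sample, filter_stop_words=False)
filtered = count_words(sample)
print(raw["die"], filtered["die"])
```

In the unfiltered count the German and English die pile into a single entry; after per-language filtering only the English occurrence survives. The sketch does nothing for scripts that are not separated by spaces, which would need a proper tokenizer.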

Next, the same grid with all the words in English. Apparently the smaller countries are more likely to include plural pronouns and possessives.

Returning to the realm of the untranslated, here are word clouds for several of the major languages by population. Chinese is the only single-country language I included (since Taiwan and Hong Kong don’t have full and separate anthems, and Singapore and Vietnam offer only secondary Chinese-character translations). Official anthems in Bahasa Indonesia, Urdu, Russian, and Japanese represent more citizens than do the ones in German, but German has those extra mini-nations.

The Spanish and Portuguese language clouds both lead with the word patria, which is rendered as land in the translated cloud but contains additional overtones of home, nationhood, identity, and independence. If I had to summarize the body of Latin American anthems in three words, they would be those of the battle cry, ¡Patria o muerte! It’s also worth noting that Spain’s “La Marcha Real” currently lacks official lyrics, much to the consternation of its soccer players.

Where would national anthems be without sporting events? Well, they’d still be in Wikipedia, but my guess is a lot fewer people would be exposed to their wondrous (and yet oddly similar-sounding) diversity. So I made another weighted word-cloud, this time pasting each anthem in once for every gold medal a given nation has won (summer and winter games both included)—that is, once for each time that anthem has been played as the three flags rose behind the podium (forgetting for a moment that said performances are pretty much always instrumental). For this cloud I used my first lyrics from an ex-nation; the USSR’s medal count was too significant not to include the now-defunct Soviet Anthem, but I decided against teasing apart which Germans won gold with which anthems, or what to do with, say, the mixed Danish-Swedish team that won the tug-of-war at the 1900 games. But here’s a general sense of how the victorious athletes have been singing along (in their heads) over the years:

By way of equal time, here’s the cloud for the un-winningest nations. Again, it seems the smaller or less-Olympically-powerful a country, the more likely God is to appear in their anthem.

Ok, enough word clouds! One thing that I found myself more and more interested in as I worked my way through all the anthems was why some countries had anthems that, at least when I counted every stanza and chorus, were really, really long.

The country with the longest national anthem is Peru. It has 956 words in its English translation. Word for word, you could fit more than fifty Japanese national anthems into a single Peruvian one. The nineteen-word Japanese anthem is both the shortest and the one with the oldest lyrics, which are from a 9th-century poem:

May your reign
Continue for a thousand, eight thousand generations,
Until the pebbles
Grow into boulders
Lush with moss

Japan’s competition in the oldest-anthem race is from the Netherlands, whose “Het Wilhelmus” dates to the sixteenth century but has been considered a national song for much of its history. It is also the third-longest anthem; its length is due to the fact that it is an acrostic, with the first letters of the stanzas combining to reveal the name of the Dutch republic’s founding leader, Wilhelmus van Nassouwe. The entire poem is autobiographical, narrated (a bit paradoxically) by the man known to history as William the Silent.

Now let’s look at a series of frequency maps for certain words. The numbers in the legend are normalized to instances per thousand anthem-words.
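The normalization itself is simple arithmetic: occurrences of the word, divided by the anthem's total word count, times a thousand. A minimal Python sketch follows; the `per_thousand` helper and the ten-word sample text are invented for illustration, and the tokenizer is an assumption that only recognizes runs of ASCII letters and apostrophes.

```python
import re


def per_thousand(lyrics, word):
    """Return occurrences of `word` per 1,000 words of `lyrics`."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word.lower()) / len(tokens)


# Invented ten-word text, not a real anthem.
sample = "Land of our fathers, land we love, God keep us"
print(per_thousand(sample, "land"), per_thousand(sample, "God"))
```

Normalizing this way keeps a very long anthem from dominating a map merely by having more words in which to mention a thing; a short anthem that uses a word twice can still outrank it.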

I found it a little surprising that Canada beats out the United States in frequency of references to Land, God and Freedom—this in spite of the “Land of the free and the home of the brave” closer to every verse of “The Star-Spangled Banner.” But the freedom-singingest anthem of all is “Stand and Sing of Zambia, Proud and Free.” No anthem mentions America particularly frequently, but it is interesting that every one that does is in South America, or as they call it, América. Nobody north of the Darién Gap seems to’ve felt a similar need. Now let’s consider another quartet:

It’s nice to have proof that anthems mention life more than death, especially given the question of bloodthirstiness that got me started on this research. I fear that the Enemies quadrant of the above chart may be a little misleading: not that the Chinese anthem isn’t all about enemies, but rather that other anthems just don’t use the word directly all that much. For instance, here’s a passage from the “Himno Nacional de la República de Bolivia”:

If a foreigner may, any given day
even attempt to subjugate Bolivia,
let him prepare for a fatal destiny,
which menaces such brave aggressor.
For the sons of the mighty Bolívar
have sworn, thousands upon thousands of times,
to die rather than see the country’s
majestic banner humiliated.

… and one from the final stanza of the Italian anthem:

Mercenary swords,
they’re feeble reeds.
The Austrian eagle
Has already lost its plumes.

When it comes to bloodthirstiness, it isn’t enough to talk about battle, weapons (Norway: “Farmers their axes sharpened / whenever an army advanced”), enemies, death, or even things being soaked in blood, whether the land (Belize: “The blood of our sires which hallows the sod”) or the flag (El Salvador: “Ever since the day when her lofty banner, / In letters of blood, wrote ‘Freedom’”). Anyway, much of the death-talk is about dying for one’s own country (Cuba: “to die for the fatherland is to live”; Iran: “O Martyrs! Your cries echo in the ears of time”), rather than killing for it.

When I asked my friends to guess which country had the most violent and bloodthirsty national anthem, a few guessed the United States, what with the rockets and the bombs. The anthem’s third stanza does add details less comfortable for a Fourth of July reenactment, noting that the blood of the invading enemy “has washed out their foul footsteps’ pollution.” All that said, though, I think my own country’s anthem doesn’t quite come close to, say, the all-or-nothing apocalypticism of Burma’s anthem (“Until the world ends up shattering, long lives Burma!”), or that of Mexico:

O, Motherland, ere your children, defenseless
bend their neck beneath the yoke,
may your fields be watered with blood,
may their foot be printed in blood.
And may your temples, palaces and towers
collapse with horrid clamor

Then there’s this bit, from Vietnam’s “Tiến Quân Ca”:

Our flag, red with the blood of victory, bears the spirit of the country.
The distant rumbling of the guns mingles with our marching song.
The path to glory is built by the bodies of our foes.

We’re a long way from simply plucking the Austrian eagle now.

The Algerian anthem goes still further. It opens like this:

We swear by the lightning that destroys,
By the streams of generous blood being shed,

… and pounds things home with a for-anthems-rare burst of 20th-century technology:

When we spoke, none listened to us,
So we have taken the noise of gunpowder as our rhythm
And the sound of machine guns as our melody

It is interesting to note that these last two are anthems that were written during their respective countries’ struggles against French colonial rule; one wonders whether there was a conscious effort on the part of the songwriters to out-Marseillaise the Marseillaise. But do they succeed? I’d shift the prize but for that last minor-key line before the clarion call to arms:

Ils viennent jusque dans nos bras
Égorger nos fils, nos compagnes!

They come up to our arms
To slit the throats of our sons and wives!

For me, that image, more even than the watering-with-blood that happens in retaliation, pushes it over the line: violence with a far more personal touch. Maybe I should have been content to learn “As Time Goes By.”

