2019-10-30

Pedantry and Funny Letters


I suppose this issue first came to my attention, in pre-computing days, when I was a small child. I was lucky to be provided with lots of books and had noticed that encyclopedias were sometimes styled “Encyclopedia”, sometimes “Encyclopaedia”, and sometimes – even more mysteriously – “Encyclopædia”. I later learned that formations like “æ” are sometimes called “ligatures”, but we were taught nothing about such things in school and I am still rather unclear about why “ae” is sometimes rendered as “æ” and why this rendering is far less common these days.

Living close to Haworth, as we did and as I now do again, I was also aware of the “Bronte” sisters – sometimes so rendered, sometimes as “Brontë”.

As a borderline dyslexic (and frankly obsessive) child I found such things confusing and the inconsistencies downright disturbing. These traits and experiences (and my subsequent experiences trying to teach English as a foreign language for a couple of years in Germany) impelled me – later in life – to do copious research on English usage – even though I was, and am, very much on the other side of the cultural divide famously described by CP Snow.

Being a “one-eyed man in the land of the blind” entitles me – I often feel – to shout “FEWER!” at my science and engineering colleagues when they use “less” in the “wrong” places, and to decry their dreadful punctuation.

But playing the role of a pedantic old fart, you eventually learn that pedantry is rarely justified when it comes to the English language. It still grates when my IT colleagues reinterpret words like “issue” and “deprecate”, but I long ago realized I was fighting various losing battles. Language changes and nearly all “rules” have exceptions.

Nonetheless, the pedantry (serious or ironic) of humans is as nothing when compared with the pedantry of computers.

#######

Most languages other than English are riddled with funny letters. These often (though optionally) crop up (in English) in words like “protégé[i]” that we have borrowed from other languages, or in names like “Motörhead” where, I presume, the German umlaut is used for stylistic effect. Restricted as they were to the original ASCII set of 26 letters of the English alphabet (upper and lower case), ten digits, and 32 special characters, early computer programs were unable to cope with stuff like this. Germans could write “oe” for “ö” since (historically) this is what “ö” is an abbreviation for, but I suppose French people and Nordic types (with their Øs and Ås) were a bit stuffed. I do not believe that “oe” was – in this context – ever written as the ligature “œ”, but I may be wrong.

Now we have ISO/IEC 10646 and can render over 136,000 different characters, including – I recently learned as part of my day job – the Korean Hangul characters “반” and “기문” that stand, respectively, for “Ban” and “Ki-moon”.

Some problems remain in my field of data exchange and sharing, however. It is wonderful that I can now send a computer someone’s given-name rendered either as “Ban” or “반”, but I must still tell the receiving computer what I am sending it somehow. If I say I am sending a “given-name”, and someone else says “GivenName” or “first-name” or “Vorname” or “Christian name”, our exchanges are going to be technical (and possibly diplomatic) failures. So the problem of getting everyone to do the “right” thing (i.e. the “same thing”) returns at a new level.

I was reminded of this yesterday when Sarah Churchwell[ii] (@sarahchurchwell) brought it to my attention on Twitter that The New Yorker’s style-guide mandates “coöperation” where most people – American or English, journalists or otherwise – would write “cooperation”. The problem – that “coop” invites the pronunciation employed in “chicken coop” – is more often solved by inserting a hyphen: “co-operation”, but more often still, not addressed at all. The diaeresis[iii] in “coöperation” informs the human reader that the second “o” should be pronounced separately, but the computer receiving the symbol “ö” is quite unable to distinguish between its usage as a diaeresis and as a German umlaut – except (I suppose) by context (as, indeed, we humans have to).
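
For the curious, here is a minimal Python sketch of that last point, using only the standard unicodedata module. It is an illustration rather than anything taken from a real system, and the helper name describe is made up for the purpose. It shows that the “ö” of “Motörhead” and the “ö” of “coöperation” reach the computer as one and the same code point, which Unicode happens to name a “diaeresis” in both cases.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Escapes are used so that the example does not depend on how this
# page itself was encoded; \u00f6 is the precomposed letter "ö".
umlaut = "Mot\u00f6rhead"[3]       # the German-style "ö"
diaeresis = "co\u00f6peration"[2]  # The New Yorker's "ö"

print(umlaut == diaeresis)  # True: the two are indistinguishable
print(describe(umlaut))     # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

# Decomposing (NFD) splits the letter into a plain "o" plus a combining
# mark, and that mark is again the same whichever use was intended.
decomposed = [describe(c) for c in unicodedata.normalize("NFD", umlaut)]
print(decomposed)
# ['U+006F LATIN SMALL LETTER O', 'U+0308 COMBINING DIAERESIS']
```

Nothing in the character tells the program which of the two jobs it is doing; only the surrounding word, or some agreed label sent alongside it, can do that.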

So what is the poor data sharing engineer to do? Sometimes we can allow multiple synonyms and treat “Encyclopedia”, “Encyclopaedia”, and “Encyclopædia” (or “cooperation”, “co-operation”, and “coöperation”) as the “same” things – without needing to judge which is “right”. Even then, we have to have agreed terms for these sets of synonyms.
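
One way to picture the synonym approach is a lookup table that maps every accepted variant onto a single agreed term. The Python sketch below is only an assumption about how such a table might look: the names SYNONYMS, fold and agreed_term are invented for illustration, and the choice of “given-name”, “encyclopedia” and “cooperation” as the agreed terms is arbitrary.

```python
import unicodedata

# Hypothetical synonym sets: each agreed term lists the variants accepted
# for it. Which spelling serves as the agreed term is an arbitrary choice.
SYNONYMS = {
    "given-name": ["given-name", "GivenName", "first-name", "Vorname",
                   "Christian name"],
    "encyclopedia": ["Encyclopedia", "Encyclopaedia", "Encyclop\u00e6dia"],
    "cooperation": ["cooperation", "co-operation", "co\u00f6peration"],
}


def fold(text: str) -> str:
    """Normalize to composed form (NFC) and ignore upper/lower case."""
    return unicodedata.normalize("NFC", text).casefold()


# Invert the table once so that each variant points at its agreed term.
LOOKUP = {fold(variant): term
          for term, variants in SYNONYMS.items()
          for variant in variants}


def agreed_term(text: str):
    """Return the agreed term for a variant, or None if it is unknown."""
    return LOOKUP.get(fold(text))


print(agreed_term("Vorname"))            # given-name
print(agreed_term("ENCYCLOP\u00c6DIA"))  # encyclopedia
print(agreed_term("surname"))            # None
```

The table makes no judgment about which variant is “right”; it only records that they are to be treated as the same. Anything missing from the table comes back as None, which is exactly where the human judging has to start again.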

So it is “déjà vu all over again”: at some level, we do have to force everyone to do the “right” thing when they supply information as text.

As users and authors of ISO standards, my colleagues and I are mandated to use Oxford English Dictionary (“OED”) preferred spellings and usages. Because the OED prefers “ize” endings over “ise” endings (e.g. “standardize” over “standardise”), I am constantly having to explain that we do not use “American” spellings and that our American cousins are being truer to “proper English” in using “ize” than most Brits are. In other areas, the OED has changed its recommendations – in response to changes in usage – from under our feet. We are left insisting on stuff which we can no longer justify by pointing at the OED but which our computer programs still expect.

In short, we have to live with funny letters and funny (and changing) usages of letters and words; and we have no ultimate basis for judging any of this. But judge we sometimes must.


[i] The name, as it happens, of one of the programs I use.
[ii] Professor of American Literature at the University of London.
[iii] We were taught about this in school – in connection with the nearby Brontës.