I think it would be useful to provide a systematic alternative to English spelling, one that would get children reading, not at some specific level, but reading anything that's written, after a year or so of not very rigorous instruction. It would, ideally, be so simple that it could be taught in full to illiterate or functionally illiterate English-speaking adults in a single adult education course at a community center, or an online course. It would be great if it were found to be useful in teaching English as a Second Language, specifically to people who do not particularly need to gain command of written English, but do want to be able to speak correctly. For reasons that I hope are obvious, it would not serve as a replacement for English spelling. It will not be used for law, business, engineering, medicine or science. It is intended as an alternative, simpler way of reading and writing English. It is a way of providing English with a human-readable soundtrack. It would be helpful if it had a visual appeal and a countercultural mystique, so that artists and designers, including tattoo and graffiti artists, added it to their graphical repertoire. It would be lovely if it went viral on the internet.
The idea came to me as I was thinking of a good demonstration project for an exercise in practical anarchy, where a few individuals acting autonomously can make a big difference and provide an alternative to a vast, entrenched, dominant, horribly flawed system. First, it is a problem begging for a solution: functional illiteracy rates in English-speaking countries are the shame of the developed world, and a lot of that has to do with the fact that English orthography, frozen in the mid-17th century, does not reflect how the language now sounds, has numerous patterns and even more numerous exceptions, and takes a ridiculously long time to learn. Second, English is the lingua franca throughout the world, and foreigners who learn English generally have no interest in the etymology or history of the English language, and to them the spelling system is simply an obstacle.
On the other hand, English is a fairly simple language that could easily be written in a way that follows its phonological form. I believe I can solve this problem because I happen to be a trained linguist, and although I haven't delved too deeply into English phonology (until now) I know the principles. I also happen to be a software engineer, and, as it happens, the task of making this project work is 1% linguistic analysis and 99% software engineering. I think that it is realistic to make the 40,000 or so books available through Project Gutenberg also available in this new form by piping them all through a piece of software, which is yet to be written. The actual conversion process should only take a few minutes and can probably be done on demand. I think that it should also be possible to provide a browser plug-in that will convert English text on the fly. A somewhat bigger challenge would be to create a smartphone app that would allow users to photograph pages of English text and render them in a phonological alphabet.
This is an idea that is made for just this moment in time, when most text is digital, or available in digital form via the internet. In previous ages, the process of converting entire libraries of books would have been so labor-intensive as to be unthinkable. Each book would have to have been converted by hand, proofread, and then printed and distributed. If a new alphabet were to be used, this would have required new typefaces to be cast and new typewriters to be made. To carry this project out would have required a huge clerical staff that would have had to be specially trained for this task, which, once it was completed, would have made their skills instantly obsolete. None of this would have been possible without ample government funding. In short, it would have been a boondoggle writ large. Now, however, it is a matter of writing some software.
This is also an idea for a time to come. Access to information in digital form is only as reliable as the electric grid, which, recent experience has shown us, is not particularly reliable at all, with an exponentially increasing incidence of blackouts. Let us extrapolate some trends from the past decade to a time a decade or two into the future. Fossil fuels are still plentiful but so expensive that nobody would think of running a tractor, or a tractor-trailer, to bring food to the people, and so the people have to go where the food is growing. At the same time climate change is making large-scale industrial agriculture increasingly untenable, so that people have to grow their own food using labor-intensive pre-industrial methods. The educational system, which is currently producing high school students with 5th-grade reading skills, has long fallen by the wayside. But people will still want to be able to read to their children by the campfire at the end of a long day in the fields, and they will want to be able to teach their children to read. Will they want to teach them to read by spending years explaining to them hundreds of spelling patterns and making them memorize thousands of exceptional cases? Or would they rather have them learn a small set of symbols, teach them to use them to sound out syllables and to put them together into words, and then turn them loose on any book they can find?
* * *
The many responses that came in after I published last week's post showed that a great many people have no idea that there even is a problem, never mind what the problem might be. A constant refrain was “Works for me!”: I learned English spelling, and so should everyone else. Many more people wrote to tell me that spelling reform is politically impossible. One educationalist accused me of being against phonics, which is a way of teaching English reading by pointing out the various ways that letters can map to sounds, versus the “whole word” approach. Actually, I consider phonics to be the lesser of the two evils. One reader thought that English should be written using the Cyrillic alphabet. Too late, it already is. Just look at the storefronts in Moscow and St. Petersburg: they are crammed full of transliterated English, some of it barely recognizable as such. Another made the commonsense but not entirely workable suggestion that we use the International Phonetic Alphabet. IPA is the professional tool which linguists use to describe speech sounds, but it has never been used directly to create an orthography for any language that I know of, for many reasons, one of which is that it's really quite ugly and hard to decipher. Only one reader actually went as far as acknowledging that there is a problem, but several expressed incredulity that the problem even existed. Perhaps I should have provided more references. Well, better late than never, and so here is a short summary of the problem, from the English Spelling Society:
English grammar and punctuation are relatively easy. But English spelling is quite the reverse - probably the most irregular of all alphabetic systems. Not only can you not tell how to spell a word from hearing it spoken; you can’t even be sure how a word is spoken from the written word – a unique “double whammy”.
The reasons for this irregularity are complex and largely historical. But the economic and social costs are serious. English speaking children take on average three years longer to learn to read and write than others and some never succeed. Our dyslexics struggle in a way that Italian and Spanish children do not. Adult illiteracy remains stubbornly high (23%).
I think the 23% number is being too kind; the functional illiteracy rate is much higher. If you click the irregularities link above, it will take you to a sort of English spelling cathedral of shame which, if you read through the entire list and try to make sense of it all, will probably leave you shaken. Is it really that bad? (Yes, it is.) And does making our children learn it qualify as child abuse? (You decide.) Lastly, is all of this artificial complexity even necessary? (No, definitely not.)
This level of complexity and irregularity imposes a large cognitive processing overhead on those trying to learn to read and write English. Here is a diagram from a paper by Ram Frost titled Orthography and Phonology: the Psychological Reality of Orthographic Depth. He looked at the difference in the process of deriving sound from graphical form between the “shallow” orthography of Serbo-Croatian, where each letter represents a phoneme, and the “deep” orthography of English, where no such one-to-one correspondence exists. Apparently, the mind of a person who is learning to read and write English is crammed full of such nonsense. By the time the learning process is complete, the reader starts looking up the phonological form of the whole word, as if it were a random hieroglyphic; thus, no matter what the teaching process is, in the end learning to read English involves rote memorization of the written form of each word. It is little wonder that so many people never complete the process. Is there a better way? Well, not at the moment, but, obviously, there ought to be.
* * *
Babies are born ready to learn a language (or two or three): it is part of their innate developmental program. They do not need to be taught to babble. From just a couple of months old they start to spontaneously produce consonants and vowels. They start with just a few consonants and with just about every vowel and diphthong imaginable. Over time, their consonant repertoire increases while their vowel repertoire shrinks down based on what they hear around them. They start with single syllables, and eventually learn to string them together into words and phrases. Eventually the two complementary systems involved in language perception and production—the perceptual and the articulatory—become dialed in to a specific language, with its specific inventory of phonemes and phonological rules.
Phonemes are not physical but psychological in nature. They are not something that can be picked up by a microphone or analyzed by shooting an X-ray film of a speaker's mouth. Those are allophones, which are speech sounds produced by feeding a sequence of phonemes through a set of phonological rules. Phonemes are at a higher level of abstraction, and evidence for them, as with all psychological phenomena, is indirect. However, the existence of a phonemic inventory for each language is perfectly uncontroversial. The set of phonological rules is learned automatically and unconsciously, along with all the other automatic processes of language acquisition. One set of rules determines which phonemes are mapped to which allophones under what conditions. Another set of rules, in English as well as many other languages, such as Russian and Portuguese, governs vowel reduction: unstressed vowels decay to something shorter and generally indistinct, often called a “schwa.” (Think of the difference between the sound of the first 'o' in “psychology”/“psychological” or the second 'o' in “photography”/“photographical”.)
Consider the following minor (very minor) miracle: speakers of different English dialects can learn to understand each other without being taught to do so in school, and, in fact, without much effort at all. This is true even for those of them who are entirely illiterate. This is because they all have substantially the same underlying, psychological representation of English in their heads, which they express differently, via different sets of phonological rules. These rules do not need to be taught but are learned spontaneously, simply by listening. Most people learn the perceptual portion of the rules, allowing them to understand other dialects. Some people learn the articulatory portion as well, allowing them to sound British or Scottish or Irish or Australian, or, my personal favorite stealth dialect, Canadian. Thus, what makes English one single language has little to nothing to do with the way it's written. It is one language because it has a common phonological representation in the minds of its speakers. This allows them to understand each other without any reference to the way the language happens to be written.
Having thought about this for a couple of months now, I have come up with a set of conjectures that make the task of creating a shallow English orthography much easier. Here they are:
1. There is a specific phonemic inventory that is largely invariant across all the major English dialects
2. English dialects vary mostly in their phonological rules; the underlying phonemic representations are substantially the same across English dialects
3. There is no phoneme corresponding to “schwa”: there are only vowel reduction rules which are learned spontaneously and automatically and do not need to be reflected in the orthography
4. Differences between English dialects that cannot be captured using a common set of phonemes are lexical differences that no orthographic representation can ever hope to bridge. Simply, certain words have to be written differently across certain dialects.
5. Thanks to Hollywood films (which make money by being shown without subtitles in all English-speaking countries) the best-understood English dialect throughout the world is General American, so that's the best one to serve as the basis for the alternative orthography. However, the phonological representation of GA can be relatively dialect-agnostic.
What is this common phonemic inventory? Here is the entire phonemic inventory for every dialect of English. It is an excellent tool for capturing the exact sounds of every dialect of English, but it is simply too large to serve as a basis for an orthography. I have discovered, however, that it can be pared down substantially to capture the phonological representation of English that is valid across dialects. Here is what I think is a minimal set, which I derived by looking at the presence of minimal pairs across major dialects. (The phonemes are shown between slashes, the allophones in square brackets.) The vowels are the most troublesome, because there is potentially a very large inventory of them across dialects, but this inventory shrinks considerably once you pay attention to the distribution of minimal pairs.
/ɪ/ , /i/ (shit/sheet) — rather important distinction
/ʌ/, /ɑ/ (cut, father) can be expressed as one phoneme /a/ because there are no minimal pairs except, perhaps, come/calm and bum/balm, but since the 'l' is sometimes pronounced, why not just write it that way?
/æ/ (cat) — in RP (British “Received Pronunciation”) it is often pronounced [ɑ], causing ambiguity
/o/, /ɔ/ — can be taken to be two allophones of /o/ which sounds different depending on its context within a word
/ʊ/, /u/ (pull/pool)
Thus our minimal vowel inventory across all dialects is taken to consist of just these eight:
/ɪ/, /i/, /a/, /ɛ/, /æ/, /o/, /ʊ/, /u/
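The minimal-pair test behind these decisions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy lexicon, its transcriptions, and the function name `minimal_pairs` are made up for the example, and a real run would need a full pronunciation dictionary for each dialect being compared.

```python
from itertools import combinations

# Hypothetical toy lexicon (an assumption for illustration only):
# word -> tuple of phoneme symbols.
LEXICON = {
    "shit": ("ʃ", "ɪ", "t"),
    "sheet": ("ʃ", "i", "t"),
    "bit": ("b", "ɪ", "t"),
    "beat": ("b", "i", "t"),
    "pull": ("p", "ʊ", "l"),
    "pool": ("p", "u", "l"),
    "cut": ("k", "ʌ", "t"),
    "cat": ("k", "æ", "t"),
    "father": ("f", "ɑ", "ð", "ɚ"),
}


def minimal_pairs(lexicon, a, b):
    """Return word pairs whose transcriptions differ in exactly one
    position, where that position contrasts phoneme a with phoneme b."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue  # different lengths cannot form a minimal pair here
        diffs = [(x, y) for x, y in zip(p1, p2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {a, b}:
            pairs.append((w1, w2))
    return pairs


if __name__ == "__main__":
    print(minimal_pairs(LEXICON, "ɪ", "i"))
    print(minimal_pairs(LEXICON, "ʊ", "u"))
    print(minimal_pairs(LEXICON, "ʌ", "ɑ"))
```

A contrast that turns up pairs has to stay in the inventory; one that turns up none in any major dialect is a candidate for merging. An empty result from a nine-word lexicon proves nothing, of course; it only shows the shape of the test.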
Liquids and nasals: /l/, /n/, /r/, /m/
/ḷ/, /ṇ/, /ɚ/ (bottle, button, butter) — these are consonants that act like vowels. Plenty of dictionaries insert a schwa in front of them, but, as I said, at the phonological level the schwa doesn't exist
To simplify things further, so-called “r-colored” vowels I take to be just regular vowels coarticulated with a following /r/, while diphthongs are taken to be just two coarticulated vowels: /oʊ/ = /o/+/ʊ/
The rest: /j/, /s/, /z/, /w/, /ʧ/, /ʤ/, /t/, /d/, /h/, /ŋ/, /k/, /g/, /f/, /v/, /p/, /b/, /ʃ/, /ʒ/, /θ/, /ð/
This gives us just 35 phonemes that need to be represented using unique symbols, which is a perfectly reasonable size for an alphabet. But it can be pared down further. Observe that there are eight consonant pairs that differ in just one feature: one is unvoiced, the other is voiced: /s/-/z/, /ʧ/-/ʤ/, /t/-/d/, /k/-/g/, /f/-/v/, /p/-/b/, /ʃ/-/ʒ/, /θ/-/ð/. It also turns out that, of these, the voiced ones occur only half as frequently as the unvoiced ones in English speech. Therefore, there is no reason to waste an entire separate symbol on each of the eight voiced ones. We can represent them as unvoiced ones with a “voicing mark” such as the one used in the two Japanese syllabaries: /g/ = /kv/, etc. This gets us down to just 27 symbols, one more than the Latin alphabet.
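The arithmetic and the voicing-mark idea are easy to check with a short Python sketch. The grouping names, the `encode` helper, and the choice of `ᵛ` as a stand-in for the voicing mark are all assumptions made for illustration; the actual symbols have yet to be invented.

```python
# Inventory as listed in the text, grouped the same way.
VOWELS = ["ɪ", "i", "a", "ɛ", "æ", "o", "ʊ", "u"]
LIQUIDS_AND_NASALS = ["l", "n", "r", "m"]
SYLLABIC_CONSONANTS = ["ḷ", "ṇ", "ɚ"]
THE_REST = ["j", "s", "z", "w", "ʧ", "ʤ", "t", "d", "h", "ŋ",
            "k", "g", "f", "v", "p", "b", "ʃ", "ʒ", "θ", "ð"]

PHONEMES = VOWELS + LIQUIDS_AND_NASALS + SYLLABIC_CONSONANTS + THE_REST

# The eight voiced consonants and their unvoiced partners.
VOICED_TO_UNVOICED = {
    "z": "s", "ʤ": "ʧ", "d": "t", "g": "k",
    "v": "f", "b": "p", "ʒ": "ʃ", "ð": "θ",
}

# Placeholder for the voicing mark (hypothetical; any unused glyph would do).
VOICING_MARK = "ᵛ"


def base_symbols(phonemes):
    """Phonemes that still need a symbol of their own once the voiced
    consonants are written as unvoiced symbol + voicing mark."""
    return [p for p in phonemes if p not in VOICED_TO_UNVOICED]


def encode(phonemes):
    """Rewrite a phoneme sequence using only base symbols plus the mark."""
    return [
        VOICED_TO_UNVOICED[p] + VOICING_MARK if p in VOICED_TO_UNVOICED else p
        for p in phonemes
    ]


if __name__ == "__main__":
    print(len(PHONEMES))                # full inventory
    print(len(base_symbols(PHONEMES)))  # after folding the voiced ones
    print(encode(["d", "o", "g"]))
```

Note that the count of 27 covers the base symbols only; the voicing mark itself is one more glyph to learn, much as the dakuten is in Japanese.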
However, the Latin alphabet happens to be the wrong choice. Yes, it contains 26 different letters, but they are not the ones we need. It is possible to borrow diacritical characters from other languages, but the result will look foreign. (You may think that foreign looks cool, but I think that extraterrestrial looks even cooler.) Also, any attempt to recycle the Latin alphabet would result in something that looks like English horribly mangled and misspelled. For all its faults, written English does have a certain consistent aesthetic, which the alternative would lack. It would start out as a graceless hack, and would be instantly despised. It is better to start with something that is, at the outset, completely illegible, but where a few hours of effort later the sounds of words start to spontaneously pop right into one's mind with no additional processing required.
To illustrate my point, I spent a few minutes coming up with an IPA-to-Latin mapping that wouldn't look too ugly, borrowing a few letters from Old English/Icelandic, a couple more from Turkish, and a few more from IPA, but the result is still startlingly ugly. There is a strong interference effect, which no amount of fiddling with the mapping will ever eliminate. The symbols have to be fresh ones, with no preexisting associations of any kind, so that people who see them for the first time can pass no judgment on them. By the time they figure them out, they will have breathed the air of freedom and realized what they have been missing, and the change in them will be irreversible. So far, people have proposed using IPA, Extended Latin, SAMPA, Cyrillic, Greek, Shavian and Deseret. None of these will work.
For ol ıts folts, rıtṇ Iṅglış daz hæv a sṛtn konsıstent esþetık, wıc ðe æltṛnatıv wud læk. It wud start aut æz a greisles hæk, ænd wud bi ınstantli despaizd. It ız betṛ tu start wıð samþıṅ thæt ız, æt ðe autset, komplitlı ılejibl, bat weṛ a fyu auṛz ov efṛt leitṛ ðe saundz ov wṛdz start tu sponteıniaslı pop rait ıntu wanz maind wıð nou ædışṇal prosesıṅ rekwaıṛd.
* * *
The experiment, then, is as follows:
1. Compile a phonemic dictionary of English from various text-to-speech dictionaries by running vowel reduction rules backwards
2. Invent a set of symbols to represent all the phonemes
3. Compile a corpus of English literature and a browsing tool that uses the alternative orthography, plus some learning tools
4. Wait for the epiphany: “OMG I can read this, and it's written exactly how it sounds! Wow!”
And after that, who knows what will happen. And that, I think, is the beauty of practical anarchy.
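For the software-minded, here is a rough Python sketch of the conversion step at the heart of step 3, assuming the phonemic dictionary from step 1 already exists. Everything in it is hypothetical: the four dictionary entries, the `transliterate` function, and the decision to pass unknown words through unchanged are illustrative choices, and the output uses IPA symbols only because the new symbols from step 2 do not exist yet.

```python
import re

# Hypothetical stand-in for the phonemic dictionary of step 1:
# lowercase English spelling -> phonemic form.
PHONEMIC_DICT = {
    "cat": "kæt",
    "sheet": "ʃit",
    "pool": "pul",
    "sit": "sɪt",
}

# A word is a run of ASCII letters and apostrophes (a simplifying assumption).
WORD_RE = re.compile(r"[A-Za-z']+")


def transliterate(text, dictionary):
    """Replace every known word with its phonemic form, leaving
    punctuation, spacing, and unknown words untouched."""
    def convert(match):
        word = match.group(0)
        # Case is dropped on lookup; unknown words pass through as-is.
        return dictionary.get(word.lower(), word)

    return WORD_RE.sub(convert, text)


if __name__ == "__main__":
    print(transliterate("Cat, sheet; pool!", PHONEMIC_DICT))
    print(transliterate("A cat can sit.", PHONEMIC_DICT))
```

A whole-word lookup like this is what would make on-demand conversion of a book plausible: the hard part is compiling the dictionary, not applying it. Words with more than one pronunciation (such as "read" or "lead") would need context to resolve, which this sketch does not attempt.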