Tuesday, December 11, 2012

Applied Anarchy Part III: The Design Phase

[Six-month update. The project is alive. To see what it looks like now, scroll down.]

[Update: There is now a reasonable bitmap font that hints of brush calligraphy; the chart and the sample below have been updated. The sample now shows stressed vowels as elongated.]

[Update: By popular demand, I included a little poem at the end, so that it's clear what text looks like, and for your deciphering pleasure. Please note that font design is yet to be done.]

If you have been following along for the last two weeks, you probably have some idea of what happens next; if not, you will need to catch up: here is a description of why English spelling is a problem, and here is an explanation of what can be done about it. In short, English has the world's worst orthographic system that happens to be in common use, and it causes a great deal of damage. Just the cost of the several extra years of schooling needed to learn English spelling (much of it to no avail), together with the opportunity cost of not learning something more useful, runs into many billions of dollars a year. The economic damage caused by widespread functional illiteracy is harder to quantify.

There has been a lot of discussion since I published these two posts, along with numerous expressions of support. Several software developers who are also linguists stepped forward with offers of help. Given this level of interest, I intend to push forward with this project.

The task at hand is to create a new, better way of writing and reading English (of the General American variety)—one that is entirely regular and represents each psychologically real speech sound (phoneme) with exactly one symbol (glyph) and, unlike the current system, takes a minimal amount of time to learn for either a native speaker or a student of English. The goal is to design and write software that will provide an alternative way of rendering English text and to make it available for web sites, electronic books and electronic documents of all kinds.

The task at hand is not to reform English spelling. Nor is it to indulge people who want to waste time discussing such futile endeavors. It is to create an alternative, not a replacement.

The task at hand is not to create a new way of writing English using the Latin alphabet, since that's already been done, in spots and in stripes. It's called Lolcat. If that is of interest to you, o hai, blessinz of teh Ceiling Cat be apwn yu, srsly. nuf sed. Kthbye!

The task at hand is not to scare up the requisite number of random squiggles and doodles, either from foreign alphabets, or from previous failed experiments at phoneticizing English, such as Shavian. This has to be an original design.

* * *

The process of coming up with something out of nothing is always a mystery. Be that as it may, what appears to work best is a process that starts with setting forth a set of requirements, followed by a very strange period of time during which nothing productive seems to be happening. This can take a long time, and if there happen to be project managers wandering about asking futile questions like “How long do you think it will take?” or “When do you think it will be done?” then it can take forever, because that is how long it can take to calculate what is incalculable. This period of time ends either with a design that fulfills the requirements, or with failure. In the worst case scenario, the design is done by a committee, it is agreed that failure is not an option and that furthermore we are out of time, and next thing you know there is an unspeakable horror of a design scampering about demanding to know why mommy doesn't love it. Given how many such horrors have come to inhabit our world, I have come to view design failure as just a special kind of success.

The Requirements

1. A set of unique symbols, one for each of the 35 generalized phonemes of General American English.

2. Must not have interference effects with Latin or produce unwanted associations with English spelling.

3. Must not resemble the writing of any known foreign or artificial language, dead or alive, any existing decorative style, motif or esthetic, and must be as free of pre-existing associations as possible.

4. Must be easy to read and to write. Must not pose a problem for dyslexics, people with limited manual dexterity or people who have learning disabilities. Must not require superior penmanship to produce perfectly legible, professional-looking results by hand.

5. Must be easy to learn: Related sounds must be represented by related shapes and make use of Latin cognates when possible.

6. Must be optimized for mobile devices. Must allow easy text entry using numerical keypads, touch pads and multitouch screens. Must not require a keyboard layout. Must not degrade visually when rendered using a minimum number of pixels.

7. Must work well for multiple applications such as carving in wood or stone, rendering on paper using calligraphy brushes and pens and on walls using spray paint. Must not require breaking up symbols into sections when making stencils, which, by the way, are becoming an increasingly important form of public art.

Kill the state within yourself

8. Must be able to represent stress (since English has lexical stress) and intonation (since English has fixed word order and uses tone for topicalization) without requiring additional symbols such as accents or other diacritics.

9. The design must be sufficiently general, and at the same time sufficiently highly constrained, to allow for a wide variety of artistic renderings without causing ambiguity or loss of legibility.

10. Must employ a minimum number of distinguishing features in specific configurations, to allow it to be directly mapped to fingertip-readable Braille-like patterns, Morse code-like sequences, bugle calls, drum beats, semaphore signals, dance steps (with or without pompoms) and so on without requiring one to memorize a separate code table in each case. A business card-sized laminated card of easy-to-follow instructions should be all that is necessary in each case.

The Design Process

The process of coming up with something new is, by definition, ill-defined; if you know what it is you are looking for, then it is, by definition, not new. All of the most important human activities, such as coming up with scientific discoveries, mathematical proofs, artistic breakthroughs and the like fall into this category. There are just two broad, sweeping generalizations that apply to them. One is that broad, sweeping generalizations do not apply to them.

The other broad, sweeping generalization (that doesn't apply to them either) is that people like to have an explanation for how something came about, and find it deeply unsatisfying to think that all the great new things are, in fact, accidents. This usually makes it necessary to retrofit an explanation for how something new came about before people will leave you alone about it.

And so, to indulge this very general penchant for wanting to know the details of something about which no detailed information exists, here, for your enjoyment, is a story of how I came up with this particular set of symbols (glyphs) to represent the sound system of English.

* * *

Some things are immediately obvious just by looking at the requirements. For example, the need to be able to make stencils without breaking up the shapes means that there can be no closed geometries such as circles. The need to be able to easily and accurately render the symbols using pens, brushes and chisels, together with the need to limit the number of distinguishing features, indicates the use of strokes within a largely rectilinear pattern. The need for easy numerical keypad and touchpad entry limits the number of basic elements to what fits on a numerical keypad. But how to put it all together?

Aboard the No. 49 bus

In search of inspiration, I took the No. 49 bus to the Russian Museum. It has a huge art collection spanning many centuries. The older works are mainly portraits of dead Russian aristocrats. Many of them look like somebody took a toilet plunger to their faces, and I find that amusing but not inspiring.

The Russian Museum, St. Petersburg

There are also large collections of works by some of my favorite painters, including Repin, Levitan and Aivazovsky. But it is the modern art exhibited in the Benois Wing that is capable of inspiring an abstract design of the sort needed. Of that, the Suprematist works by Kazimir Malevich turned out to provide just the right foundation for this project.

Kazimir Malevich, The Black Square

Malevich came from humble beginnings (his father, a Polish immigrant, worked at a provincial sugar factory). He was largely self-taught, and rose to international prominence during the heady period around the Russian revolution during which Russian avant-garde art took the world by storm. He was the founder of the Suprematist movement, which sought to succeed the Constructivist movement by liberating art from any physical or social constraints.

Kazimir Malevich, Self-Portrait

Malevich contended that true art existed in the realm of pure feeling, of form and color unrelated to physical objects and freed from utilitarian demands. His Suprematist works were wildly successful. Writing in 1915, the establishment art critic Alexandre Benois, brother of the one who architected the Benois Wing, had this to say about his masterpiece The Black Square:
“This is not a simple joke, not a simple challenge, not an accident or a small, insignificant episode... this is one of the acts of self-assertion of a new direction, which leads to filth and squalor, and which boasts that, through pride, through arrogance, through trampling all that is lovely and tender, it will lead us all to perdition.”
I believe Malevich's work stood on its own merits, but such lavish praise certainly couldn't have hurt. If your art opening serves up perdition, you can safely skimp on the wine and the cheese and crackers. Needless to say, his art languished during Stalin's reign, when the artistic dictate from the very top was to adhere to Socialist Realism—art in the service of the people—which is just the sort of utilitarianism and attachment to social convention and physical form that Malevich despised. In later years his masterpieces were again exhibited, although rumor has it that The Black Square, in an affront to all that is good and decent in this world, was for a time exhibited upside-down.

There is so much to say about The Black Square that I hardly know where to begin. I suppose it may be hard to tell what it is you are looking at from an image on a web page, but when viewing it directly the normal reaction is along the lines of “Oh my God, there it is, that's the one!”, for you have now seen it. Note the definite article “the”; it is there for a reason. What painter would ever paint another black square? (An ironic one, I suppose, and call it “Yet Another Black Square”, but that would be so derivative.) In an infinite universe of black squares, Malevich has achieved what amounts to a singularity. Look at it another way: note that The Black Square signifies itself and only itself. Is it not the textbook definition of a black square? And is it not also a painting of one? It is denotation and connotation rolled into one. It does not exist in contrast or in opposition to anything else. Its only relationship is to you, the person standing in front of it at the Russian Museum, thinking “Oh my God, there it is!” and in this sense it is not the black square at all; it is your experience of it. Or, rather, your lack thereof, because, after all, what is there to experience? After all, it's just a bloody black square, isn't it? And so maybe you are not experiencing it; maybe it is experiencing you. To paraphrase Nietzsche, “If you gaze long at the black square, the black square also gazes at you.” Magic, isn't it?

The set of symbols I tasked myself with creating is quintessentially Suprematist. They are severed from all social convention or pre-existing esthetic. They are intended to trigger a direct, unmediated response in you, their viewer, and that response is to want to utter phonemes—objects with no physical reality that only exist within your mind. Their function is to create an experience: the experience of visualizing a speech sound as an abstraction. It is the experience of standing before an abstract shape and feeling the /æ/ (as in "cat") or the /ɚ/ (as in "squirrel") rise up inside you—in spite of there being no cats or squirrels or any other objects or words present, depicted, or alluded to. They seek to wire your visual cortex directly to your speech center, bypassing all that could possibly stand between you and the language that resides in your head. The only conventions by which they are constrained are the conventions of the symbols' own geometric shapes. Aside from the need to learn which symbol represents which speech sound, the meaning of each symbol is defined not by anyone else but by your own subjective experience of it—the experience of recognition of something that is already within you: “Oh my God, there it is!”

* * *

But the work I found most germane to the task at hand was not The Black Square but another masterpiece by Malevich, displayed alongside it: The Black Cross. It also is earth-shatteringly, breathtakingly, eye-wateringly obvious, but it gave me so much more to work with. Rather than just one square, it has nine: five black ones, four white ones. Viewed on another level, the image consists of two intersecting strokes: one horizontal, one vertical. And so here is our first symbol:

Kazimir Malevich, The Black Cross

My next steps were obvious as well: fill each of the remaining eight squares with a distinctive pattern of strokes, as few strokes as possible, using the black cross, in the central position, as the seed. Make the strokes flow top to bottom and left to right with a minimum of discontinuity or backtracking. Make the set of symbols look coherent by making sure that they all have an element in common. I chose it to be the lower part of the cross, giving each symbol a convenient mounting point. I also chose the horizontal line of the cross as the baseline that runs through all the symbols, so that all the symbols hang from that line, making them easier to align visually while writing and to scan through quickly while reading. Here are snapshots of my creative process:



These symbols fulfill all of the requirements except one: there aren't enough of them. We need exactly three times as many symbols as this.

* * *

I wandered the exhibition halls in search of an idea that might help me break out of this impasse, and eventually found myself in the museum gift shop. And there I found it!

Troika bearing Grandfather Frost and the Snow Maiden

It is holiday season, and there were holiday greeting cards on sale. The biggest holiday in Russia is the New Year, and New Year's greeting cards often depict two stock figures of Russian folklore: Ded Moroz (Grandfather Frost) and Snegurochka (the Snow Maiden), his sexy female side-kick, putatively his granddaughter but let's not pry. Greeting cards often depict this duo riding a sleigh laden with gifts pulled by a troika of horses. Unlike a mythical character like Santa Claus, children have no trouble believing in these two because they do in fact exist: on New Year's Eve many thousands of instances of them are spawned and fan out across the landscape, ringing doorbells and distributing presents. The Snow Maiden is particularly popular with the children.

Typical specimens

And then it hit me: hitch each of the nine symbols up to a troika! Nine symbols times three horses gives us 27, and with 8 additional variants that will be created by adding a voicing mark we will reach 35—the exact number we are looking for! Note that all of the nine symbols have a constant element: the foot. This foot can now be made to have three variants: short, medium (with a curve, to symbolize the shaft bow that is part of the harness of the central horse in a troika) and a long one.
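For those who like to check arithmetic by enumeration, here is a minimal Python sketch of the inventory. It assumes nothing beyond the counts given above: the names `BASES`, `FEET` and `VOICED_COUNT` are made up for illustration, the nine base shapes are simply numbered, and the choice of which eight glyphs receive the voicing mark is an arbitrary placeholder, since that has not been specified at this point.

```python
from itertools import product

# Nine base shapes, one per cell of the 3x3 grid suggested by The Black Cross.
# They are simply numbered here; the numbering is an assumption.
BASES = range(1, 10)

# Three variants of the shared "foot" element, per the troika idea.
FEET = ("short", "curved", "long")

# Number of voiced variants created by adding the voicing mark.
VOICED_COUNT = 8


def build_inventory():
    """Return a list of (base, foot, voiced) tuples covering every glyph."""
    plain = [(base, foot, False) for base, foot in product(BASES, FEET)]
    # Placeholder choice: mark the first eight glyphs that do not have the
    # short foot. Which glyphs really carry the mark is not stated here.
    candidates = [glyph for glyph in plain if glyph[1] != "short"]
    voiced = [(base, foot, True) for base, foot, _ in candidates[:VOICED_COUNT]]
    return plain + voiced


inventory = build_inventory()
print(len(inventory))  # 9 * 3 + 8 = 35
```

The point of the sketch is only that three binary-or-ternary features (base shape, foot, voicing mark) are enough to index every glyph, which is what makes a small code table possible.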


With the addition of the troika our set of symbols expands to 27, and my work is almost done:


But how do we distinguish the 8 voiced consonants from their unvoiced counterparts? What will the voicing mark look like? Malevich to the rescue again. Here, we make use of his third masterpiece of the series, which is sometimes called his Triptych: The Black Circle.

Kazimir Malevich, The Black Circle


And here are the eight voiced variants, with the voicing mark inspired, once again, by the great Malevich:


* * *
My last task was to match our new symbols to the 35 phonemes. This I did with the help of a phonemic frequency table, making sure that the most frequent phonemes require the fewest, shortest strokes with the least amount of backtracking. I arranged it so that, when writing, a third of the symbols is rendered using a single unbroken line in a single flowing movement. In addition, I visually distinguished the vowels by mapping all of them to symbols with the short foot, and matched up the five symbols that have obvious Latin cognates (I, F, T, S, J) with their corresponding phonemes. I also made sure that related phonemes (in terms of shared phonological features) mapped to related symbols (in terms of graphical elements); thus, /æ/ is a variant of /a/ and so on.
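The frequency-matching step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually followed: the function name `assign_glyphs`, the glyph names and every number below are invented for the example, and the sketch ignores the other constraints mentioned above (short-footed vowels, Latin cognates, related shapes for related phonemes), which would have to be applied before or after a pass like this.

```python
def assign_glyphs(phoneme_freq, glyph_cost):
    """Pair the most frequent phonemes with the cheapest glyphs.

    phoneme_freq: dict mapping phoneme -> relative frequency
    glyph_cost:   dict mapping glyph name -> writing cost (e.g. stroke count)
    """
    if len(phoneme_freq) > len(glyph_cost):
        raise ValueError("not enough glyphs for the phonemes")
    # Most frequent phoneme first, cheapest glyph first, then pair them up.
    by_freq = sorted(phoneme_freq, key=phoneme_freq.get, reverse=True)
    by_cost = sorted(glyph_cost, key=glyph_cost.get)
    return dict(zip(by_freq, by_cost))


# Made-up numbers, for illustration only; not taken from any real table.
example_freq = {"t": 7.0, "n": 6.5, "s": 4.5, "ʒ": 0.1}
example_cost = {"glyph_a": 1, "glyph_b": 2, "glyph_c": 3, "glyph_d": 4}

mapping = assign_glyphs(example_freq, example_cost)
print(mapping)
```

With these placeholder inputs the most frequent phoneme lands on the single-stroke glyph and the rarest on the most laborious one, which is the whole idea.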

Stress will be represented by horizontally elongating the vowels, while tone will be shown by shifting the baseline—the horizontal line that runs through most of the symbols—up and down slightly. I believe that this set of symbols (glyphs) fulfills all of the requirements I set out, but please do check for yourself and let me know if you discover anything that is problematic.


And here, for your deciphering pleasure, is a fragment from a poem by Lewis Carroll:

Six-Month Update

After six months of steady refinement, here's what Unspell looks like now:


And here are your ABCs unspelled. Sing along! (The ones at the bottom represent English sounds that are missing in Latin1.)





44 comments:

xulfus said...

You must be quite mad. I love it.

Vagabond Anne said...

Very interesting. It looks strangely like Hebrew. It could work, but then what will teenagers do for lulz? I was kind of perversely enjoying lolcat nonsense.

jhughston said...

Your exploration of this project is fascinating. Thoroughly enjoying it and learning a great deal. Thank you! Your application is outstanding.

One observation, whilst very easy to read in the 3 x 5 grids, in the rendering of the glyphs in the final chart, some do look similar and perhaps may be easy to confuse. Examples include the 5th vs 7th vowel/semi vowels, and the 2nd syllabic consonant vs the 5th stop/affricate.

Perhaps a solution is to add one extra line to your grid pattern. Not possessing the skills to render examples please excuse this attempt to explain.

In essence you have a 3 x 6 grid pattern when you add in the voice mark. Using a 3 x 7 grid, again the voicing mark only ever appears in the first row. In the second row is only ever the horizontal line that appears at the top of your glyphs in the bottom row of your chart of 27 characters. The fourth row (middle) becomes the baseline with the third line being for the 'tails' that rise above.

This suggested layout serves to create a greater distinction between some of the characters and would likely provide less problems for the visually challenged in distinguishing the examples listed above. The 3 x 7 grid also has an aesthetic balance.

Lance M. Foster said...

It looks like it would work to me. Quick work Dmitry! I also enjoyed the virtual museum trip and the insight into your thinking process.

One suggestion is that reading any language is helped by the actual shapes of the words too, not just the individual letters and their sounds. Sounds help in learning how to say the words, while word-shapes help in reading once the word has been learned.

A thought that comes to mind is to make it so that if you have a baseline, the vowels, which are all rather compact, rest on the line itself, while all the consonants extend below and/or above the baseline.

That way one can look at a word and immediately not only identify vowels and consonants, but each word takes on a shape of its own more clearly, which helps in faster reading...seeing the entire word as a lexical item as well as each glyph as a phoneme.

Dmitry Orlov said...

jhughston -

I am playing around with a variation on your idea. Basically when there is a double horizontal, the top one of them becomes the baseline. Excellent idea, thank you for contributing!

Unknown said...

I like what you are trying to do. (I'm the guy that referred you to BTRSPL last week at http://ententetranslator.com/btrspl.html which can translate whole books into a choice of 3 proposed simplified spelling systems in a matter of seconds (all Latin based). It can also translate the converted books back to regular English spelling.)

As for your proposal for an original non-Latin alphabet, I am not so sure, although the idea of an alphabet that is less ambiguous and is deliberately designed to have more distinct characters is appealing. I would have to think about it. However, whatever you come up with, I wonder if your project could go even one step further: make a phonetic spelling system that could work with all languages and human sounds. In a way, all Latinized languages are really close to being the same language, if only the relationship between sounds and alphabetic characters were stronger and more consistent. For instance, to me with my Latin alphabet based learning, Croatian seems a lot friendlier than Serbian. In making an original universal alphabet, all languages would then have something in common.


Anonymous said...

All well and good if there is a grid to refer to for scale, but since that will rarely be the case, there will be trouble differentiating /a/ from /m/, or /i/ from /n/, or /o/ from /p/, or /Russian E/ from /s/ (and possibly others I didn't see). These pairs differ only in scale, essentially. So differentiation relies heavily on penmanship and is not robust. Imagine if writing a large A meant /a/ but writing a slightly smaller A, or an A with a slightly smaller leg, meant /s/.

Ozymandius said...

Why not replace all/some of the right angles with constant radius curves? Or maybe just for the vowels. Easier on the eye and easier to write. Aka the Black Squiggle... Btw: thanks for opening my eyes to the fact that the English language is hard to master.

Anonymous said...

Off-topic a bit, the images accompanying the posts lately have been quite inspiring.

Lukas Vangheluwe said...

fascinating work, very well done.
a few comments, not really to be critical, but something seems to be missing.
While learning how to write chinese and japanese, I soon found out that the right way to go about it is not to reproduce the right shape, but rather to perform the right set of movements, which then result in something very close to the ideal shape. (This is what makes it possible for chinese readers to figure out very cursive script : it's not the shape, but the movement that counts). Actually, the same holds true for latin and cyrillic and I suspect arabic script. We do teach our children (mostly) to just copy the visual shape, but the important thing is to learn to make the right moves. For latin, this is approached in the simple italic script taught in calligraphy : it teaches a reduced set of letter components, which have to be put together in the right order to produce letters. If you learn the right movements, your writing can become quite sloppy before it becomes unreadable.
Which is where your alphabet raises questions : at smaller sizes, a square is hard to distinguish from a dot, and what is more, to faithfully reproduce this alphabet, you would need to use a square-nibbed pen at right angles (RSI-inducingly uncomfortable) and a round one to put the dots on.
That is the great advantage of existing alphabets : over the centuries, writing and reading have bounced back and forth to produce something that is as easy to write as it is to read.
In that context, the stencil argument is a little silly : traditionally, the stencilled letters were broken exactly where they were thinnest when written. In some cases, this can even enhance the readability of stencils, especially in bad light or from afar. The occasional use of an enclosure would make a set of characters that were easier to distinguish from each other. Round forms should be included in an alphabet such as you propose, and it should be as easy to write with a square nib as with a ball-point or a pencil or a sharpie.
The conservative in me would thus frivolously propose a few, more traditional, proposals. One is to look at how cheko-slovakian is written. They are very proud about how their alphabet has few ambiguities or inconsistencies. They use the latin alphabet with three diacritic marks : on top the inverse circumflex, the ball, on the bottom a sort of cedilla, and in polish you can slash an L to make it into sort of a w. And if it is important that the 'new english spelling' should be visually distinguished from 'ye olde anglisshe scripte', why not use a script that has already left most of its childhood maladies behind, like cyrillic, or maybe even arabic with vowels added? (even if lolcat seems to be able to lose quite a few vowels).
regards and thanks for a brilliant concept.

I hope I haven't posted this twice... if so, please delete.

Robin Datta said...

When I was taught cursive writing, it was emphasised that the pencil should not be lifted off from the surface of the paper until the word was complete. Dotting the "i"s and "j"s and crossing the "t"s is done after the word is complete. The word is a continuous line from beginning to end.

Picador said...

While I'm impressed by your enthusiasm for this project, I echo the concerns above that this character set would be both 1) difficult to write quickly by hand, and 2) ambiguous, especially when handwritten. The short vs long vertical line distinction in particular seems like it would be hard to establish clearly in handwritten symbols.

As for the general scope of this social project: the US and Canada have both been trying to migrate to the metric system for decades. There are IMMENSE incentives to make this migration away from the (insane) imperial system of measurement, and the transition costs actually aren't very high -- in Canada, most of the work is already done. Yet no American, and almost no Anglo-Canadian, can tell you their height in centimetres. This is unlikely to change any time soon. See also: Esperanto. I'm just saying that projects like this have a disappointing track record, even when they seem like they should produce enormous rewards for minimal effort.

lemcody said...

I want to play with this a bit. Could you post a pronunciation key along with the glyphs with english word examples?
The first job is to play and see if it functions. The play will resolve any remaining issues and influence aesthetic at the same time.

sjoh said...

Cool system. In a comment last week someone mentioned the 15th century Korean writing reform. Similar kind of solution for the sounds of their language (to replace Chinese characters).

It is geometric/rectilinear, so perhaps inevitably there is some overlap/similarity with symbols. (e.g. your /a/ or maybe /m/ - length of character differentiation issue - is the korean g, your /u/ the korean e rotated 90 degrees, /w/ = k. /n/ = i, /r/ = half of korean r, or backwards d).

In Korean, all the vowels are a variation on a central line stroke, horizontal or vertical, with 'dots' (short strokes in practice, like your squares) added. So they look distinct from consonants. Also, they come below the baseline, important for when they're put together into words for reading.

Another interesting overlap, your same symbol for voiced/unvoiced consonants, plus a dot for voicing, is the exact opposite of Korean, which takes a dot or stroke away from an 'aspirated' consonant, which then becomes voiced.

I'm sure king SeJong would be pleased to see us English users might finally catch on.

Martin said...

Clearly, Dmitrei, you have way too much time on your hands - but good luck with this anyway...

Mathnerd314 said...

I see two main things missing from your design requirements:
1) Non-alphabetic characters such as ,.!?()"- look a bit similar to some of your designs.
2) Variants and typefaces: cursive, italic, bold, capital, small caps, sans-serif, monospace, etc.

Those might take a few redesigns to get working.

Anyways, my advice for the next step is to take some font-making tool (e.g. FontForge) and add your glyphs to it, export it as a web font, and then try reposting/recoding some of your articles to see if they're readable.

onething said...

Oh dear, now you've lost me. Before finishing this post, I had been planning to say, please don't make us dot an i or cross a t. This alphabet is filled with double figures, and looks like it would be a nightmare to read. I had a look at the Shavian alphabet, and disliked it for the same reason, while yours is too boxy, that one is full of squiggles that are very similar and yet could never be written neatly.

lemcody said...

OK, so I looted your previous post for some of the vowel sounds, but I think I am missing "a" as in cake, and "i" as in pie. Please excuse me if it is under my nose... just give me a nudge.
It's strange how compelling this idea is to me, that words could be logical. A six year old I know is learning to read right now and the things I have to explain to him are endless, as you have pointed out well. This six year old is not the kind to conform, and I hate to think that he may not enjoy reading because he is always being beat down by obscure rules. I would love for him to find reading enjoyable and easy, even rewarding, so we may not have an alternative yet... but I am drawn to make use of this project of yours, for my own tinkering at first.

Dmitry Orlov said...

To respond to some of the questions:

- Think of the symbols not as blocky shapes but as a sequence of brush strokes, as in Chinese calligraphy, and it will become obvious how they work in practice and what variants might exist. It is the number and direction of strokes and the stroke order that counts.

- The distinction into short/curved/long takes a bit of getting used to but is almost impossible to confuse in practice.

- To learn what the IPA symbols mean, look up IPA on Wikipedia.

- The symbol shapes given are abstractions: generalized shapes and topologies used to specify the symbols rather than actual renderings. Font design will come next. Rest assured that I made sure these symbols work well for display fonts, body fonts (regular, bold, italic), cursive fonts, whimsical Dr. Seuss fonts, etc. None of them look boxy. For now, use your imagination.

- Somebody asked about diphthongs. Diphthongs are combinations of two vowels, with (in English) the first one longer and stressed: /eɪ/, /aɪ/, /oʊ/, /aʊ/, /ɔɪ/.


Kyddyl said...

Ever wonder why the Pope can say entire paragraphs in any language? And get the pronunciation perfectly? It's called the phonetic alphabet and it has been around many years. All languages may be converted to it and all languages can be learned from it. Let's just use the phonetic alphabet for everything and let each language literally speak for itself? Idioms and all.

Not Enough Rope Not Enough Trees said...

Dmitry,

How would numerals be treated?

jhughston said...

Having now seen some text it is quite intuitive to read. In fact, was surprised at how quickly it seemed to be learned. Will be interesting to see what it is like to write.

Lacy Thompson Jr said...

I keep thinking it would be nice to find an intersection between phonetic signing, braille, and a new keyboard design, along with the symbology.
I am clear Dmitry's target is one that works backwards even to the stone age. Nice if it could also be optimized for maximal reading speed / comprehension as well as keyboard entry. I came to the conclusion sometime back that a chording keyboard - with one hand entry made a lot of sense.
http://tinyurl.com/d2ce4hv
I am playing around with having the symbology be derived from connecting the "dots" derived from the chording entry.
As the future unfolds, we are going to undergo a vast amount of reprioritization. I am struck by how many people in the third world have cell phones and I expect that to be a harbinger for what technologies will remain at the top of the list barring an absolute and total collapse to Mad Max.

Dmitry Orlov said...

NERNET -

Numbers will either be spelled out or written as numerals, the usual way. By the way, today is 121212 (wantʊwantʊwantʊ)

jhughston -

Glad you've reached the other shore already. Yes, it learns, reads and writes very easily. That's the whole point. I've been using it (it needs a name, doesn't it?) to take notes, because it's faster than my fastest illegible scribbling in Latin, and comes out perfectly legible. Once you realize that it's still legible even when dragging the pen (just like Chinese cursive) the speed takes off.

Lacy -

Stone age is a state of mind. Good thing is, we'll never run out of stones. As far as mobile devices and multitouch (what you call "chording") - that's already there and, yes, that's how the world will enter these symbols. The basic 9 glyphs are in a 3x3 grid; each is a 1-finger tap, the ones with the curved foot are either a 2-finger tap or a 1-finger down-left swipe; the long foot is either a 3-finger tap or a 1-finger downward swipe. Intuitive, no?

Glenn said...

Dmitry,

I think you left the "r" out of Jaeberwoki.

Otherwise, going through the IPA pronunciation guides for English in Wikipedia, I can't hear the difference between /a/ and /ae/, which they give as rhyming with "stack" and "cat". My wife claims to hear "cat" as "caet" when I say it, I can't. We are West Coast U.S., she's a native of Montana, me of California; we live on the Olympic Peninsula.

Just saying, 34 symbols would work for me, I don't know how many other minor reductions I could find if I played with it for a bit. But I'm impressed with your work. And as you said, it would look cool on an alien space ship, my boat or as graffiti.

Glenn said...

Kollapsnik said:

"As far as mobile devices and multitouch (what you call "chording") - that's already there and, yes, that's how the world will enter these symbols. The basic 9 glyphs are in a 3x3 grid; each is a 1-finger tap, the ones with the curved foot are either a 2-finger tap or a 1-finger down-left swipe; the long foot is either a 3-finger tap or a 1-finger downward swipe. Intuitive, no?"

No, not intuitive to me. A minimum of a 5 X 3 pad is required to display your glyphs. Not to mention the "." mark for voiced (I thought you wanted to avoid diacritic marks?) Unless glyph value is positional within a 3 X 3 grid. But then you can't do tails on the bottom row and left hand column. Please clarify, preferably with an illustrated example.

Thank You,

Glenn
Marrowstone Island

Vagabond Anne said...

Re: numerals. The Mayans have an interesting system of bars and dots, easy to learn and write. Check it out.
http://en.wikipedia.org/wiki/Maya_numerals

Anonymous said...

kollapsnik,

This Orlovian project gets more and more impressive with every new day. Kudos! I have a few points to add.

I found the sample text quite readable on the first pass through with little difficulty, though having a vague memory of the poem did aid reading and also aided the learning. (Much better than my performance on kana flash cards last time I tried it!) It does seem to me that the /e/ seems to be used in a number of places where ee would be more correct (beware, the). In fact, "the" could avoid any possibility of confusion with "they" if the former instead took any of ee (when emphasized, possible conflict with thee), /æ/, /ʊ/, or (in homage to brevity, Gregg shorthand and the southern US) no vowel at all.

As a native English speaker and an armchair linguist at best, I have doubts about the cost/benefit balance of requirement 8, but I'm willing to have my eyes opened. I propose instead the addition of a single mark, an extension of the baseline to the right of the stressed vowel (or first or last of a diphthong), as an accent mark. It seems that such a mark would be trivial to write and read for both humans and computers, less obtrusive to the uniformity of the script than mere scaling, more compatible with both Unicode and 7-bit, monospace-oriented dot matrix displays like LCD modules and road advisory signs (and less likely to spawn a Shift-JIS nightmare to compensate), more adaptable to representation of both major and minor stresses (such as through a half-width extender for minor stress, at the possible cost of OCR and/or readability), more useful for literary effect, more economical for manual typesetting, more readily elided in popular usage if and where it isn't sufficiently useful in practice, and (in my mind's eye) more readable and beautiful. The only flaw that comes to mind is possible misinterpretation as a hyphen or dash, which admittedly depends on how much and how faithfully you would import the concepts and designs of other contemporary punctuation, English or otherwise. (I wish I could get everyone to see the beauty of angle quotes!)

As for topicalization, even after reading the Wikipedia explanation of the phenomenon, I'm pretty unconvinced that explicit topic identification is, as per the original use-case of mechanical transliteration of a large corpus of English text, easier for a computer to infer than for the reader, or much more valuable to take the extra step to explicitly indicate than for the reader to glean from context cues, but I am willing to be taught otherwise. Notwithstanding that, I would offer the same baseline character as a left-extension prefix for emphasized words (with a simple embellishment something like a hook from bottom to right as a second level of emphasis, if needed). It's in the right location relative to the word to be emphasized, trivial to write and read for humans and especially computers, unlikely to be mistaken for the baseline mark for syllabic stress, and (again IMHO) less "busy" looking and less disruptive to graphical uniformity.

I'll be watching the further development with interest. Cheers!

Anonymous said...

I HATE the irrational English spelling. I try to spell words phonically, but that all too often fails; thank goodness for spell check or I would be even more frustrated. The silent letters need to be removed from words; most words should be spelled as they are spoken. No wonder so many immigrants never learn to spell English.
I am not a "robot".

Dmitry Orlov said...

Glenn -

The 'r' in Jabberwocky is a ɚ - a syllabic r, so no vowel is needed.

The difference between /a/ and /ae/ is punt/pant, run/ran etc. It's critical. The only other critical contrast is /ɪ/ vs /i/ (shit/sheet). The /ʊ/ vs /u/ is pull/pool, which isn't so critical. If you can hear those three, you are all set.

For multitouch, there are 9 basic glyphs, and 3 variants. The glyphs are laid out in a 3x3 grid, the variants are accessed based on number of touches or direction of swipe. Zero is the voicing mark. On a dumb numerical display, tap once for short tail, twice for hooked tail, three times for long tail. Since there are very few doubles - "reignite", "unnatural", a few others - this works well.
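
The multi-tap scheme for a plain numeric keypad can be sketched roughly as follows. This is only an illustration, not the project's actual input method: the function name `decode_taps`, the (cell, tail, voiced) tuples, the use of a space to stand for a pause between repeated glyphs, and the choice to have zero mark the preceding glyph as voiced are all assumptions made for the example.

```python
from itertools import groupby

# Number of consecutive taps on one key selects the tail variant.
TAILS = {1: "short", 2: "hooked", 3: "long"}

def decode_taps(presses: str):
    """Decode keypad presses into (cell, tail, voiced) tuples.

    Keys 1-9 pick a cell of the 3x3 grid; 0 is the voicing mark.
    Assumption: a space marks a pause, so "5 5" is two glyphs
    rather than one double tap, and 0 applies to the glyph before it.
    """
    glyphs = []
    for chunk in presses.split():
        for key, run in groupby(chunk):
            count = len(list(run))
            if key == "0":
                if not glyphs:
                    raise ValueError("voicing mark with no glyph before it")
                cell, tail, _ = glyphs[-1]
                glyphs[-1] = (cell, tail, True)
            elif key in "123456789":
                if count > 3:
                    raise ValueError(f"more than three taps on key {key}")
                glyphs.append((int(key), TAILS[count], False))
            else:
                raise ValueError(f"unexpected key {key!r}")
    return glyphs
```

The pause separator is what handles the rare doubles mentioned above: without it, two identical glyphs in a row would be read as a single hooked-tail glyph.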

marxmarv -

Your suggestion to skip th vowel on th definite article has been accepted. I was vacillating on the matter, but you pointing out that it's not any one vowel put me over the edge. the, thee, thuh can all be th. I changed the text sample to reflect that.

void_genesis said...

What would really help this take off with those who already read english fluently would be a program that can gradually replace one letter at a time in a piece of text.

This way you could train someone to read this new lettering system while they read an enjoyable story. The process of gradual replacement should be automatable with a bit of clever programming, then applied to any short story or novel.
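
A minimal sketch of such a gradual-replacement program might look like this. It is a simplification under stated assumptions: it swaps individual Latin letters for stand-in symbols, whereas the real system maps phonemes, and the names `apply_stage` and `progressive` as well as the sample substitutions are hypothetical.

```python
def apply_stage(text, substitutions, stage):
    """Apply only the first `stage` substitutions to the text."""
    table = str.maketrans(dict(substitutions[:stage]))
    return text.translate(table)

def progressive(paragraphs, substitutions):
    """Introduce one more substitution with each successive paragraph."""
    return [
        apply_stage(paragraph, substitutions, min(index, len(substitutions)))
        for index, paragraph in enumerate(paragraphs)
    ]

# Hypothetical substitutions: the first paragraph stays untouched,
# the second replaces "c", the third replaces "c" and "a".
story = ["the cat sat", "the cat sat", "the cat sat"]
for line in progressive(story, [("c", "k"), ("a", "æ")]):
    print(line)
```

Ordering the substitutions from most to least frequent letter would let the reader meet the commonest new symbols first, though that ordering is a design choice rather than something specified here.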

Sonny said...

It's a work of genius.

It took me about 20 minutes to read the sample text through the first time. Maybe I'm not very good at it.

After re-reading it, I noticed you're reading "slithy" with a short i. The popular pronunciation is with a long i, relating it to "slimy" and "lithe." With a short i, it's related to "slither," which is also good. I wonder if Lewis Carroll ever wrote down or had recorded any of his own pronunciations of his nonce words.

An example of my stumbling while reading: The word I was reading as "lawk" didn't seem to make sense. I checked and rechecked the symbols in the chart. Then I discovered a detail I'd misread, so I could change my reading to "lawng," which I mentally translated to "long."

Glenn said...

kollapsnik said...

"The 'r' in Jabberwocky is a ɚ - a syllabic r, so no vowel is needed.

The difference between /a/ and /ae/ is punt/pant, run/ran etc. It's critical. The only other critical contrast is /ɪ/ vs /i/ (shit/sheet). The /ʊ/ vs /u/ is pull/pool, which isn't so critical. If you can hear those three, you are all set."

Ah, thanks for the clarification. In the Wikipedia page on IPA it uses "stack" and "cat" for the difference between /a/ and /ae/; oddly our daughter hears me say "staeck" and "cat", and my wife say "stack" and "caet". "Punt" and "pant" make more sense to me, and are quite distinguishable. Perhaps in the Wiki article it was a typo which should have read "stuck" and "cat".

Thanks also for the correction on the Rhotacized Schwa, the Wiki article gave me the impression that the sound was only the vowel _influenced_ by the following "r"; I hadn't realized the "r" was included. In which case I would tend to use the schwa and "r" as separate letters to produce the sound; but your project, as I remember, was to use a single glyph for each sound as much as possible.

Glenn
Marrowstone Island

Dmitry Orlov said...

Glenn -

stack and cat both have an /æ/. People who get this distinction wrong often say rude things without meaning to. "Dear members of the fuckulty" and such. There are two types of r, l and n in English, syllabic and non-syllabic. Here are some syllabic ones: button, bottle, butter. Takes getting used to, but that's how it works.

void_genesis -

Not a bad idea overall, but I hate the idea of mixing the spelled and the unspelled versions, because that creates interference effects that actually block learning. Having them side by side is OK, but not mixed together. A better idea is to give people pictionary puzzles: c?t: a? e? æ? That's an app I intend to write.

Dmitry Orlov said...

marxmarv -

Some more thoughts: representing stress using slightly elongated vowels has already been implemented and works great. There is zero overhead to including it, and it helps learners.

As far as there not being a single phonological structure for all English speakers, if that were the case then Hollywood movies would require subtitles when shown in other English-speaking countries. And they don't. I rest my case.

As far as tone, that's just a freebie for people like playwrights, who want to indicate how an actor should interpret a certain phrase. It may help with teaching music, or for scoring chants, etc. It comes free of charge from having a set of glyphs that hang on a line like musical notes.

Pantalones Frescos said...


This is great. I've spent a fair amount of time in classroom settings with a wide age range of students. I have often thought reading is a kind of minefield or obstacle course for many kids. This could be a big help.
I have two little boys who are not reading yet and are not schooled. They are familiar with the alphabet and some of the sounds. In addition, they love libraries and we've read hundreds of books to them.
I am considering introducing this orthography to them but have some reservations. How do I explain to concerned friends and family? Will my kids buy in when they realize it's not the same alphabet Dr. Seuss uses?
I was heartened by a comment here last week. Someone noted that Japanese students learn two writing systems and are competent in both by 3rd grade.
Seems like learning a simple, cool, and straightforward system like this could only help them make sense (at a later time) of our current "royal pain" if needed. Any thoughts?

Dmitry Orlov said...

Jonathan -

Your questions are all valid ones. A reasonable-sounding explanation is that you are giving your kids a head start, allowing them to read at whatever level they are capable of without going through the many stages of rote memorization of idiosyncratic spellings. I hope that in due course Dr. Seuss publishers will catch up with our system. And I have specifically designed it to have minimal interference effects with English spelling, so that the two can coexist peacefully. The biggest problem I see is motivation: why waste time learning a ridiculous system that does the same thing as a non-ridiculous system you already know? It's what Europeans go through when moving to the US and having to learn Imperial when they already know metric. They quickly realize Imperial is garbage, plus nobody uses it, and can't be made to bother to learn it. But if English spelling gets deprecated over time and atrophies from disuse, that would be fine. It is the transition, when everyone has to know both systems, that's troublesome.

void_genesis said...

Maybe a rosetta stone type approach then with the new system parallel to the old? I personally prefer to learn something incidentally while doing something else. The main hurdle is learning to recognise the new lettering of course. The visual jarring of mixed lettering might be manageable for going through once in a single training document though....I would want to try it personally before discounting the idea.

As for a name for the system- would pangram or pangraph be about right?

The former has a nicer sound to my ears. A pangram is a sentence using every letter once, but a pangraph is a term with no current use I can find. The current sense of pangram has an echo with the idea of a lettering system that uses every phonetic element once as well.

Anonymous said...

The idea of making a brand new design which is useful and has no problems of old baggage is really cool. This reminded me of Neil Shubin's book "Our Inner Fish" which describes our evolutionary development in terms of "parts" we inherited from animal and fish evolution at various times. If we were to be redesigned we could be made a lot more efficiently but life is a mixed bag and it is too late now so we make do with the plumbing we have. A unitary human language would be cool. Reading about the Chinese language at Wikipedia is a fruitful way to understand the complex history of a language's development in the East.

http://en.wikipedia.org/wiki/Chinese_language

I thought it was ideographs and not phonetic but apparently a strange admixture and previously a lingua franca for the whole east asian area.

Perhaps the article on the development of languages in India is similarly fascinating.

Mathnerd314 said...

I figure that adoption can be on a person-by-person basis; if there is e.g. a Linux distro that has some auto-translation hacks to the various software so that English is displayed as phonemes everywhere, then a few people can use it and the rest can be unaware of its existence.

The main problem with this approach is that auto-translation is hard; you can grab pronunciations from various dictionaries, but inferring the pronunciation of a word in general is text-to-speech which is AFAIK an open problem. Similarly speech-to-text, if you want to author in phonemes and publish in English, is another difficult problem. (And probably harder since English has so many weird corner cases)
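
The dictionary-lookup half of this approach can be illustrated with a short sketch. Everything in it is assumed for the example: the three-word lexicon, the function name `transliterate`, and the decision to leave unknown words in their ordinary spelling. A real tool would need a full pronunciation dictionary and some way to handle the hard cases mentioned above, such as words missing from the dictionary or homographs like "read".

```python
import re

# Hypothetical toy lexicon mapping spelled words to phonemic forms (IPA here).
LEXICON = {"the": "ðə", "cat": "kæt", "sat": "sæt"}

def transliterate(text, lexicon):
    """Replace each known word with its phonemic form; leave the rest alone."""
    def swap(match):
        word = match.group(0)
        return lexicon.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", swap, text)

print(transliterate("The cat sat on a mat.", LEXICON))
```

Words that fall through the lookup stay visible in the output, which at least makes the gaps in the dictionary easy to spot.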

Unknown said...

This post is inspiring! It is great to see someone attempt to tackle such a big problem. I really like your approach as well. Sadly, Americans still don't know what a meter is. I'm sensing some resistance. Take a look around your local Wal-Mart; these are the dullards that would have to commit. You see my narrow-mindedness? This is why this post is so inspiring. I'm pulling for you. Awesome!

JRS Medical

[nom organisation] said...

Thanks !
I very much enjoyed the deciphering exercise !!

I think there will be a problem with people like me (international bad-English speakers who can't pronounce correct English/American and are used to mispronouncing vowels and the rest): we will need to make the effort to learn the real pronunciation of words before we understand written text :-)

But maybe the exercise would have been easier with a text composed of real words only :-)


my homework and errors :
-------------------------

jaberwoki

twoz brilig and the slithi touvz
did jaier and gimbel in the weib
all mimzi were the borogouvz
and the moum reths outgraib

beware the jaberwok my son
the jaws that bite the kloz that catch
beware the jabjab bird and shan
the frumias bandersnatch

he took his vorpel sword in hand
long time the manksam for hee saught
so rested hee by the tamtam tree
and stood awhile in that

Unknown said...

I really like this idea and I like where you are going with a new alphabet. However, I would suggest that the unvoiced consonants be the ones that get the 'dot' above. Since all the other sounds are voiced, it makes more sense to me that the voiced consonants follow that pattern and then use the 'dot' to distinguish the unvoiced consonants.

Also, it would appear that comments are not yet enabled on the new 'unspell' blog.

Thanks for your efforts in this area. It's been a dream of mine for years.

Dmitry Orlov said...

Fred -

I enabled the comments just now.

Voiced consonants are half as frequent in English as unvoiced ones, so I marked the voiced ones to keep the number of visual elements to a minimum. Doing the opposite would have caused clutter.