How does the language feel?
Last post I briefly touched on the way a language might develop with regard to the environment in which it is spoken. That gives a rough idea of the types of sounds a language might be made up of. Vowels might be long or short, and they might have an inflection partway through their utterance. Consonants can change the sound of speech in a plethora of ways: the shaping of the mouth, variation in airflow, the separation of the lips, and the placement of the tongue and teeth.
A system of symbols has been developed to describe the sounds used in every language spoken on the planet: the “International Phonetic Alphabet”. While it’s great and comprehensive, it’s a cow of a thing to type on a standard ‘qwerty’ keyboard. There are specific fonts that make it easier to type, but even then I’d need to make sure that you had access to those fonts and had them installed while reading this series of blog posts. Unless you’ve had a bit of practice, reading text written in the IPA can be a bit tricky too. With that in mind, I’ll be trying to stick to standard English/Roman lettering, maybe with a few unusual letter combinations to add a bit more mystery to the final language. I’m not just going to throw a string of random consonants together and add a bunch of vowels and some quirky punctuation at arbitrary points; each sound used in the language will be there for a reason (even if that reason is sheer laziness on the part of certain speakers that has since infected the wider vocabulary).
Let’s look at how sound forms might be distributed. Before we go much further, though, I think we should take note of Zipf’s law…
Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc. For example, in the Brown Corpus of American English text, the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's Law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). Only 135 vocabulary items are needed to account for half the Brown Corpus.
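To make that rank relationship concrete, here is a small Python sketch that applies the simplest form of Zipf’s law (the count at rank r is roughly the count at rank 1 divided by r) to the three Brown Corpus counts quoted above. The helper name zipf_expected is just an illustrative name of my own, not something from a library.

```python
# Simplest form of Zipf's law: f(r) is proportional to 1/r,
# so the word at rank r is expected to occur about f(1)/r times.

def zipf_expected(top_count: int, rank: int) -> float:
    """Expected count of the word at `rank`, given the count of the rank-1 word."""
    return top_count / rank

# Counts quoted above for the Brown Corpus, in rank order.
brown_counts = {"the": 69_971, "of": 36_411, "and": 28_852}

for rank, (word, actual) in enumerate(brown_counts.items(), start=1):
    expected = zipf_expected(brown_counts["the"], rank)
    print(f"{rank}. {word!r}: actual {actual:,}, Zipf predicts about {expected:,.0f}")
```

Zipf predicts about 34,986 occurrences of “of” (36,411 actual) and about 23,324 of “and” (28,852 actual), so the shape is right, but the law is an approximation rather than an exact rule.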
The same relationship occurs in many other rankings unrelated to language, such as the population ranks of cities in various countries, corporation sizes, income rankings, ranks of number of people watching the same TV channel, and so on.
(Wikipedia may not be the most accurate source, but it usually gets the general picture right and provides a good starting point for anyone who wants to do more research.)
Basically, the more natural a language is, the more likely it is to conform to Zipf’s law. I’d be tempted to apply the law loosely to the sound forms that make up the language as well. It’s pretty common knowledge that the six most common letters in English are “E”, “T”, “A”, “O”, “I” and “N”, in that order, and that the ranking differs in other languages. So it’s worth considering which sounds will be most common in a conlang. I know that letters aren’t sound forms, but for the simplicity of typing the conlang’s word forms, I’ll be using standard letters (for the reasons described above). If we need new letter forms, I’ll try to keep them simple, such as underlining a letter to mark a different pronunciation; that doubles the 5 vowels to 10 vowel types and the 21 consonants to 42 consonant types.
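Since the letters are standing in for sound forms, finding out which ones are most common is just a matter of counting and ranking. A minimal Python sketch, assuming one letter per sound form (digraphs or underlined letters would need their own tokenising step first); letter_ranks is a hypothetical helper name:

```python
from collections import Counter

def letter_ranks(text: str) -> list[tuple[str, int]]:
    """Return (letter, count) pairs for `text`, most frequent first.

    Case is ignored and anything that isn't a letter is skipped. This assumes
    one letter per sound form; digraphs would need to be tokenised first.
    """
    letters = (ch for ch in text.lower() if ch.isalpha())
    return Counter(letters).most_common()

sample = "The quick brown fox jumps over the lazy dog"
# Print the six highest-ranked letters with their counts.
for rank, (letter, count) in enumerate(letter_ranks(sample)[:6], start=1):
    print(rank, letter, count)
```

A single sentence is far too small a sample to reproduce the E, T, A, O, I, N ordering; the counts only settle down over a large body of text. Run over a decent chunk of conlang text, the same count would show whether its sounds really do tail off the way Zipf’s law suggests.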
So, I’m going to be using standard lettering for the sound forms in this language (a bit like the “romaji” form of Japanese), and I’m going to try to make sure the sound forms follow a distribution where some are more common than others, to give the language a “natural” feel. I’m also going to make the language coherent and capable of conveying meaning through its sounds and linguistic structure. I may even provide a guide to how the language might adapt words from other languages into its lexicon when it encounters a concept it just doesn’t have the words for (again probably taking cues from Japanese, which has some notoriety in linguistic circles as a magpie tongue…almost as much as English).
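One way to get that uneven but natural-feeling distribution is to rank the sound inventory and weight each sound by 1/rank, as Zipf’s law suggests. The Python sketch below does that and strings the sounds into simple consonant-vowel syllables. The inventory, its ordering, the consonant-vowel syllable shape and the `s` exponent are all placeholder assumptions for illustration, not decisions about the actual conlang.

```python
import random

def zipf_weights(n: int, s: float = 1.0) -> list[float]:
    """Normalised weights proportional to 1 / rank**s for ranks 1..n."""
    raw = [1 / rank ** s for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical sound inventory, ordered from most to least common.
# These are placeholders, not the conlang's real sounds.
CONSONANTS = ["n", "t", "k", "s", "m", "r", "h", "l", "p", "w"]
VOWELS = ["a", "i", "o", "e", "u"]
C_WEIGHTS = zipf_weights(len(CONSONANTS))
V_WEIGHTS = zipf_weights(len(VOWELS))

def make_word(rng: random.Random, syllables: int) -> str:
    """Build a word from consonant-vowel syllables (an assumed syllable shape),
    drawing each sound according to its Zipf weight."""
    parts = []
    for _ in range(syllables):
        parts.append(rng.choices(CONSONANTS, weights=C_WEIGHTS)[0])
        parts.append(rng.choices(VOWELS, weights=V_WEIGHTS)[0])
    return "".join(parts)

# Seeded so the same "words" come out on every run.
rng = random.Random(42)
print([make_word(rng, rng.randint(1, 3)) for _ in range(5)])
```

With s = 1, “n” should turn up roughly twice as often as “t” and three times as often as “k” over many words; raising s makes the top sounds dominate more, while lowering it flattens the distribution.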