Ega X-SAMPA Transcription Conventions
The SAMPA alphabet was developed in the late 1980s by John Wells, in consultation with a wide range of colleagues. It met the need for a simple, machine-readable encoding of phonetic transcriptions in International Phonetic Alphabet (IPA) symbols for file interchange. At that time, the standardisation of symbol codes and IPA fonts was not well developed. The underlying principle of SAMPA was to select the IPA symbols conventionally used to represent phonemes in the major languages of the European Union and to assign each one a 7-bit ASCII code (below 128). A secondary criterion was visual similarity between the IPA symbol and the ASCII character chosen for it.
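The one-symbol, one-ASCII-code principle can be illustrated with a minimal Python sketch. The table is a small subset of common SAMPA assignments, such as S for ʃ, @ for ə and E for ɛ, where visual similarity guided the choice. It is not a complete or language-specific SAMPA table, and `to_sampa` is a hypothetical helper.

```python
# Illustrative subset of SAMPA assignments (IPA symbol -> single ASCII character).
# Real SAMPA tables are defined per language; this is not a complete inventory.
IPA_TO_SAMPA = {
    "ʃ": "S",   # voiceless postalveolar fricative
    "ʒ": "Z",   # voiced postalveolar fricative
    "θ": "T",   # voiceless dental fricative
    "ð": "D",   # voiced dental fricative
    "ŋ": "N",   # velar nasal
    "ɪ": "I",   # near-close near-front unrounded vowel
    "ə": "@",   # schwa (visually similar ASCII character)
    "ɛ": "E",   # open-mid front unrounded vowel
    "ɔ": "O",   # open-mid back rounded vowel
    "ː": ":",   # length mark
}

# Each code is exactly one 7-bit ASCII character (code point below 128).
assert all(len(c) == 1 and ord(c) < 128 for c in IPA_TO_SAMPA.values())


def to_sampa(ipa: str) -> str:
    """Replace mapped IPA symbols with SAMPA codes; plain ASCII letters pass through."""
    out = []
    for ch in ipa:
        if ch in IPA_TO_SAMPA:
            out.append(IPA_TO_SAMPA[ch])
        elif ord(ch) < 128:
            out.append(ch)  # e.g. p, t, k have the same form in IPA and SAMPA
        else:
            raise ValueError(f"no SAMPA code for {ch!r} in this subset")
    return "".join(out)


print(to_sampa("θɪŋ"))  # TIN
```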
Since then, the standardisation of IPA encoding has progressed with the system developed by John Esling (the 'Esling codes') and, more recently, with Unicode representations. For practical purposes, however, little had changed at the time of writing, and there is still a need for a straightforward machine-readable encoding.
In the meantime, SAMPA has come into wide use, and SAMPA extensions have been developed for many other languages. To aid the development of such extensions, John Wells devised the extended code set X-SAMPA, which covers the complete set of IPA symbols and diacritics. Because the number of ASCII codes is limited, human readability had to be sacrificed for some symbols in favour of simple, unambiguous machine readability. The present collation of SAMPA and X-SAMPA is by Inge Mertins.
For further details, consult Gibbon et al. (1997) and the relevant IPA and SAMPA Internet sites, including project sites with working versions of SAMPA for specific languages.
Several systems are available for prosodic annotation, and a number of them are discussed in Chapter 1 of Gibbon et al. (2000). The system currently most widely used in extensive corpus annotation, computational linguistics and speech technology is ToBI (Tones and Break Indices). The SAMPROSA system (Gibbon et al. 1997) provides additional symbols suitable for more detailed dialogue transcription.
Readers should be aware that there is still considerable need for standardisation with respect to the use of IPA codes and fonts in consumer software such as word processors and Internet browsers.
X-SAMPA is a keyboard-friendly ASCII encoding of the IPA. The following conventions have been developed for transcribing Ega with X-SAMPA.
The reasons for using (simplified) X-SAMPA conventions are:
- de facto international standardisation in language and speech engineering
- transcription without proprietary fonts, avoiding unnecessary complications in word-processor handling and automatic computer processing
- transcription without Unicode, avoiding the need for high-end software and hardware
- dissemination as plain text via email and web services
- retention of human readability
- ease of conversion, by means of scripting languages, into RTF specifications for other fonts or into Unicode.
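The last reason above, converting ASCII transcriptions into Unicode with a scripting language, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The mapping is a small hypothetical subset, not the Ega convention table, and `xsampa_to_ipa` is an illustrative name. Unmapped characters such as t or a pass through unchanged because they have the same form in both notations. The plain-ASCII check reflects the aim of exchanging transcriptions as plain text.

```python
import re

# Hypothetical subset of X-SAMPA -> Unicode IPA (not the full Ega inventory).
XSAMPA_TO_IPA = {
    "p\\": "\u0278",  # ɸ  voiceless bilabial fricative
    "B": "\u03B2",    # β  voiced bilabial fricative
    "J\\": "\u025F",  # ɟ  voiced palatal plosive
    "J": "\u0272",    # ɲ  palatal nasal
    "N": "\u014B",    # ŋ  velar nasal
    "E": "\u025B",    # ɛ  open-mid front unrounded vowel
    "O": "\u0254",    # ɔ  open-mid back rounded vowel
    "@": "\u0259",    # ə  schwa
    "?": "\u0294",    # ʔ  glottal stop
    "_h": "\u02B0",   # ʰ  aspiration
    ":": "\u02D0",    # ː  length mark
}

# Longest codes first, so that "J\" is tried before a bare "J".
_PATTERN = re.compile(
    "|".join(re.escape(code) for code in sorted(XSAMPA_TO_IPA, key=len, reverse=True))
)


def xsampa_to_ipa(text: str) -> str:
    """Convert an X-SAMPA transcription to Unicode IPA; unmapped characters pass through."""
    if not text.isascii():
        raise ValueError("X-SAMPA input should be plain 7-bit ASCII")
    return _PATTERN.sub(lambda m: XSAMPA_TO_IPA[m.group(0)], text)


print(xsampa_to_ipa("p\\Ot_ha"))  # ɸɔtʰa
```

The longest-first ordering matters because Python's `re` tries alternatives from left to right and accepts the first one that matches at a given position.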
2.3 Tone (after vowel)
E-MELD School of Best Practices: Ega X-SAMPA Transcription Conventions