Ega X-SAMPA Transcription Conventions

The content of this page was developed from the research of
Dafydd Gibbon, Bruce Connell, and Firmin Ahoua.

Introduction

The SAMPA alphabet was developed in the late 1980s by John Wells, in consultation with a wide range of colleagues, to meet a need for a simple machine-readable encoding of phonetic transcriptions with symbols of the International Phonetic Alphabet (IPA) for file interchange purposes. At that time, standardisation of symbol codes and IPA fonts was not highly developed. The underlying principle of SAMPA was to select those IPA symbols which were conventionally used to represent phonemes in the major languages of the European Union, and to assign a 7-bit ASCII code number (below 128) to each. One of the secondary criteria was the visual similarity of the IPA symbol and the letter representing the ASCII code.

Since that time, the standardisation of IPA encoding has progressed, with the system developed by John Esling (the 'Esling codes'), and, more recently, Unicode representations. For practical purposes, however, little has changed at the time of writing, and there is still a need for a straightforward machine-readable encoding.

In the meantime, SAMPA is widely used, and extensions of SAMPA have now been developed for many other languages. In order to aid the development of such extensions, the extended code-set X-SAMPA was devised by John Wells, and encompasses the complete set of IPA conventions. For a number of symbols, human readability had to be sacrificed in favour of simple, unambiguous machine-readability, owing to the restricted number of ASCII codes. The present collation of SAMPA and X-SAMPA is by Inge Mertins.

For further details, consult Gibbon et al. (1997) and the relevant IPA and SAMPA Internet sites, including project sites with working versions of SAMPA for specific languages.

For prosodic annotation, several systems are available, some of which are discussed in Chapter 1 of Gibbon et al. (2000). The most widely used in extensive corpus annotation, computational linguistics and speech technology is currently ToBI (Tones and Break Indices); the SAMPROSA system (Gibbon et al. 1997) contains additional symbols which are suitable for more detailed dialogue transcription.

Readers should be aware that there is still considerable need for standardisation with respect to the use of IPA codes and fonts in consumer software such as word processors and Internet browsers.

Ega ASCII (X-SAMPA) Transcription Conventions

X-SAMPA is a keyboard-friendly ASCII encoding of the IPA. The following conventions have been developed for using it to transcribe Ega.

The reasons for using (simplified) X-SAMPA conventions are:

X-SAMPA conventions for Ega

1.1 Consonants

                      Labial  Dental  Palatal  Velar  Labio-velar
Voiceless Stops       p       t       c        k      kp
Voiced Stops          b       d       J\       g      gb
Implosive Stops       b_<     d_<     J_<      g_<    gb_<
Voiceless Fricatives  f       s       x
Voiced Fricatives     v       z
Glides                        l       j        w
Nasals                m       n       J        N

1.2 Vowels

  Front Central Back
Close i I u U
Close-mid e E o O
Open a
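
Because several symbols in tables 1.1 and 1.2 are longer than one character (for example `J\`, `gb` and `gb_<`), a transcription string has to be segmented before it can be processed further. The following is a minimal sketch of one way to do this, using greedy longest-match segmentation. The names `EGA_XSAMPA_SYMBOLS` and `tokenize` are illustrative assumptions and do not come from the Ega project; only the symbol inventory is taken from the tables.

```python
# Inventory copied from tables 1.1 and 1.2 above.
# The names used here are hypothetical, chosen for this sketch.
EGA_XSAMPA_SYMBOLS = [
    "p", "t", "c", "k", "kp",
    "b", "d", "J\\", "g", "gb",
    "b_<", "d_<", "J_<", "g_<", "gb_<",
    "f", "s", "x", "v", "z",
    "l", "j", "w",
    "m", "n", "J", "N",
    "i", "I", "u", "U", "e", "E", "o", "O", "a",
]

# Try longer symbols first so that "gb_<" wins over "gb" and "g".
_BY_LENGTH = sorted(EGA_XSAMPA_SYMBOLS, key=len, reverse=True)


def tokenize(transcription):
    """Split an X-SAMPA string into symbols by greedy longest match."""
    tokens = []
    pos = 0
    while pos < len(transcription):
        for symbol in _BY_LENGTH:
            if transcription.startswith(symbol, pos):
                tokens.append(symbol)
                pos += len(symbol)
                break
        else:
            # No symbol in the inventory matches at this position.
            raise ValueError(
                f"unknown symbol at position {pos}: {transcription[pos:]!r}"
            )
    return tokens


# The input is a made-up symbol sequence, not an attested Ega word.
print(tokenize("gb_<aJ\\E"))
```

Greedy matching always reads `kp` and `gb` as single labio-velar stops. If a sequence of two separate stops has to be distinguished from these, the transcription would need an explicit separator, which this sketch does not handle.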

EGA-SAMPA simplified conventions

2.1 Consonants

                      Labial  Dental  Palatal  Velar  Labio-velar
Voiceless Stops       p       t       c        k      kp
Voiced Stops          b       d       J\       g      gb
Implosive Stops       b<      d<      J<       g<     gb<
Voiceless Fricatives  f       s       x
Voiced Fricatives     v       z
Glides                        l       j        w
Nasals                m       n       J        N
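
Comparing tables 1.1 and 2.1, the only difference between the two consonant sets is the implosive diacritic, which is written `_<` in X-SAMPA and `<` in the simplified conventions. A conversion in both directions can therefore be sketched as a plain string substitution. The function names `simplify` and `expand` are assumptions made for this example, and the sketch assumes that `<` occurs in a transcription only as the implosive mark.

```python
import re


def simplify(xsampa):
    """Rewrite the X-SAMPA implosive diacritic "_<" as the simplified "<"."""
    return xsampa.replace("_<", "<")


def expand(simplified):
    """Restore "_<" for every "<" that is not already preceded by "_"."""
    return re.sub(r"(?<!_)<", "_<", simplified)


# The input is a made-up symbol sequence, not an attested Ega word.
print(simplify("gb_<ad_<e"))
print(expand("gb<ad<e"))
```

Since the substitution is reversible under the stated assumption, data can be keyed in using the simplified form and expanded to X-SAMPA for interchange.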

2.2 Vowels

  Front Central Back
Close i I u U
Close-mid e E o O
Open a

2.3 Tone (after vowel)

High Mid Low
' - `
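
Since the tone mark is written immediately after the vowel it belongs to, the tones of a transcription can be read off by looking at the character that follows each vowel. The following is a minimal sketch under that assumption; the names `TONE_MARKS`, `VOWELS` and `tones` are illustrative and not part of the Ega conventions. A vowel with no tone mark after it is reported with `None`.

```python
# Tone marks from section 2.3; vowel symbols from section 2.2.
TONE_MARKS = {"'": "High", "-": "Mid", "`": "Low"}
VOWELS = set("iIuUeEoOa")


def tones(transcription):
    """Return (vowel, tone) pairs, taking the tone mark that follows each vowel."""
    result = []
    for i, ch in enumerate(transcription):
        if ch in VOWELS:
            # Slice rather than index so a final vowel does not raise an error.
            following = transcription[i + 1:i + 2]
            result.append((ch, TONE_MARKS.get(following)))
    return result


# The input is a made-up symbol sequence, not an attested Ega word.
print(tones("ka'gb<O`"))
```

The upper-case consonant symbols `J` and `N` are deliberately left out of `VOWELS`, so they are not mistaken for tone-bearing segments.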
