E-MELD Homepage Wayne State University Homepage Eastern Michigan University Homepage
Results: Online Questionnaire on Databases

Below are the answers we received to the database questionnaire. The submitter's name and software are displayed in parentheses after each comment.
 
1. What is the purpose of your project's database, e.g., what kind of queries is it designed to answer?
  • Our project's database houses the metadata for all the resources in the archive. It supports queries on language, country, genre, and depositor, and displays the complete metadata record and list of files for each resource. (Johnson; MySQL )
  • We have numerous databases and data structures which can be construed as databases. (Hughes; Oracle MySQL )
  • Others from our group who are participating may also respond (differently!): Gary Simons, Will Lewis, Shauna Eggers, Brian Fitzsimons. Ours is a database of interlinear glossed text (IGT) obtained by the application of metaschemas to IGT in XML, formatted as the original encoder chooses (including the Bow & Bird model). (Langendoen; generated by metaschema from XML documents )
  • Each Wordcorr user has a PostgreSQL database installed automatically. It contains all the user's metadata, data, and analyses. It is created, queried, and maintained by the Wordcorr program, so the user never sees it. Its structure is in the CVS of our SourceForge site. Wordcorr.org will have a simpler database on its server, probably MySQL, that receives only the metadata from each collection when the user registers the collection. It will function as an OLAC repository, but we need help on the details of that. (Grimes; MySQL PostgreSQL )
  • AVENUE-Mapudungun database: parallel Spanish-Mapudungun corpus for (1) training corpus-based machine translation algorithms and (2) access by students in a bilingual education setting (Levin; None )
  • Elicitation corpus: The elicitation corpus consists of English sentences with feature structures. (There is also a Spanish version.) The feature structures contain features similar to those in the GOLD ontology. You can search for English/Spanish sentences with certain features. You can also sort the sentences by their features, or find sentences that are minimal pairs. If a sentence is supposed to illustrate a feature that is not realized in English/Spanish, there is a context field that explains what the sentence is supposed to illustrate. Informants translate the corpus into their languages, which then makes it possible to search for sentences with certain features in those languages. (Levin; Excel )
  • We are using a wide variety of database types and relational database management systems for various scientific and administrative purposes. In my abstract I specified a few. In my talk I will analyse their usage. Some of the DBs are small; others, such as the CELEX Lexicon Database for scientific purposes, the Speech Error Databases, or the Index Database within the ECHO project, are large, i.e. beyond 100,000 records. In part we have created ER diagrams; to a large extent the rDBs are based on a careful logical structure design that can be compared with a schema in XML. Some of the DBs are open and have a nice user interface; others are for internal use only. Wherever possible Unicode is used. Where necessary we are using an archivable format as well. In my talk I will elaborate on the points mentioned in the questionnaire. (Wittenburg; Oracle MySQL Postgres, ISAM )
  • It is designed to store grammatical information about a given language in a database. The information will take the form both of traditional description and of formal, machine-readable rules. (Good; )
  • combination of geographical, typological, genealogical data (Bibiko; Filemaker Pro and own developments )
  • To study the morpho-phonology of (mostly finite) verb phrases in an agglutinating language. Domains of study include segment type, segment duration, and tone, as well as morpheme co-occurrence and sequence. Textual entries will include narrow phonetic transcription, morphological composition and gloss. Most of the items with a 'yes' answer should be read in the future tense. Ultimately, it will also contain a lexicon and discourse samples. (Salting; at the moment, data are being stored in Excel )
  • provide a dictionary for an under-documented language (Roddy; Shoebox )
  • The Digital Language Archives has the following components: Early Mandarin Chinese Archive – to collect the texts and the lexicon of early Mandarin Chinese. Bronze Inscriptions and Bamboo Scripts Archive – to digitize the bronze inscriptions and bamboo scripts of 14th century BC to 3rd century BC. Modern Chinese Language Archive – to collect Modern Chinese language texts from different areas and tag the words with 46 syntactic categories. New Century Chinese Language Archive – to collect current spoken Mandarin of various styles and to combine written transcripts and digital video and audio processing. Southern Min Dialect Archives – to tag a corpus of Southern Min written documents from the 16th century to the 20th century and to survey the geographical distribution of Southern Min and Hakka dialects in Xinfeng township in Xinzhu, Taiwan. Formosan Language Archive – to collect, conserve, edit, and disseminate the language resources of the Austronesian languages in Taiwan. (Cheng; MySQL Access )
  • aw (w; Oracle werwerwer )
  • asking more information on how to design from 15 upwards key based entity relation diagram for any system that you may have (CHAUKE; clarion )
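To make the first response above more concrete, here is a minimal sketch of a metadata database that supports queries on language, country, genre, and depositor, and lists the files for each resource. It is only an illustration: the table names, column names, and sample rows are hypothetical and are not taken from any respondent's actual schema, and SQLite is used in place of MySQL so that the example is self-contained.

```python
import sqlite3

# Fields a caller may filter on (taken from the response: language,
# country, genre, depositor). Everything else here is hypothetical.
SEARCHABLE = ("language", "country", "genre", "depositor")


def build_demo_archive():
    """Create an in-memory database with made-up sample records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resource (
            id INTEGER PRIMARY KEY,
            title TEXT, language TEXT, country TEXT,
            genre TEXT, depositor TEXT
        );
        CREATE TABLE file (
            id INTEGER PRIMARY KEY,
            resource_id INTEGER REFERENCES resource(id),
            filename TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO resource VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Sample narrative", "Language A", "Country X", "narrative", "Depositor 1"),
            (2, "Sample song", "Language A", "Country X", "song", "Depositor 2"),
            (3, "Sample word list", "Language B", "Country Y", "wordlist", "Depositor 1"),
        ],
    )
    conn.executemany(
        "INSERT INTO file (resource_id, filename) VALUES (?, ?)",
        [(1, "narrative.wav"), (1, "narrative.txt"), (2, "song.wav")],
    )
    return conn


def find_resources(conn, **criteria):
    """Return (id, title) pairs matching every given field exactly."""
    unknown = set(criteria) - set(SEARCHABLE)
    if unknown:
        raise ValueError(f"cannot search on: {sorted(unknown)}")
    # Column names come from the fixed whitelist; values are bound as parameters.
    where = " AND ".join(f"{field} = ?" for field in criteria) or "1 = 1"
    query = f"SELECT id, title FROM resource WHERE {where} ORDER BY id"
    return conn.execute(query, tuple(criteria.values())).fetchall()


def list_files(conn, resource_id):
    """Return the file names attached to one resource, sorted by name."""
    rows = conn.execute(
        "SELECT filename FROM file WHERE resource_id = ? ORDER BY filename",
        (resource_id,),
    )
    return [row[0] for row in rows]
```

With the sample rows above, `find_resources(conn, language="Language A", genre="narrative")` selects the first resource, and `list_files(conn, 1)` returns its two files.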
2. Does the database contain data from:
a. A single language 5
b. Multiple unrelated languages 7
c. Multiple languages which are genetically related 2
d. Multiple languages which have an areal relationship 3
e. Multiple languages which have both genetic and areal relationships 3
f. No language data; it holds associated documentation, like metadata 3

3. Does your database contain multimedia content?
Yes 8
No 5

4. Who are the intended users of your database?
a. The public 9
b. Only other linguists 3
c. Only your project team 1
d. Yourself only 0

5. How large is the database?
a. Over 500 records 3
b. Over 1000 records 2
c. Over 5000 records 1
d. Over 10,000 records 1
e. Huge! 4

6. Is it a relational database?
Yes 9
No 4

7. If so, do you have an Entity Relationship Diagram (ERD) which you would be willing to share?
Yes 4
No 7

8. Does the database have an online interface?
Yes 3
No 8

9. Is it open to the public?
Yes 5
No 6

10. What database management software do you use?
a. Oracle 3
b. MySQL 5
c. Filemaker Pro 1
d. Access 1
e. None 1
Other
  • generated by metaschema from XML documents
  • PostgreSQL
  • Excel
  • Postgres, ISAM
  • and own developments
  • at the moment, data are being stored in Excel
  • Shoebox
  • werwerwer
  • clarion

11. Is your data encoded in Unicode?
Yes 9
No 2

12. If your language data requires special characters (e.g., IPA), how are they encoded?
a. Unicode (see above) 8
b. HTML entities 1
c. National encoding schemes, e.g. CJK 2
d. ASCII with special user-defined characters 1
e. SAMPA or X-SAMPA 2
Other
No Record.

13. Is a special font required in order to properly view your data?
Yes 4
No 8

14. How much of your data has been exported to a program-independent archive format, such as a .txt file format with XML markup?
a. All 7
b. Over 50% 1
c. Over 25% 0
d. None 5

15. Does your database design (including your choice of database management software) have any advantages that others might profit from knowing about? If so, what?
  • MySQL is free, well-documented, and widely used. Hard to beat. Our design implements the IMDI metadata standard. (Johnson; MySQL )
  • We recently discovered an open source, cross-platform database design tool called DBDesigner (www.fabforce.net/dbdesigner). This is an excellent tool for development of database schemas and ERDs. (Hughes; Oracle MySQL )
  • Yes, that's why we're presenting it! (Langendoen; generated by metaschema from XML documents )
  • It works. (Grimes; MySQL PostgreSQL )
  • It is in ASCII with punctuation to delimit fields, so it is easy to write a program to convert it into any other format. (Levin; None )
  • It can easily be converted into any other format. (Levin; Excel )
  • Our rDBs are normally designed to fit project-specific needs, i.e., unlike in the case of XML-based approaches, no effort was made to create a kind of standardized or generalizable format. The XML domain is much more relevant for us, since here we speak about integration, exchange formats etc. (Wittenburg; Oracle MySQL Postgres, ISAM )
  • I use the database server VALENTINA. (Bibiko; Filemaker Pro and own developments )
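One response above notes that data kept in ASCII with punctuation-delimited fields is easy to convert into other formats. A minimal sketch of such a conversion, from delimited lines to XML, might look like the following. The delimiter (`|`), the field names (`source`, `target`), and the element names (`corpus`, `record`) are assumptions made for illustration; the response does not specify the actual file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical field layout: one record per line, two fields per record.
FIELD_NAMES = ("source", "target")


def records_to_xml(lines, delimiter="|", field_names=FIELD_NAMES):
    """Convert delimited text lines into an XML string, one <record> per line."""
    root = ET.Element("corpus")
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        values = line.split(delimiter)
        if len(values) != len(field_names):
            raise ValueError(
                f"line {number}: expected {len(field_names)} fields, got {len(values)}"
            )
        record = ET.SubElement(root, "record")
        for name, value in zip(field_names, values):
            ET.SubElement(record, name).text = value.strip()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

Because every field ends up in a named element, the same records can then be re-exported to other formats with a different writer function.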
16. What additional features would you like to see in an ideal linguistic database?
  • cannot answer this question (Wittenburg; Oracle MySQL Postgres, ISAM )
17. Comments
  • Our ERD is available on the AILLA website at http://www.ailla.utexas.org/images/erd.jpg. It's slightly out of date, but still pretty close. (Johnson; MySQL )
  • Note: we have multiple linguistic database applications, this survey content refers to the aggregate of all of them. (Hughes; Oracle MySQL )
  • Zhenwei has proposed an XML format for the elicitation corpus. (Levin; Excel )
  • We keep rDBs as special instantiations of our databases and use them for what they are extremely good at: transaction processing, access speed, ... (Wittenburg; Oracle MySQL Postgres, ISAM )
  • To 3: not yet, but in the future. To 4: for linguists AND non-linguists. To 7: I would, but no ERD has been made. To 8: an online version is under construction. To 9: not yet. To 11: the current VALENTINA version does not handle Unicode, so I have to work with entities and recalculate them for display. (Bibiko; Filemaker Pro and own developments )
  • The answer to #14 is provisional. All the data will be structured with and stored in a program-independent format. This database is in its infancy, and as knowledge and skill increase, more specific information will be available. Re: #2, it will ultimately contain similar data from closely related languages. (Salting; at the moment, data are being stored in Excel )
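The comment on question 11 above describes storing special characters as entities when the database software cannot hold Unicode, and converting them back for display. A minimal sketch of that round trip, assuming numeric character references of the HTML kind (the response does not say which entity scheme is actually used), could be:

```python
import html


def to_entities(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(stored):
    """Turn character references (decimal, hex, or named) back into Unicode."""
    return html.unescape(stored)
```

For example, the IPA schwa (U+0259) would be stored as `&#601;` and restored to the original character for display. Note that a literal `&` already present in the data would need its own escaping, which this sketch does not handle.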

