Developing CT (Common Terminology)

CT has been developed to achieve and improve interoperability at four proposed metadata model levels: the schema, schema definition language, record, and repository levels. These levels are defined as follows:

  • The Metadata Schema level, with a focus on diverse schemas and on lexical and semantic interoperability via crosswalks.
  • The Metadata Schema Definition Language level, with a focus on semantic interoperability and on implementing schemas in schema definition languages such as XML Schema or RDF Schema (RDFS).
  • The Record level, with a focus on integrating records, mapping elements through conversions, and lexical, semantic, and syntactic interoperability.
  • The Repository level, with a focus on harvesting records, mapping value strings related to specific elements, and lexical and semantic interoperability.

At the schema level

CT is developed to maximize lexical and semantic metadata interoperability among widely used standards (MARC, MODS, DC, and QDC). Common Terminology is a set of common terms derived from the element names of MODS (for MARC) and of DC and QDC. The terms are selected to minimize the gap between the different degrees of generality or specificity of MARC, MODS, DC, and QDC. The terms have over 50% usage in the Harvard, WorldCat, and UIUC metadata records and are used in all 5 search interfaces.

Bases

  • Library of Congress crosswalks (e.g., MARC to MODS, MARC to DC, MODS 3.4 to MARC, MODS 3.4 to DC, DC to MODS, and DC to MARC) (LC, Conversions).
  • Usage of MARC tags and (Q)DC elements in 5 search interfaces and in actual metadata records of Harvard, UIUC, and MIT, through the cooperation of three universities in the USA.
  • Actual metadata records of Harvard (MARC, 12 million records), UIUC (MARCXML, 10 million), and MIT (QDC, 20,000).
  • Specially designed Python programs to examine usage.
  • MARC tag usage in WorldCat and in searching, as reported by Smith-Yoshimura et al. (2010).
  • Schema definition languages such as XML Schema and RDF Schema.
  • SKOS concepts.
  • Mapping experiments with newly designed conversions implemented as Python programs.
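
The usage analysis behind these bases can be pictured with a short Python program. This is a minimal sketch, not the project's actual code: it assumes MARCXML input in the standard MARC 21 slim namespace wrapped in a `<collection>` element, and the helper `tag_usage` is hypothetical. For each tag it reports the fraction of records that contain the tag at least once.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard MARCXML ("MARC 21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def tag_usage(marcxml: str) -> dict[str, float]:
    """Return, for each MARC tag, the fraction of records that use it.

    Hypothetical helper: a tag counts once per record, however many
    times it repeats within that record.
    """
    root = ET.fromstring(marcxml)
    records = root.findall("marc:record", MARC_NS)
    counts = Counter()
    for record in records:
        tags = {f.get("tag") for f in record.findall("marc:controlfield", MARC_NS)}
        tags |= {f.get("tag") for f in record.findall("marc:datafield", MARC_NS)}
        counts.update(tags)  # each distinct tag adds 1 for this record
    # counts is empty when there are no records, so no division by zero occurs.
    return {tag: counts[tag] / len(records) for tag in sorted(counts)}


SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">First title</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Topic A</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Topic B</subfield></datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Second title</subfield></datafield>
  </record>
</collection>"""

print(tag_usage(SAMPLE))  # {'245': 1.0, '650': 0.5}
```

Counting a tag once per record, rather than counting every occurrence, keeps the figure comparable with a "percentage of records using a tag" threshold.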

Criteria

  • First, based on the Library of Congress crosswalks, Common Terms are selected that maximize lexical and semantic interoperability and that minimize the gap between different degrees of generality or specificity.
  • Second, frequently used tags or element names, with over 50% usage in the Harvard, WorldCat, and UIUC metadata records, are selected as Common Terms.
  • Third, tags or element names used by all 5 search interfaces are selected as Common Terms.
  • Fourth, the selected Common Terminology is generalized using the QDC element usage in MIT records and the LC crosswalks from/to MARC and MODS and DC and QDC.
  • Fifth, the selected Common Terminology is generalized into 12 Common Terms, fewer than the 15 DC element names.
  • Sixth, CTScheme is defined as a controlled set of values specific to Common Terminology. It is a unique characteristic of CT, used as an authority that designates and limits the values used to describe resources.
  • Seventh, beyond the six criteria above, common sense is used to decide Common Terms (properties) and qualifiers (sub-properties) (Jin, 2014).
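
Criteria two and three amount to a simple filter, sketched below under stated assumptions. The function `select_candidates` is hypothetical. The 50% threshold is interpreted here as applying to each collection separately. The element names `x`, `y`, `z` and their figures are made-up toy inputs, not the project's measured usage.

```python
COLLECTIONS = ("Harvard", "WorldCat", "UIUC")
NUM_INTERFACES = 5
THRESHOLD = 0.5  # "over 50% usage"


def select_candidates(
    usage: dict[str, dict[str, float]],
    interface_counts: dict[str, int],
) -> list[str]:
    """Return element names that meet criteria two and three.

    usage[element][collection] is the fraction of records in that collection
    using the element; interface_counts[element] is how many of the five
    search interfaces use it.
    """
    selected = []
    for element, per_collection in usage.items():
        # Criterion two (assumed per collection): over 50% everywhere.
        frequent = all(per_collection.get(c, 0.0) > THRESHOLD for c in COLLECTIONS)
        # Criterion three: used by all five search interfaces.
        in_all_interfaces = interface_counts.get(element, 0) == NUM_INTERFACES
        if frequent and in_all_interfaces:
            selected.append(element)
    return sorted(selected)


# Toy input with made-up numbers, only to exercise the filter.
toy_usage = {
    "x": {"Harvard": 0.9, "WorldCat": 0.8, "UIUC": 0.7},
    "y": {"Harvard": 0.9, "WorldCat": 0.4, "UIUC": 0.7},  # fails WorldCat
    "z": {"Harvard": 0.6, "WorldCat": 0.6, "UIUC": 0.6},
}
toy_interfaces = {"x": 5, "y": 5, "z": 3}  # z is not used by all five
print(select_candidates(toy_usage, toy_interfaces))  # ['x']
```

The remaining criteria (generalization, CTScheme, common sense) involve human judgment and are not captured by such a filter.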

The Developed CT 1.1

Common Terminology version 1.1 consists of 12 Common Terms (properties), fewer than the 15 DC core elements, and 53 qualifiers (sub-properties), far fewer than the 1,000 MARC tags and MODS elements. The qualifiers refine the 12 properties in detail, together with CTScheme. CTScheme is defined as an enumerated set of resources used as a controlled set of values, including authorities.

The 12 selected Common Terms are: contributor, date, description, format, identifier, language, publisher, relation, rights, subject, title, and typeGenre. The 53 qualifiers of CT 1.1 were selected to preserve as much information as possible from the 1,000 MARC tags and their many subfields, and from the elements and attributes of MODS. The table below gives a simple overview of the Common Terms and qualifiers. Please visit The Developed CT 1.1 webpage for more detailed definitions of and information on the 12 Common Terms and 53 qualifiers.
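
The sketch below shows roughly how properties, qualifiers, and CTScheme fit together. It encodes the 12 Common Terms listed above and checks a record against them. The dotted `property.qualifier` key notation, the qualifier name `issued`, the `validate` helper, and the values in `CT_SCHEMES` are hypothetical placeholders. They are not the actual 53 qualifiers or CTScheme definitions, which are given in the CT 1.1 document.

```python
# The 12 Common Terms (properties) of CT 1.1, as listed above.
COMMON_TERMS = frozenset({
    "contributor", "date", "description", "format", "identifier",
    "language", "publisher", "relation", "rights", "subject",
    "title", "typeGenre",
})

# Hypothetical controlled value sets standing in for CTScheme entries.
CT_SCHEMES = {
    "typeGenre": {"text", "still image", "sound recording"},
}


def validate(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems found in a CT-style record.

    Keys are 'property' or 'property.qualifier' (hypothetical notation);
    only the property part is checked against COMMON_TERMS.
    """
    problems = []
    for key, values in record.items():
        prop = key.split(".", 1)[0]
        if prop not in COMMON_TERMS:
            problems.append(f"unknown property: {key}")
            continue
        allowed = CT_SCHEMES.get(prop)
        if allowed is not None:  # property is governed by a controlled set
            for value in values:
                if value not in allowed:
                    problems.append(f"{key}: {value!r} not in scheme")
    return problems


record = {
    "title": ["Sample title"],
    "date.issued": ["2014"],
    "typeGenre": ["text"],
    "creator": ["Doe, Jane"],  # a DC element name, not a Common Term
}
print(validate(record))  # ['unknown property: creator']
```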

Download (CommonTerminology1-1.pdf, PDF, 910KB)

Crosswalk MARC, MODS, DC and QDC to CT

The crosswalk for MARC, MODS, DC, and QDC is designed to show clearly how each is semantically and lexically mapped into CT. Please visit the Crosswalk MARC, MODS, DC and QDC to CT webpage for more detailed information and a downloadable PDF version of the crosswalk.
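
A crosswalk of this kind can be held as a lookup table from source element to Common Term. The sketch below shows the idea for a few DC elements and MARC tags. The pairings, the `CROSSWALK` table, and the `to_common_term` helper are illustrative assumptions chosen to match the CT term names, not entries copied from the published crosswalk, which remains the authority for real mappings.

```python
from typing import Optional

# Illustrative (assumed) pairings; consult the published crosswalk for real ones.
CROSSWALK: dict[str, dict[str, str]] = {
    "dc": {
        "title": "title",
        "creator": "contributor",  # CT has no separate creator term
        "contributor": "contributor",
        "subject": "subject",
        "type": "typeGenre",
        "date": "date",
    },
    "marc": {
        "245": "title",
        "100": "contributor",
        "650": "subject",
        "041": "language",
    },
}


def to_common_term(scheme: str, element: str) -> Optional[str]:
    """Map a source element (e.g. a DC name or MARC tag) to a Common Term, or None."""
    return CROSSWALK.get(scheme, {}).get(element)


print(to_common_term("dc", "creator"))  # contributor
print(to_common_term("marc", "999"))    # None
```

Returning `None` for unmapped elements makes gaps in a crosswalk easy to detect and count.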

At the schema definition language level

CT is represented in several forms so that it can be understood and used by many communities: ct.xsd (XML Schema), ct.rdf (RDF Schema), and ctskos.rdf (SKOS concepts). These representations allow communities to use CT in either XML or RDF form, and the SKOS representation (ctskos.rdf) helps users understand CT. Please visit the CT Schemas webpage for more details.
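
The sketch below gives a sense of what an XML form could look like. It serializes a CT record with Python's standard `xml.etree.ElementTree`. The namespace URI, the `ctRecord` root element, the `qualifier` attribute, and the `to_ct_xml` helper are hypothetical. The actual element structure is defined by ct.xsd on the CT Schemas webpage.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical namespace; the real one is declared in ct.xsd.
CT_NS = "http://example.org/ct/1.1/"
ET.register_namespace("ct", CT_NS)


def to_ct_xml(fields: list[tuple[str, Optional[str], str]]) -> str:
    """Serialize (property, qualifier, value) triples as a CT-style XML record."""
    root = ET.Element(f"{{{CT_NS}}}ctRecord")
    for prop, qualifier, value in fields:
        el = ET.SubElement(root, f"{{{CT_NS}}}{prop}")
        if qualifier:  # qualifiers carried as an attribute in this sketch
            el.set("qualifier", qualifier)
        el.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_ct_xml([
    ("title", None, "Sample title"),
    ("date", "issued", "2014"),
])
print(xml_text)
```

Carrying the qualifier as an attribute is only one option. A schema could equally define qualified sub-elements, so the choice should follow ct.xsd.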

Bases

  • 12 Common Terms and 53 qualifiers of the Common Terminology
  • Schema definition languages such as XML schema and RDF schema
  • SKOS concepts

CT Representations (schemas)

At the record level

The performance of CT in achieving and improving metadata interoperability is demonstrated through empirical evaluations using the designed conversions, which convert MIT (QDC) or UIUC (MARCXML) records into Common Terminology 1.1. The experiments were conducted with Harvard (MARC), MIT (QDC), and UIUC (MARCXML) metadata records through the cooperation of three universities in the USA. The results show that CT considerably reduces information loss, narrowing the gaps among these formats, and significantly increases mapping accuracy, with high lexical and semantic match rates. Please visit the CT Performance webpage for the methodology and results of the empirical evaluations, as well as the conversion results.
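
The record-level conversion can be pictured as follows. This minimal sketch converts a QDC record into CT (term, qualifier, value) triples. It assumes DSpace-style `dc.element.qualifier` keys and a toy element mapping that carries DC qualifiers over unchanged. It also reports the share of source fields that found a Common Term, as a crude proxy for retained information. The mapping, the `convert_qdc` helper, and this measure are assumptions for illustration, not the evaluation method used in the CT Performance study.

```python
from typing import Optional

# Toy element-level mapping from (Q)DC to Common Terms (an assumption).
# DC coverage and source are deliberately left unmapped here.
DC_TO_CT = {
    "contributor": "contributor", "creator": "contributor",
    "date": "date", "description": "description", "format": "format",
    "identifier": "identifier", "language": "language",
    "publisher": "publisher", "relation": "relation", "rights": "rights",
    "subject": "subject", "title": "title", "type": "typeGenre",
}


def convert_qdc(
    fields: list[tuple[str, str]],
) -> tuple[list[tuple[str, Optional[str], str]], float]:
    """Convert ('dc.element[.qualifier]', value) pairs to CT triples.

    Returns the converted triples and the fraction of source fields mapped.
    """
    converted = []
    for key, value in fields:
        parts = key.split(".")
        if len(parts) < 2 or parts[0] != "dc":
            continue  # not a dc.* field
        term = DC_TO_CT.get(parts[1])
        if term is None:
            continue  # no Common Term in this toy mapping
        qualifier = ".".join(parts[2:]) or None  # e.g. 'author', 'issued'
        converted.append((term, qualifier, value))
    share = len(converted) / len(fields) if fields else 0.0
    return converted, share


qdc_record = [
    ("dc.title", "Sample thesis"),
    ("dc.contributor.author", "Doe, Jane"),
    ("dc.date.issued", "2010"),
    ("dc.coverage.spatial", "Boston"),
]
ct_fields, share = convert_qdc(qdc_record)
print(ct_fields)
print(f"mapped {share:.0%} of source fields")  # mapped 75% of source fields
```

Fields are held as a list of pairs rather than a dict because QDC elements such as `dc.subject` commonly repeat within one record.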

At the repository level

The planned prototype will provide a portal for the Harvard, MIT, and UIUC libraries, built on Linked Open Data and a CT union catalog, connecting their several million online-accessible records on the Web.
