DPLAMAPtoCTConversion

The DPLAMAPtoCTConversion program converts the Metadata Application Profile (MAP) metadata records of the Digital Public Library of America (DPLA) into the Common Terminology (CT).

The Digital Public Library of America cooperated by providing metadata records, and we downloaded 8,012,390 of their Metadata Application Profile (MAP) records in JSON form in September 2015. We analyzed the element/term usage of these records in January 2016, developed the CT SKOS crosswalk from the analyzed usage in March 2016, and developed DPLAMAPtoCTConversion in July 2016. This version may be modified in response to feedback and comments from the metadata experts of the Digital Public Library of America.

DPLAMAPtoCTConversion is written in the Python programming language and converts the MAP metadata records of DPLA into the CT. The final result of the DPLA MAP to CT conversion and the statistics of the conversion are as follows:

The total Match rates of 8012390 records  in the folder, C:\Python27\metadata\DPLA
The number of total Statement= 228248135
DPLAMAPtoCTConversion Converted rate= 96.4896142525
exactMatch rate= 62.6515663931
narrowMatch rate= 35.3335726678
broadMatch rate= 2.01486093913
noConverted rate= 3.51038574751
Not converted Element Names are  {u'originalRecord': 8012390}
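
The figures above are consistent with two observations: the converted and noConverted rates add up to 100, and the three match rates also add up to 100, which suggests that the match rates are taken over the converted statements only. The following is a minimal sketch of that arithmetic, not the program's actual code. It is written in Python 3 (the original program ran under Python 2.7), and the function name conversion_rates and the per-statement list of match types are assumptions made for illustration.

```python
from collections import Counter


def conversion_rates(match_types):
    """Summarise per-statement crosswalk outcomes as percentages.

    match_types is a hypothetical list with one entry per statement:
    'exactMatch', 'narrowMatch', 'broadMatch', or None when the
    statement was not converted.
    """
    counts = Counter(match_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no statements to summarise")
    not_converted = counts.pop(None, 0)
    converted = total - not_converted
    rates = {
        "converted": 100.0 * converted / total,
        "noConverted": 100.0 * not_converted / total,
    }
    # Assumption: match rates are relative to the converted statements,
    # which is why the three of them sum to 100.
    for kind in ("exactMatch", "narrowMatch", "broadMatch"):
        rates[kind] = 100.0 * counts[kind] / converted if converted else 0.0
    return rates


# Toy input: 12 statements, 2 of which are not converted.
sample = ["exactMatch"] * 6 + ["narrowMatch"] * 3 + ["broadMatch"] + [None] * 2
rates = conversion_rates(sample)
```

With the reported numbers, the noConverted rate equals the 8012390 originalRecord statements divided by the 228248135 total statements.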

The Main Structure of DPLAMAPtoCTConversion

  • The original record information of MAP (the originalRecord element) is not converted, because we believe that the DPLA MAP already describes the core information of the original records well enough.
  • CT has 12 common terms with qualifiers. Some terms are the same as the MAP terms, while others differ. To better preserve the information, some MAP term names are kept in the value of the CT term, since this may work well for building Linked Open Data and search engines. For example, <ct:description ct:provenance="(dataProvider)NMNH – Mineral Sciences Dept."/>
  • Two statements are added to the transformed records for the RDF/XML format:
    1. An 'rdf:Description rdf:about=url' statement is added for each record, with the isShownAt url as the default (if the isShownAt information is absent, the object, hasView/@id, or @id value of the DPLA record is used instead).
    2. A 'ct:identifier ct:source="DPLA_dataProvider"' statement is added for each record (dataProvider is the default; provider is used if no dataProvider is given).
  • The main errors in W3 RDF validation are caused by ' inside dictionary and list values. Because ' in a value is confused with the single and double quotation marks that delimit it, ' has to be changed into another symbol. Thus 's in a value is replaced with ^s, and ' in a value is replaced with ^. For example,

    The error caused by ':
    <ct:description ct:source="DPLA" ct:recordinfo="(admin){u'validation_message': u"'rights' is a required property", u'valid_after_enrich': False}" />
    The fixed version:
    <ct:description ct:source="DPLA" ct:recordinfo="(admin){u'validation_message': u'^rights^ is a required property', u'valid_after_enrich': False}" />
  • Also, & in the urls, especially in the Smithsonian records (e.g., DPLA 1001.json), caused W3 validation errors. To fix these errors, & is replaced with &amp;.
    The error fixed with &amp;,
    However, &amp; works only partially in web browsers. The above link works, but <ct:identifier ct:object="http://collections.nmnh.si.edu/media/?irn=10177320&amp;thumb=yes"/> does not work in web browsers, so &amp; has to be changed back to & for searching.
  • To increase readability, the converted ct statements of each metadata record are ordered as follows: 1) ct:identifier ct:source, 2) ct:description ct:provenance, 3) ct:identifier, 4) others, 5) ct:description ct:recordinfo.
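
The rules in the list above can be sketched as a few small Python functions. This is an illustration under stated assumptions, not the actual DPLAMAPtoCTConversion code: the function names, the (term, qualifier, value) tuple used to represent a statement, the use of a provider's "name" field, and the example.org url are all hypothetical. The sketch uses Python 3, while the original program ran under Python 2.7.

```python
def choose_about_url(record):
    """Pick the rdf:about url: isShownAt, then object, hasView/@id, @id."""
    has_view = record.get("hasView")
    candidates = [
        record.get("isShownAt"),
        record.get("object"),
        has_view.get("@id") if isinstance(has_view, dict) else None,
        record.get("@id"),
    ]
    for url in candidates:
        if url:
            return url
    return None


def source_identifier(record):
    """Build the ct:source value from dataProvider, else provider."""
    provider = record.get("dataProvider") or record.get("provider")
    if isinstance(provider, dict):
        # Assumption: a provider given as a dictionary is named by "name".
        provider = provider.get("name")
    return "Digital Public Library of America_%s" % provider


def sanitize_value(text):
    """Escape & as &amp; and replace ' with ^ before writing RDF/XML."""
    return text.replace("&", "&amp;").replace("'", "^")


def statement_rank(statement):
    """Sort key for the statement order described in the list above."""
    term, qualifier = statement[0], statement[1]
    if term == "ct:identifier" and qualifier == "ct:source":
        return 0
    if term == "ct:description" and qualifier == "ct:provenance":
        return 1
    if term == "ct:identifier":
        return 2
    if term == "ct:description" and qualifier == "ct:recordinfo":
        return 4
    return 3  # every other statement


# A toy record without isShownAt or dataProvider, to exercise the fallbacks.
record = {
    "object": "http://example.org/thumb?irn=1&thumb=yes",
    "provider": {"name": "Smithsonian Institution"},
}
statements = [
    ("ct:description", "ct:recordinfo", "(ingestType)item"),
    ("ct:title", None, "Quartz"),
    ("ct:identifier", "ct:object", sanitize_value(choose_about_url(record))),
    ("ct:description", "ct:provenance", "(provider)Smithsonian Institution"),
    ("ct:identifier", "ct:source", source_identifier(record)),
]
ordered = sorted(statements, key=statement_rank)
```

Because sorted is stable, statements with the same rank keep their original relative order.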
The Findings of DPLAMAPtoCTConversion
  • The original records use various standards and formats, which may require many crosswalks.
  • The admin information of DPLA clearly describes the validation status of the original records.
  • There are some exceptions: some original records do not provide urls for isShownAt or object, or do not provide data provider information.
The sample record of the original MAP:

{“@context”: “http://dp.la/api/items/context“, “dataProvider”: “NMNH – Mineral Sciences Dept.”, “admin”: {“validation_message”: “‘rights’ is a required property”, “valid_after_enrich”: false}, “@id”: “http://dp.la/api/items/a6fbeed426acb5aca845bf228fc34617“, “_rev”: “1-76bfa2190c0e6e0868465967cff2243e”, “object”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245893&action=11&qtab=0&thumb=yes“, “aggregatedCHO”: “#sourceResource”, “ingestDate”: “2014-09-24T08:26:07.444760Z”, “@type”: “ore:Aggregation”, “ingestionSequence”: 16, “isShownAt”: “http://collections.si.edu/search/results.htm?q=record_ID%3Anmnhmineralsciences_1001180&repo=DPLA“, “provider”: {“@id”: “http://dp.la/api/contributor/smithsonian“, “name”: “Smithsonian Institution”}, “sourceResource”: {“description”: [“1”, “4 Feb 2014”], “title”: “Quartz”, “type”: “image”, “collection”: [{“@id”: “http://dp.la/api/collections/7a809a0cfa64e6839927b009a651df3e“, “id”: “7a809a0cfa64e6839927b009a651df3e”, “title”: “Mineralogy”}, {“@id”: “http://dp.la/api/collections/eeccb37a60ecd807e77624191613bcd6“, “id”: “eeccb37a60ecd807e77624191613bcd6″, “title”: “Gems”}, {“@id”: “http://dp.la/api/collections/952a287873470fe42490660a92914d6b“, “id”: “952a287873470fe42490660a92914d6b”, “title”: “Mineral Sciences”}], “spatial”: [{“name”: “Minas Gerais, Brazil”}], “stateLocatedIn”: [{“name”: “Washington, D.C.”}], “@id”: “http://dp.la/api/items/a6fbeed426acb5aca845bf228fc34617#sourceResource“, “subject”: [{“name”: “Rose”}, {“name”: “Quartz”}, {“name”: “Mineralogy”}]}, “ingestType”: “item”, “_id”: “smithsonian–http://collections.si.edu/search/results.htm?q=record_ID%3Anmnhmineralsciences_1001180&repo=DPLA“, “originalRecord”: {“freetext”: {“physicalDescription”: [{“#text”: “84.00ct”, “@label”: “Weight”}, {“#text”: “Rectangular Step”, “@label”: “Fassion/Cut”}, {“#text”: “Med Lightly Orange Red”, “@label”: “Color”}], “name”: [{“#text”: “Quartz – Primary Gem”, “@label”: “Taxon”}, {“#text”: “Rose – Primary Gem Synonym”, “@label”: 
“Taxon”}], “setName”: [{“#text”: “Mineralogy”, “@label”: “See more items in”}, {“#text”: “Gems”, “@label”: “See more items in”}, {“#text”: “Mineral Sciences”, “@label”: “See more items in”}], “notes”: [{“#text”: “1”, “@label”: “Specimen Count”}, {“#text”: “4 Feb 2014”, “@label”: “Record Last Modified”}], “place”: {“#text”: “Minas Gerais, Brazil”, “@label”: “Place”}, “dataSource”: {“#text”: “NMNH – Mineral Sciences Dept.”, “@label”: “Data Source”}, “identifier”: [{“#text”: “Gems”, “@label”: “Barcode”}, {“#text”: “G3421”, “@label”: “USNM Number”}]}, “indexedStructured”: {“topic”: “Mineralogy”, “scientific_name”: [“Quartz”, “Rose”], “place”: [“Brazil”, “Minas Gerais”], “online_media_type”: “Images”}, “collection”: [{“@id”: “http://dp.la/api/collections/7a809a0cfa64e6839927b009a651df3e“, “id”: “7a809a0cfa64e6839927b009a651df3e”, “title”: “Mineralogy”}, {“@id”: “http://dp.la/api/collections/eeccb37a60ecd807e77624191613bcd6“, “id”: “eeccb37a60ecd807e77624191613bcd6″, “title”: “Gems”}, {“@id”: “http://dp.la/api/collections/952a287873470fe42490660a92914d6b“, “id”: “952a287873470fe42490660a92914d6b”, “title”: “Mineral Sciences”}], “descriptiveNonRepeating”: {“data_source”: “NMNH – Mineral Sciences Dept.”, “title”: {“#text”: “Quartz”, “@label”: “title”}, “record_link”: “http://collections.si.edu/search/results.htm?q=record_ID%3Anmnhmineralsciences_1001180&repo=DPLA“, “title_sort”: “QUARTZ”, “online_media”: {“media”: [{“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245893&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245893%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245893&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245895&width=768&height=1024&action=10“, “@idsId”: 
“http://collections.nmnh.si.edu/search/ms/search.php?irn=10245895%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245895&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245896&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245896%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245896&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245905&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245905%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245905&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245907&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245907%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245907&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245919&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245919%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245919&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245922&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245922%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245922&action=11&qtab=0&thumb=yes“, “@type”: 
“Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245921&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245921%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245921&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245923&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245923%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245923&action=11&qtab=0&thumb=yes“, “@type”: “Images”}, {“#text”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245891&width=768&height=1024&action=10“, “@idsId”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245891%26width=768%26height=1024%26action=10“, “@thumbnail”: “http://collections.nmnh.si.edu/search/ms/search.php?irn=10245891&action=11&qtab=0&thumb=yes“, “@type”: “Images”}], “@mediaCount”: “10”}, “record_ID”: “nmnhmineralsciences_1001180”, “unit_code”: “NMNHMINSCI”}, “provider”: {“@id”: “http://dp.la/api/contributor/smithsonian“, “name”: “Smithsonian Institution”}, “_id”: “nmnhmineralsciences_1001180”}, “id”: “a6fbeed426acb5aca845bf228fc34617″},

The Converted CT by DPLAMAPtoCTConversion:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:ct="http://www.ct.iopdl.org/1.2/">
<rdf:Description rdf:about="http://collections.si.edu/search/results.htm?q=record_ID%3Anmnhmineralsciences_1001180&amp;repo=DPLA">
<ct:identifier ct:source="Digital Public Library of America_NMNH – Mineral Sciences Dept."/>
<ct:description ct:provenance="(provider){u'@id': u'http://dp.la/api/contributor/smithsonian', u'name': u'Smithsonian Institution'}"/>
<ct:description ct:provenance="(dataProvider)NMNH – Mineral Sciences Dept."/>
<ct:identifier>(_id)smithsonian–http://collections.si.edu/search/results.htm?q=record_ID%3Anmnhmineralsciences_1001180&amp;repo=DPLA</ct:identifier>
<ct:identifier ct:uri="http://dp.la/api/items/a6fbeed426acb5aca845bf228fc34617#sourceResource"/>
<ct:identifier ct:collection="{u'@id': u'http://dp.la/api/collections/7a809a0cfa64e6839927b009a651df3e', u'id': u'7a809a0cfa64e6839927b009a651df3e', u'title': u'Mineralogy'}"/>
<ct:identifier ct:collection="{u'@id': u'http://dp.la/api/collections/eeccb37a60ecd807e77624191613bcd6', u'id': u'eeccb37a60ecd807e77624191613bcd6', u'title': u'Gems'}"/>
<ct:identifier ct:collection="{u'@id': u'http://dp.la/api/collections/952a287873470fe42490660a92914d6b', u'id': u'952a287873470fe42490660a92914d6b', u'title': u'Mineral Sciences'}"/>
<ct:identifier ct:identifierOther="(_rev)1-76bfa2190c0e6e0868465967cff2243e"/>
<ct:identifier ct:object="http://collections.nmnh.si.edu/search/ms/search.php?irn=10245893&amp;action=11&amp;qtab=0&amp;thumb=yes"/>
<ct:identifier>(id)a6fbeed426acb5aca845bf228fc34617</ct:identifier>
<ct:identifier ct:uri="(isShownAt)http://collections.si.edu/search/results.htm?q=record_ID%3Anmnhmineralsciences_1001180&amp;repo=DPLA"/>
<ct:identifier ct:uri="http://dp.la/api/items/a6fbeed426acb5aca845bf228fc34617"/>
<ct:subject>[u'Rose', u'Quartz', u'Mineralogy']</ct:subject>
<ct:publisher ct:place="(stateLocatedIn)Washington, D.C."/>
<ct:title>Quartz</ct:title>
<ct:typeGenre>image</ct:typeGenre>
<ct:description>[u'1', u'4 Feb 2014']</ct:description>
<ct:subject ct:spatial="[{u'name': u'Minas Gerais, Brazil'}]"/>
<ct:description ct:source="DPLA" ct:recordinfo="(admin){u'validation_message': u'^rights^ is a required property', u'valid_after_enrich': False}" />
<ct:description ct:source="DPLA" ct:recordinfo="(aggregatedCHO#)#sourceResource" />
<ct:description ct:source="DPLA" ct:recordinfo="(ingestDate)2014-09-24T08:26:07.444760Z" />
<ct:description ct:source="DPLA" ct:recordinfo="(ingestionSequence)16" />
<ct:description ct:source="DPLA" ct:recordinfo="(@context)http://dp.la/api/items/context" />
<ct:description ct:source="DPLA" ct:recordinfo="(ingestType)item" />
<ct:description ct:source="DPLA" ct:recordinfo="(@type)ore:Aggregation" />
</rdf:Description>

</rdf:RDF>
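
When a converted record is read back with an XML parser, the &amp; sequences in the urls are turned back into & automatically, which is one way to recover working urls for searching. The following is a small sketch using Python 3's standard xml.etree.ElementTree module on a shortened copy of the converted record. It only illustrates how the output can be consumed and is not part of DPLAMAPtoCTConversion; the variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CT_NS = "http://www.ct.iopdl.org/1.2/"

# Shortened copy of the converted record (three statements only).
converted = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:ct="http://www.ct.iopdl.org/1.2/">
<rdf:Description rdf:about="http://collections.si.edu/search/results.htm?q=record_ID%3Anmnhmineralsciences_1001180&amp;repo=DPLA">
<ct:identifier ct:object="http://collections.nmnh.si.edu/search/ms/search.php?irn=10245893&amp;action=11&amp;qtab=0&amp;thumb=yes"/>
<ct:title>Quartz</ct:title>
</rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(converted)
description = root.find("{%s}Description" % RDF_NS)

# Attribute and element names are addressed as "{namespace}localname".
about = description.get("{%s}about" % RDF_NS)
object_url = description.find("{%s}identifier" % CT_NS).get("{%s}object" % CT_NS)
title = description.findtext("{%s}title" % CT_NS)

print(about)       # url with a plain & again
print(object_url)  # url with plain & separators
print(title)
```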

 
