EuropeanaEDMtoCTConversion

EuropeanaEDMtoCTConversion is a program that converts the Europeana Data Model (EDM) records of Europeana into the Common Terminology (CT).

Europeana offers access to its data at http://labs.europeana.eu/api. Thanks to this kind offer, the HarvestEDM program was developed; it harvests Europeana's metadata records set by set through the OAI-PMH service.
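
As a rough illustration of set-based harvesting, the sketch below builds an OAI-PMH ListRecords request and reads the resumptionToken from a response. The endpoint, set name, and metadataPrefix are the ones visible in the example record further down this page; the function names are made up for this sketch and are not HarvestEDM's actual code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
# Endpoint as it appears in the <request> element of the example record.
BASE_URL = "http://oai.europeana.eu/oaicat/OAIHandler"


def list_records_url(set_spec=None, token=None, prefix="edm"):
    """Build a ListRecords URL; a resumption token replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec}
    return BASE_URL + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken of a response, or None when the set is finished."""
    root = ET.fromstring(response_xml)
    node = root.find(".//" + OAI_NS + "resumptionToken")
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()
```

A harvester would request `list_records_url(set_spec=...)` first and then keep requesting `list_records_url(token=...)` until `next_token` returns None.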

However, harvesting the 42 million objects of Europeana was not straightforward, for two reasons:

  • Lack of equipment able to harvest quickly and with enough storage. After 31 of the 1871 sets had been harvested, the device in use ran out of storage space. Harvesting was suspended until a device with enough storage was donated.
  • Unstable internet connection. Whichever device was used, the harvesting program could not keep running, because the internet connection was often lost.
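
One way to cope with a connection that keeps dropping is to retry each request with a growing delay instead of letting the program stop. The following is a minimal sketch of that idea only; `fetch`, the number of attempts, and the delays are hypothetical and not taken from the original harvesting program.

```python
import time


def fetch_with_retry(fetch, url, attempts=5, first_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on a connection error wait and try again, doubling the delay."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            # urllib.error.URLError, ConnectionError and timeouts are OSError subclasses.
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= 2
```

Here `fetch` stands for any function that downloads a URL and returns its content, for example a thin wrapper around `urllib.request.urlopen`.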

Using the first 31 harvested sets, the element/term usage of the records was analyzed in January 2016, and the CT SKOS crosswalk was developed from that analysis in March 2016. Development of the EuropeanaEDMtoCTConversion program began in September 2016.

Thanks to GOD, by the beginning of December 2016 all sets of Europeana EDM records had been harvested in RDF/XML form. Also at the beginning of December 2016, the harvested EDM records were converted into the Common Terminology (CT), again in RDF/XML form.

EuropeanaEDMtoCTConversion Conversion Rates

The average match rates were calculated from the values measured for each group of sets. For example, the first group consists of sets 1 to 30. The values measured for the 11271544 records in sets 1-30 are: Converted rate = 100.0; exactMatch rate = 69; narrowMatch rate = 22.18; broadMatch rate = 8.6; noConverted rate = 0.0%; Not converted Element Names are {}.

The average match rates over all 39937489 EDM records of Europeana are:

The number of Statements=  1925054834.0

The Average of Converted rate=  99.9901938321

The Average of exactMatch rate=  67.1639836621

The Average of narrowMatch rate=  22.9867034728

The Average of broadMatch rate=  9.84931286504

The Average of closeMatch rate=  0.0

The Average of noConverted rate=  0.00980965763118

Not converted Element Names are  {'ore:Proxy/dcterms:isRequiredBy': 12, 'ore:Proxy/edm:isDerivativeOf': 5, 'edm:EuropeanaAggregation/edm:hasView': 1282, 'ore:Aggregation/edm:ugc': 48500, 'ore:Proxy/edm:isRepresentationOf': 2, 'ore:Proxy/edm:incorporates': 21, 'ore:Proxy/edm:isSuccessorOf': 2}
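
The per-group rates and their average could be computed along the following lines. This is a sketch only: the statement counts are invented, the match-type rates are taken relative to the converted statements (in the totals above the three non-zero match rates add up to about 100), and averaging the groups without weighting them by size is an assumption.

```python
def match_rates(counts):
    """Turn statement counts per match type into percentages."""
    not_converted = counts.get("noConverted", 0)
    converted = sum(n for kind, n in counts.items() if kind != "noConverted")
    if converted == 0:
        raise ValueError("no converted statements in this group")
    total = converted + not_converted
    # Match-type rates are percentages of the converted statements.
    rates = {kind: 100.0 * n / converted
             for kind, n in counts.items() if kind != "noConverted"}
    # Converted / noConverted rates are percentages of all statements.
    rates["converted"] = 100.0 * converted / total
    rates["noConverted"] = 100.0 * not_converted / total
    return rates


def average_rates(groups):
    """Unweighted mean of each rate over all set groups."""
    return {key: sum(g[key] for g in groups) / len(groups) for key in groups[0]}


# Invented counts for two set groups, for illustration only.
group_a = match_rates({"exactMatch": 700, "narrowMatch": 200, "broadMatch": 100, "noConverted": 0})
group_b = match_rates({"exactMatch": 600, "narrowMatch": 300, "broadMatch": 100, "noConverted": 0})
print(average_rates([group_a, group_b]))
```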

Difficulties

The conversion with EuropeanaEDMtoCTConversion ran into some difficulties. Most of them come from the diversity of the values that providers supplied.

  • Some records use language codes that the ISO 639 series does not define, which causes W3C RDF validation errors such as "{W116} RFC 3066 section 2.3 mandates the use of 'en' instead of 'eng'". The language codes in use that are not defined in the ISO series are ['als','ang','arz','ast','azb','bar','bcl','bjn','bpy','bxr','cas','cdo','ckb','diq','en-gb','en-us','eur','ext','frp','gag','gan','glk','gml','gom','gr','hak','hbs','hif','iten','japani','jp','jut','koi','ksh','lad','lbe','lij','lmi-2010','lmo','lrc','ltg','lzh','mhr','mo','mrj','mzn','nan','nap','nov','nrm','olo','osx','pcd','pdc','pfl','pih','pms','pnb','pnt','prg','ran','rgn','rmy','rue','sgs','sh','sk-SK','Spa','stq','szl','tcy','uri','vec','vep','vls','wuu','xmf','xxx','yue','zea','zh-hant','zh-latn-pinyin-x-hanyu','zh-latn-pinyin-x-notone','zh-latn-wadegile']
  • Some languages used in SKOS concepts to provide multilingual services cause W3C RDF validation warnings such as

    “Warning: {W131} String not in Unicode Normal Form C: “(sl)pozidano območje, strnjeno naselje;(sk)zastavaná oblasť;(da)bebygget område;(eu)eremu eraiki; eraikitako eremu;(ro)zonă construită;(it)area edificata;(tr)yerleşim alanı;(mt)żona mibnija;(no)bebygd område;(hu)beépített terület;(lv)apbūvēta teritorija;(ar)منطقة مشيَّدة;(lt)apstatyta teritorija;(cs)území zastavěné;(de)Bebaute Fläche;(el)(πυκνο)δομημένη περιοχή/οικιστική περιοχή;built-up area;城市建成;(fi)rakennettu alue, asutusalue;(pl)teren zabudowany;(pt)povoações;(bg)Застроен район;(fr)agglomération;(sv)tätbebyggelse;(en)built-up area;(ru)застроенная территория;(et)täisehitatud ala;(es)zona edificada;(nl)bebouwde kom”[Line = 22, Column = 783]”
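
Both validator messages can be avoided by cleaning values before they are written out. The sketch below normalizes strings to Unicode Normal Form C with Python's standard `unicodedata` module and replaces a three-letter language code by its two-letter equivalent where one is known. The lookup table is a tiny hand-picked sample for illustration, not the table used by the program, and the function names are hypothetical.

```python
import unicodedata

# Illustrative sample only: three-letter ISO 639-2 codes and their two-letter ISO 639-1 forms.
THREE_TO_TWO = {"eng": "en", "deu": "de", "ger": "de", "fra": "fr", "fre": "fr", "spa": "es"}


def to_nfc(text):
    """Return text in Unicode Normal Form C (composed characters)."""
    return unicodedata.normalize("NFC", text)


def normalize_lang(code):
    """Lower-case the primary subtag and shorten it when a two-letter code is known."""
    primary, sep, rest = code.strip().partition("-")
    primary = primary.lower()
    return THREE_TO_TWO.get(primary, primary) + sep + rest
```

Codes that have no entry in the table, such as 'xxx' or 'japani', pass through unchanged and would still need manual review.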

  • Diverse prefixes such as 'odrl' and 'cc', for example in 'odrl:inheritFrom="http://www.europeana.eu/rights/out-of-copyright-non-commercial/"' and 'cc:deprecatedOn="2027-11-10"'.
  • Rarely used terms that were omitted from the CT crosswalk, such as 'ore:Aggregation/edm:ugc'.
  • Elements whose values include '>'. For example,

'<edm:isShownAt rdf:resource="http://galenet.galegroup.com/servlet/ECCO?c=1&amp;stp=Author&amp;ste=11&amp;&lt;>af=BN&amp;ae=T152600&amp;tiPG=1&amp;dd=0&amp;dc=flc&amp;docNum=CW119814160&amp;vrsn=1.0&lt; >&amp;srchtp=a&amp;d4=0.33&amp;n=10&amp;SU=0LRF"/>'

This causes significant semantic errors in particular, because I use '>' as a separator to find the terms/element names and values in the XML. A '>' inside a value causes the original value to be lost, and it results in broken links when the '>' occurs in a URL. Changing the logic of the program fixed this problem and recovers the original values, but a link stays broken if the original value was already a broken link.
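
The corrected logic of the program is not shown here. One common way to avoid the problem altogether is to let an XML parser locate the attribute instead of splitting on '>'. The sketch below contrasts the two approaches on a shortened version of the edm:isShownAt example; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Shortened version of the edm:isShownAt example; the attribute value contains a raw '>'.
ELEMENT = (
    '<edm:isShownAt xmlns:edm="http://www.europeana.eu/schemas/edm/" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'rdf:resource="http://galenet.galegroup.com/servlet/ECCO?c=1&amp;&lt;>af=BN&amp;n=10"/>'
)


def resource_by_split(xml_text):
    """Naive approach: cut at the first '>' and take what follows rdf:resource="."""
    first_chunk = xml_text.split(">")[0]
    return first_chunk.split('rdf:resource="')[1]


def resource_by_parser(xml_text):
    """Let the XML parser find the attribute; a '>' inside the value is kept."""
    return ET.fromstring(xml_text).get(RDF_RESOURCE)


print(resource_by_split(ELEMENT))   # the value is cut off before 'af=BN'
print(resource_by_parser(ELEMENT))  # the full URL, with entities decoded
```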

  • Broken links in values and in rdf:resource.
  • A few files in a set contain no records, such as <ListRecords></ListRecords>.
  • A few records have no metadata description, only a 'null' value. For example,

<record><header><identifier>http://data.europeana.eu/item/2048605/data_item_bbaw_dta_30400</identifier><datestamp>2015-07-18T07:24:17Z</datestamp><setSpec>2048605_Ag_EU_DM2E_bbaw_dta</setSpec></header><metadata>null</metadata></record>

  • A few records have neither a data provider nor a provider. In this case, the default provider is Europeana.
  • In some records, a few descriptions have no value, such as '<dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/"></dc:rights>'.
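
The empty-record cases in the list above can be handled with a few checks before conversion. The following is a minimal sketch using Python's standard XML parser; the function names are hypothetical, and only the fallback to 'Europeana' as the default provider comes from the description above.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
EDM = "{http://www.europeana.eu/schemas/edm/}"


def usable_records(oai_xml):
    """Yield the <record> elements whose <metadata> actually holds child elements."""
    root = ET.fromstring(oai_xml)
    for record in root.iter(OAI + "record"):
        metadata = record.find(OAI + "metadata")
        if metadata is None or len(metadata) == 0:
            continue  # e.g. <metadata>null</metadata>
        yield record


def provider_name(record, default="Europeana"):
    """Return edm:dataProvider, else edm:provider, else the default."""
    for tag in ("dataProvider", "provider"):
        node = record.find(".//" + EDM + tag)
        if node is not None and (node.text or "").strip():
            return node.text.strip()
    return default


def non_empty_children(element):
    """Children that carry text, attributes, or child elements of their own."""
    return [child for child in element
            if (child.text or "").strip() or child.attrib or len(child)]
```

A file that contains only <ListRecords></ListRecords> simply yields no records, so it needs no special case.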

An Example Original Record in RDF/XML form

<?xml version="1.0" encoding="UTF-8" ?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2015-11-02T16:49:25Z</responseDate><request verb="ListRecords" set="9200386_Ag_EU_TEL_a1194_BSB" metadataPrefix="edm">http://oai.europeana.eu/oaicat/OAIHandler</request><ListRecords>
<record><header><identifier>http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665</identifier><datestamp>2015-05-11T16:52:40Z</datestamp><setSpec>9200386_Ag_EU_TEL_a1194_BSB</setSpec></header><metadata><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:edm="http://www.europeana.eu/schemas/edm/"><edm:ProvidedCHO rdf:about="http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/><edm:WebResource rdf:about="http://lod.b3kat.de/title/BV022760260"><dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">Auszug aus der Übersicht der Arbeiten und Veränderungen der Schlesischen Gesellschaft für Vaterländische Kultur, (Technische Section)</dc:description></edm:WebResource><edm:WebResource rdf:about="http://www.mdz-nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:bvb:12-bsb10000727-3"/><edm:WebResource rdf:about="http://bsb0mdz-upload.bsb.lrz.de/~europeana/bsb10000727/download/thumbs/bsb10000727_00003.jpg"/><edm:Agent rdf:about="http://d-nb.info/gnd/5167145-1"><skos:altLabel xmlns:skos="http://www.w3.org/2004/02/skos/core#">Schlesische Gesellschaft für Vaterländische Kultur. 
Technische Section</skos:altLabel></edm:Agent><ore:Aggregation xmlns:ore="http://www.openarchives.org/ore/terms/" rdf:about="http://data.europeana.eu/aggregation/provider/9200386/BibliographicResource_3000044582665"><edm:aggregatedCHO rdf:resource="http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/><edm:dataProvider>Bavarian State Library</edm:dataProvider><edm:isShownAt rdf:resource="http://www.mdz-nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:bvb:12-bsb10000727-3"/><edm:object rdf:resource="http://bsb0mdz-upload.bsb.lrz.de/~europeana/bsb10000727/download/thumbs/bsb10000727_00003.jpg"/><edm:provider xml:lang="en">The European Library</edm:provider><edm:rights rdf:resource="http://www.europeana.eu/rights/out-of-copyright-non-commercial/"/></ore:Aggregation><ore:Proxy xmlns:ore="http://www.openarchives.org/ore/terms/" rdf:about="http://data.europeana.eu/proxy/provider/9200386/BibliographicResource_3000044582665"><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:resource="http://d-nb.info/gnd/5167145-1"></dc:creator><dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">In: Auszug aus der Übersicht der Arbeiten und Veränderungen der Schlesischen Gesellschaft für Vaterländische Kultur, (Technische Section). – 1845 – 1847 nachgewiesen, 1847</dc:description><dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">Besitzer: München, Bayerische Staatsbibliothek — 4 Bor. 
5 o#Beibd.14</dc:description><dc:description xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">Illustrations: Illuminations</dc:description><dc:format xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">Printed</dc:format><dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">BDR-BSBe10003051-29310</dc:identifier><dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">de</dc:language><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Auszug aus der Übersicht der Arbeiten und Veränderungen der Schlesischen Gesellschaft für Vaterländische Kultur, (Technische Section) im Jahre .. – 1847</dc:title><dc:type xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">Analitic serial</dc:type><dcterms:isPartOf xmlns:dcterms="http://purl.org/dc/terms/" rdf:resource="http://lod.b3kat.de/title/BV022760260"></dcterms:isPartOf><dcterms:isPartOf xmlns:dcterms="http://purl.org/dc/terms/" rdf:resource="http://data.theeuropeanlibrary.org/Collection/a1194"></dcterms:isPartOf><dcterms:issued xmlns:dcterms="http://purl.org/dc/terms/">1847</dcterms:issued><dcterms:spatial xmlns:dcterms="http://purl.org/dc/terms/">S.l.</dcterms:spatial><dcterms:spatial xmlns:dcterms="http://purl.org/dc/terms/" rdf:resource="http://id.loc.gov/vocabulary/countries/gw"></dcterms:spatial><edm:currentLocation rdf:resource="http://lod.b3kat.de/bib/DE-12"/><edm:isSimilarTo rdf:resource="http://lod.b3kat.de/title/10003051"/><edm:europeanaProxy>false</edm:europeanaProxy><ore:proxyFor rdf:resource="http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/><ore:proxyIn rdf:resource="http://data.europeana.eu/aggregation/provider/9200386/BibliographicResource_3000044582665"/><edm:type>TEXT</edm:type></ore:Proxy><ore:Proxy xmlns:ore="http://www.openarchives.org/ore/terms/" rdf:about="http://data.europeana.eu/proxy/europeana/9200386/BibliographicResource_3000044582665"><edm:europeanaProxy>true</edm:europeanaProxy><ore:proxyFor 
rdf:resource="http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/><ore:proxyIn rdf:resource="http://data.europeana.eu/aggregation/europeana/9200386/BibliographicResource_3000044582665"/><edm:type>TEXT</edm:type></ore:Proxy><edm:EuropeanaAggregation rdf:about="http://data.europeana.eu/aggregation/europeana/9200386/BibliographicResource_3000044582665"><edm:aggregatedCHO rdf:resource="http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/><edm:collectionName>9200386_Ag_EU_TEL_a1194_BSB</edm:collectionName><edm:country>Germany</edm:country><edm:landingPage rdf:resource="http://europeana.eu/portal/record/9200386/BibliographicResource_3000044582665.html"/><edm:language>de</edm:language><edm:rights rdf:resource="http://www.europeana.eu/rights/out-of-copyright-non-commercial/"/></edm:EuropeanaAggregation></rdf:RDF></metadata></record>

Converted CT record in rdf/xml form by EuropeanaEDMtoCTConversion

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:ct="http://www.ct.iopdl.org/1.2/"
xmlns:edm="http://www.europeana.eu/schemas/edm/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:ore="http://www.openarchives.org/ore/terms/"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:wgs84="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:rdaGr2="http://rdvocab.info/ElementsGr2/"
xmlns:odrl="http://www.w3.org/ns/odrl/2/"
xmlns:cc="http://creativecommons.org/ns#">
 <rdf:Description rdf:about="http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665">
<ct:identifier ct:source="Europeana_Bavarian State Library"/>
<ct:identifier ct:uri="http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/>
<ct:date ct:available="2015-05-11T16:52:40Z"/>
<ct:identifier ct:collection="9200386_Ag_EU_TEL_a1194_BSB"/>
<ct:identifier ct:uri="(edm:ProvidedCHO)http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/>
<ct:identifier ct:uri="(edm:WebResource)http://lod.b3kat.de/title/BV022760260" dc:description="Auszug aus der Übersicht der Arbeiten und Veränderungen der Schlesischen Gesellschaft für Vaterländische Kultur, (Technische Section)"/>
<ct:identifier ct:uri="(edm:WebResource)http://www.mdz-nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:bvb:12-bsb10000727-3"/>
<ct:identifier ct:uri="(edm:WebResource)http://bsb0mdz-upload.bsb.lrz.de/~europeana/bsb10000727/download/thumbs/bsb10000727_00003.jpg"/>
<ct:contributor edm:Agent="http://d-nb.info/gnd/5167145-1" skos:altLabel="Schlesische Gesellschaft für Vaterländische Kultur. Technische Section"/>
<ct:identifier ct:uri="(ore:Aggregation)http://data.europeana.eu/aggregation/provider/9200386/BibliographicResource_3000044582665"/>
<ct:identifier ct:uri="(edm:aggregatedCHO)http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/>
<ct:description ct:provenance="(edm:dataProvider)Bavarian State Library"/>
<ct:identifier ct:uri="(edm:isShownAt)http://www.mdz-nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:bvb:12-bsb10000727-3"/>
<ct:identifier ct:object="http://bsb0mdz-upload.bsb.lrz.de/~europeana/bsb10000727/download/thumbs/bsb10000727_00003.jpg"/>
<ct:description ct:provenance="(edm:provider)(en)The European Library"/>
<ct:rights>(ore:Aggregation/edm:rights)http://www.europeana.eu/rights/out-of-copyright-non-commercial/</ct:rights>
<ct:identifier ct:uri="(ore:Proxy)http://data.europeana.eu/proxy/provider/9200386/BibliographicResource_3000044582665"/>
<ct:contributor ct:creator="http://d-nb.info/gnd/5167145-1" ct:authority="LCMARCrelator"/>
<ct:description>In: Auszug aus der Übersicht der Arbeiten und Veränderungen der Schlesischen Gesellschaft für Vaterländische Kultur, (Technische Section). – 1845 – 1847 nachgewiesen, 1847</ct:description>
<ct:description>Besitzer: München, Bayerische Staatsbibliothek — 4 Bor. 5 o#Beibd.14</ct:description>
<ct:description xml:lang="en">Illustrations: Illuminations</ct:description>
<ct:format xml:lang="en">Printed</ct:format>
<ct:identifier>BDR-BSBe10003051-29310</ct:identifier>
<ct:language>de</ct:language>
<ct:title>Auszug aus der Übersicht der Arbeiten und Veränderungen der Schlesischen Gesellschaft für Vaterländische Kultur, (Technische Section) im Jahre .. – 1847</ct:title>
<ct:typeGenre xml:lang="en">Analitic serial</ct:typeGenre>
<ct:relation ct:isPartOf="http://lod.b3kat.de/title/BV022760260"/>
<ct:relation ct:isPartOf="http://data.theeuropeanlibrary.org/Collection/a1194"/>
<ct:date ct:issued="1847"/>
<ct:subject ct:spatial="S.l."/>
<ct:subject ct:spatial="http://id.loc.gov/vocabulary/countries/gw"/>
<ct:publisher ct:place="(edm:currentLocation)http://lod.b3kat.de/bib/DE-12"/>
<ct:relation>(edm:isSimilarTo)http://lod.b3kat.de/title/10003051</ct:relation>
<ct:description ct:source="Europeana" ct:recordinfo="(edm:europeanaProxy)false" />
<ct:identifier ct:uri="(ore:proxyFor)http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/>
<ct:identifier ct:uri="(ore:proxyIn)http://data.europeana.eu/aggregation/provider/9200386/BibliographicResource_3000044582665"/>
<ct:typeGenre>(edm:type)TEXT</ct:typeGenre>
<ct:identifier ct:uri="(ore:Proxy)http://data.europeana.eu/proxy/europeana/9200386/BibliographicResource_3000044582665"/>
<ct:description ct:source="Europeana" ct:recordinfo="(edm:europeanaProxy)true" />
<ct:identifier ct:uri="(ore:proxyFor)http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/>
<ct:identifier ct:uri="(ore:proxyIn)http://data.europeana.eu/aggregation/europeana/9200386/BibliographicResource_3000044582665"/>
<ct:typeGenre>(edm:type)TEXT</ct:typeGenre>
<ct:identifier ct:uri="(edm:EuropeanaAggregation)http://data.europeana.eu/aggregation/europeana/9200386/BibliographicResource_3000044582665"/>
<ct:identifier ct:uri="(edm:aggregatedCHO)http://data.europeana.eu/item/9200386/BibliographicResource_3000044582665"/>
<ct:identifier ct:collection="(edm:collectionName)9200386_Ag_EU_TEL_a1194_BSB"/>
<ct:publisher ct:place="(edm:country)Germany"/>
<ct:identifier ct:uri="(edm:landingPage)http://europeana.eu/portal/record/9200386/BibliographicResource_3000044582665.html"/>
<ct:language>(edm:language)de</ct:language>
<ct:rights>(edm:EuropeanaAggregation/edm:rights)http://www.europeana.eu/rights/out-of-copyright-non-commercial/</ct:rights>
</rdf:Description>
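
Comparing the original and the converted record suggests how a crosswalk lookup might work for the simplest statements. The sketch below is an illustration, not the program's actual code: the table holds only a few mappings read off the example pair, the function name is hypothetical, and xml:lang handling and the '(edm:...)' source prefixes are left out. Elements without a mapping are counted, in the spirit of the "Not converted Element Names" report.

```python
from xml.sax.saxutils import escape, quoteattr

# Source element -> (CT element, CT attribute or None), read off the example pair.
CROSSWALK = {
    "dc:title": ("ct:title", None),
    "dc:format": ("ct:format", None),
    "dc:language": ("ct:language", None),
    "dc:type": ("ct:typeGenre", None),
    "dcterms:issued": ("ct:date", "ct:issued"),
    "dcterms:isPartOf": ("ct:relation", "ct:isPartOf"),
    "dcterms:spatial": ("ct:subject", "ct:spatial"),
}


def to_ct(element_name, value, not_converted):
    """Return the CT statement for one EDM statement, or None if no mapping exists."""
    target = CROSSWALK.get(element_name)
    if target is None:
        # Keep a tally of element names that could not be converted.
        not_converted[element_name] = not_converted.get(element_name, 0) + 1
        return None
    ct_element, ct_attribute = target
    if ct_attribute:
        return "<%s %s=%s/>" % (ct_element, ct_attribute, quoteattr(value))
    return "<%s>%s</%s>" % (ct_element, escape(value), ct_element)
```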
