Data modeling at Europeana and DM2E
Europeana has developed a data model of its own, EDM (http://pro.europeana.eu/edm-documentation). The aim is to be able to harvest and disseminate metadata from libraries, archives and museums all over Europe.
The model is, however, much less monolithic than it sounds. The result of years of collaborative work in the cultural heritage community, EDM is not built from scratch. It incorporates data patterns from existing vocabularies (OAI-ORE, SKOS, CIDOC-CRM) and often re-uses their elements directly. In other words, EDM applies at the metadata vocabulary level the very principles it is supposed to enable at the data level, i.e. re-using and connecting data on the web. EDM is in fact designed to encourage adherence to Linked Data principles within the cultural community and, ultimately, to enable Europeana to plug seamlessly into the Linked Data paradigm.
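As a rough illustration of this re-use, the sketch below describes a single object with a handful of triples written as plain Python tuples. The class and property names (edm:ProvidedCHO, edm:aggregatedCHO, ore:Aggregation, skos:Concept, skos:prefLabel, dc:title, dc:subject) are taken from the EDM, OAI-ORE, SKOS and Dublin Core vocabularies. The example.org identifiers, the literal values and the helper functions are invented for illustration only and do not come from Europeana.

```python
from collections import Counter

# Namespace prefixes of the vocabularies combined in this sketch.
NAMESPACES = {
    "edm": "http://www.europeana.eu/schemas/edm/",
    "ore": "http://www.openarchives.org/ore/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Hypothetical identifiers for one object, its aggregation and a subject concept.
CHO = "http://example.org/item/1"
AGG = "http://example.org/aggregation/1"
CONCEPT = "http://example.org/concept/letters"

# One description, several vocabularies: each term keeps its original prefix.
TRIPLES = [
    (CHO, "rdf:type", "edm:ProvidedCHO"),
    (CHO, "dc:title", "Letter to a colleague"),
    (CHO, "dc:subject", CONCEPT),
    (AGG, "rdf:type", "ore:Aggregation"),
    (AGG, "edm:aggregatedCHO", CHO),
    (CONCEPT, "rdf:type", "skos:Concept"),
    (CONCEPT, "skos:prefLabel", "Letters"),
]


def expand(curie):
    """Turn a prefixed name such as 'edm:ProvidedCHO' into a full URI."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


def vocabulary_usage(triples):
    """Count how often each vocabulary supplies a property or a class."""
    usage = Counter()
    for _, predicate, obj in triples:
        usage[predicate.split(":", 1)[0]] += 1
        if predicate == "rdf:type":
            usage[obj.split(":", 1)[0]] += 1
    return usage


usage = vocabulary_usage(TRIPLES)
```

Counting the prefixes in `usage` makes the point of the paragraph concrete: only part of the description relies on terms minted by EDM itself, while the rest is borrowed from vocabularies that already existed.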
EDM also shows flexibility in that Europeana partners can, and are encouraged to, create their own extensions: EDM is designed as a ‘roof’ spanning various communities and explicitly inviting their specialisations. In the same way that Europeana itself assembled its model, partners can and should extend it for their own purposes. For example, the DM2E project (http://dm2e.eu) is geared towards aggregating data and developing richer services for the digital humanities. It created its own extension to EDM so as to better serve its specific needs. This results in a framework where minimal standardization helps to represent the data fundamentals in a coherent way across the board. As a consequence, DM2E may succeed in making Europeana one of the prime sources of corpora for the Digital Humanities. The uptake of this approach by major initiatives such as Perseus (cf. Crane et al. 2012) is a clear indicator of this potential.
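One common way to build such an extension, sketched below under stated assumptions, is to declare specialised classes and properties as refinements of the generic ones, so that a consumer who only knows the generic model can still interpret the data. The "ext:" prefix and the terms ext:Manuscript, ext:Letter and ext:scribe are hypothetical and are not taken from the actual DM2E model; edm:ProvidedCHO and dc:contributor are existing EDM and Dublin Core terms.

```python
# Declarations of a hypothetical extension ("ext" is NOT the real DM2E namespace).
SUBCLASS_OF = {
    "ext:Manuscript": "edm:ProvidedCHO",
    "ext:Letter": "ext:Manuscript",
}
SUBPROPERTY_OF = {
    "ext:scribe": "dc:contributor",
}


def ancestors(term, hierarchy):
    """Return the chain of more general terms, nearest first."""
    chain = []
    seen = {term}
    while term in hierarchy:
        term = hierarchy[term]
        if term in seen:  # guard against cyclic declarations
            break
        seen.add(term)
        chain.append(term)
    return chain


def generalize(triples):
    """Add the generic statements implied by the specialised ones."""
    result = list(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            result.extend((s, p, general) for general in ancestors(o, SUBCLASS_OF))
        else:
            result.extend((s, general, o) for general in ancestors(p, SUBPROPERTY_OF))
    return result


# Data expressed with the specialised (hypothetical) terms only.
specialized = [
    ("http://example.org/item/2", "rdf:type", "ext:Letter"),
    ("http://example.org/item/2", "ext:scribe", "An unnamed copyist"),
]
generic_view = generalize(specialized)
```

After `generalize`, the item is also typed as edm:ProvidedCHO and carries a dc:contributor statement, which is the sense in which a minimal shared core keeps the data fundamentals coherent while leaving room for richer, domain-specific detail.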
Yet other projects and (sub-)domains are given the freedom to devise their own vocabularies, and sometimes even standards. The re-use of shared vocabularies is also expected to facilitate interoperability with other metadata frameworks, such as the schema.org initiative sponsored by the main search engines. Europeana can gain from other efforts that connect their data to schema.org or request extensions to it, e.g. for library-specific data. In the same way, Europeana partners will hopefully benefit from Europeana’s own interoperability with schema.org or other comparable initiatives, as long as they comply with the Linked Data paradigm.
Re-using and extending vocabularies is, however, still an art. Choosing vocabularies is not easy: community uptake, a crucial guiding element, is often not visible. Semantic redundancies across vocabularies remain extremely difficult to identify. The ability to share and compare data is crucial, which makes initiatives like Linked Open Data even more useful. As a consequence, not only are proper vocabulary hosting, documentation and processes needed to ensure that vocabularies are re-used, but means for vocabulary alignment also become essential. W3C is setting the scene, as witnessed by the vocabularies recently released by the Government Linked Data Working Group (DCAT, ORG, etc.) and by new initiatives to host vocabularies and help develop them. However, projects with domain expertise are still expected to drive efforts, and there is room for standardization organizations to facilitate discussions in their communities. To help new partners create their own EDM extensions and refinements, Europeana itself is starting to gather existing efforts and documentation, a first step towards making useful best practices available.
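To make the notion of vocabulary alignment more tangible, here is a minimal sketch that translates a record from one set of property names to another and reports which properties have no counterpart. The pairings in the table are illustrative assumptions, not an official mapping published by Europeana or schema.org, and the record values are invented.

```python
# Illustrative alignment table; the pairings are assumptions, not an official mapping.
ALIGNMENT = {
    "dc:title": "schema:name",
    "dc:creator": "schema:creator",
    "dc:description": "schema:description",
}


def translate(record, alignment):
    """Split a record into aligned fields and the source properties left unmapped."""
    translated, unmapped = {}, []
    for prop, value in record.items():
        target = alignment.get(prop)
        if target is None:
            unmapped.append(prop)  # no known counterpart in the target vocabulary
        else:
            translated[target] = value
    return translated, sorted(unmapped)


# Invented record using Dublin Core and EDM property names.
record = {
    "dc:title": "Letter to a colleague",
    "dc:creator": "Unknown",
    "edm:rights": "http://example.org/rights/placeholder",
}
translated, unmapped = translate(record, ALIGNMENT)
```

The list of unmapped properties is where the difficulty described above shows up in practice: deciding whether a property has no equivalent, or an equivalent that nobody has yet identified, requires the documentation and community knowledge that a lookup table alone cannot supply.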