Metadata for You & Me - Course Content

Metadata for You & Me - Shareable Metadata in Practice: Workflow


Metadata sharing is an iterative process. You'll create more and more metadata over time that you'll want to share, and you'll learn something new each time you do this. Even within a single project, it's likely you'll need to perform several rounds of testing before you are able to generate metadata for sharing that you're truly happy with.

Here is a high-level overview of a metadata sharing workflow that you can implement at your institution:

Plan: choose native metadata standards, decide who you want to share with, choose shared metadata standards, and write metadata creation guidelines.
Create metadata, thinking about shareability.
Transform your metadata: start with a conceptual mapping, perform a technical mapping, validate the output, and test conformance to the sharing protocol.
Share: implement the sharing protocol in a production environment, and communicate with aggregators.
Assess your work by looking at your metadata in aggregations.
Revisit earlier steps as needed based on what you've learned.

You may need to revisit your metadata transformation if the result is not valid or does not conform to the sharing protocol. You may also want to revisit a transformation if you see your metadata being used by an aggregator in a way you didn't expect. While it's neither necessary nor possible to tailor metadata for every single aggregation, seeing your metadata at work in some of the more heavily used aggregators can provide you with valuable information about what your shareable metadata should look like.
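As a rough illustration of the "validate the output" step, the sketch below checks that a transformed simple Dublin Core record is well-formed XML and carries the elements a local policy expects. The list of expected elements and the sample record are assumptions made for this example (the oai_dc schema itself makes every element optional), and a check like this is a lightweight pre-flight test, not a substitute for schema validation or a protocol conformance test.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical local policy: elements every shared record should carry.
EXPECTED = ("title", "identifier")


def check_record(xml_text):
    """Return a list of problems found in one transformed record."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Stop here: nothing else can be checked in a malformed record.
        return [f"not well-formed XML: {err}"]
    problems = []
    for name in EXPECTED:
        values = [el.text for el in root.iter(DC + name)]
        if not any(v and v.strip() for v in values):
            problems.append(f"missing or empty dc:{name}")
    return problems


# Illustrative record with a title but no identifier.
sample = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Letter to a friend</dc:title>
</oai_dc:dc>"""

print(check_record(sample))
```

Running a check like this over a whole export after each round of transformation makes it easier to see whether a change to the mapping fixed one problem or introduced another.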

2. Possible architectures for sharing via OAI-PMH

This diagram represents three possible architectures for sharing your metadata via OAI-PMH.

[Figure: OAI-PMH sharing architecture diagram]

The workflow at the top of the diagram is the one implemented in digital asset management systems such as CONTENTdm. In this model, a single system manages both metadata creation tasks and exposing metadata records for harvesting via OAI-PMH. To be conformant to the OAI-PMH protocol, the data provider module must provide a representation of each item in the repository in simple Dublin Core (regardless of the format of the metadata stored natively in the repository). More advanced systems may include the capability of supplementing this simple Dublin Core record with records in other formats, for example qualified Dublin Core, MODS, or MARCXML. The system may or may not allow some sort of transformation to be performed on records to provide a view specifically tailored for sharing.
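One way to see which formats a data provider offers is to look at its response to the OAI-PMH ListMetadataFormats verb. The sketch below parses such a response and checks that oai_dc is among the advertised prefixes. The response text is a trimmed, illustrative sample rather than output from a real system, and repository.example.org is a placeholder address.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Illustrative, trimmed response; not captured from a real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://repository.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mods</metadataPrefix>
      <schema>http://www.loc.gov/standards/mods/v3/mods-3-0.xsd</schema>
      <metadataNamespace>http://www.loc.gov/mods/v3</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def available_prefixes(response_xml):
    """List the metadataPrefix values advertised in a response."""
    root = ET.fromstring(response_xml)
    return [el.text.strip() for el in root.iter(OAI + "metadataPrefix") if el.text]


prefixes = available_prefixes(SAMPLE_RESPONSE)
print(prefixes)
print("simple Dublin Core offered:", "oai_dc" in prefixes)
```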

The workflow depicted at the bottom of the diagram is another common one, used in institutions where no single content management system is in use or where that system does not have built-in OAI support. In this workflow, metadata is created in one system, then exported and loaded into a stand-alone data provider. If the metadata creation system supports XML export, it is often easiest to export in XML and then use XSLT to transform the XML from the native system into a format suitable for sharing via OAI-PMH. This may involve simply tweaking records to make them more shareable, or it may involve converting between different metadata formats. Some metadata creation systems might require you to export metadata records in other formats, for example as plain text with comma-separated values (CSV). These export formats will require a bit more programming support and planning to convert into simple Dublin Core and other formats suitable for sharing via OAI-PMH.
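To give a sense of the programming involved in the CSV case, here is a minimal sketch that turns rows of a CSV export into simple Dublin Core (oai_dc) records using only the Python standard library. The column names, the column-to-element mapping, the semicolon delimiter for repeated values, and the sample row are all assumptions made for illustration; your own conceptual mapping would replace them.

```python
import csv
import io
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical conceptual mapping: local column name -> Dublin Core element.
FIELD_MAP = {
    "Title": "title",
    "Photographer": "creator",
    "Date Taken": "date",
    "Subjects": "subject",
    "URL": "identifier",
}
# Assumption: the local system packs repeated values into one cell.
REPEAT_DELIMITER = ";"


def row_to_oai_dc(row):
    """Build one oai_dc record from one row of the CSV export."""
    record = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for column, dc_name in FIELD_MAP.items():
        for value in (row.get(column) or "").split(REPEAT_DELIMITER):
            value = value.strip()
            if value:  # skip empty cells instead of emitting empty elements
                ET.SubElement(record, f"{{{DC_NS}}}{dc_name}").text = value
    return record


# Illustrative export; a real one would be read from a file.
EXPORT = """Title,Photographer,Date Taken,Subjects,URL
Main Street parade,Jane Doe,1925,Parades; Streets,http://example.org/items/1
"""

for row in csv.DictReader(io.StringIO(EXPORT)):
    print(ET.tostring(row_to_oai_dc(row), encoding="unicode"))
```

Splitting packed values into repeated elements, as done for the subject column here, is one of the small "tweaks" that tends to make a record easier for an aggregator to process.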

The workflow represented in the middle of the diagram is the simplest. Again, it represents a workflow useful when a metadata creation system does not have built-in OAI support. Here the metadata is exported in XML or any other format the system supports. It is then transformed into XML meeting the requirements of the OAI Static Repository Specification. This XML format wraps oai_dc records with a few other features designed to assist with providing OAI-PMH responses. The XML file is loaded onto a Web server, and the URL of the file is then registered with a Static Repository Gateway. The gateway handles OAI-PMH requests for the metadata in the XML file. Several static repository gateways exist, but they are not well publicized. If you think the Static Repository option may be best for your institution, ask on the OAI-implementers listserv about gateways where you might register your static file. Due to the static nature of this workflow, it is best used for smaller collections that aren't frequently updated.

3. Appropriate Views

If you plan ahead, you will be able to use one shared record for many aggregations. It isn't practical to create a different shared record for every aggregator. With the general principles of shareable metadata presented here, you should be able to create shared records that will be re-usable.

Don't necessarily assume you should share every single record stored in your local system. Instead, think critically about the content you want to share, and at which level of granularity.

Some records in your local system may represent content that is not appropriate for including in aggregations. In some communities, every record in an aggregation is expected to be publicly accessible, suggesting that it would not be appropriate to share records for restricted-access resources in those communities. In other communities, it is perfectly acceptable and even encouraged to share records for restricted-access or analog-only resources, for the purpose of building a union catalog. Be sensitive to community expectations when determining which records to share.

As discussed earlier in this course, it is essential for you as the metadata provider to make a good decision about the level of granularity for shared records. There is no one right level of granularity for all situations. An illuminated manuscript might appropriately receive page-level description in specialized communities but not in others, whereas the individual page of a mass-produced book would rarely warrant description on its own. In many communities, a group of items in a natural history collection with very similar descriptions is best described as a set, whereas in specialist communities, item-level description would be appropriate. The golden rule of tailoring the view of metadata to a specific use and audience applies here.

In addition, some sharing protocols provide easy ways for aggregators to identify groups of materials that are of interest to them. In the OAI-PMH, the optional "sets" feature groups resources according to features meaningful to the repository designers. In SRU, the CQL query language defines "context sets," commonly agreed-upon sets of search indexes. Context sets often are designed for a specific type of material or community, and thus indicate to the aggregator preferences for some communities over others.
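To make the two mechanisms concrete, the sketch below builds an OAI-PMH request that harvests only one set, and an SRU request whose CQL query uses an index from the Dublin Core context set (the "dc." prefix). The base URLs and the set name "maps" are placeholders invented for this example, and the SRU version shown is an assumption about what a given server supports.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def sru_search_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",  # assumption: the target server speaks SRU 1.2
        "query": cql_query,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoints and set name, for illustration only.
print(oai_list_records_url("http://repository.example.org/oai", set_spec="maps"))
print(sru_search_url("http://catalog.example.org/sru", 'dc.title any "atlas"'))
```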

Whatever decisions you make regarding the appropriate view of metadata records to share, be sure to include your rationale in project documentation that you share with the aggregator.

4. Technical Challenges

The principles of shareable metadata outlined in this course represent a "best-case" scenario. But as we all know there are often barriers to achieving that best case, both technical and organizational.

Technical barriers to truly shareable metadata are still, unfortunately, a reality for many institutions. Many digital asset management systems don't support a second shareable copy of their records, instead relying on simpler models such as including or excluding information at the field level, or simply exposing raw records from the local system with no option to create a different view for sharing.

While technical challenges may seem insurmountable at first, there are a few techniques you can employ to mitigate the effects of these challenges. Compromise can be made both on the local record and on the shared record. In the local system, you can employ creative interface design to offset compromises in local metadata records. For example, if your system displays all repeated fields in separate HTML <p> tags, you could use CSS to align the paragraphs to show a more compact display than the default view of a paragraph. You can, conversely, use documentation of your practices to offset compromises in your shared records. By alerting an aggregator to your local practices, that aggregator can more intelligently process your metadata. But don't rely on documentation as a crutch. Many aggregators won't be able to take the time to perform any customization based on your documentation, or the customization required might be too time-consuming for the aggregation to implement.

The bottom line is that some compromise will likely be necessary on both ends in today's reality. Do your best to split the difference between optimizing for your local system and optimizing for shared records. Finally, lobby your vendor for more robust sharing mechanisms! Only by improving the capabilities of our delivery and sharing systems can we achieve our goal of truly shareable metadata.

5. Rights Over Metadata

Some sharing protocols allow you to explicitly state rights over metadata records. Remember that the rights over your metadata records are distinct from the rights over the resources those records describe.

The best practice in metadata sharing is not to restrict the use of shared metadata records. Doing so might restrict your records from being used in a new and innovative form of aggregation that users flock to. There are some situations, however, in which you may have no choice other than to exert some restrictions over the use of your metadata records. They might, for example, have been derived from a source which itself limits the use to which records can be put. In general, you should be as permissive as possible with the rights you exert over your metadata records.

6. Sanity Check

As a final step before sharing your metadata, do a brief sanity check. Ask yourself the following questions:

Appropriate view?
Consistent?
Context provided?
Does the aggregator have what they need?
Documented?

Can a stranger tell you what the record describes?

Next module: Looking Ahead
