05 February 2008

Data management and the curation continuum: how the Monash experience is informing repository relationships

Andrew Treloar, Director, Australian National Data Service Establishment Project, Monash University
Cathrine Harboe-Ree, University Librarian, Monash University
Abstract:
Repositories are evolving in response to a growing understanding of institutional and research community data and object management needs. This paper (building on work already published in DLib, September, 2007) explores how one institution has responded to the need to provide management solutions that accommodate different object types, uses and users. It introduces three key concepts. The first is the curation continuum, which identifies a number of characteristics of data objects and the repositories that contain them. The second divides the overall repository environment based on these characteristics into three domains (research, collaboration and public), each with associated repository/ data store environments. The third is the curation boundary, which separates each of the three domain types.
This talk was really aimed at the academic environment, but I hoped there would be something for us to learn here too, and I think it was beneficial. I have the PDF file for those who are interested.
The core of Andrew's presentation was his slide on the Data Curation continua identified so far:

Object:

Less Metadata <-> More Metadata
More Items <-> Fewer Items
Larger Objects <-> Smaller Objects (different requirements)
Objects continually updated <-> Objects static

Management:

Researcher Manages <-> Organisation Manages
Less Preservation <-> More Preservation
(e.g. there is no commitment to presentations on Slideshare being around forever)

Access:

Closed Access <-> Open Access
Less Exposure <-> More Exposure
(His paper also stresses the importance of going well beyond access into exposure and discoverability, using a range of techniques such as OAI-PMH, RSS feeds, search engine spidering and federated search.)

How do these continua help them map out their repository requirements? That is where their three different environments come to life.
From the paper's conclusion (as this is of some relevance to our DAMS/Mediabin):
When the ARROW philosophy was initially conceived it was thought that a single institutional repository that was integrated, interoperable and flexible would provide the best platform to support teaching and research at Monash. The single common repository approach, while initially attractive, has been found to suffer from a range of implementation challenges and fails to provide adequate management solutions for data generated by researchers over the entire research lifecycle. These challenges can be best addressed when considered in terms of the data curation continua. The ARROW, DART and ARCHER projects have seen the evolution of this concept into a more nuanced understanding of the different types of content that would need to be managed, and the different audiences and uses for that content. This has led to an acceptance that multiple, albeit interoperable, repositories would be better. One set of decisions about what to do for each of the continua leads to three different sorts of repository domains. Monash University is calling these research (DART), collaboration (ARCHER) and public repositories (ARROW) respectively. A further management concept, the curation boundary, provides a mechanism for determining when and how objects can be moved between the domains.
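To make the step from continua to domains a bit more concrete for myself, here is a rough Python sketch. It is my own simplification, not anything from the paper: the `ContinuaProfile` fields, the 0.0 to 1.0 scale, the averaging and the one-third / two-thirds cut-offs are all assumptions made purely for illustration. Only the three domain names and the continua themselves come from Andrew's material.

```python
from dataclasses import dataclass, astuple
from enum import Enum


class Domain(Enum):
    RESEARCH = "research (DART)"
    COLLABORATION = "collaboration (ARCHER)"
    PUBLIC = "public (ARROW)"


@dataclass(frozen=True)
class ContinuaProfile:
    # Hypothetical scale: each field is a position on one continuum,
    # 0.0 = left-hand end of the slide, 1.0 = right-hand end.
    metadata: float = 0.0              # less metadata -> more metadata
    fewer_items: float = 0.0           # more items -> fewer items
    smaller_objects: float = 0.0       # larger objects -> smaller objects
    static: float = 0.0                # continually updated -> static
    organisation_managed: float = 0.0  # researcher manages -> organisation manages
    preservation: float = 0.0          # less preservation -> more preservation
    open_access: float = 0.0           # closed access -> open access
    exposure: float = 0.0              # less exposure -> more exposure

    def mean_position(self) -> float:
        positions = astuple(self)
        return sum(positions) / len(positions)


def suggest_domain(profile: ContinuaProfile) -> Domain:
    # Assumed cut-offs: a crude average, not the paper's decision process.
    score = profile.mean_position()
    if score < 1 / 3:
        return Domain.RESEARCH
    if score < 2 / 3:
        return Domain.COLLABORATION
    return Domain.PUBLIC


if __name__ == "__main__":
    working_data = ContinuaProfile()  # everything at the left-hand end
    print(suggest_domain(working_data).value)
```

In practice each continuum would presumably be decided on its own merits rather than averaged, but the sketch shows how one set of choices across the continua could point to one of the three domains.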

We may not always need something like these three stages, and currently we just use two – private (museum staff only) and public (web). It could be, however, that we will soon require an intermediate stage where we are more open to collaborative ventures and cooperative creation of our digital collections. Perhaps that also comes in via tagging of public assets?
As knowledge about institutional and data management repositories evolves over the next few years, these ideas will be further explored, by Monash and many other institutions. I guess what he was saying is: why apply one set of rules to everything if not everything is to be kept or preserved forever? Perhaps, as objects cross the curation boundaries, different rules can be applied by workflow. A good example would be the generation and attachment of metadata.
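The idea of applying different rules as objects cross a curation boundary could look something like the Python sketch below. This is only my guess at how such a workflow might be wired up. The `DataObject` class, the rule functions, the required metadata fields and the `BOUNDARY_RULES` table are all hypothetical; only the three domain names come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataObject:
    identifier: str
    domain: str = "research"
    metadata: Dict[str, str] = field(default_factory=dict)
    preserve: bool = False
    history: List[str] = field(default_factory=list)


Rule = Callable[[DataObject], None]


def require_fields(*names: str) -> Rule:
    """Build a rule that rejects objects lacking the named metadata fields."""
    def rule(obj: DataObject) -> None:
        missing = [n for n in names if not obj.metadata.get(n)]
        if missing:
            raise ValueError(f"{obj.identifier}: missing metadata {missing}")
    return rule


def mark_for_preservation(obj: DataObject) -> None:
    obj.preserve = True


# Assumed rules for each boundary; stricter as objects move towards public.
BOUNDARY_RULES: Dict[Tuple[str, str], List[Rule]] = {
    ("research", "collaboration"): [require_fields("title")],
    ("collaboration", "public"): [
        require_fields("title", "creator", "rights"),
        mark_for_preservation,
    ],
}


def cross_boundary(obj: DataObject, target: str) -> None:
    """Move obj to the target domain, applying that boundary's rules first."""
    rules = BOUNDARY_RULES.get((obj.domain, target))
    if rules is None:
        raise ValueError(f"no boundary defined from {obj.domain} to {target}")
    for rule in rules:
        rule(obj)  # a failing rule raises and leaves the object where it is
    obj.history.append(f"{obj.domain} -> {target}")
    obj.domain = target


if __name__ == "__main__":
    obj = DataObject("survey-01", metadata={"title": "Field survey"})
    cross_boundary(obj, "collaboration")
    print(obj.domain, obj.history)
```

The point of keeping the rules in a table keyed by boundary is that each crossing can carry its own requirements, so working data in the research domain is not held to the metadata and preservation standards of the public one.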
Andrew is now setting up the Australian National Data Service (ANDS).
During questions, both Cathrine and Andrew talked about developing the new people needed to take such projects forward. It is a growth area for librarians, but there are not many around who have the full complement of both IT and IM skills. We will need data management and other curatorial skills too (for the ECM system).
