- Must be globally unique
- Best to mint/create at the Source System of Record (SSoR)
- Should be passed to all 'downstream' data consumers and data aggregators
- Data consumers and data aggregators need to preserve them
- Should be easily resolvable back to the Source System of Record (SSoR)
- Desirable to honor legacy identifiers, which may not be globally unique but are already in use and cited in the existing body of published literature.
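As a rough illustration of the points above, here is a minimal Python sketch of minting a GUID at the source system of record while keeping the legacy identifier alongside it. The resolver address, the `SpecimenRecord` class, the `pass_downstream` helper, and the sample catalog number are all hypothetical names invented for this example. A random UUID is only one of several possible identifier schemes.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical resolver address at the source system of record (SSoR).
RESOLVER_BASE = "https://collections.example.org/guid/"


@dataclass(frozen=True)
class SpecimenRecord:
    """A record minted at the SSoR; the GUID is assigned once and never changes."""
    legacy_id: str  # existing catalog number, may not be globally unique
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def resolver_url(self) -> str:
        # The GUID resolves back to the record in its home database.
        return RESOLVER_BASE + self.guid


def pass_downstream(record: SpecimenRecord) -> dict:
    """Build the payload shared with data consumers and aggregators."""
    return {
        "guid": record.guid,            # must be preserved unchanged downstream
        "legacy_id": record.legacy_id,  # honored, but not relied on for uniqueness
        "source_url": record.resolver_url(),
    }


record = SpecimenRecord(legacy_id="HERB-0001")  # made-up catalog number
payload = pass_downstream(record)
```

The payload carries both identifiers, so a consumer can keep citing the legacy number while using the GUID for matching.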
When unique identifiers are used correctly, it matters less whether they follow competing standard A or B, so long as they are unique. One way to envision this is with a key and keychain metaphor. The GUIDs become virtual keys, and the keychain is the apparatus used to associate one with another. Instead of each key belonging to a car, deadbolt lock, or locker, they might belong to one or more specimens, events, citations, or localities. Taking it one step further, ScioQualis lets users define relationships between different keys.
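To make the metaphor concrete, the keychain can be pictured as a small store of named links between GUIDs. The sketch below is only an illustration and is not how ScioQualis implements it. The `Keychain` class, the relationship names, and the placeholder GUID strings are all assumptions made up for this example.

```python
class Keychain:
    """Associates GUIDs ("keys") with one another through named relationships."""

    def __init__(self):
        # Each link is a (subject_guid, relation, object_guid) triple.
        self._links = set()

    def link(self, subject, relation, obj):
        """Record that `subject` is related to `obj` by `relation`."""
        self._links.add((subject, relation, obj))

    def related(self, guid):
        """Return every link that mentions `guid`, as a sorted list of triples."""
        return sorted(t for t in self._links if guid in (t[0], t[2]))


# Placeholder strings stand in for real GUIDs.
chain = Keychain()
chain.link("guid-specimen-1", "collected_during", "guid-event-1")
chain.link("guid-event-1", "occurred_at", "guid-locality-1")
chain.link("guid-specimen-1", "cited_in", "guid-citation-1")

event_links = chain.related("guid-event-1")
```

Here `event_links` holds both the link from the specimen to the event and the link from the event to its locality, so one key leads to the others on the chain.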
So what does it look like when GUIDs are minted by a collections database, passed to data consumers, and resolvable on the web?
GUID created and maintained in collections database (SSoR).
Because the GUID is indexed with Google, it is resolvable by Google - or another search engine.
If public, the user sees up-to-date information from the source system of record - the collections database.
With this digital framework in place, when the collections manager shares this data with a data aggregator like GBIF, VertNet, iDigBio, or any other data consumer, the GUID is passed along. When data aggregators PRESERVE that GUID, you begin to see how easily data can be tracked and updated. If aggregators then indexed their data with search engines, you can imagine the Google results: in seconds, you could find all digital instances of a single record held by data aggregators, as well as the record within its home institution. You could also more easily find duplicates of that record. The list of advantages goes on and on.
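The payoff of preserving the GUID can be sketched in a few lines of Python: if every copy of a record carries the same GUID, finding all of its digital instances is a simple grouping step. The `group_by_guid` function, the field names, and the sample records are invented for illustration. They do not reflect the actual data formats of GBIF, VertNet, or iDigBio.

```python
from collections import defaultdict


def group_by_guid(records):
    """Map each GUID to the list of holders that carry a copy of that record."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["guid"]].append(record["holder"])
    return dict(grouped)


# Made-up sample data: placeholder GUIDs and the places each copy lives.
records = [
    {"guid": "guid-A", "holder": "home institution"},
    {"guid": "guid-A", "holder": "GBIF"},
    {"guid": "guid-A", "holder": "VertNet"},
    {"guid": "guid-B", "holder": "iDigBio"},
]

instances = group_by_guid(records)
```

If an aggregator replaced the GUID with its own identifier, its copy would fall into a separate group and the link back to the home institution would be lost.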
Let's see what happens when we index alternate, legacy identifiers with Google. Can we resolve them?
In future blog posts, we will discuss the following questions and some suggestions for best practices:
- How do we assign GUIDs to duplicates - as in botanical specimens collected from the same individual (or colony) at the same time by the same person, at the same place and sent to another collection?
- Do we assign and distribute GUIDs for all of the DwC classes?
- How do we link GUIDs to legacy identifiers for specimens?
- Should we go back and somehow affix this GUID to every specimen?
- Does the GUID represent the actual object or the digital record?
- Can they be human readable?