Saturday, July 6, 2013

Maintaining Sticky Data


As far as we can tell, Rod Page coined the term 'Sticky Data' as it applies to biocollections. He brilliantly writes: "Shared identifiers are like the hooks on the burrs, if two pieces of data have the same identifier they will stick together. Given enough identifiers and enough data, then we could rapidly assemble a "ball" of interconnected data."

Liquidambar styraciflua-seeds

The BiSciCol (Biological Science Collections) group is interested in tracking scientific collections and their derivatives. In a recent blog post entitled "BiSciCol, Triples and Darwin Core", the group identifies the eight relationship triples that relate the six Darwin Core classes.

In theory, we know this linking is important to the usefulness of natural history collections, and we know where the linking should occur within collections records. Darwin Core also gives us clues for how linking should occur between two or more occurrence records, publications, images, measurements/facts, taxon concepts, and a host of other types of data.
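To make the "sticking" idea concrete, here is a minimal Python sketch that groups records into balls of data whenever they share an identifier. The record names and identifiers (guid-A, doi-X, and so on) are made up for illustration; they do not come from any real collection or from Darwin Core itself.

```python
from collections import defaultdict


def stick_together(records):
    """Group record names into 'balls' connected by shared identifiers.

    `records` maps a record name to the set of identifiers it carries.
    Two records end up in the same ball if they share an identifier,
    directly or through a chain of other records.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for name, identifiers in records.items():
        find(("record", name))  # register records that carry no identifiers
        for ident in identifiers:
            union(("record", name), ("id", ident))

    balls = defaultdict(set)
    for name in records:
        balls[find(("record", name))].add(name)
    return sorted(sorted(ball) for ball in balls.values())


# Hypothetical records and identifiers, for illustration only.
records = {
    "occurrence": {"guid-A", "taxon-1"},
    "image": {"guid-A"},
    "citation": {"doi-X", "guid-A"},
    "unrelated measurement": {"guid-B"},
}
balls = stick_together(records)
for ball in balls:
    print(ball)
```

The first three records all carry guid-A, so they stick together; the measurement carries only guid-B and stays on its own.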

In practice, WHERE do you implement and maintain sticky, highly linked collections data? The most effective place is in the System of Record (SOR), with the collections themselves, so that those links can flow downstream to data aggregators like GBIF. Read more about that in the previous blog post, "Resolving Identifiers for Natural History Collections".

In ScioQualis, such links are made via database fields of data type GUID, linked by primary/foreign keys. Here is an example of what that might look like to a user who is linking three occurrence records to one another. In this case, the example is a fairy pin fungus growing on a bracket fungus growing on a fallen oak branch.
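For readers who want to picture the underlying structure, here is a rough sketch of GUID-keyed records linked by primary/foreign keys, using Python's built-in sqlite3 module. This is not the ScioQualis schema: the table names, column names, and the "growing on" relationship label are assumptions chosen to mirror the fungus example, and SQLite stores the GUIDs as text because it has no native GUID type.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical tables: one row per occurrence, one row per link between two occurrences.
conn.executescript("""
CREATE TABLE occurrence (
    occurrence_guid TEXT PRIMARY KEY,
    label           TEXT NOT NULL
);
CREATE TABLE occurrence_link (
    subject_guid TEXT NOT NULL REFERENCES occurrence(occurrence_guid),
    relationship TEXT NOT NULL,
    object_guid  TEXT NOT NULL REFERENCES occurrence(occurrence_guid),
    PRIMARY KEY (subject_guid, relationship, object_guid)
);
""")


def add_occurrence(label):
    """Insert an occurrence with a freshly generated GUID and return that GUID."""
    guid = str(uuid.uuid4())
    conn.execute("INSERT INTO occurrence VALUES (?, ?)", (guid, label))
    return guid


pin = add_occurrence("fairy pin fungus")
bracket = add_occurrence("bracket fungus")
oak = add_occurrence("fallen oak branch")

# Each link row references two occurrence GUIDs as foreign keys.
conn.executemany(
    "INSERT INTO occurrence_link VALUES (?, ?, ?)",
    [(pin, "growing on", bracket), (bracket, "growing on", oak)],
)

# Resolve the GUIDs back to readable labels by joining on the keys.
rows = conn.execute("""
    SELECT s.label, l.relationship, o.label
    FROM occurrence_link AS l
    JOIN occurrence AS s ON s.occurrence_guid = l.subject_guid
    JOIN occurrence AS o ON o.occurrence_guid = l.object_guid
    ORDER BY s.label
""").fetchall()

for subject, relationship, obj in rows:
    print(subject, relationship, obj)
```

Because the link table only accepts GUIDs that already exist in the occurrence table, a link can never point at a record that is not there, which is a large part of what keeps the data sticky over time.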

The following screen capture shows two online citations linked to an occurrence record.


The last image shows a list of associated taxa. Each is linked to a taxon record and selected from a drop-down list.



The images above represent how a record looks in ScioQualis when the user is logged into the system and has at least read access to the record(s). The collection administrator may choose to make those same records available to the public. To see sample public views, click on one or more of the following links:


http://www.scioqualis.com/Resolve.aspx?guid=e26bfc03-f331-e211-9944-00155d472a06
The above link shows the public version of an occurrence with one associated occurrence and the substrate as an associated taxon. All have links to various pieces (some of which aren't completely built out at this moment).

http://www.scioqualis.com/Resolve.aspx?guid=67f8da2f-a112-e211-aeff-8ca98299dd30
This link shows the public version of an occurrence with five measurements and two associated citations (some of which aren't completely built out at this moment).

My final thought here is that providing an infrastructure in which all of these data classes are linked and maintained is fairly complicated, but worth it. The whole thing is a web from which you can pluck any part and pull, making a new web.


