Thursday, October 24, 2013

Calling all label images in need of digitization!

Last summer, we presented a talk and demonstration of ScioTR, a new Windows 8 app for digitizing images of documents (specifically images of natural history collections labels), at the SPNHC conference in Rapid City, South Dakota. You can watch a video very similar to the content presented in South Dakota on our YouTube channel. For a description of the data entry workflow in ScioTR, visit ScioTR.com.

I would like to take this opportunity to solicit images of collections labels for testing purposes during our final development cycle. We are looking for various types of labels, from various disciplines, including those that are handwritten, typewritten in different fonts, not straight, dirty, etc. Of course, in return, you will receive an excel spreadsheet of the records in the format of your choice, complete with GUIDs. Here are the only requirements:
  1. Please limit the number of labels to the equivalent of no more than 200 specimens (some specimens may have more than one label/image). 
  2. Include a field list (or we can send the data back to you in Darwin Core)
  3. Images should have a resolution of 72 dpi or greater
Please email robin@scioqualis.com if you would like to participate.

Sunday, October 20, 2013

Storing Decimal Longitude and Latitude in your Collections Database

I spent a fair amount of time determining exactly what database datatype to store longitude and latitude decimal degrees in Scio Qualis. As I was doing a recent data migration, I had to describe it to someone and it occurred to me that I should blog about how I determined the best way to store decimal lat and long in a database while maintaining precision and accuracy, particularly when the data coming into the database may have been stored in other data types.
According to the Biogeomancer, Guide to Best Practices in Georeferencing, (pg 27) longitude and latitude decimal degree values to 0.00001 of a degree have a minimum uncertainty of 2 meters, no matter where you are on the globe. My Magellan GPS measures lat/long decimal values to one more digit from the decimal: 0.000001 of a degree and my cell phone GPS goes to seven digits past the decimal.  I would guess that the cell phone is using cell phone towers to determine this value and the traditional GPS uses satellites. The level of accuracy of these is another (long) discussion.
Most databases have float and decimal data types. With decimal lat and long, I would suggest using a decimal data type, but one could use a float with ample precision. Here is a good discussion of the difference in the context of long and lat. By using a combination of the decimal degrees, stored in the decimal database datatype AND a field that stores the maximum uncertainty as calculated with a tool like the MaNIS Georeferencing Calculator,  you can be sure to accurately capture the precision and accuracy of the measurement. This is the way we do it in ScioQualis.
For typical database decimal data types, one designates the precision and the scale of the decimal, like this: decimal(9,7).
In this case, 9 is the total number of digits (both left and right of the decimal point) stored and 7 is the number of decimal digits stored to the right of the decimal point. So, because decimal latitude can range from -90 to 90 and decimal longitude can range from -180 to 180, you would have the following data types:

Latitude: decimal(9,7)
Longitude: decimal(10, 7)