6. Domain standard ABCD and its extension EFG

Download the presentation "BIOCASE PROVIDER SOFTWARE & BIOCASE MONITOR SERVICE" given by Falko Gloeckler at TDWG 2013 conference

Summary

Almost all natural history collections curate fossil specimens. Thus, there is a high demand to accommodate the ancillary geoscientific information in the biological data schema. In addition, the same institutions hold geological collection objects whose data overlap largely with this type of information.

As the ABCD schema (Access to Biological Collection Data) is an extensible schema, the Extension For Geosciences has been compiled to better accommodate information on paleontological, geological and mineralogical collection objects. This extended schema is used for the data provision to the Geoscientific Collection Access Service (GeoCASe), GBIF and the OpenUp! Project.

 

Table of contents

1. Reference documentation of ABCD concepts

2. Correspondence between ABCD and ESE (Europeana Semantic Elements) fields

3. Integer primary key on metadata

4. Version of pymssql driver for MS SQL Driver

5. Declaration of several static tables

6. Static string fields and concatenated elements cannot be handled as searchable elements by the BioCASe query engine

7. Problem with the escape character in static strings

8. Placement of images to be mapped by BioCASe via ABCD

9. The license statement on thumbnails must be mapped as an URI in ABCD

10. Thumbnails for Europeana from non-publicly available images

11. Relation between images and objects (representation of cardinality)

12. Interaction with image servers

 

Further Reading

1. Reference documentation of ABCD concepts

Related question: Where can I find a complete list of ABCD concepts? [1]

A concept explorer displaying the complete tree structure of ABCD is available at: http://gbif.africamuseum.be/biocaseABCDexplorer/configtool/ratingsweb/
Select the option "show documentation" -> "extended" on the top right to see the documentation of the concepts. Please note that the concepts of ABCD extensions (like ABCDEFG for geosciences) are not listed there.

The developers wiki at BGBM lists the most commonly used fields (recommended for a first attempt of mapping): http://wiki.bgbm.org/bps/index.php/CommonABCD2Concepts

The list of concepts provided by the TMG is an important reference for mapping ABCD with a view to visualisation in ESE (the Europeana data model):
http://open-up.cybertaxonomy.africamuseum.be/Forum_ABCD2ESEmapping

The full list of ABCD concepts is also available on the TDWG website.

 

2. Correspondence between ABCD and ESE (Europeana Semantic Elements) fields

Related question: How do I map ABCD to ESE? [1]

An updated mapping between ABCD and ESE (the current Europeana data model) was released at the beginning of 2012. There are two template documents related to this mapping:

  1. A complete list of ABCD fields available for mapping
  2. A “restricted data” list, containing the minimal set of ESE data fields that need to be supplied in order to have your metadata ingested into the Europeana Portal.

Three important notes on this mapping:

ESE will be progressively replaced by the new EDM (Europeana Data Model) in 2012, which may lead to changes to these documents. The first ingest of OpenUp! data into Europeana will take place in February 2012, still via ESE.

Europeana needs data in either dc:title or dc:description as the key identifier of the data. These two elements are used by Europeana to index objects and to check their uniqueness in the system. dc:title corresponds to the ABCD fields containing the scientific name of a specimen; other elements can be mapped in ESE via dc:europeana.

The elements belonging to the DublinCore namespace (marked with the prefix dc:) can be potentially repeated. This allows the implementation of “one to many” relations between one multimedia document and several attributes depicting it, for instance several scientific names describing one object.

Attached documents to this deliverable:

Complete ABCD/ESE mapping: map_ABCD206-ESE-120202-result-man-p.pdf

Restricted ABCD/ESE mapping: map_ABCD206-ESE-120124-man-restricteddata_0.pdf

Important remark on metadata at general level:

Many providers map the general presentation of their data set into: /DataSets/DataSet/Metadata/Description/Representation/Coverage and/or /DataSets/DataSet/Metadata/Description/Representation/Details. But these fields are currently not mapped into ESE.


3. Integer primary key on metadata

The metadata table storing information at dataset level (name of the collection, contact details of the collection manager, address of the institution, etc.) should be declared with an integer primary key in the ABCD mapping, even if this table often contains just a single record. This has to be done via the DB Structure tab of BioCASe (see Fig. 9).

Fig. 9 Declaring primary key

For convenience, database managers could be tempted to define a “text” primary key on a field that also contains data (for instance the acronym of an institution), as this table contains a limited number of records and is not bound via a foreign key to the specimen table. But this solution may cause problems and bugs at the level of the SQL drivers of the database, especially if the column contains diacritic signs. We experienced this issue with an MS SQL server database using the ODBC driver, which could not retrieve data from a table when its primary key contained diacritic characters (“é”, “è”, “ê”, “ï”). Using an integer primary key is a workaround for this bug.

4. Version of pymssql driver for MS SQL Driver

If you use an MS SQL server in combination with pymssql [1], please ensure that you have at least version 2.0.0 installed. We faced bugs with earlier versions, which could not open a cursor on a remote server and browse the data via "fetchall".
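The version requirement above can be checked from Python before the provider is configured. The following is only a minimal sketch: the helper names (parse_version, is_supported, check_installed_pymssql) are illustrative and not part of BioCASe or pymssql, and pre-release suffixes such as "2.0.0b1" are simply treated as the plain release number.

```python
import re
from importlib import metadata

# Minimum pymssql version recommended in the text above.
MINIMUM = (2, 0, 0)


def parse_version(text):
    """Turn a version string such as '2.1.4' into a tuple of three integers."""
    numbers = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)  # keep only the leading digits of each part
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_supported(version_text, minimum=MINIMUM):
    """Return True if the given version is at least the minimum."""
    return parse_version(version_text) >= minimum


def check_installed_pymssql():
    """Return True/False for the installed pymssql, or None if it is not installed."""
    try:
        installed = metadata.version("pymssql")
    except metadata.PackageNotFoundError:
        return None
    return is_supported(installed)
```

Calling check_installed_pymssql() on the machine hosting the BioCASe provider returns None when the driver is missing, which distinguishes "not installed" from "too old".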

 
 

5. Declaration of several static tables

The SQL tables (or views) containing metadata must be declared in the "Static table aliases" drop-down window when you are mapping a schema (see Fig. 10).

 

Fig. 10 Determining metadata alias

This window provides space to declare only one static table, but it is actually possible to declare more than one metadata table just by editing the source XML of the mapping. On BioCASe 3.0 and for a mapping into ABCD 2.06, this XML document is located at: <BioCASe installation directory>/config/datasources/<name of the dataset>/cmf_ABCD_2.06.xml.

The static tables are defined inside the cmf:settings/cmf:staticTableAlias element, located near the top of the file (see Fig. 11).

Fig. 11 Static tables

If you manually added these additional static tables to this XML file in a BioCASe version prior to 2.6, the graphical interface of BioCASe used to define the ABCD mapping of your database will no longer be synchronized with this XML file. When you use this interface to modify an existing ABCD mapping, it will copy only the name of the first metadata table in cmf:staticTableAlias into the XML file and erase the additional ones. You will have to correct the XML file manually and re-add the missing 'cmf:StaticTableAlias' elements after each modification to get your provider running.
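Because the additional static tables can be silently dropped, it may help to verify the mapping file after each change made through the graphical interface. The sketch below only counts the static table alias elements in an XML text. The namespace URI, root element and "name" attribute in the sample are hypothetical placeholders, since the real layout of the file is shown only in Fig. 11; element names are compared without namespace and without regard to case.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def static_table_aliases(cmf_xml_text):
    """Return the attributes of every staticTableAlias element found in the XML text."""
    root = ET.fromstring(cmf_xml_text)
    return [
        dict(element.attrib)
        for element in root.iter()
        if local_name(element.tag).lower() == "statictablealias"
    ]


# Hypothetical sample: namespace URI, root element and attribute names are assumptions.
SAMPLE = """
<cmf:mapping xmlns:cmf="http://example.org/hypothetical-cmf">
  <cmf:settings>
    <cmf:staticTableAlias name="metadata"/>
    <cmf:staticTableAlias name="contacts"/>
  </cmf:settings>
</cmf:mapping>
"""

aliases = static_table_aliases(SAMPLE)
```

If the number of aliases found is lower than the number you declared by hand, the graphical interface has overwritten the file and the missing elements need to be re-added.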

 
 

6. Static string fields and concatenated elements cannot be handled as searchable elements by the BioCASe query engine

Only the fields coming directly from an SQL database can be searched with a text pattern or a value and used in a scan query. Those written as static string values are only displayed in the result set of a search query. Literal or concatenated fields are marked with the attribute searchable="0" in the presentation page of the mapping (shown when requesting the access point of the provider). See “Adding Mandatory ABCD Elements” in http://wiki.bgbm.org/bps/index.php/ABCD2Mapping (under the screenshot of the MappingEditor).

 

7. Problem with the escape character in static strings

When defining a static text value in the mapping (a text entered directly in the mapping form and not a database field), you should avoid the backslash character “\”, which may be handled incorrectly by the XML parser of BioCASe. If you need it, for instance in file paths, replace it with “\\”.
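If static values are prepared by a script before being pasted into the mapping form, the doubling of backslashes can be automated. This is a minimal sketch; the function name escape_static_string is illustrative and not part of BioCASe.

```python
def escape_static_string(value):
    """Double every backslash so that a static mapping value keeps it literally."""
    return value.replace("\\", "\\\\")


# Example with a Windows-style file path (the path itself is made up).
escaped = escape_static_string("C:\\images\\specimen.jpg")
```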

8. Placement of images to be mapped by BioCASe via ABCD

Related question: Do the images submitted to OpenUp merely have to be in an accessible and stable webfolder? [1]

Images can be placed in a web folder, but the relation between images and reference objects (species, observation, or collection specimen) must be documented through an ABCD record whose identifier will be used to index the image at harvester level. It is suggested that the ABCD record stores the URL of the image, which should be as permanent as possible.

The most commonly used concepts to store links to images are located under the ABCD node;

 

9. The license statement on thumbnails must be mapped as an URI in ABCD

Related question: How do we communicate the kind of license to apply to our content to Europeana? [1]

Europeana requires data providers to explicitly mention the license they wish to apply to the content. "Content" stands for a multimedia document made available on the Internet by the contributing institutions (i.e. the URLs provided in the ESE elements 'europeana:isShownBy' or 'europeana:isShownAt') as well as the small images, like thumbnails and previews generated to be displayed on the Europeana portal. The rights on metadata are defined at another level, in the Data Provider agreement.

The license is stated as a URL linking to the appropriate statement. Within the framework of OpenUp!, this statement must be entered as a URI in the ABCD element /DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/IPR/Licenses/License/URI, which corresponds to the ESE element europeana:rights.

There are 12 possible license statements, each with its own URL (only one statement can apply to each object). Eight come from Creative Commons:

Note: Europeana accepts license URLs with versions and language flags (e.g.: http://creativecommons.org/licenses/by-nd/2.0/es for ‘CC BY-ND version 2.0 in Spanish’).

License examples (name: URL)

Public Domain Mark: http://creativecommons.org/publicdomain/mark/1.0/
CC – Zero (copyright waiver): http://creativecommons.org/publicdomain/zero/1.0/
CC BY (Attribution): http://creativecommons.org/licenses/by/3.0/
CC BY-SA (Attribution, Share Alike): http://creativecommons.org/licenses/by-sa/3.0/
CC BY-NC (Attribution for non-commercial use only, others can apply a different license on derivatives): http://creativecommons.org/licenses/by-nc/3.0/
CC BY-NC-SA (Attribution and share alike for non-commercial use only): http://creativecommons.org/licenses/by-nc-sa/3.0
CC BY-ND (Attribution without derivative): http://creativecommons.org/licenses/by-nd/3.0
CC BY-NC-ND (Attribution for non-commercial use only and without derivative): http://creativecommons.org/licenses/by-nc-nd/3.0

 

Four come from the Europeana rights statements:

Europeana rights statements (name: URL)

Rights Reserved - Free Access: http://www.europeana.eu/rights/rr-f/
Rights Reserved - Paid Access: http://www.europeana.eu/rights/rr-p/
Rights Reserved - Restricted Access: http://www.europeana.eu/rights/rr-r/
Unknown: http://www.europeana.eu/rights/unknown/

 

We strongly recommend the use of Creative Commons licenses, as this is the kind of licensing mentioned in the data provider agreement for OpenUp!

Note: The article Creative Commons licenses and the non-commercial condition: Implications for the re-use of Biodiversity information by Gregor Hagedorn, Daniel Mietchen, Robert A. Morris, Donat Agosti, Lyubomir Penev, Walter G. Berendsohn and Donald Hobern also provides very valuable information on Creative Commons licenses applied to biodiversity content.[2]
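Before mapping a license URI it can be useful to check locally that it belongs to one of the twelve statements listed in this section. The sketch below is an assumption about how such a check could look, not a description of Europeana's own validation: it accepts any version number and an optional lower-case language or jurisdiction suffix for the Creative Commons licenses, in line with the note on versions and language flags, and an optional trailing slash everywhere.

```python
import re

# Patterns derived from the URLs listed in this section (the flexibility is an assumption).
ACCEPTED_PATTERNS = [
    # Six Creative Commons licenses, any version, optional language/jurisdiction flag.
    re.compile(
        r"^http://creativecommons\.org/licenses/"
        r"(by|by-sa|by-nc|by-nc-sa|by-nd|by-nc-nd)/\d+\.\d+(/[a-z]+)?/?$"
    ),
    # Public Domain Mark and CC Zero.
    re.compile(r"^http://creativecommons\.org/publicdomain/(mark|zero)/1\.0/?$"),
    # Four Europeana rights statements.
    re.compile(r"^http://www\.europeana\.eu/rights/(rr-f|rr-p|rr-r|unknown)/?$"),
]


def is_accepted_license(uri):
    """Return True if the URI matches one of the twelve listed statements."""
    return any(pattern.match(uri) for pattern in ACCEPTED_PATTERNS)
```

Such a check could be run over the column that feeds the License/URI element to spot free-text license notes that were never converted into a URI.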

 

10. Thumbnails for Europeana from non-publicly available images

Question: What if I want to provide thumbnails but protect the original document? Do I need a firewall for OpenUp!? [1]

You can use a single instance of the BioCASe provider to publish multimedia documents along with their metadata. This is actually the most common scenario. The URLs of the multimedia document are in this case referenced in the appropriate ABCD element.

These documents can be stored on the same physical server as the BioCASe provider or on a different one. In either case, as for any other website you host, you must ensure that your firewall and proxy allow incoming connections on port 80 (the standard HTTP port) for both the BioCASe provider and the images.

Providing thumbnails by protecting the original document:

Alternatively, you may wish to provide thumbnails to Europeana and OpenUp! while preventing access to the original high-resolution material.

It is suggested that you set up two parallel instances of the BioCASe provider during the time of the harvesting of data by Europeana:

  1. The main one, containing the text metadata and a link to the records of the second BioCASe provider. This would be the permanent access point to the data.
  2. The second one, containing the links to the high-resolution images, which would be available only to a limited number of services and for a limited duration. By configuring the firewall or Apache HTTP appropriately, you can restrict the IP addresses allowed for incoming connections. This technique can be used to ensure that only the Europeana harvesting system can access the original document.

Fig. 12 Firewall and proxy configuration for a single instance of the BioCASe provider (common for metadata and large images)

The deny and allow directives of Apache HTTP can be used to define a different policy for each BioCASe installation if both run on the same server (it is also possible to filter IPs through the proxy module of Apache):

http://httpd.apache.org/docs/2.3/en/mod/mod_access_compat.html#allow

http://httpd.apache.org/docs/2.3/en/mod/mod_access_compat.html#deny

 

Fig. 13 Firewall and proxy configuration for two instances of the BioCASe provider (the second being intended to prevent direct access to the original multimedia documents while still allowing harvesting of images by Europeana)

Note: this figure describes two different servers, but it is also possible to install two BioCASe instances on the same server and limit the access to the second one by defining rules in the modules of Apache HTTP.

The primary installation of the BioCASe provider should be open to any IP, but the BioCASe provider responsible for delivering images for thumbnail creation (which can be either the primary provider installation or a separate one for the images only) needs to be accessible from the IP range 194.171.184.0/24, which is used by the Europeana server that generates the thumbnails. Please ensure that the firewall and proxy servers of your institution allow incoming connections from this IP range to the BioCASe provider serving the images, or at least to the server containing the images.
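When testing firewall or proxy rules, it can help to confirm which client addresses fall inside the range quoted above. The following sketch uses the Python standard library only; the function name is illustrative, and the sample addresses in the usage lines are arbitrary.

```python
import ipaddress

# IP range given in the text for the Europeana thumbnail generation server.
EUROPEANA_THUMBNAIL_RANGE = ipaddress.ip_network("194.171.184.0/24")


def is_europeana_harvester(remote_addr):
    """Return True if the client address lies inside the Europeana range."""
    return ipaddress.ip_address(remote_addr) in EUROPEANA_THUMBNAIL_RANGE


inside = is_europeana_harvester("194.171.184.17")   # address within the /24
outside = is_europeana_harvester("194.171.185.1")   # neighbouring network
```

The same test can be applied to the client addresses found in the web server access log to confirm that harvesting requests actually reached the image server.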

11. Relation between images and objects (representation of cardinality)

Question: Can I provide several images describing a single unit to OpenUp!? [1]

BioCASe can handle several images for each specimen, but ESE (the current data model of Europeana) can currently display only one image per specimen. However, if you have several images for each specimen, we suggest defining a complete mapping now, in anticipation of EDM (the new Europeana Data Model), which should be able to ingest several images per metadata record.

Once the mapping of images has been defined, the local query tool of the BioCASe provider can also display the images in its online interface, e.g.:

http://gbif.africamuseum.be/biocase_rmca/querytool/details.cgi?dsa=STERNA_gbifmapping&detail=unit&wrapper_url=http://gbif.africamuseum.be/biocase_rmca/pywrapper.cgi?dsa=STERNA_gbifmapping&schema=http://www.tdwg.org/schemas/abcd/2.06&inst=RMCA&col=Aves&cat=RMCA%20A.8972

The detailed steps are:

  1. The URL of the image should be mapped under the /DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/FileURI element
  2. Currently the harvester can ingest only one image per record into the Europeana portal. If you select a "preferred" image to be displayed in Europeana, it should be placed first in the list of images in the ABCD XML. The BioCASe provider currently has no mechanism to sort elements in a list, but this can be implemented at database level by using an SQL view with an ORDER BY clause on a flagged column. If you cannot create this SQL view immediately, you can first focus on the basic configuration. The sort order can be defined later with minimal changes to the configuration of an ABCD dataset.
  3. Important: Do you have a website where all the images are presented together with the specimen information? If so, please map the URL of this website to /DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/ProductURI. This URL can already be provided as a link detailing the image with the current ESE model.
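The sorting described in step 2 can be illustrated with an in-memory SQLite database. This is a sketch under stated assumptions: the table unit_images, the view unit_images_sorted and the columns unit_id, file_uri and is_preferred are all hypothetical names, and the catalogue number and image URLs are made up. Other database engines may restrict ORDER BY inside views (MS SQL Server, for instance, only allows it together with TOP or OFFSET), so the view definition has to be adapted to your system.

```python
import sqlite3


def build_demo_database():
    """Create a hypothetical image table and a view that lists preferred images first."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE unit_images (
            unit_id      TEXT,     -- identifier of the specimen
            file_uri     TEXT,     -- value to map to MultiMediaObject/FileURI
            is_preferred INTEGER   -- 1 for the image to show in Europeana, else 0
        );
        INSERT INTO unit_images VALUES
            ('A.8972', 'http://example.org/img/a8972_side.jpg', 0),
            ('A.8972', 'http://example.org/img/a8972_front.jpg', 1),
            ('A.8972', 'http://example.org/img/a8972_label.jpg', 0);
        CREATE VIEW unit_images_sorted AS
            SELECT unit_id, file_uri
            FROM unit_images
            ORDER BY unit_id, is_preferred DESC;
        """
    )
    return conn


def image_uris(conn, unit_id):
    """Return the image URIs of one unit in the order delivered by the view."""
    rows = conn.execute(
        "SELECT file_uri FROM unit_images_sorted WHERE unit_id = ?", (unit_id,)
    )
    return [row[0] for row in rows]


uris = image_uris(build_demo_database(), "A.8972")
```

In the BioCASe configuration, the view would then take the place of the image table in the DB Structure tab, so that the flagged image comes first in the generated ABCD document.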
 

12. Interaction with image servers

Question: Can images be served from an image server such as a Morphbank installation (e.g. http://morphbank.digitarium.fi/)? [1]

For thumbnail creation, the provider must register a URL pointing to a raw image file, not the URL of a viewer application (based e.g. on Flash) or of a web page containing the image. Submitting images hosted on Morphbank should nevertheless not be a problem, as this tool appears to give a link to the source image (in different formats) through a variant of the URL carrying an imgType parameter:

HTML page: http://morphbank.digitarium.fi/?id=3500032

Original TIFF: http://mbimages.digitarium.fi/?id=3500032&imgType=tiff

High resolution JPEG: http://mbimages.digitarium.fi/?id=3500032&imgType=jpeg

Medium resolution JPEG: http://mbimages.digitarium.fi/?id=3500032&imgType=jpg

One of these image links should be documented in the ABCD mapping.
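If the Morphbank identifiers are stored in the database, the direct image links can be generated rather than typed by hand. The sketch below follows the URL pattern of the examples above; the function name is illustrative, and the assumption that the pattern holds for every record of this Morphbank installation has not been verified.

```python
from urllib.parse import urlencode

# Image host taken from the example links above.
IMAGE_HOST = "http://mbimages.digitarium.fi/"


def morphbank_image_url(object_id, img_type="jpeg"):
    """Build a direct image link; img_type is 'tiff', 'jpeg' or 'jpg' in the examples."""
    return IMAGE_HOST + "?" + urlencode({"id": object_id, "imgType": img_type})


tiff_url = morphbank_image_url(3500032, "tiff")
```

The resulting string is the kind of value that would be mapped to the FileURI element, for example through a computed column in an SQL view.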