Thursday, April 14, 2011

Saturday, March 26, 2011

Development Update

Brian Stucky and I made a first pass at an online prototype back in mid-February. This prototype featured a text-file triple-store back-end, a bit of JavaScript, a cool modeling widget, and some JSP glue.

Since then, I've been working on a few things to make this prototype more solid:

1) Implementing a database back-end for the triple-store. I've been working with Virtuoso under an evaluation license and have found it fairly easy to work with. I would prefer a free or open-source solution, but it seems Virtuoso is the best product out there for our needs.

2) Building queries on REST services. Not wanting to make the REST services an afterthought to our architecture, I am building them into our web prototype.

3) Re-building SPARQL queries to be more portable, scalable, and faster. Specifically, the transitive closure functions were previously built using specialized syntax not portable to larger systems. The new syntax is more portable and will be a lot faster. E.g.:

SELECT ?s ?dist FROM
WHERE
{
  {
    SELECT ?s ?o
    WHERE
    {
      ?s bsc:relatedTo ?o .
    }
  } OPTION (TRANSITIVE, t_distinct, t_in(?s), t_out(?o), t_min(1), t_max(20), t_step('step_no') as ?dist) .
  FILTER (?o = dwc:spec12345)
}
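
For readers who don't have Virtuoso at hand, here is a minimal sketch in plain Python of roughly what the query above asks for: every subject that reaches dwc:spec12345 by following bsc:relatedTo for between 1 and 20 steps, together with the number of steps. It is not the prototype's code. The function name, the in-memory list of triples, and every identifier other than bsc:relatedTo and dwc:spec12345 are hypothetical, and the breadth-first search only approximates how Virtuoso evaluates the TRANSITIVE option.

```python
from collections import deque


def related_subjects(triples, target, predicate="bsc:relatedTo", t_min=1, t_max=20):
    """Return {subject: steps} for subjects that reach `target` via `predicate`.

    Hypothetical helper: a breadth-first search over reversed edges that
    approximates the transitive sub-query, not Virtuoso's actual algorithm.
    """
    # Reverse adjacency: object -> set of subjects that point at it.
    incoming = {}
    for s, p, o in triples:
        if p == predicate:
            incoming.setdefault(o, set()).add(s)

    steps = {}  # each subject is recorded once, at its shortest distance
    queue = deque([(target, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == t_max:
            continue  # do not follow paths longer than t_max
        for subject in sorted(incoming.get(node, ())):
            if subject not in steps:
                steps[subject] = dist + 1
                queue.append((subject, dist + 1))
    return {s: d for s, d in steps.items() if d >= t_min}


# Made-up sample data; only dwc:spec12345 comes from the query above.
example_triples = [
    ("dwc:tissue1", "bsc:relatedTo", "dwc:spec12345"),
    ("dwc:dna1", "bsc:relatedTo", "dwc:tissue1"),
    ("dwc:photo1", "bsc:relatedTo", "dwc:spec12345"),
    ("dwc:spec12345", "bsc:relatedTo", "dwc:event1"),
]
result = related_subjects(example_triples, "dwc:spec12345")
print(sorted(result.items()))
# [('dwc:dna1', 2), ('dwc:photo1', 1), ('dwc:tissue1', 1)]
```

Note that dwc:event1 is not returned: the specimen points at it, not the other way around, so it never reaches the target.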

Hoping to get this new prototype online in the next week or two.

Thursday, March 3, 2011

Preliminary Implementation Diagram

An indication of where we are headed with our architecture/implementation. This is a draft and comments are welcome!

Friday, February 25, 2011

BiSciCol Technical Meeting: University of Florida, 23-24 February 2011


The BiSciCol Technical team had a successful brainstorming meeting over two warm Florida days.

Attendees:
Nico Cellinese (UF), Kate Rachwal (UF), Reed Beaman (UF), Gustav Paulay (UF), Russ Watkins (UF), John Deck (Berkeley), Brian Stucky (Colorado University), Rich Pyle (remotely, Bishop Museum) and our very special guest Steve Baskauf (Vanderbilt University).

The main goal of the meeting was to define the BiSciCol scope, technical goals, and model. Happy to report we made great progress. We are now ready to move to an implementation phase and are confident we will be able to release a prototype in July 2011. See below a snapshot of our design document.

Purpose of the System
1. Notify all objects that are related to Object X that Object X has changed.
2. Manage relationship definitions between objects that exist in distributed databases.
3. Manage annotations of objects where annotation is treated as an Object relationship.

Technical Goals of the System
1. scalable design
2. easy coding but must be robust
3. expandable to different domains

Envisioning this as a distributed network with structured content defined. Goal is to allow for multiple data sources, each with their own thematic content. The thematic content is built on defining relationships between objects.

The Model
Linkages between objects are unstructured, which allows any object to be tied to any other object via the predicate “relatedTo”. This is similar to SKOS “related”, but we have made relatedTo transitive. relatedTo does not require objects to be organized hierarchically.

Following are the triples recognized in the network:

:objectId :relatedTo :objectId
:objectId rdf:type :objectType

:relatedTo :hasProperty :propertyDefinitionURI
:objectId dwc:dateLastModified “YYYY-MM-DD”
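
As an illustration only, the four triple shapes listed above could be checked with a few lines of Python. This is a sketch under assumptions, not part of the design document: the function names and category labels are hypothetical, triples are treated as plain string tuples, and the date check simply enforces the zero-padded YYYY-MM-DD form shown above.

```python
from datetime import datetime

# Predicates taken from the list of recognized triples.
RELATED_TO = ":relatedTo"
RDF_TYPE = "rdf:type"
HAS_PROPERTY = ":hasProperty"
DATE_LAST_MODIFIED = "dwc:dateLastModified"


def is_valid_date(text):
    """True only for a real calendar date written as zero-padded YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts "2011-2-5", so require an exact round trip.
    return parsed.strftime("%Y-%m-%d") == text


def classify(triple):
    """Return a (hypothetical) label for a recognized triple, else None."""
    subject, predicate, obj = triple
    if subject == RELATED_TO and predicate == HAS_PROPERTY:
        return "property-definition"
    if predicate == RELATED_TO:
        return "relationship"
    if predicate == RDF_TYPE:
        return "type"
    if predicate == DATE_LAST_MODIFIED and is_valid_date(obj):
        return "modified-date"
    return None
```

For example, classify((":spec1", ":relatedTo", ":event1")) yields "relationship", while a triple with an unlisted predicate or a malformed date yields None.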

Monday, February 14, 2011

News and updates

Since our last meeting in Washington, DC we worked out some of the object relationships and how we want to formalize concept/objects such as individuals, specimens, vouchers etc. for the purposes of our network. We also agreed to consider implementing RDF in our design. To this end, a number of people have since purchased and read (hopefully past the introduction) the book "Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL". Finally, we agreed to deliver a working prototype of the BiSciCol network in July 2011, implementing data from Biocode, CalPhotos, and UF.

During January and the first part of February, John Deck (Berkeley), Rob Guralnick (Colorado U.), and Brian Stucky (Colorado U.) have been crafting a technical implementation plan for the BiSciCol prototype. CU has offered a server and Tomcat instance for our prototypes and Brian has been implementing that along with getting the web interface going. John has been working on a model that ties together collecting events, specimens, tissues, DNA extractions, and photographs (more object types later), while incorporating location and modification date. Taxonomy is notably absent for now and we will soon be working more closely with Rich Pyle and Rob Whitton (Bishop Museum). Codebase is Java and inferencing/RDF work is being implemented in Jena/ARQ (open source Java).

On the 23rd and 24th of February, John and Brian will be at UF for a technical meeting with Nico Cellinese, Kate Rachwal, Russ Watkins, and special guest Steve Baskauf (Vanderbilt University) to review the UF implementation and proposed model. We will also work remotely with Rich Pyle and Rob Whitton. More details on this meeting will be posted soon. Given our prototype deadline of July 2011, we have a lot ahead of us, especially in integrating taxonomic names components, but we feel we are on target with progress to date.

Monday, January 31, 2011

16-17 December 2010 - Meeting outcome

Our December Meeting brought us all together and we were fortunate to have Dave Vieglais from DataOne and Bob Morris from FilteredPush (a.k.a. Push-me-pull-you). We reviewed use cases and case scenarios, discussed object relationships and ways to model them.


We are making progress with designing a technical architecture and on February 23-25, 2011 John Deck (Berkeley), Brian Stucky (U. Colorado), Kate Rachwal (UF), Russ Watkins (UF), Nico Cellinese (UF), and our guest Steve Baskauf (Vanderbilt) will meet at the Florida Museum of Natural History to refine solutions to some of the impending technical issues.


We created a number of subgroups that will be in charge of developing specific aspects of the project:


1. Taxonomic Names (Rich Pyle [chair], Gustav Paulay, Tom Orrell, Chris Meyer, Nico Cellinese, John Deck, Jonathan Coddington, Rob Whitton).
2. Ontology  (Nico Cellinese [chair], Bob Morris, Kate Rachwal, John Deck, Jonathan Coddington, Rich Pyle).
3. Geospatial (Rob Guralnick [chair], Reed Beaman, Russ Watkins, Kate Rachwal, John Deck).
4. Technical Architecture (John Deck [chair], Rob Whitton, Dave Vieglais, Kate Rachwal, Rob Guralnick, Russ Watkins, Tom Orrell, Bryan Heidorn, Jonathan Coddington).
5. Domain Scientists (Chris Meyer [chair], Nico Cellinese, Gustav Paulay, George Roderick, Neil Davies, Rob Guralnick, John Deck, Tom Orrell, Jonathan Coddington, Reed Beaman). Improve on test case list.
6. Sustainability Group (Neil Davies [chair], Rob Guralnick, Reed Beaman, Chris Meyer).
Reed Beaman, Rob Guralnick and Russ Watkins are compiling a conceptual plan that will include specifics on who is doing what, timeline, and priorities. We all agreed to deploy a prototype in July 2011 for testing by the community.

More later.....

Friday, December 17, 2010

Meeting in Washington DC, 16-17 December, 2010

We had a great, productive, fun meeting and notes will be posted soon. In the meantime, I wanted to expose another inconvenient truth: informatics and technology are not always best friends. Check this out!



Here in the above "picture": Chris Meyer, Tom Orrell, Rob Guralnick, John Deck, Neil Davies, Gustav Paulay, Bob Morris, Jamie Whitacre, Kate Rachwal, Meghan Parker, and Nico Cellinese. Additional participants were Dave Vieglais, Reed Beaman, John Keltner, Linda Ward and Jon Coddington. "Photograph" taken by Reed Beaman.