Chado 1.4 at the 2018 GMOD hackathon

As a Tripal developer, one of the fun things I get to do is contribute not only to the Tripal project, but also other parts of GMOD, the Generic Model Organism Database Project. I attended the January 10th 2018 GMOD hackathon at the Plant and Animal Genome XXVI conference hoping to make a few small changes to Chado, the SQL database schema that HardwoodGenomics uses with Tripal.

GMOD CHADO

It was a really great day and I learned a lot about both Chado and SQL working with Stephen Ficklin, Scott Cain, and Emily Grau (others were in attendance, this was the Chado team).

I wanted to outline a bit of what we accomplished and what it means for the 1.4 Chado release! Check out the current progress and issues on the project GitHub.

Conquering Issues

We actually had a backlog of issues tracked in a google spreadsheet, so the first order of business was to convert those into Github issues to easier delegate and discuss them. Thinking about all of these issues for modules I had never used was really helpful in understanding Chado holistically, rather than as a ruleset I stuck my data into for no good reason.

Learn about pushing out a new version of Chado! It's a bunch of perl scripts that combine all the modules into a single SQL file.

New In Chado v1.4

This list may continue to grow, but I was able to make the following contributions at the meeting!

  • CVterm for prop values not just type
  • Update the MAGE module table descriptions to describe expression data from not just Microarray experiments but Next Generation Sequencing, and hopefully future methods as well.
  • Type_id added to analysis table
  • New linker tables biomaterial_project and stock_biomaterial
  • Reached out to the community regarding the eimage table and all proposed linker tables

I'd like to talk a little more about the Controlled Vocabulary term (CVterm) prop values issue, because I think it's a great improvement to Chado, especially as Tripal 3 embraces associating all content with CVterms.

CVterm Support for Values

We created a cvalue_id column for all prop tables! This sounds like a simple change but it's awesome. Right now, if you have a property set, what it does is describe an Chado record with a CVterm and a value. That value is free text. Now, you can choose to specify a cvalue_id (cvterm value id) instead of free text.

What does this actually mean?

Imagine I am entering Biological samples into my Chado database, and I want to assign a record property for the tissue the sample was collected from. I would use the CVterm tissue, and a value that says what tissue: leaf, bud, flower, stem, root. Only tissue gets a CVterm: the others are free text! Now I can easily leverage, say, the Plant Trait Ontology, so I can use a Cvterm to set not just tissue by the tissue value!

The Tripal Analysis Expression module currently allows you to assign biosample property types to CVterms. With this change, I look forward to implementing property value CVterm assignment as well.

As we keep on working on v1.4, hopefully I can write more posts on Chado updates and what Tripal developers need to know.