Sunday, October 23, 2011

TDWG 2011: my post-partum rant

Fresh out of TDWG 2011, I feel the urge to unleash the monster in me. I think we had a great meeting overall: lots of new ideas, great symposia, good talks, bad presentations, the usual mix we get at any meeting we go to, but with the added bonus of colleagues from across the pond.

I was heavily invested in the preparation of TDWG 2011. I take full responsibility for everything that went well and (especially) for everything that went badly. I was indeed part of some not-so-great decisions, and in many cases I could have done or said something but chose not to, so I am not removing myself from any equation. As part of the program committee, I learnt about the million constraints (some of them ridiculous) that often enough prevent us from doing the right thing.

The Good: Lightning talks! Having ADD, that is my favorite session of any meeting. As I have said before, for me it is just like watching commercials. I can follow with excitement without losing focus. Not to mention, some of these projects are so imaginative and inspiring. However, the session was forced into Friday morning, the last talk session of the meeting :-( when many had already left or had clearly overslept after the previous night's party. Obviously a bad choice, and you may want to know why. Why did the most creative session, populated by some of the youngest, brightest and most productive people, have to be relegated to the very end?
a) We had a high number of symposia and contributed papers and pretty much every submitted contribution was accepted.
b) The conflicts between giving talks in symposia or sessions and having to participate in parallel interest group meetings were *unbelievable*!!
c) To make b) worse, some people were there from Monday to Wednesday (or Thursday), and all of their activities had to run during those days only. Apparently, everyone had to be accommodated.
d) The damn Wednesday excursion!! That really ticks me off, because of all meetings I go to (and I go to way too many) this is the only one where it is believed that people need a rest after 2 days of heavy labor.  Are you kidding me??!!  I'll drop d) now and pick it up later.

Given all of the above, the only available spot for the most exciting session was Friday morning. And if we look into this a little more carefully, we clearly prioritized the 'established' efforts over the new and up-and-coming ones. Well, if you left TDWG early, I want you to know you missed the best gumbo!

The Bad: The week-long ordeal. Realistically, this meeting could have ended on Thursday, or even earlier if properly trimmed of some of the unnecessary fat (won't go into details). To make matters worse, we have excursions in the middle of the week apparently because:

e) People get tired after 2 days of meetings (well, perhaps of each other) and they need a distraction (which makes me think I am not the only one with ADD).
f) Excursions are conducive to the exchange of ideas and foster collaborations (I am convinced that evening excursions to the bar are far more productive than any trip on any river).

Excursions are fun, but they don't need to be mandatory, and having them in the middle of the week forces people to stay longer even if they have no interest in a quick break/vacation (I am here to work, please!). Excursions need to be arranged immediately before or after the meeting; let's have the meeting days to ourselves to work, because that is what I am using my funds for.

Additionally, we still have to find the best balance between contributed papers and group meetings. It's hard, really hard, but these decisions need to be made by a group of people evaluating all options, and not just by an appointed person. For example, why not give space to talks that 1) provide a significant contribution to standards development or implementation and/or 2) are risky, exciting and promise to impact the current landscape?

The Ugly: The last session in TDWG (aka 'TDWG panel'). It felt more like 'Last Tango in Paris', but badly directed and with a gloomier ending. I am not too sure about the purpose of this session and don't actually recall how it was conceived. With no clear goal or focus, as always happens, people voice their opinions in the absence of any framework. So it is like asking the floor, 'what do you think we need now?' Oh, wait! That was exactly what we were asked! The answer is a classic, of course. We all need money. We all want money. We can't operate without money. And when TDWG had access to a big chunk of money, primarily, as I understand it, to fix the infrastructure... I can't finish this sentence, because I truly don't know what happened. What I do know for sure is that we still don't have a viable infrastructure.

I am always surprised to hear people talking about national and international funding opportunities as if we were entitled to receive them. Yet, when I visit the TDWG home page and click the 'about tdwg' link, I see no reason why we should be entitled to anything. I raised the issue of having a vision because I want to be told why it is better for me to be part of this community. Why do I need to operate within it and not outside? It makes me think of NESCent, for example, and how it fosters not just collaboration but real, tangible growth. At a *VERY* different scale, of course, TDWG could take a similar approach and become the place where young people want to go to be supported, mentored, join existing efforts, contribute and develop their potential to become the driving force of TDWG 20XX. Yet, how many students do we see at TDWG? We couldn't even establish a student paper prize for lack of young bodies.

This is what I see:

a) a totally dysfunctional infrastructure, but maybe that's not even the core problem.
b) an Executive with exceptional individuals, best intentions, and impressive commitment (and I really mean this!!) but inefficient as a group and without a collective voice.
c) Lack of pure, simple imagination.

I certainly don't pretend to have all the answers, but these are a few ideas/observations:

a) Make investments, no matter how small, but make them! We do have some funds (we don't care how much!). Identify those promising ideas that would clearly benefit from a direct injection of funds to move from the proof-of-concept stage to full implementation. For example, pick from the lightning talks list and choose a couple of exciting projects (I would have a hard time picking!). They don't have to be big investments. Support grad students to join an exciting team. Start small, but write a story to tell.

b) Identify an impeding bottleneck (as if we didn't have enough around to choose from) and launch a challenge a la Rod Page, but with a better prize ;-) Students often crave just the opportunity to be part of an exciting team. Offer them that opportunity! Foster their growth by bringing projects and students together. For example, develop a contest, offer a little cash and an opportunity for the best to work with, let's say, the Apple Core or Geomancer teams!

c) Reference implementations are crucial. Limiting TDWG to those projects that only feature TDWG-supported standards (that was unfortunately suggested!) would be foolish. Be inclusive and promote adoption by showing the community why it makes a difference.

d) Do we have a well-documented set of use cases developed within TDWG that support the need for all these standards? How do we keep track of progress besides telling each other what we have been doing in the past 12 months? Most of all, how effectively do we communicate needs, progress and successful implementations to downstream consumers? How do we make them care? There is a clear disconnect between consumers and developers, and funding agencies are very sensitive to that connection. A potential solution would be for each task group to develop and document use cases (at least one!) to justify its existence, not only for our own sake, but for the community's sake. Preliminary data and prototypes should then be used to seek external funds to support the goals of the group. In other words, no use cases, no task groups (aka let's cut some of the crap).

e) It's not just about TDWG seeking funding. What about the people who make up the TDWG community? If we think about it, we are actually pretty damn well funded individually or by project, but that funding doesn't trickle down enough to the task groups per se. Maybe next time we write a proposal, we should rethink TDWG, and stress it, as the venue where at least some of our activities and project development will take place.

f) Promote Open Source! The suggestion was made this year (not by me) to have one (only one!) session featuring only open source projects but that was flatly rejected.  Why?!

g) Mini-hackathons are fun and worth the investment, especially when students are involved. They can be small and inexpensive, with an easy target (as in doable in 2-3 days). Task groups should generate ideas based on well-defined goals (remember those use cases?) and help support these activities as much as they can.

The point I am trying to make is that TDWG is an umbrella organization, and it is true that ideally the injection needs to come mainly from the community. However, I see a community-organization co-dependency here, as in a vicious cycle. The community would be more supportive if TDWG were more relevant, more significant, and made a larger impact. It's about what we can do for each other. It's the 'help me to help you' approach that I think needs more emphasis. Small investments can go a long way and set a different stage, build incentives, plant ideas and foster their growth. I think we can do way better, and we should just stop whining about lack of funds. We do have funds. Many projects that are relevant to TDWG are indeed funded (do I have to give you a list?). Let's just be more creative in how we use them. Let's rework the core TDWG activities and clarify the focus of the interest and task groups. And let's hear TDWG's voice. I am listening.

I am a member of TDWG and stubborn enough to stay one.

Thursday, September 22, 2011

Postdoc in phyloinformatics available

A postdoc position in phyloinformatics is available at the Florida Museum of Natural History, University of Florida, to work on the integration of database developments and analytical workflows. Experience with phylogenetic and workflow software (RAxML, MrBayes, Galaxy, Kepler, etc.) and a background in data management are highly desirable. The successful candidate must have a PhD in biology, computer science, computational biology or a related field. Additionally, Java will most likely be your everyday cup of coffee. The position is available for at least 2 years. Please contact me for additional details.

Wednesday, September 7, 2011

Lab openings

I have openings in my lab to work on the evolution, systematics and biogeography of preferably Campanulaceae and Melastomataceae (but I am open to other groups too).  If you are interested, contact me directly and apply to UF by the 31st of December, 2011.

Saturday, August 27, 2011

Call For Participation: Steps towards a Minimum Information About a Phylogenetic Analysis (MIAPA) Standard

Many phylogenetic analysis results are published in ways that present serious barriers to their reuse in numerous research applications that would stand to benefit from them. While some of these barriers are well understood, such as issues with adherence to standard exchange formats, those centering on the associated metadata necessary for researchers to evaluate or reuse a published phylogeny have only recently begun to be articulated. One of the critical next steps towards formalizing these metadata requirements as a minimum reporting standard is to convene meetings of key stakeholder communities with the goal of identifying the information attributes necessary and desirable for facilitating reuse, and of building consensus on their priority. To this end, we are holding a workshop at the 2011 Biodiversity Information Standards (TDWG) Conference to determine how a future reporting standard for phylogenetic analyses can best serve biodiversity science and related research applications. We invite all interested colleagues to participate.


The workshop of the Biodiversity Information Standards (TDWG) Phylogenetics Standards Interest Group held at the 2010 TDWG conference included a project focused on how to publish re-usable trees that can be linked into an emerging global web of data.  Through follow-up work, this led to the following tangible results:
  1. An online draft report of the 2010 TDWG workshop [1], and a corresponding manuscript on best practices for publishing phylogenetic trees (Stoltzfus et al. in preparation);
  2. A 2011 iEvoBio presentation on “Publishing re-usable phylogenetic trees, in theory and in practice” [2];
  3. A lightning talk presentation and a Birds-of-a-Feather gathering at 2011 iEvoBio; and
  4. A survey group that explored barriers to re-use and developed plans for a survey.
These activities have considerably clarified our understanding of the theory and practice of publishing re-usable phylogenetic trees: how many phylogenies are published each year, the (low) frequency of archiving, what archives and tools are available, what policies are in force, etc.  We have identified a number of barriers to re-use involving such aspects as technology, standards, culture, and access.  
Many of these barriers can be interpreted as a consequence of the lack of a community-agreed standard for what constitutes a well documented phylogenetic record.  In the absence of such a standard, trees are often archived as image files rather than in appropriate data exchange formats, and lack important accompanying information (metadata), such as externally meaningful identifiers, that would be needed to make them useful to others. The idea of a Minimum Information About a Phylogenetic Analysis (MIAPA) standard has been suggested [3], but so far there has not been a deliberate process to develop and disseminate a community standard.  Meanwhile, a number of systematics and evolution journals have begun to require archiving of the data underlying published research findings [4].  The emerging cultural shift in data archiving and sharing promoted by this policy change offers a unique window of opportunity to move ahead with the development and actual specification of a MIAPA standard.
Similar to other minimum reporting standards [5], the primary focus of a future MIAPA standard would be on defining a “checklist” of metadata information attributes that, at a minimum, need to accompany an archived phylogenetic analysis, and the standards to which values for these attributes would need to adhere. The key step in developing community consensus on these elements of the standard is to convene a series of meetings that collectively involve participants from all major groups of stakeholders who would be affected by such a standard, such as users, producers, publishers, or archivists of phylogenetic analyses. To aid this process, the Phylogenetics Standards Interest Group is holding a workshop at the 2011 TDWG conference, with the goal of obtaining consensus requirements and priorities for a MIAPA checklist for the purposes of biodiversity science, taxonomy, museum collections, and related research applications.

Goals and deliverables

The main goal of the workshop is to develop a shared understanding of the role that a MIAPA standard could play in facilitating re-use of phylogenetic analyses for the biodiversity science and related communities, and what the standard would need to specify in order to best fill that role. Possible deliverables include:
  1. A draft set of information attributes that should or could be included in a provisional MIAPA checklist, with a level of consensus for each of them.
  2. A database of use cases based on exemplar publications that report phylogenies to elucidate a broad spectrum of questions relating to biodiversity science.
  3. A refined MIAPA survey to be informed by biodiversity science cases for reuse.
  4. A plan for further community engagement and consensus-building among biodiversity science stakeholders.

Workshop format

The workshop will start with a few presentations focused on (i) introducing MIAPA and its potential in facilitating reuse (J. Leebens-Mack); (ii) summarizing recent developments and current status of MIAPA-related efforts (A. Stoltzfus); and (iii) past experiences and resulting best practice recommendations on developing a minimum reporting checklist standard (D. Field). The rest of the workshop will be hands-on.  Participants in the workshop will break out into groups to address separate issues according to the anticipated deliverables and best practice recommendations.
The workshop will be 1.5 days in duration and will be held during the 2011 Biodiversity Information Standards (TDWG) conference, to take place Oct 17 to 21, 2011 in New Orleans, USA. The workshop will start in the afternoon of Monday, Oct 17, and end on Tuesday, Oct 18.

How to participate

Participation in the workshop is open to everyone interested. However, space is limited, so if you are interested in attending, we ask that you please communicate your interest through the MIAPA discussion group [6]. This will also allow us to include you in pre-workshop planning. Since the workshop is part of the TDWG conference, participants will need to register either for the full conference or for the days of the workshop.
The organizers will provide an electronic venue for participants to share ideas and develop plans in advance of the workshop. After the initial presentations, participants will self-organize into task groups.

Organizers

  1. Nico Cellinese, University of Florida
  2. Hilmar Lapp, NESCent  
  3. Jim Leebens-Mack, University of Georgia
  4. Enrico Pontelli, New Mexico State University
  5. Arlin Stoltzfus, NIST & University of Maryland


[1] Whitacre et al. (2010). Current Best Practices for Publishing Trees Electronically.
[2] O’Meara et al. (2011). Publishing re-usable phylogenetic trees, in theory and practice. Available from Nature Precedings.
[3] Leebens-Mack, J., T. Vision, et al. (2006). "Taking the first steps towards a standard for reporting on phylogenies: Minimum Information About a Phylogenetic Analysis (MIAPA)." Omics 10(2): 231-7.
[4] Whitlock, M., M. McPeek, M. Rausher, L. Rieseberg, and A. Moore (2010). Data Archiving (Editorial). The American Naturalist 175(2): 145.
[5] Taylor, C.F., D. Field, S. Sansone, J. Aerts, R. Apweiler, M. Ashburner, C.A. Ball, et al. (2008). Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nature Biotechnology 26(8): 889-96. doi:10.1038/nbt.1411
[6] MIAPA discussion group:

Thursday, July 21, 2011

Why am I going to TDWG 2011?

Well, I promise you, the location has little to do with it (well, maybe more than just a little). I already envision myself sitting in some cool bar in the French Quarter, eating soul food (drinking a little), listening to great music and talking about, let's see, biological collections digitization, data acquisition, data integration, phylogenetics standards, interoperability, cool new tools (like BiSciCol...because I am not biased :-), etc. etc.

Jokes aside, why do I really want to go to TDWG this year?!
There is so much going on right now, such a renewed interest in collections and digital data. First BiSciCol and then VertNet were funded, iDigBio is in place, and 3 awesome Thematic Collections Networks have also been funded, covering 90+ institutions in 45 US states, in addition to a bunch of other collections being supported by NSF through their regular programs. Can it get much better in these economic times? Some exciting new blogs have been popping up lately, proposing cool ideas and different approaches, and encouraging us to think outside the box. Yes, we've been talking about these topics for a long time, but I get a clear sense that now we can DO things and go well beyond talking about them. We can actually experiment now and scale up our ideas to see if new approaches can be successfully implemented. We finally have the means! We have the attention of our funding agencies, and a few seeds have been planted already (can't stop botanizing!).

I am excited because with this renewed feeling of being able to actually change things, make progress and provide clear input, we can come together as an inclusive community. This is perhaps a first solid opportunity for real cross-fertilization among different groups, like TDWG and SPNHC, BiSciCol with VertNet and other domain-specific networks, and what about getting more biologists involved in the geek world? Not that we haven't been supporting each other before, but I can see this time is different. This time it is not just about the momentum, the charge we all feel when we get together at a meeting and plan ahead. This time we can all be excited about going back home and getting down to work! I am really hopeful that the recent events and investments provide the glue that we have all been needing for a long time. We are in the same boat, and everyone's input is no small contribution.

We have an opportunity to work as a community, think globally and act locally (who said that?! Feels nice right now!). It is not anyone's mission to succeed, it is OUR collective mission. So, I am excited to gather around our common problems and bottlenecks, and to be able to concretely share the load by developing parallel approaches and putting them to work, together. The goal ultimately is to create an environment where we can all do better science, cool science, where posing new challenging questions will be fun because we know we will have the means to answer them. It's going to be a great playground! And that's why I am going!

Friday, June 24, 2011

My post-iEvoBio Meeting emotional outburst

I have just come back from the iEvoBio Meeting in Norman, Oklahoma. This is the first time ever I didn't quite mind being stuck in a place in the middle of nowhere (seriously!) because my fellow prisoners were actually pretty entertaining. This is what I love about iEvoBio. It does NOT bore you and it keeps you awake despite the heavy drinking session of the previous night; and for someone like me, with ADD, the constant flow of information, often delivered in 5-minute slots, is just perfect (it's like watching great Super Bowl-style commercials!). The meeting format offered a little bit for every taste, from more focused presentations to fairly short discussion sessions. What do we get from short discussion sessions? Well, how about a set of quickly vomited ideas, needs and wish-lists from a variety of people with often different backgrounds: seeds to bring home and plant to see whether anything germinates, either in your lab or someone else's. This is what's fun in science! I just love to be there and watch what will happen next, e.g. next year at iEvoBio. We really need iEvoBio to remind us every year of the status quo in evolutionary informatics, what's cooking, what the missing ingredients are, and the limitations and potential of the tools we build, and the beauty is we don't really need a week to catch up. Two intense days (and nights :-) are actually great! Last year in Portland, OR, the meeting was fantastic, but this year I enjoyed it no less! For a great list of the projects and new tools presented, see the Recology blog post. Just a few of my favorites include Map of Life, TreeBASE with an R interface, Phenoscape, Ontogrator, and BirdVis. Of course, we presented our BiSciCol prototype, and our slides are up on SlideShare now.
We didn't win the challenge though, despite me trying to cheat the system by voting from different browsers (I couldn't help it, it was more of an experiment, really ;-) but BiSciCol is still in its infancy stage and we will present a more mature product at the next TDWG meeting in New Orleans, so stay tuned!
Can't wait for the next iEvoBio meeting in Ottawa! In the meantime, I am charged up and super ready to go back to work!

Wednesday, April 13, 2011

Talking about names....

A new angiosperm phylogeny paper is finally out! It clarifies relationships at some of the deeper, messier nodes and reinforces our previous knowledge of some others. The point I want to make here is that the authors use phylogenetic nomenclature to name major clades. They state:

"For higher clades, we consistently use PhyloCode names (see Cantino et al., 2007 ) whenever these are available; these names are always in italics (e.g., Pentapetalae, Mesangiospermae, Rosidae, Fabidae, Malvidae). Note that Rosidae (sensu Cantino et al., 2007 ) does include Vitaceae. Our use of family and ordinal names follows APG III (2009) as a formal point of reference; for Caryophyllales, we follow Cantino et al. (2007; hence, the use of italics), which matches the APG III circumscription. For additional recent discussion on families and their status, see the Angiosperm Phylogeny Website (Stevens, 2001 onward). We recognize that some broader family circumscriptions favored in APG III are controversial and can obscure underlying diversity (e.g., Passifloraceae s.l.), which would be evident with narrower circumscriptions."

I hear left and right how the PhyloCode is going to make a mess of biology by changing all names and reshaping the classification (impossible, as it is just a nomenclatural tool). Well, here it is, another shocking paper that imposes so many new names on the community. The way I see it, we now have some pretty well refined concepts attached to names, the same names that have previously been used idiosyncratically. Names that won't change even if the clade content does change. Names that will always refer to the same ancestors. Names that we could actually query and be happy with what we retrieve. That would indeed be a breath of fresh air.

Tuesday, March 8, 2011

PPR gene family

While generating more GapC and NIA sequences, we have now successfully amplified three PPR alleles, and from preliminary runs they also seem to be very informative at the species level in Campanulaceae. To make everything even more appealing, cloning is not necessary. Quicker and cheaper results. What's not to love about the nuclear genome! So far, so still excellent!

Monday, February 7, 2011

Low-copy genes

How can something low make you feel so high?! Our first test run with low-copy gene sequences gave excellent (better than expected!) results. A well-resolved topology with great support! Can't wait to show some results. We are now working on a paper that focuses on alpine taxa, and we will show the first Campanuloideae tree generated with low-copy nuclear genes.

Oxford Journals, please take my money!!

How hard can it be to renew a journal subscription? Well, looks like Oxford Journals is gifting me with a pain in the rear. I have been trying to give them my money since the end of 2010. The renewal site turns down all my credit cards. Obviously, I had to call in. I spoke to a lady, gave her my membership number and credit card details and hung up happy. She also confirmed that their online system is not performing very well. Guess what? That didn't work either. More than a month later, I received a renewal notice by mail. I tried to call the North Carolina office multiple times with no luck, so I am now filling out a form by hand. I have spent enough time on this! Please, take my money!!