Translating values over time/space: A continuum approach to reading records and data [Wed. April 27th]

The professional work of data and records creation occurs within a specific context bounded by a number of factors, and these factors shape the form and other aspects of the data. This discussion will address translating and reading co-created data from a particular community of practice, and then turn to a broader conversation about evaluating the context of records.


Heather Soyka
Postdoctoral Research Fellow at DataONE
Santa Barbara, CA 93101

An Overview of DataONE: Services, resources and future activities [Wed. April 13]

The Data Observation Network for Earth (DataONE) is an NSF-supported DataNet project which is developing a distributed framework and sustainable cyberinfrastructure to meet the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Now in its seventh year of funding, DataONE has released a number of tools, services, and programs that support users in their data management, discovery, preservation, and education needs. This overview will provide a brief history of DataONE and its guiding principles, and showcase the tools and services available to the community. I will also summarize the education and outreach activities of the project and the opportunities for community participation.


Amber Budden
Director for Community Engagement and Outreach


A primer on natural history collection digitization and data sharing

Natural history collections contain both historical and contemporary
information about the ecology of our natural and urban areas. The
research and instructive potential of these data are rapidly becoming
more relevant as more and more collections become digitized.

I managed the digitization of over 3 million plant and insect
specimens for the National Science Foundation Tri-Trophic Thematic
Collection Network project from 2011 until 2015. The focus of this
high-throughput digitization effort was on the hemipteran herbivorous
insects (aphids, scales, hoppers, cicadas, and true bugs), their host
plants, and related parasitoids.  At this NCEAS roundtable, I plan to
present a review of contemporary standards in natural history
collection digitization, highlight some of the exciting derivative
research, and outline many of the ongoing challenges that natural
history collection digitization still faces.

Katja Seltmann, PhD
Katherine Esau Director / Entomology Curator
Cheadle Center for Biodiversity & Ecological Restoration (CCBER)


Synthetic ecology across scales: a follow-up discussion on hurdles to synthesis

This will be a follow-up to a round-table last July on hurdles to synthesis. Expect an informal discussion on the process of data synthesis, based on a poster presented by the GoA group at the CERF meeting last November in Portland, OR. A list of questions for discussion will be posted before the round-table on Wed, Jan 6th.

Here’s a link to the full poster (pdf): CERF 2015_Poster_Large

Large-scale ecological syntheses are increasingly important to understanding patterns, processes, and effects at an ecosystem scale.  However, conducting such syntheses requires large amounts of data, which are frequently categorized as either large data (large-scale, designed to identify broad patterns rather than mechanisms, often involving many investigators or organizations) or small data (intensive, designed to identify mechanisms, often involving one or a few investigators).  We explored a case in which we integrated large and small data to examine questions across spatial and temporal scales in the Gulf of Alaska, focusing on the impacts of the Exxon Valdez oil spill.  For this discussion, however, we will focus on the process of synthesizing disparate datasets rather than on the data themselves.

Key to integrating data for synthetic analyses is the availability of informative documentation of the data.  We used Ecological Metadata Language (EML), online code sharing (GitHub), and an online data repository (DataONE) to document the data we used and to make these analyses more transparent.  Some of the hurdles we encountered included a wide variety of poorly documented data formats and research fragmented across space and time.  Potential solutions include standardizing data formatting and storage across organizations and better integrating the research efforts of large organizations (government agencies, academia, etc.).  We hope to foster a discussion about these hurdles and potential solutions for synthesizing ecological data across scales.
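As a minimal sketch of the formatting-standardization hurdle described above (the dataset names, columns, and units below are invented for illustration, not the actual Gulf of Alaska data), harmonizing two differently structured sources into a shared schema before combining them might look like:

```python
import pandas as pd

# Hypothetical "large data" source: agency survey with its own column names.
large_survey = pd.DataFrame({
    "SITE_ID": ["PWS-01", "PWS-02"],
    "YR": [1990, 1991],
    "BIOMASS_KG_HA": [120.5, 98.2],
})

# Hypothetical "small data" source: single-investigator study, different
# column names and different units (assume g/m^2 here).
small_study = pd.DataFrame({
    "station": ["PWS-01"],
    "year": [1990],
    "biomass": [115.0],
})

COMMON = ["site", "year", "biomass_kg_ha"]

def harmonize_large(df):
    # Map source-specific column names onto the shared schema.
    out = df.rename(columns={"SITE_ID": "site", "YR": "year",
                             "BIOMASS_KG_HA": "biomass_kg_ha"})
    return out[COMMON]

def harmonize_small(df):
    out = df.rename(columns={"station": "site"})
    # Convert units: 1 g/m^2 equals 10 kg/ha.
    out["biomass_kg_ha"] = out["biomass"] * 10
    return out[COMMON]

combined = pd.concat([harmonize_large(large_survey),
                      harmonize_small(small_study)], ignore_index=True)
```

The per-source harmonization functions are where the metadata (e.g., EML records of units and column meanings) pays off: without documented units, the conversion step would be guesswork.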

Rachael Blake, NCEAS Post Doc
Jessica Couture, NCEAS Research Associate
Colette Ward, NCEAS Post Doc

When big data and complex workflows meet the reality of finite data storage: a discussion of best practices for data management

"tapes, backup" CC-BY-SA 2.0 by Martin Abblegen via flickr

“tapes, backup” CC-BY-SA 2.0 by Martin Abblegen via flickr

Scientific workflows — for many of us, it’s a love/hate relationship. We love the fact that they help us keep our stuff organized, but hate the overhead required to maintain them. And then when we find out that our meticulously maintained workflow hasn’t captured some important detail? Oh the frustration!!

This discussion will be broadly about managing scientific workflows. I hope to hear from everyone about the tools and tricks you use to keep track of which outputs match which inputs, models, and parameters of an analysis, and which figures, papers, and projects all of those things are connected to. It would be great to hear about a wide range of strategies, from how you organize and name your files to how you’ve implemented a workflow management tool like Kepler.

I also hope that we can spin up ideas for workflow management problems people may be facing, so if you have a workflow-related issue or question that you’d like to get input on, please let me know. I’ll make sure you get a few minutes to describe your problem or question so that you can get ideas from the crowd.

And if you’re reading this and thinking “I’m a workflow management pro and don’t need any help with or ideas for managing my workflow,” then please come to the discussion anyway! We (well, at least I) need your help. I have a homegrown scripted workflow management system for the text analyses I do. It does a great job of capturing details and documenting relationships between inputs and outputs, but it requires me to manually purge unused outputs (e.g., outputs for all but selected runs of a model). How do the rest of you keep track of which files you can throw away down the line and which need to be kept indefinitely? I need to downsize my data storage and am a little worried about making mistakes when I do this manually, so I would love to hear ideas about how to build this kind of function into my system.
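One hedged sketch of that purge problem (the manifest layout and function names are invented, not the author’s actual system): have each analysis run record its outputs in a manifest, flag the selected runs as “keep,” and let a purge function do a dry run before anything is deleted.

```python
import json
from pathlib import Path

def record_run(manifest_path, run_id, outputs, keep=False):
    """Append a run's output files to a JSON manifest, flagging keepers."""
    manifest = (json.loads(manifest_path.read_text())
                if manifest_path.exists() else {})
    manifest[run_id] = {"outputs": outputs, "keep": keep}
    manifest_path.write_text(json.dumps(manifest, indent=2))

def purge_unkept(manifest_path, dry_run=True):
    """List (and, if dry_run=False, delete) outputs of runs not flagged keep."""
    manifest = json.loads(manifest_path.read_text())
    doomed = [f for run in manifest.values()
              if not run["keep"] for f in run["outputs"]]
    if not dry_run:
        for f in doomed:
            Path(f).unlink(missing_ok=True)  # requires Python 3.8+
    return doomed
```

The dry-run default is the point: you review the list of files slated for deletion before committing, which addresses the worry about making mistakes when purging by hand.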

Hope to see you all for a fun discussion!