Open Access Month – OA to Spur Innovation

October is Open Access Month. Throughout the month, guest contributors will present their perspectives on the value of open access to research, scholarship and innovation at The University of Texas at Austin.

This installment provided by Dr. Maryjka B. Blaszczyk, Postdoctoral Research Associate, Department of Anthropology.

A need for open access to research materials to spur new discoveries in biological anthropology

Dr. Maryjka B. Blaszczyk.

A major aim of research in biological anthropology is to understand how humans have ended up looking and behaving the way that they do. To understand the evolution of our body form, anthropologists look at fossils. Behavior, however, does not fossilize, and so we turn to studying our closest living relatives, the nonhuman primates, preferably in their natural habitats where they have to deal with selective pressures such as avoiding predators and finding enough food to eat. Primate behavior field data are hard-won, involving substantial investments of time and resources. Apart from jumping through logistical hoops such as obtaining permits and building relationships with local stakeholders in far-flung locales, establishing a new field site for behavioral fieldwork involves months if not years of patiently following wild primates around to habituate them to researchers’ presence. Once habituated, data collection begins, with blood, sweat, and tears invariably spilt as one accumulates precious hours of detailed behavioral observations on this group of primates at this place and particular time.

These investments are one reason field primatologists give for closely guarding their data. Another is the unique insight they have into the lives of their study animals, having spent hours upon hours observing them. Some primatologists argue that researchers unfamiliar with their study site and animals may misuse the data if it were made widely available, subjecting it to improper analyses or failing to account for information about the study site or animals that is known only to researchers who have worked there. Researchers also generally have many ideas for secondary analyses of their data that they plan to get to in the future.

None of these arguments is specific to primate behavioral ecology; very similar arguments have been made, for example, by medical researchers working with clinical trial data. Of course, clinical trial data has a substantially higher status (given its applications for human health and welfare) than primate behavior data, and arguments about the costs and benefits of trial data sharing have been ongoing in high-profile forums for several years. Data sharing advocates point to benefits such as new discoveries, better meta-analyses, and correction or confirmation of findings in the scientific record, which they argue far outweigh potential risks such as incorrect analyses or data misuse. We all know researchers who have been sitting on data for years (even decades) with plans for secondary analyses, many of which they will never find the time to conduct and publish. In the case of primate field data collected on a specific population at a specific place and point in time – and frequently on endangered primates living in rapidly changing habitats – these data cannot be reproduced, so it is a double shame that they may never make it into the scientific record.

Primate behavioral ecologists are included in Anthropology departments because comparative studies of primate behavior illuminate the ways in which humans differ from and are similar to our closest kin, allowing us to better understand the evolutionary ecology of our lineage. However, many comparative studies are hampered by poor descriptions of how data in primate field studies were collected and processed, and many large-scale comparative studies cannot be undertaken unless the raw data itself is made available. Behavioral ecologists should take a page out of their molecular primatology colleagues’ playbook, where publication of genetic data alongside scientific articles is the rule. This type of data sharing has enabled large-scale comparative phylogenetic studies that have given us a rich understanding of primate evolution. It is time for primate behavioral ecologists to catch up and to make sharing of data, as well as the associated behavioral and ecological data collection protocols, the norm. Who knows what insights await us?

Open Access Month – Open the Data

This installment provided by Spencer J. Fox (ORCID ID: 0000-0003-1969-3778), PhD candidate focusing on computational epidemiology.

Spencer J. Fox.

Three years ago, I was choosing the next research direction for my PhD. I was interested in two subjects and had found a journal article in each to build upon. I thought to follow the computational biologist’s path of least resistance: pursue the paper whose results I could reproduce first, as that represents an important first step. One of the papers had published a repository with all of their data alongside working code for analyzing it, while the other had simply stated: “Data available upon request” with no reference to code used for the analyses.

Being a naive graduate student, I politely reached out to the authors of the second study to obtain their data and inquire about their code. In return, I received a scathing email filled with broken links to old websites, excuses about proprietary data, and admonishment for having asked for “their” code: “any competent researcher in the field could replicate our analysis from the information within the manuscript.” I was stunned.

While expressing my frustration to my peers, I found that their requests had been met with similar hostility and condescension from scientists in their respective fields. When data or code had been provided – usually after months of negotiations – cooperation came with heavy stipulations on article authorship, time-stamped embargoes, or permissible analyses. Clearly, it’s not enough to rely on researchers to act in good faith.

The unfortunate truth is that the onus falls on journals to enact real change. Many major journals now require that raw data be deposited in permanent online repositories like Dryad [1]. This has improved data sharing, but it is only half the battle and provides only the appearance of reproducible research. I have spent weeks reproducing someone’s analysis using their provided data and code. It would have been impossible without both. Simply put, freely available code – even if messy and difficult to follow – provides an invaluable foundation for future researchers to build upon, and all journals should require that both analysis code and data accompany a manuscript.

So many conscious and subconscious coding decisions are made over the course of a project that even minor decisions early on present serious stumbling blocks for researchers trying to reproduce results. Differences in mundane behaviors between programming languages, versions, library functions, and self-written pipelines can have drastic implications for end results. A great example of this is the inadvertent gene name errors, attributed to Microsoft Excel use, found in roughly one fifth of genomics papers with supplementary Excel gene lists [2].

Finally, while ultimately it is the researcher’s responsibility to provide code alongside a manuscript, there are tangible incentives for doing so: citations. Open access manuscripts and those that provide their data receive more citations [3,4], and the same likely applies to providing analysis code. After debating between those articles three years ago, I alone have cited the reproducible paper in two separate publications. How many other potential citations are lost “upon request”?


Citations

  1. http://datadryad.org/
  2. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
  3. https://elifesciences.org/articles/16800
  4. https://peerj.com/articles/175/

Open Access Month – Open Educational Resources in Biology

This installment provided by Sata Sathasivan, Senior Lecturer, Biology Instructional Office.

K. Sata Sathasivan.

I have been using open educational resources (OER) in biology as supplemental instructional sources for many years. These include animations, videos, simulations and public databases of DNA and protein. These resources are constantly evolving, and they complement teaching well at any level.

Recently, I successfully started using a biology textbook published by OpenStax, based at Rice University, for my introductory biology classes. While a publisher’s popular textbook may cost students up to $250 each semester, OpenStax textbooks are free to download as PDFs and have a nominal cost ($40) for printed versions. Several students liked this free textbook, and I received only a few complaints that it explained a particular concept inadequately. Overall, it was well received by the students, who found it very helpful.

This free textbook can be supplemented with other open educational resources that can be found online on sites such as https://www.oercommons.org, and if you want to explore more OER sites, check this site.

The only concern that I have about OERs is the time it takes to check them for quality and consistency with your teaching, and the time involved in building the structure needed to integrate them seamlessly into the course.