Original article with photos by Jan Wiese-Fales
Soybeans are big business in Missouri, which ranks among the nation’s top five soybean-producing states. In 2014, the tiny beans represented a $4.5 billion industry for the Show-Me State.
The University of Missouri expanded its investment in soybean research in 1999 by hiring four faculty members with expertise in breeding, genetics, genomics and economics to fill endowed chair positions funded by the Missouri Soybean Merchandising Council (MSMC) with assets that came directly from the state’s soybean growers.
A national “checkoff” program initiated by U.S. soybean farmers in 1990 collects one-half percent of the market price per bushel from every soybean grower when crops are first sold. Half of the proceeds remain in state with the MSMC, and the other half is forwarded to the United Soybean Board (USB), where it is managed by volunteer farmer-directors from around the nation.
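The checkoff arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sale price ($10.00 per bushel) and volume (1,000 bushels) are hypothetical figures chosen for round numbers, and the function name `checkoff_split` is invented for this example rather than taken from any checkoff program.

```python
from decimal import Decimal

# Rates taken from the description above.
CHECKOFF_RATE = Decimal("0.005")  # one-half percent of the market price
STATE_SHARE = Decimal("0.5")      # half of the proceeds remain in state


def checkoff_split(price_per_bushel, bushels):
    """Return (total assessment, state share, national share) for a first sale."""
    sale_value = Decimal(str(price_per_bushel)) * Decimal(str(bushels))
    assessment = sale_value * CHECKOFF_RATE
    state = assessment * STATE_SHARE
    national = assessment - state  # the other half goes to the national board
    return assessment, state, national


# Hypothetical sale: 1,000 bushels at $10.00 per bushel (illustrative only).
assessment, state, national = checkoff_split("10.00", 1000)
print(f"assessment=${assessment:.2f} state=${state:.2f} national=${national:.2f}")
```

On a sale of that size, the grower would contribute $50, with $25 staying in state and $25 going to the national board.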
Two of the MU hires, Endowed Professor of Genetics and Biotechnology Henry Nguyen and Gary Stacey, a Curators Professor in the Division of Plant Sciences, were named director and associate director, respectively, of the university’s National Center for Soybean Biotechnology, launched in 2004. One of the center’s first orders of business was to sequence the soybean genome.
“Plant science cannot be just biology anymore. You have to have the informatics,” Nguyen explained. “We had no bioinformatics capabilities, and [a campus taskforce] spent a year meeting about it. In the process, we recruited Dong Xu, who was familiar with the landscape.”
Xu, now the James C. Dowell Professor and chair of engineering’s Computer Science Department, came to MU with a background in protein structure prediction and high-throughput biological data analyses. He was accustomed to working with large data sets and biological modeling.
Trupti Joshi, an assistant computer science research professor, soon joined Xu’s research group to further support the center’s informatics goals. With a background in both biology and computer science, Joshi had a 180-degree view of plant biotechnology.
Joshi has had a passion for programming since 2000, when bioinformatics was a new field. “It was a rapidly evolving field with cutting-edge ideas. I found it so satisfying.”
In 2010, all four were members of an extended international team that successfully mapped the soybean genome, a huge undertaking that generated a large amount of valuable genetic data. Two developments made it possible to share that data with the larger science and agricultural communities: the National Science Foundation’s 2008 establishment of the iPlant Collaborative, cyberinfrastructure that anticipated the need for the supercomputing capabilities necessary for bioinformatic research, and the Obama Administration’s $200 million Big Data Initiative.
Xu and Joshi worked together initially to generate ideas for what they named the Soybean Knowledge Base (SoyKB). SoyKB became the dissertation project of Joshi, its lead designer and developer, and she has spearheaded the effort to build it into a comprehensive web resource featuring tools that provide integrated access to the original genomic data and to a growing body of published genetic data that MU researchers and others all over the world are generating.
“We have more than 400 registered users, including domestic and international, as well as [users from] academia and industry, who access both public and private data in SoyKB,” said Joshi. “Users can access the public data without any registration and login. We have more than 1,000 users who regularly visit SoyKB every month.”
According to Joshi, one of the problems with contemporary soybean varieties is that selective breeding has reduced their natural variability, and one of the project’s goals is to bring in data from regions with natural variations. The ongoing, USB-supported project to identify variations in 500-plus soybean lines through the academic and industry partnership among MU, Dow AgroSciences, Monsanto and Bayer will generate extremely valuable datasets in this regard.
“Users can utilize their own data, which is one of the biggest benefits,” said Joshi of SoyKB’s architecture, which includes database, genome browser, web interface and data integration modules. “They can look at visualizations of how gene expression might be different in different situations and are able to take all of the datasets and go in and make decisions by looking at regions of interest to design a crop computationally. It cuts down on the [traditional crossbreeding] time to make the combinations.”
Nguyen identified genetic traits of key interest to soybean researchers as drought tolerance, stress tolerance, increased yield and disease resistance, adding that those with interest in soy as biofuel would like to increase soybean oil content. Soybeans are also being examined for their potential role in the prevention of cancer and other human diseases. Scientists working with food grade soybeans would like to increase protein content and cooking oil quality, which is one of Nguyen’s interests.
“Researchers are able to use SoyKB to look at the gene structure and mutations to bring the right genetic combination of soybeans together,” Nguyen said. “Currently, soybean oil is 25 percent oleic acid, its most healthy component. We are working to make it 80 percent. It’s a game changer. Once in a while you get a breakthrough, and this is a breakthrough.
“Very soon, consumers in the U.S. can buy soybean oil that is healthier. It will revolutionize the use of oil in this country and the world,” he predicted.
Nguyen said the biotech research he is doing can be described as molecular breeding — using gene structure variation by changing a chemical base in a nucleotide sequence. Computational biology gives access to this marker-assisted selection, or breeding of natural variation traits, creating “better” genetics, not foreign genetics.
“Biotech [also] includes recombinant DNA technology [creating sequences that would not otherwise be found in biological organisms], which results in GMO crops, but not all biotech [results] are GMOs,” he explained in order to differentiate the work he is doing from genetic modification, which is viewed by some consumers as a controversial practice.
Joshi said their work has been supported by efforts from other computer scientists on the MU campus to expand the university’s cyberinfrastructure. National Science Foundation grants awarded to computer science Assistant Professor Prasad Calyam and Electrical and Computer Engineering Department Chair Chi-Ren Shyu were used to build secure hybrid cloud networks, which support the high-volume data movement necessary for scientific research.
“It’s really exciting,” Joshi said. “It will give MU a high profile and gives us an opportunity to really make a difference. [At MU] there is the opportunity to work with so many groups. It’s interesting and rewarding.”
SoyKB also serves as an excellent tool and resource for training the next generation of plant breeders, computational biologists and computer scientists (see below).
Joshi recently transitioned to the position of director of translational bioinformatics with the MU School of Medicine and will serve as assistant research professor in the Department of Molecular Microbiology and Immunology, but development of SoyKB will continue under her direction, through her many collaborations as a core faculty member of the MU Informatics Institute (MUII) and the Interdisciplinary Plant Group (IPG).
“There is no way we could do this without the partnership between plant sciences and computer science,” Nguyen said of the collaboration. “SoyKB is a good bridging between CAFNR (College of Agriculture, Forestry and Natural Resources) and the College of Engineering.
“We have outstanding scientific recognition. It represents a lot of passion on the part of a lot of people.”
FEATURE | AUGUST 13, 2014 | BY AMBER HARMON
Even though next-generation sequencing (NGS), with millions or billions of DNA nucleotides sequenced in parallel, is much less costly than first-generation sequencing, it remains too expensive for many labs. NGS platform start-up costs can easily surpass hundreds of thousands of dollars, and individual sequencing reactions can cost thousands per genome.
Garnering accurate information from the data can be time-consuming and can require special knowledge of bioinformatics. Even so, this high-throughput computational analysis is the backbone of novel discoveries in the life sciences, as well as in other domains including anthropology, social sciences, and plant sciences.
“Using next-generation sequencing you’re getting a snapshot of everything that is happening in a given genome up to that point,” says Trupti Joshi, assistant research professor in computer science and core faculty at the Informatics Institute at the University of Missouri (MU), Columbia, US.
Joshi manages SoyKB (Soybean Knowledge Base), a free online data resource infrastructure that was developed as part of the Obama administration’s $200 million Big Data Research and Development Initiative. Joshi’s team is working with the iPlant Collaborative and XSEDE (Extreme Science and Engineering Discovery Environment) teams to integrate SoyKB data resources and analysis tools.
In addition to integrating SoyKB, which already includes many built-in informatics tools, with existing iPlant tools, the MU team is developing additional toolsets that will also be available to the iPlant community. “Right now we are building the infrastructure so that we can submit jobs (RNA-seq analysis is just one example) to iPlant Atmosphere.” Joshi says three to four different analysis capabilities will be available in a couple of months.
SoyKB includes the tens of thousands of genes in the soybean genome, experimental data related to gene expression, fast-neutron mutation data, and GWAS (genome-wide association study) data for soybean lines. SoyKB is unique in that it includes ‘multi-omics’ experimental data that might otherwise be deemed irrelevant, and thrown out, by a particular researcher at a particular time. By making all research data available, experiments take on an increasingly important role in the bigger picture, and enable future researchers to narrow their own results.
Researchers may want to look at soybeans that have a high oil content, for example, or a high protein content. Or, they may want to focus on soybean lines that are more drought-, disease-, or insect-resistant. Scientists can access data on particular genomic variations directly in SoyKB, using tools to quickly query and isolate items of interest.
“One of the biggest advantages here is that iPlant is an integrated environment,” says Mats Rynge, who is part of XSEDE’s Extended Collaborative Support Service Workflow Community Applications team. “The iPlant team clearly understands the science and can tailor their services and setup to a biologist.”
More than 19,000 users take part in the iPlant Collaborative, and about 2,500 of them use Atmosphere, iPlant’s cloud service that is fully integrated with user management and the Data Store (570 terabytes). “Atmosphere is one of the nicest academic cloud implementations available,” says Rynge. “I would say it is on par with Amazon in terms of user interface; really well done.”
Rynge is developing a SoyKB submit infrastructure and Pegasus workflows for scientists to pull data from the data store, analyze it, and deposit the results back in the data store — all with the click of a button. The ultimate goal is to make the workflows general enough to be mapped to other infrastructures, which future sequencing groups can use as a starting point.
As NGS techniques continue to amass more data than labs and researchers can handle on their own, high-performance computing and infrastructures capable of presenting, analyzing, and storing data will remain critical resources for complex bioinformatics analysis. After all, with 50,000 to 70,000 genes in a single soybean, looking at thousands of soybean genomes can produce several gigabytes of data for each soybean line.
The progress of SoyKB as part of the Big Data Initiative was presented at the IEEE International Conference on Bioinformatics and Biomedicine, December 2013, in Shanghai, China. The US National Science Foundation funds the ongoing project.
Yaya Cui, an investigator in plant sciences at the Bond Life Sciences Center, examines data on fast neutron soybean mutants that are represented in the SoyKB database.
The most puzzling scientific mysteries may be solved on the same kind of machine on which you’re likely reading this sentence.
In the era of “Big Data,” many significant scientific discoveries (the development of new drugs to fight diseases, agricultural breeding strategies to solve world hunger and figuring out why the world exists) are being made without ever setting foot in a lab.
Developed by researchers at the Bond Life Sciences Center, SoyKB.org allows international researchers, scientists and farmers to chart the unknown territory of soybean genomics together — sometimes continents away from one another — through that data.
Digital solutions to real-world questions
As part of the Obama Administration’s $200 million “Big Data” Initiative, SoyKB (Soy Knowledge Base) was born.
The digital infrastructure changes the way researchers conduct their experiments dramatically, according to plant scientists like Gary Stacey, Bond LSC researcher, professor of soybean biotechnology and professor of plant sciences and biochemistry.
“It’s very powerful,” Stacey said. “Humans can only look at so many lines in an Excel spreadsheet, then it just kind of blurs. So we need these kinds of tools to be able to deal with this high-throughput data.”
The website, managed by Trupti Joshi, an assistant research professor in computer science at MU’s College of Engineering, enables researchers to develop important scientific questions and theories.
“There are people that, during their entire career, don’t do any bench work or wet science; they just look at the data,” Stacey said.
The Gene Pathway Viewer available on SoyKB shows different signaling pathways and points to the function of specific genes so that researchers can develop improvements for badly performing soybean lines.
“It’s much easier to grasp this whole data and narrow it down to basically what you want to focus on,” Joshi said.
A 3D protein modeling tool lends itself especially to drug design. A pharmaceutical company could test a hypothesis formulated solely by data analysis, and in some situations the proposed drug turns out to yield the expected results.
The Big Data initiative drives a blending of “wet science” (conducting experiments in the lab and gathering original data) and “dry science” (using computational methods).
A testament to the times?
“Oh, absolutely,” Joshi said.
Collaboration between the “wet” and “dry” sciences
Before SoyKB, data from numerous experiments would be gathered and then discarded, with only the desired results analyzed. The website makes it easy to deposit all of the data gathered so that other researchers can repurpose it.
“With these kinds of databases now, all the data is put there so something that’s not valuable to me may be valuable to somebody else,” Stacey said.
Joshi said infrastructure like SoyKB is becoming more necessary in all realms of scientific discovery.
“(SoyKB) has turned out to be a very good public resource for the soybean community to cross reference that and check the details of their findings,” she said.
Computer science spares researchers from having to reinvent the wheel with their own digital platforms. SoyKB has a translational infrastructure with computational methods and tools that can be used for many disciplines like health sciences, animal sciences, physics and genetic research.
“I think there’s more and more need for these types of collaborations,” Joshi said. “It can be really difficult for biologists to handle the large scope of data by themselves and you really don’t want to spend time just dealing with files. You want to focus more on the biology, so these types of collaborations work really well.
“It’s a win-win situation for everyone,” she said.
The success of SoyKB was perhaps catalyzed by Joshi. She adopted the website and the compilation of data in its infant stages as her PhD dissertation.
Joshi is unique because she has both a biology degree and a computer science background. Stacey said Joshi, who has “had a foot in each camp,” serves as an irreplaceable translator.
Most recently, the progress of SoyKB as part of the Big Data Initiative was presented at the International Conference on Bioinformatics and Biomedicine in December 2013 in Shanghai. The ongoing project is funded by NSF grants.
Original Story at http://decodingscience.missouri.edu/
Yangrong Cao, Kiwamu Tanaka, Cuong T Nguyen, Gary Stacey
Abstract
By Kerry Grens | April 1, 2014