BTF:Biotechnology Forums
The purpose of the Thursday Bioinformatics Technology Forum (BTF) meeting series at the Genome Center is to provide a campus-wide venue to show and tell how bioinformatics tools and related information technology actually work. You are also welcome to present practical problems and ask for bioinformatics help. At meetings, presenters are encouraged to give live demonstrations as well as brief introductions to their work or problems. All talks are informal (although introductory slides are often helpful), and active interaction is expected. BTF is run by a committee consisting of Kyoungmi Kim, Jennifer Lee, Dawei Lin, and Kristian Stevens. If you want to talk at BTF, please send an email to hslin@ucdavis.edu to schedule your talk. BTF meetings are usually held from 11:00am to 12:00pm on Thursdays in room 4202, GBSF.
Upcoming Meetings
Jan 31, 2008, Matthew Lange, Genomics in a knowledge managed environment: many contribute, a few curate, and everyone gains.
Abstract:
The International Milk Genomics Consortium Research Portal hosted at UC Davis has been designed from the ground up as a system that is simultaneously easy for naïve users to browse and get results immediately, yet powerful enough to answer both detailed and abstract questions about the biology of milk and its health-conferring properties. In addition to being a knowledge resource, it makes it easy for users at large to contribute to the knowledge curation process. Its intuitive graphical user interfaces leverage users' familiarity with popular software products and services such as Google, Wikipedia, Microsoft Word, and EndNote. Users will also find familiar websites embedded inside the IMGC portal, such as NCBI, OMIM, the UCSC Genome Browser, iHOP, KEGG, and many more. Because the contents of these sites are available to the IMGC Portal Catalog, users have a "one-stop shopping environment" for querying the most relevant and current information known about lactation and milk genes, their products and pathways, and the health benefits they confer. In addition to querying the portal's catalog of known information, users can add to the repository of knowledge about specific genes by contributing lactation-specific annotations, bibliographic references, or any other reference materials that exist in electronic form.
Break for Holiday Season
Past Meetings
Jan 10, 2008, Ann Stapleton, Department of Biology and Marine Biology, University of North Carolina Wilmington.
- Title: 'Cyberinfrastructure: Moving from Multidisciplinary to Collaborative Projects'
- Ann will talk about what her group has learned from the Gridnexus cyberinfrastructure project in North Carolina and how the plant cyberinfrastructure project will support computational thinking.
- http://www.gridnexus.org/
- Ann is willing to meet with individuals interested in her research, or to set up a group meeting to talk about being a faculty member at a smaller university. Please contact Ann directly if you are interested in meeting with her (stapletona (at) uncw.edu).
Dec 6, 2007 Jill Wegrzyn - PineSAP (rescheduled from Nov 8)
Pine SNP discovery and Sequencing Alignment Pipeline. We have developed a custom sequence alignment pipeline that uses phredPhrap, ProbConsRNA, and custom Perl scripts to align resequencing data. We have also developed a machine learning algorithm that uses the output of PolyPhred and PolyBayes to call SNPs in conifers more accurately. We have also tested the pipeline on other species, such as poplar.
Dec 13, 2007 James Purdy, Professor and Vice Chairman, Department of Radiation Oncology
Chief, Physics Section, UC Davis Medical Center
Advanced Technology Quality Assurance Consortium (ATC): Information Management for Multi-Center Radiation Oncology Clinical Trials Utilizing Advanced Technology
The NCI-sponsored Advanced Technology QA Consortium (ATC), which consists of the Image-Guided Therapy QA Center (ITC), the Radiation Therapy Oncology Group (RTOG), the Radiological Physics Center (RPC), and the Quality Assurance Review Center (QARC), has pioneered the development of an informatics infrastructure and QA methodology for advanced technology clinical trials that require volumetric digital submission of a protocol patient's treatment planning and verification data. In particular, the ITC has nearly 15 years of experience in facilitating QA review for RTOG advanced technology clinical trials. This QA process includes (1) data integrity review for completeness of protocol-required elements, data format, and possible data corruption; (2) recalculation of Dose Volume Histograms (DVHs); (3) review of target volume and organ-at-risk contour compliance by the study chair using the ITC's web-based Remote Review Tool (RRT); and (4) review of dose prescription and dose heterogeneity compliance by the RTOG HQ Dosimetry Group using RRT. To date, the ATC has successfully supported 20 RTOG Phase I-III clinical trials. Over 500 institutions have been credentialed to submit volumetric digital data to the ITC, and over 4500 volumetric digital data sets have been submitted. This presentation will review the ATC's experience in managing these data, including data submission, informatics infrastructure needs, QA review of data from remote locations, use of new advanced treatment modalities, and specific problems encountered by institutions attempting to participate in trials requiring volumetric digital data submission.
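Step (2) above, recalculating DVHs, can be illustrated with a minimal sketch of a cumulative DVH: for each dose threshold, the fraction of a structure's volume that receives at least that dose. It assumes equal-volume voxels whose doses have already been extracted from the treatment plan; the function name and inputs are hypothetical, and this is not the ITC's implementation.

```python
def cumulative_dvh(voxel_doses, dose_thresholds):
    """Cumulative DVH for one structure (hypothetical helper).

    Assumes every voxel in the structure has the same volume, so the
    volume fraction at a threshold is simply a fraction of voxels.
    Returns, for each threshold, the fraction of voxels with dose >= threshold.
    """
    n = len(voxel_doses)
    if n == 0:
        raise ValueError("structure contains no voxels")
    return [sum(1 for d in voxel_doses if d >= t) / n for t in dose_thresholds]
```

For example, `cumulative_dvh([10, 20, 30, 40], [0, 15, 35, 50])` gives `[1.0, 0.75, 0.25, 0.0]`: all of the structure receives at least 0 Gy, three quarters receives at least 15 Gy, and so on. Real planning systems weight each voxel by its partial volume inside the contour, which this sketch omits.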
Nov 15, 2007, Jens Hoefkens, Ph.D., Managing Director, Genedata, Inc.
Integrated Storage and Analysis of Microarray and Mass Spectrometry Data: One Stop Data Mining.
Nov 29, 2007 Mark Neff - Dog SNPs
Mark will talk about the SNP project that we did with Illumina to validate their canine SNP array (20k).
June 25, 2007: CMap: function, installation, and usage.
CMap is a web-based tool that is primarily used to view comparisons of genetic and physical maps. Tools for curating map data are also included in the CMap package. CMap's primary views will be displayed briefly and its chief features quickly discussed. Installing and customizing CMap is a fairly complex process, which will also be covered during this presentation.
Presented by Brandon Tearse
May 24, 2007: Databases: experiences with development, data integrity and Pine trees. Dr. Jennifer Lee
May 17, 2007 Novel Synthesis, Screening, & Chemoinformatics Technologies for Infectious Disease Research
Dr. Barry Bunin
11:00am-12:00pm, May 17, 2007, room 4202, GBSF, Bioinformatics Technology Forum
Selected technologies will be presented, starting from the first published small molecule libraries (1,4-benzodiazepines, Valium analogs), to the first gene-family-wide SAR databases (kinases), to modern community-based collaborative drug discovery technologies inspired by the open-source movement. Over 100 database and modeling products available to assist in the small molecule drug design and discovery process have recently been reviewed (see: Chemoinformatics: Theory, Practice, and Products, Springer, December 22, 2006). Each technology has its own scope and limitations, which are critical to understand when considering new drug design technologies. Early examples will cover new parallel synthesis methodologies and the use of gene-family-wide SAR from the scientific and patent literature, organized by mechanism of action, to build target-specific QSAR models that identify novel chemotypes against Abl kinase, a commercially important target because of the clinical resistance to Gleevec™ observed in patients.
This presentation will provide perspectives on a new type of web database that helps scientists develop new drug candidates more effectively in both commercial and humanitarian academic drug discovery research. Community-based technologies are currently being used to help develop new treatments, especially for infectious diseases afflicting poor people in developing countries, including malaria, Chagas disease, and African sleeping sickness. Community case studies range from early-stage high-throughput screening to lead optimization and GMP scale-up for clinical trials, carried out entirely in academic laboratories. Selected examples will be presented from work done in collaboration with leading researchers at UCSF, UC Berkeley, Stanford, UCLA, U. Penn, the Burnham Institute, UW, and St. Jude Children's Research Hospital, using CDD technologies to archive, mine, and (selectively) collaborate around drug discovery data. The novel functionality is a web-based collaborative environment for heterogeneous drug data: low-throughput and high-throughput enzyme, cell, and animal data can be selectively shared among colleagues, or even openly shared on the internet if desired. After direct collaborations with the top ~100 researchers studying infectious disease have been established, broader community-building strategies will be nurtured organically.
Biography: Dr. Barry Bunin, CEO & President of Collaborative Drug Discovery Inc. (CDD, Inc.), is interested in helping scientists archive, mine, and (selectively) collaborate around drug discovery data more effectively. He is co-author of a new text titled "Chemoinformatics: Theory, Practice, and Products" that surveys the range of modern drug discovery informatics tools. Prior to CDD, Inc., Dr. Bunin was an Entrepreneur in Residence with Eli Lilly & Co., where CDD was incubated. Before that, he was the founding CEO, President, & CSO of Libraria. At Libraria, Dr. Bunin led a team that integrated exhaustive reaction capture (synthetic chemistry) with gene-family-wide SAR capture (medicinal chemistry). Barry Bunin was formerly a Senior Scientist at Axys Pharmaceutical Corporation (now Celera), where he managed a library synthesis development group and designed patented protease inhibitors. Earlier, at Genentech Inc., he synthesized RGD mimics containing unnatural amino acids to inhibit GP IIb/IIIa. He was one of the early pioneers in high-throughput chemistry and previously wrote "The Combinatorial Index", a widely used text on high-throughput chemical synthesis. Dr. Bunin received his Ph.D. at the University of California at Berkeley, where he synthesized and tested the initial 1,4-benzodiazepine libraries with Prof. Jonathan Ellman.
May 7, 2007: A discussion of Solexa-Illumina sequencing technology will be held at 4:00 in the GBSF Auditorium by Christian D. Haudenschild, Ph.D., Senior Manager, Illumina, who will present relevant technical aspects of the Solexa system. These should include topics such as protocol development, daily workflow considerations, required add-ons, bioinformatics challenges (and solutions!), platform hardware and software upgrades, and whatever else is of concern to us as researchers using this technology. The presentation should last 30 minutes or less and will be followed by questions and answers and an open discussion. So please come with some hard questions!
Apologies to those of you who wanted to come but can't make this time. Please invite all interested members of your lab to attend, as well as anyone else who might be interested but may not have received this email. A related presentation on the AB SOLiD technology will take place at a future date.
Information provided by Charles Nicolet, Ph.D., UC Davis Genome Center, One Shields Ave., Davis, CA 95616
April 26, 2007, Matthew Lange, Department of Food Science and Technology
Milk Genomics and Chocolate Research Databases on Plone
April 19, 2007, Dr. Dennis Kostka, Postdoctoral Researcher, Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics
Inferring Hierarchies from Effect Data
Abstract
- Uncovering the inner workings of a cell via gene perturbation experiments is becoming increasingly popular. Nested Effects Models are a computational approach to inferring hierarchical structure from these kinds of experiments. Roughly, each conceivable hierarchy is encoded as a transitive directed graph, and the most likely topology is identified by maximizing over the associated posterior probabilities.
- The main bottleneck is the enumeration of all possible hierarchies (quasi-orders), whose number is known to grow exponentially. We propose a divide-and-conquer-like strategy and use a linear program to find an optimal solution satisfying the transitivity constraints.
- We find that our method outperforms comparable heuristics. Implementing it with standard tools enables fitting Nested Effects Models to studies involving O(10) different perturbations. This size is sufficient for most focused interventional studies targeting a specific system of interest.
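To show why enumerating hierarchies is the bottleneck, here is a minimal brute-force sketch that counts quasi-orders (reflexive, transitive relations) on n labeled perturbations by checking every directed graph for transitivity. It only illustrates the size of the search space; it is not the divide-and-conquer linear-programming method described in the abstract, and the function names are hypothetical.

```python
from itertools import product

def is_transitive(n, edges):
    """Check transitivity of a relation on nodes 0..n-1.

    `edges` holds off-diagonal pairs (i, j), meaning i -> j. Reflexive
    pairs (i, i) are implicit, so a transitive `edges` is a quasi-order.
    """
    for a, b in edges:
        for c in range(n):
            # a -> b and b -> c must imply a -> c (a -> a is implicit)
            if (b, c) in edges and c != a and (a, c) not in edges:
                return False
    return True

def count_quasi_orders(n):
    """Brute-force count over all 2**(n*(n-1)) candidate edge sets."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    count = 0
    for mask in product((False, True), repeat=len(pairs)):
        edges = {p for p, keep in zip(pairs, mask) if keep}
        if is_transitive(n, edges):
            count += 1
    return count

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_quasi_orders(n))
```

For n = 1 to 4 this prints 1, 4, 29, and 355 quasi-orders. The candidate space alone has 2^(n(n-1)) graphs (2^90 for ten perturbations), which is why exhaustive scoring is out of reach at the O(10) scale mentioned above.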
Mar. 8, 2007, Yufeng Wu from Dan Gusfield's lab
- Dept. of Computer Science
- University of California, Davis
Algorithms for Inferring Recombination and Association Mapping in Populations
Abstract
With population-scale genetic variation data increasingly available, a current high-priority research goal is to understand how genetic variation influences complex diseases (or, more generally, genetic traits). Recombination is an important genetic process that plays a major role in the logic behind association mapping, a method currently under intense study that is widely hoped to find genes (alleles) associated with complex genetic diseases efficiently. In this talk, I will present algorithmic and computational results on inferring historical recombination and constructing genealogical networks with recombination, with applications to two biologically important problems: association mapping of complex diseases and detecting recombination hotspots. For association mapping, I will present a method that generates the most parsimonious genealogical networks uniformly and show how it can be applied in association mapping. I will introduce results on evaluating how well the inferred genealogy fits the given phenotypes (i.e., cases and controls) and locates genes associated with the disease. Our recent work on detecting recombination hotspots by inferring the minimum number of recombination events will also be briefly described. For both biological problems, I will demonstrate the effectiveness of these methods with experimental results on simulated or real data.
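As background for "inferring minimum recombination", a classic and much simpler lower bound is Hudson and Kaplan's R_m, built on the four-gamete test. The sketch below is that classic bound, not the algorithms presented in the talk; it assumes the infinite-sites model with biallelic sites coded 0/1, and the function names are hypothetical.

```python
from itertools import combinations

def incompatible_pairs(haplotypes):
    """Site pairs (i, j), i < j, that fail the four-gamete test.

    Each haplotype is a sequence of 0/1 alleles (strings like "0101" work).
    Under infinite sites, seeing all four gametes 00, 01, 10, 11 implies
    at least one recombination somewhere between sites i and j.
    """
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs

def hudson_kaplan_rm(haplotypes):
    """Lower bound on recombination events: the maximum number of
    non-overlapping incompatible intervals, chosen greedily by right end."""
    count, last_end = 0, -1
    for start, end in sorted(incompatible_pairs(haplotypes), key=lambda p: p[1]):
        # intervals that only share an endpoint cover disjoint gaps between sites
        if start >= last_end:
            count += 1
            last_end = end
    return count
```

For the four haplotypes `["0000", "0101", "1010", "1111"]`, each adjacent site pair shows all four gametes, so `hudson_kaplan_rm` returns 3. Bounds of this kind are often far below the true minimum, which is part of the motivation for the stronger methods discussed in the talk.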
Mar. 15, 2007, Hans van Leeuwen from Richard Michelmore's lab
- Development and Exploitation of a Lettuce (Lactuca sativa) 6.6 Million Feature Affymetrix GeneChip for Massively Parallel Genotyping and Gene Expression Analysis.
There is currently a need for high-resolution genetic and physical maps in many crops to characterize the genetic mechanisms controlling agriculturally important traits. Although discovering sequence-based markers has become relatively straightforward, identifying them and implementing them in mapping populations or germplasm screens remains laborious and expensive. Single Feature Polymorphisms (SFPs) based on microarray hybridization are emerging as a powerful marker technology because of their highly parallel nature. Current gene expression arrays rarely provide redundancy for any one SFP, leading to both false-positive and false-negative SFP scoring and, subsequently, to incorrect mapping or associations. Furthermore, only a limited proportion of each gene is represented on expression chips, limiting SFP detection in species with low polymorphism, such as lettuce. We have developed a 6.6 million feature Affymetrix GeneChip for robust SFP discovery, genotyping, and gene expression analysis. Probes from ~29,000 cultivated lettuce (L. sativa) unigenes and an additional ~6,000 unique sequences from four related lettuce species (L. serriola, L. saligna, L. virosa, and L. perennis) were synthesized on the GeneChip. Each probe set contains an average of 173 probes per transcript (6,415,642 total probes). A technical replicate control block of 100 probes is synthesized 169 times across the chip to facilitate signal normalization and post-processing. To validate our SFP discovery scripts, approximately 2,000 manually curated regions, each containing 1-3 putative SNPs, were tiled in 2 bp increments. We are currently assessing various target preparation and hybridization methodologies, including cDNA and total genomic DNA.
Mar. 1, 2007, Dr. Allen van Deynze
- Challenges in SNP discovery in relevant germplasm for diploids and polyploids
Nov. 9, 2006, Dr. Bertrand Perroud
- Pathway analysis for omics profiling studies
Abstract:
How can one determine the significance of pathway or network hits generated by microarray or proteomic experiments? Dr. Perroud will illustrate this with data obtained in a kidney cancer study and show the resulting interpretations and possible extrapolations.
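One common way to score a pathway "hit" is an over-representation test: under the hypergeometric distribution, how likely is it to see at least k pathway genes among the experiment's significant genes by chance? The sketch below shows that standard test only; it is not necessarily the approach used in the kidney cancer study, and the function name is hypothetical.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for pathway over-representation.

    N: genes measured on the platform (the population)
    K: of those, genes annotated to the pathway
    n: genes called significant ("hits") in the experiment
    k: hits that fall in the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when asked for more items than exist, so impossible
    # splits contribute nothing to the tail sum
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return tail / comb(N, n)
```

For instance, with a hypothetical 10,000 measured genes, a 100-gene pathway, and 200 hits, `enrichment_p_value(12, 200, 100, 10000)` scores 12 pathway genes among the hits. Because many pathways are usually tested at once, the resulting p-values normally need a multiple-testing correction such as Benjamini-Hochberg before any pathway is called significant.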
Nov. 2, 2006, Dr. Dawei Lin
- Spotfire: a multi-dimensional data analysis tool for cluster usage analysis.
Oct. 19, 2006, Dr. Patrice Koehl
- Protein Structure Analysis
Oct. 12, 2006, Dr. Andrew Eckert
- A phylogeographic analysis of the range disjunction for foxtail pine (Pinus balfouriana, Pinaceae): the role of Pleistocene glaciation
Oct. 5, 2006, Dr. Kyoungmi Kim
- Microarray Analysis
Sept. 28, 2006, Dr. David Rocke
- Microarray Analysis (Will start at 10:00am instead of normal 11:00am)
Sept. 21, 2006, Dr. Tobias Kind
- Machine Learning Methods on Metabolomics