Protein Folding

Searching for high level rules in protein folding and unfolding: from amyloid diseases to protein structure prediction
PTDC/BIA-PRO/72838/2006

Principal Contractor: Centro de Neurociências e Biologia Celular (CNC/UC)
Departamento de Zoologia - Universidade de Coimbra


Partners: University of Ulster - School of Biomedical Sciences (UU-SBS)
MIT-Portugal


Abstract

In recent years, the identification of many human and animal diseases as protein conformation disorders, together with the need to translate the increasing number of genome sequences into useful information, has highlighted the importance of the protein folding problem, i.e. the conversion of a linear sequence of amino acids into a functional three-dimensional protein structure. After decades of effort, this is still an unsolved problem in structural molecular biology.

Computer simulations based on molecular dynamics or other methods are a useful way of exploring folding and unfolding events in proteins. Indeed, it is now believed that protein unfolding events are responsible for triggering the amyloidogenic process in several proteins.

In this project, we propose to use multiple molecular dynamics protein unfolding simulations to infer rules for protein unfolding in several structural classes of proteins and in disease-causing amyloidogenic proteins. These simulations generate a huge amount of data, which has to be stored, managed and shared; database and Grid technology will be used for this purpose. In addition, data mining techniques are required to characterize and compare multiple unfolding simulations. Data mining requirements differ substantially from one scientific field to another, so existing techniques need to be adapted and new ones developed for specific problems such as protein unfolding simulations. Here, the goal is the extraction of meaningful patterns (associations, correlations, clusters, rules, relationships) among molecular properties that vary along the unfolding simulations.

One of the challenges for data analysis is to identify, among all the structural and physicochemical properties that characterize the system, those that are essential in describing the protein unfolding and folding processes. If we are able to follow and analyze a large number of properties that characterize the protein system, and if we study a large enough set of proteins and variants, we should be able to find patterns of variation for several properties, decide which properties are most relevant at each stage of the unfolding or folding processes, and define rules characterizing these processes for different structural classes of proteins. And if we are able to define rules for representative examples, we should then be able to predict unfolding and folding behaviour for other proteins [13].
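As a purely illustrative sketch of what looking for correlations among molecular properties along an unfolding simulation could mean in practice, the Python snippet below computes pairwise Pearson correlations between property time series and keeps the strongly correlated pairs. This is not the project's actual method: the property names, the toy values, the function names and the 0.9 threshold are all assumptions made for the example, and Pearson correlation is only one simple choice among the pattern types listed above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no correlation information;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(properties, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `properties` maps a property name to its values sampled at the
    same time points of one simulation (hypothetical data layout).
    """
    names = sorted(properties)
    pairs = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            r = pearson(properties[name_a], properties[name_b])
            if abs(r) >= threshold:
                pairs.append((name_a, name_b, r))
    return pairs


if __name__ == "__main__":
    # Invented toy values, not real simulation output.
    trajectory = {
        "radius_of_gyration": [10.0, 10.5, 11.2, 12.0, 13.1],
        "native_contacts": [100, 92, 80, 71, 55],
        "solvent_accessible_area": [5000, 5200, 5600, 6000, 6500],
    }
    for name_a, name_b, r in correlated_pairs(trajectory):
        print(f"{name_a} ~ {name_b}: r = {r:.3f}")
```

Running such a scan over many simulations and proteins would be one way to shortlist the properties that vary together, before applying richer techniques such as clustering or rule induction.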

This interplay between structural computational biology and information technology could help put forward novel ideas regarding protein unfolding in amyloidogenic proteins and could even help in the design of new algorithms for protein structure prediction.

Principal Objectives

The main goal of this project is to build new knowledge in two main areas, using a large database of protein folding and unfolding simulations based on first principles (physicochemical interactions):



The first prototype of the database has already been implemented (www.p-found.org).

It is also important to stress the heterogeneous backgrounds of the researchers involved in the project, which in general lead to a very creative environment. Teams of computer scientists and bioinformaticians have joined the initial team of molecular biophysicists.