Epsin is a protein that binds to Eps15 homology (EH) domains and is involved in clathrin-mediated endocytosis. The epsin N-terminal homology (ENTH) domain (about 140 amino acid residues) is well conserved in eukaryotes and is considered important for actin cytoskeleton organization in endocytosis. In this study, we have determined the solution structure of the ENTH domain (residues 1-144) of human epsin by multidimensional nuclear magnetic resonance spectroscopy. In the ENTH-domain structure, seven α-helices form a superhelical fold, consisting of two antiparallel two-helix HEAT motifs and one three-helix ARM motif, with a continuous hydrophobic core in the center. We conclude that the seven-helix superhelical fold defines the ENTH domain, and that the previously reported eight-helix fold of a longer fragment of rat epsin 1 comprises the authentic ENTH domain plus a C-terminal flanking α-helix.
A bioinformatics method was developed to identify the protein surface around a functional site and to estimate its biochemical function, using a newly constructed molecular surface database, eF-site. Molecular surfaces of proteins were computed from the atomic coordinates, and the eF-site (electrostatic surface of Functional site) database was prepared by assigning physical properties to the constructed molecular surfaces. The electrostatic potential on each molecular surface was calculated individually by numerically solving the Poisson-Boltzmann equation for a precise continuum model, and the hydrophobicity of each residue was also included. The eF-site database is accessible via the Internet (http://pi.protein.osaka-u.ac.jp/eF-site/). We have prepared four databases, eF-site/antibody, eF-site/prosite, eF-site/P-site, and eF-site/ActiveSite, which contain, respectively, the antigen-binding sites of antibodies aligned in the same orientation, the molecular surfaces of the individual motifs in the PROSITE database, phosphate-binding sites, and the active-site surfaces of representatives of individual protein families. A search algorithm based on clique detection, an application of graph theory, was developed for the eF-site database to recognize and discriminate the characteristic molecular surfaces of proteins. The method identifies active sites whose functions are similar to those of known proteins.
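The abstract names clique detection as the basis of the eF-site search but does not give its details. The following is a minimal sketch of one common way to apply clique detection to surface comparison, not the eF-site implementation: each surface point is reduced to 3D coordinates plus a property label, a correspondence graph is built from label-compatible point pairs whose internal distances agree within a tolerance, and the largest clique (found here with the Bron-Kerbosch algorithm with pivoting) is the largest geometrically consistent matched patch. The labels (such as "+", "-", "h") and the tolerance `tol` are hypothetical.

```python
import itertools
import math

def correspondence_graph(surf_a, surf_b, tol=1.0):
    """Nodes are (i, j) pairs of points with the same property label;
    edges join pairs whose intra-surface distances agree within tol."""
    nodes = [(i, j) for i, (_, la) in enumerate(surf_a)
             for j, (_, lb) in enumerate(surf_b) if la == lb]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i == k or j == l:  # a point may be matched only once
            continue
        da = math.dist(surf_a[i][0], surf_a[k][0])
        db = math.dist(surf_b[j][0], surf_b[l][0])
        if abs(da - db) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Largest clique via Bron-Kerbosch with pivoting."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

# Surface B contains a translated copy of surface A plus one decoy point.
surf_a = [((0, 0, 0), "+"), ((3, 0, 0), "-"), ((0, 4, 0), "h")]
surf_b = [((10, 10, 10), "+"), ((13, 10, 10), "-"),
          ((10, 14, 10), "h"), ((50, 50, 50), "+")]
print(sorted(max_clique(correspondence_graph(surf_a, surf_b))))
```

For the toy surfaces above, the decoy point shares a label with point 0 but has no consistent distances, so the printed match is `[(0, 0), (1, 1), (2, 2)]`. Exact maximum-clique search is exponential in the worst case, which is why real systems typically coarsen the surfaces before matching.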
Finding genes by the positional candidate approach requires an abundance of cDNAs mapped to chromosomes. To provide this information, we computationally mapped 19,032 of our mouse cDNAs to mouse chromosomes using data from public databases. We used two approaches. In the first, we integrated the mapping data of the cDNAs on the human genome, known gene-related data, and comparative mapping data, and from these we calculated map positions on the mouse chromosomes. For this approach, we developed a simple and powerful criterion for choosing the correct map position among the candidate positions found by sequence homology searches. In the second approach, we related the cDNAs to expressed sequence tags (ESTs) previously mapped in radiation hybrid experiments. We discuss how combining the two methods could improve the mapping.
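The abstract does not state the criterion used to pick the correct map position from homology-search candidates. As a purely hypothetical illustration of what such a rule can look like, the sketch below accepts the top-scoring hit only if it passes a minimum score and beats the runner-up by a fixed ratio, so that near-ties (for example from paralogs or repeats) are left unmapped. The `Hit` fields, `min_score`, and `margin` are assumptions, not the authors' method.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    chromosome: str
    position_mb: float
    score: float  # hypothetical: alignment score from a homology search

def choose_map_position(hits, min_score=100.0, margin=1.2):
    """Hypothetical rule: return the best hit if it is strong enough and
    clearly better than the second-best hit, otherwise None."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if not ranked or ranked[0].score < min_score:
        return None  # no sufficiently strong candidate
    if len(ranked) > 1 and ranked[0].score < margin * ranked[1].score:
        return None  # ambiguous: two candidates score too similarly
    return ranked[0]

print(choose_map_position([Hit("7", 45.2, 850.0), Hit("X", 12.0, 300.0)]))
print(choose_map_position([Hit("7", 45.2, 500.0), Hit("11", 3.0, 480.0)]))
```

The first call returns the chromosome 7 hit; the second returns `None` because the two candidates are too close to call. Rejecting ambiguous cases trades coverage for accuracy, which is one reason combining an independent source such as radiation-hybrid-mapped ESTs can help.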
Multi-wavelength anomalous diffraction phasing is especially useful for high-throughput structure determination, and selenomethionine-substituted proteins are commonly used for this purpose. However, the cytotoxicity of selenomethionine drastically reduces the efficiency of its incorporation into proteins expressed in vivo. In the present study, an improved E. coli cell-free protein synthesis system was used to incorporate selenomethionine into a protein with high efficiency. Milligram quantities of selenomethionine-containing Ras were obtained using the cell-free system with dialysis. Mass spectrometry showed that more than 95% of the methionine residues were substituted with selenomethionine. Crystals of this protein grew under the same conditions, and had the same unit cell constants, as those of native Ras. The three-dimensional structure of this protein, determined by multi-wavelength anomalous diffraction phasing, was nearly identical to that of Ras prepared by in vivo expression. Therefore, the cell-free synthesis system could become a powerful protein expression method for high-throughput structure determination by X-ray crystallography.
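The substitution level reported from mass spectrometry can be illustrated with simple arithmetic: replacing sulfur by selenium in one methionine adds the difference of their atomic weights (about 78.971 - 32.06 = 46.911 Da, using average atomic weights), so the fraction of substituted sites follows from the observed mass shift. This is a simplified sketch assuming a single averaged intact-mass measurement; the masses and methionine count below are hypothetical, not the Ras values, and the study's actual analysis may differ.

```python
# Average atomic weights: Se 78.971, S 32.06 (mass gain per Met -> SeMet)
SE_MINUS_S_AVG_DA = 78.971 - 32.06

def substitution_fraction(native_mass, observed_mass, n_met):
    """Fraction of Met sites carrying Se, estimated from the intact-mass shift."""
    expected_full_shift = n_met * SE_MINUS_S_AVG_DA
    return (observed_mass - native_mass) / expected_full_shift

# Hypothetical protein: 4 Met residues, native average mass 19,000.0 Da
observed = 19000.0 + 0.95 * 4 * SE_MINUS_S_AVG_DA
print(round(substitution_fraction(19000.0, observed, 4), 3))  # 0.95
```

In practice a partially substituted sample gives a distribution of species (0 to n Se atoms per molecule), and the fraction is often estimated from their relative peak intensities rather than from one averaged mass.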
In this paper, we describe a neural network analysis of sequences connecting two protein domains (domain linkers). The neural network was trained to distinguish domain linker sequences from non-linker sequences, using a SCOP-defined domain library. The analysis indicated a significant difference between domain linkers and non-linker regions, including intra-domain loop regions. Moreover, the resulting Hinton diagram showed a position-dependent amino acid preference in the domain linker sequences, indicating that these sequences are not random. We then applied the neural network to predict domain linkers in multi-domain protein sequences. In a jackknife test, 58% of the predicted regions matched actual linker regions (specificity), and 36% of the SCOP-derived domain linkers were predicted (sensitivity). This prediction performance is superior to that of simpler methods, derived from secondary structure prediction, that assume long loop regions are putative domain linkers. Altogether, these results suggest that domain linkers possess local characteristics different from those of loop regions.
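The two evaluation figures are defined in the text as the fraction of predicted regions that match actual linkers (called specificity here; in many other fields this quantity is called precision) and the fraction of actual linkers that are predicted (sensitivity). The sketch below computes both for residue intervals. The matching rule, counting any overlap as a match, is an assumption; the paper's own matching criterion is not given in the abstract, and the intervals are illustrative.

```python
def overlaps(a, b):
    """True if half-open residue intervals a=(start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def linker_scores(predicted, actual):
    """Return (specificity, sensitivity) as defined in the text, assuming
    a predicted region matches an actual linker if they overlap at all."""
    hit_pred = sum(any(overlaps(p, a) for a in actual) for p in predicted)
    hit_act = sum(any(overlaps(a, p) for p in predicted) for a in actual)
    specificity = hit_pred / len(predicted) if predicted else 0.0
    sensitivity = hit_act / len(actual) if actual else 0.0
    return specificity, sensitivity

predicted = [(10, 20), (50, 60), (90, 95)]
actual = [(15, 25), (100, 110)]
print(linker_scores(predicted, actual))  # one of 3 predictions hits; 1 of 2 linkers found
```

Because the two metrics use different denominators, a predictor can raise sensitivity by predicting more regions at the cost of specificity, which is why both figures are reported together.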
The crystal structure of a conserved hypothetical protein from Escherichia coli has been determined by X-ray crystallography. The protein belongs to the Clusters of Orthologous Groups family COG1553 (National Center for Biotechnology Information database, NLM, NIH), for which no structural information was previously available. A structural homology search with the DALI algorithm indicated that this protein has a new fold with no obvious similarity to other proteins of known three-dimensional structure. The quaternary structure is a dimer of trimers with a characteristic cylindrical shape. A large closed cavity with approximate dimensions of 16 Å × 16 Å × 20 Å lies in the center of the hexamer. Six putative active sites are positioned along the equatorial surface of the hexamer. The putative active site contains several highly conserved residues, including two possibly functional cysteines. The possible molecular function of the protein is discussed.