Abstract
Time-series clustering is an unsupervised machine learning technique that classifies time-series data. In this article, we applied three methods (ED, DBA, and soft-DTW) to the clustering analysis of 200 molecular dynamics (MD) trajectories of human adult hemoglobin (HbA) and report their clustering performance in detecting the trajectories that undergo the T-R state transition ($\text{Traj}_{\text{T-R}}$). By comparing their silhouette indices, we discuss appropriate clustering conditions.
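The three method labels are not defined in this section. As a minimal sketch, assuming they follow the usual time-series k-means variants (for example tslearn's `TimeSeriesKMeans` [12], whose `metric` option accepts "euclidean", "dtw" with cluster centers computed by DBA, i.e. DTW barycenter averaging, and "softdtw"), the NumPy code below shows the dissimilarities those variants rest on: ED, DTW [5], and soft-DTW [7]. It assumes univariate series and a squared-difference local cost; the function names are illustrative, this is not the authors' code, and the DBA averaging step itself is omitted.

```python
import numpy as np


def euclidean_distance(x, y):
    """ED: lock-step distance between two equal-length series."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("ED needs series of the same length")
    return float(np.sqrt(np.sum((x - y) ** 2)))


def dtw_distance(x, y):
    """DTW (Sakoe and Chiba [5]) with squared local cost and no window constraint."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded with +inf
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            acc[i, j] = local + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(np.sqrt(acc[n, m]))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW (Cuturi and Blondel [7]): the hard min of DTW is replaced by
    softmin_gamma(a) = -gamma * log(sum(exp(-a / gamma))). Returns the
    accumulated squared cost (no square root); the value can be negative."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            z = -np.array([acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]]) / gamma
            z_max = z.max()  # log-sum-exp shift for numerical stability
            softmin = -gamma * (z_max + np.log(np.exp(z - z_max).sum()))
            acc[i, j] = local + softmin
    return float(acc[n, m])


a, b = [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]  # a ramp and the same ramp one step ahead
print(euclidean_distance(a, b))          # sqrt(3) ≈ 1.732
print(dtw_distance(a, b))                # sqrt(2) ≈ 1.414
print(soft_dtw(a, b, gamma=0.1))         # below dtw_distance(a, b) ** 2 = 2
```

DTW lets a point align with a neighboring time step, so the shifted ramp scores closer than under ED. Soft-DTW replaces the hard minimum with a differentiable one, which is what permits gradient-based barycenters, and it approaches DTW's squared cost as `gamma` tends to 0.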
Table
Table 1. Silhouette indices and numbers of $\text{Traj}_{\text{T-R}}$ obtained with the three methods: ED (Euclidean distance), DBA (DTW barycenter averaging), and soft-DTW.
| Methods | $\bar{s}_k$ ᵃ | # of $\text{Traj}_{\text{T-R}}$ ᵇ |
|---|---|---|
| This work: | | |
| ED | 0.38 | 34 (17%) |
| DBA | 0.37 | 54 (27%) |
| soft-DTW | 0.78 | 24 (12%) |
| Previous studies: | | |
| Empirical criterion | - | 8% [2], 16% [3] |
ᵃ $\bar{s}_k$ is the average of $\bar{s}_k(i_\alpha)$ over all $\text{Traj}_{\text{T-R}}$.
ᵇ The percentages in parentheses are the fractions of $\text{Traj}_{\text{T-R}}$ among all 200 MD trajectories.
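Footnote a averages per-trajectory silhouette values over $\text{Traj}_{\text{T-R}}$ only. The sketch below shows that computation under stated assumptions: it uses Rousseeuw's definition [8], s(i) = (b(i) - a(i)) / max{a(i), b(i)}, on a precomputed nonnegative dissimilarity matrix (for example ED or DTW between trajectories). The function names and the `subset` argument, which stands in for the indices of $\text{Traj}_{\text{T-R}}$, are hypothetical; the article does not show how $\bar{s}_k(i_\alpha)$ was actually computed.

```python
import numpy as np


def silhouette_samples(dist, labels):
    """Per-object silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) (Rousseeuw [8]).

    dist   : (N, N) symmetric matrix of nonnegative pairwise dissimilarities
    labels : length-N cluster labels; at least two distinct clusters required
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False                      # a(i) excludes i itself
        if not same.any():                   # singleton cluster: s(i) = 0 by convention
            continue
        a = dist[i, same].mean()             # mean dissimilarity to its own cluster
        b = min(dist[i, labels == c].mean()  # mean dissimilarity to the nearest other cluster
                for c in clusters if c != own)
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s


def mean_silhouette(dist, labels, subset=None):
    """Average s(i) over all objects, or only over the indices listed in `subset`."""
    s = silhouette_samples(dist, labels)
    return float(s.mean() if subset is None else s[np.asarray(subset)].mean())


# Toy 1-D example: two well-separated pairs of points.
pts = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(pts[:, None] - pts[None, :])
labels = [0, 0, 1, 1]
print(mean_silhouette(dist, labels))                 # ≈ 0.900
print(mean_silhouette(dist, labels, subset=[0, 3]))  # 9.5 / 10.5 ≈ 0.905
```

Values near 1 mean an object sits well inside its own cluster, and values near 0 or below mean it lies between clusters. The per-object values should match scikit-learn's `silhouette_samples` with `metric="precomputed"`; the `subset` average mirrors, as we read it, footnote a's restriction to $\text{Traj}_{\text{T-R}}$.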
References
- [1] J. F. Storz, "HEMOGLOBIN: Insights into Protein Structure, Function, and Evolution", Oxford University Press, Oxford (2019). ISBN 978-0-19-881068-1.
- [2] M. Tanakayagi, I. Kurisaki, M. Nagaoka, Sci. Rep., 4, 4601 (2014). doi:10.1038/srep04601 PMID:24710521
- [3] K. Suzuki, "Theoretical Research of Allosteric T→R State Transition of Human Hemoglobin: Application of Principal Component Analysis and Motion Tree Method", Master's thesis, Nagoya University, Japan (2019).
- [4] J. Paparrizos, L. Gravano, ACM Trans. Database Syst., 42, 1 (2017). doi:10.1145/3044711
- [5] H. Sakoe, S. Chiba, IEEE Trans. Acoust. Speech Signal Process., 26, 43 (1978). doi:10.1109/TASSP.1978.1163055
- [6] E. Keogh, C. A. Ratanamahatana, Knowl. Inf. Syst., 7, 358 (2005). doi:10.1007/s10115-004-0154-9
- [7] M. Cuturi, M. Blondel, "Soft-DTW: a Differentiable Loss Function for Time-Series", ICML (2017).
- [8] P. J. Rousseeuw, J. Comput. Appl. Math., 20, 53 (1987). doi:10.1016/0377-0427(87)90125-7
- [9] $\bar{s}_k$ was called the overall average silhouette width in [8].
- [10] D. A. Case, R. M. Betz, D. S. Cerutti, T. E. Cheatham, III, T. A. Darden, R. E. Duke, T. J. Giese, H. Gohlke, A. W. Goetz, N. Homeyer, S. Izadi, P. Janowski, J. Kaus, A. Kovalenko, T. S. Lee, S. LeGrand, P. Li, C. Lin, T. Luchko, R. Luo, B. Madej, D. Mermelstein, K. M. Merz, G. Monard, H. Nguyen, H. T. Nguyen, I. Omelyan, A. Onufriev, D. R. Roe, A. Roitberg, C. Sagui, C. L. Simmerling, W. M. Botello-Smith, J. Swails, R. C. Walker, J. Wang, R. M. Wolf, X. Wu, L. Xiao, P. A. Kollman, AMBER 2016, University of California, San Francisco (2016).
- [11] S. Y. Park, T. Yokoyama, N. Shibayama, Y. Shiro, J. R. H. Tame, J. Mol. Biol., 360, 690 (2006). doi:10.1016/j.jmb.2006.05.036 PMID:16765986
- [12] R. Tavenard, J. Faouzi, G. Vandewiele, F. Divo, G. Androz, C. Holtz, M. Payne, R. Yurchak, M. Rußwurm, K. Kolar, E. Woods, J. Mach. Learn. Res., 21, 1 (2020).