2016 Volume 3 Issue 2 Pages 39-46
Structured regularization is a mathematical technique that incorporates prior structural knowledge about the relationships among variables into regression analysis, yielding sparse estimates that reflect those relationships. Biology is particularly well suited to its application because structural information is abundant, such as pathways formed by genes, transcripts, and proteins. Previously, we reported the first application in toxicogenomics of latent group Lasso, a group-based regularization method, with genes regulated by the same transcription factor treated as a group. We showed that it achieved predictive performance comparable to that of Lasso and allowed us to discuss the mechanisms behind liver weight gain in rats on the basis of the selected groups. Latent group Lasso, however, does not lead to a sparse estimate, owing to the large group sizes in our analytical setting. In this study, we applied two graph-based regularization methods, generalized fused Lasso and graph Lasso, to the same data, using the regulatory networks formed by transcription factors and their downstream genes as a graph. These methods are expected to yield sparser estimates because they select variables based on edges. Comparisons showed that graph Lasso generated an accurate, biologically relevant, and sparse model that could not be obtained with latent group Lasso or generalized fused Lasso.
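To illustrate what selecting variables "based on edges" can mean, the following is a minimal sketch of the penalty term in its commonly used generalized fused Lasso form: an L1 term on the coefficients plus an L1 term on coefficient differences across graph edges. The function name, the toy graph, the coefficient values, and the weights `lam1` and `lam2` are all hypothetical and chosen only for illustration; the exact formulation and tuning used in the study may differ.

```python
def generalized_fused_lasso_penalty(beta, edges, lam1, lam2):
    """Penalty = lam1 * sum_j |beta_j| + lam2 * sum_{(j,k) in edges} |beta_j - beta_k|.

    beta  : list of regression coefficients, one per variable (e.g. gene)
    edges : list of (j, k) index pairs, one per edge of the prior graph
    lam1  : weight of the sparsity term (assumed tuning parameter)
    lam2  : weight of the edge-wise fusion term (assumed tuning parameter)
    """
    sparsity = lam1 * sum(abs(b) for b in beta)
    fusion = lam2 * sum(abs(beta[j] - beta[k]) for j, k in edges)
    return sparsity + fusion


# Hypothetical toy network: variable 0 linked to 1, variable 1 linked to 2.
toy_beta = [1.0, 1.0, 0.0]
toy_edges = [(0, 1), (1, 2)]
toy_penalty = generalized_fused_lasso_penalty(toy_beta, toy_edges, lam1=0.5, lam2=2.0)
```

In this toy case the edge (0, 1) adds nothing to the penalty because the two coefficients are equal, whereas the edge (1, 2) is penalized for the difference between them, which is how the graph structure pushes connected variables toward shared values.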