Methods for detection of interactions with multiple components
Abstract
In genetic association studies, joint modeling of genetic variants and environmental variables is widely expected to yield important insights. However, the weak effects of gene-environment interactions and imprecise measurement of the environment make it hard to identify "statistically significant" interaction effects. We propose two modeling techniques. First, for regression problems in which the main effects are already established, as is the case for many diseases, or in which their estimation is not a priority, we propose dedicated boosting. Dedicated boosting is a variation on the usual L2 boosting procedure that focuses on the search for interactions, in contrast to most boosting methods, which target overall model prediction or classification. We compare the performance of dedicated boosting with that of competing methods on the WHI data and in a simulation study. Second, for regression problems in which we believe interactions might behave similarly, we combine a structured interaction model form with penalized regression to limit model complexity. We propose the directed LASSO, a regression modeling strategy that uses a pairwise fused LASSO penalty to encourage simpler interaction models through fusion.
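To make the idea of boosting aimed only at interactions concrete, the following is a minimal sketch of generic componentwise L2 boosting restricted to gene-environment product terms. It is not the dedicated boosting procedure itself, whose details the abstract does not give. The function name `interaction_l2_boost`, the step size `nu`, the number of steps `n_steps`, and the choice to fit main effects once by least squares and then hold them fixed are all assumptions made for illustration.

```python
import numpy as np

def interaction_l2_boost(G, E, y, n_steps=200, nu=0.1):
    """Componentwise L2 boosting over G x E product terms only (illustrative).

    G: (n, p) genetic variants, E: (n,) environmental variable, y: (n,) outcome.
    Returns the fixed main-effect coefficients and the boosted interaction
    coefficients.
    """
    n, p = G.shape

    # Assumption: main effects are treated as established, so they are fit
    # once by ordinary least squares and never updated by the boosting loop.
    X_main = np.column_stack([np.ones(n), G, E])
    beta_main, *_ = np.linalg.lstsq(X_main, y, rcond=None)
    resid = y - X_main @ beta_main

    # Candidate base learners: centered products G_j * E.
    Z = G * E[:, None]
    Z = Z - Z.mean(axis=0)
    norms = (Z ** 2).sum(axis=0)

    gamma = np.zeros(p)
    for _ in range(n_steps):
        coefs = Z.T @ resid / norms      # least-squares fit of each term to the residual
        gains = coefs ** 2 * norms       # drop in residual sum of squares per term
        j = int(np.argmax(gains))        # best-fitting interaction term this step
        gamma[j] += nu * coefs[j]        # shrunken update of that term only
        resid = resid - nu * coefs[j] * Z[:, j]
    return beta_main, gamma
```

In this sketch every boosting step is spent on an interaction term, so the search effort is not shared with main effects. In practice the number of steps would be chosen by a stopping rule such as cross-validation, which is omitted here.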
Collections
- Biostatistics