Synthetic test data set for DEER spectroscopy based on T4 lysozyme

Authors

Edwards, Thomas H.
Stoll, Stefan

Publisher

Elsevier

Abstract

Tikhonov regularization is the most commonly used method for extracting distance distributions from experimental double electron-electron resonance (DEER) spectroscopy data. This method requires the selection of a regularization parameter, α, and a regularization operator, L. We analyze the performance of a large set of α selection methods and several regularization operators, using a test set of over half a million synthetic noisy DEER traces. These are generated from distance distributions obtained from in silico double labeling of a protein crystal structure of T4 lysozyme with the spin label MTSSL. We compare the methods and operators based on their ability to recover the model distance distributions from the noisy time traces. The results indicate that several α selection methods perform quite well, among them the Akaike information criterion and the generalized cross validation method with either the first- or second-derivative operator. They perform significantly better than currently utilized L-curve methods.

This test set was developed as part of the 2018 publication in J. Magn. Reson., "Optimal Tikhonov Regularization for DEER Spectroscopy." Using scripts adapted from the Matlab toolbox MMM, the PDB structure 2LZM (T4 lysozyme) was labeled in silico at every accessible amino acid using the rotamer library R1A_298K_UFF_216_r1_CASD for the MTSSL spin label. The distance distribution between each pair of labels was then calculated, yielding 5622 distance distributions. From these, 621030 synthetic DEER data traces were simulated. The test set includes DEER data spanning the range of experimentally reasonable noise levels, truncation lengths, time-step sizes, and underlying distribution characteristics.

Reference: T.H. Edwards, S. Stoll, Optimal Tikhonov regularization for DEER spectroscopy, J. Magn. Reson. 288 (2018) 58-68. https://doi.org/10.1016/j.jmr.2018.01.021
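As a rough illustration of how a synthetic DEER trace of this kind can be produced from a distance distribution, the following Python sketch computes a noise-free trace and adds seeded Gaussian noise. It is not the authors' generation script. It assumes the standard powder-averaged dipolar kernel with a dipolar constant of 52.04 MHz nm^3 (both spins with g close to 2.0023), no background and full modulation depth, and a Gaussian stand-in for a model distribution. The names deer_kernel and simulate_trace are hypothetical, and Python's random-number generator will not reproduce the noise realizations tied to the seeds stored in the test set.

```python
import math
import random

# Dipolar constant in MHz nm^3; assumes g = 2.0023 for both spins.
DIPOLAR_CONST_MHZ_NM3 = 52.04


def deer_kernel(t_us, r_nm, n_z=201):
    """Powder-averaged dipolar kernel K(t, r) for one time and one distance.

    Uses a midpoint rule over z = cos(theta) on [0, 1].
    """
    omega = 2.0 * math.pi * DIPOLAR_CONST_MHZ_NM3 / r_nm ** 3  # rad/us
    total = 0.0
    for k in range(n_z):
        z = (k + 0.5) / n_z
        total += math.cos((1.0 - 3.0 * z * z) * omega * t_us)
    return total / n_z


def simulate_trace(r, P, tmax, dt, sigma, seed):
    """Return (t, S0, S): time axis, noise-free trace, noisy trace.

    Assumes a uniform distance grid r (nm), t starting at 0 (us),
    no background, and full modulation depth.
    """
    dr = r[1] - r[0]
    area = sum(P) * dr  # normalize P to unit area
    nt = int(round(tmax / dt)) + 1
    t = [i * dt for i in range(nt)]
    S0 = [
        sum(deer_kernel(ti, rj) * pj for rj, pj in zip(r, P)) * dr / area
        for ti in t
    ]
    rng = random.Random(seed)  # one seed per noise realization
    S = [s + rng.gauss(0.0, sigma) for s in S0]
    return t, S0, S


# Stand-in model distribution: Gaussian centered at 4 nm, width 0.3 nm.
r = [2.0 + 0.1 * i for i in range(41)]
P = [math.exp(-0.5 * ((ri - 4.0) / 0.3) ** 2) for ri in r]

t, S0, S = simulate_trace(r, P, tmax=2.0, dt=0.04, sigma=0.05, seed=1)
print(len(t), round(S0[0], 6))
```

Varying tmax, dt, and sigma in such a sketch corresponds to the truncation lengths, time-step sizes, and noise levels that the test set spans.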

Description

===========================================================================
Contents of the test set
===========================================================================
(Edwards, Stoll, Optimal Tikhonov Regularization for DEER Spectroscopy)

The test set is contained in three files:
- distributions_2LZM: model distributions
- timetraces_2LZM: noise-free time-domain traces
- Sdata_2LZM: noisy time-domain traces

distributions_2LZM.mat
---------------------------------------------------------------------------
This file contains the model distributions obtained from the 2LZM PDB crystal structure of T4 lysozyme, and associated statistics.

sitePair      array of residue indices for all doubly labeled mutants
P0            array of all 5622 model distributions, 1341 points each
r0            associated high-resolution distance vector
r_xy          xy-th percentile distance for each model distribution
r_iqr         inter-quartile range for each distribution
r_mean        mean distance for each distribution
r_median      median distance for each distribution
r_mode        modal distance for each distribution
r_std         standard deviation for each distribution
npeaks        number of significant local maxima for each distribution
skew          skewness for each distribution

timetraces_2LZM.mat
---------------------------------------------------------------------------
This file contains all unique noise-free time-domain DEER traces generated from the model distributions.

data.S0       noise-free time trace
data.Pidx     index of model distribution (P0 in distributions_2LZM.mat)
data.tmin     minimum t (microseconds)
data.tmax     maximum t (microseconds)
data.dt       time increment (microseconds)
data.nt       number of points
data.sigma    noise standard deviation
data.seeds    seeds for random-number generator to generate noise (one seed for each of 10 noise realizations)

Sdata_2LZM.mat
---------------------------------------------------------------------------
This file contains all noisy time traces generated from the noise-free traces.

Sdata.S       all noisy time traces
Sdata.tmin    minimum t (microseconds)
Sdata.tmax    maximum t (microseconds)
Sdata.dt      time increment (microseconds)
Sdata.nt      number of points
Sdata.sigma   noise standard deviation
Sdata.seed    seed for random-number generator
Sdata.sites   residue numbers of associated doubly labeled mutant
Sdata.idxP    index of associated model distribution (for distributions_2LZM.mat)
Sdata.idxS0   index of associated noise-free trace (for timetraces_2LZM.mat)
Sdata.idxseed index into the seed vector (in timetraces_2LZM.mat)
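The summary statistics listed for distributions_2LZM.mat (mean, standard deviation, median, inter-quartile range, mode, skewness) can be illustrated with a short Python sketch. It uses conventional textbook definitions on a uniform distance grid; the exact definitions used to build the test set are not stated here, so small differences from the stored values are possible. The function name distribution_stats and the Gaussian example are hypothetical and do not come from the test set.

```python
import math


def distribution_stats(r, P):
    """Conventional summary statistics of a distribution P on a uniform grid r."""
    dr = r[1] - r[0]
    area = sum(P) * dr
    p = [v / area for v in P]  # normalize to unit area

    mean = sum(ri * pv for ri, pv in zip(r, p)) * dr
    var = sum((ri - mean) ** 2 * pv for ri, pv in zip(r, p)) * dr
    std = math.sqrt(var)
    skew = sum((ri - mean) ** 3 * pv for ri, pv in zip(r, p)) * dr / std ** 3
    mode = r[max(range(len(p)), key=p.__getitem__)]

    # Cumulative distribution, used for percentile distances.
    cdf = []
    running = 0.0
    for pv in p:
        running += pv * dr
        cdf.append(running)

    def percentile(q):
        """Distance at which the cumulative probability reaches q percent."""
        target = q / 100.0
        for i, c in enumerate(cdf):
            if c >= target:
                if i == 0:
                    return r[0]
                # Linear interpolation between neighboring grid points.
                frac = (target - cdf[i - 1]) / (c - cdf[i - 1])
                return r[i - 1] + frac * (r[i] - r[i - 1])
        return r[-1]

    return {
        "mean": mean,
        "std": std,
        "skew": skew,
        "mode": mode,
        "median": percentile(50),
        "iqr": percentile(75) - percentile(25),
    }


# Hypothetical example: Gaussian centered at 4 nm with 0.3 nm width.
r_demo = [2.0 + 0.01 * i for i in range(401)]
P_demo = [math.exp(-0.5 * ((ri - 4.0) / 0.3) ** 2) for ri in r_demo]
stats = distribution_stats(r_demo, P_demo)
print({name: round(value, 3) for name, value in stats.items()})
```

Statistics of this kind make it possible to select subsets of the test set by distribution characteristics, for example narrow versus broad distributions.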

Citation

JMR-17-376

DOI