# Modelfree Tutorial using Ubiquitin

## Copyright 2009 Arthur G. Palmer

## Getting Started

Open a terminal window and change directory to:

>cd ubiquitin_tutorial/bin

A directory listing should give:

>ls modelfree4 pdbinertia quadric_diffusion r2r1_diffusion r2r1_tm r2r1ratio

Make certain that the executable versions in this directory run on your computer. If not, email Arthur Palmer (agp6@columbia.edu) to obtain executables for your machine.

Open a terminal window and change directory to:

>cd ubiquitin_tutorial/tutorial

A directory listing should give:

>ls 1ubq.prot.pdb NOE.dat R1.dat R2.dat fastMF13 setupFMF

Any other files can be removed. Any missing files can be copied into this directory from the following locations:

ubiquitin_tutorial/pdb/1ubq.prot.pdb ubiquitin_tutorial/rates/R1.dat ubiquitin_tutorial/rates/R2.dat ubiquitin_tutorial/rates/NOE.dat ubiquitin_tutorial/FASTModelFree/setupFMF ubiquitin_tutorial/FASTModelFree/fastMF13

## Relaxation Data

1. Record R_{1}, R_{2}, NOE and process to optimize sensitivity and resolution as needed. Frequently, the spectra will be processed differently to obtain optimal results for resolved and partially overlapped resonances. The present data were recorded for ubiquitin at 298 K on a 500.13 MHz NMR spectrometer. Each rate constant should be recorded in a file with the format

residue_number rate uncertainty

The residue_number field must increase numerically and should not include entries for residues with missing data. View the R_{1}, R_{2}, and NOE sample data.

2. Inspect graphs of the rates to identify problems (high uncertainties, disparate rates in secondary structure) and fix any problems (misassignments, too strong of resolution enhancement, etc.). View the R_{1}, R_{2}, and NOE sample graphs. Note that in the present case, the data for residue 31 seems rather unusual compared to other residues in secondary structure.

## Diffusion tensor

We will use the program *quadric_diffusion* to get an initial estimate of the diffusion tensor of ubiquitin. This program will compare isotropic, axial, and anisotropic diffusion models, but is only accurate for moderately anisotropic proteins. The program *r2r1_diffusion* does not have the latter restriction, but the current version does not fit fully anisotropic diffusion models.

1. Use *MolMol* or other program to generate a pdb file with H atoms added, if needed. The tutorial already provides a file 1ubq.prot.pdb that was generated by protonating 1ubq.pdb.

2. Translate the pdb file to the center of mass using the program *pdbinertia*:

>pdbinertia -t 1ubq.prot.pdb 1ubq.prot.trans.pdb

Input pdb file: 1ubq.prot.pdb Output pdb file: 1ubq.prot.trans.pdb # atoms read 1230 # atoms skipped 0

mass 8563.9004 center of mass 30.3128 28.8001 15.3504 principle moments 934104.0000 844540.6875 594139.0000 relative moments 1.0000 0.9041 0.6361

rotation matrix 0.7981 -0.4332 -0.4187 0.0741 0.7603 -0.6453 0.5979 0.4840 0.6389

3. Generate a file containing R_{2}/R_{1} ratios

>paste R2.dat R1.dat | \ awk '{print $1,"\t",$2/$5,"\t", ($2/$5)*sqrt(($3/$2)^2+($6/$5)^2)}' >\ r2r1.dat

View the r2r1.dat file. Alternatively, you can get some more diagnostics by running

../bin/r2r1ratio R2.dat R1.dat >r2r1.dat

4. Generate a file containing the τ_{m} values needed for the *quadric_diffusion* program by using *r2r1_tm*:

>r2r1_tm >ubq.tm.001.dat

Estimation of tm using R_{2}/R_{1}ratio

Input nucleus (13C or 15N): 15N The default bond length and csa are: 1.0200 A and -160.00 ppm Enter new values or <return> to accept: 1.02, -172 Input the proton spectrometer field used for R1 (MHz): 500.13 Input the proton spectrometer field used for R2 (MHz): 500.13 Input initial guess for tm (ns): 4 Input file name for R2/R1 data: r2r1.dat

View the ubq.tm.001.dat file.

5. Generate an input file for the *quadric_diffusion* program using a text editor. A sample file is provided as ubq.quadric.001.in:

# sample control file for ubiquitin 0.8 1.2 10 # minimum maximum gridsteps for axial ratio search 1 'N' # number_of_nuclei nucleus_type1 ... ubq.tm.001.dat # input data file 1ubq.prot.trans.pdb # input pdb file 1ubq.prot.axial.001.pdb # output pdb file for axial diffusion analysis 1ubq.prot.anis.001.pdb # output pdb file for anisotropic diffusion analysisOnce this file is created, run the

*quadric_diffusion*program:

>quadric_diffusion ubq.quadric.001.in > ubq.quadric.001.out.txt

The ouput file can be viewed as ubq.quadric.001.out.txt. This file has some residues with very large values of the χ^{2} variable. Notice that residues 23 and 25 have highly increased values of R_{2}, suggesting a chemical exchange contribution, and that residues 8-12, 31, 49, 62, and 73-76 have very low values of the NOE < 0.6, suggesting extensive backbone motions. These residues can be excluded from the analysis by either deleting them from the data file or by putting a '#' symbol in the first position of the line and editting the input control file to read the modified data file. The modified files are called ubq.quadric.002.in and ubq.tm.002.dat. Running these files using the command:

>quadric_diffusion ubq.quadric.002.in > ubq.quadric.002.out.txt

gives the output file ubq.quadric.002.out.txt. The results are summarized below:

**************** Isotropic Results ***************** Parameters Actual Diso (10^-7/s) 3.93880 +/- 0.01253

X2 127.0208 X2 (red) 2.309470

****************** Axial Results *******************

Parameters Actual Diso (10^-7/s) 3.88557 +/- 0.01691 Dpar/Dper 1.19953 +/- 0.03252 Theta 0.61955 +/- 0.08160 Phi 0.71427 +/- 0.17304

Parameters Jacknife Diso (10^-7/s) 3.88859 +/- 0.03470 Dpar/Dper 1.18243 +/- 0.06393 Theta 0.70515 +/- 0.21051 Phi 0.46150 +/- 0.21298

X2 76.16295 X2 (red) 1.464672

F 11.57435 #points 56

************* Anisotropic Results ******************

Parameters Actual Simulated Diso (10^-7/s) 3.89166 3.89122 +/- 0.01618 2Dzz/(Dxx+Dyy) 1.21094 1.21876 +/- 0.03286 Dxx/Dyy 0.91714 0.91313 +/- 0.01966 Theta 0.54477 0.60673 +/- 0.31689 Phi 0.63147 0.54930 +/- 0.56306 Psi -1.03673 -0.95727 +/- 0.45293

X2 59.55943 X2 (red) 1.191189

F 6.969250 #points 56

Note that

τ_{m} = 1/(6D_{iso})

so that for the anisotropic model, τ_{m} = 4.28 ns.

The results suggest that the best fitted diffusion tensor is the fully anisotropic tensor (the axial tensor is a significant improvement over isotropic and anisotropic is an improvement over axial, as judged by F-statistical testing). The relative orientation of the original pdb file and the structure rotated to the diffusion tensor frame is illustrated below:

The diffusion times (1/6τ_{m}) depend on Y_{20}(θ). The resulting graph for the axial diffusion model is shown below:

Note that the critical value of the F-statistic for α = 0.01 is 5.06, obtained using the numerator degrees of freedom of 2 (the difference between the number of fitted parameters for axial and anisotropic models) and the denominator degrees of freedom of 50 (the number of data points minus the number of fitted parameters for the anisotropic model). The residuals are plotted against the residue sequence below:

All of the residuals > 2 are located in loop regions, suggesting that in a more detailed analysis, one might try limiting the input data to residues in secondary structure and checking how the results would be changed by using other coordinate files.

Also note however, that the degree of anisotropy is relatively small, which is good because the current version of ModelFree is limited to axially symmetric diffusion models.

## Identifying Residues with Chemical Exchange Contributions to R_{2}

Powerful methods exist for identifying chemical exchange contributions to R_{2} independently of the ModelFree analysis. Performing these other experiments can help confirm that the exchange terms that are determined by ModelFree are reasonable. Confirming the presence of exchange effects by independent experiments is particularly important if the chemical exchange effects are of particular importance for the biological system or are highly unusual (for example, extensive R_{ex} values throughout the molecule).
One approach is to compare R_{2} to cross-correlation rates η_{xy}

_{2}measurements could be Hahn echo, CPMG, or R

_{1ρ}measurements. A graph comparing Hahn echo R

_{2}to η

_{xy}is shown below:

## ModelFree

The parameters as the simplest form of the model-free formalism are shown on the following schematic:

The *ModelFree* program requires creation of a number of input files and generates a number of output files. All of the input and output files are simple text files and can be created and editted using any text editor.

The *FASTModelFree* (http://xbeams.chem.yale.edu/~loria/software.php) interface helps automatically generate the input files needed to generate a series of ModelFree runs to select appropriate models for optimal fitting.

A considerable difficulty in fitting data is that selection of models and determination of the diffusion tensor can be highly coupled (and the program can be quite slow). As a result the usual approach uses the following steps:

1. Estimate the diffusion tensor (using quadric_diffusion for example).

2. Fix the current estimate of the diffusion tensor and determine models for internal motions for each residue

3. Using the selected internal models, optimize both internal models and the diffusion tensor.

4. Iterate steps 2 and 3 until (hopefully) convergence is reached.

Issues to consider:

1. Separating contributions from R_{ex} (chemical exchange) and diffusion anisotropy is much easier if relaxation data are acquired at >1 static magnetic field.

2. The FASTModelFree interface will perform model selection using F-statistics. If you want to do model selection using AIC or other criteria, then you will have to set up and run the ModelFree program yourself (or switch to a program like relax).

3. The FASTModelFree does an automated generic analysis. You should carefully examine the models, t-values, and residuals of the final models and go back and run the ModelFree program by hand to explore any unusual cases.

### Optimizing the isotropic diffusion correlation time

Optimizing an axial diffusion tensor can be very time-consuming. For the present tutorial, only an isotropic tensor was optimized; this still required a couple of hours of computer time.

The program file fastMF13 must be editted using a standard text editor to define the path to the Modelfree program executable. This has already been done for the present computers.

The FASTModelFree program is begun with the command:

>setupFMF

to start up a GUI interface for setting the necessary parameters. The properly completed page is shown below:

The output setupFMF is a file 'FMF.config'. This file is a text file that can be editted with a text editor if desired. The actual calculation was started by running

>fastMF13

The program ran a number of iterations with the following results

Iteration N(residues) χ^{2}χ^{2}/N τ_{m}

1 3.9 ns 68 166.4 2.45 2 3.997 68 171.4 2.52 3 4.075 68 158.2 2.33 4 4.119 69 173.8 2.52 5 4.165 70 175.3 2.50 6 4.196 69 163.0 2.37 7 4.208 69 163.1 2.36

Details about the different interations are recorded in the file ubiquitin.log.

The final results from each interation are summarized in a file 'ubiquitin.iterXX.par', in which XX is the iteration number (the number is omitted if the diffusion tensor was not optimized). The final results are given in the file ubiquitin.iter7.par. The final output file of the *ModelFree* program contains much more information in addition to that in the FASTModelFree .par file. An example of the file that would be generated is mfout.final. If the isotropic correlation time or diffusion tensor are not optimized, then a separate mfout file is produced for each model fitted to the data (mfout.1, mfout.2, mfout.3, mfout.4, mfout.5). Each of these files will contain the data for only those spins fit with each model.

Notice that the iterations of the program decided that the converged result was not the lowest reduced χ^{2}. The calculation was re-done by fixing the value of τ_{m} = 4.075 ns; the ubiqutin.log and ubiquitin.par files can be viewed. The following graph compares S^{2} values for the two different values of τ_{m} (in addition the two analyses eliminated different residues from the final fitted parameters).

You have to decide which of these analyses are the ones you wish to report-no single answer can be given to this question. However, you should be very careful about basing the main conclusions of a study on results that would be very different depending on which analysis you choose to utilize.

The final optimization gave the following parameters:

MANY things should be examined and thought about before publishing these results.

1. This final data set excluded residue 56. This residue appears to have a somewhat high R_{1} value, but some of the different iterations of the ModelFree program did not exclude it. Perhaps the initial guesses are not accurate enough (or the number of grid steps is not large enough).

2. Residue 31 has an unrealistically low value of S2 for a residue in secondary structure. The raw relaxation data already was noted to be unusual. Either the data for this residue is corrupted or the residue has been misassigned. Inspection of the HSQC spectrum for ubiquitin shows that residues 31 and 72 are nearly overlapped; thus, it is likely that the assignments for these two resonances were swapped inadvertently when analyzing the relaxation data. ALWAYS confirm assignments of any seemingly unusual results from the model-free analysis.

3. Some residues have rather large values of SSE (these can be compared with statistical distributions in the ModelFree output files). Again, perhaps initial guesses are not accurate enough (or the diffusion model is incorrect).

4. The very large values of correlation time for residues 22 and 31 are unlikely to be very accurate because they are so close to the overall rotational tumbling times.

5. The internal correlation times derived from data at a single field generally are not very accurate in general. Data recorded at two or more static magnetic fields helps dramatically.

6. Small values of R_{ex} (< 1-2 1/s) for many residues are likely to reflect error in the diffusion model more than actual chemical exchange.

7. Systematically large values of S2 in elements of regular secondary structure generally indicate aggregation. Check by dilution that R_{2} is independent of concentration before recording full data sets.

### Optimizing the diffusion tensor

To save time when running a full axial diffusion tensor optimization with FASTModelFree, the convergence criteria were increased (see the FMF.config file). The initial values for the tensor were set to the output values from the quadric program (τ_{m} = 4.28 ns and D_{ratio} = 1.182). The optimized results yielded results very similar to the results obtained for the isotropic optimization. The final results for the diffusion tensor are (τ_{m} = 4.272 ns and D_{ratio} = 1.189). A graph of the optimized S_{2} values are graphed below. Notice that 3 residues are significantly different between the two analyses. These are residues for which different spectral density models were chosen in the two analyses. Model selection can be a very significant problem when comparing results; fortunately, selection tends to be more stable if data is available at >1 static magnetic field.

### Optimizing the internal parameters for fixed diffusion correlation time

In the present tutorial, we will assume that the diffusion tensor is isotropic and hold it fixed at the final value above in order to finish the calculations in a reasonable amount of time. This means that we will perform only step 3 in the above flowchart for the last iteration.

>setupFMF

to start up a GUI interface for setting the necessary parameters. The properly completed page is shown below:

The output of *setupFMF* is a file 'FMF.config'. The actual calculation is started by running

>fastMF13

After the program finishes, you can examine the ubiquitin.log and ubiquitin.par files to see the final results.

## Spectral Density Mapping

An alternative approach to directly fitting the rate constants, as done by ModelFree, is to first use the reduced spectral density approach to calculate spectral density functions from the rate constants and then analyze the spectral density values as desired. This approach is most useful if data are available at more than one static field because

1. The static magnetic field dependence of R_{2} aids in separating R_{ex} (chemical exchange) contributions to transverse relaxation.

2. The static magnetic field dependence of the ^{1}H-^{15}N cross-relaxation rate constant, obtained from the NOE and R_{1}, can be used to determine the frequency dependence of J(ω) near the ^{1}H Larmor frequency, which is needed for the spectral density mapping approach.