Modelfree Tutorial using Ubiquitin

Copyright 2009 Arthur G. Palmer

Getting Started

Open a terminal window and change directory to:

>cd ubiquitin_tutorial/bin

A directory listing should give:

>ls
modelfree4		pdbinertia		quadric_diffusion	r2r1_diffusion		r2r1_tm			r2r1ratio

Make certain that the executable versions in this directory run on your computer. If not, email Arthur Palmer (agp6@columbia.edu) to obtain executables for your machine.

Open a terminal window and change directory to:

>cd ubiquitin_tutorial/tutorial

A directory listing should give:

>ls
1ubq.prot.pdb	NOE.dat		R1.dat		R2.dat		fastMF13	setupFMF

Any other files can be removed. Any missing files can be copied into this directory from the following locations:

ubiquitin_tutorial/pdb/1ubq.prot.pdb
ubiquitin_tutorial/rates/R1.dat
ubiquitin_tutorial/rates/R2.dat
ubiquitin_tutorial/rates/NOE.dat
ubiquitin_tutorial/FASTModelFree/setupFMF
ubiquitin_tutorial/FASTModelFree/fastMF13
     

Relaxation Data

1. Record R1, R2, NOE and process to optimize sensitivity and resolution as needed. Frequently, the spectra will be processed differently to obtain optimal results for resolved and partially overlapped resonances. The present data were recorded for ubiquitin at 298 K on a 500.13 MHz NMR spectrometer. Each rate constant should be recorded in a file with the format

		residue_number rate uncertainty

The residue_number field must increase numerically and should not include entries for residues with missing data. View the R1, R2, and NOE sample data.

2. Inspect graphs of the rates to identify problems (high uncertainties, disparate rates in secondary structure) and fix any problems (misassignments, too strong of resolution enhancement, etc.). View the R1, R2, and NOE sample graphs. Note that in the present case, the data for residue 31 seems rather unusual compared to other residues in secondary structure.

Diffusion tensor

We will use the program quadric_diffusion to get an initial estimate of the diffusion tensor of ubiquitin. This program will compare isotropic, axial, and anisotropic diffusion models, but is only accurate for moderately anisotropic proteins. The program r2r1_diffusion does not have the latter restriction, but the current version does not fit fully anisotropic diffusion models.

1. Use MolMol or other program to generate a pdb file with H atoms added, if needed. The tutorial already provides a file 1ubq.prot.pdb that was generated by protonating 1ubq.pdb.

2. Translate the pdb file to the center of mass using the program pdbinertia:

>pdbinertia -t 1ubq.prot.pdb 1ubq.prot.trans.pdb 
Input pdb file:  1ubq.prot.pdb 
       Output pdb file: 1ubq.prot.trans.pdb 
 # atoms read        1230      # atoms skipped           0
 mass                      8563.9004
       center of mass              30.3128          28.8001          15.3504
       principle moments       934104.0000      844540.6875      594139.0000
       relative moments             1.0000           0.9041           0.6361
 rotation matrix
       0.7981          -0.4332          -0.4187
       0.0741           0.7603          -0.6453
       0.5979           0.4840           0.6389

3. Generate a file containing R2/R1 ratios

>paste R2.dat R1.dat | \
		awk '{print $1,"\t",$2/$5,"\t", ($2/$5)*sqrt(($3/$2)^2+($6/$5)^2)}'  >\
 		r2r1.dat      

View the r2r1.dat file. Alternatively, you can get some more diagnostics by running

 ../bin/r2r1ratio R2.dat R1.dat >r2r1.dat 

4. Generate a file containing the τm values needed for the quadric_diffusion program by using r2r1_tm:

>r2r1_tm  >ubq.tm.001.dat    
Estimation of tm using R2/R1 ratio      
Input nucleus (13C or 15N): 15N      
The default bond length and csa are:   1.0200 A and  -160.00 ppm     
Enter new values or <return> to accept: 1.02, -172
Input the proton spectrometer field used for R1 (MHz): 500.13
Input the proton spectrometer field used for R2 (MHz): 500.13
Input initial guess for tm (ns): 4
Input file name for R2/R1 data: r2r1.dat    

View the ubq.tm.001.dat file.

5. Generate an input file for the quadric_diffusion program using a text editor. A sample file is provided as ubq.quadric.001.in:

         # sample control file for ubiquitin
         0.8 1.2 10                   # minimum maximum  gridsteps for axial ratio search
         1 'N'                        # number_of_nuclei nucleus_type1 ... 
         ubq.tm.001.dat               # input data file
         1ubq.prot.trans.pdb          # input pdb file
         1ubq.prot.axial.001.pdb      # output pdb file for axial diffusion analysis
         1ubq.prot.anis.001.pdb       # output pdb file for anisotropic diffusion analysis
Once this file is created, run the quadric_diffusion program:

>quadric_diffusion ubq.quadric.001.in > ubq.quadric.001.out.txt 

The ouput file can be viewed as ubq.quadric.001.out.txt. This file has some residues with very large values of the χ2 variable. Notice that residues 23 and 25 have highly increased values of R2, suggesting a chemical exchange contribution, and that residues 8-12, 31, 49, 62, and 73-76 have very low values of the NOE < 0.6, suggesting extensive backbone motions. These residues can be excluded from the analysis by either deleting them from the data file or by putting a '#' symbol in the first position of the line and editting the input control file to read the modified data file. The modified files are called ubq.quadric.002.in and ubq.tm.002.dat. Running these files using the command:

>quadric_diffusion ubq.quadric.002.in > ubq.quadric.002.out.txt 

gives the output file ubq.quadric.002.out.txt. The results are summarized below:

 **************** Isotropic Results *****************
 Parameters          Actual
 Diso (10^-7/s)     3.93880 +/-      0.01253
 X2    127.0208          X2 (red)    2.309470 
 ****************** Axial Results *******************
         
 Parameters          Actual
           Diso (10^-7/s)     3.88557 +/-      0.01691
           Dpar/Dper          1.19953 +/-      0.03252
           Theta              0.61955 +/-      0.08160
           Phi                0.71427 +/-      0.17304
 Parameters          Jacknife
           Diso (10^-7/s)     3.88859 +/-      0.03470
           Dpar/Dper          1.18243 +/-      0.06393
           Theta              0.70515 +/-      0.21051
           Phi                0.46150 +/-      0.21298
 X2    76.16295          X2 (red)    1.464672 
 F      11.57435          #points            56
 ************* Anisotropic Results ******************
         
 Parameters          Actual                 Simulated
           Diso (10^-7/s)     3.89166          3.89122 +/-      0.01618
           2Dzz/(Dxx+Dyy)     1.21094          1.21876 +/-      0.03286
           Dxx/Dyy            0.91714          0.91313 +/-      0.01966
           Theta              0.54477          0.60673 +/-      0.31689
           Phi                0.63147          0.54930 +/-      0.56306
           Psi               -1.03673         -0.95727 +/-      0.45293
 X2    59.55943          X2 (red)    1.191189
 F      6.969250          #points            56

Note that

τm = 1/(6Diso)

so that for the anisotropic model, τm = 4.28 ns.

The results suggest that the best fitted diffusion tensor is the fully anisotropic tensor (the axial tensor is a significant improvement over isotropic and anisotropic is an improvement over axial, as judged by F-statistical testing). The relative orientation of the original pdb file and the structure rotated to the diffusion tensor frame is illustrated below:

The diffusion times (1/6τm) depend on Y20(θ). The resulting graph for the axial diffusion model is shown below:

Note that the critical value of the F-statistic for α = 0.01 is 5.06, obtained using the numerator degrees of freedom of 2 (the difference between the number of fitted parameters for axial and anisotropic models) and the denominator degrees of freedom of 50 (the number of data points minus the number of fitted parameters for the anisotropic model). The residuals are plotted against the residue sequence below:

All of the residuals > 2 are located in loop regions, suggesting that in a more detailed analysis, one might try limiting the input data to residues in secondary structure and checking how the results would be changed by using other coordinate files.

Also note however, that the degree of anisotropy is relatively small, which is good because the current version of ModelFree is limited to axially symmetric diffusion models.

Identifying Residues with Chemical Exchange Contributions to R2

Powerful methods exist for identifying chemical exchange contributions to R2 independently of the ModelFree analysis. Performing these other experiments can help confirm that the exchange terms that are determined by ModelFree are reasonable. Confirming the presence of exchange effects by independent experiments is particularly important if the chemical exchange effects are of particular importance for the biological system or are highly unusual (for example, extensive Rex values throughout the molecule). One approach is to compare R2 to cross-correlation rates ηxy

The R2 measurements could be Hahn echo, CPMG, or R measurements. A graph comparing Hahn echo R2 to ηxy is shown below:

ModelFree

The parameters as the simplest form of the model-free formalism are shown on the following schematic:

The ModelFree program requires creation of a number of input files and generates a number of output files. All of the input and output files are simple text files and can be created and editted using any text editor.

The FASTModelFree (http://xbeams.chem.yale.edu/~loria/software.php) interface helps automatically generate the input files needed to generate a series of ModelFree runs to select appropriate models for optimal fitting.

A considerable difficulty in fitting data is that selection of models and determination of the diffusion tensor can be highly coupled (and the program can be quite slow). As a result the usual approach uses the following steps:

1. Estimate the diffusion tensor (using quadric_diffusion for example).

2. Fix the current estimate of the diffusion tensor and determine models for internal motions for each residue

3. Using the selected internal models, optimize both internal models and the diffusion tensor.

4. Iterate steps 2 and 3 until (hopefully) convergence is reached.

Issues to consider:

1. Separating contributions from Rex (chemical exchange) and diffusion anisotropy is much easier if relaxation data are acquired at >1 static magnetic field.

2. The FASTModelFree interface will perform model selection using F-statistics. If you want to do model selection using AIC or other criteria, then you will have to set up and run the ModelFree program yourself (or switch to a program like relax).

3. The FASTModelFree does an automated generic analysis. You should carefully examine the models, t-values, and residuals of the final models and go back and run the ModelFree program by hand to explore any unusual cases.

Optimizing the isotropic diffusion correlation time

Optimizing an axial diffusion tensor can be very time-consuming. For the present tutorial, only an isotropic tensor was optimized; this still required a couple of hours of computer time.

The program file fastMF13 must be editted using a standard text editor to define the path to the Modelfree program executable. This has already been done for the present computers.

The FASTModelFree program is begun with the command:

>setupFMF

to start up a GUI interface for setting the necessary parameters. The properly completed page is shown below:

The output setupFMF is a file 'FMF.config'. This file is a text file that can be editted with a text editor if desired. The actual calculation was started by running

>fastMF13

The program ran a number of iterations with the following results

Iteration N(residues)  χ2 χ2/N  τm 
1     3.9 ns	68	166.4	2.45
2     3.997	68	171.4	2.52
3     4.075	68	158.2	2.33
4     4.119	69	173.8	2.52
5     4.165	70	175.3	2.50
6     4.196	69	163.0	2.37
7     4.208	69	163.1	2.36


Details about the different interations are recorded in the file ubiquitin.log.

The final results from each interation are summarized in a file 'ubiquitin.iterXX.par', in which XX is the iteration number (the number is omitted if the diffusion tensor was not optimized). The final results are given in the file ubiquitin.iter7.par. The final output file of the ModelFree program contains much more information in addition to that in the FASTModelFree .par file. An example of the file that would be generated is mfout.final. If the isotropic correlation time or diffusion tensor are not optimized, then a separate mfout file is produced for each model fitted to the data (mfout.1, mfout.2, mfout.3, mfout.4, mfout.5). Each of these files will contain the data for only those spins fit with each model.

Notice that the iterations of the program decided that the converged result was not the lowest reduced χ2. The calculation was re-done by fixing the value of τm = 4.075 ns; the ubiqutin.log and ubiquitin.par files can be viewed. The following graph compares S2 values for the two different values of τm (in addition the two analyses eliminated different residues from the final fitted parameters).

You have to decide which of these analyses are the ones you wish to report-no single answer can be given to this question. However, you should be very careful about basing the main conclusions of a study on results that would be very different depending on which analysis you choose to utilize.

The final optimization gave the following parameters:

S2taue

RexSSE

MANY things should be examined and thought about before publishing these results.

1. This final data set excluded residue 56. This residue appears to have a somewhat high R1 value, but some of the different iterations of the ModelFree program did not exclude it. Perhaps the initial guesses are not accurate enough (or the number of grid steps is not large enough).

2. Residue 31 has an unrealistically low value of S2 for a residue in secondary structure. The raw relaxation data already was noted to be unusual. Either the data for this residue is corrupted or the residue has been misassigned. Inspection of the HSQC spectrum for ubiquitin shows that residues 31 and 72 are nearly overlapped; thus, it is likely that the assignments for these two resonances were swapped inadvertently when analyzing the relaxation data. ALWAYS confirm assignments of any seemingly unusual results from the model-free analysis.

3. Some residues have rather large values of SSE (these can be compared with statistical distributions in the ModelFree output files). Again, perhaps initial guesses are not accurate enough (or the diffusion model is incorrect).

4. The very large values of correlation time for residues 22 and 31 are unlikely to be very accurate because they are so close to the overall rotational tumbling times.

5. The internal correlation times derived from data at a single field generally are not very accurate in general. Data recorded at two or more static magnetic fields helps dramatically.

6. Small values of Rex (< 1-2 1/s) for many residues are likely to reflect error in the diffusion model more than actual chemical exchange.

7. Systematically large values of S2 in elements of regular secondary structure generally indicate aggregation. Check by dilution that R2 is independent of concentration before recording full data sets.

Optimizing the diffusion tensor

To save time when running a full axial diffusion tensor optimization with FASTModelFree, the convergence criteria were increased (see the FMF.config file). The initial values for the tensor were set to the output values from the quadric program (τm = 4.28 ns and Dratio = 1.182). The optimized results yielded results very similar to the results obtained for the isotropic optimization. The final results for the diffusion tensor are (τm = 4.272 ns and Dratio = 1.189). A graph of the optimized S2 values are graphed below. Notice that 3 residues are significantly different between the two analyses. These are residues for which different spectral density models were chosen in the two analyses. Model selection can be a very significant problem when comparing results; fortunately, selection tends to be more stable if data is available at >1 static magnetic field.

Optimizing the internal parameters for fixed diffusion correlation time

In the present tutorial, we will assume that the diffusion tensor is isotropic and hold it fixed at the final value above in order to finish the calculations in a reasonable amount of time. This means that we will perform only step 3 in the above flowchart for the last iteration.

>setupFMF

to start up a GUI interface for setting the necessary parameters. The properly completed page is shown below:

The output of setupFMF is a file 'FMF.config'. The actual calculation is started by running

>fastMF13

After the program finishes, you can examine the ubiquitin.log and ubiquitin.par files to see the final results.

Spectral Density Mapping

An alternative approach to directly fitting the rate constants, as done by ModelFree, is to first use the reduced spectral density approach to calculate spectral density functions from the rate constants and then analyze the spectral density values as desired. This approach is most useful if data are available at more than one static field because

1. The static magnetic field dependence of R2 aids in separating Rex (chemical exchange) contributions to transverse relaxation.

2. The static magnetic field dependence of the 1H-15N cross-relaxation rate constant, obtained from the NOE and R1, can be used to determine the frequency dependence of J(ω) near the 1H Larmor frequency, which is needed for the spectral density mapping approach.