Google

REFMAC (CCP4: Unsupported Program)

NAME

refmac - macromolecular refinement program (old version).

SYNOPSIS

refmac XYZIN foo_cycle_i.brk HKLIN foo.mtz PROTOUT protout.cycle_i PROTCOUNTS counts.cycle_i XYZOUT foo_cycle_j.brk HKLOUT foo_cycle_j.mtz
[Keyworded input]

DESCRIPTION

The REFMAC program can carry out rigid body, restrained or unrestrained refinement against Xray data, or idealisation of a macromolecular structure. It minimises the coordinate parameters to satisfy either a Maximum Likelihood or Least Squares residual. There are options to use different minimization methods. If the user wishes to invoke geometric restraints the program PROTIN which analyses the protein geometry and produces an output file containing restraints information must be run prior to running REFMAC. REFMAC also produces a MTZ output file containing weighted coefficients for SigmaA weighted mFo-DFcalc and 2mFo-DFcalc maps, where "missing data" have been restored.

[Program organisation and communication with it is illustrated in the EXAMPLES section.]

REFERENCES

If you use REFMAC please refer to one of these papers!!!

  1. "Application of Maximum Likelihood Refinement" G. Murshudov, A.Vagin and E.Dodson, (1996) in the Refinement of Protein structures, Proceedings of Daresbury Study Weekend.
  2. "Refinement of Macromolecular Structures by the Maximum-Likelihood Method" G.N. Murshudov, A.A.Vagin and E.J.Dodson, (1997) in Acta Cryst. D53, 240-255
  3. "Efficient anisotropic refinement of Macromolecular structures using FFT" G.N.Murshudov, A.Lebedev, A.A.Vagin, K.S.Wilson and E.J.Dodson (1999) Acta Cryst. section D55, 247-255

KEYWORDED INPUT

Anything input on a line after an exclamation mark ("!" or "#") is ignored and lines can be continued by using a minus sign. The program only checks the first 4 characters of each keyword. The order of the cards is not important except that an END card must be last. Each keyword has various subsiduary keywords.

Principal keywords controlling Xray refinement:

LABIN, REFIne, SCALe, SIGMaa, WEIGht

All keywords for Xray refinement :

BINS/RANGE, BLIMit, CELL, DAMP, FREEflag, LABIn, LABOut, MODE, MONItor, NCYCLE, PHASE, REFIne, RIGIdbody, SCALe, SCPArt, SIGMaa, SYMMetry, WEIGht, END/GO

Keywords controlling geometric restraints:

BFAC/TEMP, CHIRal, DISTance, NCSR/NONX, PLANe, RBONd, SPHEricity, TORSION, VDWR, HOLD
(Their function is very similar to that in PROLSQ)

Optional keywords controlling the data harvesting functionality:

PNAME, DNAME, PRIVATE, USECWD, RSIZE, NOHARVEST

A) Essential Refinement keywords:

LABIN <program label>=<file label>...

This card is essential for all refinement except idealisation of geometry. To some extent the course of the refinement is governed by the assignments given.

The following program labels can be assigned:

FP SIGFP   FREE   FPARTi PHIPi   HLA HLB HLC HLD  or  PHIB FOM

Assignments for FP and SIGFP are always required.

The use of the free R flag is highly recommended. This is an important component in using maximum likelihood sensibly. If FREE is assigned, reflections which are flagged with nfree_exclud (default 0) are excluded from the gradient calculation, and therefore the agreement between them and Fc is not part of the refinement procedure.

REMARK 0: It is strongly recommended to run the uniqueify script on the first dataset as soon as possible, e.g. after TRUNCATE. One purpose of this script is to add a column of free-R flags, and it is important for the validity of the free-R approach that this is done before any model refinement. If you are continuing model refinement with a new data set, it is important to preserve the FreeR assignment used before. See FreeR assignment.

REMARK 1: Reflections flagged for free R flag calculation are omitted from any parameter refinement, and also from the scale and B-factor calculation. For ML refinement the default is to use them to estimate SigmaA. [See SCALe and SIGMA keywords.]

If you wish to add known FPART(s) to the structure factors, assign FPARTi and PHIPi. This option gives the program a lot of flexibility. Possible examples of it use are: 1) Adding in calculated hydrogen contributions. 2) Adding in anisotropic metal atoms. 3) Adding in contributions from unmodelled parts of a structure. Eg: A solvent continuum, Uninterpretable parts of an MIR map.

REMARK 2: See SCPART keyword: The FPARTi will be added to the FC without any further scaling unless this is set.

PHIB FOM An input phase and its figure of merit.

Or: HLA HLB HLC HLD The Hendrickson-Lattman coefficients describing an input phase. These can be obtained from the usual routes; MIR, MAD plus density modification.

REMARK 3: In our experience using DM phases gave better result than using MIR phases. The FOM weighting may need to be changed. See PHASe keyword

LABOUT <program label>=<file label> ...

FP SIGFP (and FREE if available) plus the following additional columns are output. Labels can be assigned: FC PHIC FWT PHWT DELFWT PHDELWT FOM [PHCOMB]

By default, FP and SIGFP are scaled to match FC, using 1/K0. The overall B correction, which can be anisotropic, is applied to the FC terms. IF you have APPLY OBSERVED the inverse of this B factor is applied to FP.

FC PHIC are the structure factor contributions from the input atoms only, WITHOUT any FPARTi terms added.

FWT and DELFWT are similar to the terms output from SIGMAA:

DELFWT and PHDELWT - Fourier amplitude and phase for 'difference' map (mFo-DFCalc)
FWT and PHWT - Fourier amplitude and phase for '2Fo-Fc' map (2mFo-DFCalc)

PHCOMB is the combined phase to use with FP. It is output only if experimental phases were input.

If prior phase probabilty distribution is available then all output Fourier coefficients correspond to combined phases.

FOM = <m> - The "figure of merit" for this reflection . See Paper for definition ( Ref ??)

For the FWT and DELFWT terms, FP is scaled by 1/K0, FCalc= vector sum of {Dc FC + D1 FPART1 + D2 FPART2 + .. }, Di are the resolution dependent SIgmAA term, and m is the figure of merit of this reflection.. (PHDELWT and PHWT will be "combined" phases if HLA etc or PHIB FOM have been assigned.) Otherwise they equal either PHIC or PHIC+180.

Missing Data: For those reflections where the FP are missing mFo is set equal to dFc. Hence the terms become FWT=dFC and DELFWT=0.0 The m and D are based on the program's estimate of SigmaA.

Rebuilding into these 2mFo-DFcalc and mFo-DFcalc maps seems easier than using classic nFO-(n-1)FC and difference maps, consistent with the established technique for SigmaA style maps. One advantage here is that since the m and D values are based on the Free set of reflections they are less biased than the values obtained by the ccp4 version of SIGMAA after refinement.

REFI [ TYPE | PHAS | RESI | BREF | METH | RESO ]

For example:

 
           REFI TYPE RESTrained RESOLUTION  20 1.50
           REFI RESI MLKF
           REFI BREF ISOTropic    METH CDIR 
             or
           REFI TYPE REST  RESO 20 1.50  RESI MLKF  BREF ISOT METH CDIR
             or 
           REFI TYPE RIGID  (all other definitions are defaults)

This keyword controls the type of refinement or idealization. Subkeywords:

TYPE RESTrained | UNREstrained | IDEAlise | RIGID
[Default RESTrained]
PHASed SCBLurred <scblur> BBLUrred <bblur> SIGMacalc
[Default : only used if PHASE definition given on LABIN ; scblur = 1.0, bblur = 0. Do not use phases for SigmA estimation even if available]
RESIdual LSQF | MLKF
[Default MLKF ]
BREFinement OVERall | ISOTropic | ANISotropic | MIXED
[Default ISOTropic] [ANISotric MIXED]
METHod CGMAt | CGRAd | CDIR
[Default CDIR]
RESO <resmin> <resmax>
[Default: all data]

In more detail, these subkeywords are:

TYPE
This keyword describes type of refinement.
RESTrained invokes restrained refinement, where both the Xray residual (reflecting the agreement between the observed and the calculated Fs, and the geometric residual (reflecting the fit between the expected and the observed geometry) are minimised at the same time. The relative weighting of these two terms is given by values of Wxray and Wgeom. These can be set using the WEIGht keyword.
UNREstrained is for unrestrained refinement. [ie: Wxray =1, Wgeom=0]
IDEAlised is for geometric idealization. [ie: Wxray =0, Wgeom=1]
RIGID invokes rigid body refinement. The description of domains is given below. Alternative to MODE RIGID.
PHASed
SCBL <scblur> BBLUr <bblur>
If experimental phases are being used it may be necessary to blur the phase probabilities, especially after some density modification calculations. (This information can also be input with the keyword PHASE - see below.)

Program will apply blurring as follows:

      HLAnew = HLA*scblur*exp(-(sin(theta)/lambda)**2*bblur)
      HLBnew = HLB*scblur*exp(-(sin(theta)/lambda)**2*bblur)
      HLCnew = HLC*scblur*exp(-(sin(theta)/lambda)**2*bblur)
      HLDnew = HLD*scblur*exp(-(sin(theta)/lambda)**2*bblur)

or

     If PHASE and FOM are given: the program first generates 
     the Hendrickson-Lattmann coefficients 
     using the formula HLA = Func(FOM)*COS(DEGTOR*PHASE), 
                       HLB = Func(FOM)*SIN(DEGTOR*PHASE),
                       HLC = HLD = 0. 
     ie the Phase probability distribution is unimodal.

SIGMAcalc - use vector differences when estimating SIGMaa; ie use the phase information.
RESIdual
This keyword describes the Xray part of the function.
LSQF - defines amplitude based least-squares residual.
        Fxray = SUM(Whkl*(|FO|-|FC|)**2)
MLKF - A -loglikelihood residual derived from Rice distribution for centric and acentric cases of Fs.
        Fxray = SUM(LLKcentric_hkl) + SUM(LLKacentric_hkl)
If experimental phase information is available the residual is modified appropriately. This invoked by assigning appropriate input columns; see key word LABIN. (For methodology see G.N. Murshudov, A.A.Vagin and E.J.Dodson,(1997) in Acta Cryst. D53, 240-255)
METHod
This keyword describes method of minimization (see Tronrud, Acta Cyst 1992 A48 pp 912-916).
CGMAT is sparse matrix as in PROLSQ.
CGRAD is conjugate gradient.
CDIR (default) is the conjugate direction method

REMARK: If you are using IDEAlise it is better to use CGRAD option. When using RESTrained option CDIR will work much better. Maybe for NCS the CGMAT is more appropriate; it is the only one which considers off-diagonal terms for atom-atom interactions, and these can be large when the NCS restraints are tight. If the minimisation fails it may be worth trying another option.

BREFinement
This keyword describes method for assigning atomic Bvalues.
OVERall :- Overall B-factor (Boverall) obtained from scaling is added to the atomic Bvalues.
ISOTropic :- Individual isotropic B-factor refined for all atoms.
ANISotropic - Individual anisotropic B-factor refined for all atoms
MIXEd - Some atoms with isotropic, some with anisotropic B-values. In this case input file (pdb) defines which atom should be refined isotropicly and which anisotropicly. Only atoms with ANISOU card are refined anisotropicly. (This option has not been tested extensively).
RESO
<resmin> <resmax> [ Default: all data]
<dmin>, <dmax> (default 1, 1000) are resolution limits used for refinement in Angstroms (or in 4*sin**2/l**2 if both are < 1.0. They can be given in either order. If only one value is given, it is assumed to be the high resolution cutoff.

Include all well measured data, not omitting the weak observations; it will be weighted appropriately. The low resolution data helps define the solvent shell. However if you have lost strong terms by some accident of data collection the scaling may not behave well.

SCALe [TYPE <BULK|SIMP>] [BAVER <baverage>] [LSSC [ANIS] | [FIXBulk <scbulk> <bbulk>] | [RESO <resmin> <resmax>] | [NCYC <ncyc>] | [EXPE] | [FREE] | [APPL <OBSE|CALC>] ]

The SCALE keyword has several different options.. See below for keywords for estimation of SIGMaA, triggered by SCALe MLSC. For example:

       SCAL TYPE BULK    LSSC ANIS    RESO 10 2.1
       SCAL TYPE SIMP    LSSC EXPE 
       SCALe LSSC FIXBulk BBULk 70.0 SBULk -0.75 ! Fix bulk solvent parameters for LSQ scaling

These keyword controls the type of scaling of Fo and Fc. Subkeywords:

TYPE BULK | SIMPle
[Default BULK]
BAVERage <baverage>
[If there is not sufficient data to refine a B value it is possible to hold it at some sensible value derived from the Wilson plot.]
LSSC
Flag to indicate all following keywords apply to estimation of scale between Fo and Fc.
ANISotropic
[Default isotropic overall scale]
FIXBulk <scbulk> <bbulk>
[Lower resolution structures may not have sufficient data to find sensible overall scales and B values for both the BULK and the protein component. It can help to fix these]
RESO <resmin> <resmax>
[Default: all data used for refinement]
NCYCle <ncyc>
[Default ncyc = 10]
EXPE
[Default is to not use experimental sigmas in the determination. the keyword EXPE changes this to use experimental sigmas]
FREE
[Default Scales are calculated against the WORKing set of reflections, but if requested it can be derived from the FREE set.]
APPLY OBS | CALC
[Default - ouput file contains Fobs modified to Fcalc scale]

In more detail, these subkeywords are:

TYPE
with one of the following sub-subkeywords:
BULK
(Default.) If TYPE BULK, then the scale KB is a function of 4 variables with the form:
                  KB = K0*exp(-B0*s^2) * (1- K1*exp(-B1*s^2))  

The scale formulation is based on the Babinet principal and described by Dale Tronrud and others. Ref?? TNT manual. Better results can be obtained by more sophisticated modelling, but this is a simple compromise.

SIMPLE
If TYPE is SIMPle the scale factor has the form:
                   KB = K0*exp(-B0*s^2)  (Simple Wilson scaling)
                   ie: K1 = 0

This may be more appropriate if you are adding bulk solvent contributions as an FPARTi term or you have parameterised most of the expected waters.

BAVErage <baverage>
Lower resolution structures may not have sufficient data to give a robust Wilson plot overall B factor, so it is possible to fix the <B> for the structure to a set value. If you are using this option it is important to add remaining B-value to observed structure factors.
FIXB <scbulk> <Bbulk>
Lower resolution structures may not have sufficient data to find sensible overall scales and B values for both the BULK and the protein. SCBULK = <solvent_density>/<protein_density> Ie: For aqueous solvent, with solvent density ~ 1.0. and protein density ~ 1.35, SCBULK ~ 1.0/(1.35)
ANISO
Many crystals generate seriously "anisotropic" reflection data. This is presumably due to some crystalline disorder, and is not the same as anisotropy of individual atoms. However the correction can be expressed in a similar form.

The isotropic overall B factor B0, is replaced by:

      B11*s1*s1+2.0*B12*s1*s2+2.0*B13*s1*s3+B22*s2*s2+2.0*B23*s2*s3+B33*s3*s3 

i.e. [s] [B] [s]* where:

      [s] is vector in reciprocal orthogonal system 
                  = [s1 s2 s3] = [ h k l ] [RFR] ,
      [h k l] are the Miller indices;
      RFR is the fractionalisation matrix.
            [B11 B12 B13] 
      [B] = [B21 B22 B23] ;  B21 = B12, B31 = B32, B32 = B23.
            [B31 B32 B33] 

Anisotropic scaling of data should ideally be done at the merging stage but often the distortion aligns with the crystal axes, and therefore cannot be detected ifrom symmetry equivalent reflections alone. Large improvements in Rfactors can result from this correction.

RESO
<sub-subkeywords> <resmin resmax> or <dmin> <dmax>
[Default all data used for refinement]

Defines resolution limit for scaling. If you are using the Wilson plot option with only one gaussian for scaling this keyword allows you to exclude the low resolution data which is affected by solvent. Advice: use 4.5A to the highest resolution.

NCYC <ncyc>
Default: <ncyc> = 10
EXPE
Default is to not use experimental sigmas in the determination. the keyword EXPE changes this to use experimental sigmas.
FREE
Default is to use all reflections in the WORKing set for scaling. The keyword FREE changes this to determine the scale from the FREE set of reflections.
APPL OBSE|CALC
APPL OBSE | CALC will apply overall Bcorrection to either observed or calculated structure factors.

NB: Before applying bulk solvent scaling and including all low resolution data Check your distribution of <F> looks sensible. This is the raw material for all overall scaling algorithms. A good way to check this is to look at a <Fsq> plot against resolution.

This should look something like this:

         +
          +           +
           +        +     +
             +    +          +
                +                  +
       <10A     5A    4.5A          ............

If the low resolution looks strange - it may mean your backstop was causing problems, intensities were saturated etc etc, and including such data may give crazy solvent scales. A sensible sort of value would be : Fc 1.0 0.0 Bulk Solvent -0.4 200.0

Program always first calculates log scaling assuming Bs = exp(-B1*s^2) This minimises: LFoKBFc = sum(ln(|FO|-ln(K)-ln(|FC|)+B1*S^2)^2 where the FC is simply derived from the input coordinates, without any addition of FPARTi terms. (This option does not require iterative cycles of scaling) and after this it scales KFoBFc =sum(K*|FO|-Bs*|FC|)^2 where the FC include FPARTi and Bulk solvent correction if requested.

NB-=======================================================================
NB-=======================================================================
NB-=======================================================================
We are not really sure how best to handle scaling. If you have problems please get in touch. In our experience there have been no problems with data sets with resolution 2.5A or higher, unless there was some obvious flaw; huge ice rings or Is labelled as Fs or some such thing. But with one unusual data set which died at 2.7A there has been a problem, which we got round by tweaking parameters, but these cases should be automatically checked.
NB-=======================================================================
NB-=======================================================================
NB-=======================================================================

NOTE: When doing ML refinement the scale factors are only used to calculate R values.

SCALe MLSC [ <NCYC <ncyc> | WORK | FIXBulk <scbulk> <bbulk> ]

For example:

       SCALe MLSC FIXBulk BVALue 100.0 SCVAlue -0.1 

The SigmA estimate SA is generally fitted to the normalised Free reflections using a 4 parameter equation of an analogous form to the bulk scaling:

                  SA = SA0*exp(-T0*s^2) * (1- SA1*exp(-T1*s^2))

These keywords controls the estimation of SA. Subkeywords:

FIXBulk
The options FIXBulk to fix parameters can be evoked in the same way as for the SCAL LSSC options, but should only be used with EXTREME care! There is no logical way to estimate expected values.
NCYCle <ncyc>
[Default <ncyc> = 10]
Use <ncyc> iterative cycles to determine the parameters.
WORK
[Default Sigmaa is calculated against the FREE set of reflections]
The keyword WORK changes this to determine the scale from the WORKing set of reflections.

WEIG [NOEX|EXPE] MATRix <wmat> | GRAD <wgrad> | SIGMA <wsigma>

[Default EXPE MATR 0.5]
This keyword controls the weighting of the terms

NOEX
Exclude experimental sigmas from weighting.

This sub-keyword allows you not to use experimental sigmas of the observations for the Xray residual. The default action is to use them.

The remaining sub-keywords control the relative weighting of the X-ray and geometry terms in the residual.

MATR <wmat>
[Default 0.5] Weighting using average diagonal term of Xray and geometry Hessians (same as PROLSQ). Weighting equates wmat*average_diagonal_of_geometry to average_diagonal_of_Xray terms. Increase <wmat> to increase the weight on the Xray terms; decrease <wmat> to tighten the geometry.
GRAD <wgrad>
[Default 1.0] Weighting using norms of Xray and geometry gradient vectors. Weighting equates wgrad*norm_of_geometry_gradient to norm_of_Xray_gradient. Increase <wgrad> to increase the weight on the Xray terms; decrease <wgrad> to tighten the geometry.
SIGMA <wsigma>
[Default 0.5] <wsigma> is applied to Xray terms as 1/(wsigma**2). Decrease <wsigma> to increase the weight on the Xray terms; increase <wsigma> to tighten the geometry.

B) Other Refinement keywords:

BINS | RANGE <nbins> | <range>

Number of bins <nbins>. Default 20, maximum allowed 100. For least-square refinements it is useful only for monitoring statistics on resolution ranges. Used for normalisation to convert Fs to Es.

Or <range>. If the value after BINS/RANGE is less than 1.0 this is assumed to define the bin width in units of 4*sin**2/Lambda**2

BLIM <Bmin> <Bmax>

[Default 2.0 1000.0]
Bmin and Bmax are minimum and maximum B-factors allowed. This keyword is not recommended. The lower limit is required by the program, but is set by default.

CELL <a> <b> <c> [ <alpha> <beta> <gamma> ]

Defines parameters of the cell. If this keyword is specified then cell parameters from MTZ and coordinate files will be overridden. This keyword could be important when cell dimensions are suspect and the user wants to change them.

DAMP <Pdamp> <Bdamp>

[Default 1.0 1.0 for resolution > 2.5A, 0.5 0.5 for resolutions < 2.5]
<Pdamp> <Bdamp> scales shifts after each cycle of refinement. If there is limited data, or you are using unrestrained refinement it is sensible to scale down the shifts.

FREE <nfree_exclud>

[Default 0]
The default value for exclusion for the FreeR is zero. However there is an opportunity to reset this exclusion flag here. See the description of the program FREERFLAG.

MODE [HKRF | RIGID]

[Default HKRF]
HKRF
This invokes standard refinement of individual atomic parameters.
RIGID
This invokes RIGID body refinement. The description of domains is given below under keyword RIGID.

MONI <subkeyword 1> [<subkeyword i> <value>]

<subkeyword 1> can be one of:

NONE
means no monitoring. Program will work without any output info.
FEW
means only overall statistics will be monitored.
MEDI
means MEDIum number of statistics will be monitored. You can choose quantity of statistics by choosing numberi.
MANY
means all available info about statistics will be monitored.

The subsequent <subkeyword i> are (only used if output requested is MEDIum):

TORSION <badtor>
Print restrained torsion angles differing by more than BADTOR * torsig from the ideal value. Default is 3.
HBOND
Print distance information on possible hydrogen bonds. Not printed by default.
DISTANCE <baddis>
Count restrained distances differing by more than BADDIS * dissig from the ideal value and print dis- tances differing by more than 2 * BADDIS * dissig. Default is 3.
PLANE < badpln>
Print restrained planes differing by more than BAD- PLN * plnsig from the ideal value. Default is 2.
VANDERWAAL (or VDWR) <badvdw>
Print contact distances that are closer than badvdw Angstrom from the ideal. Default is .6 Angstrom.
XSHIFT <badshi>
Print shifts that are greater than BADSHI Angstrom. Default is 0.4A.
CHIRAL <badchi>
Print restrained chiral volumes differing by more than BADCHI * chisig from the ideal value. Default is 2.5
BSHIft <dbcut>
Needs to be explained. [ default 0.25]

NCYC <ncycref>

[Default 5]
This keyword defines number of cycles of refinement done before PROTIN is run again.

PHASe SCBL <scblur> | BBLUr <bblur>

(See above; this functionality can also be input on the REFI line.) If experimental phases are being used it may be necessary to reduce their assigned figure of merits,especially after some density modification calculations.

Program will apply blurring as follows:

      HLAnew = HLA*scblur*exp(-(sin(theta)/lambda)**2*bblur)
      HLBnew = HLB*scblur*exp(-(sin(theta)/lambda)**2*bblur)
      HLCnew = HLC*scblur*exp(-(sin(theta)/lambda)**2*bblur)
      HLDnew = HLD*scblur*exp(-(sin(theta)/lambda)**2*bblur)

or

     If PHASE and FOM are given: the program first generates HLA and HLB 
     using the formula HLA = Func(FOM)*COS(DEGTOR*PHASE), 
                       HLB = Func(FOM)*SIN(DEGTOR*PHASE), 
                       HLC = HLD = 0. 
     ie the Phase probability distribution is unimodal.

RIGIDbody GROUP | PRINT | NCYCLE

Rigid body refinement in REFMAC is performed as following. Program first finds the center of mass of all domains. This is defined as (Tgx,Tgy,Tgz) where:

     TgX = sum(Occ(i)*X(i))/Natom 
     TgY = sum(Occ(i)*Y(i))/Natom    summed over all atoms in the domain.
     TgZ = sum(Occ(i)*Z(i))/Natom

Both Euler and polar rotation angles and the coordinates of the center of mass are refined for each domain. (i.e. program finds Delta_Alpha, Delta_Beta, Delta_Gamma, and deltaTg).

REMARK: In the solution of linear equation to obtain the shifts the program uses singular value decomposition, thus avoiding problems with singularities such as that when Beta =~ 0.0 Only alpha+gamma can be determined.

Subkeywords:

GROUp <ngroup> FROM <resnum> <chnid> TO <resnum> <chnid> | EXCLude | TRANS | EULErangles
<ngroup>
Domain number. Maximum number of domains in the current version of program is 100. Each domain can consist of 100 different pieces. For example:
RIGIDbody GROUP 1 FROM 10  A TO 100 A  }  A domain containing most of the A chain, and 
RIGIDbody GROUP 1 FROM 108 A TO 176 A  }  an embedded bit of a B chain, consisting of 
RIGIDbody GROUP 1 FROM 200 B TO 220 B  }  residues 200-220. 
..
RIGIDbody GROUP 2 FROM 10  B TO 100 B  } for example the equivalent domain across a NCS axis
RIGIDbody GROUP 2 FROM 108 B TO 176 B  }
RIGIDbody GROUP 2 FROM 200 A TO 220 A  }

<resnum> <chnid>
Spans are specified by the residue number, and the chain ID. See examples. By default all atoms belong to one domain.
EXCLude SCHAin | MCHAin | NONE
Exclude main chain, or side chain atoms from refinement. Rotation and translation are determined for the defined subset. Rotation and translation are applied for all atoms also. Default is NO EXCLUSION.
TRANS <Tx> <Ty> <Tz>
To add translation for this domain before starting refinement. Default is 0.0 0.0 0.0
EULErangles <alpha> <beta> <gamma>
Rotate domain by these angles (in degrees) before starting refinement (Note that all rotations are performed around center of mass of this domain). Default is 0.0 0.0 0.0
PRINT EULEr MATRix
Print rotation angles (EULErangles) and rotation MATRix. Default is EULErangles
NCYCle <ncycle>
Number of cycles for rigid body refinement. Default is 40 cycles

Note: For rigid body refinement you don't need to run protin or other supporting programs.

SCPArt < nsc1> <nsc2> <nsc3> ...

If NSCi are set, the FPARTnsci is scaled relative of the FC FC_tot = FC_calc(PHIcalc) +nsc1*FPART1(PHIP1) + nsc2*FPART2(PHI2) + ....

If NSCi set - that partial structure factor will be scaled

SYMM

Defines space group symmetry. If MTZ file is defined then symmetry from MTZ will be used. This keyword is essential for idealisation.

C) Restraints keywords:

The following key words are used to set the weighting of the restraints. We have attempted to make the default values agree with those used for PROLSQ.

DISTance <wdskal> <sigd1> <sigd2> <sigd3> <sigd4> <sigd5>

[Default 1.0 0.02 0.04 0.05 0.05 0.02]

<WDSKAL> is a multiplier used in calculating the weights for the distance restraints. The weights are of the form: Weight = WDSKAL*(model distance - ideal distance) / SIGD**2

<SIGD1> is the a value associated with bonded distances (e.g. Calpha-C).

<SIGD2> is the a value associated with a non-bonded distance determining an angle to be restrained (e.g. N - C).

<SIGD3> is the a value for a planar 1-4 distance (e.g. O(i-1)-Calpha(i)).

<SIGD4> is the a value associated with a bond distance involving hydrogen (e.g. Calpha - Halpha).

<SIGD5> is reserved for special distance groups (as set up with KDWT = 5 in the dictionary of ideal atomic dis- tances).

PLANe <wpskal> <sigpp> <sigpa>

[Default 1.0 0.03 0.02]

<WPSKAL> is a multiplier for the distance restraints defining planes. The weight used for the peptide planes is (WPSKAL / SIGPP)**2, and for the aromatics (WPSKAL / SIGPA)**2.

<SIGPP> is the sigma value associated with distances used to define peptide planes (e.g. 0.03).

<SIGPA> is the sigma value associated with distances used to define aromatic planes (e.g. 0.02 ).

CHIRal <wcskal> <sigc>

[Default 1.0 0.15]

<WCSKAL> is used in weighting Chiral groups restraints. The weight used is (<WCSKAL> / <SIGC>)**2

<SIGC> is the value associated with Chiral groups restraints (e.g. 0.02 to 0.04).

TEMPerature | BFAC <wbskal> <sigb1> <sigb2> <sigb3> <sigb4>

[Default 1.0 2.0 3.0 2.0 3.0 ]

<WBSKAL> is the used in calculating the weight for the tem- perature factor restraint parameters based on the types of bonding in which the atoms are involved. (See DISTANCE keyword.)

SPHEricity <sigsph>

[Default 2.0 ]

<sigsph> is used to restraint atomic anisotropic tensor to be spherical

RBONd <sigrbond>

[Default 1.0 ]

<sigrbond> is used to make minimum of projections of anisotropic tensors along bond for the bonded atoms.

NCSR <wsskal> <sgsp1> <sigsp2> <sigsp3> <sigsb1> <sigsb2> <sigsb3>

[Default 1.0 0.05 0.5 5.0 0.5 2.0 10.0]

<WSSCAL> is used in weighting restraints involving non- crystallographic symmetry. The weight is given by (<WSSCAL> / <SIGS>)**2

<SIGSP1>, <SIGSP2>, <SIGSP3> are a values associated with non- crystallographic positional restraints and are for tight, medium and loose restraints respectively.

<SIGSB1>, <SIGSB2>, <SIGSB3> are a values associated with non- crystallographic thermal restraints and are for tight, medium and loose restraints respectively.

TORSION <wtskal> <sigt1> <sigt2> <sigt3> <sigt4>

[Default 1.0 15.0 3.0 15.0 20.0]

<WTSCAL> is a multiplier used in calculating the weights for the torsional angle restraints. The weight is of the form Weight = (<WTSKAL> / <SIGT>)**2

<SIGT1> is the a value associated with a pre-specified angle (usually Phi and Psi of a regular secondary structure).

<SIGT2> is the a value associated with a planar angle (e.g. omega).

<SIGT3> is the a value associated with a staggered angle (e.g. Chi1).

<SIGT4> is the a value associated with an orthonormal angle (e.g. Chi2 of aromatics).

Starting values of say 20 degress are suggested for <SIGT1> to <SIGT4>.

VANDerwaal (or VDWR) <wvskal> <sigv> <dinc(1)> <dinc(2)> <dinc(3)>

[Default 1.0 1.0 -0.3,0.0]

<WVSKAL> is used in the weighting scheme for Van der Waals contacts. The weight used is (<WVSKAL> / <SIGV>)**2

<SIGV> is the sigma value associated with Van der Waals dis- tances.

<DINC(1)>, <DINC(2)>, <DINC(3)>, <DINC(4)>. The relevant matrix elements are only incremented if the distance between the non-bonded atoms is less than the sum of the Van der Waals radii. The values of DINC are used to change the ideal distance for each sort of contact.

<DINC(1)> is for atoms where relative position is determined by only one torsion angle: single torsion contact.

<DINC(2)> is for atoms where two or more torsion angles determine their relative positions: multiple torsion con- tact.

<DINC(3)> is for possible hydrogen bonds (contacts between any N and any O except Nmain - Nmain or Omain - Omain).

HOLD <PBEL> <BDEL> <QDEL>

[Default 0.3 3.0 0.2]

Restraints to current values <PDEL> is a positional shifts magnitude restraint. This goes into the matrix as (a/<PDEL>)**2, (b/<PDEL>)**2, (c/<PDEL>)**2 where a, b, c are the unit cell parameters.

<BDEL> is a shifts magnitude restraint on individual thermal parameters and goes into the matrix as (1/<BDEL>)**2. It is not used if ITEMP = 0.

<QDEL> is the shifts magnitude restraint on variable occu- pancy factors (not used if NOCC=0). This goes into the matrix as (1/<QDEL>)**2.

[Also see keyword MONItor]

D) Data Harvesting keywords

Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the program will automatically produce a data harvesting file. This file will be written to

$HARVESTHOME/DepositFiles/<projectname>/ <datasetname>.refmac

The environment variable $HARVESTHOME defaults to the user's home directory, but could be changed, for example, to a group project directory.

PNAME <project_name>

Project Name. In most cases, this will be inherited from the MTZ file.

DNAME <dataset_name>

Dataset Name. In most cases, this will be inherited from the MTZ file.

PRIVATE

Set the directory permissions to '700', i.e. read/write/execute for the user only (default '755').

USECWD

Write the deposit file to the current directory, rather than a subdirectory of $HARVESTHOME. This can be used to send deposit files from speculative runs to the local directory rather than the official project directory, or can be used when the program is being run on a machine without access to the directory $HARVESTHOME.

RSIZE <row_length>

Maximum width of a row in the deposit file (default 80). <row_length> should be between 80 and 132 characters.

NOHARVEST

Do not write out a deposit file; default is to do so provided Project and Dataset names are available.

E) Final keywords:

END | GO | QUIT | STOP

End of keywords. Time to do work.

INPUT AND OUTPUT FILES

Input files:

XYZIN
The input coordinate file in PDB format.
HKLIN
Reflection data in MTZ format. This must contain columns H K L FP SIGFP and it should contain a FreeR_flag column. It may also contain FPARTi PHIPi - up to 9 partial terms may be used. It is essential to input ONLY one asymetric unit of reciprocal space (any one will do).

It MUST contain reflections from only one asymmetric unit. REFMAC resorts the hkl list, so sort order and choice of asymmetric unit are not important. It is very sensible to have followed the "complete" script given as Example 0. See more reasons for this in the description of HKLOUT.

It is EXTREMELY ADVISEABLE to assign FreeR_flags for each reflection if you are using the Maximum Likelihood option. The FreeR set is used to estimate comparatively unbiased SigmaA values.

IF FPARTi and PHIPi have been assigned, the calculated structure factor will equal the scaled FPARTi vector(s) plus the contribution calculated from the input coordinates.

PROTOUT
The file of atom parameters and restraints passed from the program PROTIN. This is an unformatted sequential file whose structure is described in the program documentation for PROTIN. This is rewritten to a scratch file PROTSCR with certain modifications for special atoms during a run.
PROTCOUNTS
The second output file from PROTIN. A single record unformatted sequential file containing the summary counts from PROTIN.

Output files:

XYZOUT
The output coordinate file with the calculated shifts applied.
HKLOUT
MTZ file containing the columns H K L FP SIGFP (FreeR_flag) FC PHIC FWT PHWT DELFWT PHDELWT FOM [ PHCOMB]

Temporary files:

PROTSCR,SCRAT1,REFSCR. These will be automatically deleted.

PRINTER OUTPUT

The printer output (logfile) is divided into a number of sections as described below. Not all the sections need appear on a given run of the program as their output depends on the options selected via the control data.

a)
A section giving details of the control data including details of the geometry restraints and weights.
b)
A section on the distance restraint information. This may give, depending on the MONItor requirements:
  1. The numbers of distances which differ my more than NSIG*sigma and 2*NSIG*sigma from the ideal values.
  2. A list of the distances which differ from the ideal values by more than 2*NSIG*sigma giving both the ideal and model values.
  3. The value of Sigma w*deltad**2 .
  4. A table giving the type of distance, the root weight, the number of distances, the rms(delta) value and the value for sigmaD for each distance type. The distance types are as follows: Type 1 = Bonded distance. Type 2 = Distance used to restrain an angle. Type 3 = Planar 1 to 4 atom distance. Type 4 = Hydrogen bond, Metal coordination distance etc.
c)
A section on the plane restraint information. This gives:
  1. Details of the planes whose rms deviation from the plane is greater than 2 sigma.
  2. The overall rms deviation from the planes.
  3. The value of Sigma w*deltap**2 .
  4. A matrix/gradient sample.
d)
A section describing the chiral centre restraints. This gives:
  1. A list of the chiral centres and the ideal and model values for the chiral volumes. Centres for which the ideal and model volumes differ by more than 1.0 are flagged with an asterisk.
  2. The value of Sigma w*deltac**2 .
  3. A table giving the root weight, the rms delta value, the sigmaC value and the number of chiral centres with delta values greater than 1.0.
  4. A matrix/gradient sample.
e)
A section on the non-bonded contacts. This gives:
  1. A list of the the single torsion contacts which are shorter than the allowed values.
  2. A list of the multiple torsion contacts which are shorter than the allowed values.
  3. A list of possible hydrogen bonds giving the atoms involved and the ideal and model distances.
  4. The value of Sigma w*deltav**2 .
  5. A table giving the contact type, the root weight, the number of contacts, the rms(delta) value, the sigmaV value and the DINC value for each contact type as follows: Type 1 = Single torsion contact. Type 2 = Multiple torsion contact. Type 3 = Possible hydrogen bond.
f)
A section on the conformation torsional angle restraints. This gives:
  1. A list of the conformational torsion angles.
  2. The value of Sigma w*delta**2 and the average value of w*deltat**2.
  3. A table giving the torsion angle type, the root weight, the number of torsion angles, the rms(delta) value and the sigmaT value for each con- formational type as follows: Type 1 = Pre-specified value. Type 2 = Planar (0,180 degrees). Type 3 = Staggered (+/- 60, 180 degrees). Type 4 = Orthonormal (+/- 90 degrees).
g)
A section with details of the non-crystallographic symmetry. This gives:
  1. For each pair of chains the transformation matrices and the relative rotation angles Phi, Psi and Chi are given.
  2. The number of positions in each chain which differ by more than 2 sigma from the average posi- tion.
  3. A list of the atoms which are more than 2 sigma from the average position.
  4. The values of Sigma w*deltas**2 and aver- age(w*delta**2) both positional and thermal parame- ters.
  5. A table giving for each type of restraint the restraint type, the root weight, the number of restraints, the sigma value and the rms(delta) val- ues for up to 4 chains. The resraint types are: Type 1 = Tight positional restraint. Type 2 = Medium positional restraint. Type 3 = Loose positional restraint. Type 4 = Tight thermal restraint. Type 5 = Medium thermal restraint. Type 6 = Loose thermal restraint.
h)
A section on the thermal restraints. This gives:
  1. The value of Sigma w*deltab**2 and the average value of w*deltab**2 .
  2. A table giving for each type of thermal restraint, the restraint type, the root weight, the number of restraints, the rms(deltaB) value, the sigmaB value, the mean deltaB value, the rms(deltaU) value and the mean deltaU value. The restraint types are: Type 1 = Main chain bonded atom. Type 2 = Main chain angle. Type 3 = Side chain bonded atom. Type 4 = Side chain angle. Type 5 = X-H bond. (X=any atom, H=hydrogen) Type 6 = X-H angle. Type 7 = Hydrogen bond, Metal coordination bond etc.
i)
A section on the Xray scaling of FO to FC. The program first estimates an approximate Wilson scale using logaritmic scaling. It then cycles round to estimate a better Wilson or Bulk scale set by least squares minimisation. The initial and final values are output. If Anisotropic scaling is requested, it then carries out a further 10 cycles to estimate the anisotropic parameters. For Example:
Scale and B factors calculated by log scaling =      0.4365         -7.855
log scaling is minimization of sum(ln(FO)-ln(FC)-log(scale)+sin^2(th/lm)*B)
Number of reflection used for scaling=    13868
***************************************

Ls initialisation is o.k.

    ****     Estimating Overall scales or Sigmaa values     ****

Maximum number of refinement cycles        10
Initial values of K-s and B-s
    1     0.437    -7.855
    2    -0.400   150.000
Final values of K-s and B-s after    10 cycles
    1     0.469    -4.220
    2    -0.400   150.000
Initial values of K-s and B-s
    1     0.469    -4.220     0.000     0.000    -4.220     0.000    -4.220
    2    -0.400   150.000
Final values of K-s and B-s after    10 cycles
    1     0.472   -12.268     0.000    -3.527     5.880     0.000    -3.284
    2    -0.400   150.000
 Overall isotropic B =    -3.3507

j)
Information about the Rfactor and the FreeR factor. This is presented in a form suitable for xloggraph, and for a mmCIF table. (You can grep for _R_factor )
k)
For the maximum likelihood option, information about the correlation coefficients, figure of merit and sigmaA. The correlation coefficient and figure of merit should increase during refinement.
Values of K and B to fit sigmaA to the Free set of reflections are estimated. Program ouput: Final values of K-s and B-s after 40 cycles 1 1.048 3.120 2 -0.184 149.968 sigmaA should not be greater than 1.0, and if these values diverge wildly you may have too few "free" reflections to determine sigmaA accurately. You can increase the scaling low resolution cutoff, and this might help.
l)
Information about the Xray and geometry gradients. You should worry if the Magnitude of X_ray positional gradient is wildly different to Magnitude of Geom. positional gradient The Cosine of the angle between the Xray shift and Geometry shift direction indicates whether the two terms are moving the structure in a concerted direction(cos > 0.0, angle acute) or in opposition(cos $lt; 0.0). If the Cosine tends to -1.0, you may as well stop refinement; the Xray and geometric corrections are cancelling each other out.
m)
(Optional) A section on the matrix solution. This gives:
  1. A sample of the matrix and the vector elements.
  2. A sample of the progress of the conjugate gra- dient solution.
n)
A section on the parameter shifts. This gives:
  1. A list of atoms and their shifts for those atoms which have large positional or thermal shifts.
  2. The mean values of the shifts (in fractional coordinates and in Angstroms for the x,y,z parameters).
  3. The rms values of the shifts (in fractional coordinates and in Angstroms for the x,y,z parameters).

ERROR MESSAGES

Errors in the control data

Various errors may be reported by the input-parsing routines and should be self-explanatory.

If a specific space group is requested and the required subroutines are not available then the program will stop with the error message:

      SPACE GROUP UNAVAILABLE

Errors in program capacity

If the limits of the program's capacity are exceeded then one of the following messages will be printed and the program will stop:

 THE NUMBER OF VECTOR OR MATRIX ELEMENTS EXCEEDS AVAILABLE STORAGE OF n OR m RESPECTIVELY
 THE NUMBER OF ATOMS OR DISTANCES EXCEEDS AVAILABLE STORAGE OF n OR m RESPECTIVELY

PROGRAM FUNCTION

It may help to read the ps version of Garib's talk at Chester ($CDOC/garib_talk.ps) or one of the references.

Terminology:

 |Fo(h)|  - observed amplitude of structure factor.
 |Fc(h)|  - calculated amplitude of structure factor
  Io(h)   - observed intensity of structure factor   IO = |FO|**2
  Ic(h)   - calculated intensity of structure factor IC = |FC|**2
 |Eo(h)|  - normalised observed amplitude of structure factor.
 |Ec(h)|  - normalised calculated amplitude of structure factor

 s     : 2.0*sin(theta)/lambda

 D     : <cos(2*pi*s^2*delta(r))> Luzzati 1952.
 sA    :  SigmaA   = (ref - Read, Srinavasan)
  Theoretically,if EPS  is  the  multiplicity  for  the  reflection  zone
                  SigmaA = D*sqrt(SigmaP/SigmaN)
           where  SigmaN = <Fo**2/EPS> and SigmaP = <Fc**2/EPS>.
 X(h) : 2|Eo|*|Ec|*sA/Var_D

 Var_D(h) : variance of distribution
 Var_D(h) : W*SigmaExpt(Eo) **2    +  Epsilon * (1- sA**2) ) ; 
             W=1 if SigmaExpt used, W=0 otherwise.

Define
 I0(X) :   0 order first type modified Bessel function
 I1(X) : 1st order first type modified Bessel function
 cosh  : hyberbolic cose   (EXP(X)+EXP(-X))/2.0
 sinh  : hyberbolic sine   (EXP(X)-EXP(-X))/2.0
 tanh  : hyberbolic tangent sinh(X)/cosh(X)

 m(h): figure of merit (m = I1(X)/I0(X) for acentric,
                              tanh(X)     for centric reflections)
 m(h)  should equal <cos(AlphaTrue - AlphaCalc)>
 
 The program estimates  SigmaA by minimising  -logLikelihood calculated with
 normalised Eo and Ec. 
   In REFMAC SigmaA can be estimated from the FreeR reflections and 
   fitted by a two gaussian approximation similar to Tronrud's scaling.

   The  figure  of  merit m which should equal <cos(AlphaTrue - AlphaCalc)> 
   is calculated from Eo, Ec and SigmaA.
              where   Eo   = Fo/sqrt(EPS*SIGMAN)
                and   Ec   = Fc/sqrt(EPS*SIGMAN)


 Geometry:
         
 D_12_id : ideal value of bond distances
 D_12_mod: real value of bond distances
 D_13_id : ideal value of angle distances
 D_13_mod: real value of angle distances
 D_14_id : ideal value of distance between atoms connected by torsion angle
 D_14_mod: real value of this

Description of program

It minimises:

      Ftot =  Wxray *Fxray   +   Wgeom *Fgeom  
                          where  Wgeom = 0 or 1
         (Wgeom = 1 for   restrained refinement)
         (Wgeom = 0 for unrestrained refinement)

         Wxray  =  1/(SIGMA**2) where SIGMAis given on keyword REFI
         (Geometric Idealisation is equivalent to   Wxray  = 0)
    

  Fgeom is based on the geometric quantities refined in PROLSQ.
ie: Fgeom = Wds*SUM((D_12_id-D_12_mod)**2/SIGD1**2) 
    =====      +SUM (D_13_id-D_13_mod)**2/SIGD2**2) 
               +SUM((D_14_id-D_14_mod)**2/SIGD3)**2 
               + terms for planarity, chirality etc etc...
                        exactly as in PROTIN and PROLSQ.

Choosing of Fxray has two options so far; amplitude based least squares, and a loglikelihood residual:

    LSQF  defines amplitude based least-square residual.
    ====
    SUM   : sum over hkl
       FXRAY = SUM(Whkl*(|FO|-|FC|)**2)

    MLKF defines simplest version of -loglikelihood residual which comes
    ==== from Rice distribution for centric and acentric cases assuming there
      is no knowledge of the phases. Ref:??

       Fxray = SUM(LLK_cen) + SUM(LLK_acen)

       where
       LLK_cen  is -loglikelihood function for centric reflections
       LLK_acen is -loglikelihood function for acentric reflections

       LLK_cen   = (|EO|**2+sA**2*|EC|**2)/Var_D - LN(cosh(X)) + LN(Var_D)/2
       LLK_acen  = (|EO|**2+sA**2*|EC|**2)/Var_D - LN(I0(X)) + LN(Var_D)

All this now uses Normalised structure factors.

       
       Var_D = W*SigmaExpt(Eo) **2    +  (1- Sum(sA(j)**2) ) 

where the sA(j) are the SigmaA for each independent partial structures, (W=1 if experimental sigmas used; 0 otherwise).

A two Gaussian fit to the <Eo-Ec>**2 is used to define SigmaA.

       
         sA(j) = sA(j)0 exp(-B(j)0*s**2) ( 1-sAsolv * exp(-bsolv s**2) )

This gives a smooth curve against resolution. If no bulk solvent (ie a one Gaussian approximation) sAsolv = 0 .

The values of m and sA are best estimated using the FREE reflections. Here the sA which are a function of resolution are fitted to the FreeR reflections as a second order Gaussian. In the CCP4 version of SIGMAA the sA are deduced from the fit over resolution bins of <Eo**2> and <Ecalc**2>.

The SigmaA technique means that sA tends to 1.0 as the Rfactor falls even if this is being achieved by some unrealistic technique; loosening geometric restraints, adding too many waters etc.. Using the FreeR set helps avoid this problem ,but the number of reflections per bin were too few to give a smooth curve without interpolation. Fitting the gaussian seems a satisfactory compromise.)

The gradients are estimated from a modified difference map with coefficients {(m|FO| - D |FC|}, PHIcalc. At the end of each set of cycles an HKLOUT file is written containing h k l Sc*Fobs Fc PHIcalc FWT =(~2mFO-dFC) PHIWT DELFWt =(~ mFO-dFC) PHIDELWT Rebuilding into the FWT and DELFWT maps seems better than using classic 2Fo-Fc and Fo-Fc maps. The "missing Data" hkl terms for FWT are set = dFC , DELFWt = 0.0. This restoration of terms helps reduce systematic noise in the 2mFO-dFC maps. See example 0 for details on how to include missing data.

Scaling of Fobs to Fc now uses the Bulk solvent correction of the same type as TNT, Shelx etc. The scale is Kexp(-bs) ( 1-Ksolv exp(-bsolv))

This is excellent as long as your data is reasonable at low resolution; Dont include complete junk; eg reflections behind the Hamburg backstop. Ie - look at your xloggraph from hklplot, or some similar program which gives <F**2> v resolution. If it has one minima at ~ 5A, and a peak at 4.5A then falls away smoothly this is appropriate; if the shape is abnormal, worry about why!

EXAMPLES

The following examples are included to illustrate the various ways in which REFMAC can be run:

Example 0.
Completing the data to include all possible hkls. Should do this after data reduction, and certainly before using REFMAC.
Example 1.
Restrained refinement with overall B-factor refinement. Method is sparse matrix method. PROTIN is run first.
Example 2.
Unrestrained refinement by maximum likelihood method.
Example 3.
Idealization. Method of minimization is conjugate gradient method
Example 4.
Restrained refinement with Partial contribution from hydrogens.
Example 5.
Restrained refinement with maximum likelihood method followed by ARPP to place waters.
Example 6.
Restrained refinement with maximum likelihood method etc. An extremely complicated script incorporating averaging, arp, etc etc.
Example 7.
Restrained refinement with maximum likelihood method etc. 3A data requires fixing of protein Bfactor and BULK scales
Example 8a.
Example of rigid body refinement in refmac. Ordinary case with several domains. Side chains excluded from refinement
Example 8b.
Example of rigid body refinement in refmac. Experimental phase information used. Side chains excluded from refinement
Example 9.
Example of using experimental phase information. Very bad model (RMS error 2A)
Example 10.
Example of individual anisotropic B-value refinement. Hydrogens must be added prior to refinement

Example 0

Completing the data to include all possible hkls. Should do this after data reduction, and certainly before using REFMAC. This is now done with the uniqueify script. Martyn - I think all this should be deleted?

#! /bin/sh
#
set -e
#
# Replace missing data with Missing Number Flags (in this case NaNs).
#
mtzmnf hklin $CEXAM/toxd/toxd_old.mtz hklout $CCP4_SCR/toxd_nan.mtz \
<<eof
TITLE toxd data, testing
LABI F1=FTOXD3 SIGF1=SIGFTOXD3 -
     D2=ANAU20 SIGD2=SIGANAU20 -
     F2=FAU20  SIGF2=SIGFAU20 -
     F3=FMM11  SIGF3=SIGFMM11 -
     F4=FI100  SIGF4=SIGFI100
END
eof

#
# Case (1)
#
# Complete dataset and add free-R column.
# Keep systematic absences with -s switch (though you probably wouldn't
# want to do this).
#
uniqueify -s $CCP4_SCR/toxd_nan.mtz $CCP4_SCR/toxd-unique.mtz

#
# Case (2)
#
# Complete dataset and add free-R column.
# Increase the fraction of reflections tagged with each free-R
# indicator above the default 0.05 (sensible for toxd which has
# small dataset).
#
uniqueify -p 0.1 $CCP4_SCR/toxd_nan.mtz $CCP4_SCR/toxd-unique2.mtz

#
# Case (3)
#
# First add free-R column to incomplete dataset.
# (Silly thing to do - this is just to create a dataset with an existing
# free-R column for illustrating use of uniqueify with -f switch.)
#
freerflag hklin $CCP4_SCR/toxd_nan.mtz hklout $CCP4_SCR/toxd_free.mtz << eof
END
eof
#
# Now complete with -f switch to indicate free-R column already present.
#
uniqueify -f FreeR_flag $CCP4_SCR/toxd_free.mtz $CCP4_SCR/toxd-unique3.mtz
#

Example 1

Restrained refinement with overall B-factor refinement. Method is sparse matrix method. PROTIN is run first.

#!/bin/csh -f
#
#   Example of refinement by refmac
#
#
#  Set parameters
#
set CTEST=/y/programs/xtal/ccp4/examples
cp $CTEST/toxd/toxd.pdb $CCP4_SCR/toxd0.pdb 
#
uniqueify $CTEST/toxd/toxd.mtz

set inmtz=toxd-unique.mtz
start:

set name = toxd
set last = 0
set cycles = 4
set count = 0
while ($count != $cycles)
echo '*******************************************************************'
echo  $count 
echo '*******************************************************************'
@ curr = $last + 1
#
#
#   Coomplete data and add freeR flag. You may not need this step
#   See complete_toxd.sh
#
#
#   Run protin  to set up geometric restraints
#   PROTCOUNTS contains the number of distances, chiral centres,etc..
#   PROTOUT    contains atomic coordinates plus all possible restraint 
#              pairings.
#
protin                          \
XYZIN $CCP4_SCR/${name}${last}.pdb \
PROTOUT $SCRATCH/protout.dat            \
PROTCOUNTS $SCRATCH/counts.dat          \
<< eop
!CHNNAME  ID=chain iden;CHNTYP=match CHNTYP;ROFFSET=start resid
CHNNAM ID  B  CHNTYP 1 
CHNNAM ID  W  CHNTYP 2
CHNTYP 1  NTER 1 GLN 3   CTER 59 GLY 2 
CHNTYP 2 WAT
PEPP 4 
SYMM 19
VDWRadii 1 CA 7 3.8
VDWCUT 5
END
eop
if ($status) exit
#
# Refmac step. Refine
#   PROTSCR    - an abbreviated version of PROTOUT
#                it contains atomic coordinates plus all restraint 
#                pairings for atoms which actually are present.
#
refmac:
refmac \
HKLIN   $inmtz \
HKLOUT   $CCP4_SCR/${name}${last}.mtz \
        PROTOUT $SCRATCH/protout.dat \
        PROTCOUNTS $SCRATCH/counts.dat \
        PROTSCR $SCRATCH/counts.scr \
XYZIN   $CCP4_SCR/${name}${last}.pdb \
XYZOUT  $CCP4_SCR/${name}${curr}.pdb \
<< eor
LABIN FP=FTOXD3 SIGFP=SIGFTOXD3  FREE=FreeR_flag
LABO FC=FC PHIC=PHIC    FWT=2FOFCWT PHWT=PH2FOFCWT -
                     DELFWT=FOFCWT  PHDELWT=PHFOFCWT
REFI TYPE RESTrained RESOLUTION  20 1.50
REFI RESI MLKF
REFI BREF ISOTropic    METH CGMAT 
WEIGHT MATRIX 0.35
!Scaling parameters
SCAL TYPE BULK   
NCYC 2 
MONI FEW
BINS 20
end
eor
#
if ($status) exit
# make maps.
#
#  Sigmaa style 2mfo-dfc map with restored data
#
fft:
fft \
hklin $CCP4_SCR/${name}${last}.mtz \
mapout $CCP4_SCR/2mfodfc_${name}${last}.mtz \
<<eof
title Sigmaa style 2mfo-Dfc map calculated with refmac coefficients
labi F1=2FOFCWT PHI=PH2FOFCWT
binmapout
end
eof
if ($status) exit
#
#   Sigmaa style mfo-dfc map
#
fft \
hklin $CCP4_SCR/${name}${last}.mtz \
mapout $CCP4_SCR/mfodfc_${name}${last}.mtz \
<<eof
title Sigmaa style mfo-Dfc map calculated with refmac coefficients
labi F1=FOFCWT PHI=PHFOFCWT
binmapout
end 
eof
if ($status) exit
#
@ last++
@ count++
end

Example 2

Unrestrained refinement by maximum likelihood method.

#!/bin/csh -f
#
#   Example of refinement by refmac
#
set CTEST=/y/programs/xtal/ccp4/examples
cp $CTEST/toxd/toxd.pdb $CCP4_SCR/toxd0.pdb
#
uniqueify $CTEST/toxd/toxd.mtz

set inmtz=toxd-unique.mtz
start:

set name = toxd
set last = 0
set cycles = 4
set count = 0
while ($count != $cycles)
echo '*******************************************************************'
echo  $count
echo '*******************************************************************'
@ curr = $last + 1
#
#
#   Coomplete data and add freeR flag. You may not need this step
#   See complete_toxd.sh
#
#
#
#
#
#
# Refmac step. Refine
#
refmac:
refmac \
HKLIN   $inmtz \
HKLOUT   $CCP4_SCR/${name}${last}.mtz \
XYZIN   $CCP4_SCR/${name}${last}.pdb \
XYZOUT   $CCP4_SCR/${name}${curr}.pdb \
<< eor
LABIN FP=FTOXD3 SIGFP=SIGFTOXD3  FREE=FreeR_flag
LABO FC=FC PHIC=PHIC    FWT=2FOFCWT PHWT=PH2FOFCWT -
                     DELFWT=FOFCWT  PHDELWT=PHFOFCWT
REFI TYPE UNREstrained RESOLUTION  20 1.50
REFI RESI MLKF
REFI BREF ISOTropic    
#WEIGHT MATRIX 0.35
!Scaling parameters
SCAL TYPE BULK   
NCYC 2 
MONI FEW
BINS 20
end
eor
#
if ($status) exit
# make maps.
#
#  Sigmaa style 2mfo-dfc map with restored data
#
fft:
fft \
hklin $CCP4_SCR/${name}${last}.mtz \
mapout $CCP4_SCR/2mfodfc_${name}${last}.mtz \
<<eof
title Sigmaa style 2mfo-Dfc map calculated with refmac coefficients
labi F1=2FOFCWT PHI=PH2FOFCWT
binmapout
end
eof
#
if ($status) exit
#   Sigmaa style mfo-dfc map
#
fft \
hklin $CCP4_SCR/${name}${last}.mtz \
mapout $CCP4_SCR/mfodfc_${name}${last}.mtz \
<<eof
title Sigmaa style mfo-Dfc map calculated with refmac coefficients
labi F1=FOFCWT PHI=PHFOFCWT
binmapout
end 
eof
if ($status) exit

Example 3

Idealization. Method of minimization is conjugate gradient method

#!/bin/csh -f
#
#   Example of refinement by refmac
#
#
#  Set parameters
#
set CTEST=/y/programs/xtal/ccp4/examples
set crdin=$CTEST/toxd/toxd.pdb
set crdout=toxd_1.pdb
#
#
#
#   Run protin  to set up geometric restraints
#   PROTCOUNTS contains the number of distances, chiral centres,etc..
#   PROTOUT    contains atomic coordinates plus all possible restraint 
#              pairings.
#
protin                          \
XYZIN $crdin \
PROTOUT $SCRATCH/protout.dat            \
PROTCOUNTS $SCRATCH/counts.dat          \
<< 'END-protin ' 
!CHNNAME  ID=chain iden;CHNTYP=match CHNTYP;ROFFSET=start resid
CHNNAM ID  B  CHNTYP 1 
CHNNAM ID  W  CHNTYP 2
CHNTYP 1  NTER 1 GLN 3   CTER 59 GLY 2 
CHNTYP 2 WAT
PEPP 4 
SYMM 19
VDWRadii 1 CA 7 3.8
VDWCUT 5
END
'END-protin '
if ($status) exit
#
# Refmac step. Refine
#   PROTSCR    - an abbreviated version of PROTOUT
#                it contains atomic coordinates plus all restraint 
#                pairings for atoms which actually are present.
#
refmac:
refmac \
        PROTOUT $SCRATCH/protout.dat \
        PROTCOUNTS $SCRATCH/counts.dat \
        PROTSCR $SCRATCH/counts.scr \
XYZIN   $crdin \
XYZOUT $crdout \
<< eor
REFI TYPE IDEAlise 
REFI RESI MLKF
REFI    METH CGMAT   
NCYC 2 
MONI FEW
end
eor
if ($status) exit
#

Example 4

Restrained refinement with Partial contribution from hydrogens.

#!/bin/csh -f
#
start:
set name = kak1_
set last = 1
set cycles = 4
set count = 0
while ($count != $cycles)
echo '*******************************************************************'
echo  $count 
echo '*******************************************************************'
@ curr = $last + 1
#
#           Run protin to set RESTRAINT lists.
#
protin:
protin         \
XYZIN $SCRATCH/${name}${last}.pdb    \
PROTOUT $SCRATCH/kak_protout.dat   \
PROTCOUNTS $SCRATCH/kak_counts.dat \
<< END-protin
CHNNAME ID A CHNTYP  1 
CHNNAME ID B CHNTYP  2 
CHNNAME ID C CHNTYP  3 
CHNNAME ID D CHNTYP  4 
CHNNAME ID E CHNTYP  5 
CHNTYP 1 NTER 1 ALA 3 CTER 517 HIS 2 CISPRO 1 283
CHNTYP 2 NTER 1 LYS 3 CTER 3 LYS 2
CHNTYP 3 WAT      #  You can call a CHAIN WAT as long as there is no connectivity..
CHNTYP 4 WAT
CHNTYP 5 WAT
SYMMETRY  19
END
END-protin
if ($status) exit
#
#   Run hgen to generate hydrogen position
#
hgen:
hgen \
XYZIN $SCRATCH/${name}${last}.pdb     \
XYZOUT $SCRATCH/hyd_${name}${last}.pdb     \
<<eof
HYDROGENS SEPARATE
eof
if ($status) exit
#
#   Run sfall to calculate structure factors from hydrogens
#
sfall:
sfall \
HKLIN  $SCRATCH/kak_freer_unobs.mtz   \
HKLOUT $SCRATCH/kak_freer_unobs_hyd.mtz   \
XYZIN $SCRATCH/hyd_${name}${last}.pdb     \
<< eof
NOSCALE
MODE SFCALC XYZIN HKLIN
LABI FP=F SIGFP=SIGF FREE=FreeR_flag
LABO FC=FC_hyd PHIC=PHIC_hyd
!Refinement parameters
BINS 20
END
eof
if ($status) exit
#
#   Run refmac - refinement adding in hydrogen contributions
#
refmac:
refmac \
HKLIN $SCRATCH/kak_freer_unobs_hyd.mtz   \
XYZIN $SCRATCH/${name}${last}.pdb     \
HKLOUT $SCRATCH/${name}${last}.mtz     \
PROTOUT $SCRATCH/kak_protout.dat   \
PROTCOUNTS $SCRATCH/kak_counts.dat \
REFSCR $SCRATCH/prot_scr.mtz \
XYZOUT $SCRATCH/${name}${curr}.pdb  << eop
MODE HKRF
LABI FP=F SIGFP=SIGF FREE=FreeR_flag FPART1=FC_hyd PHIP1=PHIC_hyd
LABO FC=FC_p PHIC=PHIC_p FWT=2FOFCWT_p DELFWT=FOFCWT_p
!Refinement parameters
SCPART 1
MONI MANY PLAN 0.3
REFI TYPE REST
REFI RESI MLKF RESO 20.0 1.2
REFI BREF ISOT METH CDIR 
WEIGHT MATRIX 0.5
!Scaling parameters
SCAL TYPE BULK
SCAL LSSC RESO 20 1.2
NCYC 5
BINS 20
END
eop
if ($status) exit
#
@ last++
@ count++
end

Example 5

Restrained refinement with maximum likelihood method followed by ARPP to place waters

#
#!/bin/csh -f
#
#  A script which refined a structure 
#  After refinement the REFMAC maps are calculated, and density 
#  passed to ARPP to fit waters into.


#
#  Step 1: protin and REFMAC.
#
#  Step 2: Calculate ffts for 2mFodFc and mFodFc maps .

#  Step 3: Expand the maps as ARPP requires ( you need the "edges")
#
#  Step 4: Run arpp to find the waters 
#
 
#goto start
#
###########################################################################
###########################################################################
#  Step 1: protin and REFMAC.
#   NCS restrains on molecules A B C AND water chains W X Y
#
start:
set name    = 'cutm_'
set last    =  6
set cycles  =  3
set count   =  0
set data    = 'cute_dnpp.free.mtz'
#
while ($count != $cycles)
@ curr = $last + 1
#
#
protin \
XYZIN $SCRATCH/${name}${last}.pdb \
PROTOUT $SCRATCH/cut_protout.dat   \
PROTCOUNTS $SCRATCH/cut_counts.dat \
<< eof
TITL  Cutinase monoclinic + DNPP
CHNNAM ID A CHNTYP 1
CHNNAM ID B CHNTYP 1
CHNNAM ID C CHNTYP 1
CHNNAM ID W CHNTYP 2
CHNNAM ID X CHNTYP 2
CHNNAM ID Y CHNTYP 2
CHNTYP 1 NTERM  3 GLY  3 CTERM 193 ARG 2 DISUL 2 17 94 156 163
CHNTYP 2 WAT
NONX 3 CHNID A B C NSPANS 1 1 200 1
NONX 3 CHNID W X Y NSPANS 1 1 200 2
SYMMETRY 4
LIST FEW
END
eof
#
if ($status) exit
#
refmac:
refmac   \
HKLIN  $data  \
PROTOUT $SCRATCH/cut_protout.dat   \
PROTCOUNTS $SCRATCH/cut_counts.dat \
PROTSCR $SCRATCH/cut_counts.scr \
XYZIN $SCRATCH/${name}${last}.pdb \
HKLOUT $SCRATCH/${name}${curr}.mtz     \
XYZOUT $SCRATCH/${name}${curr}.pdb  << eop
#
LABI FP=FP SIGFP=SIGFP FREE=FreeR_flag
LABO FC=FC PHIC=PHIC FWT=FWT DELFWT=DELFWT
!Refinement parameters
REFI TYPE REST
REFI RESI MLKF RESO 20.0 2.05
PLAN 1 0.04 0.015 
REFI BREF ISOT METH CGMAT 
WEIGH MATRIX 0.8
TEMP 1.0 3.0 5.0 6.0 8.0 1.0
!Scaling parameters
SCAL TYPE BULK  LSSC ANIS RESO 20 2.05
NCYC 4    ! cycles round refinement NCYC times before redoing PROTIN
MONI FEW
BINS 20
END
eop
#
if ($status) exit
#
###########################################################################
###########################################################################
#---- fft 2Fo-Fc map 
#
#
#  Step 2: Calculate ffts for 2mFodFc and mFodFc maps 
#          ( this facilitates the averaging)

#
fft HKLIN  $SCRATCH/${name}${curr}.mtz \
    MAPOUT $SCRATCH/${name}${curr}_32_p21.map << eof
TITLE           2FO-1FC 
RESOLUTION      20.0 2.05
GRID            128 128 128
XYZLIM          0 127 0 63 0 127
BINMAPOUT
LABIN           F1=FWT  PHI=PHWT
END
eof
#
#
#
if ($status) exit
#
#
fft HKLIN  $SCRATCH/${name}${curr}.mtz \
    MAPOUT $SCRATCH/${name}${curr}_11_p21.map \
<< eof
TITLE           1FO-1FC 
RESOLUTION      20.0 2.05
GRID            128 128 128
XYZLIM          0 127 0 63 0 127
BINMAPOUT
LABIN           F1=DELFWT  PHI=PHDELWT
END
eof
#
#
#
if ($status) exit
2:
#
###########################################################################
###########################################################################
#  Step 3: Expand the maps to fill the edges for ARPP
#
mapmask \
MAPIN  $SCRATCH/${name}${curr}_32_p21.map \
MAPOUT $SCRATCH/${name}${curr}_32_p21.ext \
    << eof
SYMM P21
XYZLIM          0 128 0 64 0 128
eof
#
#
if ($status) exit
#
mapmask \
MAPIN  $SCRATCH/${name}${curr}_11_p21.map \
MAPOUT $SCRATCH/${name}${curr}_11_p21.ext \
    << eof
SYMM P21
XYZLIM          0 128 0 64 0 128
eof
#
#exit

#
if ($status) exit
#
#
###########################################################################
###########################################################################
#  Step 4: Run arpp to find the waters 
#
#
arpp:
#
arpp \
     xyzin   $SCRATCH/${name}${curr}.pdb \
MAPIN1 $SCRATCH/${name}${curr}_32_p21.ext \
MAPIN2 $SCRATCH/${name}${curr}_11_p21.ext \
     xyzout $SCRATCH/temp_arp.pdb  << eof
SYMMETRY P21    
REMOVE ATOMS 10 ANALYSE WATERS CUTSIGMA 0.50  MERGE 1.8  
FIND ATOMS 60 BFACTOR 30 CHAIN Y CUTSIGMA AUTO  
#FIND ATOMS 100 BFACTOR 30 CHAIN Y CUTSIGMA 3  
FDISTANCE NEWOLD 2.2 3.3 NEWNEW 2.5
REFINE WATERS
END
eof
#
#
if ($status) exit
#
mv $SCRATCH/temp_arp.pdb    $SCRATCH/${name}${curr}.pdb 
#
/bin/rm *TMP*,*ARP*
#
@ last++
@ count++
end
#
exit
#
#
echo " check log file error in one step "
#

Example 6

Restrained refinement with maximum likelihood method etc. An extremely complicated script incorporating averaging, arp, etc etc.

#!/bin/csh -f
#
#  A script which refined a structure with 3 molecules in the 
#  asymmetric unit; 
#  After refinement the REFMAC maps are averaged, and density around one 
# molecule only is passed to ARPP to fit waters into.
#  This set of waters is expanded to include those around the other two molecules.
  then the process is cycled..


#  Step 1: Find the rotations which convert the master molecule ( in this case C)
#          to overlap the other two ( here labelled A & B)
#
#  Step 2: protin and REFMAC.
#   NCS restrains on molecules A B C AND water chains W X Y
#
#  Step 3: Calculate ffts for 2mFodFc and mFodFc maps in P1.
#          ( this facilitates the averaging)

#  Step 4: Extract atoms for chains C and Y from refmac output coordinates.
#          Use ncs mask to define a mask round Molecule C and its associated waters Y
#
#  Step 5: Average 2mFodFc and mFodFc maps over this mask. 
#          Output two maps - wrkout only covers the masked region.
#                            mapout is a map expanded to the volume 
#                            given on the MAPIN limits ( usually P1)
#                            It contains the averaged density in the 
#                            volume under the mask, and its non 
#                            crystallographic equivalents, and the 
#                            unmodified density everywhere else.
#
#  Step 6: Expand the wrkmap to lie within a P1 cell with zeros padded
#          where there is no density. ( ARPP requires a full cell)
#
#  Step 7: Run arpp in P1 to find the common waters which should show up 
#          in the averaged density.
#
#  Step 8: Rebuild a coordinate  set containing molecules A B C plus 
#          the new ARPP waters, plus those generated by applying the 
#          ncs symmetry to these.
#           ( Use grep and pdbset)
 
goto start
###########################################################################
###########################################################################
#  Get rotations to average C coordinates over A and B
#  I assume these wont change during cycles...
#  Step 1: Find the rotations which convert the master molecule ( in this case C)
#          to overlap the other two ( here labelled A & B)
#
lsqkab \
workcd cutm_0.pdb \
refrcd cutm_0.pdb \
<< eof
FIT WRES 1 to 193 WCHAIN C
MATCH RRES 1 to 193 RCHAIN A
OUTPUT RMS
END
eof
if ($status) exit
#
lsqkab \
workcd cutm_0.pdb \
refrcd cutm_0.pdb \
<< eof
FIT WRES 1 to 193 WCHAIN C
MATCH RRES 1 to 193 RCHAIN B
OUTPUT RMS
END
eof
if ($status) exit
#

###########################################################################
###########################################################################
#  Check the transformations are correct. They are! cutm_0_A.pdb is the same 
#       as the A chain cordinates in cutm_0.pdb..
pdbset:
pdbset \
xyzin $SCRATCH/cutm_0_C.pdb \
xyzout $SCRATCH/cutm_0_A.pdb \
<< eof
ROTATE EULER    9.70152 -119.69961   7.80144
SHIFT 29.55211   8.54860  19.71765

CHAIN W
eof
if ($status) exit
#

pdbset \
xyzin $SCRATCH/cutm_0_C.pdb \
xyzout $SCRATCH/cutm_0_B.pdb \
<< eof

ROTATE EULER    175.90645 -120.30214 172.70070
SHIFT  -1.62819   4.70139  35.11710
CHAIN X
eof
if ($status) exit
#
exit
#
###########################################################################
###########################################################################
#  Step 2: protin and REFMAC.
#   NCS restrains on molecules A B C AND water chains W X Y
#
start:
set name    = 'cutm_'
set last    =  6
set cycles  =  3
set count   =  0
set data    = 'cute_dnpp.free.mtz'
#
while ($count != $cycles)
@ curr = $last + 1
#
#
protin \
XYZIN $SCRATCH/${name}${last}.pdb \
PROTOUT $SCRATCH/cut_protout.dat   \
PROTCOUNTS $SCRATCH/cut_counts.dat \
<< eof
TITL  Cutinase monoclinic + DNPP
CHNNAM ID A CHNTYP 1
CHNNAM ID B CHNTYP 1
CHNNAM ID C CHNTYP 1
CHNNAM ID W CHNTYP 2
CHNNAM ID X CHNTYP 2
CHNNAM ID Y CHNTYP 2
CHNTYP 1 NTERM  3 GLY  3 CTERM 193 ARG 2 DISUL 2 17 94 156 163
CHNTYP 2 WAT
NONX 3 CHNID A B C NSPANS 1 1 200 1
NONX 3 CHNID W X Y NSPANS 1 1 200 2
SYMMETRY 4
LIST FEW
END
eof
#
if ($status) exit
#
refmac:
refmac   \
HKLIN  $data  \
PROTOUT $SCRATCH/cut_protout.dat   \
PROTCOUNTS $SCRATCH/cut_counts.dat \
PROTSCR $SCRATCH/cut_counts.scr \
XYZIN $SCRATCH/${name}${last}.pdb \
HKLOUT $SCRATCH/${name}${curr}.mtz     \
XYZOUT $SCRATCH/${name}${curr}.pdb  << eop
#
LABI FP=FP SIGFP=SIGFP FREE=FreeR_flag
LABO FC=FC PHIC=PHIC FWT=FWT DELFWT=DELFWT
!Refinement parameters
REFI TYPE REST
REFI RESI MLKF RESO 20.0 2.05
PLAN 1 0.04 0.015 
REFI BREF ISOT METH CGMAT 
WEIGH MATRIX 0.5  # (default)
TEMP 1.0 3.0 5.0 6.0 8.0 1.0
!Scaling parameters
SCAL TYPE BULK  LSSC ANIS RESO 20 2.05
NCYC 4    ! cycles round refinement NCYC times before redoing PROTIN
MONI FEW
BINS 20
END
eop
#
if ($status) exit
#
###########################################################################
###########################################################################
#---- fft 2Fo-Fc map in P1 cell ( needed for averaging)
#
#
#  Step 3: Calculate ffts for 2mFodFc and mFodFc maps in P1.
#          ( this facilitates the averaging)

#
fft HKLIN  $SCRATCH/${name}${curr}.mtz \
    MAPOUT $SCRATCH/${name}${curr}_32_p1.map << eof
TITLE           2FO-1FC 
FFTSYMMETRY     P1
GRID            128 128 128
XYZLIM          0 127 0 127 0 127
BINMAPOUT
LABIN           F1=FWT  PHI=PHWT
END
eof
#
#
#
if ($status) exit
#
#
fft HKLIN  $SCRATCH/${name}${curr}.mtz \
    MAPOUT $SCRATCH/${name}${curr}_11_p1.map \
<< eof
TITLE           1FO-1FC 
FFTSYMMETRY     P1
GRID            128 128 128
XYZLIM          0 127 0 127 0 127
BINMAPOUT
LABIN           F1=DELFWT PHI=PHDELWT
END
eof
#
#
#
if ($status) exit
2:
###########################################################################
###########################################################################
#
# Remove A B W and X chains from REFMAC output coordinates
grep -i -v " b "  $SCRATCH/${name}${curr}.pdb   > $SCRATCH/junk1.pdb 
grep -i -v " a "   $SCRATCH/junk1.pdb >  $SCRATCH/junk2.pdb 
grep -i -v " w "   $SCRATCH/junk2.pdb >  $SCRATCH/junk3.pdb 
grep -i -v " x "   $SCRATCH/junk3.pdb >  $SCRATCH/${name}${curr}_CY.pdb
#
#  make a mask from the CY atoms only
#  Step 4: Use ncs mask to define a mask round Molecule C
#          and its associated Waters Y
#
rot0:
ncsmask \
xyzin  $SCRATCH/${name}${curr}_CY.pdb \
mskout  $SCRATCH/${name}${curr}_CY.msk \
<<eof
RADIUS 4
SYMM P1
GRID 128 128 128
eof
#
#
if ($status) exit
#
###########################################################################
###########################################################################
#
#  Step 5: Average 2mFodFc and mFodFc maps over this mask.
#          Output two maps - wrkout only covers the masked region.
#                            mapout is a map expanded to the volume
#                            given on the MAPIN limits ( usually P1)
#                            It contains the averaged density in the
#                            volume under the mask, and its non
#                            crystallographic equivalents, and the
#                            unmodified density everywhere else.
#
#
rot:
# wrkout map - Averaged map within the mask only - 
#              not expanded.  One molecule
# mapout map - Averaged map expanded by symm ops to 
#              cover the extent of mapin- three ncs *nsym mols.
maprot \
    MAPIN $SCRATCH/${name}${curr}_32_p1.map \
mskin   $SCRATCH/${name}${curr}_CY.msk \
wrkout $SCRATCH/${name}${curr}_32_CY.map \
mapout $SCRATCH/${name}${curr}_32_all.map \
<<eof
MODE BOTH
SCALE 0.3333
AVERAGE 3
#
ROTATE EULER 0 0 0
TRANSLATE 0 0 0

ROTATE EULER    9.70152 -119.69961   7.80144
TRANSLAT 29.55211   8.54860  19.71765

ROTATE EULER    175.90645 -120.30214 172.70070
TRANSLAT  -1.62819   4.70139  35.11710
END
eof
#
if ($status) exit
#  Now the difference map
maprot \
    MAPIN $SCRATCH/${name}${curr}_11_p1.map \
mskin   $SCRATCH/${name}${curr}_CY.msk \
wrkout $SCRATCH/${name}${curr}_11_CY.map \
mapout $SCRATCH/${name}${curr}_11_all.map \
<<eof
MODE BOTH
SCALE 0.3333
AVERAGE 3
#
ROTATE EULER 0 0 0
TRANSLATE 0 0 0

ROTATE EULER    9.70152 -119.69961   7.80144
TRANSLAT 29.55211   8.54860  19.71765

ROTATE EULER    175.90645 -120.30214 172.70070
TRANSLAT  -1.62819   4.70139  35.11710
END
eof
#
#
if ($status) exit
#
###########################################################################
###########################################################################
#  Step 6: Expand the wrkmap to lie within a P1 cell with zeros padded
#          where there is no density. ( ARPP requires a full cell)
#
# Pad - work map to fill P1 cell, move it to 0-1,0-1,0-1
#   ready for arpp
#
mapmask \
MAPIN  $SCRATCH/${name}${curr}_32_CY.map \
MAPOUT $SCRATCH/${name}${curr}_32_CY_p1.map \
    << eof
PAD 0.0
SYMM P1
XYZLIM          0 128 0 128 0 128
eof
#
#
if ($status) exit
#
mapmask \
MAPIN  $SCRATCH/${name}${curr}_11_CY.map \
MAPOUT $SCRATCH/${name}${curr}_11_CY_p1.map \
    << eof
PAD 0.0
SYMM P1
XYZLIM          0 128 0 128 0 128
eof
#
#exit

#
if ($status) exit
#
#
###########################################################################
###########################################################################
#  Step 7: Run arpp to find the common waters which should show up 
#          in the averaged density.
#
#
arpp:
#
arpp \
     xyzin   $SCRATCH/${name}${curr}_CY.pdb \
MAPIN1 $SCRATCH/${name}${curr}_32_CY_p1.map \
MAPIN2 $SCRATCH/${name}${curr}_11_CY_p1.map \
     xyzout $SCRATCH/temp_arp.pdb  << eof
SYMMETRY 1    
REMOVE ATOMS 10 ANALYSE WATERS CUTSIGMA 0.50  MERGE 1.8  
FIND ATOMS 60 BFACTOR 30 CHAIN Y CUTSIGMA AUTO  
#FIND ATOMS 100 BFACTOR 30 CHAIN Y CUTSIGMA 3  
FDISTANCE NEWOLD 2.2 3.3 NEWNEW 2.5
REFINE WATERS
END
eof
#
#
if ($status) exit
#
###########################################################################
###########################################################################
#  Step 8: Rebuild a coordinate  set containing molecules A B C plus 
#          the new ARPP waters, plus those generated by applying the 
#          ncs symmetry to these.
#
grep -i wat $SCRATCH/temp_arp.pdb > $SCRATCH/temp_arp_Y.pdb 
# coordinates from refmac - #
#  output CHAINS A B C of the output coordinates from refmac to 
#                             $SCRATCH/${name}${curr}_nowat.pdb
#
grep -i -v wat $SCRATCH/${name}${curr}.pdb > $SCRATCH/${name}${curr}_nowat.pdb 

#   Expand the Y waters to lie near the A molecule; call them W ...
pdbset \
xyzin $SCRATCH/temp_arp_Y.pdb \
xyzout $SCRATCH/temp_arp_W.pdb \
<< eof
ROTATE EULER    9.70152 -119.69961   7.80144
SHIFT 29.55211   8.54860  19.71765

CHAIN W
eof
#
if ($status) exit

#   Expand the Y waters to lie near the B molecule; call them X ...
pdbset \
xyzin $SCRATCH/temp_arp_Y.pdb \
xyzout $SCRATCH/temp_arp_X.pdb \
<< eof

ROTATE EULER    175.90645 -120.30214 172.70070
SHIFT  -1.62819   4.70139  35.11710
CHAIN X
eof
#
if ($status) exit
#
#
#  Put them all back together..
cat $SCRATCH/${name}${curr}_nowat.pdb $SCRATCH/temp_arp_W.pdb $SCRATCH/temp_arp_X.pdb $SCRATCH/temp_arp_Y.pdb > $SCRATCH/temp_arp.pdb  
mv $SCRATCH/temp_arp.pdb    $SCRATCH/${name}${curr}.pdb 
#
#/bin/rm *TMP*
#
@ last++
@ count++
end
#
exit
#

Example 7

Restrained refinement with maximum likelihood method etc. 3A data requires fixing of protein Bfactor and BULK scales

#!/bin/csh -f
#
set verbose
set name = piitrig-
set last = 0
set cycles = 4
set count = 0
while ($count != $cycles)
@ curr = $last + 1
#
#goto refmac
protin:
protin  XYZIN $SCRATCH/$name$last.pdb    \
        PROTOUT $SCRATCH/${name}_protout.dat \
        PROTCOUNTS $SCRATCH/${name}_counts.dat \
<< 'END-protin'
TITLE trigonal pII at 3.0A 
SYMMETRY 154 
CHNNAME ID A CHNTYP  1 
CHNNAME ID B CHNTYP  1 
CHNNAME ID C CHNTYP  1 
CHNNAME ID W CHNTYP  2   
CHNTYP  1 NTER 1 MET 3  CTER 95 ASP 2 
CHNTYP  2    WAT
NONX 3 CHNID A B C NSPANS 1 1 95 1 
LIST  FEW
PEPP  5
END
'END-protin'
if ($status) exit
#
refmac:
refmac \
HKLIN   pii_154free.mtz \
HKLOUT   $SCRATCH/$name$last.mtz                \
        PROTOUT $SCRATCH/${name}_protout.dat \
        PROTCOUNTS $SCRATCH/${name}_counts.dat \
        PROTSCR $SCRATCH/${name}_counts.scr \
XYZIN   $SCRATCH/$name$last.pdb         \
XYZOUT $SCRATCH/$name$curr.pdb \
<< 'END-sfrk'
LABIN FP=FP SIGFP=SIGFP  FREE=FreeR_flag
!Refinement parameters
REFI TYPE RESTrained RESOLUTION  20  3.0
REFI RESI MLKF
REFI BREF ISOTropic  METH CDIR  
WEIGHT MATRIX 0.1
DAMP 0.5 0.5  !Scales shifts to avoid large shifts
!Scaling parameters
SCAL TYPE BULK LSSC  FIXBulk SCBUlk -0.75 BBULk 70 ! Fixes bulk solvent parameters
SCAL LSSC ANISO   APPLy OBSErved      !Scales aniso B and applies to observed
SCAL BAVERAGE 30.0               ! Keeps average B of molecule constatnt
BFAC 1  2.0 2.5 3.0 4.5 
NCYC 4
MONI FEW
BINS 20
LABO FC=FC PHIC=PHIC    FWT=2FOFCWT DELFWT=FOFCWT

'END-sfrk'
if ($status) exit
#
@ last++
@ count++
end

Example 8a

Example of rigid body refinement in refmac. Ordinary case with several domains. Side chains excluded from refinement

#!/bin/csh -f
#
###################################################################
#####################################################################
set name = cytc_refmac_
set inmtz=/y/ccp4/cytc/pseudo_alc_cprime1_sc_feph_sq_sf2_sfx_sq_a+p-unique_pt2p.mtz
set last =  0
set cycles = 1
set count = 0
while ($count != $cycles)
@ curr = $last + 1
refmac \
HKLIN $inmtz \
HKLOUT ${name}${curr}.mtz \
XYZOUT ${name}${curr}.pdb \
XYZIN ${name}${last}.pdb    \
   << eop
LABI FP=Fnata SIGFP=SIGFnata FREE=FreeR_flag 
!
!Refinement parameters
!
REFI TYPE RIGID RESI MLKF RESO  15  2.0  ! Maximum likelihood refinement. Resolution limit 15 to 2.0 A
!Scaling parameters
!
SCALe TYPE BULK LSSC ANIS FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters for LSQ scaling
SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.1 ! Fix bulk solvent parameters for ML scaling
!
!Rigid body parameters
!
RIGIdbody NCYCle 10                                ! Number of cycles for rigid body refi
RIGIdbody GROUp 1 FROM 2   A TO 32 A  EXCLU SCHAIns ! Define domains. Exclude side chains 
RIGIdbody GROUp 2 FROM 38  A TO 55 A  EXCLU SCHAIns
RIGIdbody GROUp 3 FROM 76  A TO 99 A  EXCLU SCHAIns
RIGIdbody GROUp 4 FROM 101 A TO 126 A  EXCLU SCHAIns
RIGIDbody GROUp 5 FROM 56  A to 75 A  EXCLU SCHAins
RIGIDbody PRINt EULErangles MATRix                   ! Print Euler angles and Matrices

MONI MANY
END
eop
if ($status) exit
#
@ last++
@ count++
end

Example 8b

Same problem but now using experimental phases.

#!/bin/csh -f
set inmtz=/y/ccp4/cytc/pseudo_alc_cprime1_sc_feph_sq_sf2_sfx_sq_a+p-unique_pt2p.mtz
set name = cytc_refmac_
set last =  0
set cycles = 1
set count = 0
while ($count != $cycles)
@ curr = $last + 1
refmac \
HKLIN $inmtz \
HKLOUT ${name}${curr}.mtz \
XYZOUT ${name}${curr}.pdb \
XYZIN ${name}${last}.pdb    \
   << eop
LABI FP=Fnata SIGFP=SIGFnata FREE=FreeR_flag -
HLA=HLA_pt2 HLB=HLB_pt2 HLC=HLC_pt2 HLD=HLD_pt2 
LABO FC=FC_DP5 PHIC=PHIC_1 FWT=2FOFCWT DELFWT=FOFCWT
!
REFI PHASE BBLUrring 30.0  SCBLurring 0.9 ! " Blur" (ie Scale down) FOMs
REFI PHASE                                !  use phased Fo Fc differences for sigmaA estimation
!
REFI TYPE RIGI RESI MLKF RESO  15  2.0  ! Maximum likelihood refinement 
!
!Scaling parameters
SCALe LSSC FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters for LSQ scaling
!
SCAL MLSC WORK       ! Use working reflections for sigmaA estimation. 
                     ! Useful at the very early stages of refinement
SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.1 ! Fix bulk solvent parameters for ML scaling
!
!Rigid body parameters
!
RIGIdbody NCYCle 10                                ! Number of cycles for rigid body refi
RIGIdbody GROUp 1 FROM 2   A TO 32 A  EXCLU SCHAIns ! Define domains. Exclude side chains 
RIGIdbody GROUp 2 FROM 38  A TO 55 A  EXCLU SCHAIns
RIGIdbody GROUp 3 FROM 76  A TO 99 A  EXCLU SCHAIns
RIGIdbody GROUp 4 FROM 101 A TO 126 A  EXCLU SCHAIns
RIGIDbody GROUp 5 FROM 56  A to 75 A  EXCLU SCHAins
RIGIDbody PRINt EULErangles MATRix                   ! Print Euler angles and Matrices

MONI FEW
END
eop
if ($status) exit
#
@ last++
@ count++
end


Example 9

Example of using experimental phase information . Very bad model ( RMS error 2A)

#!/bin/csh -f
#
#!/bin/csh -f
#set verbose
set name = tnc_test_
set last =   0
set cycles = 100
set count = 0
while ($count != $cycles)
@ curr = $last + 1
#goto refmac
#
protin        \
XYZIN ${name}${last}.pdb    \
  << END-protin
CHNNAM ID 5 CHNTYP 1 
CHNTYP 1 NTERM 2 SER 3 CTERM 162 GLY 2
SYMMETRY 154
LIST FEW
TITLE TNC very bad model
END
END-protin
if ($status) exit
refi:
date
refmac:

refmac \
HKLIN tnc_test.mtz \
HKLOUT ${name}${curr}.mtz \
XYZOUT ${name}${curr}.pdb \
XYZIN ${name}${last}.pdb    \
   << eop
LABI FP=Fnati SIGFP=SIGFnati FREE=FreeR_flag  -
HLA=HLA_t HLB=HLBt HLC=HLC_t HLD=HLD_t     !HL coefficients. It tells program
                                           !use phased refinement
#PHIB=PHIDM_t FOM=FOMDM_t      ! Option if there are no HL coefficients.
#
LABO FC=FC_main PHIC=PHIC_main FWT=2FOFCWT DELFWT=FOFCWT  !Labels for output file

!Refinement parameters
REFI TYPE REST RESO 12 2.8   !restrained refinement. resolution for refinement
REFI RESI MLKF      ! Use maximum likelihood residual for refinement
REFI BREF OVER      ! Refine overall B-values
REFI METH CGMAT     ! Sparse matrix for minimisaion

WEIG MATR 0.15      ! Weight geometry and x-ray according to diagonal terms of second
                    ! derivative matrix

DAMP 0.5 0.5 ! Scales down shifts to avoid large shifts. Sometimes this value
             ! should be decreased. Especially at early stages of refi

REFI PHASed  BBLUrring 20.0  SCBLurring 0.7 ! You may need to play with blurring
                                             ! factors if you think the  phase
                                             ! weighting is overestimated. 
!Scaling parameters
SCAL TYPE BULK LSSC ANISO                    !Aniso scaling. Gaussian bulk solvent 
                                             ! correction

SCALe LSSC FIXBulk BBULk 70.0 SCBUlk -0.75  ! Fix bulk solvent parameters - often 
                                         ! ill determined for resolutions < 2.8A

SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.50 ! Fix second gaussian parameters for 
                                          ! SigmaA estimaton


NCYC 5                                           ! Number of refinement cycles
MONI FEW                                         ! Give just overall statistics
BINS 20                                          ! Number of resolution bins
END
eop
if ($status) exit
#exit
#
@ last++
@ count++
end

Example 10

Example of refinement of individual anisotropic B values. Hydrogens must be added prior to refinement.

#!/bin/csh -f
#set verbose
#
###########################################################################
# You have to edit labi, reso, name, data, spgr and protin                #
#                                                                         #
# data     - mtz file                                                     #
# name     - name of coordinate file                                      #
# reso     - resolution of data or resoloution limits you want to refine  #
# labi     - input file labels                                            #
# spgr     - space group of crystal                                       #
# protin   - program which makes list of restraints                       #
#                                                                         #
#######Following things should be optimised according to data used        #
#                                                                         #
# WEIG MATR <number> weighting x-ray and geometry. At low resolution#
#                          it should be small (sometimes 0.1) at high     #
#                          resolution high (sometimes 6.0 or even higher) #
# SPHEricity               restraints on sphericity of atoms. Larger      #
#                          less spherical                                 #
#                                                                         #
# NB: with current version only CDIR (conjugate direction) method of      #
#     minimisation is active                                              #
#                                                                         #
###########################################################################
set name   = 'p1lys_'
set last   = 0
set cycles = 3
set count  = 0
set data   = 'p1lys_val-unique'
set reso   = '100 0.92'
set spgr   = '1'
set labi   = 'FP=FO SIGFP=SIGFO FREE=FreeR_flag'
while ($count != $cycles)
@ curr = $last + 1
#goto refmac
#
#
#    Generate hudrogens and calculate contribution of them. 
###################################################################
hgen:
hgen \
XYZIN ${name}${last}.pdb     \
XYZOUT ${name}_hydr.pdb     \
<< eof
HYDROgens SEPArate  ! Hydrogens in seperate file
eof
#
#
#     Calculate contribution from hydrogens. NOSCALE option should 
#     be used to preserve absolute scale of structure factros of
#      hydrogens
###################################################################
sfall:
sfall \
HKLIN  ${data}.mtz  \
HKLOUT ${data}_hydr.mtz  \
XYZIN ${name}_hydr.pdb     \
 << eof
NOSCALE
MODE SFCALC XYZIN HKLIN
LABI $labi
LABO FC=FC_hyd PHIC=PHIC_hyd
!Refinement parameters
RESO $reso
BINS 20
END
eof
#
#    Run protin to generate list of restraints
###################################################################
protin \
XYZIN ${name}${last}.pdb \
<< eof
TITL  PAV2  2.01A data RXIS  at 120K
CHNNAME ID A CHNTYP  1 ROFFSET 0
CHNNAME ID W CHNTYP  2 ROFFSET 0
CHNTYP  1    NTER 1     LYS 3     CTER 129 LEU 2
CHNTYP  2    WAT
LIST    FEW        #[few]/some (for >20A)/all
PEPP    5          #no. of atoms restrained to be in a plane [5]
SYMMETRY $spgr
LIST FEW
END
eof
refi:
date
refmac:
#
#      Run refmac to refine structure. It is refinement with 
#      anisotropic B values
###################################################################
refmac \
HKLIN  ${data}_hydr.mtz \
HKLOUT ${name}${curr}.mtz \
XYZOUT ${name}${curr}.pdb \
XYZIN ${name}${last}.pdb    \
   << eop
LABI $labi FPART1=FC_hyd PHIP1=PHIC_hyd
LABO FC=FC_main PHIC=PHIC_main FWT=2FOFCWT DELFWT=FOFCWT  !Labels for output file

!Refinement parameters
REFI TYPE REST RESO $reso !restrained refinement. resolution for refinement
REFI RESI MLKF               ! Use maximum likelihood residual for refinement
REFI BREF ANISotropic        ! Refine individual anisotropic B-values
REFI METH CDIR               ! Conjugate direction method.
    
WEIG MATR 4.0                ! Weight geometry and x-ray according to diagonal 
                             ! terms of second derivative matrix. 
SPHERicity 5.0               ! Restraints on sphericity of atoms. Default is 2.0
                             ! but is not good
!Scaling parameters
SCAL TYPE BULK LSSC ANISO    ! Aniso scaling to remove overall crystallographic 
                             ! mode. Gaussian bulk solvent correction
BLIM 1.0 150.0               ! Limit of B values. For anisotropic refinement it
                             ! is limit of eignevalues of anisotropic B tensor
NCYC 5                       ! Number of refinement cycles
MONI FEW                     ! Give just overall statistics
BINS 20                      ! Number of resolution bins
END
eop
if ($status) exit
#exit
#
@ last++
@ count++
end

SEE ALSO

protin arpp restrain

AUTHOR

Garib Murshudov, York University.

This document has been prepared by: Garib Murshudov, Eleanor Dodson, Maria Turkenburg.