CHARMM c24 replica.doc



File: Replica ]-[ Node: Top
Up: (commands.doc) -=- Next: Syntax


Replica: Commands which deal with replication of the molecular system: Replica.

# <caves>-Aug-18-1993 (Leo Caves) Initial release.
# REPLICA/PATH method added by B. Brooks March 1994.

The commands described in this node are associated with the replication of
regions of the PSF, see *note gener:(struct.doc)Generate.  A facility
for replication of regions of the PSF has been implemented to support a class
of methods which seek to improve the sampling of a (usually small) region of
the molecular system, by selective replication.  Such methods include LES
(Locally Enhanced Sampling [Elber and Karplus 1990, J. Amer. Chem. Soc. 112,
9161-9175]) and MCSS (Multiple Copy Simultaneous Search [Miranker and Karplus
1991, Proteins 11, 29-34]).


* Menu:
* Syntax::		Syntax of the replication commands
* Usage::	 	Description of command usage
* Implementation::	A brief description of the anatomy of replication 
* Restrictions:: 	Restrictions on usage	
* Examples:: 		Supplementary examples of the use of REPLica
* Path::                Replica Path Method



File: Replica ]-[ Node: Syntax
Up: Top -=- Next: Usage -=- Previous: Top


Syntax of PSF Replication commands

[SYNTAX: REPLication commands]

REPLica	 { [segid] [NREPlica integer] [SETUP] [atom-selection] [COMP] }
	 { RESEt }

segid:==		Basename for replica segment identifiers. 
atom-selection:==	(see *note select:(select.doc).)



RPATh { [ KRMS real ] [ KANGle real ] [ COSMax real ] [MASS] [WEIGht] }
      {   OFF                                                         }




File: Replica ]-[ Node: Usage
Up: Top -=- Next: Implementation -=- Previous: Syntax

	
Description of REPLica command usage

1) (The implicit GENERate subcommand)
This command performs the essential act of replication. Its action is to
replicate (to a degree specified by NREPlica, default: 2) (a subset of) the
molecular system, as specified in the (primary) atom-selection (default: all).
All atomic properties and topological attributes of the region are replicated
(for a full list, see *note implem:(replica.doc)Implementation).  Each
replica of the primary atom selection constitutes a new segment in (and 
appended to) the PSF, however the atom and residue names and the residue
identifiers of the primary atom selection are carried over.

The implicit generation subcommand optionally accepts a segment identifier
(segid).  The length of segid must be such that when concatenated with the
(integer representing the) maximum number of replicas specified for generation,
it does not exceed 4 characters.  If omitted, then replica segment identifiers
will simply be set to the replica number. At present no check is made for
duplicate segment identifiers, so choose with care. The command is designed to
operate in a manner similar to the GENErate command from the main parser.

The effect of the replication command may be classified into two areas:
structure and interactions. Structurally, as mentioned above, the command
performs the necessary book-keeping work for CHARMM, in order that each
individual replica is functionally equivalent to the region of the structure
specified in the atom selection. ie. in the case where the atomic positions of
an individual replica are the same as the primary atom selection (as they will
be immediately after issuing the REPLica command), the energy and forces of the
individual replica and the appropriate region of the primary system are
identical (there is an important corollary to this statement which is now
discussed).

In the area of discussing the interactions of replicas it is useful to
introduce the concept of a subsystem.  Before issuing a REPLica command, there
is considered to be one subsystem, the primary subsystem, to which all atoms
belong. Upon issuing the REPLIca command a new subsystem is generated, which
consists of replicas of a subset of the primary subsystem (as specified in the
atom selection).  In this case there are now two subsystems.

The simple cases specifying interactions of subsystems and replicas may now
be stated:
	* Replicas within a subsystem do NOT interact. 
	* Replicas belonging to different subsystems do interact. 

In CHARMM, the interaction rules of replicas are applied in the non-bonded list
generation routines, through appropriate group/atom exclusions. You will notice
some diagnostic messages from the list generation routines indicating the
number of group/atom interactions excluded on the basis of replication.  In
following the rules of interaction of replicas it is important to note that a
given replication of a subset of the primary subsystem, results in a new
subsystem. Thus the subset of the primary subsystem and its individual replicas
are now in different subsystems and are thus will interact.  For this reason,
the replication action is usually followed by an immediate removal of the atoms
of the subset of the primary subsystem, through a call to DELEte *note
dele:(struct.doc)Delete). This leaves all replicas of the specified
region in a single subsystem, arranged as contiguous segments appended to the
current PSF.

A note on renormalization of energy and forces:
In the original implementation of REPLica in a developmental version of CHARMM
at Harvard, there exists a close coupling of the REPLica command and the
energy/force evaluation routines. In the current REPLica implementation in the
standard CHARMM distribution, appropriate energy/force scaling for the system
in question may be achieved through the use of the BLOCK facility of CHARMM see
*note Block(block.doc). The combination of REPLica and BLOCK provides
for very flexible method of handling replica interactions.  Note that if the
primary system is FIXed and that only one replicated subsystem is present (the
case in many MCSS applications) then normalization of energy/forces is NOT
required.

Example:
In the following section of CHARMM command script, a segment named PROT is
generated from a sequence read from a coordinate file. A couple of selection
definitions are made which together identify the sidechain atoms of residue
12.  In the REPLIca command, 4 copies of the sidechain are generated and placed
in three new segments A1 to A4 at the end of the PSF. Next the selections are
redefined (as REPLica has altered the PSF and this corrupts existing selections
made with the DEFIne command). These (newly redefined) selections are made to
remove the sidechain atoms in the primary system that were selected for
replication. Next BLOCK is used to setup the scaling of energy and forces in
this system with a primary and a single replicated subsystem. In the call to
BLOCK, 2 blocks are requested. By default BLOCK places all atoms of the system
in block 1, so the first action is to redefine the replicated subsystem to
block 2. Next we simply set up the desired interaction matrix. Primary
subsystem self interactions are simply set to unity (no scaling).  Interactions
within each replica are set to 0.25 (the reciprocal of the number of
replicas).  Primary <--> replicated subsystem interactions are similarly scaled
by 0.25.  (Note that the REPLica interface to the non-bonded list generation
routines removes all inter-replica (intra-subsystem) interactions.) Finally,
the masses of the replicated atoms are scaled by 0.25, by using the SCALar
commands. (Note that mass-scaling may not be desirable as it has been
demonstrated that in the original LES framework, the thermal properties of the
replicas are such that at thermal equilibrium, the mapping of replicas back to
the "physical" system (with a single copy) results in too high a temperature.
The overestimation of the temperature in the physical system is a factor of N
in the simplest case of a uniform "weighting" of all replicas by a factor of
1/N, where N is the number of replicas employed in the simulation. This effect
is an active field of research, though a solution for systems where only
equilibrium properties are desired is to either scale up the masses of the
replicas by a factor of N, or to selectively rescale the velocities of the
replicas.)


...

! { read sequence and generate segment }
READ SEQU COOR UNIT 11
GENErate PROT 

! { define some useful atom selections } 
DEFIne backbone SELEct TYPE N  .OR. TYPE CA .OR. TYPE C .OR. -
                       TYPE HN .OR. TYPE HA .OR. TYPE CB END
DEFIne disorder SELEct (SEGID PROT .AND. RESId 12) .AND. .NOT. backbone END

! { replicate the selected sidechain four times }
REPLIcate A NREPlica 4 SELEct ( disorder ) END

! { redefine as REPLIca has changed PSF and this trashes SELEction definitions }
DEFIne backbone SELEct TYPE N  .OR. TYPE CA .OR. TYPE C .OR. -
                       TYPE HN .OR. TYPE HA .OR. TYPE CB END
DEFIne disorder SELEct (SEGID PROT .AND. RESId 12) .AND. .NOT. backbone END
                     
DELEte ATOM SELEct ( disorder ) END
DEFIne replicas SELEct SEGId A* END

! { set up an appropriate interaction matrix }
BLOCK 2
CALL 2 SELEct ( replicas ) END
COEF 1 1 1.0
COEF 2 2 0.25
COEF 2 1 0.25
END

! { note masses can be modified if desired through the SCALar commands }
! { note that this may not always be desirable --- see comments above }
SCALar MASS MULt 0.25 SELEct replicas END

... load/generate some coordinates and proceed..


2) The RESEt subcommand.

The RESEt subcommand has the effect of reducing all current subsystems to a
single primary subsystem. This is accomplished by simply switching off the
CHARMM's REPLica handling mechanism. This change is currently irreversible, in
as much as the REPLica state must be restored through appropriate calls to the
REPLica command.  This command is there to support the use of REPLica for
simple replication of PSF elements for which subsequent REPLIca handling is not
required.

Example:
The following example begins by building a PSF containing a single CO
molecule.  An immediate call to REPLica requests the generation of 256 replicas
(with SEGId's of R1 to R256) of the primary subsystem (the CO molecule with the
SEGId CO). Next the original CO molecule is removed. The final command,
switches off the CHARMM's replica handling, leaving a PSF with 256 CO's which
interact with each other. This may seem like a redundant command given the
ability to generate a long sequence with commands like READ SEQU COOR or a
little copy and paste with your favorite editor, but remember that REPLica can
handle replication of ANY subset of the PSF, reducing the need for tampering
with RTF definitions and creating new PATCh residues (PRES's).

READ SEQUence CARDS
* a single carbon monoxide molecule
*
1
CO

GENErate CO 					! generate the primary system

REPLica R NREP 256 SELEct SEGId CO END		! replicate
DELEte ATOM SELEct SEGId CO END			! remove primary system 
REPLica RESEt					! reduce replicates to primary



File: Replica ]-[ Node: Implementation
Up: Top -=- Next: Restrictions -=- Previous: Usage


Notes on Implementation of REPLica in CHARMM.

This node is of primary directed at CHARMM developers, but may be of 
interest to the curious user.

Structurally, the call to REPLica handles all atomic and topological properties
of atoms in the primary atom selection. Properties that are replicated include
group/residue membership, atom-code, IUPAC name, partial charge, parameter type
code, fixed atom flag, X,Y,Z and W for main and comparison and reference
coordinates, the forces DX,DY and DZ, the friction coefficient FBETa, and the
harmonic constraint.  Topological entries include bond, angle, dihedral,
improper terms, explicit non-bonded exclusion flags and H-bond donor and
acceptor arrays. Optionally, IC table entries for the primary selection are
replicated.

For interactions, the handling of replicas in CHARMM has been implemented using
a very simple data structure which allows for a simple and efficient interface
to the central CHARMM routines. Essentially, subsystem and replica identities
are maintained through the use of linked lists.  On the first call to REPLica,
the primary system (the existing PSF) is initialized to be subsystem 1 (repID),
consisting of 1 replica.  Each call to REPLica, establishes a new subsystem,
and each replica requested is distinguished by a separate replica number. The
replica number is maintained at both the group (repNoG) and atom (repNoA) level
for efficiency in the non-bonded list generation routines.

In the following schematic we see the state of the data structure in which
there is a primary system consisting of 4 atoms. The threefold replication of
atoms 2 and 3 (which form a distinct group in the primary system) is shown. The
replication forms a new subsystem (repID).  Each replicated group gets a
distinct flag representing the individual replica, as do the replicated atoms.
These flags index into the repID array which contains the subsystem membership
flags.  In this way the subsystem/replica membership is easily established
through knowledge of the group or atom number.

Atom  Name	repID	repNoG	repNoA	Comments
#
1	N	1	1	1	| Primary subsystem
2	CA	 		1	|
3	C			1	|
4	O			1 	|
5	CA	2	2	2	& Replicated substem (NREP=3)
6	C 	 	 	2	&
7	CA		3	3	&
8	C 		 	3	&
9	CA		4	4	&
10	C 		 	4	&

An schematic of the replica exclusion code in the non-bond list generation
is now given for an atom pair i and j.

IF ( ( repNoA(i) .NE. repNoA(j) ) .AND. 
     ( repID(repNoA(i)).EQ.repID(repNoA(j)) ) ) THEN
       
	EXCLUDE PAIR (i,j) in list
ELSE
	INCLUDE PAIR (i,j) in list
ENDIF

There is another component of the REPLica data structure which is a array
(byatom) of "weights". These weights in general reflect the degree of
replication of the subsystem to which the atom belongs, but may be changed
through SCALar commands (SCALar WEIGht SET..). This array was used in the
developmental version of CHARMM with REPLicas as the interface to the
energy/force routines for correct normalization. In the current standard CHARMM
release, this array exists, but is redundant. Currently it will be filled by a
value of the reciprocal of the number of replicas requested for any subsystem.
It has been retained for some degree of flexibility in future releases. At
present it may be used as an additional array for book-keeping.



File: Replica ]-[ Node: Restrictions
Up: Top -=- Next: Examples -=- Previous: Implementation


The only absolute requirement for this command is that a PSF of the molecular
system be present prior to the call to REPLIca. 

All non-bonded list generation options are currently supported, however
IMAGES and EXTENDED electrostatics are currently not supported.

Please note that the replica group flags follow the group membership of the
primary atom selection, therefore take care not to split groups in a selection
if group-based energy evaluations are to be subsequently used.

Run-time attributes of the system such as SHAKE constraints, BLOCK membership
and SBOUND flags will not be replicated. (Re)Issue such commands after
replication has been performed.

It must be noted that currently the replica handling mechanisms of CHARMM are
generated through the run-time use of the REPLica command. i.e. the REPLica
data structure is not currently incorporated in the standard system PSF or able
to be saved to an external file for restoring its state.  The philosophy is
that all necessary attributes of the replicas are contained in the primary
system PSF and that it is therefore only necessary to keep that explicitly. Of
course, the coordinates of the individual replicas must be saved.



File: Replica ]-[ Node: Examples
Up: Top -=- Next: Path -=- Previous: Restrictions


Supplementary examples.

Replication of PHE 22 and 33 and TYR 35 of BPTI
These examples illustrate two ways of setting up replicated subsystems.
In both cases replicas of the sidechains are created from CG outwards. 
In the first example three calls to REPLica are made, one for each sidechain,
which create 5 replicas for each subsystem.
In the second example, one call to REPLica is made, which replicates
all three of the sidechains, to create one replicated subsystem containing
five 5 replicas of the triad.
In each case an appropriate interaction matrix for the subsystems is 
created with the use of the BLOCK command.

Example 1: 

3 replicated subsystems: 5 copies of each individual sidechain in each.

REPLicate A NREPlica 5 SETUP -
     SELEct (SEGId 4PTI .AND. RESId 22) .AND. .NOT. -
             (type N .or. type CA .or. type C .or. -
              type O .or. type HN  .or. type HA .or. type CB)  END

REPLicate B NREPlica 5 SETUP -
     SELEct (SEGId 4PTI .AND. RESId 33) .AND. .NOT. -
             (type N .or. type CA .or. type C .or. -
              type O .or. type HN  .or. type HA .or. type CB)  END

REPLicate C NREPlica 5 SETUP -
     SELEct (SEGId 4PTI .AND. RESId 35) .AND. .NOT. -
             (type N .or. type CA .or. type C .or. -
              type O .or. type HN  .or. type HA .or. type CB)  END


! DELETE the necessary regions of the primary sub-system
DELEte ATOM -
     SELEct (SEGId 4PTI .AND. (RESI 22 .OR. RESI 33 .OR. RESI 35)) .AND. .NOT. -             (type N .or. type CA .or. type C .or. -
              type O .or. type HN  .or. type HA .or. type CB)  END

DEFIne phe22 SELEct SEGId A* END
DEFIne phe33 SELEct SEGId B* END
DEFIne tyr35 SELEct SEGId C* END

! set up the correct energy/force scaling. 
! the default coefficient is one.
BLOCK 4
   CALL 2 SELEct phe22 END	! assign replicated subsystems to blocks
   CALL 3 SELEct phe33 END
   CALL 4 SELEct tyr35 END
   COEF 2 1 0.2                 ! primary <-> replicated subsystems 
   COEF 3 1 0.2
   COEF 4 1 0.2
   COEF 2 2 0.2			! replicated subsystem self-terms
   COEF 3 3 0.2
   COEF 4 4 0.2
   COEF 3 2 0.04		! replicated <-> replicated subsystems
   COEF 4 2 0.04
   COEF 4 3 0.04
END



Example 2:

1 replicated subsystem: 5 replicas consisting of the 3 different sidechains

REPLicate A NREPlica 5 SETUP -
     SELEct (SEGId 4PTI .AND. (RESId 22 .OR. RESID 33 .OR. RESID 35) ) -
     .AND. .NOT.  (type N .or. type CA .or. type C .or. -
              type O .or. type HN  .or. type HA .or. type CB)  END

! DELETE the necessary regions of the primary sub-system
DELEte ATOM -
     SELEct (SEGId 4PTI .AND. (RESI 22 .OR. RESI 33 .OR. RESI 35)) .AND. .NOT. -
             (type N .or. type CA .or. type C .or. -
              type O .or. type HN  .or. type HA .or. type CB)  END

! set up the correct energy/force scaling.
BLOCK 2
   CALL 2 SELEct SEGId A* END
   COEF 1 2 0.2
   COEF 2 2 0.2
END



File: Replica ]-[ Node: Path
Up: Top -=- Next: Top -=- Previous: Examples



                        Replica Path Method

      The replica/path method allows the postions between sequential
replicas to be restrained.  This allows minimization and simulated
annealing methods to be used to search for transition states.

                                      B. Brooks, NIH, March 1994

This code currently requires exactly one replicated subsystem with at
least three replicas.

Syntax:

RPATh { [ KRMS real ] [ KANGle real ] [ COSMax real ] [MASS] [WEIGht] }
             [ KMAXrms real ] [RMAXrms real ]

      {   OFF                                                         }

    MASS   - Use mass weighting in rms determination.
    WEIGht - Use the main weighting array for weighting the rms vector.
    KRMS   - The rms deviation force constant.
    KANGle - The COS(angle) deviation force constant.
    PCOSmx - The value of COS(theta) below which the vectors are restrained.


Replica/Path energy functions:

   Erms = sum  {  0.5* Krms * ( rms  - <rms> )**2 }      I=1,NREP-1
             I                     I

      where rms  is the weighted rms deviation between replica I and I+1
               I

   Erms = sum  {  0.5* Kmaxrms * ( rms  - Rmaxrms )**2 }      I=1,NREP-1
             I                        I


   Eang = sum  { 0.5* Kang * ( cosmax - cos(theta) )**2 }  (cos(theta)<cosmax) 
             I

        =  0.0   when cos(theta) >= cosmax                  I=1,NREP-2

      where cos(theta) is determined from the dotproduct of weighted deviation
      vectors between replicas I and I+1 and between I+1 and I+2.

An example of use:

! { read sequence and generate segment }
READ SEQU COOR UNIT 11
GENErate PROT 

! { define atom region in which to search for transition state }
DEFINE active sele segid prot .and. resid 15 : 19 end

! { replicate the selected residues 20 times }
REPLIcate A NREPlica 20 SELEct ( active ) END

! { redefine is necessary }
DEFINE active sele segid prot .and. resid 15 : 19 end

! { read product coordinates }                     
OPEN read card unit 12 name products.crd
READ coor card unit 12 COMP
                     
! { read reactant coordinates }                     
OPEN read card unit 12 name reactants.crd
READ coor card unit 12
COOR orient rms mass sele .not. ( active .or segid A* ) end

! { setup initial guess coordinates for all intermediates }

set 1 1
set a 0.0
label loop
  coor duplicate sele active end sele segid A@1 end
  coor duplicate sele active end sele segid A@1 end comp
  coor average fact @a sele segid A@1 end
  incr a by 0.05
  incr 1 by 1
  if @1 .lt. 20.5 goto loop

DELEte ATOM SELEct  active  END
DEFIne replicas SELEct SEGId A* END

! { average the non active reactant and product atoms }
COOR average sele .not. replicas end
COOR copy comp

! { set up an appropriate interaction matrix }
BLOCK 2
  CALL 2 SELEct replicas END
  COEF 1 1 1.0
  COEF 2 2 0.05
  COEF 2 1 0.05
END

! { specify residue 17 as more inportant in the weighting }
SCALAR wmain set 1.0
SCALAR wmain set 4.0 sele replicas .and. resid 17 end

! invoke the path code
RPATH KRMS 100.0 KANGle 100.0 COSMax 0.5 MASS WEIGHT

! { fix the endpoints }
cons fix sele segid a1 .or. segid a20 end

minimize abnr nstep 100
! {.... perhaps simulated annealing using MD ...}

! { plot energy as a function of the path }
open write card unit 20 name energy.dat
set 1 1
label eloop
  BLOCK 2
    CALL 1 sele all end
    CALL 2 sele replicas .and. .not. segid A@1 end
    COEF 1 1 1.0
    COEF 2 1 0.0
    COEF 2 2 0.0
  END
  ENERGY
  write title unit 20
* @1 ?energy  
*
  incr 1 by 1
  if @1 .lt. 20.5 goto loop

.... more analysis ...

STOP


CHARMM .doc Homepage


Information and HTML Formatting Courtesy of:

NIH/DCRT/Laboratory for Structural Biology
FDA/CBER/OVRR Biophysics Laboratory