A paper is divided into two parts. The first part generates a certain number of
parallel processes, which usually must be at least as many as there are lines to
copy. The second part is the body of the paper, the code which replicates itself
(see paper #1 below).
1.1 The Functional Principle of the Self-Replication:
First, let's have a look at the following 4-line paper:
;name paper #1
0000 start  spl     1                     ;
0001        spl     1                     ;generate 4 parallel processes
0002 silk1  spl.a   @0,     100           ;split  
0003        mov.i   }silk1, >silk1        ;copy    ---> front-end silk
0004        mov.i   {silk1, <silk2        ;copy
0005 silk2  jmp     @0,     >50           ;jump    ---> back-end silk
The first two lines generate 4 processes for the 4-line paper, which then execute
line 0002. They split to the a-field address, i.e. the address pointed to by the
b-field zero locations away (the b-field of the line they are executing), which is
100 locations away. When all four processes have executed this line, we have four
further processes ready to execute line silk1+100. There is nothing to execute
there yet, but the eight processes now execute in alternating order (see the
illustration in Chapter 1.2). Line 0003 moves what is pointed to by the a-field of
line 0002 to the location pointed to by the b-field of line 0002, then increments
both the a-field and the b-field of line 0002. This means that the first of the
eight processes copies line 0002 to the cell 100 locations away from line 0002 and
leaves line 0002 changed in the following way:
0002 silk1  spl.a    @1,    101
The second process executes the line at silk1+100, which isn't empty anymore, while
the third process executes line 0003 again. This now copies line 0003 to the cell
101 locations away from silk1, just after the previous line. Processes 5 and 7 do
the same thing, copying lines 0004 and 0005 to silk1+102 and silk1+103.
The back-end silk in lines 0004 and 0005 does the same, but in reverse order. It
copies the paper beginning with line 0005, because the mov in line 0004 predecrements
the a-field of silk1 (the b-field of silk1 is not touched by this mov), which after
the first predecrement looks as follows:
0002 silk1  spl.a    @3,    104
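After the fourth execution of the mov in line 0004, the complete paper has been
copied a second time. The following jmp in line 0005 then starts this copy, which
begins at silk2+46, i.e. silk1+49, because the destination is counted from the
b-field of silk2 after its four predecrements.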
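The pointer arithmetic described above can be traced with a small model. The following Python sketch is not a Redcode interpreter; it only follows the a-field and b-field of silk1 and the b-field of silk2 through the two mov lines, assuming four processes that each execute every line once. The function name and its parameters are illustrative assumptions, not part of the paper. Destinations of the back-end silk are counted from silk2, which sits three lines below silk1, so in this model the second copy starts at silk1+49.

```python
def silk_pointer_walk(processes=4, front_step=100, back_step=50, silk2_offset=3):
    """Model only the pointer arithmetic of paper #1 (not a full MARS).

    Returns two lists of (source, destination) pairs, both measured in
    cells relative to silk1.
    """
    a_field, b_field = 0, front_step      # silk1: spl.a @0, 100
    front = []
    for _ in range(processes):            # mov.i }silk1, >silk1
        front.append((a_field, b_field))  # copy first ...
        a_field += 1                      # ... then post-increment both fields
        b_field += 1

    back_b = back_step                    # silk2: jmp @0, >50
    back = []
    for _ in range(processes):            # mov.i {silk1, <silk2
        a_field -= 1                      # pre-decrement a-field of silk1
        back_b -= 1                       # pre-decrement b-field of silk2
        back.append((a_field, silk2_offset + back_b))
    return front, back


front, back = silk_pointer_walk()
print(front)  # [(0, 100), (1, 101), (2, 102), (3, 103)]
print(back)   # [(3, 52), (2, 51), (1, 50), (0, 49)]
```

The front-end silk walks forward through the paper, the back-end silk walks backward, and both end up having copied all four lines.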
1.2 The Order of Execution:
The following scheme illustrates the order of execution for paper #1:
original          first copy            second copy           third copy
silk1             silk1+100             silk1+200             silk1+300
spl  1,2,3,4      spl  6,8,10,12        spl  15,18,21,24      spl  28,32,36,40 
mov  5,7,9,11     mov  14,17,20,23      mov  27,31,35,39      mov
mov  13,16,19,22  mov  26,30,34,38      mov                   mov
jmp  25,29,33,37  jmp                   jmp                   jmp
Excluding the three executions of the two spl instructions that generate the 4
parallel processes, it takes 37 executions until the original paper has accomplished
its first self-replication.
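The numbers in the scheme can be reproduced with a simple queue model. The Python sketch below assumes a single round-robin process queue in which spl queues the next line first and the newly created process second. It does not model the core or the actual copying, and the names LINES and execution_order are hypothetical.

```python
from collections import deque

LINES = ["spl", "mov", "mov", "jmp"]      # body of paper #1, starting at silk1


def execution_order(processes=4, steps=40):
    """Return {(copy, line): [execution numbers]} for the first `steps` cycles.

    Assumed model: one round-robin process queue; spl queues the next
    line first and the new process (start of the next copy) second.
    """
    queue = deque((0, 0) for _ in range(processes))
    order = {}
    for step in range(1, steps + 1):
        copy, line = queue.popleft()
        order.setdefault((copy, line), []).append(step)
        if LINES[line] == "spl":
            queue.append((copy, line + 1))    # continue in the same copy
            queue.append((copy + 1, 0))       # new process at the next copy
        elif LINES[line] == "mov":
            queue.append((copy, line + 1))
        # jmp: the jump target is not modelled here; a re-queued process
        # would only run after the steps shown in the scheme.
    return order


order = execution_order()
for copy in range(4):
    for line, name in enumerate(LINES):
        print(copy, name, order.get((copy, line), []))
```

Copy 0 is the original, copy 1 the paper at silk1+100, and so on. The last jmp of the original is execution number 37 in this model, which matches the count given above.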
1.3 Some Math behind:
The following formula (valid only for paper #1; other papers need other
formulas) shows the number of executions (T) needed to accomplish the first
self-replication of the original paper using P parallel processes:
T = (P^3+P^2)/2+1-P
For example, the calculation for a paper using 5 parallel processes would give:
T = 71
The results for different numbers of parallel processes are shown below (P=2,3
are shown just for completeness):
        parallel processes         number of executions
                2                          5
                3                         16
                4                         37
                5                         71
                6                        121 
                7                        190
                8                        281
                9                        397
               10                        541
As you can see, the number of executions needed to accomplish the self-replication
increases steeply, roughly with the cube of the number of parallel processes.
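The table can be recomputed directly from the formula. The Python sketch below also includes a second function that arrives at the same count by reasoning about the interleaving of the original and its copies. That second function is an assumed model, valid only when the paper has exactly as many lines as there are processes and only its first line is a spl. Both function names are hypothetical.

```python
def executions(p):
    """T = (P^3 + P^2)/2 + 1 - P, the formula given for paper #1."""
    return (p**3 + p**2) // 2 + 1 - p      # P^2 * (P + 1) is always even


def executions_by_rounds(p, lines):
    """Same count, derived from the interleaving (assumed model).

    While the original executes line k (1-based), k papers (the original
    plus k - 1 copies) are executing, so that round lasts k * p executions.
    The original's last execution of its final line comes (p - 1) * lines
    steps after the final round begins.
    """
    start_of_last_round = 1 + p * lines * (lines - 1) // 2
    return start_of_last_round + (p - 1) * lines


for p in range(2, 11):
    print(f"{p:2d} {executions(p):4d}")
```

For P=4 both functions give 37, the value read off the execution scheme in Chapter 1.2.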
Further spl/mov pairs lead to a further increase in T.
For example, for a paper containing spl/mov/spl/mov the formula would look
something like:
T = (Sum [n=2...P] n x P) + (Sum [n=1...P-3] n x P) + 1
1.4 Front-End Silk Variations:
The first silk in a paper is of special importance because it copies the whole paper. Further silks should copy only a part of the paper.
Standard silk
The most commonly used front-end silk is definitely:
silkA  spl   @0,     <pStepA
       mov.i }silkA, >silkA
It simply copies the complete paper as discussed in Chapter 1.1. The b-field can additionally be used to in/decrement locations before copying.
Extended silk
If a further mov is added, we obtain another frequently used front-end silk:
silkB  spl   @0,     <pStepB
       mov.i }silkB, >silkB
       mov.i }silkB, >silkB
This is a very interesting kind of front-end silk, because we need only half as many parallel processes to copy the complete paper. For example, an 8-line paper can be completely copied with just 4 parallel processes. In this case, too, the b-field can additionally be used to in/decrement locations before copying.
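The halving follows from simple counting: every process executes each mov line once, and every execution copies one line and advances the pointer. A minimal Python sketch of that counting, assuming identical mov lines that share one post-incremented a-field (the function name is hypothetical):

```python
def lines_copied(processes, movs):
    """Source offsets (relative to the silk) copied by `movs` identical
    `mov.i }silk, >silk` lines executed once each by every process."""
    a_field = 0
    copied = []
    for _ in range(movs):             # each mov line is run by all processes
        for _ in range(processes):
            copied.append(a_field)    # copy the line the a-field points to
            a_field += 1              # post-increment moves on to the next line
    return copied


print(lines_copied(4, 1))  # [0, 1, 2, 3]
print(lines_copied(4, 2))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With one mov, 4 processes cover 4 lines. With two movs, the same 4 processes cover 8 lines.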
Reverse style silk
A method that has become very uncommon is to move before splitting. Nevertheless, it can be advantageous to place a dat
in front of the paper.
front  dat   #0,      #pStepC
       mov.i }front,  >front
silkC  spl   pStepC-1, <CorecolorC
The disadvantage is that it needs three instructions and
it copies before splitting. Nevertheless, it copies the complete paper. Here, too, the b-field can be used to in/decrement further locations.
1.5 Center Silk Variations:
Just two variations can be found in all successful papers.
Standard silk
The most common center silk is:
silkD  spl   @0,     <pStepD
       mov.i }silkD, >silkD
It copies the paper beginning at silkD. The b-field can be used to in/decrement the locations before copying.
Non-in/decrementing silk
This variation is less common but can be found, for example, in Reepicheep's paper:
silkE  spl   pStepE,  0
       mov.i >silkE, }silkE
It copies the paper beginning at silkE but doesn't in/decrement before copying.
There are additionally three different ways to copy the whole paper again. This makes the paper more vulnerable to scanners, which is why they are only rarely used:
Standard silk
silkF  spl   pStepF, {silkA
       mov.i }silkA, }silkF
It copies the complete paper, but needs an already executed silkA. The b-field is needed to reset the a-field of silkA before the mov.
In/decrementing silk
silkG  spl   pStepG, <CorecolorG
       mov.i }silkA, }silkG
It copies the complete paper, but needs an already reset silkA. The b-field can be
used to in/decrement further locations.
Reverse style silk
       mov.i {silkA, {silkH
silkH  spl   pStepH, <CorecolorH
It copies the complete paper, but needs an already executed silkA. The b-field can be 
used to in/decrement further locations.
1.6 Back-End Silk Variations:
Standard silk
Usually the following silk is used:
       mov.i {silkD, <silkJ
silkJ  jmp   @0,     pStepJ
or
       mov.i {silkD, {silkK
silkK  jmp   pStepK, <CorecolorK
It copies the paper between silkD and the last line of the paper, but needs an already executed silkD. If silkD
is the first silk, then the complete paper will be copied. The b-field in silkK can be
used to in/decrement further locations.
The jmp can also be replaced by a djn instruction, like:
       mov.i {silkD, {silkL
silkL  djn.f @0,     <pStepL
This will additionally decrement (self-modify) the b-field by the number of parallel processes in the silkD line of the copy.
Much more interesting is the use of the following silk:
       mov.i {silkD, {silkM
silkM  jmz.a @0,     pStepM
The jmz validates that the copy is still intact/unchanged by checking for a zero a-field in the silkD line of the copy. This prevents the paper from launching copies that have already been wiped, which gives better resistance against scanner wipes.
Stone-like looping
This is a rather uncommon but interesting approach.
       add.ab #pStepN, silkD
       jmp    fsilkD,  {fsilkD
It adds a value to pStepD in the silkD line and loops back to it, while the b-field of the jmp resets the silkD line
for further use. This means that every copy keeps on copying and launching further copies at new places in the core.