pr0mix
Electrical Ordered Freedom #3
August 2011
The main goal of garbage instructions - a hiding/protection of useful code (from av'ers, a watchful eye reverser and other curious). However, the "wrong" trash can lead to detection of viral code, thereby undermining all our efforts.
This text is about how to improve the quality of the generated garbage.
Assume that the file is checked, infected with our virus. Antivirus can operate so:
To bypass the emulator, will help us a working anti-emulation, from the signatures - the TrashGen, which (as it turns out xD) uses our virus. But if created trash-code will weak, then we'll detected by heuristics.
So, for creation of a better trash-code, in the beginning I suggest to choose the compiler, whose generated code we will "imitate": ms, borland etc. Once selected compilers (eg, ms), you can still determine which mode of generation / optimization, we will imitate ("min size"/"max speed"). Since under different modes the code is generated in another way. For example, for ms-compiler, instruction EAX = 1 in mode "max speed" will be as follows:
And for "min size"
Trashgen can generate both of these options, but it looks more realistic if you use one tactic (do not know what perversion will be new versions of the heuristics).
Further, in addition to various features that you have created in a trashgen, it must also be able to generate "realistic" code (similar to the usual code of standard programs written in HLL), namely:
But even such a trash-code generated taking into account given points, can easily be caught by heuristics.
The main problem is that the garbage-code - it's just trash, useless set of instructions. This is the reason anger heuristics. And if so, then the trash should be helpful. For this we need to implement 2 more tasks:
The first point in the general case is implemented is quite simple: generate trash, runs it, and after trash working off the received result is used in a useful code. For example, have generated the following code:
After his execution ECX = 5. And this value can be added to key for decrypt the virus code (many of use cases). However, the generated code trash may be as follows:
After his execution ECX also = 5. But instructions 2 and 3 - garbage, for what will be punished by heuristics. The solution consists in creation a "logical trash".
The idea is in that a garbage-code to make logical, like logic of code of usual programs. Normal code first initializes the parameters (registers, local variables etc); then executes instructions, which uses and/or otherwise affect these parameters. And instructions are a single whole - carry out the general problem, and among them aren't present superfluous, no "garbage". No re-initialization, is not used uninitialized parameters. All instructions related to each other, each of which affects the subsequent course of execution of a code.
Around this logic I created in new version of my engine xTG (v2.0.0), which works as follows:
________ | | | begin | |________| | | | | +--------------------------------------------+ | | | | | | | ________________________ | ________________________ | | | | | | | | | | module: generation of | | | module: logic | | | | instructions | | | | | | | | | | | | | | | | | | | | | ---------------------- | | | ---------------------- | | +-->| generation |<-----+ +--->| parser | | +------>| of | | | | | | | "right" | instr addr | | ---------------------- | | | | instruction |-----------------+ | emulator | | | | | | | | | | ---------------------- | | ---------------------- | 0 | | | updating of address |<----------------+ | analyzer of |---+ | 1 | to generate a new | | 1 | instructions | +-------| instruction | 0 +----| (logic) | | |------+ | | | ---------------------- | | | ---------------------- | |________________________| | |________________________| | | | ________ | | | | | end |<-----------------------------+ |________|
More details:
The parser determines which instruction in front of him, and gets its parameters (operands: registers, addresses, etc); then it retains in some structure these param's and exposes some flags. The completed structure will be used by code-analyzer (see below).
Also, for example, if we found instructions mov ecx, dword ptr [403008h] etc, then parser will replace address 403008h on other, the corresponding address in allocated memory for correct emulation of instruction;
Emulator receives the address of instruction, prepares a special environment, copies of this environment instruction, and emulates. Moreover, emulation can be at least 3 types: run of instructions in special environment, full imitation of execution of the instruction & the combination of these two methods (1-st method is suitable for most instructions). The result of emulation (current values of instruction param's etc) are stored in variables: the virtual registers etc.
Btw, emulation - an excellent technology for viruses, with which help it is can be to create very interesting things (for UEP, VM, "logical trash" tech, morphing and other);
The analyzer, on basis of data from the parser (filled structure) and the emulator, decides whether, fits instruction on logic or not.
Analysis of instruction takes place in 2 stages:
First, the analyzer must know, what are the param's and how to check them. For this he uses flags, passed from parser. A set of flags can be:
These flags can describe almost all instructions (some flags can be removed). If instruction contains more than 2 parameters, then other parameters are stored in a separate field.
Further, the flags determines what and how to check: checks on possibility of initialization params, change their values, to use them in other instructions, and more. The result of each check is stored in the mask's. Their 2: regs_init & regs_used. Roughly speaking, this 2 dword, where each bit corresponds to a particular parameter (for example, some reg (EAX etc)). Moreover, the bits in regs_init indicate, whether it is possible to initialize parameter or not (protection against re-initialization). And on bits in regs_used we know, whether it is possible to use param in instructions or not.
So, if the first stage is passed, it means that the parameters suitable. Continue.
State (of param) - is a stored value that takes parameter. States of all parameters are stored in the table of states, which is a buffer of a certain size.
So, the analyzer takes current value of parameter that we got through emulation (and saved, for example, in the virtual register), and compares it with all the accumulated states of this parameter.
If a match is found, then instruction - garbage, check isn't passed. In this case, from table of states we take the last saved state of this param and make it current (stored it state in virtual reg); and also restore the mask's to previous value. If no match is found, then it is a new state of param, instruction passed the check. Add this value to table of states;
In Example 1.1 the first 2 commands are normal, and the third - junk. Reg EDI can be initialized, but it will take the same value, which is now - an unnecessary initialization. Example 2.1 (and later, all the others) shows the correct way to initialize.
In Example 1.2 the first 3 instructions "right", and the fourth - junk. Regs EAX & ECX again take the values that have already taken (check of states parameters).
In Example 1.3 the first 2 instructions "right", and the third - junk. EDI += 0 (check of states);
In Example 1.4 the first 3 instructions "right", and the fourth - junk. Reg EDX can't be initialized, if before, EDX didn't affect on value of another parameter (check of params);
In Example 1.5 the first 4 instructions "right", and the fifth - junk. After the execution of the first 4 instructions, states of EAX would be: 1000, 999. And after the execution fifth instructions EAX = 1000. This state has been (check states parameters).
Realization of technique "logical trash", examples of generation of a thrash-code, and also fuller understanding of an idea - all this you find in sources xTG v2.0.0.
I should say that in xTG most better quality of logic turns out at generation of a linear thrash-code without winapi-functions. In other cases, the logic will be fuzzy, but will be: on branching and with winapi. This is associated with a logic-module - it is more powerful than, the better the result.
If not satisfied with the logic of any instructions, just set the other flags.
Also, the variant of use of logic for already created code is possible. However the required code can be on 100% different from original.
Using this technique, our trash becomes useful and logical, which makes it quite effectively bypass the heuristic. And the only complex use of techniques will realize our desires into reality. Yo0!