VX Heaven

Library Collection Sources Engines Constructors Simulators Utilities Links Forum

Unix ELF parasites and virus

Silvio Cesare
October 1998

[Back to index] [Comments]


This paper documents the algorithms and implementation of UNIX parasite and virus code using ELF objects. Brief introductions on UNIX virus detection and evading such detection are given. An implementation of the ELF parasite infector for UNIX is provided, and an ELF virus for Linux on x86 architecture is also supplied.

Elementary programming and UNIX knoledge is assumed, and an understanding of Linux x86 archtitecture is assumed for the Linux implementation. ELF understanding is not required but may be of help.

The ELF infection method uses is based on utilizing the page padding on the end of the text segment which provides suitable hosting for parasite code.

This paper does not document any significant virus programming techniques except those that are only applicable to the UNIX environment. Nor does it try to replicate the ELF specifications. The interested reader is advised to read the ELF documentation if this paper is unclear in ELF specifics.

ELF infection

A process image consists of a 'text segment' and a 'data segment'. The text segment is given the memory protection r-x (from this its obvious that self modifying code cannot be used in the text segment). The data segment is given the protection rw-.

The segment as seen from the process image is typically not all in use as memory used by the process rarely lies on a page border (or we can say, not congruent to modulo the page size). Padding completes the segment, and in practice looks like this.

	[...]	A complete page
	M	Memory used in this segment
	P	Padding

Page Nr
#2	[MMMMMMMMMMMMMMMM]		 |- A segment

Segments are not bound to use multiple pages, so a single page segment is quite possible.

Page Nr
#1	[PPPPMMMMMMMMPPPP]		<- A segment

Typically, the data segment directly proceeds the text segment which always starts on a page, but the data segment may not. The memory layout for a process image is thus.

	[...]	A complete page
	T	Text
	D	Data
	P	Padding

Page Nr
#1	[TTTTTTTTTTTTTTTT]		<- Part of the text segment
#2	[TTTTTTTTTTTTTTTT]		<- Part of the text segment
#3	[TTTTTTTTTTTTPPPP]		<- Part of the text segment
#4	[PPPPDDDDDDDDDDDD]		<- Part of the data segment
#5	[DDDDDDDDDDDDDDDD]		<- Part of the data segment
#6	[DDDDDDDDDDDDPPPP]		<- Part of the data segment

pages 1, 2, 3 constitute the text segment
pages 4, 5, 6 constitute the data segment

From here on, the segment diagrams may use single pages for simplicity. eg

Page Nr
#1	[TTTTTTTTTTTTPPPP]		<- The text segment
#2	[PPPPDDDDDDDDPPPP]		<- The data segment

For completeness, on x86, the stack segment is located after the data segment giving the data segment enough room for growth. Thus the stack is located at the top of memory (remembering that it grows down).

In an ELF file, loadable segments are present physically in the file, which completely describe the text and data segments for process image loading. A simplified ELF format for an executable object relevant in this instance is.

	ELF Header
	Segment 1	<- Text
	Segment 2	<- Data

Each segment has a virtual address associated with its starting location. Absolute code that references within each segment is permissible and very probable.

To insert parasite code means that the process image must load it so that the original code and data is still intact. This means, that inserting a parasite requires the memory used in the segments to be increased.

The text segment compromises not only code, but also the ELF headers including such things as dynamic linking information. If the parasite code is to be inserted by extending the text segment backwards and using this extra memory, problems can arise because these ELF headers may have to move in memory and thus cause problems with absolute referencing. It may be possible to keep the text segment as is, and create another segment consisting of the parasite code, however introducing an extra segment is certainly questionable and easy to detect.

Extending the text segment forward or extending the data segment backward will probably overlap the segments. Relocating a segment in memory will cause problems with any code that absolutely references memory.

It may be possible to extend the data segment, however this isn't preferred, as its not UNIX portable that properly implement execute memory protection.

Page padding at segment borders however provides a practical location for parasite code given that its size is able. This space will not interfere with the original segments, requiring no relocation. Following the guidline just given of preferencing the text segment, we can see that the padding at the end of the text segment is a viable solution.

The resulting segments after parasite insertion into text segment padding looks like this.

	[...]	A complete page
	V	Parasite code
	T	Text
	D	Data
	P	Padding

Page Nr
#1	[TTTTTTTTTTTTVVPP]		<- Text segment
#2	[PPPPDDDDDDDDPPPP]		<- Data segment


A more complete ELF executable layout is (ignoring section content - see below).

	ELF Header
	Program header table
	Segment 1
	Segment 2
	Section header table optional	

In practice, this is what is normally seen.

	ELF Header
	Program header table
	Segment 1
	Segment 2
	Section header table
	Section 1
	Section n

Typically, the extra sections (those not associated with a segment) are such things as debugging information, symbol tables etc.

From the ELF specifications:

"An ELF header resides at the beginning and holds a ``road map'' describing the file's organization. Sections hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, and so on.


A program header table, if present, tells the system how to create a process image. Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one. A section header table contains information describing the file's sections. Every section has an entry in the table; each entry gives information such as the section name, the section size, etc. Files used during linking must have a section header table; other object files may or may not have one.


Executable and shared object files statically represent programs. To execute such programs, the system uses the files to create dynamic program representations, or process images. A process image has segments that hold its text, data, stack, and so on. The major sections in this part discuss the following.

Program header. This section complements Part 1, describing object file structures that relate directly to program execution. The primary data structure, a program header table, locates segment images within the file and contains other information necessary to create the memory image for the program."

After insertion of parasite code, the layout of the ELF file will look like this.

	ELF Header
	Program header table
	Segment 1	- The text segment of the host
			- The parasite
	Segment 2
	Section header table
	Section 1
	Section n

Thus the parasite code must be physically inserted into the file, and the text segment extended to see the new code.

An ELF object may also specify an entry point of the program, that is, the virtual memory location that assumes control of the program. Thus to activate parasite code, the program flow must include the new parasite. This can be done by patching the entry point in the ELF object to point (jump) directly to the parasite. It is then the parasite's responsibility that the host code be executed - typically, by transferring control back to the host once the parasite has completed its execution.

From /usr/include/elf.h

typedef struct
  unsigned char e_ident[EI_NIDENT];     /* Magic number and other info */
  Elf32_Half    e_type;                 /* Object file type */
  Elf32_Half    e_machine;              /* Architecture */
  Elf32_Word    e_version;              /* Object file version */
  Elf32_Addr    e_entry;                /* Entry point virtual address */
  Elf32_Off     e_phoff;                /* Program header table file offset */
  Elf32_Off     e_shoff;                /* Section header table file offset */
  Elf32_Word    e_flags;                /* Processor-specific flags */
  Elf32_Half    e_ehsize;               /* ELF header size in bytes */
  Elf32_Half    e_phentsize;            /* Program header table entry size */
  Elf32_Half    e_phnum;                /* Program header table entry count */
  Elf32_Half    e_shentsize;            /* Section header table entry size */
  Elf32_Half    e_shnum;                /* Section header table entry count */
  Elf32_Half    e_shstrndx;             /* Section header string table index */
} Elf32_Ehdr;

e_entry is the entry point of the program given as a virtual address. For knowledge of the memory layout of the process image and the segments that compromise it stored in the ELF object see the Program Header information below.

e_phoff gives use the file offset for the start of the program header table. Thus to read the header table (and the associated loadable segments), you may lseek to that position and read e_phnum*sizeof(Elf32_Pdr) bytes associated with the program header table.

It can also be seen, that the section header table file offset is also given. It was previously mentioned that the section table resides at the end of the file, so after inserting of data at the end of the segment on file, the offset must be updated to reflect the new position.

/* Program segment header.  */

typedef struct
  Elf32_Word    p_type;                 /* Segment type */
  Elf32_Off     p_offset;               /* Segment file offset */
  Elf32_Addr    p_vaddr;                /* Segment virtual address */
  Elf32_Addr    p_paddr;                /* Segment physical address */
  Elf32_Word    p_filesz;               /* Segment size in file */
  Elf32_Word    p_memsz;                /* Segment size in memory */
  Elf32_Word    p_flags;                /* Segment flags */
  Elf32_Word    p_align;                /* Segment alignment */
} Elf32_Phdr;

Loadable program segments (text/data) are identified in a program header by a p_type of PT_LOAD (1). Again as with the e_shoff in the ELF header, the file offset (p_offset) must be updated in later phdr's to reflect their new position in the file.

p_vaddr identifies the virtual address of the start of the segment. As mentioned above regarding the entry point. It is now possible to identify where program flow begins, by using p_vaddr as the base index and calculating the offset to e_entry.

p_filesz and p_memsz are the file sizes and memory sizes respectively that the segment occupies. The use of this scheme of using file and memory sizes, is that where its not necessary to load memory in the process from disk, you may still be able to say that you want the process image to occupy its memory.

The .bss section (see below for section definitions), which is for uninitialized data in the data segment is one such case. It is not desirable that uninitialized data be stored in the file, but the process image must allocated enough memory. The .bss section resides at the end of the segment and any memory size past the end of the file size is assumed to be part of this section.

/* Section header.  */

typedef struct
  Elf32_Word    sh_name;                /* Section name (string tbl index) */
  Elf32_Word    sh_type;                /* Section type */
  Elf32_Word    sh_flags;               /* Section flags */
  Elf32_Addr    sh_addr;                /* Section virtual addr at execution */
  Elf32_Off     sh_offset;              /* Section file offset */
  Elf32_Word    sh_size;                /* Section size in bytes */
  Elf32_Word    sh_link;                /* Link to another section */
  Elf32_Word    sh_info;                /* Additional section information */
  Elf32_Word    sh_addralign;           /* Section alignment */
  Elf32_Word    sh_entsize;             /* Entry size if section holds table */
} Elf32_Shdr;

The sh_offset is the file offset that points to the actual section. The shdr should correlate to the segment its located it. It is highly suspicious if the vaddr of the section is different to what is in from the segments view.

To insert code at the end of the text segment thus leaves us with the following to do so far.

There is one hitch however. Following the ELF specifications, p_vaddr and p_offset in the Phdr must be congruent together, to modulo the page size.

key:	~= is denoting congruency.

	p_vaddr (mod PAGE_SIZE) ~= p_offset (mod PAGE_SIZE)

This means, that any insertion of data at the end of the text segment on the file must be congruent modulo the page size. This does not mean, the text segment must be increased by such a number, only that the physical file be increased so.

This also has an interesting side effect in that often a complete page must be used as padding because the required vaddr isn't available. The following may thus happen.

	[...]	A complete page
	T	Text
	D	Data
	P	Padding

Page Nr
#1	[TTTTTTTTTTTTPPPP]		<- Text segment
#3	[PPPPDDDDDDDDPPPP]		<- Data segment

This can be taken advantage off in that it gives the parasite code more space, such a spare page cannot be guaranteed.

To take into account of the congruency of p_vaddr and p_offset, our algorithm is modified to appear as this.

Now that the process image loads the new code into being, to run the new code before the host code is a simple matter of patching the ELF entry point and the virus jump to host code point.

The new entry point is determined by the text segment v_addr + p_filesz (original) since all that is being done, is the new code is directly prepending the original host segment. For complete infection code then.

This, while perfectly functional, can arouse suspicion because the the new code at the end of the text segment isn't accounted for by any sections. Its an easy matter to associate the entry point with a section however by extending its size, but the last section in the text segment is going to look suspicious. Associating the new code to a section must be done however as programs such as 'strip' use the section header tables and not the program headers. The final algorithm is using this information is.

infect-elf-p is the supplied program (complete with source) that implements the elf infection using text segment padding as described.

Infecting infections

In the parasite described, infecting infections isn't a problem at all. By skipping executables that don't have enough padding for the parasite, this is solved implicitly. Multiple parasites may exist in the host, but their is a limit of how many depending on the size of the parasite code.

Non (not as) trivial parasite code

Parasite code that requires memory access requires the stack to be used manually naturally. No bss section can be used from within the virus code, because it can only use part of the text segment. It is strongly suggested that rodata not be used, in-fact, it is strongly suggested that no location specific data be used at all that resides outside the parasite at infection time.

Thus, if initialized data is to be used, it is best to place it in the text segment, ie at the end of the parasite code - see below on calculating address locations of initialized data that is not known at compile/infection time.

If the heap is to be used, then it will be operating system dependent. In Linux, this is done via the 'brk' syscall.

The use of any shared library calls from within the parasite should be removed, to avoid any linking problems and to maintain a portable parasite in files that use varying libraries. It is thus naturally recommended to avoid using libc.

Most importantly, the parasite code must be relocatable. It is possible to patch the parasite code before inserting it, however the cleanest approach is to write code that doesn't need to be patched.

In x86 Linux, some syscalls require the use of an absolute address pointing to initialized data. This can be made relocatable by using a common trick used in buffer overflow code.

	jmp	A
	pop %eax	; %eax now has the address of the string
	.		; continue as usual

	call B
.string "hello"

By making a call directly proceeding the string of interest, the address of the string is pushed onto the stack as the return address.

Beyond ELF parasites and enter virus in Unix

In a UNIX environment the most probably method for a typical garden variety virus to spread is through infecting files that it has legal permission to do so.

A simple method of locating new files possible to infect, is by scanning the current directory for writable files. This has the advantage of being relatively fast (in comparison to large tree walks) but finds only a small percentage of infect-able files.

Directory searches are however very slow irrespectively, even without large tree walks. If parasite code does not fork, its very quickly noticed what is happening. In the sample virus supplied, only a small random set of files in the current directory are searched.

Forking, as mentioned, easily solves the problem of slowing the startup to the host code, however new processes on the system can be spotted as abnormal if careful observation is used.

The parasite code as mentioned, must be completely written in machine code, this does not however mean that development must be done like this. Development can easily be done in a high level language such as C and then compiled to asm to be used as parasite code.

A bootstrap process can be used for initial infection of the virus into a host program that can then be distributed. That is, the ELF infector code is used, with the virus as the parasite code to be inserted.

The Linux parasite virus

This virus implements the ELF infection described by utilizing the padding at the end of the text segment. In this padding, the virus in its entirety is copied, and the appropriate entry points patched.

At the end of the parasite code, are the instructions.

	movl	%ebp, $XXXX
	jmp	*%ebp

XXXX is patched when the virus replicates to the host entry point. This approach does have the side effect of trashing the ebp register which may or may not be destructive to programs who's entry points depend on ebp being set on entry. In practice, I have not seen this happen (the implemented Linux virus uses the ebp approach), but extensive replicating has not been performed.

On execution of an infected host, the virus will copy the parasite (virus) code contained in itself (the file) into memory.

The virus will then scan randomly (random enough for this instance) through the current directory, looking for ELF files of type ET_EXEC or ET_DYN to infect. It will infect up to Y_INFECT files, and scan up to N_INFECT files in total.

If a file can be infected, ie, its of the correct ELF type, and the padding can sustain the virus, a a modified copy of the file incorporating the virus is made. It then renames the copy to the file its infecting, and thus it is infected.

Due to the rather large size of the virus in comparison to the page size (approx 2.3k) not all files are able to be infected, in fact only near half on average.

Development of the Linux virus

The Linux virus was completely written in C, and strongly based around the ELF infector code. The C code is supplied as elf-p-virus.c The code requires the use of no libraries, and avoids libc by using a similar scheme to the _syscall declarations Linux employs modified not to use errno.

Heap memory was used for dynamic allocation of the phdr and shdr tables using 'brk'.

Linux has some syscalls which require the address of initialized strings to be passed to it, notably, open, rename, and unlink. This requires initialized data storage. As stated before, rodata cannot be used, so this data was placed at the end of the code. Making it relocatable required the use of the above mentioned algorithm of using call to push the address (return value) onto the stack. To assist in the asm conversion, extra variables were declared so to leave room on the stack to store the addresses as in some cases the address was used more than once.

The C code form of the virus allowed for a debugging version which produces verbose output, and allows argv[0] to be given as argv[1]. This is advantageous because you can setup a pseudo infected host which is non replicating. Then run the virus making argv[0] the name of the pseudo infected host. It would replicate the parasite from that host. Thus it was possible to test without having a binary version of a replicating virus.

The C code was converted to asm using the c compiler gcc, with the -S flag to produce assembler. Modifications were made so that use of rodata for initialized data (strings for open, unlink, and rename), was replaced with the relocatable data using the call address methodology.

Most of the registers were saved on virus startup and restored on exit (transference of control to host).

The asm version of the virus, can be improved tremendously in regards to efficiency, which will in turn improve the expected life time and replication of the virus (a smaller virus can infect more objects, where previously the padding would dictate the larger virus couldn't infect it). The asm virus was written with development time the primary concern and hence almost zero time was spent on hand optimization of the code gcc generated from the C version. In actual fact, less than 5 minutes were spent in asm editing - this is indicative that extensive asm specific skills are not required for a non optmised virus.

The edited asm code was compiled (elf-p-virus-egg.c), and then using objdump with the -D flag, the addresses of the parasite start, the required offsets for patching were recorded. The asm was then edited again using the new information. The executeable produced was then patched manually for any bytes needed. elf-text2egg was used to extract hex-codes for the complete length of the parasite code usable in a C program, ala the ELF infector code. The ELF infector was then recompiled using the virus parasite.

# objdump -D elf-p-virus-egg
08048143 <time>:
 8048143:       55              pushl  %ebp
08048793 <main0>:
 8048793:       55              pushl  %ebp
 80487f8:       6a 00           pushl  $0x0
 80487fa:       68 7e 00 00 00  pushl  $0x7e
 80487ff:       56              pushl  %esi
 8048800:       e8 2e fa ff ff  call   8048233 <lseek>
 80489ef:       bd 00 00 00 00  movl   $0x0,%ebp
 80489f4:       ff e5           jmp    *%ebp

080489f6 <dot_jump>:
 80489f6:       e8 50 fe ff ff  call   804884b <dot_call>
 80489fb:       2e 00 e8        addb   %ch,%al

080489fd <tmp_jump>:
 80489fd:       e8 52 f9 ff ff  call   8048354 <tmp_call>
 8048a02:       2e 76 69        jbe    8048a6e <init+0x4e>
 8048a05:       33 32           xorl   (%edx),%esi
 8048a07:       34 2e           xorb   $0x2e,%al
 8048a09:       74 6d           je     8048a78 <init+0x58>
 8048a0b:       70 00           jo     8048a0d <tmp_jump+0x10>

0x8048143 specifies the start of the parasite (time).
0x8048793 is the entry point (main0).
0x80487fb is the lseek offset which is the offset in argv[0] to the parasite.
0x80489f0 is the host entry point.
0x8048a0d is the end of the parasite (not inclusive).

0x8048a0d - 0x8048143 (2250)is the parasite length.
0x8048793 - 0x8048143 (1616) is the entry point as a parasite offset.
0x80487fb - 0x8048143 (1720) is the seek offset as a parasite offset.
0x80489f0 - 0x8048143 (2221) is the host entry point as a parasite offset.

# objdump --all-headers elf-p-virus-egg
Program Header:
    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x00015960 memsz 0x00015960 flags r-x

The seek offset as a file offset is 0x80487fb - 0x08048000 + 0x00000000 (2043)
(<seek address from above> - <vaddr> + <off>)

To patch the initial seek offset, an infection must be manually performed, and the offset recorded. The infected host is not functional in this form.

# infect-elf-p host
Parasite length: 2251, Host entry point index: 2221, Entry point offset: 1616
Host entry point: 0x8048074
Padding length: 3970
New entry point: 0x80486ce
Parasite file offset: 126
Infection Done
# vpatch elf-p-virus-egg 2043 126

The supplied program elf-egg2text will convert the address range specified on the command line, and found using the ELF loadable segments in the file to a hex string for use in C.

usage: elf-egg2text filename start stop

# elf-egg2text elf-p-virus-egg 0x08048143 0x8048a0d > parasite-v.c

parasite-v.c was edited manually to declare the hex string as the variabled
char parasite[], and likewise these variables were declared.

long hentry = 2221;
long entry = 1616;
int plength = 2250;

The infector was recompiled and thus can infect the host it was compiled for making it a live virus. null-carrier is the supplied host program that the infector is compiled for.

This completed the manual infection of the virus to a host. The newly infected host would then attempt replication on execution. A live virus has been included in the source package (live-virus-be-warned). A simplified carrier program (carrier.S) was used to host the virus (null-carrier is the unfected host as stated).

Improving the Linux virus

The first major change that would increase the life time and replication rates of the virus is to optimise the code to be space efficient. Looking at a 50% size decrease is probably realistic when optimised.

The replication is notable rather slow scanning only the current directory. The virus may be modified to do small tree walks increasing infection rates dramatically.

The virus is easily detected - see below.

Virus detection

The virus described is relatively easy to detect. The blatant oddity is that the entry point of the program isn't in a normal section or not in a section at all.

Typically the last section in the text segment is .rodata which obviously shouldn't be the entry point. Likewise, it is suspicious if a program does not have a corresponding section then this arouses any would be virus scanner. Also if no section table at all, which will disguise what section the entry point is in, is certainly an odd event (even though this is optional).

Removal of the virus described here, is similar to infection, requiring deletion of the virus code, modification of the ELF headers to reflect segment relocation in the file and patching of the entry point to jump to the proper code.

Location of the correct entry point can be easily seen by disassembling the executable using objdump, matching the entry point of the infected file to the disassembled code, and tracing through the code to find where the parasite code returns flow back to the host.

$ objdump --all-headers host		# a parasite infected host

>host:     file format elf32-i386
>architecture: i386, flags 0x00000112:
>start address 0x08048522


The entry point is thus seen as 0x08048522, the entry point of the suspected parasite code.

$ disassemble --disassemble-all host

>host:     file format elf32-i386
>Disassembly of section .interp:
>080480d4 <.interp>:
> 80480d4:       2f              das
> 80480d5:       6c              insb   (%dx),%es:(%edi)


>Disassembly of section .text:
>08048400 <_start>:
> 8048400:       31 ed           xorl   %ebp,%ebp
> 8048402:       85 d2           testl  %edx,%edx
> 8048404:       74 07           je     804840d <_start+0xd>


>Disassembly of section .rodata:
>0804851c <.rodata>:
> 804851c:       48              decl   %eax
> 804851d:       6f              outsl  %ds:(%esi),(%dx)
> 804851e:       73 74           jae    8048594 <_fini+0x94>
> 8048520:       0a 00           orb    (%eax),%al
> 8048522:       b8 00 84 04 08  movl   $0x8048400,%eax
> 8048527:       ff e0           jmp    *%eax
>        ...
>Disassembly of section .data:


Looking at the entry point code, which looks obviously to be parasite code since its residing in the .rodata section, we have.

	movl	$0x8048400,%eax
	jmp	*%eax

This code is easily seen to be jumping to _start, the original host code.

# entry host 0x808400

The parasite code is thus easily removed from program flow by patching the entry point to skip the parasite code.

On occasion no section matches the parasite code and hence the entry point. objdump will only disassemble sections so thus we cant see the parasite code as is. However, gdb can be used to disassemble manually, and the same method of manually finding the host entry point can be used as above.

Automated virus detection of these variety of UNIX virus is practical by detecting missing section headers and/or entry points to non permissible sections or segments.

Typically, the default entry point is _start, however this can be changed in linking. If a virus has been found in a file, and the host entry point is indeterminable for any reason, it may be beneficial to patch the entry point to _start. This however is still guesswork and not totally reliable.

Typical general virus detection algorithms are directly applicable in UNIX, including signature strings, code flagging, file integrity checking etc.

Evading virus detection in ELF infection

The major problem in terms of evading detection with the parasite described, is that the entry point changes to a suspicious position.

Ideally, the entry point of the program either wouldn't change or stay within expected sections.

A possible method using the parasite described would be to find unused memory in normal entry point sections such as the .text section, and insert code to jump to the parasite code. This would require only a small number of bytes, and such empty space is common, as can be noted by looking through disassembly of executables.

Alternatively, one of the original ideas of where to insert the parasite code, thrown away, by extending the text segment backwards may be possible. The parasite code and entry point would belong in the .text section and thus seemingly be quite normal.


The algorithms and implementation presented gives a clear example and proof of concept that UNIX while not popular for, is actually a viable breeding ground for parasites and virus.

[Source unix-linux-pv-src.tgz]

[Back to index] [Comments]
By accessing, viewing, downloading or otherwise using this content you agree to be bound by the Terms of Use! aka