# Some Further Theoretical Results about Computer Viruses

Zhihong Zuo, Mingtian Zhou
The Computer Journal, Vol. 47, No. 6
ISSN 0010-4620
2004

3

School of Computer Science and Engineering,
University of Electronic Science and Technology of China,
Chengdu, P.R. China
Email: [email protected]

In this paper we give some general definitions of computer viruses which comply with our common understanding of computer viruses. Based on these definitions, we prove theoretically that there may exist some special kinds of computer viruses that have not been found in the real world yet. Furthermore, we prove that the set of computer viruses with the same kernel is ∏ 2-complete. In general the set of computer viruses is Σ 3-complete.

## 1. Introduction

The first abstract theory of computer viruses is the viral set theory given by Cohen, based on the Turing machine [1, 2]. A viral set is defined by where is a Turing machine and is a non-empty set of programs on Turing machine. Each is called a computer virus and satisfies the following condition: if it is contained in the tape at time , then there exist a time and a such that is contained in the tape at time . The most important of Cohen's theorems is about the undecidability of computer viruses [1].

In a different approach, Adleman developed an abstract theory of computer viruses based on recursive functions [3]. In his definition a virus is a total recursive function which applies to all programs (the Gödel numberings of programs), such that has some characteristic behaviors of computer viruses like injury, infection and imitation. Furthermore, Adleman proved that the set of computer viruses is -complete [3].

In the past 10 years, the number and variety of computer viruses have greatly increased, and many of them may not be conveniently and directly described by Cohen's or Adleman's theory. For example, we can list resident viruses, polymorphic viruses, stealthy viruses and so on. Hence some other models and theories of computer viruses have been established [4]. But these models do not comply with the common understanding of computer viruses.

The structure of this paper is as follows: the next section introduces preliminaries and recursive-function-based definitions for several kinds of computer viruses are presented in Section 3. In Section 4, we derive some important results about computer viruses based on the definitions. In the last section, we discuss the limitations of our work and give some comments and suggestions for further research.

## 2. Preliminaries

We briefly present here basic notation used in this paper.

Let be the set of all natural numbers and be the set of all finite sequences of natural numbers. For every , denotes a computable injective function from to with computable inverse. If is a partial function , then for , denotes . Similarly, for every , denotes a computable injective function from to with computable inverse such that and means for partial function . Furthermore, we write when is defined and when is undefined.

For a sequence , we use to denote the sequence which is the same as except that replaced by , i.e. . If the element in sequence is operated by a computable function , namely , for conciseness of notation, it will always be written in this paper as where the underlined symbol denotes the element being operated. If there are more than one element in that have been replaced by other values or operated by computable functions, we write them as or , respectively.

The symbol denotes a function computed by a computer program with the running environment where and mean data (including clock, spaces of diskettes and so on) and programs (including operating systems), respectively. If the Gödel numbering of is , the function is commonly written as . Its domain and range are written as and , respectively.

S-m-n theorem, universal theorem and recursion theorem [5] are the main tools used in our development of the abstract theory of computer viruses.

## 3. Definitions of some kinds of computer viruses

In this section, we give definitions of some kinds of computer viruses, including not only some common kinds of computer viruses, e.g. non-resident viruses and resident viruses, but also some special kinds of computer viruses, e.g. polymorphic viruses with infinite forms which have not been found till now in the real world.

Definition 3.1. (Non-resident virus) A total recursive function is called a non-resident virus if for all ,

1. and are two recursive predicates and no satisfies them simultaneously; and are two recursive functions;
2. the set is an infinite set.

The two predicates, and , are called injury condition (trigger) and infection condition, respectively. When condition is satisfied, the virus executes the injury function , and when condition is met, the virus chooses a program using the selection function , infects it first, and then executes the original program. These two conditions and two functions, called the kernel of a non-resident virus, determine a non-resident virus uniquely. In what follows, unless stated otherwise, the kernel of a computer virus always denotes the set of mathematical objects (functions and predicates) which theoretically determine a computer virus uniquely.

In clause (a) of the above definition, three branches (i), (ii) and (iii) describe three typical behaviors of computer viruses, injury, infection and imitation, respectively.

In the following definitions of other kinds of computer viruses, we no longer list (b) and (c) as in the definition above, and always regard them as satisfied by each kind of computer virus. It should be noted that the kernels of different kinds of computer viruses are different in general.

Definition 3.2. (Resident virus) The pair of a total recursive function and a system call (also a recursive function) is called a resident virus with respect to the system call if for all ,

Resident viruses always employ some system calls or endless execution processes (e.g. under Unix) to reside in the memory and modify some system calls (or some user processes) by which they infect other programs. For example, resident viruses in the DOS environment often modify system calls such as int 21h and int 13h to reach their objects.

Definition 3.3. (polymorphic virus with two forms) The pair of two different total recursive functions and is called a polymorphic virus with two forms if for all ,

Polymorphic viruses with forms can be defined as sequences of different total recursive functions which satisfy similar conditions as the above definition. Polymorphic viruses have become transplanted over last 10 years and detectinf them involves considerable difficulty. Commonly they have billions of forms and no two forms have the same consecutive three bytes in general. However, they are not the most difficult viruses to detect. In what follows, we define the polymorphic viruses with infinite forms, and prove their existence theoretically in the next section.

Definition 3.4. (Polymorphic virus with infinite forms) A total recursive function is called a polymorphic virus with infinite forms if for all ,

and for all , .

Polymorphic viruses with infinite forms have not actually been found so far, but their existence, similar to common computer viruses, is guaranteed by the same mathematical theorem (recursion theorem).

Definition 3.5. (Stealthy virus) The pair of a total recursive function and a system call is called a stealthy virus with respect to the system call if there is a recursive function such that for all ,

The crucial distinction between stealthy viruses and common viruses is that stealthy viruses not only infect programs as common viruses do, but also modify some system calls such that, when someone or the computer system uses these system calls to check programs, the infected program appears the same as if it were not infected.

Definition 3.6. (Combinatorial virus) The pair of two total recursive functions and is called a combinatorial virus if for all ,

where the two total recursive functions and are called the activation function and the hiding function, respectively.

The combinatorial virus with hiding functions can be defined similarly to the above. The extraordinary feature of combinatorial viruses is that they imitate totally original programs and do not make any injury and infection unless their activation functions apply to them.

## 4. Some theorems about computer viruses

In this section, we use some symbols listed as follows.

Symbols 1.

If the superscript 'fixed' is attached to a symbol above, then it denotes the set of one kind of viruses which have a fixed kernel, e.g. denotes the set of all resident viruses that have a fixed kernel. If its superscript is a name of a virus, e.g., , then it denotes the set of all viruses which have the same kernel as the Jerusalem virus.

Theorem 4.1. There exist polymorphic viruses with infinite forms.

Proof. Let be a 1-1 total recursive function such that

Applying s-m-n theorem to , there exists a total recursive function such that

By the recursion theorem, there is an such that . Let , so

Notice that if , then for all and all such that holds, i.e.

Theorem 4.2. There exist combinatorial viruses.

Proof. Let be a recursive function such that . Applying the s-m-n theorem, there exists a total recursive function such that

By the recursion theorem, as in Theorem 4.1, there exists a total recursive function such that

Let , substituting by in the above equation, it follows that

In fact, we proved that there exists a combinatorial virus for any padding or compressing program in Theorem 4.2.

The proofs of existence of other viruses are similar to the proof above, thus we omit them here. One thing necessary to know is that proofs of existence of resident viruses, polymorphic viruses and stealthy viruses will make use of the double (or multi-) recursion theorem.

Theorem 4.3. The set is a -complete set.

Proof. Suppose recursive predicates and are the injury condition and the infection condition, and recursive functions and are the injury function and the selection function, respectively. Then, since

and '' is a -predicate, and '' is a -predicate, so '' is a -predicate.

To prove is a -complete set, without loss of generality, we suppose that viruses do not make any injury and their infection condition is .

Let be an -set, then there exists a recursive predicate such that .

Consider the function

By Church's thesis, it is a recursive function: suppose is the procedure computing the characteristic function of predicate . For a given , first test the recursive predicate ; if it is true, then compute ; if it is false, then compute . If for each there exist an in this sequence such that , then stop the computing procedure to compute ; otherwise, the procedure cannot stop, so function .

Applying the s-m-n theorem to , there exists a recursive function such that . By the recursion theorem with parameters, there exists a recursive function such that , i.e.

Thus, if , then

On the other hand, assume , then . Since, for each , there exist infinite numbers of such that , it follows that there exists a large enough such that , hence for all , . Assume is a total recursive function, then for all , is defined at each , hence .

According to the above discussion, it follows that

i.e. is a -complete set.

In the following proof, we shall use the lemma below.

Lemma 4.1. For a given set , if its complement is an infinite set, then there exists a recursive function such that for all -set ,

Proof. Let in the proof of theorem 3.4 in chapter IV in [6].

Theorem 4.4. is a -complete set.

Proof. By the definition of , we can establish the following logical equivalence,

where , and correspond to and otherwise in the Definition 3.1 of non-resident viruses respectively, and recursive functions and denote and , respectively.

Since '' and ' are -predicates, '' is a -predicate, '' is a -predicate, and '' is a -predicate, so is a -predicate.

Let be an infinite recursive set whose complement is also infinite. Let and be the selection function. Consider the function

where is the function in Lemma 4.1. By Church's thesis, function is a recursive function. Applying the s-m-n theorem to it, there exist a recursive function such that . By the recursion theorem with parameters, there exists a recursive function such that , thus

Let be a -set, it follows from Lemma 4.1 and the above equation that

On the other hand,

Thus, it follows that is a -complete set.

Theorems 4.3-4.5 mean that detecting viruses is quite intractable in the following sense: the degree of undecidability of the set of one kind of computer viruses which have same kernel, is 2 (e.g. ); the degree of undecidability of the set of one kind of all computer viruses (e.g. ) is 3. Furthermore, in the proof of Theorem 4.4, if , then the recursive function is not a virus by our definition of viruses, because it has at least one branch in which condition is not a recursive predicate. Thus, the corollary below is obtained.

Corollary 4.1. If the set of all computer viruses is a -set, then it is a -complete set.

For a given virus , let the set of all programs infected by be [3]. If is a recursive set, then there exists a procedure deciding whether or not a particular program is infected by virus . Whenever a program becomes infected by , it can be detected by this procedure and removed from the computer, i.e. the virus can be isolated from the computer environment. Set is a recursively enumerable set certainly, but does not have to be a recursive set. The next theorem gives an instantial computer virus such that is not a recursive set. In other words, for this virus , the strongest detecting procedure for it can just pick up every program infected by , but cannot find all programs not infected by .

Before giving the proof of the next theorem, we first state a lemma that will be used in the proof.

Lemma 4.2. There exists a total recursive function such that and is a -complete set.

Proof. Let be an increasing padding function, i.e. for all and , (as in the proof of Theorem 4 of [3]). Let be a total recursive function such that , consider the function

It is clear that . Let , since is a 1-1 function it follows that

i.e. . Thus, is a -complete set.

Theorem 4.6. There exists a computer virus such that is a -complete set.

Proof. Let be the function satisfying conditions in Lemma 4.2. Applying the s-m-n theorem, let be the 1-1 total recursive function such that

By recursion theorem, there exists a function such that

Substituting by in the above equation, it follows that

Let , then is a non-resident virus. Since is a 1-1 function, so is not a recursive set. Otherwise, since

it follows that is a recursive set and contradicts Lemma 4.2.

The method proving Lemma 4.2 comes from Adleman's paper [3] and the method proving Theorem 4.6 can be used to prove that there exists any type of computer virus such that is a -complete set.

## 5. Discussion

Definition 3.1 of non-resident viruses, which we take as the typical definition of computer viruses, describes the characteristic behaviors of computer viruses. But clauses in the definition seem too restrictive, especially clauses (b) and (c), which deserve some further discussion here.

The clause (b) in Definition 3.1 requires that and are recursive predicates. It means that a computer virus must make injury and infection in some definite conditions (random activations of some computer viruses can be thought as definite functions of mathematically). Computer viruses found up to now satisfy this restriction, but there may be computer viruses whose injury and infection conditions are semi-recursive predicates. This restrictive condition is necessary in the proof of Theorem 4.4, thus, if we can prove the theorem without it, then it can be removed from clause (b) in all definitions of computer viruses.

Clause (c) in Definition 3.1, which requires that an infected program imitate the original program at infinite points, is a quite strong condition (even though most computer viruses in the real world satisfy this condition) and excludes several rogue programs. The proof of Theorem 4.3 makes use of it. In fact, Theorem 4.3 can be proved in a weaker condition that the infected program has infinite points satisfying branches (ii) and (iii) in clause (a), namely, the set is an infinite set. It is unknown whether Theorem 4.3 can be proved without clause (c) or not.

Another important fact that needs to be known in definitions of computer viruses is that there are two different ways of infection. One is infecting programs first and then executing the original program, i.e. , and another is executing the original programs, i.e. . We adopt the former in our definitions of computer viruses and there is no essential difference if we use the latter.

Corollary 4.1 looks strange at first glance because it was proved in Adleman's paper [3] that the set of all computer viruses is -complete. This variation comes from the difference between our definitions and Adlemsn's definitions of computer viruses. It is known from the proof of Theorem 4.4 that the reason that is -set is that there exist recursive sets , and which satisfy some conditions, but this requirement is not needed in Adleman's definitions of computer viruses.

We only define some familiar and interesting kinds of computer viruses in Section 3, however, other kinds of computer viruses can be defined easily in this way (sometimes, slight modification is needed). We give illustrative definitions of some other kinds of computer viruses as follows.

Definition 5.1. (Non-resident overwriting virus) A total recursive function is called a non-resident overwriting virus if for all ,

A non-resident overwriting virus does not imitate the original program. When injury condition is met, it causes damage; otherwise, it selects a program and overwrites it by itself, In this sense, non-resident overwriting viruses still satisfy the requirement in clause (b) in Definition 3.1 that no satisfies and simultaneously.

Definition 5.2. (Interconvertible virus) The pair of two different total recursive functions and is called an interconvertible virus if for all ,

where (resp. ) is different from (resp. ).

An interconvertible virus looks like it consists of two different computer viruses and . But, there is a crucial distinction that when infects a program, the result is infected by (not ), and vice versa. The interconvertible virus can be defined similarly. The main dissimilarity between interconvertible viruses and polymorphic viruses is that each form of a polymorphic virus has the same kernel, but each component of an interconvertible virus has its own different kernel.

Definition 5.3. (Compositive virus) The pair of two different computer viruses and is called a compositive virus if and only if is a computer virus too, namely for all , and satisfy the following equations respectively,

where (resp. ), (resp. ) and (resp. ) are different from each other, namely, viruses and have different kernels.

Compositive viruses resemble combinatorial viruses in the sense that they both create a new virus when their components meet in the computer system. The difference between them is that the components of a compositive virus are viruses themselves whilst the components of a combinatorial virus are not.

Definition 5.4. (Multipartite virus) A total recursive function is called a multipartite virus if for all ,

where and denote master boot record and a system call, respectively.

Multipartite viruses use a combination of techniques including infecting documents, executables and boot sectors to infect computers. The above definition is that of a typical multipartite virus. Each of its infected programs infects MBR (under condition ) or a selected program (under condition ). The infected MBR employs a system call to reside in memory when booting and then infect other programs. Multipartite viruses are examples of computer viruses which have more than one spreading mode and can be described similarly.

We may define almost all kinds of computer viruses in this way, however, there are some special kinds of computer viruses that are still difficult to describe. For example, there may be a computer virus that does nothing but modify a .C file on computer. After the user compiles and links the .C source file, the infected program is workable. Extending our definitions of computer viruses so as to describe all kinds of computer viruses is one of our further research works.

## Acknowledgements

Prof. Qingxin Zhu read the whole paper and helped us with the English for which the authors are gratefull. The referees made useful comments that improved the presentation of the paper enormously.

## References

1. Cohen, F. (1989) Computational aspects of computer viruses. Comput. Security, 8(4), 325-244.
2. Cohen, F. (1994) A short Course on Computer Viruses. Wiley.
3. Adleman, L. M. (1988) An Abstract theory of computer viruses. In Goldwasser, S. (ed.), Advances in Cryptology (CRYPTO'88), LNCS 403, pp. 354-374. Springer-Verlag, Berlin.
4. Thimbleby, H., Anderson, S. and Cairns, P. (1999) A framework for modelling trojans and computer virus infection. Comput. J., 41, 444-458.
5. Rogers, H., Jr (1967) Theory of Recursive Functions and Effective Computability. McGraw-Hill.
6. Soare, R.I. (1987) Recursively Enumerable Sets and Degrees. Springer-Verlag.