Proceedings of the 8th USENIX Security Symposium, August 23-26, 1999, Washington, D.C., pp. 141-152.
A Study in Using Neural Networks for Anomaly and Misuse Detection1
Anup K. Ghosh & Aaron Schwartzbard
Reliable Software Technologies
21351 Ridgetop Circle, Suite 400
Dulles, VA 20166
aghosh@rstcorp.com
https://www.rstcorp.com
Abstract
Current intrusion detection systems lack the ability to
generalize from previously observed attacks to detect even slight
variations of known attacks.
This paper describes new process-based intrusion detection approaches
that provide the ability to generalize from previously observed
behavior to recognize future unseen behavior. The approach employs artificial neural networks (ANNs) and can be used
both for anomaly detection, to detect novel attacks, and for misuse
detection, to detect known attacks and even variations of known
attacks. These techniques were applied to a large corpus of data collected by Lincoln Labs at MIT for an intrusion detection system evaluation sponsored by the U.S. Defense Advanced Research Projects Agency (DARPA). Results from applying these techniques for both anomaly and misuse detection against the DARPA evaluation data are presented.
1 Introduction
Results from a recent U.S. Defense Advanced Research Projects Agency
(DARPA) study highlight the strengths and weaknesses of current
research approaches to intrusion detection. The DARPA scientific study
is the first of its kind to provide independent third party evaluation
of intrusion detection tools against such a large corpus of data. The
findings from this study indicate that a fundamental paradigm shift in
intrusion detection research is necessary to provide reasonable levels
of detection against novel attacks and even variations of known
attacks. Central to this goal is the ability to generalize from
previously observed behavior to recognize future unseen, but similar
behavior. To this end, this paper describes a study in using neural
networks for both anomaly detection and misuse detection.
Intrusion detection research has begun to shift from
analyzing user behavior to analyzing process behavior. Initial work in
analyzing process behavior has already shown promising results in
providing very high levels of detection against certain classes of
attacks. In particular, process-based anomaly detection approaches have
shown very good performance against novel attacks that result in unauthorized local access and attacks that result in elevated privileges - a vulnerable area for most intrusion
detection tools [6]. In spite of the good detection
capability of process-based anomaly detection approaches, the results
indicate high rates of false alarms that can make these tools unusable
for the practical security administrator. Current wisdom is that false
alarm rates must be reduced to the level of one to two
false alarms per day in order to make the system usable by
administrators.
One of the largest challenges for today's intrusion detection tools is
being able to generalize from previously observed behavior (normal or
malicious) to recognize similar future behavior. This problem is acute
for signature-based misuse detection approaches, but also plagues
anomaly detection tools that must be able to recognize future normal
behavior that is not identical to past observed behavior, in order to
reduce false positive rates.
To address this shortcoming, we utilize a simple neural network
that can generalize from past observed behavior to recognize similar future
behavior.
In the past, we have applied backpropagation networks in addition to
other neural networks with good performance to the problem of anomaly
detection [8]. Here we present the use of a neural
network for both anomaly and misuse detection. The approach is
evaluated against the DARPA intrusion detection evaluation data.
2 Prior Art in Intrusion Detection
Some of the earliest work in intrusion detection was performed by Jim
Anderson in the early 1980s [1]. Anderson defines an
intrusion as any unauthorized attempt to access,
manipulate, modify, or destroy information, or to render a system
unreliable or unusable. Intrusion detection attempts to detect
these types of activities. In this section we establish the foundations
of intrusion detection techniques in order to determine where they are
strong and where they need improvement.
2.1 Anomaly detection vs. misuse detection
Intrusion detection techniques are generally classified into two
categories: anomaly detection and misuse detection. Anomaly detection
assumes that misuse or intrusions are highly correlated to abnormal
behavior exhibited by either a user or the system. Anomaly detection
approaches must first baseline the normal behavior of the object being
monitored, then use deviations from this baseline to detect possible
intrusions. The initial impetus for anomaly detection was suggested by
Anderson in his 1980 technical report when he noted that intruders can be
detected by observing departures from established patterns of use for
individual users. Anomaly detection approaches have been implemented
in expert systems that use rules for normal behavior to identify
possible intrusions [15], in establishing statistical
models for user or program profiles
[6,4,22,19,16,18,17], and
in using machine learning to recognize anomalous user or program behavior
[10,5,2,14].
Misuse detection techniques attempt to model attacks on a system as specific
patterns, then systematically scan the system for occurrences of these
patterns. This process involves a specific encoding of previous
behaviors and actions that were deemed intrusive or malicious. The
earliest misuse detection methods involved off-line analysis of audit
trails normally recorded by host machines. For instance, a security
officer would manually inspect audit trail log entries to determine if
failed root login attempts were recorded. Manual inspection was
quickly replaced by automated analysis tools that would scan these
logs based on specific patterns of intrusion. Misuse detection
approaches include expert systems [15,3],
model-based reasoning [13,7], state transition analysis
[23,12,11,21], and
keystroke dynamics monitoring [20,13].
Today, the vast majority of commercial and research intrusion
detection tools are misuse detection tools that identify attacks based
on attack signatures.
It is important to establish the key differences between
anomaly detection and misuse detection approaches. The most
significant advantage of misuse detection approaches is that known
attacks can be detected fairly reliably and with a low false positive
rate. Since specific attack sequences are encoded into misuse detection
systems, it is very easy to determine exactly which attacks, or
possible attacks, the system is currently experiencing. If the log data
does not contain the attack signature, no alarm is raised. As a
result, the false positive rate can be reduced very close to
zero. However, the key drawback of misuse detection approaches is that
they cannot detect novel attacks against systems that leave different
signatures. So, while the false positive rate can be made extremely
low, the rate of missed attacks (false negatives) can be extremely
high depending on the ingenuity of the attackers. As a result, misuse
detection approaches provide little defense against novel attacks,
until they can learn to generalize from known signatures of attacks.
Anomaly detection techniques, on the other hand, directly address the
problem of detecting novel attacks against systems. This is possible
because anomaly detection techniques do not scan for specific
patterns, but instead compare current activities against statistical
models of past behavior. Any activity sufficiently deviant
from the model will be flagged as anomalous, and hence considered as a
possible attack. Furthermore, anomaly detection schemes build their
internal models from actual user histories and system data rather than
from pre-defined patterns. Though anomaly detection approaches
are powerful in that they can detect novel attacks, they have their
drawbacks as well. For instance, one clear drawback of anomaly
detection is its inability to identify the specific type of attack
that is occurring. However, probably the most significant disadvantage
of anomaly detection approaches is the high rates of false alarm.
Because any significant deviation from the baseline can be
flagged as an intrusion, non-intrusive behavior that
falls outside the normal range will also be labeled as an intrusion
- resulting in a false positive. Another drawback of anomaly
detection approaches is that if an attack occurs during the training
period for establishing the baseline data, then this intrusive
behavior will be established as part of the normal baseline.
In spite of the potential drawbacks of anomaly detection,
having the ability to detect novel attacks makes anomaly detection a
requisite if future, unknown, and novel attacks against computer
systems are to be detected.
2.2 Assessing the Performance of Current IDSs
In 1998, the U.S. Defense Advanced Research Projects Agency (DARPA)
initiated an evaluation of its intrusion detection research
projects.2 To date, it is the most comprehensive
scientific study known for comparing the performance of different
intrusion detection systems (IDSs). MIT's Lincoln Laboratory set up a
private controlled network environment for generating and distributing
sniffed network data and audit data recorded on host machines. Network
traffic was synthesized to replicate normal traffic as well as attacks
seen on example military installations. Because all the data was
generated, the laboratory has a priori knowledge of which data
is normal and which is attack data. The simulated network represented
thousands of internal Unix hosts and hundreds of users. Network
traffic was generated to represent the following types of services:
http, smtp, POP3, FTP, IRC, telnet, X, SQL/telnet, DNS, finger, SNMP,
and time. This corpus of data is the most comprehensive set known to
be generated for the purpose of evaluating intrusion detection systems
and represents a significant advancement in the scientific community
for independently and scientifically evaluating the performance of any
given intrusion detection system.
TCP/IP data was collected using a network sniffer and host
machine audit data was collected using Sun Microsystems' Solaris Basic
Security Module (BSM). In addition, dumps of the file system
from one of the Solaris hosts were provided. This data was distributed
to participating project sites in two phases: training data and test
data. The training data is labeled as normal or attack and is
used by the participating sites to train their respective intrusion
detection systems. Once trained, the test data is distributed to
participating sites in unlabeled form. That is, the participating sites
do not know a priori which data in the test data is normal or
attack. The data is analyzed off-line by the participating sites to
determine which sessions are normal and which constitute
intrusions. The results were sent back to MIT's Lincoln Labs for
evaluation.
The attacks were divided into four categories: denial of service,
probing/surveillance, remote to local, and user to root
attacks. Denial of service attacks attempt to render a system or
service unusable to legitimate users. Probing/surveillance attacks
attempt to map out system vulnerabilities and usually serve as a
launching point for future attacks. Remote to local attacks attempt to
gain local account privilege from a remote and unauthorized account or
system. User to root attacks attempt to elevate the privilege of a
local user to root (or super user) privilege. There were a total of
114 attacks in 2 weeks of test data including 11 types of
DoS attacks, 6 types of probing/surveillance attacks, 14
types of remote to local attacks, 7 types of user to root attacks, and
multiple instances of all types of attacks.
The attacks in the test data were also categorized as old versus new
and clear versus stealthy. An attack is labeled as old if it appeared
in the training data and new if it did not. When an attempt was made to veil an attack, it was labeled as stealthy, otherwise it
was labeled as clear.
The reason we present this evaluation study is because we believe it
to represent the true state of the art in intrusion detection
research. As such, it represents the foundation of more than 10 years
of intrusion detection research upon which all future work
in intrusion detection should improve.
From this study, we can learn the strengths of current intrusion
detection approaches, and more importantly, their weaknesses. Rather
than identifying which systems performed well and which did not, we
simply summarize the results of the overall best combination system.
Lincoln Laboratory reported that if the best performing systems
against all four categories of attacks were combined into a single
system, then roughly 60 to 70 percent of the attacks would
have been detected with a false positive rate of lower than 0.01%, or
lower than 10 false alarms a day. This result summarizes the
combination of best systems against all of the attacks simulated in
the data. It shows that even in the best case scenario over
30% of the simulated attacks would go by undetected. However, the
good news is that the false alarm rate is acceptably low - low
enough that the techniques can scale well to large sites with lots of
traffic. The bad news is that with over 30% of attacks going
undetected in the best combination of current intrusion detection
systems, the state of the art in intrusion detection does not
adequately address the threat of computer-based attacks.
Further analysis showed that most of the systems reliably detected old
attacks that occurred within the training data with low false alarm
rates. These results apply primarily to the network-based intrusion detection
systems that processed the TCP/IP data. This result is encouraging, but
not too surprising since most of the evaluated systems were
network-based misuse detection systems. The results were
mixed in detecting new attacks. In two categories of attacks,
probing/surveillance and user to root attacks, the performance in
detecting new attacks was comparable to detecting old attacks. In the
other two categories - denial of service and remote to local attacks
- the performance of the top three network-based intrusion detection
systems was roughly 20% detection for new denial of
service attacks and less than 10% detection for new remote to local
attacks. Thus, the results show that the best of today's
network-based intrusion detection systems do not detect novel denial
of service attacks nor novel remote to local attacks - arguably two of
the most concerning types of attacks against computer systems today.
3 Monitoring Process Behavior for Intrusion Detection
In the preceding section, intrusion detection methods were categorized
into either misuse detection or anomaly detection approaches. In
addition, intrusion detection tools can be further divided into
network-based or host-based intrusion detection. The distinction is
useful because network-based intrusion detection tools usually process
completely different data sets and features than host-based intrusion
detection. As a result, the types of attacks that are detected with
network-based intrusion detection tools are usually different than
host-based intrusion detection tools. Some attacks can be detected by
both network-based and host-based IDSs; however, the "sweet spots", or
the types of attacks each is best at detecting, are usually
distinct. As a result, it is difficult to make direct comparisons
between the performance of a network-based IDS and a host-based
IDS. A useful corollary of distinct "sweet spots", though, is that in
combination both techniques are more powerful than either one by
itself.
Recent research in intrusion detection techniques has shifted the
focus from user-based intrusion detection to process-based intrusion
detection. Process-based intrusion detection tools analyze the
behavior of executing processes for possible intrusive activity. The
premise of process monitoring for intrusion detection is that most
computer security violations are made possible by misusing
programs. When a program is misused its behavior will differ from its
normal usage. Therefore, if the behavior of a program can be
adequately captured in a compact representation, then the behavioral
features can be used for intrusion detection.
Two possible approaches to monitoring process behavior are:
instrumenting programs to capture their internal states or
monitoring the operating system to capture external system calls
made by a program. The latter option is more attractive in general
because it does not require access to source code for
instrumentation. As a result, analyzing external system calls can be
applied to commercial off the shelf (COTS) software directly. Most
modern day operating systems provide built-in instrumentation hooks
for capturing a particular process's system calls. On Linux and other
variants of Unix, the strace(1) program allows one to
observe system calls made by a monitored process as well as their
return values. On Sun Microsystems' Solaris operating system, the
Basic Security Module (BSM) produces an event record for individual
processes. BSM recognizes 243 built-in event types that
can be generated by a process. Thus, on Unix systems, there is good
built-in support for tracing processes' externally observable
behavior. Windows NT currently lacks a built-in auditing facility that
provides such fine-grain resolution of program behavior.
Most process-based intrusion detection tools are based on anomaly
detection. A normal profile for program behavior is built during the
training phase of the IDS by capturing the program's system calls
during normal usage. During the detection phase, the profile of system
calls captured during on-line usage is compared against the normal
profile. If a significant deviation from the normal profile is noted,
then an intrusion flag is raised.
Early work in process monitoring was pioneered by Stephanie Forrest's
research group out of the University of New Mexico. This group uses
the analogy of the human immune system to develop intrusion detection
models for computer programs. As in the human immune system, the problem of
anomaly detection can be characterized as the problem of
distinguishing between self and dangerous non-self
[6]. Thus, the intrusion detection system needs to build
an adequate profile of self behavior in order to detect dangerous
behavior such as attacks. Using strace(1) on Linux, the UNM
group analyzed short sequences of system calls made by programs to the
operating system [6].
More recently, a similar approach was employed by the authors in analyzing
BSM data provided under the DARPA 1998 Intrusion Detection Evaluation
program [9]. The study compiled normal
behavior profiles for approximately 150 programs. The profile for each
program is stored in a table that consists of short sequences
of system calls. During on-line testing, short sequences of system
calls captured by the BSM auditing facility are looked up in the
table. This approach is known as equality matching. That is, if an
exact match of the sequence of system calls captured during on-line
testing exists in the program's table, then the behavior is considered
normal. Otherwise an anomaly counter is incremented.
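To make the equality-matching approach concrete, the following is a minimal Python sketch, not the authors' code. The window length of six follows the six-event strings discussed in Section 4; the function names and the representation of traces as lists of system call names are our own.

    # Minimal sketch of equality matching: the normal profile is a table
    # of short system call sequences; unseen sequences are anomalies.
    WINDOW = 6

    def build_profile(normal_traces):
        """Collect every length-WINDOW sequence of system calls observed
        during normal runs of one program."""
        profile = set()
        for trace in normal_traces:          # trace: list of syscall names
            for i in range(len(trace) - WINDOW + 1):
                profile.add(tuple(trace[i:i + WINDOW]))
        return profile

    def count_anomalies(trace, profile):
        """Slide a window over an on-line trace; any sequence with no
        exact match in the profile increments the anomaly counter."""
        anomalies = 0
        for i in range(len(trace) - WINDOW + 1):
            if tuple(trace[i:i + WINDOW]) not in profile:
                anomalies += 1
        return anomalies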
The data is partitioned into fixed-size windows in order to exploit a
property of attacks: they tend to leave their signatures in temporally
co-located events. That is, attacks tend to cause anomalous behavior
to be recorded in groups. Thus, rather than averaging the number of
anomalous events recorded over the entire execution trace (which might
wash out an attack in the noise), a much smaller size window of events
is used for counting anomalous events.
Several counters are kept at varying levels of granularity from a
counter for each fixed window of system calls to a counter for the
number of windows that are anomalous. Thresholds are applied at each
level to determine at which point anomalous behavior is propagated up
to the next level. Ultimately, if enough windows of system calls in a
program are deemed anomalous, the program behavior during a particular
session is deemed anomalous, and an intrusion detection flag is
raised.
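The layered counting can be sketched as below, continuing the previous example. The window size and both thresholds are illustrative placeholders; the paper does not give its values here.

    def session_flagged(sequence_flags, window_size=20,
                        window_threshold=4, session_threshold=3):
        """sequence_flags: one boolean per system call sequence, True if
        that sequence had no exact match in the normal profile."""
        anomalous_windows = 0
        for start in range(0, len(sequence_flags), window_size):
            window = sequence_flags[start:start + window_size]
            if sum(window) >= window_threshold:        # per-window counter
                anomalous_windows += 1                 # window-level counter
        return anomalous_windows >= session_threshold  # session decision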
The results from the study showed a high rate of detection, if not a
low false positive rate [9]. Despite the simplicity
of the approach and the high levels of detection, there are two main
drawbacks to the equality matching approach: (1) large tables of
program behavior must be built for each program, and (2) the equality
matching approach does not have the ability to recognize behavior that
is similar, but not identical to past behavior. The first problem
becomes an issue of storage requirements for program behavior profiles
and is also a function of the number of programs that must be
monitored. The second problem results from the inability of the
algorithm to generalize from past observed behavior. The problem is
that behavior that is normal, yet slightly different from past
recorded behavior, will be recorded as anomalous. As a result, the
false positive rate could be artificially elevated. Instead, it is
desirable to be able to recognize behaviors that are similar to
normal, but not necessarily identical to past normal behavior as
normal. Likewise, the same can be said for a misuse detection
system. Many misuse detection systems are trained to recognize attacks
based on exact signatures. As a result, slight variations among a
given attack can result in missed detections, leading to a lower
detection rate. It is desirable for misuse detection systems to be
able to generalize from past observed attacks to recognize future
attacks that are similar.
To this end, the research described in the rest of the paper employs
neural networks to generalize from previously observed
behavior. We develop an anomaly detection system that uses neural
networks to learn normal behavior for programs. The trained network is
then used to detect possibly intrusive behavior by identifying
significant anomalies. Similarly, we developed a misuse detection
system to learn the behavior of programs under attack scenarios. This
system is then used to detect future attacks against the system. The
goal of these approaches is to be able to recognize known attacks and
detect novel attacks in the future. By using the associative
connections of the network, we can generalize from past observed
behavior to recognize future similar behavior. A comparison of the two
systems against the DARPA intrusion data is provided in
Section 5.
4 Using Neural Networks for Intrusion Detection
Applying machine learning to intrusion detection has been
explored elsewhere as well
[5,2,14]. Lane and
Brodley's work uses machine learning to distinguish
between normal and anomalous behavior. However, their work is
different from ours in that they build user profiles based on
sequences of each individual's normal user commands and attempt to
detect intruders based on deviations from the established user
profile. Similarly, Endler's work [5] used neural
networks to learn the behavior of users based on BSM events recorded
from user actions. Rather than building profiles on a per-user basis,
our work builds profiles of software behavior and attempts to
distinguish between normal software behavior and malicious software
behavior. An advantage of our approach is that the vagaries of
individual behavior are abstracted away, because program behavior rather
than individual usage is studied. This can be of benefit for
defeating a user who slowly changes his or her behavior to foil a user
profiling system. It can also protect the privacy interests of users
from a surveillance system that monitors a user's every move.
The goal in using artificial neural networks (ANNs) for intrusion
detection is to be able to generalize from incomplete data and to be able
to classify online data as being normal or intrusive.
An artificial neural network is composed of simple processing units,
or nodes, and connections between them. The connection between
any two units has some weight, which is used to determine how
much one unit will affect the other. A subset of the units of the
network acts as input nodes, and another subset acts as
output nodes. By assigning a value, or activation, to each
input node, and allowing the activations to propagate through the
network, a neural network performs a functional mapping from one set
of values (assigned to the input nodes) to another set of values
(retrieved from the output nodes). The mapping itself is stored in the
weights of the network.
In this work, a classical feed-forward multi-layer perceptron network
was implemented: a backpropagation neural network. The
backpropagation network has been used successfully in other intrusion
detection studies [10,2]. The
backpropagation network, or backprop, is a standard feedforward
network. Input is submitted to the network and the activations for
each level of neurons are cascaded forward.
Our previous research in intrusion detection with BSM data used an
equality matching technique to look up currently observed program
behavior in a table of previously recorded behavior. While the results
were encouraging, we also realized that the equality matching approach
had no possibility of generalizing from previously observed
behavior. As a result, we are pursuing research in using artificial
neural networks to accomplish the same goals, albeit with better
performance. Specifically, we are interested in the capability of ANNs
to generalize from past observed behavior to detect novel attacks
against systems. To this end, we constructed two different ANNs: one
for anomaly detection and one for misuse detection.
To use the backprop networks, we had to address five major issues: how
to encode the data for input to the network, what network topology
should be used, how to train the networks, how to perform anomaly
detection with a supervised training algorithm, and what to do with
the data produced by the neural network.
Encoding the data to be used with the neural network is, in general, a
difficult problem. Previous experiments indicated that
strings of six consecutive BSM events carried enough implicit
information to be accurately distinguished as anomalous or normal for
programs in general. One possible encoding technique was simply to
enumerate all observed strings of six BSM events, and use the
enumeration as an encoding. However, part of the motivation of using
neural nets was their ability to classify novel inputs based on
similarity to known inputs. A simple enumeration fails to capture
any notion of similarity between strings. Therefore, a neural net would
be less likely to correctly classify novel inputs. In order
to capture the necessary information in the encoding, we devised a
distance metric for strings of events. The distance metric took into
account the events common to two strings, as well as
the difference in positions of common events. To encode a string of
data, the distance metric was used to measure the distance from the
data string to each of several "exemplar" strings. The encoding then
consisted of the set of measured distances. A string could then be thought
of as a point in a space where each dimension corresponded to one of
the exemplar strings, with the coordinate along each dimension given by
the distance to that exemplar.
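The paper does not spell out the distance metric beyond the two properties named above (events common to both strings, and positional differences of common events), so the Python sketch below is one plausible instantiation under our own assumptions about the weighting.

    def distance(a, b):
        """Distance between two event strings: an event missing from the
        other string costs 1; an event present in both costs a fraction
        proportional to how far apart its positions are."""
        n = max(len(a), 1)
        d = 0.0
        for i, event in enumerate(a):
            if event in b:
                d += abs(i - b.index(event)) / n   # shared event, shifted
            else:
                d += 1.0                           # event not shared
        return d

    def encode(string, exemplars):
        """Encode an event string as its vector of distances to the fixed
        exemplar strings; this vector is the network's input, making the
        string a point in a space with one axis per exemplar."""
        return [distance(string, e) for e in exemplars]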
Once an appropriate encoding method was developed, an appropriate
network topology had to be selected. We had to determine how many input
and output nodes were necessary, and if a hidden layer was to be used,
how many nodes should it contain. Because we seek to determine whether
an input string is anomalous or normal, we use a single continuously
valued output node to represent the extent to which the network
believes the input is normal or anomalous. The more anomalous the
input is, the closer the network's output is to 1.0; conversely, the
more normal the input is, the closer the output is to 0.0.
The number of input nodes had to
be equal to the number of exemplar strings (since each exemplar
produced a distance for input to the network). With an input layer, a
hidden layer, and an output layer, a neural network can approximate
arbitrarily complex functions. Thus, a single hidden
layer was used in our networks. A different network had to be
constructed, tuned, and trained for each program to be monitored,
since what might have been quite normal behavior for one program might
have been extremely rare in another. The number of hidden nodes varied based
on the performance of each trained network.
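As a concrete picture of this topology, here is a minimal numpy sketch: one input per exemplar distance, a single hidden layer whose size is tuned per program, and one continuously valued sigmoid output. It is an illustration under our own assumptions (weight ranges, learning rate), not the authors' implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class BackpropNet:
        """Feedforward net with one hidden layer and a single output node
        (near 1.0 = anomalous, near 0.0 = normal). Illustrative only."""

        def __init__(self, n_inputs, n_hidden, rng):
            # Random initial weights; the paper notes these have a large,
            # unpredictable effect, motivating the restarts described below.
            self.w1 = rng.uniform(-0.5, 0.5, size=(n_inputs, n_hidden))
            self.w2 = rng.uniform(-0.5, 0.5, size=(n_hidden, 1))

        def forward(self, x):
            """Cascade activations forward; x is one distance vector."""
            self.h = sigmoid(x @ self.w1)
            self.y = sigmoid(self.h @ self.w2)[0]
            return self.y

        def train_step(self, x, target, lr=0.1):
            """One backprop step on squared error for a single example."""
            y = self.forward(x)
            delta_out = (y - target) * y * (1.0 - y)
            delta_hid = delta_out * self.w2[:, 0] * self.h * (1.0 - self.h)
            self.w2[:, 0] -= lr * delta_out * self.h
            self.w1 -= lr * np.outer(x, delta_hid)
            return (y - target) ** 2

For instance, a network for a program encoded against 30 exemplars with 20 hidden nodes could be created as BackpropNet(30, 20, np.random.default_rng(0)).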
During training, many networks were trained for each program, and the
network that performed the best was selected. The remaining networks
were discarded. Training involved exposing the networks to four weeks
of labeled data, and performing the backprop algorithm to adjust
weights. An epoch of training consisted of one pass over the training
data. For each network, the training proceeded until the total error
made during an epoch stopped decreasing, or 1,000 epochs had been
reached. Since the optimal number of hidden nodes for a program was
not known before training, for each program, networks were trained
with 10, 15, 20, 25, 30, 35, 40, 50, and 60 hidden nodes. Before
training, network weights were initialized randomly. However, initial
weights can have a large, but unpredictable, effect on the performance
of a trained network. In order to avoid poor performance due to bad
initial weights, for each program, for each number of hidden nodes, 10
networks were initialized differently, and trained. Therefore, for
each program, 90 networks were trained. To select which of the 90 to
keep, each was tested on two weeks of data that were not part of the
four weeks of data used for training. The network that classified
data most accurately was kept.
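The selection protocol can be summarized in a few lines of Python. Here train_network and accuracy are stand-ins (not names from the paper) for the backprop training loop, with its early stopping when the epoch error stops decreasing or at 1,000 epochs, and for the two-week holdout evaluation.

    # Per-program model selection: 9 hidden-layer sizes x 10 random
    # initializations = 90 candidate networks; keep whichever classifies
    # the held-out two weeks of data most accurately.
    HIDDEN_SIZES = (10, 15, 20, 25, 30, 35, 40, 50, 60)
    NUM_INITS = 10   # differently initialized networks per size

    def select_best_network(train_network, accuracy):
        best_net, best_acc = None, float("-inf")
        for n_hidden in HIDDEN_SIZES:
            for seed in range(NUM_INITS):
                net = train_network(n_hidden, seed)   # runs backprop epochs
                acc = accuracy(net)                   # two-week holdout
                if acc > best_acc:
                    best_net, best_acc = net, acc
        return best_net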
4.1 Anomaly detection
In order to train the networks, it is necessary to expose them to
normal data and anomalous data. Randomly generated data was used to
train the network to distinguish between normal and anomalous data.
The randomly generated data, which were spread throughout the input
space, caused the network to generalize that all data were anomalous
by default. The normal data, which tended to be localized in the input
space, caused the network to recognize a particular area of the input
space as non-anomalous.
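A small sketch of this training-set construction follows: uniformly random vectors stand in for the "anomalous by default" inputs (target 1.0), while encoded vectors from normal traces are labeled 0.0. The sampling range and the mix ratio are our assumptions, not the paper's.

    import numpy as np

    def make_training_set(normal_vectors, n_random, rng):
        """normal_vectors: 2-D array of encoded distance vectors from
        normal program runs. Random vectors spread over the input space
        teach the network that inputs are anomalous by default; the
        normal vectors carve out the non-anomalous region."""
        normal = np.asarray(normal_vectors, dtype=float)
        random_inputs = rng.uniform(0.0, 1.0,
                                    size=(n_random, normal.shape[1]))
        inputs = np.vstack([normal, random_inputs])
        targets = np.concatenate([np.zeros(len(normal)),
                                  np.ones(n_random)])
        return inputs, targets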
After training and selection, a set of neural networks was ready to be
used. However, a neural network can only classify a single string (a
sequence of BSM events) as anomalous or normal, and our intention was
to classify entire sessions (which are usually composed of executions
of multiple programs) as anomalous or normal. Furthermore, our
previous experiments showed that it is important to capture the
temporal locality of anomalous events in order to recognize intrusive
behavior. As a result, we desired an algorithm that provides some
memory of recent events.
The leaky bucket algorithm fits this purpose well. The leaky bucket
algorithm keeps a memory of recent events by adding each output of the
neural network to a counter whose value slowly leaks away. Thus,
as the network computes many anomalies, the leaky bucket algorithm
will quickly accumulate a large value in its counter. Similarly, as
the network computes normal outputs, the bucket will "leak" away
its anomaly counter back down to zero. As a result, the leaky bucket
emphasizes anomalies that are closely temporally co-located and
diminishes the values of those that are sparsely located.
Strings of BSM events are passed to a neural network in the order they occurred
during program execution. The output of a neural network (that is, the
classification of the input string) is then placed into a leaky
bucket. During each timestep, the level of the bucket is decreased by a
fixed amount. If the level in the bucket rises above some threshold at
any point during execution of the program, the program is flagged as
anomalous. The advantage of using a leaky bucket algorithm is that
it allows occasional anomalous behavior, which is to be expected
during normal system operation, but it is quite sensitive to large
numbers of temporally co-located anomalies, which one would expect if a
program were really being misused. If a session contains a single
anomalous execution of a program, the session is flagged as anomalous.
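A sketch of the session-level decision follows. The text describes the leak as a fixed decrement, while the leak-rate semantics given in Section 5 (a rate of 0 retains everything, a rate of 1 keeps only the current timestep) suggest a proportional decay; the sketch below uses the proportional form, and the threshold value is a placeholder of ours.

    def program_is_anomalous(outputs, leak_rate=0.7, threshold=1.0):
        """outputs: the network's output (0.0..1.0) for each string of
        BSM events, in execution order. A lone anomaly drains away; a
        burst of temporally co-located anomalies pushes the bucket level
        over the threshold."""
        level = 0.0
        for out in outputs:
            level = level * (1.0 - leak_rate) + out   # add, then leak
            if level > threshold:
                return True
        return False

    def session_is_anomalous(program_outputs):
        """A session is flagged if any program execution in it is."""
        return any(program_is_anomalous(outs) for outs in program_outputs)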
4.2 Misuse detection
Having developed a system for anomaly detection, we chose to evaluate
how well the same techniques could be applied to misuse detection. Our
system is designed to recognize a given type of behavior. Thus, it
should not matter whether the behavior it is learning was normal
system usage, or attack behavior. Aside from trivial
changes to the way the leaky bucket is monitored, our system should
not require any modification to perform misuse detection. Having
made the trivial modification to the leaky bucket, we tested our system as a
misuse detector.
Figure 1: Anomaly detection results for two different leak rates.
Unfortunately, two issues particular to our data-set made misuse
detection difficult. The first issue was a lack of data. In the DARPA
data, there was two to three orders of magnitude less
intrusion data than normal data. This made it
quite difficult to train networks to learn what constituted an
attack. The second issue was related to the labeling of
intrusions. Intrusion data were labeled on a session-by-session
basis. While several programs might be executed during an intrusive
session, as few as one might actually be anomalous. Thus, while all data
labeled non-intrusive could be assumed to be normal, not all data
labeled intrusive could be assumed to be anomalous. Despite these
stumbling blocks, we configured our neural network system for misuse
detection.
5 Experimental Results
The anomaly and misuse detection systems were tested on the same test
data. The test data consisted of 139 non-intrusive sessions, and 22
intrusive sessions. Although it would have been preferable to use a
larger number of intrusive sessions for testing, there were so few
intrusive sessions in the DARPA data that all other intrusion data
were used to train the misuse detection system.
The performance of any intrusion detection system must account for
both the detection ability and the false positive rate. We
observed both of these factors while varying the leak rate used by the
leaky bucket algorithm. A leak rate of 0 results in all prior
timesteps being retained in memory. A leak rate of 1 results in all
timesteps but the current one being forgotten. We varied the leak rate
from 0 to 1.
The performance of the IDS should be judged in terms of
both the ability to detect intrusions and the rate of false
positives, that is, incorrect classifications of secure behavior as
insecure. We used receiver operating characteristic (ROC) curves to
compare intrusion detection ability to false positives. A ROC
curve is a parametric plot, where the parameter is the sensitivity of
the system to what it perceives to be insecure behavior. The curve is a
plot of the likelihood that an intrusion is detected, against the
likelihood that a non-intrusion is misclassified for a particular
parameter, such as a threshold. The ROC curve can be
used to determine the performance of the system for any possible
operating point. The ROC curve allows the end user of an intrusion
detection system to assess the trade-off between detection ability and
false alarm rate in order to properly tune the system for acceptable
tolerances.
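As an illustration of how such a curve is traced, the sketch below sweeps the sensitivity parameter (here, the bucket threshold) and records one (false positive rate, detection rate) point per setting. Scoring each session by its peak bucket level is our assumption for the sketch.

    def peak_level(outputs, leak_rate):
        """Score a session by the highest bucket level it reaches."""
        level, peak = 0.0, 0.0
        for out in outputs:
            level = level * (1.0 - leak_rate) + out
            peak = max(peak, level)
        return peak

    def roc_points(intrusive_scores, normal_scores, thresholds):
        """One (false positive rate, detection rate) point per threshold."""
        points = []
        for t in thresholds:
            detect = sum(s > t for s in intrusive_scores) / len(intrusive_scores)
            false_pos = sum(s > t for s in normal_scores) / len(normal_scores)
            points.append((false_pos, detect))
        return points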
Figure 2: Misuse detection results for two different leak rates.
Different leak rates produced different ROC curves. Figure 1 displays two ROC
curves, one for a low leak rate and one for a high leak rate.
For the leak rate of 0.2, to achieve detection better than 77.3%,
one must be willing to accept a dramatic increase in false
positives. At 77.3% detection, the false positive rate is only
3.6%. When the leak rate is 0.7, a detection rate of 77.3% can be
achieved with a false positive rate of only 2.2%.
ROC curves were also produced for the performance of our misuse detection
system. While the performance was not nearly as good as that of the anomaly
detection system in terms of false positives (which were as high as 5%
even at low sensitivity settings), the misuse detection system
displayed very high detection ability, which is especially surprising
given the small number of sessions used to train the system. As
illustrated in Figure 2, with a leak rate of 0.7,
the system was able to detect as much as 90.9% of all intrusions with a
false positive rate of 18.7%. Other host-based misuse detection
systems can currently provide similar detection capabilities with
lower false positive rates. Thus, this approach to misuse detection
may not be competitive with signature-based approaches for detecting
known attacks. However, our technique demonstrated the
ability of the system to detect novel attacks by generalizing from
previously observed behavior.
While that false positive rate is clearly unacceptable, it should be
remembered that the misuse detection system was trained on data which
contained not only intrusion data, but also normal data. This would
naturally lead the system to produce a large number of false positives.
By eliminating the non-intrusion data from the training data, it is
believed that significantly lower false positive rates could be achieved,
without lowering the detection ability.
The results of our experiments indicate that neural networks are
suited to perform intrusion detection and can generalize from
previously observed behavior. Currently, the false
positive rates are too high to be practical for commercial users. In
order to be a useful tool, false positive rates need to be between one
and three orders of magnitude smaller. We continue to investigate how
to improve the performance of our neural networks. Tests of variations
on our techniques indicate that we have not yet achieved optimal
performance.
6 Conclusions
This paper began with an examination of current intrusion detection
systems. In particular, the DARPA 1998 intrusion detection evaluation study
found that novel attacks against systems are rarely detected by
most IDSs that use signatures for detection. On the other hand,
such IDSs detect well-known attacks, for which signatures can be formed
from network data or host-based audit data, very reliably and with low
rates of false alarms. However, even slight variations of known
attacks escape detection by signature-based IDSs. Similarly,
program-based anomaly detection systems have performed very well in
detecting novel attacks, albeit with high false alarm rates. To
overcome the problems in current misuse detection and anomaly
detection approaches, a key necessity for IDSs is the ability to
generalize from previously observed behavior to recognize future
similar behavior. This capability will permit detection of variations
of known attacks as well as reduce false positive rates for
anomaly-based IDSs.
In this paper, we presented the application of a simple neural network
to learning previously observed behavior in order to detect future
intrusions against systems. The results from our study show the
viability of our approach for detecting intrusions. Future work will
apply other neural networks more suited toward the problem domain of
analyzing temporal characteristics of program traces. For instance,
applying recurrent, time delay neural networks to program-based anomaly
detection has proved to be more successful than using backpropagation
networks for the same purpose [8]. Our next step
is to apply these networks to misuse detection as well.
References
[1] J.P. Anderson. Computer security threat monitoring and surveillance. Technical report, James P. Anderson Co., Fort Washington, PA, April 1980.
[2] J. Cannady. Artificial neural networks for misuse detection. In Proceedings of the 1998 National Information Systems Security Conference (NISSC'98), pages 443-456, Arlington, VA, October 5-8, 1998.
[3] W.W. Cohen. Fast effective rule induction. In Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, 1995.
[4] P. D'haeseleer, S. Forrest, and P. Helman. An immunological approach to change detection: Algorithms, analysis and implications. In IEEE Symposium on Security and Privacy, 1996.
[5] D. Endler. Intrusion detection: Applying machine learning to Solaris audit data. In Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC'98), pages 268-279, Scottsdale, AZ, December 1998. IEEE Computer Society Press.
[6] S. Forrest, S.A. Hofmeyr, and A. Somayaji. Computer immunology. Communications of the ACM, 40(10):88-96, October 1997.
[7] T.D. Garvey and T.F. Lunt. Model-based intrusion detection. In Proceedings of the 14th National Computer Security Conference, October 1991.
[8] A.K. Ghosh, A. Schwartzbard, and M. Schatz. Learning program behavior profiles for intrusion detection. In Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring. USENIX Association, April 11-12, 1999. To appear.
[9] A.K. Ghosh, A. Schwartzbard, and M. Schatz. Using program behavior profiles for intrusion detection. In Proceedings of the SANS Intrusion Detection Workshop, February 1999. To appear.
[10] A.K. Ghosh, J. Wanken, and F. Charron. Detecting anomalous and unknown intrusions against programs. In Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC'98), December 1998.
[11] K. Ilgun. USTAT: A real-time intrusion detection system for UNIX. Master's thesis, Computer Science Department, UCSB, July 1992.
[12] K. Ilgun, R.A. Kemmerer, and P.A. Porras. State transition analysis: A rule-based intrusion detection system. IEEE Transactions on Software Engineering, 21(3), March 1995.
[13] S. Kumar and E.H. Spafford. A pattern matching model for misuse intrusion detection. The COAST Project, Purdue University, 1996.
[14] T. Lane and C.E. Brodley. An application of machine learning to anomaly detection. In Proceedings of the 20th National Information Systems Security Conference, pages 366-377, October 1997.
[15] W. Lee, S. Stolfo, and P.K. Chan. Learning patterns from Unix process execution traces for intrusion detection. In Proceedings of the AAAI97 Workshop on AI Methods in Fraud and Risk Management, 1997.
[16] T.F. Lunt. IDES: An intelligent system for detecting intruders. In Proceedings of the Symposium: Computer Security, Threat and Countermeasures, Rome, Italy, November 1990.
[17] T.F. Lunt. A survey of intrusion detection techniques. Computers and Security, 12:405-418, 1993.
[18] T.F. Lunt and R. Jagannathan. A prototype real-time intrusion-detection system. In Proceedings of the 1988 IEEE Symposium on Security and Privacy, April 1988.
[19] T.F. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, C. Jalali, H.S. Javitz, A. Valdes, P.G. Neumann, and T.D. Garvey. A real-time intrusion-detection expert system (IDES). Technical report, Computer Science Laboratory, SRI International, February 1992.
[20] F. Monrose and A. Rubin. Authentication via keystroke dynamics. In 4th ACM Conference on Computer and Communications Security, April 1997.
[21] P.A. Porras and R.A. Kemmerer. Penetration state transition analysis: A rule-based intrusion detection approach. In Eighth Annual Computer Security Applications Conference, pages 220-229. IEEE Computer Society Press, November 1992.
[22] P.A. Porras and P.G. Neumann. EMERALD: Event monitoring enabling responses to anomalous live disturbances. In Proceedings of the 20th National Information Systems Security Conference, pages 353-365, October 1997.
[23] G. Vigna and R.A. Kemmerer. NetSTAT: A network-based intrusion detection approach. In Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC'98), pages 25-34, Scottsdale, AZ, December 1998. IEEE Computer Society Press.
Footnotes:
1 This work was funded by the Defense Advanced Research Projects
Agency (DARPA) under Contract DAAH01-97-C-R095. The
views and conclusions contained in this document are those of the
authors and should not be interpreted as representing the official
policies, either expressed or implied, of the Defense Advanced
Research Projects Agency or the U.S. Government.
2 See www.ll.mit.edu/IST/ideval/index.html for a
summary of the program.