Monday, November 21, 2005

Issues with pattern matching in Network Intrusion Detection Systems

Other than Pattern matching “algorithm” decision, there are a lot of other issues that also needs to considered before choosing any one of them. Of course, fast matching is the natural need for the decision but there are some other issues to be kept in mind like fighting false positives example in some cases it is possible that payload contains a pattern for buffer overflow attack via telnet application protocol but what if there was no active telnet session between two hosts. Then, other issue can be what if pattern is split over multiple packets? Some of issues with respect to choice of algorithm and limitations of signature matching has been stated below.

  1. Memory vs Speed
  2. Signature format
  3. Session-Based and Application Level Signature Matching
  4. State Holding issues in-cases of pattern extending over multiple pkts
  5. Packet Fragmentation Issues.
  6. Getting packet dumps or testing data set? (other than attack tools and DARPA set.)

While one always needs to compromise between memory requirements vs speed available. As we can see in the existing algorithms itself, Aho/Corasick provides O(1) time pattern matching but requires quite large memory for the storage of the state machine. While the other string matching algorithms such as Boyer-Moore can lead to O(mn) time requirements in cases of algorithmic attacks. One must need to payoff one depending upon the constraints.

Most of the IDS’s except a few use the byte or character based string as the patterns presentation format. While this is also needed as the most common algorithms used are Boyer-Moore, KMP etc. But if State-machine matching is being deployed then regular expression can provide a better pattern which can be more informative and will be more unique to the attack it is identifying. Other than these, most of the Snort rules do contains multiple patterns with different offset and depth values which can be very well expressed in single regular expression with the usage of basic regex patterns like . and * etc. [1] provides some examples also. Also, Bro contains patterns in regex (regular expression) format itself.

Then, [7] also discusses about the statesful packet matching where IDS stores the information about the context of the traffic between two peers providing more efficient pattern matching results but the overheads involved are the massive because of the information that needs to be stored specific to content of the traffic for large amount of the flows. While over this, one can also provide application level pattern matching to provide even better results.

One of the most important issues with IDS systems is the state holding issue which can be explained as the amount of the information that needs to be stored for each flow flowing
through it. Incase of pattern matching over individual packets, this is not of much concern since this does not even comes into picture. But with the invent of attack packet split over multiple packets, pattern matching has gone to name packet stream matching since now packet needs to be matched over multiple packets, demanding more memory for storing information about session flows and packets flowing, the partially matched patterns, other flow specific data structures etc. Although there is Snort preprocessor for counter-attack to this issue namely Stream4, but these issues are with this plugin also. For how much time, does the information needs to stored before dropping the information, (it should not be the case that IDS declares timeout and drops the session information while the destination host still keeps waiting, or vice-versa). Then, what is the number of maximum sessions that can be stored, since information that needs to be stored can vary from flow to flow.

Continuing the above discussion, issue of fragmented packets [2],[3], [4], [5] even complicate the situation more. Since, some of new issues comes into picture like

  1. Out-of-order arrival of TCP segments
  2. Re-transmitted segments
  3. Overlapping TCP packets hence issues with reassembly
  4. Missing of fragments in between or losing the state of the connection while connection is still alive?
  5. How much data should be buffered (TCP window)
  6. Varying TTL of the fragments for evasion of NIDS. If the NIDS believes a packet was received when in fact it did not reach the end-system, then its model of the end-system's protocol state will be incorrect. If the attacker can find ways to systematically ensure that some packets will be received and some not, the attacker may be able to evade the NIDS.

While Authors in [6] has examined the character and effects of fragmented IP traffic as monitored on highly aggregated Internet links. They had shown the amount of fragmented packets in normal internet traffic and their characterizations, classifications as per the statistics, protocol and application layer. They show that the amount of “fragmented packet” traffic at internet links is less than 1% but there are two cases first they are talking at internet level with good connection speeds and secondly, but what if traffic is fragmented attack specific. These issues pops up some new questions other than existing ones like because different operating systems have unique methods of fragment reassembly, if an intrusion detection system uses a single “one size fits all” reassembly method, it may not reassemble and process the packets the same way the destination host does. An attack that successfully exploits these differences in fragment reassembly can cause the IDS to miss the malicious traffic and fail to alert. While much of these have been solved in existing tools heuristically. The above mentioned papers themselves have discussed few of them. Snort even contains a preprocessor plugin i.e. Frag2 for most of these issues with some assumptions like if next few fragments doesnot arrives in next 30 seconds, it will be dropped, then one can/needs to specify the end hostsystem OS so that specific reassembly is done for that session. Some tools even use bifurcating analysis [5], what it means is if the NIDS does not know which of two possible interpretations the end-system may apply to incoming packets, then it splits its analysis context for that connection into multiple threads, one for each possible interpretation, and analyzes each context separately from then onwards. Some other methodologies has also been discussed in the same paper.

Then, one of the major issue we have come across is the testing of existing approaches. While there exists MIT DARPA Datasets but there are two issues with them, firstly they contain very few attacks and secondly they are of 1998-99 period and since that attack technologies has advanced a lot. Even the attack tools are too specific for producing individual attacks rather a generic traffic in-between including attack packets. While recently,[7] has designed a new tool for IDS testing namely AGENT which takes other than producing ”pattern strings”, also generate other type of traffic like the ones has been described in [5], but then always its also synthetic.

[1] Sommer, R., Paxson, V.: Enhancing Byte-Level Network Intrusion Detection Signatures with Context. In: Proceedings of the 10th ACM conference on Computer and Communication Security, Washington, DC (2003) 262-271.
[2] C. A. Kent and J. C. Mogul. Fragmentation Considered Harmful. Computer Communications Review — Proceedings of SIGCOMM’87, 17(5):390–401, August 1987.
[3] Thomas H. Ptacek and Timothy N. Newsham. Insertion, evasion, and denial of service: Eluding network intrusion detection, January 1998.
[4] Judy Novak, Target-Based Fragmentation Reassembly, WhitePaper from Sourcefire Inc., April 2005.
[5] Mark Handley, Vern Paxson, and Christian Kreibich. Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics. In Proceedings of USENIX Security Symposium, August 2001.
[6] Colleen Shannon, David Moore, K. Claffy. Characteristics of fragmented IP traffic on internet links. Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, 2001.
[7] Shai Rubin, Somesh Jha, Barton P. Miller: Automatic Generation and Analysis of NIDS Attacks. ACSAC 2004: 28-38



Blogger SanjayR said...

Hi Nakul:
It is a nice posting. you touched many issues related to IDS implementaion. I have doubts on few things. I see a contradiction when you say that matching speed is major thing and on the same time, sugeest the use of regular expression. In my opinion, regular expression are very costly to work with, specially the operators like * and +. Aho, Boyer Moore and simple DFA can produce results faster if there are 2-3 patterns with AND or OR operators, i.e. it should be faster to call BM twice to search two patterns than using PCRE for regexp. I will explore more on this and write on my blog. Another thing is state information. As you mentioned, Snort has keyword "flow bit" to take care of this. My experience with vulnerability research says that one need not to buffer payload/state infoamtion for a long time as attacking pattern should be visible in few initial packets.

10:52 PM  

Post a Comment

<< Home