29.11.10

Packet Payloads, Encryption and Bacon

I received an email from a private mailing list recently, asking for some help in reviewing the contents of a packet capture file:

“I have a 2.5 GB pcap file which I want to verify that it contains only encrypted content. […] I’m wondering if anyone knows of a way that I can accomplish this using Windump or some other Windows utility.”


This kind of analysis happens frequently when performing a black-box pentest against a protocol.  Over the years I’ve used a couple of techniques to evaluate the content of packet captures to determine if the traffic is encrypted or just obfuscated.

Strings and Other Fun Unix Tools

The strings utility can be used to evaluate the contents of a packet capture to pull out any content that is within the ASCII character set.  First, for a packet capture I know to contain unencrypted content:

root@bt:~# strings capture1.dump | more
,7G9
I\3S
{#_3#\
HnL2`
o:|?
4"D
Ke6
RO*t
xtB,}X
981fh)
5JcA
84+
(P;u
'n]Ii
--More--

This content isn't particularly illustrative, since these short strings aren't ASCII words or other content that would immediately identify the presence of unencrypted traffic. We can tell the strings command to show only longer strings by adding the "-n" argument with a minimum length:

root@bt:~# strings -n 8 capture1.dump | more
:Azf`^Y/
UOV;uqbe-
BRW00234DDA2223
BRW00234DDA2223
BRW00234DDA2223
BRW00234DDA2223
BRW00234DDA2223
BRW00234DDA2223
jwright-x300
MSFT 5.0
jwright-x300
MSFT 5.0
Og{7n_S!
jwright-x300
--More--

This is more interesting, allowing us to easily recognize plaintext strings representing hostnames and Windows client traffic. It's not always this easy, though. Consider the case of obfuscated traffic, even something as simple as XOR with a fixed value: the obfuscation would hide the plaintext strings from the strings utility, but the traffic would still not be encrypted.
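
To illustrate the XOR case, here is a minimal Python sketch. The helper names, the sample payload and the key 0xAA are all hypothetical, and printable_strings() is only a rough stand-in for "strings -n 8", not a reimplementation of it.

```python
import re


def printable_strings(data, min_len=8):
    """Rough stand-in for `strings -n min_len`: runs of printable ASCII."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def xor_bytes(data, key):
    """Obfuscate (or de-obfuscate) data with a single fixed byte key."""
    return bytes(b ^ key for b in data)


# Hypothetical payload built from strings seen in the capture above.
payload = b"\x00\x01jwright-x300\x00MSFT 5.0\x00\xff"
obfuscated = xor_bytes(payload, 0xAA)  # 0xAA is an arbitrary example key

print(printable_strings(payload))     # the hostname and vendor string show up
print(printable_strings(obfuscated))  # nothing long enough survives the XOR
print(xor_bytes(obfuscated, 0xAA) == payload)  # the same key undoes it
```

Because the key sets the high bit of every printable character, no ASCII run survives, yet anyone who guesses the single-byte key recovers the original content.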


Packet Payload Histogram


We have a golden rule with encryption: encrypted content should be indistinguishable from random content.  This is an attribute we can use to visually assess the bytes of a packet payload.  A packet payload histogram tool reads through a packet capture and counts how frequently each byte value occurs in the packet payloads.  My pcaphistogram tool identifies TCP and UDP packets, counting the payload data and creating a gnuplot-compatible script to graph the results.  Let's look at an example using the capture1.dump packet capture:

root@bt:~# perl pcaphistogram.pl capture1.dump | gnuplot
Skipping non-ip packet.
Unknown IP Protocol: 50. Skipping packet.
Skipping non-ip packet.
Unknown IP Protocol: 80. Skipping packet.
root@bt:~# ls -l capture1.png
-rw-r--r-- 1 root root 2685 Nov 29 12:48 capture1.png
root@bt:~#

The capture1.png file is displayed below.


capture1


The “+” signs map out the frequency of each byte of the packet capture with frequency on the Y-axis and the byte values themselves (in hex) on the X-axis.  We can see a cluster of various byte values around 0x61, which coincides with the lowercase ASCII character set.  With this content, we can easily determine that the packet capture is not encrypted.
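
The counting step behind such a histogram is simple. The following Python sketch is not the pcaphistogram source (which is Perl); it only shows the idea of tallying byte values, and the function names and the sample payload are made up for illustration.

```python
from collections import Counter


def byte_histogram(payload):
    """Return a 256-entry list: how often each byte value occurs."""
    counts = Counter(payload)  # iterating bytes yields integers 0..255
    return [counts.get(value, 0) for value in range(256)]


def top_bytes(histogram, n=5):
    """Return the n most frequent byte values as (hex value, count) pairs."""
    ranked = sorted(range(256), key=lambda v: histogram[v], reverse=True)
    return [(hex(v), histogram[v]) for v in ranked[:n]]


# Hypothetical plaintext payload; real input would be TCP/UDP payload bytes.
sample = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 100
hist = byte_histogram(sample)
print(top_bytes(hist))
```

For plaintext like this, the most frequent values fall in the ASCII letter range, which is the same clustering visible in the graph.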


By comparison, let’s look at an encrypted packet capture example:


capture2


Here the byte values are very evenly distributed, with very little variation in frequency from one byte value to the next.  This pattern is unlike the prior unencrypted pattern, but the histogram alone cannot prove that the content is encrypted.  We know the content appears random, which is an attribute of encrypted traffic, so it is likely to be encrypted, but it could also be just random traffic.
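
One way to see why "looks random" is not proof of encryption: the sketch below uses zlib-compressed text as a stand-in for data that is random-looking but not encrypted. Compression is my own choice of example here, not something taken from the captures above, and the vocabulary and names are hypothetical.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_byte(data):
    """Shannon entropy of the byte values, in bits per byte (0 to 8)."""
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


# Build some compressible text from a small, made-up vocabulary.
rng = random.Random(1)  # fixed seed so the run is repeatable
words = ["packet", "payload", "capture", "bacon", "entropy", "random",
         "cipher", "plain", "wireless", "protocol", "histogram", "byte"]
text = " ".join(rng.choice(words) for _ in range(50000)).encode("ascii")
packed = zlib.compress(text)

print("text:      ", round(entropy_bits_per_byte(text), 3))
print("compressed:", round(entropy_bits_per_byte(packed), 3))
```

The compressed bytes score far higher than the text even though no key is involved, so a flat histogram should be read as "consistent with encryption" rather than as confirmation.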


A third example is shown below.


capture3


This is another unencrypted packet capture, but the content is obfuscated using a static XOR key.  No ASCII strings would be present in the packet capture, but from the wide variance in byte frequencies we can ascertain that this data is not encrypted.
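
The reason a static XOR key cannot hide this variance is that it only relabels byte values: every occurrence of value v becomes v XOR key, so the histogram is shuffled but its peaks and valleys keep the same heights. A small Python sketch, with a hypothetical key of 0x5C and made-up sample text:

```python
from collections import Counter


def xor_bytes(data, key):
    """Apply a single fixed byte key to every byte."""
    return bytes(b ^ key for b in data)


KEY = 0x5C  # arbitrary example key
plain = b"the quick brown fox jumps over the lazy dog " * 200
masked = xor_bytes(plain, KEY)

plain_counts = Counter(plain)
masked_counts = Counter(masked)

# Each byte value moves to (value XOR key) and keeps its count.
moved = all(masked_counts[v ^ KEY] == n for v, n in plain_counts.items())
# The set of bar heights in the histogram is therefore unchanged.
same_shape = sorted(plain_counts.values()) == sorted(masked_counts.values())
print(moved, same_shape)
```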


The pcaphistogram tool has served me well for many assessments, but lately I’ve been favoring a different approach.



Entropy Analysis with Ent


Ent is a simple yet very useful tool that reads from a specified file and performs several tests to evaluate the content of the data including:



  • Entropy: The information density of the file content, expressed in bits per byte.
  • Chi-Square Test: Computes the chi-square statistic for the distribution of byte values and reports, as a percentage, how often a truly random sequence would exceed that value.  Percentages very close to 0 or 100 indicate data that is not random.
  • Arithmetic Mean: The sum of all the byte values in the file divided by the file length.  Random data averages about 127.5.
  • Monte Carlo Value for Pi: Treats each successive sequence of six bytes as 24-bit X and Y coordinates of a point within a square, and counts how many points fall inside the inscribed circle; the hit ratio is used to estimate the value of Pi.  Random content will produce an estimate very close to the actual value of Pi.
  • Serial Correlation Coefficient: Measures the extent to which each byte in the file depends on the previous byte.
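
To make these measurements less abstract, here is a rough Python sketch of each one. It follows the descriptions in the list rather than Ent's source code, so small numeric differences from Ent are to be expected; in particular the serial correlation is a plain lag-1 Pearson correlation, the chi-square function returns only the statistic (not the percentage), and all function names are my own.

```python
import math
import os
from collections import Counter


def entropy(data):
    """Shannon entropy in bits per byte."""
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


def chi_square(data):
    """Chi-square statistic against a uniform distribution of byte values."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def arithmetic_mean(data):
    """Sum of the byte values divided by the number of bytes."""
    return sum(data) / len(data)


def monte_carlo_pi(data):
    """Estimate Pi from 6-byte groups used as 24-bit X and Y coordinates."""
    radius_sq = (2 ** 24 - 1) ** 2
    hits = tries = 0
    for i in range(0, len(data) - 5, 6):  # needs at least 6 bytes of input
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        tries += 1
        if x * x + y * y <= radius_sq:
            hits += 1
    return 4.0 * hits / tries


def serial_correlation(data):
    """Lag-1 Pearson correlation; assumes the data is not one repeated byte."""
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random_data = os.urandom(60000)
text_data = b"everything is better with bacon " * 2000
for name, data in (("urandom", random_data), ("text", text_data)):
    print(name, round(entropy(data), 4), round(chi_square(data), 2),
          round(arithmetic_mean(data), 2), round(monte_carlo_pi(data), 4),
          round(serial_correlation(data), 4))
```

One detail worth noting: every byte of plain ASCII text is below 0x80, so both coordinates stay in the lower half of the square and every point lands inside the circle. The estimate for ASCII text is therefore exactly 4.0, far from Pi.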

Ent is not installed by default on Backtrack and is unlikely to be present by default in other distributions.  On Backtrack, you can install Ent by running “apt-get install ent”.


Using Ent is very simple.  For example, dumping data from the Linux /dev/urandom device and evaluating it with Ent displays the following content:

root@bt:~# dd if=/dev/urandom bs=16384 count=1000 of=rand.bin
1000+0 records in
1000+0 records out
16384000 bytes (16 MB) copied, 6.28625 s, 2.6 MB/s
root@bt:~# ent rand.bin
Entropy = 7.999989 bits per byte.

Optimum compression would reduce the size
of this 16384000 byte file by 0 percent.

Chi square distribution for 16384000 samples is 245.34, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.4926 (127.5 = random).
Monte Carlo value for Pi is 3.141788853 (error 0.01 percent).
Serial correlation coefficient is -0.000268 (totally uncorrelated = 0.0).


Ent reports that the entropy for this file is more than 7.999 bits per byte, very close to the maximum of 8 bits per byte, so the data is very random.  A second example reads from a dictionary file:

root@bt:~# ent /pentest/passwords/wordlists/bt4-password.txt
Entropy = 5.083091 bits per byte.

Optimum compression would reduce the size
of this 32952523 byte file by 36 percent.

Chi square distribution for 32952523 samples is 318231422.65, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 86.6330 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is 0.180351 (totally uncorrelated = 0.0).


This time we have a very different result, indicating significantly less entropy.  We can use this same technique to review the content of a packet capture to solve the problem posed at the beginning of this article.  First we need to extract the payload data out of the packet capture; this is straightforward with Scapy:

root@bt:~# scapy
INFO: Can't import PyX. Won't be able to use psdump() or pdfdump().
INFO: No IPv6 support in kernel
WARNING: No route found for IPv6 destination :: (no default route?)
Welcome to Scapy (2.1.0)
>>> fp = open("payloads.dat","wb")
>>> def handler(packet):
...     fp.write(str(packet.payload.payload.payload))
...
>>> sniff(offline="capture1.dump",prn=handler,filter="tcp or udp")

Here we created an output file "payloads.dat" and used the Scapy sniff() function to read from "capture1.dump", using the Berkeley Packet Filter syntax to process only TCP or UDP traffic, calling the function "handler" for each packet in the capture file.  The handler() function simply writes a string representation of the packet.payload.payload.payload (that's the original packet, Ethernet -> IP -> TCP (or UDP) payload) to the "payloads.dat" file.  Make sure the output file has been flushed or closed (for example with fp.close()) before analyzing it, or the last buffered writes may be missing.  Next we can assess the payloads.dat file with Ent:

root@bt:~# ent payloads.dat
Entropy = 5.216871 bits per byte.

Optimum compression would reduce the size
of this 2461696 byte file by 34 percent.

Chi square distribution for 2461696 samples is 24219040.94, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 91.6014 (127.5 = random).
Monte Carlo value for Pi is 3.977878630 (error 26.62 percent).
Serial correlation coefficient is 0.187293 (totally uncorrelated = 0.0).

Every one of Ent's tests is telling us that the content is not encrypted; not even close. Let's look at capture2.dump:

>>> fp = open("payloads.dat","wb")
>>> sniff(offline="capture2.dump",prn=handler,filter="tcp")
<Sniffed: TCP:3840 UDP:8 ICMP:0 Other:0>

Repeating the Ent analysis with the results from the 2nd packet capture reveals the following:

root@bt:~# ent payloads.dat
Entropy = 7.999926 bits per byte.

Optimum compression would reduce the size
of this 2912256 byte file by 0 percent.

Chi square distribution for 2912256 samples is 299.50, and randomly
would exceed this value 5.00 percent of the times.

Arithmetic mean value of data bytes is 127.5390 (127.5 = random).
Monte Carlo value for Pi is 3.139916271 (error 0.05 percent).
Serial correlation coefficient is 0.001657 (totally uncorrelated = 0.0).

As expected, Ent reveals that the 2nd capture has a very high entropy level with all tests indicating the high degree of randomness in the payloads.dat file.


Finally, just to validate our earlier analysis for the obfuscated data in capture3.dump, one last analysis exercise:

>>> fp = open("payloads.dat","wb")
>>> sniff(offline="capture3.dump",prn=handler,filter="tcp")
<Sniffed: TCP:2626 UDP:0 ICMP:0 Other:4>

And our analysis with Ent:

root@bt:~# ent payloads.dat
Entropy = 4.914504 bits per byte.

Optimum compression would reduce the size
of this 2914852 byte file by 38 percent.

Chi square distribution for 2914852 samples is 56249721.62, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 168.2762 (127.5 = random).
Monte Carlo value for Pi is 1.609483582 (error 48.77 percent).
Serial correlation coefficient is 0.927980 (totally uncorrelated = 0.0).

Again we show that the obfuscated data, despite not looking like unencrypted data, is not encrypted.
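
For large captures, the extraction step used in all three examples can also be done one packet at a time without Scapy. The following is a sketch under several assumptions, not a replacement for the Scapy approach: it handles only classic pcap files (not pcapng) containing Ethernet frames, only IPv4, and it ignores VLAN tags and IP fragmentation. The function names are hypothetical.

```python
import struct


def tcp_udp_payload(frame):
    """Return the TCP or UDP payload of an Ethernet/IPv4 frame, else b""."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType IPv4
        return b""
    ihl = (frame[14] & 0x0F) * 4                   # IPv4 header length
    end = 14 + int.from_bytes(frame[16:18], "big")  # drops Ethernet padding
    proto = frame[23]
    l4 = 14 + ihl                                  # start of TCP/UDP header
    if proto == 6 and len(frame) >= l4 + 13:       # TCP
        data_offset = (frame[l4 + 12] >> 4) * 4
        return frame[l4 + data_offset:end]
    if proto == 17:                                # UDP has a fixed 8-byte header
        return frame[l4 + 8:end]
    return b""


def iter_payloads(path):
    """Yield non-empty TCP/UDP payloads from a classic pcap file, streaming."""
    with open(path, "rb") as fp:
        magic = fp.read(24)[:4]  # 24-byte global header, magic number first
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")
        while True:
            record = fp.read(16)  # per-packet record header
            if len(record) < 16:
                break
            _sec, _frac, incl_len, _orig = struct.unpack(endian + "IIII", record)
            payload = tcp_udp_payload(fp.read(incl_len))
            if payload:
                yield payload


def extract_payloads(pcap_path, out_path):
    """Write all payloads to out_path; the file is closed before returning."""
    total = 0
    with open(out_path, "wb") as out:
        for payload in iter_payloads(pcap_path):
            out.write(payload)
            total += len(payload)
    return total
```

Calling extract_payloads("capture1.dump", "payloads.dat") would produce a file suitable for Ent. Because only one packet is held in memory at a time, memory use should stay flat regardless of capture size.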

Wrapping Up


Evaluating the content of a packet capture is a common task for a pentest, though it can be equally useful for incident response analysis of unknown protocols, or perhaps even forensic analysis.  Both pcaphistogram and Ent gave us what we needed to answer the opening question.  I generally use Ent for my own analysis purposes, though I’ll re-run the data through pcaphistogram when I’m doing reporting, since the output of pcaphistogram is much more interesting to look at and easier to understand than an explanation of the Chi Square Distribution test.







In my SANS Ethical Hacking Wireless course, we leverage tools such as pcaphistogram and Ent for evaluating proprietary wireless technologies over WiFi and ZigBee/IEEE 802.15.4 networks, as well as attacking Bluetooth, DECT and more.  I’m teaching next in New Orleans the week of 1/20/2011 – 1/25/2011; come check it out!  New Orleans is one of my favorite places in the world for the food, culture and history.  If you are planning on attending, let me know and I’ll coordinate reservations for Arnaud’s for a dining and music experience you’ll always remember.


Bacon


Everything’s better with bacon.  Seriously.


-Josh

6 comments:

  1. The next release of Bro-IDS (1.6) will have entropy testing functionality built into its scripting language based on the ENT code. You can take your entropy testing down to the pieces parsed out of the protocol to look for hidden encryption within "normal" traffic.

    http://tracker.icir.org/bro/ticket/265

  2. This is awesome! Had fun playing with all of this in python for most of the day today. Looking forward to digging into it more.

    I did have a bit of an issue when feeding large sets of data into the scapy code that strips out the data payload of the packets. Feeding large sets of data tends to make python take up a TON of memory. As a matter of fact, trying to parse a 300 MB file ended up crashing the system via too much memory utilization. Haven't had a chance to play around with another version of scapy to see if the problem still exists...but have you seen any similar results with large data sets? Curious if it's just a natural limitation of scapy or if there is a way to make the process more efficient.

  3. This comment has been removed by the author.

  4. Some email filtering services offer encryption so you don't have to do it manually. This is a very interesting read, though.

  5. It's an awesome post.

    I am using this to detect whether a VoIP payload is encrypted or not,
    but because the data is already encoded, it does not show much difference between the encrypted and unencrypted payloads.
