Potential Evasion Where IPS Fails to Validate TCP Checksums

Is it a problem if an Intrusion Prevention System (IPS) does not validate TCP checksums? If you aren't familiar with TCP checksums, they're used to ensure that the TCP header and payload have not been corrupted in transit. The sender computes the one's complement sum of all 16-bit words in the TCP header, the payload, and a pseudo-header built from select IP header fields (source address, destination address, protocol, and TCP length), then places the one's complement of that sum in the checksum field of the TCP header. A receiving host performs the same calculation. If the value that the receiving host calculates matches the one found in the delivered TCP header checksum, the segment is accepted. If there is a mismatch, the segment is silently dropped.
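To make the calculation concrete, here is a minimal Python sketch of the checksum over an IPv4 pseudo-header plus a TCP segment. The addresses, ports, and helper names (`tcp_checksum`, `with_checksum`, `checksum_is_valid`) are illustrative assumptions and not part of any tool used in this post.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words, folding any carry back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # odd-length data is padded with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and payload."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                   # reserved byte
        socket.IPPROTO_TCP,  # protocol number 6
        len(segment),        # TCP length: header plus payload
    )
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF


def with_checksum(src_ip: str, dst_ip: str, segment: bytes) -> bytes:
    """Return a copy of the segment with its checksum field (bytes 16-17) filled in."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    csum = tcp_checksum(src_ip, dst_ip, zeroed)
    return zeroed[:16] + struct.pack("!H", csum) + zeroed[18:]


def checksum_is_valid(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Summing a segment that carries a correct checksum yields zero."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0


# Hypothetical segment: ports 1234 -> 80, PSH+ACK, 9-byte payload.
header = struct.pack("!HHIIBBHHH", 1234, 80, 1, 1, 5 << 4, 0x18, 8192, 0, 0)
good = with_checksum("192.0.2.1", "192.0.2.2", header + b"EVILSTUFF")
bad = good[:16] + bytes([good[16] ^ 0xFF]) + good[17:]  # corrupt the checksum
print(checksum_is_valid("192.0.2.1", "192.0.2.2", good))
print(checksum_is_valid("192.0.2.1", "192.0.2.2", bad))
```

A receiver that validates would keep the first segment and silently drop the second.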

Now, it might be tempting for an IPS solution to skip this validation process to save some time. I used to think that the only harm that could come from this was a false positive. For instance, let's say that someone sent a TCP segment with an invalid TCP checksum that contains malicious content "EVILSTUFF" covered by a rule or signature. The IPS would alert, but the end host would drop the packet. That's annoying but not harmful.

But after some consideration, I realized there are some ruses we can use for evasion. What if we can fool the IPS into tearing down its state for a session by sending a TCP reset with a bad checksum and then sending segments with malicious content? In this case, the IPS prematurely stops tracking the session because of the bad reset, but the receiving host drops that reset and keeps the session open to receive the malicious data. Let's take a look at how this can be done. I'll use Snort to demonstrate the concept, first with a default configuration that validates TCP checksums and then with one that disables TCP checksum validation.

I've crafted a session and emulated the client-side of the conversation using Scapy. Here is the tcpdump output of the client's part of the session.

Assume that we have a Snort rule that looks for the content of "EVILSTUFF" in the payload. Segments 1 and 2 are part of the three-way handshake that establishes the session. Segment 3 is a bogus reset segment since it has an invalid checksum. Segment 4 contains half of the malicious content and segment 5 contains the other half. Because the malicious content is split across two segments, the IPS is forced to reassemble the stream to see it. Finally, we send a valid reset in segment 6.
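The logic of the six client segments can be modeled with a toy state tracker. This is only a sketch of the idea, not how Snort or a real TCP stack is implemented: the `Segment` and `Endpoint` names are made up, sequence numbers are ignored, and the "EVIL"/"STUFF" split point is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str               # "S", "A", "PA", or "R"
    payload: bytes = b""
    checksum_ok: bool = True


class Endpoint:
    """Tracks one session and collects the payload it accepts."""

    def __init__(self, validate_checksums: bool):
        self.validate_checksums = validate_checksums
        self.open = False
        self.stream = b""

    def receive(self, seg: Segment) -> None:
        if self.validate_checksums and not seg.checksum_ok:
            return                      # bad checksum: silently dropped
        if "S" in seg.flags:
            self.open = True            # session established (handshake simplified)
        elif "R" in seg.flags:
            self.open = False           # reset: stop tracking the session
        elif self.open:
            self.stream += seg.payload  # in-session data is reassembled


client_segments = [
    Segment("S"),                       # 1: SYN
    Segment("A"),                       # 2: ACK completing the handshake
    Segment("R", checksum_ok=False),    # 3: reset with an invalid checksum
    Segment("PA", b"EVIL"),             # 4: first half of the content
    Segment("PA", b"STUFF"),            # 5: second half of the content
    Segment("R"),                       # 6: valid reset
]

host = Endpoint(validate_checksums=True)
careless_ips = Endpoint(validate_checksums=False)
careful_ips = Endpoint(validate_checksums=True)
for seg in client_segments:
    for endpoint in (host, careless_ips, careful_ips):
        endpoint.receive(seg)

print(b"EVILSTUFF" in host.stream)          # host received the content
print(b"EVILSTUFF" in careless_ips.stream)  # IPS without validation missed it
print(b"EVILSTUFF" in careful_ips.stream)   # IPS with validation saw it
```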

Let's run Snort using the pcap named "badchksum-rst.pcap" that captured both sides of the session created from the Scapy script. First, let's use the following configuration that defaults to validating TCP checksums.

The pertinent portion of the configuration simply uses the default stream5 configuration and contains a single Snort rule that looks for the uricontent of "EVILSTUFF" in an established session going to destination port 80. This configuration is stored in a file named "normal.conf".

Next, we'll run it using the following configuration.

The only difference is that I've added a configuration option to disable TCP checksum validation. This configuration is stored in a file named "turnoff-checksums.conf".

The first run, using the default configuration with checksum validation, generates an alert.

snort -A console -q -K none -c normal.conf -r badchksum-rst.pcap

07/04-21:56:59.145909 [**][1:123435:0] EVILSTUFF [**] [Priority: 0] {TCP} ->

However, the second run fails to alert using the configuration where TCP checksum validation has been disabled.

snort -A console -q -K none -c turnoff-checksums.conf -r badchksum-rst.pcap

This demonstrates that it is fairly easy to evade an IPS that fails to perform TCP checksum validation. There are several more tricks that can cause IPS evasions when TCP checksums are not validated. I'll demonstrate one or more in upcoming posts.

If you're interested in learning packet crafting such as the session I created for this experiment, I'm teaching a one-day SANS Scapy course at SANS Network Security 2010 in Las Vegas in September.

I still have no students signed up for the course and I'm getting desperate. I've got 6 copies of the book I co-authored with Stephen Northcutt "Network Intrusion Detection: An Analyst's Handbook" sitting in a box in my basement gathering cricket legs. If you are one of the first six students to sign up and actually show up, I'll give you one of these copies free – cricket legs and all! And, I'll even sign it for you if you'd like if you feel it won't devalue it! Just tweet me - judy_novak - to let me know you've signed up!


Linux 2.4/2.6 Kernel Off-by-one TCP Timestamp Issue and Potential IDS/IPS Evasion

Back in my Sourcefire days, I did a lot of research on TCP timestamp behavior. In a nutshell, TCP timestamps can be included as a TCP option to specify the sending host's timestamp and echo the most recently received timestamp from the other side of the connection. The notion of time or timestamp is not the typical one since it denotes, for most operating systems except OpenBSD, a representation of the uptime of the host since the last reboot.

My interest was understanding how general and operating system-specific timestamp behavior might evade notice by an intrusion detection or prevention system. In general, a host that receives a TCP segment with a TCP timestamp must compare the current timestamp in the segment with what it considers the previous timestamp. If the timestamp is equal to or greater than the previous one, it is acceptable. Otherwise, the segment should be rejected and its data not acknowledged. There are many different nuances involved in this process including how to deal with out-of-order segments. If you'd like to read way more than you'd ever consider interesting about all the nuances of TCP timestamps, take a look at the paper that Steve Sturges and I wrote "Target-Based TCP Timestamp Stream Reassembly".
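As a rough sketch of the basic acceptance test, the comparison can be written as below. The function name is made up for illustration. Because the timestamp field is 32 bits wide and can wrap, the comparison here is done modulo 2**32 in the style the TCP timestamp RFCs describe. The many nuances mentioned above (out-of-order segments, idle connections, and so on) are deliberately left out.

```python
def timestamp_acceptable(ts_val: int, ts_recent: int) -> bool:
    """Return True when ts_val is equal to or newer than ts_recent.

    ts_val is "older" only when 0 < (ts_recent - ts_val) mod 2**32 < 2**31.
    """
    age = (ts_recent - ts_val) & 0xFFFFFFFF
    return not (0 < age < 0x80000000)


print(timestamp_acceptable(100, 100))        # equal timestamp
print(timestamp_acceptable(99, 100))         # one less than the previous value
print(timestamp_acceptable(5, 0xFFFFFFF0))   # newer value after wraparound
```

Under this rule a timestamp of 99 following 100 is rejected, which is the strict behavior that the sessions below are compared against.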

I discovered some very unusual behavior with Linux 2.4+ kernels where a timestamp of one less than the previous one was actually accepted. Not only that, when the acknowledgement of that segment was returned, the echoed timestamp was one more than it should have been.

I've crafted the client side of the conversation using my favorite tool Scapy and am displaying the tcpdump output from some sessions. Speaking of Scapy....

*** Warning Shameless Marketing ***

I'm teaching a SANS class "Power Packet Crafting Using Scapy" at SANS Network Security 2010 in Las Vegas on September 26.

Contrary to popular belief - what you learn and craft in Vegas doesn't have to stay in Vegas!

I've currently got 0, count 'em - 0, students signed up and I'm feeling awfully unloved and lonely. So, please consider attending this course, since this is probably my last shot at teaching it unless the numbers improve. I know, I know it's awfully unbecoming to have to beg and grovel, but it's all I have left.

*** End Shameless Marketing ***

Take a look first at how a receiving Windows Vista host properly handles an old timestamp.

Record 1. I craft the client SYN to have an initial timestamp value of 100.

Record 2. The server responds and echoes the client's timestamp value of 100 after its own timestamp value.

Record 4. I send the data "GET /index.html" in relative bytes 1:17 with a good timestamp value of 100.

Record 5. The server acknowledges the data with "ack 17" and echoes the good timestamp value of 100.

Record 6. I attempt to send the rest of the HTTP GET request of "HTTP/1.0\r\n\r\n" in relative bytes 17:29 with a one-less-than-acceptable timestamp value of 99.

Record 7. The server predictably issues a duplicate acknowledgement with "ack 17" indicating it did not accept the previous segment. It echoes the rejected timestamp value of 99.

Now, take a look at how a receiving Linux 2.6 kernel host improperly handles an old timestamp. The session is identical to the Windows Vista one until Record 7.

Record 7. The Linux server erroneously accepts the one-less-than-acceptable timestamp value of 99 and acknowledges the data sent in Record 6 with "ack 29". As well, it echoes a timestamp value of 100 instead of 99.

You're probably thinking to yourself "Who cares?" - no wonder no one wants to attend your Scapy class - you're long-winded and tiresome and you probably went to band camp as a kid! Wait a second, though. What if an IDS/IPS is not aware of this bizarre behavior? Say, for instance, that "GET /EVILSTUFF" in a payload represents, surprisingly enough, something malicious. And, say you have a rule that looks for this content in a URL. Further, suppose we divide this into two segments in our crafted session and place "GET /EVIL" in one segment with a timestamp of 100 and "STUFF HTTP/1.0\r\n\r\n" in the next segment with a timestamp of 99. This forces the IDS/IPS to reassemble the two segments.

Now, if the receiver is a Linux host, it will accept this content and perhaps something evil will happen. If you have a TCP timestamp-aware IDS/IPS, it is possible that it will discard the segment with a timestamp of 99. After all - it is invalid. However, if you have a target-based IDS/IPS such as Snort, it can be configured to use a stream5 reassembly policy of "linux" for that host and it will alert on the rule.
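The difference between a strict reassembler and a target-aware one can be illustrated with a toy model. This is a sketch of the concept only, not Snort's stream5 code: the `reassemble` function and the "strict" policy label are hypothetical, the off-by-one allowance simply encodes the Linux behavior observed above, and sequence numbers and timestamp wraparound are ignored.

```python
def reassemble(segments, policy: str) -> bytes:
    """Concatenate payloads from (timestamp, payload) pairs that pass the check.

    policy "linux" tolerates a timestamp one less than the most recent one;
    any other policy rejects every timestamp older than the most recent one.
    """
    slack = 1 if policy == "linux" else 0
    ts_recent = None
    stream = b""
    for ts_val, payload in segments:
        if ts_recent is not None and ts_val < ts_recent - slack:
            continue                      # too old for this policy: discard
        stream += payload
        ts_recent = ts_val if ts_recent is None else max(ts_recent, ts_val)
    return stream


crafted = [
    (100, b"GET /EVIL"),
    (99, b"STUFF HTTP/1.0\r\n\r\n"),      # timestamp one less than the previous
]

print(b"/EVILSTUFF" in reassemble(crafted, "strict"))  # second segment discarded
print(b"/EVILSTUFF" in reassemble(crafted, "linux"))   # matches what Linux accepts
```

An IDS/IPS that models the target host's behavior sees the same stream the host does, which is what lets the rule fire.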

Once again, this demonstrates the merits of having an IDS/IPS, such as Snort, that is aware of operating system specific behaviors and is adept at handling them. And again, this demonstrates what a complex protocol TCP is, the nuances for receiving host and IDS/IPS interpretation, and the wealth of possibilities for TCP evasions.


Special Look: Face Time (part 3: Call Connection Initialization)


In part 1 of this series evaluating the FaceTime protocol, we established that the FaceTime network traffic exchange looks like this:

  • Unknown TCP protocol starts the conversation (TCP/5223);
  • Unknown UDP traffic between the iPhone and two hosts with similar IP addresses (UDP/16385 and UDP/16386);
  • Certificate validation through an Akamai server (HTTP);
  • HTTPS request to an Apple server;
  • STUN traffic for NAT traversal;
  • SIP traffic for call setup, negotiation and authentication;
  • UDP stream data for video/audio (RTP streaming H.264 with AAC audio).

In part 2 we looked at the SIP and RTP traffic in more depth, identifying what I believe is a proprietary authentication protocol in the SIP MESSAGE verb and H.264 and AAC audio data in an RTP stream, extracting that data with videosnarf.  Jason Ostrom, one of the authors of videosnarf, has even indicated that they plan to work on getting video extraction working so we can record and play back FaceTime calls.

In this installment of the series we’ll look at the unknown protocol that starts the FaceTime conversation over TCP/5223.

Traffic Analysis

Wireshark does a great job evaluating a packet capture and applying heuristics or standard port designations when applying packet dissectors.  Sadly, the FaceTime traffic over TCP/5223 is not interpreted any further than the TCP layer, as shown below (due to some lost traffic during my 888-Facetime packet capture, I’ve switched to a different capture which was more complete):


We’ll have to apply our own creativity to evaluate this traffic further.  First, Wireshark’s wonderful TCP stream reassembly feature gives us the ability to view the TCP exchange in a hexadecimal view, with the option to save the data in binary format (“Raw”), ASCII, hex-dump or even C Arrays (great for taking data and dumping it into a C tool for manipulation, or otherwise modifying it to work with Python or other popular languages).


Although obviously a binary protocol (i.e. non-ASCII based), we can see plaintext strings that look similar to certificate content.  This is a common characteristic of SSL-based protocols, though Wireshark wasn’t able to identify this automatically.  Fortunately, Wireshark is also an extremely flexible tool with a little know-how.  Using the “Analyze | Decode As” feature, we can tell Wireshark to treat this traffic as SSL-encrypted to gather a bit more information from the protocol.

First, select one of the packets of the exchange that you want to decode using an alternate protocol and click Analyze | Decode As.  From the Wireshark: Decode As menu, select the Transport tab.  Specify that both ports should be decoded as SSL, as shown below:


Clicking “Apply” will cause Wireshark to reload the capture data, applying the SSL decoder to the specified port pair, as shown.


One of the great features of the Wireshark SSL dissector is that it will do stream reassembly for us, giving us the option to extract data even if it is transmitted across multiple TCP segments.  For example, in the screen-shot above I’ve selected the certificate information, highlighting the bytes in the hex view below.  For any highlighted data in Wireshark, we can export it to a binary file by selecting “File | Export | Selected Packet Bytes”.  In the Export Raw Data dialog, save the data with the filename extension “.der” to allow Windows to open it as a certificate.
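If you would rather inspect the exported bytes with command-line tools than the Windows certificate viewer, the DER data can be converted to PEM with Python's standard library. This is a minimal sketch: the sample bytes below are a placeholder rather than a real certificate, and in practice you would read the bytes from whatever filename you chose in the export dialog.

```python
import ssl


def der_to_pem(der_bytes: bytes) -> str:
    """Wrap DER-encoded certificate bytes in PEM armor (base64 plus header/footer)."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text: str) -> bytes:
    """Reverse the conversion, recovering the original DER bytes."""
    return ssl.PEM_cert_to_DER_cert(pem_text)


# Placeholder bytes standing in for the data exported from Wireshark.
exported = b"\x30\x82\x01\x0a" + bytes(range(100))
pem = der_to_pem(exported)
print(pem.splitlines()[0])
print(pem_to_der(pem) == exported)
```

The conversion only re-encodes the bytes; it does not check that they form a valid certificate.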


Double-clicking on the file with the “.der” (or “.cer”) extension will open the certificate viewer.  We can navigate the certificate details to gather some additional information about the server service.

[Screenshots: certificate General tab, Key Usage detail, Enhanced Key Usage detail (client and server authentication)]

A few points of interest from this certificate:

  • Issued to courier.push.apple.com by Entrust on April 13, 2010;
  • Key use is for Digital Signatures and Key Encipherment (i.e. key encryption);
  • Enhanced Key Usage indicates that it is valid for Server and Client authentication (i.e. mutual authentication).

Other certificates are also delivered through this exchange, including the root certificate for Entrust.

A Gentle Tap

Curiosity getting the better of me, I decided to give the Apple server a “gentle tap” to find out more about the authentication requirements here.  One of my favorite tools is “openssl”, the binary that ships with the OpenSSL suite.  We can use this tool to connect to SSL services, extracting debug information as shown:

$ openssl s_client -msg -connect | grep -v "^ "
>>> SSL 2.0 [length 008c], CLIENT-HELLO
<<< TLS 1.0 Handshake [length 002a], ServerHello
<<< TLS 1.0 Handshake [length 0dc7], Certificate
depth=2 /O=Entrust.net/OU=www.entrust.net/CPS_2048 incorp. by ref.
(limits liab.)/OU=(c) 1999 Entrust.net
Limited/CN=Entrust.net Certification Authority (2048)
verify error:num=19:self signed certificate in certificate chain
verify return:0
<<< TLS 1.0 Handshake [length 000a], CertificateRequest
<<< TLS 1.0 Handshake [length 0004], ServerHelloDone
>>> TLS 1.0 Handshake [length 0007], Certificate
>>> TLS 1.0 Handshake [length 0086], ClientKeyExchange
>>> TLS 1.0 ChangeCipherSpec [length 0001]
>>> TLS 1.0 Handshake [length 0010], Finished
<<< TLS 1.0 Alert [length 0002], fatal handshake_failure
52865:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handsha
ke failure:/usr/src/secure/lib/libssl/../../../crypto/openssl/ssl/s3_
pkt.c:1052:SSL alert number 40
52865:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:/

I’ve filtered out the hex-dump data with grep, leaving us just the informational messages in this output.  The traffic marked with “>>>” is from my system to the Apple server, “<<<” is from the Apple server to my client.

First, my system attempts to do a SSL 2.0 negotiation sending a CLIENT-HELLO message.  Apple’s server responds with a TLS 1.0 ServerHello response, followed by the certificate information (such as we saw earlier).  Following this delivery, Apple’s server sends a CertificateRequest to my client.  My client sends an empty certificate response (as indicated with a length of 7 bytes) and tries to complete the ClientKeyExchange without the use of a client-side certificate.  The Apple server rejects this with a fatal “handshake_failure” and terminates the connection.
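The 7-byte figure follows from the TLS handshake framing: a 1-byte message type, a 3-byte length, and, for a Certificate message, a 3-byte certificate list length that is zero when no certificate is supplied. The sketch below parses that framing. The sample bytes are constructed by hand from the TLS 1.0 message layout rather than taken from the capture, and the function names are illustrative.

```python
HANDSHAKE_TYPES = {
    1: "ClientHello",
    2: "ServerHello",
    11: "Certificate",
    12: "ServerKeyExchange",
    13: "CertificateRequest",
    14: "ServerHelloDone",
    15: "CertificateVerify",
    16: "ClientKeyExchange",
    20: "Finished",
}


def parse_handshake(message: bytes):
    """Split a TLS handshake message into (type name, body)."""
    msg_type = message[0]
    body_len = int.from_bytes(message[1:4], "big")
    body = message[4:4 + body_len]
    return HANDSHAKE_TYPES.get(msg_type, f"unknown({msg_type})"), body


def certificate_list_length(body: bytes) -> int:
    """The Certificate body starts with a 3-byte length of the certificate list."""
    return int.from_bytes(body[:3], "big")


# Hand-built messages: an empty Certificate and a ServerHelloDone.
empty_certificate = bytes.fromhex("0b000003000000")
server_hello_done = bytes.fromhex("0e000000")

name, body = parse_handshake(empty_certificate)
print(name, len(empty_certificate), certificate_list_length(body))
print(parse_handshake(server_hello_done)[0], len(server_hello_done))
```

The same framing accounts for the ServerHelloDone length of 0004 in the output: a 4-byte header with an empty body.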

From this exchange we can see that this TLS protocol uses mutual certificate authentication; a certificate on the Apple server from Entrust and a certificate on the iPhone to complete the exchange.  This is interesting since Apple has stated that FaceTime will be an open protocol, but will apparently require a client-side certificate to connect to the Apple server, which gives them a grant/deny option for all connections on a per-device basis.  Steve Papa Esteban is no dummy (here’s looking at you, Android users!)

Client-Side Certificate

Returning to the Wireshark capture decoding SSL traffic over TCP/5223, we can extract the client certificate sent from the iPhone to the Apple server using the technique detailed above.

[Screenshots: client certificate Issuer, Subject, and Key Usage details]

More interesting observations are now possible:

  • The iPhone client certificate is issued by the “Apple iPhone Device CA”;
  • The iPhone client certificate common name (CN) is a GUID, likely generated at the factory;
  • Key constraints are for authenticating the iPhone as a device entity;
  • Key usage is similar to the Apple Server certificate, intended for digital signatures and key encipherment.

I Probably Should Have Started Here

I probably should have started here, but it would have been much less fun.  The Apple well-known TCP and UDP ports list used by Apple products indicates that TCP/5223 is used for XMPP over SSL.  XMPP is the Extensible Messaging and Presence Protocol, the formal name for Jabber.  Apple indicates that TCP/5223 is used for authentication in unencrypted Jabber conversations, as well as for authentication and data exchange for SSL-protected Jabber sessions.

From this analysis, we can determine that FaceTime uses XMPP to authenticate and establish a connection to an Apple “Jabber” server.  Although I don’t have a packet capture for the remote session, I imagine that some kind of GSM message is sent from the initiating device to the responding device to have both devices join the Jabber server, authenticate and exchange data that initiates the FaceTime conversation including the subsequent SIP exchange.  Due to the use of certificate-based mutual authentication, it’s unlikely that anyone will be sufficiently reproducing the FaceTime protocol on another device without Apple’s assistance for certificate issuance.

Evil Thinking For Future … Evil

I’ll leave you with a final thought to consider for future evildoing.  The private portion of the certificate used for XMPP authentication by the iPhone is stored on the iPhone device; unless the iPhone uses a TPM, it is probably stored somewhere on the file system.  If you were to jailbreak your iPhone 4g and extract that certificate, you could likely use a standard Jabber client to connect to the Apple Jabber server and monitor the activity there, including who is joining and leaving the network.  Maybe even set up a Jabber Bot and automate your evil manipulation of Apple’s server.

There’s someone knocking very loudly at my door, so that’s it for me today.  Next time we’ll catch up on the HTTPS traffic and more FaceTime analysis fun.



Special Look: Face Time (part 2: SIP and Data Streams)


In part 1 of this series we looked at the protocols involved in a Facetime call. The basic outline of the Facetime network exchange is as follows:
  • Unknown TCP protocol starts the conversation (TCP/5223);
  • Unknown UDP traffic between the iPhone and two hosts with similar IP addresses (UDP/16385 and UDP/16386);
  • Certificate validation through an Akamai server (HTTP);
  • HTTPS request to an Apple server;
  • STUN traffic for NAT traversal;
  • SIP traffic for call setup and negotiation;
  • UDP stream data for video/audio.
In this installment of the series we'll look at the last two components: SIP and the UDP stream information.

Examining SIP

SIP is the Session Initiation Protocol, used for controlling the setup and establishment of audio and video calls over TCP or UDP. As a text-based protocol, it looks a lot like HTTP (verbs like INVITE and BYE and numeric response codes), with a little SMTP love thrown in there as well.

Wireshark does a great job of identifying SIP traffic, even on non-standard ports. While SIP is typically done over port 5060 (or 5061 for SIP over TLS), Facetime is using UDP/16402. Wireshark gives us a summary of SIP activity in the packet capture by selecting the Telephony | SIP option, as shown.

Worth noting here is the lack of the SIP request method REGISTER, which would be used with digest authentication to authenticate the device. This isn't a statement of vulnerability, but it indicates that Apple is not using the standard SIP authentication method, instead relying on an alternate exchange to authenticate the devices.

Also interesting here is the use of the SIP MESSAGE verb. According to RFC3428, the MESSAGE verb is used for instant messaging as part of the SIP exchange.

Otherwise, the SIP exchange is straightforward, as follows:

  • INVITE from the initiator to the responder;
  • ACK from the responder;
  • Several MESSAGE frames back and forth;
  • After a few minutes (the duration of the video call), a BYE from the responder to terminate the session.
The SIP exchange is shown in Wireshark packet list form below:

A few IP addresses worth clarifying here:
  • The remote iPhone from Apple's 888-Facetime service;
  • My iPhone's IP address on my open WiFi network;
  • My NAT address from my ISP, previously negotiated with STUN.
In the packet list we see that Apple is using user@address:port for the SIP address (URI). Looking at the detail of the INVITE frame we can gather additional detail. First we'll look at the message header content (content has been omitted to protect the privacy of the remote caller):

More interesting stuff here:
  • The Display component in the To and From fields reveals the cell phone number of both parties. In the first "To:" field shown, my cell phone number is listed "4015242911" followed by an unknown "570". This is interesting since the 888-Facetime caller's phone number was blocked from my phone display, but accessible to me from a packet capture.
  • The User-Agent of the 888-Facetime caller is "Viceroy 1.4/GK", which is similar to the User-Agent used by the iChat video client ("Viceroy 1.3", or "1.2" in older iChat clients).
Looking at the message body detail reveals more details about the session:

The message body details the Session Description Protocol (SDP) content, including the SDP session owner as "GKVoiceChatService" which is documented in Apple's iPhone SDK. We can also see the Real Time Control Protocol (RTCP) negotiated for UDP/16402, as well as multiple negotiated media attributes, essentially reduced to AAC for audio and X-H.264 for video.

Later in the SIP exchange, we see several of the MESSAGE verbs. Although intended for use in instant messaging applications, the MESSAGE verb is used by Facetime to exchange arbitrary data between the two iPhone devices. The MESSAGE verb payload data repeats the "User-Agent: Viceroy 1.4/GK" information, then includes the message "Content-Type: application/ske", similar to an HTTP exchange. Following this tag we have a Content-Length tag and "SKESeq: 1;0" for the first of the 4 MESSAGE verbs. Each subsequent MESSAGE verb also includes this content, changing the numeric identifier "1" for the successive packets (e.g. "SKESeq: 2;0", "SKESeq: 3;0" and "SKESeq: 4;0").

We can apply the display filter "sip.Request-Line contains "MESSAGE"" to focus the Wireshark display on these MESSAGE frames, as shown below.

A quick Google search doesn't turn up anything about the SKE protocol, though I'll speculate here that it is some kind of authentication negotiation mechanism. A summary of the 4 payloads following the SKESeq header is as follows:
  • SKESeq 1: A large-ish payload commonly around 785 bytes which appears to include certificate-looking information.
  • SKESeq 2: Always 4 bytes of payload: "61 f4 27 9f" (in one capture)
  • SKESeq 3: A consistent payload length of 170 bytes, no significant ASCII strings.
  • SKESeq 4: Always 4 bytes of payload: "53 a0 8e a3" (in one capture)
This data requires further analysis, possibly representing a proprietary authentication protocol used by Facetime through SIP MESSAGE verbs.  I'll devote further analysis to a later article so we can move on to the good stuff.
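For anyone who wants to pull these fields out of a capture programmatically, here is a small sketch that parses the headers described above from a raw SIP MESSAGE. The sample message is hand-built for illustration: the request URI and header order are assumptions, while the header names and the 4-byte payload come from the observations above.

```python
def parse_sip_message(raw: bytes):
    """Split a SIP message into (request line, lower-cased header dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii", errors="replace").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def ske_sequence(headers: dict):
    """Return the two numbers in an 'SKESeq: N;M' header as integers."""
    first, _, second = headers["skeseq"].partition(";")
    return int(first), int(second)


# Hypothetical MESSAGE frame; 192.0.2.10 is a documentation address.
sample = (
    b"MESSAGE sip:user@192.0.2.10:16402 SIP/2.0\r\n"
    b"User-Agent: Viceroy 1.4/GK\r\n"
    b"Content-Type: application/ske\r\n"
    b"Content-Length: 4\r\n"
    b"SKESeq: 2;0\r\n"
    b"\r\n"
) + bytes.fromhex("61f4279f")

request_line, headers, body = parse_sip_message(sample)
print(request_line.split()[0], headers["content-type"], ske_sequence(headers))
print(body.hex(), len(body) == int(headers["content-length"]))
```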

Data Streams

Following the SIP exchange we see an RTP exchange over UDP/16402 with a reflexive source port.  To evaluate this stream we'll turn to the videosnarf tool by Arjun Sambamoorthy and Jason Ostrom.  Videosnarf and the parent tool ucsniff are really impressive, and Jason and Arjun are really cool guys as well.
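For orientation, every RTP packet begins with a 12-byte fixed header that a tool like videosnarf has to read before it can reorder and extract the media payload. The sketch below decodes that header from a UDP payload. The sample packet and its field values (payload type 96, the SSRC) are made up for illustration and are not taken from the Facetime capture.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("too short to be an RTP packet")
    b0, b1, sequence, timestamp, ssrc = struct.unpack_from("!BBHII", packet)
    return {
        "version": b0 >> 6,            # 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # identifies the negotiated codec
        "sequence": sequence,          # used to reorder packets and spot loss
        "timestamp": timestamp,
        "ssrc": ssrc,                  # identifies one media stream
    }


# Hypothetical packet: version 2, marker set, dynamic payload type 96.
sample = struct.pack("!BBHII", 0x80, 0x80 | 96, 1, 1000, 0xDEADBEEF) + b"media"
fields = parse_rtp_header(sample)
print(fields["version"], fields["payload_type"], fields["sequence"])
```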

Videosnarf can read from a libpcap file, but the current version of the tool does not properly accommodate wireless packet capture link types other than native 802.11 (e.g. it cannot interpret PPI or Radiotap headers), with the following error:

# videosnarf -i 4g-inbound-888FACETIME-session-1.pcap
Starting videosnarf 0.63
[+]Starting to snarf the media packets
[+] Please wait while decoding pcap file...
[-] Invalid IP header length: 0 bytes
[-] Invalid IP header length: 0 bytes
[-]No RTP media stream found
[+]Snarfing Completed

My packet capture uses the PPI header, so I added support to handle this link type with videosnarf.  Download and apply the patch as shown (against videosnarf 0.63, future versions will hopefully integrate this functionality and not require patching):

# cd videosnarf-0.63
# wget -q http://www.willhackforsushi.com/code/videosnarf-wifi-ppiheader.diff
# patch -p1 <videosnarf-wifi-ppiheader.diff
patching file src/videosnarf.c
patching file src/videosnarf.h
# ./configure && make && make install
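The patch itself is C code and is not reproduced here. As a rough illustration of what handling the PPI link type involves, the sketch below reads the 8-byte PPI packet header (version, flags, total header length, and the link type of the encapsulated frame, all little-endian) and returns the frame that follows it. The function name and the sample frame are assumptions for illustration.

```python
import struct

DLT_IEEE802_11 = 105  # native 802.11 link type
DLT_PPI = 192         # Per-Packet Information link type


def strip_ppi(frame: bytes):
    """Return (encapsulated link type, frame bytes after the PPI header)."""
    if len(frame) < 8:
        raise ValueError("too short for a PPI header")
    version, flags, header_len, dlt = struct.unpack_from("<BBHI", frame)
    # header_len covers the 8-byte packet header plus any PPI field data.
    if header_len < 8 or header_len > len(frame):
        raise ValueError("invalid PPI header length")
    return dlt, frame[header_len:]


# Hypothetical frame: a bare PPI header followed by stand-in 802.11 bytes.
sample = struct.pack("<BBHI", 0, 0, 8, DLT_IEEE802_11) + b"\x08\x02wifi"
dlt, inner = strip_ppi(sample)
print(dlt == DLT_IEEE802_11, inner)
```

After the PPI header is skipped, a tool still has to walk the 802.11 and LLC/SNAP headers before it reaches the IP header.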

Once videosnarf includes the ability to read from wireless packet captures with the PPI header, we can run it against the packet capture again:

# videosnarf -i 4g-inbound-888FACETIME-session-1.pcap
Starting videosnarf 0.63
[+]Starting to snarf the media packets
[+] Please wait while decoding pcap file...
[-] Invalid IP header length: 16 bytes
Protocol: Unsupported
[-] Invalid IP header length: 16 bytes
[-] Invalid IP header length: 16 bytes
[+]Stream saved to file H264-media-1.264
[+]Stream saved to file H264-media-2.264
[+]Stream saved to file H264-media-3.264
[+]Stream saved to file H264-media-4.264
[+]Number of streams found are 4
[+]Snarfing Completed
# ls -l H264-media-*
-rw-r--r-- 1 root root  413160 Jul  5 18:24 H264-media-1.264
-rw-r--r-- 1 root root  272459 Jul  5 18:24 H264-media-2.264
-rw-r--r-- 1 root root 3765017 Jul  5 18:24 H264-media-3.264
-rw-r--r-- 1 root root 1761492 Jul  5 18:24 H264-media-4.264

Videosnarf was able to extract four H.264 data streams, saving them to files.  We can quickly evaluate the contents of the files to determine if the content itself is encrypted using the "ent" tool:

# ent H264-media-3.264
Entropy = 4.509034 bits per byte.

Optimum compression would reduce the size
of this 3765017 byte file by 43 percent.

Chi square distribution for 3765017 samples is 298830527.55, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 55.8586 (127.5 = random).
Monte Carlo value for Pi is 3.626079279 (error 15.42 percent).
Serial correlation coefficient is 0.622531 (totally uncorrelated = 0.0).

Ent applies several tests to evaluate the entropy and randomness of a given file.  In this example, entropy is fairly low at 4.5 bits per byte.  Compare this to a data stream collected from the Linux /dev/urandom device:

# dd if=/dev/urandom of=rand bs=4096 count=1000
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 1.64117 s, 2.5 MB/s
# ent rand
Entropy = 7.999961 bits per byte.

Optimum compression would reduce the size
of this 4096000 byte file by 0 percent.

Chi square distribution for 4096000 samples is 224.01, and randomly
would exceed this value 90.00 percent of the times.

Arithmetic mean value of data bytes is 127.5652 (127.5 = random).
Monte Carlo value for Pi is 3.142760882 (error 0.04 percent).
Serial correlation coefficient is 0.000133 (totally uncorrelated = 0.0).

Or an encrypted file of all 0's:

# dd if=/dev/zero of=zero bs=4096 count=1000
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 0.0211891 s, 193 MB/s
# openssl enc -aes-128-cfb -in zero -out zero.enc
enter aes-128-cfb encryption password:
Verifying - enter aes-128-cfb encryption password:
# ent zero.enc
Entropy = 7.999947 bits per byte.

Optimum compression would reduce the size
of this 4096016 byte file by 0 percent.

Chi square distribution for 4096016 samples is 300.32, and randomly
would exceed this value 5.00 percent of the times.

Arithmetic mean value of data bytes is 127.4714 (127.5 = random).
Monte Carlo value for Pi is 3.137092793 (error 0.14 percent).
Serial correlation coefficient is 0.000067 (totally uncorrelated = 0.0).
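The first figure that ent reports is the Shannon entropy of the byte distribution. If ent isn't handy, a few lines of Python give a comparable number. This is a minimal sketch of that one measurement and does not reproduce ent's other tests (chi square, Monte Carlo Pi, serial correlation).

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte values in data, from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(entropy_bits_per_byte(b"\x00" * 4096))          # a single repeated value
print(entropy_bits_per_byte(bytes(range(256)) * 16))  # every value equally likely
```

Values near 8 bits per byte suggest encrypted or well-compressed data, while markedly lower values indicate structure in the bytes.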

Clearly the output from the Facetime video stream as extracted by videosnarf is not encrypted. Sadly, it does not appear that the extracted data is viable to play with mplayer:

# mplayer H264-media-3.264 -fps 17
MPlayer 1.0rc2-4.3.2 (C) 2000-2007 MPlayer Team
CPU: Intel(R) Core(TM)2 Duo CPU     L7100  @ 1.20GHz (Family: 6, Model: 15, Step
ping: 11)
CPUflags:  MMX: 1 MMX2: 1 3DNow: 0 3DNow2: 0 SSE: 1 SSE2: 1
Compiled with runtime CPU detection.
mplayer: could not connect to socket
mplayer: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.

Playing H264-media-3.264.
H264-ES file format detected.
xscreensaver_disable: Could not find XScreenSaver window.
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
Selected video codec: [ffh264] vfm: ffmpeg (FFmpeg H.264)
Audio: no sound
FPS forced to be 17.000  (ftime: 0.059).
Starting playback...
[h264 @ 0x896a290]illegal POC type 5
[h264 @ 0x896a290]sps_id out of range
[h264 @ 0x896a290]sps_id out of range
[h264 @ 0x896a290]decode_slice_header error
[h264 @ 0x896a290]concealing 12 DC, 12 AC, 12 MV errors

MPlayer interrupted by signal 11 in module: decode_video
- MPlayer crashed by bad usage of CPU/FPU/RAM.
  Recompile MPlayer with --enable-debug and make a 'gdb' backtrace and
  disassembly. Details in DOCS/HTML/en/bugreports_what.html#bugreports_crash.
- MPlayer crashed. This shouldn't happen.

It appears the reconstructed file is close to an H.264 file, but has some errors preventing it from being played back.  This is still positive from an attack perspective though, since we know the content is not encrypted; hopefully the videosnarf developers will release an updated version soon that can address any problems with reconstructing and saving the H.264 stream.


Let's summarize what we learned today:

  • While Facetime uses SIP, it does not use the standard authentication mechanisms;
  • Phone number information is disclosed in the SIP exchange, even if it is blocked on the phone itself;
  • Facetime uses the SIP MESSAGE verb for passing arbitrary data between iPhone devices involved in a Facetime call.  This could be a proprietary authentication mechanism;
  • Videosnarf with a minor patch can extract video and audio stream data;
  • The video and audio content of a Facetime conversation are NOT encrypted, leaving them susceptible to eavesdropping attacks if the underlying WLAN infrastructure is weak or otherwise compromised;
  • Mplayer is unable to play back this stream data today; hopefully fixes can be applied by the videosnarf team to resolve this in the future.

Next time we'll spend some time looking at the initial TCP exchange between the iPhone 4g and the authorization process that initiates the connection.  Comments and questions are welcome, thanks!



Special Look: Face Time (part 1: Introduction)

Facetime Introduction

With the iPhone 4g, video chat through Facetime is a reality in a mobile device. As a frequent traveler, I use Skype on my laptop or netbook to stay in touch with family and friends, but it usually requires some planning and coordination. With Facetime, we can initiate a voice call over the cellular network, then switch to video on demand, when WiFi service is also available (which hopefully will not be a requirement in the future).

As a packet junkie, I find the concept of Facetime very interesting. The intended usage for Facetime, as described by SteveEsteban, is for a user to place a call over the cellular network with the freedom to switch to video, then back and forth as desired. Focusing on the network protocol components, there are several interesting challenges:

  • Device capabilities negotiation and call setup over WiFi;
  • Video content streaming between devices;
  • Authorization to accept the video stream by recipient;
  • NAT traversal for users behind a WiFi NAT interface;
  • Binding between GSM and WiFi traffic to mitigate spoofing attacks.

Knowing this, a lot of interesting questions come to mind. How is the management and streaming traffic protected? How is the call authorized by the end-user? What can we deduce by sniffing the WiFi-side of a Facetime transaction?

In this multi-part series, we'll look at how the Facetime protocol works, answering these and other questions while looking at tools and techniques for network protocol analysis. It's my hope that you'll learn about the Facetime protocol by reading this series, and furthermore, be able to apply these techniques to other protocols as well.

High-Level Assessment

To assess the protocol, I've taken several packet captures from my unencrypted wireless network, calling 888-Facetime (Apple's service for customers to try out Facetime) and a colleague at the SANS Institute. Most of the analysis will be on the call to 888-Facetime, though I'll introduce other packet captures as needed.

The Facetime call with 888-Facetime was initiated by Apple's representative, which I'll herein refer to as an "inbound" session, due to the differences in Facetime calls in the role of initiator or responder. The details of my iPhone 4g are as follows:

iOS Version: 4.0 (8A293)
IP Address:
MAC Address: 5c:59:48:02:8a:65

My AP was running in 802.11b mode (for simplifying the packet capture process), also acting as a NAT at

Loading up the packet capture in Wireshark, I applied a display filter to include traffic only from or to my address:

ip.addr eq

Using Wireshark's Protocol Hierarchy summary (Statistics | Protocol Hierarchy), we can get a quick look at all the protocols in this 28,034-packet capture file, as shown.

Besides the low-layer protocols, we can see different activity here:

  • UDP DNS traffic (to be expected);
  • Session Traversal Utilities for NAT (STUN);
  • Session Initiation Protocol (SIP);
  • Lots of unrecognized UDP data packets;
  • HTTP traffic transmitting XML data;
  • HTTPS traffic;
  • Unrecognized TCP traffic;
  • ICMP.
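
To make the idea behind a protocol summary concrete, here is a minimal sketch of a port-based tally in Python. It is not how Wireshark does it (its dissectors inspect payloads and apply much richer heuristics); the `PORT_HINTS` table uses well-known default ports rather than values taken from this capture, and the sample packets are made up for illustration.

```python
from collections import Counter

# Hypothetical hints based on well-known default ports, not on this capture.
PORT_HINTS = {
    ("udp", 53): "DNS",
    ("udp", 3478): "STUN",
    ("udp", 5060): "SIP",
    ("tcp", 80): "HTTP",
    ("tcp", 443): "HTTPS",
}


def classify(transport, sport, dport):
    """Return a protocol label for one packet, or an 'unrecognized' bucket."""
    if transport == "icmp":
        return "ICMP"
    # Check the destination port first, then the source port (for replies).
    for port in (dport, sport):
        label = PORT_HINTS.get((transport, port))
        if label:
            return label
    return f"unrecognized {transport.upper()}"


def protocol_hierarchy(packets):
    """Count packets per protocol label."""
    return Counter(classify(t, s, d) for t, s, d in packets)


# Made-up sample packets: (transport, source port, destination port).
sample = [
    ("udp", 51000, 53),
    ("udp", 53, 51000),
    ("tcp", 49152, 5223),
    ("udp", 16402, 16402),
    ("tcp", 49153, 443),
    ("icmp", 0, 0),
]

summary = protocol_hierarchy(sample)
for name, count in summary.most_common():
    print(f"{name}: {count}")
```

Anything that does not match a hint lands in an "unrecognized" bucket, which is roughly where the unknown TCP and UDP traffic in the list above ends up.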

Wireshark doesn't give us the option to sort this traffic view by time, but we can switch to the Conversations view (Statistics | Conversations) to view time-relative data by protocol, as shown (TCP first, then UDP):

We can see a few nodes are involved here:

Address  Name      Note
         DNS Name  Apple, Inc system in the 17/8 netblock
                   Akamai server, a239.da1.akamai.net
                   Verisign's CRL server
         DNS Name  Apple, Inc system in the 17/8 netblock
         DNS Name  Apple, Inc system in the 17/8 netblock
         DNS Name  Apple, Inc system in the 17/8 netblock
                   My ISP's DNS server
         DNS Name  Apple, Inc system in the 17/8 netblock
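
Checking whether an address falls in the 17/8 netblock is easy to script. The following is a small sketch using Python's standard `ipaddress` module; the helper name `in_apple_netblock` is my own, and the test addresses are the ones that appear in the timeline for this capture.

```python
import ipaddress

# The 17/8 netblock referenced in the notes above.
APPLE_NETBLOCK = ipaddress.ip_network("17.0.0.0/8")


def in_apple_netblock(address):
    """Return True if the dotted-quad address falls inside 17.0.0.0/8."""
    return ipaddress.ip_address(address) in APPLE_NETBLOCK


for addr in ("17.109.28.227", "17.155.5.14", "172.16.0.114"):
    print(addr, in_apple_netblock(addr))
```

The first two addresses fall inside the netblock; the third is the private address of the iPhone and does not.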

Using the timing and address information, we can construct a timeline of what happens in this session:




1. 172.16.0.114 ->  iPhone 4g initiates a TCP session to the remote host over TCP/5223. Wireshark does not have a dissector for this protocol, though it believes the port number is associated with the HP Virtual Group protocol.
2. 172.16.0.114 ->  UDP connections from the iPhone 4g to Apple's server over UDP/59007.
3. 172.16.0.114 ->  UDP traffic to a host with the next 4th octet over UDP/59007.
4. 172.16.0.114 ->  traffic to the Akamai server over XML, retrieving certificate information from Apple's servers.
5. 172.16.0.114 ->  traffic to an Apple server.
6. 172.16.0.114 ->  STUN traffic to an Apple server for NAT traversal.
7. 17.109.28.227 ->  SIP traffic from Apple revealing phone numbers, among other details.
8. 17.155.5.14 ->  traffic over port 16402; making up the majority of the packet capture data, this is likely the video stream information which continues until a SIP BYE message is observed.
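
The same kind of timeline can be approximated in a script by grouping packets into conversations and ordering them by the time of their first packet. This is only a sketch of the idea, not what Wireshark's Conversations view does internally; the tuple layout is my own, and apart from the iPhone's address the sample endpoints and timestamps are invented (they use the 192.0.2.0/24 documentation range).

```python
def conversations(packets):
    """Group (time, proto, src, sport, dst, dport) packets into conversations."""
    stats = {}
    for ts, proto, src, sport, dst, dport in packets:
        # Sort the two endpoints so both directions share one key.
        a, b = sorted([(src, sport), (dst, dport)])
        entry = stats.setdefault(
            (proto, a, b), {"first": ts, "last": ts, "packets": 0}
        )
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
        entry["packets"] += 1
    return stats


def timeline(packets):
    """Return conversations ordered by the time of their first packet."""
    return sorted(conversations(packets).items(), key=lambda item: item[1]["first"])


# Invented sample data: only 172.16.0.114 comes from the capture.
sample = [
    (0.00, "tcp", "172.16.0.114", 49152, "192.0.2.10", 5223),
    (0.05, "tcp", "192.0.2.10", 5223, "172.16.0.114", 49152),
    (1.20, "udp", "172.16.0.114", 16402, "192.0.2.20", 16402),
    (0.80, "udp", "172.16.0.114", 50000, "192.0.2.30", 53),
    (1.25, "udp", "192.0.2.20", 16402, "172.16.0.114", 16402),
]

ordered = timeline(sample)
for (proto, a, b), info in ordered:
    print(f"{info['first']:.2f}s {proto} {a} <-> {b} ({info['packets']} packets)")
```

Sorting on the first-seen timestamp is what turns a per-protocol conversation list into a single ordered sequence of events.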


Based on this analysis we can determine several critical pieces of how Facetime works:

  • Unknown TCP protocol starts the conversation, likely initiated following an event that starts on the GSM network;
  • Unknown UDP traffic between two hosts with similar IP addresses;
  • Certificate validation through an Akamai server, followed by an HTTPS request to an Apple server;
  • STUN traffic for NAT traversal;
  • SIP traffic for call setup and negotiation;
  • UDP stream data for video/audio.
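
For readers who want to spot STUN in unrecognized UDP data themselves, the fixed 20-byte STUN header is straightforward to parse. The sketch below follows the header layout in RFC 5389 (and falls back to the older RFC 3489 layout when the magic cookie is absent); I have not verified which variant Facetime uses, and the sample message is constructed by hand rather than taken from the capture.

```python
import struct

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389


def parse_stun_header(data):
    """Parse the 20-byte STUN header from the start of a UDP payload."""
    if len(data) < 20:
        raise ValueError("a STUN header is 20 bytes long")
    msg_type, length = struct.unpack("!HH", data[:4])
    if msg_type & 0xC000:
        raise ValueError("the top two bits of a STUN message must be zero")
    cookie = struct.unpack("!I", data[4:8])[0]
    has_cookie = cookie == MAGIC_COOKIE
    return {
        "type": msg_type,
        "length": length,  # attribute bytes following the header
        "rfc5389_cookie": has_cookie,
        # RFC 5389: 96-bit ID after the cookie; RFC 3489: 128-bit ID.
        "transaction_id": data[8:20] if has_cookie else data[4:20],
    }


# Hand-built Binding Request (type 0x0001) with no attributes.
request = struct.pack("!HHI", 0x0001, 0, MAGIC_COOKIE) + bytes(range(12))
header = parse_stun_header(request)
print(header)
```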

In the next part of this series, we'll spend some more time looking at the SIP and video/audio streaming traffic, along with some tools we can use to extract that data. Stay tuned!