With the iPhone 4g, video chat through Facetime is a reality in a mobile device. As a frequent traveler, I use Skype on my laptop or netbook to stay in touch with family and friends, but it usually requires some planning and coordination. With Facetime, we can initiate a voice call over the cellular network, then switch to video on demand when WiFi service is also available (which hopefully will not be a requirement in the future).
As a packet junkie, I find the concept of Facetime very interesting. The intended usage for Facetime, as described by SteveEsteban, is for a user to place a call over the cellular network with the freedom to switch to video, then back and forth as desired. Focusing on the network protocol components, there are several interesting challenges:
- Device capabilities negotiation and call setup over WiFi;
- Video content streaming between devices;
- Authorization to accept the video stream by recipient;
- NAT traversal for users behind a WiFi NAT interface;
- Binding between GSM and WiFi traffic to mitigate spoofing attacks.
Knowing this, a lot of interesting questions come to mind. How is the management and streaming traffic protected? How is the call authorized by the end-user? What can we deduce by sniffing the WiFi side of a Facetime transaction?
In this multi-part series, we'll look at how the Facetime protocol works, answering these and other questions while looking at tools and techniques for network protocol analysis. It's my hope that you'll learn about the Facetime protocol by reading this series and be able to apply these techniques to other protocols as well.
High-Level Assessment
To assess the protocol, I've taken several packet captures from my unencrypted wireless network, calling 888-Facetime (Apple's service for customers to try out Facetime) and a colleague at the SANS Institute. Most of the analysis will be on the call to 888-Facetime, though I'll introduce other packet captures as needed.
The Facetime call with 888-Facetime was initiated by Apple's representative, which I'll refer to as an "inbound" session, since Facetime calls behave differently depending on whether the device is the initiator or the responder. The details of my iPhone 4g are as follows:
| iOS Version: | 4.0 (8A293) |
| IP Address: | 172.16.0.114 |
| MAC Address: | 5c:59:48:02:8a:65 |
Loading up the packet capture in Wireshark, I applied a display filter to include traffic only from or to my address:
ip.addr eq 172.16.0.114
Using Wireshark's Protocol Hierarchy summary (Statistics | Protocol Hierarchy), we can get a quick look at all the protocols in this 28,034 packet capture file, as shown.
Besides the low-layer protocols, we can see different activity here:
- UDP DNS traffic (to be expected);
- Session Traversal Utilities for NAT (STUN);
- Session Initiation Protocol (SIP);
- Lots of unrecognized UDP data packets;
- HTTP traffic transmitting XML data;
- HTTPS traffic;
- Unrecognized TCP traffic;
- ICMP.
Wireshark doesn't give us the option to sort this traffic view by time, but we can switch to the Conversations view (Statistics | Conversations) to view time-relative data by protocol, as shown (TCP first, then UDP):
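For readers without Wireshark at hand, the idea behind the Conversations view can be sketched in a few lines of Python: group packets by protocol and endpoint pair, then order the groups by when each was first seen. This is only an illustration, not how Wireshark implements it. The remote addresses and ports are taken from this capture, but the local port numbers, timestamps, and byte counts in `packets` are invented for the example.

```python
# Hypothetical packet records:
# (seconds since capture start, protocol, (src addr, src port), (dst addr, dst port), bytes)
# Remote addresses/ports match the capture; local ports, times, and sizes are made up.
packets = [
    (0.0, "tcp", ("172.16.0.114", 50000), ("17.149.36.103", 5223), 74),
    (0.2, "tcp", ("17.149.36.103", 5223), ("172.16.0.114", 50000), 66),
    (1.5, "udp", ("172.16.0.114", 50001), ("17.155.5.251", 59007), 120),
    (2.1, "udp", ("172.16.0.114", 50002), ("17.155.5.252", 59007), 120),
]


def conversations(records):
    """Group packets into conversations and return them ordered by start time."""
    table = {}
    for ts, proto, src, dst, size in records:
        # Sort the endpoints so both directions land in the same conversation.
        first, second = sorted([src, dst])
        key = (proto, first, second)
        entry = table.setdefault(
            key, {"start": ts, "end": ts, "packets": 0, "bytes": 0}
        )
        entry["start"] = min(entry["start"], ts)
        entry["end"] = max(entry["end"], ts)
        entry["packets"] += 1
        entry["bytes"] += size
    # Ordering by first-seen time is what makes a timeline possible.
    return sorted(table.items(), key=lambda item: item[1]["start"])


for (proto, a, b), stats in conversations(packets):
    print(f"{stats['start']:6.1f}s {proto} {a} <-> {b} {stats['packets']} pkts")
```

Sorting the two endpoints before building the key is the one design choice worth noting: it folds request and reply traffic into a single row, which is what makes the relative start times easy to compare.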
We can see a few nodes are involved here:
| Address | Name | Note |
| 17.149.36.103 | No DNS Name | Apple, Inc system in the 17/8 netblock |
| 72.215.224.43 | init.ess.apple.com.edgesuite.net | An Akamai server, a239.da1.akamai.net |
| 199.7.52.190 | crl.verisign.net | Verisign's CRL server |
| 17.155.4.14 | No DNS Name | Apple, Inc system in the 17/8 netblock |
| 17.155.5.251 | No DNS Name | Apple, Inc system in the 17/8 netblock |
| 17.155.5.252 | No DNS Name | Apple, Inc system in the 17/8 netblock |
| 68.105.28.11 | cdns1.cox.net | My ISP's DNS server |
| 17.109.28.227 | No DNS Name | Apple, Inc system in the 17/8 netblock |
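Sorting peers into "Apple" and "everything else" by netblock is easy to automate when there are more addresses than fit in a table. A minimal sketch using Python's standard `ipaddress` module follows; the `peers` list is copied from the table above, and the helper name `in_apple_netblock` is made up for this example.

```python
import ipaddress

# The 17/8 netblock referred to in the table above.
APPLE_NET = ipaddress.ip_network("17.0.0.0/8")

# Remote addresses observed in the capture (from the table above).
peers = [
    "17.149.36.103",
    "72.215.224.43",
    "199.7.52.190",
    "17.155.4.14",
    "17.155.5.251",
    "17.155.5.252",
    "68.105.28.11",
    "17.109.28.227",
]


def in_apple_netblock(addr):
    """Return True if the dotted-quad address falls inside 17.0.0.0/8."""
    return ipaddress.ip_address(addr) in APPLE_NET


apple_peers = [p for p in peers if in_apple_netblock(p)]
other_peers = [p for p in peers if not in_apple_netblock(p)]

print("Apple:", apple_peers)
print("Other:", other_peers)
```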
Using the timing and address information, we can construct a timeline of what happens in this session:
| Step | Nodes | Description |
| 1 | 172.16.0.114 -> 17.149.36.103 | The iPhone 4g initiates a TCP session to the remote host over TCP/5223. Wireshark does not have a dissector for this protocol, though it believes the port number is associated with the HP Virtual Group protocol. |
| 2 | 172.16.0.114 -> 17.155.5.251 | Several UDP connections from the iPhone 4g to Apple's server over UDP/59007. |
| 3 | 172.16.0.114 -> 17.155.5.252 | More UDP traffic over UDP/59007, this time to a host whose fourth octet is one higher. |
| 4 | 172.16.0.114 -> 72.215.224.43 | HTTP traffic carrying XML to the Akamai server, retrieving certificate information from Apple's servers. |
| 5 | 172.16.0.114 -> 17.155.4.14 | HTTPS traffic to an Apple server. |
| 6 | 172.16.0.114 -> 17.109.28.227 | UDP STUN traffic to an Apple server for NAT traversal. |
| 7 | 17.109.28.227 -> 172.16.0.114 | UDP SIP traffic from Apple revealing phone numbers, among other details. |
| 8 | 17.155.5.14 -> 172.16.0.114 | UDP traffic over port 16402; making up the majority of the packet capture data, this is likely the video stream information which continues until a SIP BYE message is observed. |
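When a capture mixes recognized and unrecognized UDP traffic like this one, a rough first pass is to classify each UDP payload by its leading bytes. The sketch below is an assumption-laden illustration, not what Wireshark's dissectors do: it recognizes STUN by the magic cookie defined in RFC 5389 (older RFC 3489 STUN has no cookie and would be missed) and SIP by its text start line. The sample payloads are synthetic, and `classify_udp_payload` is a name invented for this example.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value from RFC 5389
SIP_METHODS = {b"INVITE", b"ACK", b"BYE", b"CANCEL", b"OPTIONS", b"REGISTER"}


def classify_udp_payload(payload: bytes) -> str:
    """Guess whether a UDP payload is STUN, SIP, or something else."""
    # STUN header: 2-byte type (top two bits zero), 2-byte length,
    # 4-byte magic cookie, 12-byte transaction ID = 20 bytes total.
    if len(payload) >= 20:
        msg_type, msg_len, cookie = struct.unpack("!HHI", payload[:8])
        if (
            (msg_type & 0xC000) == 0
            and cookie == STUN_MAGIC_COOKIE
            and msg_len % 4 == 0
            and msg_len <= len(payload) - 20
        ):
            return "stun"

    # SIP is text: a response starts with the version, a request ends with it.
    first_line = payload.split(b"\r\n", 1)[0]
    if first_line.startswith(b"SIP/2.0 "):
        return "sip-response"
    parts = first_line.split(b" ")
    if len(parts) == 3 and parts[0] in SIP_METHODS and parts[2] == b"SIP/2.0":
        return "sip-request"

    return "unknown"


# Synthetic samples: a STUN Binding Request with an all-zero transaction ID,
# and a SIP BYE with a placeholder URI.
stun_request = struct.pack("!HHI", 0x0001, 0, STUN_MAGIC_COOKIE) + bytes(12)
sip_bye = b"BYE sip:user@example.invalid SIP/2.0\r\nCall-ID: hypothetical\r\n\r\n"

print(classify_udp_payload(stun_request))
print(classify_udp_payload(sip_bye))
```

Anything that comes back as "unknown" is a candidate for the manual analysis described in the timeline, such as the UDP/59007 and UDP/16402 traffic.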
Summary
Based on this analysis, we can determine several critical pieces of how Facetime works:
- Unknown TCP protocol starts the conversation, likely initiated following an event that starts on the GSM network;
- Unknown UDP traffic between two hosts with similar IP addresses;
- Certificate validation through an Akamai server, followed by an HTTPS request to an Apple server;
- STUN traffic for NAT traversal;
- SIP traffic for call setup and negotiation;
- UDP stream data for video/audio.
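Nothing above confirms what the UDP stream actually carries. RTP is the usual companion to SIP, so one quick test is to check whether the payloads parse as an RTP fixed header (RFC 3550). The following is a minimal sketch under that assumption; a payload that passes only looks like RTP, and the sample bytes are synthetic rather than taken from the capture.

```python
import struct
from typing import NamedTuple, Optional


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(payload: bytes) -> Optional[RtpHeader]:
    """Parse the 12-byte RTP fixed header, or return None if it cannot be RTP."""
    if len(payload) < 12:
        return None
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", payload[:12])
    version = b0 >> 6
    if version != 2:  # RFC 3550 defines version 2
        return None
    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Synthetic packet: version 2, marker set, dynamic payload type 96.
sample = struct.pack("!BBHII", 0x80, 0xE0, 1, 3000, 0x1234) + b"\x00" * 4
print(parse_rtp_header(sample))
```

A steadily incrementing `sequence` and a constant `ssrc` across consecutive packets on the same port would strengthen the case that the stream is RTP; a version check on a single packet is weak evidence on its own.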
-Josh




First of all, this was an awesome post. I spent a large part of my day explaining to our CIO why this might not be the best thing since sliced bread and that I wouldn't discuss anything business critical since we don't know yet what is being recorded/captured.
I also have a feeling that this is really just Apple iChat server in the cloud.
The UDP traffic is the RTP (Real-time Transport Protocol) stream, which is the voice stream. You will have two RTP streams over ephemeral UDP ports, one for sending and one for receiving. SIP is the setup protocol and RTP (which runs over UDP) is the voice protocol. In Wireshark (because I doubt the UDP packets are encrypted) you can assemble that together and actually hear the voice conversation. Then again, if the conversation passes through an iChat server it MIGHT be encrypted, which would be smart. If not encrypted, just remember that your conversation is going out over the public Internet that way, and anyone with a packet capture between you and the remote party can hear it. Video should also be occurring via UDP, which Wireshark currently cannot reassemble and show you. It might be a part of the RTP traffic as well, but I doubt it and I'd have to see the actual PCAP.
-Jeremy Combs, CCIE Voice #23890
Sorry, let me clarify: RTP and SIP both use UDP (SIP can use TCP as well, RTP is exclusively UDP), but UDP is the preferred method of transport for SIP considering SIP takes up more processing resources when used over TCP.
TCP/5223 might be Jabber over SSL. You should be able to verify using
openssl s_client -connect IP:5223
A good test is to verify whether the SSL check performed by the client will prevent a MITM; stunnel is rather useful for this.
openssl s_client -connect 17.149.36.103:5223 -showcerts -verify 5
verify depth is 5
CONNECTED(00000003)
depth=2 /O=Entrust.net/OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.)/OU=(c) 1999 Entrust.net Limited/CN=Entrust.net Certification Authority (2048)
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=2 /O=Entrust.net/OU=www.entrust.net/CPS_2048 incorp. by ref. (limits liab.)/OU=(c) 1999 Entrust.net Limited/CN=Entrust.net Certification Authority (2048)
verify return:1
depth=1 /C=US/O=Entrust, Inc./OU=www.entrust.net/rpa is incorporated by reference/OU=(c) 2009 Entrust, Inc./CN=Entrust Certification Authority - L1C
verify return:1
depth=0 /C=US/ST=California/L=Cupertino/O=Apple Inc/OU=Internet Services/CN=courier.push.apple.com
verify return:1
If you are not seeing traffic to registration.ess.apple.com, then you need to look at what the iPhone is doing right after setup (you will have to restore to get there).
You also need to take a look at your phone bill. It sends an SMS to Apple after setup as an out-of-band verification of the phone number. AT&T might be filtering that SMS; talk to someone who has a SIM-lock-free iPhone on another carrier.
5223 = XMPP over SSL (This is for current Jabber authentication methods)
"Well known" TCP and UDP ports used by Apple software products
http://support.apple.com/kb/TS1629?viewlocale=en_US
16384-16403 UDP = Real-Time Transport Protocol (RTP), Real-Time Control Protocol (RTCP)
Are you sure about the 8th step?
17.155.5.14 -> 172.16.0.114 UDP traffic over port 16402; making up the majority of the packet capture data, this is likely the video stream information which continues until a SIP BYE message is observed.
I did a capture and the stream data was sent from a third party server (NTT)