In VoIP, audio samples are placed into data packets for transmission over the IP network. Typically, a single packet contains anywhere from 10 to 30 milliseconds of audio. TCP and UDP are the two most commonly used transport protocols for carrying data across the Internet.
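To get a feel for what 10 to 30 milliseconds of audio means in bytes, here is a rough sketch of the arithmetic. The 8 kHz sample rate and one byte per sample are assumptions typical of narrowband telephony, not figures from this article, and the function names are made up for illustration.

```python
def payload_bytes(packet_ms: int, sample_rate_hz: int = 8000,
                  bytes_per_sample: int = 1) -> int:
    """Bytes of raw audio carried by one packet of the given duration.

    The defaults (8 kHz, 1 byte per sample) are assumed values.
    """
    samples = sample_rate_hz * packet_ms // 1000
    return samples * bytes_per_sample


def packets_per_second(packet_ms: int) -> float:
    """How many packets one direction of a call produces each second."""
    return 1000 / packet_ms


for ms in (10, 20, 30):
    print(f"{ms} ms -> {payload_bytes(ms)} bytes, "
          f"{packets_per_second(ms):.1f} packets/s")
```

Under these assumptions a 20 ms packet carries 160 bytes of audio, and each direction of a call sends 50 packets per second.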
Data travels across the Internet in packets. Think of them like letters: each packet has an envelope with a to/from address on it. TCP and UDP are just two types of envelopes. They both carry data and both use IP addresses, but the outside envelope is different. Think USPS versus FedEx. The addresses on the envelope are the IP address the packet came from (source address) and the IP address it's going to (destination address). TCP is so prevalent on the Internet that it's typically combined with IP and written as TCP/IP.
TCP functions as the “FedEx” part of the analogy above. Whenever two servers “speak” TCP, they set up a formal connection. Every time a packet is sent from one side, the other side sends a packet back acknowledging its arrival. If no acknowledgment arrives within a certain amount of time, or if the acknowledgment indicates a problem, the packet is re-sent. When retransmissions are needed, it can take a few seconds for a packet to be delivered successfully. TCP is optimized for accurate delivery, not timeliness, and is the protocol behind websites and email, among others.
Because TCP is connection oriented, it also guarantees that data packets will be delivered in the same order in which they were sent. In simplified form (real TCP keeps several packets in flight at once), the process goes something like this:
Endpoint A sends packet 1 to endpoint B.
Endpoint B receives packet 1 without error and sends an acknowledgement packet back to endpoint A.
Endpoint A receives the acknowledgement packet and proceeds to send packet 2 to endpoint B.
If no acknowledgement packet is received after a certain amount of time, the original packet is retransmitted. This ensures that all data arrives intact and in the correct order.
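The send, acknowledge, and retransmit loop above can be sketched as a small simulation. This is a toy model of the simplified process described here, not how a real TCP stack is implemented; the function name, the loss rate, and the retry limit are all assumptions for illustration.

```python
import random


def send_stop_and_wait(packets, loss_rate=0.3, max_tries=50, seed=0):
    """Simulate sending packets one at a time, waiting for each ACK.

    Returns (payloads delivered in order, total transmissions).
    loss_rate applies independently to data packets and to ACKs.
    """
    rng = random.Random(seed)      # seeded so the run is repeatable
    delivered = []                 # (sequence number, payload) at endpoint B
    transmissions = 0
    for seq, payload in enumerate(packets, start=1):
        for _ in range(max_tries):
            transmissions += 1
            data_arrived = rng.random() >= loss_rate
            ack_arrived = data_arrived and rng.random() >= loss_rate
            # Endpoint B keeps the packet unless it is a duplicate caused
            # by a lost ACK and the resulting retransmission.
            if data_arrived and (not delivered or delivered[-1][0] != seq):
                delivered.append((seq, payload))
            if ack_arrived:
                break              # endpoint A moves on to the next packet
            # Otherwise the timer expires and endpoint A retransmits.
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return [p for _, p in delivered], transmissions


received, sent = send_stop_and_wait(["hello", "from", "endpoint", "A"])
print(received, "delivered using", sent, "transmissions")
```

Every payload arrives exactly once and in order, but the number of transmissions grows with the loss rate, and each retransmission costs time.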
Unlike TCP, UDP is connectionless, which means that data packets can be sent without warning, preparation, or negotiation. UDP also has no acknowledgments, no retransmissions, and no ordering guarantees. Not only can packets be delivered in the incorrect order, but they can also be dropped entirely. UDP is meant for applications where you are more concerned with keeping the stream of information going than with making sure you receive every single packet. This makes UDP ideal for real-time services such as VoIP.
UDP is a protocol optimized for getting data packets to their destination in a timely fashion; it’s meant for real-time services like VoIP where it’s important to keep the data stream going.
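As a small illustration of "connectionless," the sketch below sends one UDP datagram over the loopback interface using Python's standard `socket` module. The function name is made up for this example, and it assumes loopback networking is available; note that the sender never sets up a connection before calling `sendto`.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # UDP may drop packets, so don't wait forever
        # No handshake: the datagram is simply addressed and sent.
        sender.sendto(payload, receiver.getsockname())
        data, _source = receiver.recvfrom(2048)
        return data


print(udp_roundtrip(b"20 ms of audio"))
```

If the datagram were lost, nothing would resend it; the receiver would simply time out. That missing safety net is exactly what keeps UDP fast.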
Why is UDP ideal for real-time services and not TCP? Believe it or not, it’s actually TCP’s “reliable” nature that hurts the end user experience; delays happen every time an error like packet loss occurs. These delays, which are caused by retransmitting lost or damaged packets and any following packets that may have already been sent, translate into an unacceptable level of lag and jitter for the end user.
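A rough way to see the effect is to compare when each packet reaches the application under in-order delivery versus plain datagram delivery. The timings below (20 ms between packets, 50 ms network delay, 200 ms before a retransmission arrives) are assumed values for illustration, and the model is simplified: later packets are held back until the retransmitted one arrives rather than being re-sent themselves.

```python
def in_order_delivery(n, lost, interval_ms=20, delay_ms=50, rto_ms=200):
    """TCP-like: a packet is handed over only after all earlier ones."""
    times = []
    ready = 0
    for i in range(n):
        sent = i * interval_ms
        # A lost packet arrives only after the retransmission delay.
        arrival = sent + delay_ms + (rto_ms if i in lost else 0)
        ready = max(ready, arrival)   # later packets wait behind earlier ones
        times.append(ready)
    return times


def datagram_delivery(n, lost, interval_ms=20, delay_ms=50):
    """UDP-like: packets are handed over on arrival; lost ones never arrive."""
    return [None if i in lost else i * interval_ms + delay_ms
            for i in range(n)]


print(in_order_delivery(5, lost={1}))   # [50, 270, 270, 270, 270]
print(datagram_delivery(5, lost={1}))   # [50, None, 90, 110, 130]
```

With in-order delivery, one lost packet holds up everything behind it, and four packets' worth of audio shows up in a burst. With datagrams, one packet is simply missing and the rest arrive on schedule.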
Luckily, real-time communications services such as VoIP do not require a completely reliable transport layer protocol, which allows UDP to shine. Errors like packet loss usually only have minor impacts on the audio output. It is much better to drop a packet and have a few milliseconds of silence than to have seconds of lag.
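One simple way a receiver can cope with a dropped packet is to play silence in its place. The sketch below assumes fixed-size frames of raw linear PCM audio, where zero bytes are silence; the function name and the frame layout are hypothetical, and real phones typically use more sophisticated concealment.

```python
def conceal_with_silence(frames: dict, count: int, frame_bytes: int) -> bytes:
    """Build the playout stream, filling gaps left by lost packets.

    frames maps a sequence number to the audio bytes that arrived;
    any missing sequence number is replaced by a frame of zero bytes.
    """
    silence = bytes(frame_bytes)   # frame_bytes zero bytes
    return b"".join(frames.get(seq, silence) for seq in range(count))


# Packet 1 of 3 was dropped; its slot is filled with silence.
arrived = {0: b"\x01\x01", 2: b"\x03\x03"}
print(conceal_with_silence(arrived, count=3, frame_bytes=2))
```

The call keeps moving in real time: the listener hears a brief gap instead of waiting for a retransmission.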
UDP and TCP come into play with VoIP because they determine how traffic travels across the Internet. Packets are sent from a source to your phone or computer, and if any of these packets are dropped, the quality of your call suffers. Voices will crackle, static will emanate, and frustration will build.
At this moment, thousands of devices are attempting to connect to Junction Networks. These devices include everything from individual SIP phones to other SIP devices to other PBXs. Most of the connection attempts are simple SIP registrations. A SIP registration is when a SIP device tells the server, in this case Junction Networks, that it's available for calls and what its IP address is. This communication happens anywhere from every minute to every hour for every device. That's a lot of packets.
If these were TCP packets, each time a phone wanted to tell us that it's available, it would have to go through the whole TCP connection setup. That would be a huge amount of overhead for a VoIP carrier. In a LAN environment, it would be manageable, but with thousands of individual devices and hundreds of them attempting to register every second, setting up a TCP connection for each registration would grind servers to a halt.
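Some back-of-the-envelope arithmetic shows the scale. All numbers below are assumptions for illustration, not Junction Networks figures: a UDP registration is counted as 2 packets (request plus response), and a registration over a fresh TCP connection as at least 9 (3 for the handshake, the same 2 messages, and 4 for a typical teardown), ignoring standalone acknowledgments and authentication challenges.

```python
UDP_PACKETS_PER_REGISTRATION = 2   # assumed: request + response
TCP_PACKETS_PER_REGISTRATION = 9   # assumed lower bound: 3 + 2 + 4


def registrations_per_second(devices: int, interval_s: float) -> float:
    """Average registration rate if every device re-registers each interval."""
    return devices / interval_s


def packets_per_second(devices: int, interval_s: float,
                       packets_per_registration: int) -> float:
    """Average packet rate the registration traffic produces."""
    return registrations_per_second(devices, interval_s) * packets_per_registration


devices, interval_s = 6000, 60     # hypothetical fleet, re-registering every minute
print(registrations_per_second(devices, interval_s), "registrations/s")
print(packets_per_second(devices, interval_s, UDP_PACKETS_PER_REGISTRATION), "packets/s over UDP")
print(packets_per_second(devices, interval_s, TCP_PACKETS_PER_REGISTRATION), "packets/s over TCP")
```

Packet counts are only part of the cost. Each TCP connection also requires the server to track state for its lifetime, which this sketch does not model.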
VoIP traffic is best left as UDP traffic for both server load and call-quality reasons.
Once the phones are registered and a call is set up, it's time for UDP to take center stage. A phone conversation is a stream of packets meant to be created, sent, and received in real time. With TCP, a lag (any lag) would mean a degradation in the quality of the phone call. Imagine hearing something on the call one to two seconds after the person on the other end says it. You're replying to what they're saying, but they've already moved on. It would be totally disconcerting. And, since it's real time, there's no catching up. Better to drop a packet and have a few milliseconds of silence than seconds of lag.