The WebRTC API is a collection of three complementary APIs: getUserMedia(), RTCPeerConnection, and RTCDataChannel. The three work together to establish in-browser, plugin-free media streams. While the two RTC APIs handle the transmission of data between two peers (i.e., browsers), getUserMedia() handles the capture of local audio and video inputs, synchronizing them into a single MediaStream object that can be passed between browsers with a few lines of JavaScript.
Before HTML5 and the WebRTC API, developers needed proprietary plugins such as Flash to transmit audio and video data on the web. Flash often delivered a poor-quality experience and required costly server licenses. Other plugins were a headache for end users to download, and an even larger hassle for organizations to maintain across different browsers and operating systems. An in-browser mechanism that could temporarily capture and transmit audiovisual data had long seemed a desirable alternative, but no such option existed before the WebRTC API.
In 2011, Google released the specs for the WebRTC API after acquiring several real-time communications (RTC) companies. The challenge with any form of RTC is not necessarily capturing individual audio and video streams. Rather, the real hurdle is synchronizing that data into accurate, transmittable media. The WebRTC API uses getUserMedia() to combine the audio from a computer's microphone and the video from its camera into a synchronized stream that can be passed from browser to browser as a JavaScript object.
getUserMedia() takes three parameters: a constraints object, a success callback that receives the resulting MediaStream, and an error callback that fires if access fails.
The constraints object specifies details about the types of media to be accessed. It can name which inputs to use, such as { video: true, audio: true } for both the camera and microphone, or place additional requirements on the stream, such as a minimum resolution or an SD vs. HD preference. Calling getUserMedia() will usually prompt the end user with a browser dialog asking them to grant access to the camera and microphone. If the end user does not explicitly grant permission, the error callback fires. On secure pages (https://), the browser remembers the permission granted on the first call and will not ask again on subsequent calls from the WebRTC API.
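A minimal sketch of that call signature, assuming the legacy callback form the article describes (the constraint values and the promise-based variant shown in the trailing comment are illustrative):

```javascript
// A simple constraints object: open both the camera and the microphone.
var simpleConstraints = { video: true, audio: true };

// A stricter request (values illustrative): ask for HD-class video.
var hdConstraints = {
  audio: true,
  video: { width: { min: 1280 }, height: { min: 720 } }
};

function successCallback(stream) {
  // Permission was granted; `stream` is the synchronized MediaStream.
  console.log('Got a stream with ' + stream.getTracks().length + ' track(s)');
}

function errorCallback(error) {
  // Fires when the user denies access or no matching device exists.
  console.error('getUserMedia error: ' + error.name);
}

// Legacy callback form, as described in this article:
navigator.getUserMedia(simpleConstraints, successCallback, errorCallback);

// The same call in the modern promise-based form:
// navigator.mediaDevices.getUserMedia(hdConstraints)
//   .then(successCallback)
//   .catch(errorCallback);
```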
Every MediaStream has an input, the media captured by navigator.getUserMedia(), and an output, which can be directed to a video element or to an RTCPeerConnection, the API that carries the stream between two peers (i.e., browsers). The tracks from the computer's camera and microphone are exposed through the stream's getAudioTracks() and getVideoTracks() methods, each of which returns an array of MediaStreamTracks. For a simple voice application that does not involve video, the array holds a single audio MediaStreamTrack. For a chat application involving multiple cameras, a microphone, and a screen share, upwards of four MediaStreamTracks will be returned.
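As a sketch of that flow, here is how the captured stream might be routed to a video element and its tracks inspected (the element ID is hypothetical):

```javascript
function successCallback(stream) {
  // Output: attach the stream to a <video> element on the page.
  var video = document.getElementById('localVideo'); // hypothetical element ID
  video.srcObject = stream;
  video.play();

  // Each kind of media comes back as an array of MediaStreamTracks.
  var audioTracks = stream.getAudioTracks(); // e.g. one track for the microphone
  var videoTracks = stream.getVideoTracks(); // one per camera or screen capture
  console.log(audioTracks.length + ' audio / ' + videoTracks.length + ' video track(s)');

  // The same stream can later be handed to an RTCPeerConnection to reach
  // a remote peer (connection and signaling setup omitted here):
  // stream.getTracks().forEach(function (track) {
  //   peerConnection.addTrack(track, stream);
  // });
}

navigator.getUserMedia({ video: true, audio: true }, successCallback, function (error) {
  console.error('getUserMedia error: ' + error.name);
});
```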
The getUserMedia() API is currently built into 63% of the world’s browsers. This is welcome news to developers who are searching for simple, effective methods of sharing audiovisual data across the internet. The WebRTC API is a free, open source project that does not require any licensing fees. The days of cobbling together costly codecs and unwelcome plugins are over for developers who decide to take advantage of WebRTC’s seamless ability to capture and transmit any form of media. When paired with superior signaling architectures such as OnSIP's platform, the WebRTC API can connect users without anything more than a Chrome or Firefox browser.