
An Introduction To Computer Telephony
Carl R. Strathmeyer
Dialogic Corporation
Appeared in IEEE Communications Magazine May 1996
ABSTRACT
One significant hurdle blocking the effective utilization of computer-telephone technology is the historical lack of communication between practitioners of the information processing and telephony disciplines. These two disciplines have grown up isolated from one another, with very different technical viewpoints and vocabularies. There are few practitioners who are competent in both disciplines. The inevitable result is a lack of effective communication, making it difficult to identify useful applications and to organize effective projects spanning the two disciplines. This article provides an introduction to basic computer-telephone concepts, with the goal of paving the way for better inter-disciplinary communication and a more widespread commercial utilization of computer-telephone technology.
What Is Computer Telephony?
In simplest terms, computer telephony is the technique of coordinating the actions of telephone and computer systems. This technology has existed in commercial form since the mid-1980s, but it has been exploited only in a few niche markets -- particularly in large call centers, where call volumes easily justified the cost of complex custom-built systems. But in the 1990s, several factors have combined to significantly simplify computer-telephone systems and increase the marketplace's interest in computer telephony. International standards for interconnecting telephone and computer systems have been defined, notably the Computer Supported Telecommunications Applications (CSTA) call modeling and protocol standards from ECMA. Mass-market application programming interface (API) specifications have been heavily promoted by major market players such as Microsoft and Novell, and are gaining rapid acceptance. Voice processing technologies have advanced steadily, providing advanced features and high port densities at attractive prices. Public networks are offering more and more services which enable computer-telephone applications, such as Calling Line ID. And most important, the world economy is doing business over the telephone at an increasing rate, prompting business organizations to look for ways to make this process more efficient and economical.
The Convergence of Computers and Telephony
Public and private telephone systems provide real-time information paths between two or more parties. Traditionally, these information paths have taken the form of voice connections, originally through hardwired analog circuitry but later through an increasingly broad range of technologies such as radio transmission, digital signal encoding, and fiber. Over time, these transmission paths were also exploited for non-voice applications such as facsimile and data transmission.
At first, each non-voice application required a distinct set of dedicated "terminal equipment", the telephony term for any user device connected to the telephone network. Facsimile machines conversed only with other facsimile machines, computer devices sent data files only to other computer devices, and so forth. But in the 1990s, these disparate sets of equipment have begun to overlap, and the general-purpose computer has emerged as the point of intersection.
Computers can now send and receive every kind of information that passes through the telephone network: They can act as facsimile machines; they can interact with human speakers through voice synthesis and recognition; and of course they can send and receive data in many formats. It is this intersection, with the general-purpose computer serving as the interface point, which makes computer telephony so intriguing and potentially valuable to the marketplace.
Call Control and Media Processing
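As they play this crucial interface role, computer systems must interact with the telephone network in two fundamental ways: call control, which sets up, monitors, and tears down calls, and media processing, which sends and receives the information carried on a call's talk path.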
A computer telephone application usually requires some combination of both functions.
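These call control and media processing functions have counterparts in ordinary human telephone usage: lifting the handset, dialing, and hanging up are call control, while speaking and listening are media processing.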
The first computer telephone applications concentrated on media processing, with only limited call control functions. For example, the first voicemail systems answered incoming calls, presented a greeting, and then recorded the caller's message. Such a system consists primarily of media processing functions, with call control functions limited to detecting a ring, answering the call, and hanging up after the message has been taken.
By comparison, newer voicemail and automated attendant applications have added functions such as call transferring, outdialling and paging. Applications like these require more comprehensive call control. As the cost of signal processing technologies has come down, these applications have also added advanced media processing functions such as voice synthesis, voice recognition, and fax interfaces.
Call center applications require even more sophisticated call control functions. These applications implement features such as greeting the caller with an extensive range of voice response options and then transferring the caller to wait in a queue, ultimately coordinating the simultaneous arrival of call and associated caller data at a service representative's desk. Call center applications typically utilize the most advanced call control and media processing functions, including special call control functions to monitor calls as they pass through holding queues on their way to their ultimate destinations, and comprehensive media processing functions which allow some callers to complete their business without ever speaking to a human service representative.
Modular Media Processing Hardware
Media processing hardware is relatively simple so long as each telephone line has a dedicated set of hardware resources. For example, a typical voice processing board might support four analog telephone lines, with speech digitization and playback circuitry hard-wired on each channel.
Media processing hardware gets considerably more complex, however, when applications need to be able to reconfigure resources on-the-fly. Larger systems also need to be expandable in modular increments to accommodate application growth.
For example, a medium-scale application may require a pool of two T1 circuit interfaces (providing a total of 48 voice channels), 48 voice digitizers and playback units, eight speech recognizers, eight facsimile processing channels, and twenty-four analog interfaces for headsets. These resources must be reconfigurable on-the-fly, meaning that an incoming call on a given T1 channel must be assignable to the digitizers, playback units, recognizers, facsimile processors and analog interfaces in any combination.
Such a configuration cannot fit onto a single circuit board (and would not be easily expandable even if it could), so several architectures have been proposed by which such systems can be assembled. The two leading proposals, MVIP and SCbus, specify time-division buses for talk path interconnection and a separate communication mechanism for coordinating the subsystems. The MVIP effort is administered by the GO-MVIP organization; the SCbus was developed by the SCSA working group, recently subsumed within the Enterprise Computer Telephony Forum (ECTF). Both of these groups have also proposed programming interfaces for the control of such systems; these are discussed later.
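The on-the-fly assignment described for the medium-scale example can be sketched in a few lines of Python. This is only an illustration of pooled allocation, not a model of MVIP or SCbus: the resource counts come from the example above, while the class name, method names, and resource labels are hypothetical.

```python
# Resource counts taken from the medium-scale example in the text.
POOL_SIZES = {
    "t1_channel": 48,      # two T1 circuits x 24 channels each
    "digitizer": 48,
    "playback": 48,
    "recognizer": 8,
    "fax": 8,
    "analog_headset": 24,
}


class ResourcePool:
    """Hypothetical pool that hands out interchangeable resource units."""

    def __init__(self, sizes):
        # Each resource kind starts with a set of free unit numbers.
        self._free = {kind: set(range(count)) for kind, count in sizes.items()}

    def available(self, kind):
        """Number of units of this kind that are currently free."""
        return len(self._free[kind])

    def allocate(self, kinds):
        """Reserve one unit of each distinct kind, or nothing at all."""
        wanted = set(kinds)
        short = sorted(kind for kind in wanted if not self._free[kind])
        if short:
            # Check before taking anything so a failure leaves the pool intact.
            raise RuntimeError("no free units of: " + ", ".join(short))
        return {kind: self._free[kind].pop() for kind in wanted}

    def release(self, assignment):
        """Return every unit of a finished call to the pool."""
        for kind, unit in assignment.items():
            self._free[kind].add(unit)


pool = ResourcePool(POOL_SIZES)
call = pool.allocate(["t1_channel", "playback", "recognizer"])
pool.release(call)
```

Any call can draw on any free recognizer or facsimile channel, which is the property that hard-wired per-line hardware cannot offer. A real system would also have to connect the chosen units over the time-division bus, a step this sketch leaves out.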
Signaling: The Call Control Connection
The telephone network is a widely-distributed system of intelligent switching nodes. For these nodes to cooperate successfully for the establishment and tearing down of calls, they must communicate with each other and with the users' terminal equipment. This process is called "signaling". An accurate and reliable signaling connection between telephone and computer systems is essential to successful computer-telephone applications, since signaling is the means of call control and constitutes the only communication between the intelligent systems in the two domains.
Signaling can take place "in-band", that is, through the telephony talk path channel, or "out-of-band", that is, through some communication channel other than the talk path. In today's telephone network, terminal equipment signaling is generally in-band (except for ISDN devices), while signaling between telephone switches is often done out-of-band for security and performance reasons.
The original terminal-equipment signaling was, of course, the human voice as a subscriber spoke to the operator. The first automatic terminal equipment signaled with timed make-break pulses across an analog telephone line and special switch-generated tones to alert the subscriber to call states such as ringing and busy/engaged. In many telephone systems, tone signaling is now used for in-band terminal equipment signaling in both directions. The best-known scheme for terminal-equipment-to-network signaling is Dual Tone Multi-Frequency (DTMF), under which the terminal equipment generates simultaneous pairs of tones to represent each dialed digit.
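The DTMF tone pairs can be written down as a small table. The Python sketch below uses the standard DTMF keypad frequencies (one low-group and one high-group tone per key). The function names, the 8000-samples-per-second rate, and the 0.1-second duration are illustrative assumptions rather than part of any particular product.

```python
import math

# Standard DTMF frequencies in hertz: one row tone plus one column tone per key.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError("expected a single keypad character")
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index != -1:
            return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, rate_hz=8000):
    """Generate the two simultaneous tones for one key as float samples."""
    low, high = dtmf_pair(key)
    count = round(duration_s * rate_hz)
    return [
        0.5 * (math.sin(2 * math.pi * low * n / rate_hz)
               + math.sin(2 * math.pi * high * n / rate_hz))
        for n in range(count)
    ]


low_tone, high_tone = dtmf_pair("5")   # the pair used for the digit 5
tone_burst = dtmf_samples("5")         # samples ready to be played on the line
```

Generating the tones is the easy direction. Recognizing them reliably in a noisy talk path is harder, and that is the job signal-processing hardware performs on a voice board.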
Unfortunately, the signaling from the telephone network back to the terminal equipment has not been similarly standardized, a situation all too familiar to subscribers trying to make international calls. The signaling tones returned from the far end of an international call often do not resemble local signaling tones, and the subscriber may not be able to tell the difference between another country's busy/engaged signal and a ringing signal.
Needless to say, it is a significant challenge to design computer-telephone terminal equipment which can accurately interpret the widely-varying tones and other in-band signals generated by various elements of the worldwide telephone network. Indeed, achieving accurate and reliable signaling between computer-based telephone interfaces and traditional telephone equipment is one of the greatest difficulties in building reliable computer-telephone applications.
This difficulty can be somewhat alleviated by shifting to out-of-band signaling schemes, which generally rely on unambiguous digital messaging. For example, the digital message-oriented signaling of an ISDN basic rate terminal device is much more reliable than analog in-band signaling. (But note that even ISDN basic rate signaling is not yet completely standardized around the world.) A similar digital, message-oriented (but non-standard) signaling capability is provided by the signaling schemes used by the digital telephone sets offered by many PBX vendors. And computer-telephone integration (CTI) links, now offered on most modern PBXs, offer a signaling mechanism through which a computer system can receive consolidated signaling for groups of telephone extensions.
Multiple signaling methods are often available on a single telephone system. One PBX might simultaneously support a CTI link, ISDN trunk circuits, and proprietary digital set signaling. Any of these will provide more accurate signaling information for computer-telephone applications than is available through in-band analog terminal device signaling on those same switches.
First-Party and Third-Party Call Control
The relationship between a computer application and the call control it exerts over a telephone line is classified as first-party or third-party call control.
First-party call control is call control exerted over a telephone line on which the computer application is also a "talking" party -- that is, a call on which the application is also capable of exercising media processing functions.
For example, if a computer application receives an inbound call on a voice board having a normal telephone line interface, senses the ring signal, answers the call, and initiates the system's voicemail application to greet the caller, it is using first-party call control.
Third-party call control is call control exerted over telephone lines on which the computer application is not necessarily also a "talking" party.
For example, if a server-based application is monitoring several users' telephone lines (without the benefit of an actual physical connection to each of those lines), is alerted to an arriving call on one of the lines, and causes that call to be diverted to some other user's telephone, it is exerting third-party call control. Third-party call control usually also implies out-of-band signaling, since there is by definition no direct connection between the computer system running the application and the telephone line being controlled. Generally, first-party call control functions are those which could be accomplished by a human attendant via a standard telephone set attached to the telephone system in the same manner as the application equipment. Third-party call control functions are those which would require a human attendant to use a specialized telephone set with special privileges, such as an operator's console.
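The distinction can be made concrete with a small Python sketch. It simplifies the definitions above by treating "the application is a talking party on the line" as the deciding test, and every name in it is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ControlMode(Enum):
    FIRST_PARTY = "first-party"
    THIRD_PARTY = "third-party"


@dataclass(frozen=True)
class ControlledLine:
    extension: str
    # True when the application's own hardware terminates the line,
    # so the application can also hear and speak on the call.
    app_is_talking_party: bool


def control_mode(line):
    """Classify the call control an application exerts over this line."""
    if line.app_is_talking_party:
        return ControlMode.FIRST_PARTY
    return ControlMode.THIRD_PARTY


def can_process_media(line):
    """Media processing needs access to the talk path itself."""
    return line.app_is_talking_party


voicemail_port = ControlledLine("2001", app_is_talking_party=True)
monitored_desk = ControlledLine("3417", app_is_talking_party=False)
```

Under these assumptions the voicemail port is a first-party line on which greetings can be played, while the monitored desk telephone is a third-party line that the application can divert but cannot listen to.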
Sharing Computer-Telephone Resources
Computer-telephone applications vary considerably in complexity depending upon whether they allow the sharing of telephone-related resources. For example, an application that has sole control of a voice card and telephone line (such as a voice response application connected to a dedicated line) is much simpler in design and construction than an application which must share control of resources with several other applications and/or a human user. Control mechanisms for these shared applications are often one of the most difficult aspects of computer-telephone application design.
For example, a telephone line terminating at a facsimile card installed in a user's personal computer would be a non-shared resource. (Figure 1) The only applications which can use this telephone line and its associated facsimile capability are those residing on that one particular computer system. On the other hand, a telephone line terminating on a server with a pool of facsimile cards could be used by any system connected to the same local area network and authorized to use the facsimile server. (Figure 2)
Each of these configurations has advantages and disadvantages. The shared configuration requires the overhead of more sophisticated access control and management capabilities, but the pooling of resources inherent in this scheme offers more efficiency in resource allocation and thus better handling of peaks and valleys in usage patterns compared with resources dedicated to individual systems. From an economic perspective, dedicated resources are more appropriate for individuals or very small work groups; server-based resources are better for medium to large work groups and for enterprise-wide systems.
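The efficiency gain from pooling can be illustrated with the classical Erlang B formula from telephone traffic engineering, which estimates the chance that a request finds every unit busy. The Python sketch below assumes that facsimile requests behave like Erlang-B traffic (random arrivals, blocked requests are lost); the traffic figures are invented for illustration.

```python
def erlang_b(offered_erlangs, units):
    """Blocking probability for `units` servers offered the given load.

    Uses the standard recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    blocking = 1.0
    for n in range(1, units + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


# Four users, each offering 0.5 erlang to a private facsimile line...
dedicated_blocking = erlang_b(0.5, 1)

# ...versus the same total load offered to a shared pool of four lines.
pooled_blocking = erlang_b(2.0, 4)
```

With these assumed loads, a user with a dedicated line finds it busy about one time in three, while the shared pool of the same four lines turns away fewer than one request in ten.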
Resource-sharing modes are often confused with first-party and third-party call control modes. Shared resources, accessed through a server, are usually configured for third-party call control, while dedicated resources are usually restricted to first-party call control functions. But this is not always the case. A dedicated ISDN line, terminating at a single computer system, can accomplish third-party call control functions through the capabilities of the ISDN D-channel signaling protocol without ever establishing an actual talk-path through an ISDN B-channel. Conversely, a call control server connected to a PBX via a CTI link may offer only first-party call control functions to client applications, even though the application call control requests pass through a shared server.
Choices For Out-of-Band Signaling
The most challenging aspect of computer-telephone applications is signaling, that is, achieving accurate and reliable call control. The most important recent commercial advances in computer telephony have been in this area, with improvements both in the underlying signaling connections and in the programming interfaces (APIs) which enable application software to exercise that signaling capability.
As mentioned earlier, the most reliable way to implement signaling between a telephone system and a computer telephone application is to use out-of-band signaling, which creates a direct message-based digital information link between the intelligent telephone switch and the computer-based application. This approach is much more accurate than in-band signaling, under which the application must attempt to generate and recognize widely-varying and ambiguous analog signals in the call's talk path.
Out-of-band signaling is available in several forms, including ISDN D-channel signaling, Signaling System 7 (SS7) connections, and CTI links.
The practitioner will frequently need to choose between these mechanisms when designing a computer-telephone system.
Many interesting computer telephone applications can be built using only the out-of-band signaling capabilities of the ISDN basic and primary rate specifications. (Figure 3) For example, an application system connected to the telephone network through an ISDN facility can provide a network-based automatic call distributor (ACD) distributing calls to remote public network subscribers, or a call routing application for private PBX networks. These applications, however, are often limited by the telephone domain where the ISDN signaling is valid and consistent. For example, the ACD application may not operate correctly when calls cross between public telephone network boundaries, and the call routing application depends on inter-PBX feature transparency and may not work in a heterogeneous network of different manufacturers' PBXs. These limitations will gradually disappear as ISDN telephone service becomes consistent worldwide.
In contrast to ISDN D-channel signaling, the SS7 and CTI link techniques can provide a more complete view of calls passing through the corresponding telephone domains. The domain for SS7 signaling can be as large as an entire public telephone network; the domain for a CTI link is a single telephone switch or a small number of tightly-integrated switches.
SS7 is a complex protocol, and is closely tied to the internal operation of a telephone network. Because of this, terminal equipment is not usually granted the privilege of an SS7 connection. A few long-distance telephone carriers do offer such a connection via appropriate security firewalls.
A typical such service announces each call to the customer's computer application via the SS7 protocol and then allows the application to choose among a set of pre-determined call routing options by replying with another SS7 message. (Figure 4) An arrangement based on SS7 requires sophisticated customer premises equipment, and is usually only appropriate for call centers handling large call volumes.
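The announce-and-reply pattern of such a service might look like the following Python sketch. It does not encode real SS7 messages; the message fields, the routing option names, and the telephone numbers are all hypothetical and stand in for whatever the carrier's service actually defines.

```python
from dataclasses import dataclass

# Hypothetical set of routing options agreed with the carrier in advance.
ROUTING_OPTIONS = frozenset({"sales_queue", "service_queue", "overflow_center"})


@dataclass(frozen=True)
class CallAnnouncement:
    """Simplified stand-in for the message announcing a new call."""
    call_id: str
    calling_number: str
    dialed_number: str


def choose_route(call, routes_by_dialed_number, default="overflow_center"):
    """Pick one pre-determined routing option and build the reply."""
    route = routes_by_dialed_number.get(call.dialed_number, default)
    if route not in ROUTING_OPTIONS:
        raise ValueError(f"not a pre-determined routing option: {route}")
    return {"call_id": call.call_id, "route": route}


routes = {"8005550100": "sales_queue", "8005550199": "service_queue"}
reply = choose_route(
    CallAnnouncement("c-1", "4125550123", "8005550100"), routes
)
```

The application never touches the call itself. It only selects from choices the network already knows how to carry out, which is why this arrangement offers routing but not call initiation or hangup.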
CTI links serve a similar purpose, but on a smaller scale more suitable for the relatively simpler environment of a customer premises PBX or a single public telephone exchange switch. (Figure 5) CTI links also offer a broader range of call control functions than commercial customer-premises SS7 services, including call initiation and hangup as well as call routing. CTI links can operate using either a proprietary protocol (such as Northern Telecom's Meridian Link Protocol and AT&T's ASAI protocol) or a standard protocol (such as the ECMA CSTA protocol mentioned earlier).
The CSTA protocol has now been implemented by a growing number of switch vendors including major manufacturers such as Siemens ROLM, Ericsson, and Alcatel. Note that commercial CTI link implementations vary in the set of features supported, and although they are standards-based, even CSTA implementations are not necessarily equivalent or interoperable.
Because they provide access to shared resources, both the SS7-based connections and CTI links typically terminate in a server rather than a specific application computer. This allows multiple applications to influence calls flowing through a common telephone domain, and provides greater flexibility regarding the computer systems on which these applications can be installed.
Application Programming Interfaces
An application programming interface (API) is the mechanism through which application software manipulates telephone resources. APIs are necessary for both the call control and media processing functions.
Several existing non-telephony APIs have found a useful role in computer telephony, particularly for controlling media processing functions.
For example, once a telephone call is established, the Microsoft Windows APIs used for the manipulation of desktop multimedia objects (for example, the playing of sound files through a local speaker) can be used to send and receive similar multimedia content over the telephone connection. Because of their heritage, however, the resource models used by these existing APIs turn out to be more suitable for local (non-shared) resources than for remote or shared resources. New APIs and resource models are needed to implement shared media processing resources on shared servers.
Several cross-vendor efforts have sprung up to address this need, including the Multi-Vendor Integration Protocol (MVIP) and the Enterprise Computer Telephony Forum (ECTF), each of which has activities relating to software architectures and APIs for shared media processing resources.
Proprietary APIs for first-party call control were first developed by modem, voice board, and fax board manufacturers to support their own products. The only API in this group to achieve de facto standards status was the Hayes modem command set, which included basic functions for dialing and hanging up telephone calls.
APIs for third-party call control did not have equivalents in traditional application environments and had to be developed specifically to support computer telephony. The first third-party APIs were developed by computer manufacturers to support applications running on their own systems. For example, IBM introduced the CallPath API and Digital Equipment introduced the Computer-Integrated Telephony (CIT) API in the late 1980s for use on their respective systems.
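The industry took a major step forward in the 1990s with the introduction of two call control APIs which were not linked to any individual computer manufacturer: the Telephony API (TAPI), promoted by Microsoft, and the Telephony Services API (TSAPI), promoted by Novell.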
These APIs, both strongly oriented towards the desktop personal computer and its flourishing software industry, have made mass-market computer telephone applications economically feasible for the first time.
APIs vs. Commercial Products
A programming interface is simply a specification; it is not a commercial product in its own right. As straightforward as this may sound, the two concepts are often confused in the marketplace.
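An API is the meeting point for two commercial products: the application, which requests telephone functions through the interface, and the service provider, which carries those requests out.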
Like the application, the service provider is software, typically taking the form of a device driver which implements an interface to a particular type of telephone equipment. The rapid commercial advance of computer telephony in the 1990s can be attributed to the development and marketing of commercial service-provider software products which implement the TAPI and/or TSAPI APIs.
Novell offers a commercial CTI server product, Netware Telephony Services, which operates within the Novell Netware environment. It provides a TSAPI interface between applications on remote client machines and telephone system driver modules provided by third parties, thus creating an interface between those applications and telephone switching systems.
Microsoft has taken a similar path with TAPI, building a capability into the Microsoft Windows family of operating systems which provides a TAPI interface between Windows-based client applications and third-party service provider driver modules.
In both cases, the driver modules for specific telephone systems must be built by third parties, much as printer drivers must be supplied by printer manufacturers before their printers can be used under the Microsoft Windows operating system. Each of these products also supports a single API.
It is possible to build a CTI server which supports multiple APIs simultaneously, mapping requests from all APIs into a single common function set. This is the approach taken by the CTI server from Dialogic Corporation, CT Connect, which supports both TAPI and TSAPI interfaces. The Dialogic software also differs from the Novell and Microsoft products in that it includes built-in drivers for the ECMA CSTA link protocol and several other proprietary CTI link protocols.
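The idea of mapping several APIs onto one common function set can be sketched in Python as a lookup table. The request names are borrowed from TAPI and TSAPI and are paired here only as rough equivalents. The common operation names, the table, and the function are hypothetical and do not describe how any commercial server is actually built.

```python
from enum import Enum


class CommonOp(Enum):
    """Hypothetical common function set shared by all supported APIs."""
    MAKE_CALL = "make_call"
    ANSWER = "answer"
    HANG_UP = "hang_up"


# Rough equivalents only; argument handling is omitted entirely.
REQUEST_MAP = {
    ("TAPI", "lineMakeCall"): CommonOp.MAKE_CALL,
    ("TAPI", "lineAnswer"): CommonOp.ANSWER,
    ("TAPI", "lineDrop"): CommonOp.HANG_UP,
    ("TSAPI", "cstaMakeCall"): CommonOp.MAKE_CALL,
    ("TSAPI", "cstaAnswerCall"): CommonOp.ANSWER,
    ("TSAPI", "cstaClearConnection"): CommonOp.HANG_UP,
}


def normalize(api, request_name):
    """Translate an API-specific request into the common function set."""
    try:
        return REQUEST_MAP[(api, request_name)]
    except KeyError:
        raise ValueError(f"unsupported request: {api} {request_name}") from None


same_operation = normalize("TAPI", "lineMakeCall") is normalize("TSAPI", "cstaMakeCall")
```

Once requests are normalized in this way, a single set of switch drivers can serve applications written to either interface.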
Non-Traditional Telephony
Computer telephone applications are not restricted to the traditional forms of telephone systems based on switches, transmission circuits, and telephone instruments.
For example, the new isoEthernet technology provides telephony talk paths operating across an enhanced Ethernet local area network physical plant. Such an environment is capable of delivering standard telephone service, that is, a real-time voice path between two or more endpoints. It just accomplishes this in a new way, without the necessity of installing traditional telephone switches and wiring. And because of its inherently wider bandwidth, telecommunications facilities such as isoEthernet can handle new kinds of calls such as interactive video and images. All of these new capabilities stretch the limits of today's definition of telephony and expand the potential meaning of call control and media processing.
To accommodate these expanding definitions, new models and implementation methods for computer telephony will have to be found. For example, with isoEthernet, the switching points are highly distributed in a potentially complex topology. This distributed connection model is significantly different from traditional telephony, and will require new models and methods through which applications can exercise call control in that environment. The goal should be to provide these new capabilities in a forward-compatible manner from current computer-telephony architectural models.
Computer Telephony and Client-Server Computing
Appropriate application software architectures are essential to the effective use of computer telephone technology. A difficult hurdle in the early adoption of computer-telephone systems was the unfortunate requirement to modify business application software in order to make use of the new telephone features. Because most application software ran in a centralized mainframe or minicomputer, the central application software had to be changed to implement a computer-telephone feature. Most companies elected not to attempt such changes, and declined to implement computer-telephone application features even though the business benefit was attractive.
For example, an insurance company might have wanted a certain screen of database information to "pop up" for its service representatives as they answered customer telephone calls. Computer telephone technology has long been capable of generating the necessary telephone-based trigger event to accomplish this. But the insurance company would probably have balked at the necessity of modifying its central customer database application in order to achieve this feature. The risk and effort involved in changing centralized mission-critical application software was simply too great.
However, as corporations shift away from a total dependency on mainframe-based applications and towards client-server architectures, the integration of computer telephone features becomes easier and less risky.
Client-server applications depend on intelligence at the desktop, and rely on pulling information to the desktop rather than pushing data outwards from the mainframe to a dumb terminal. With the client-server approach, computer-telephone application features can be implemented at the desktop or in a department-level server rather than in the mainframe system, an easier and less risky approach which makes computer telephony accessible to a wider range of organizations.
For example, with the client-server approach, when a call arrives at a customer service representative's desk a corresponding telephone event message can be sent to an application running at that user's desktop. This event message, delivered through a computer telephony API, can trigger a desktop application to retrieve the desired information via whatever retrieval mechanism is appropriate -- including fetching the data from a mainframe.
This retrieval logic can be built into the existing client-server desktop application, or implemented as a new desktop application which interoperates with the existing one. In the latter case, desktop application integration tools such as Microsoft's Dynamic Data Exchange (DDE) and Object Linking and Embedding (OLE) can be used as an open, standard inter-application communication mechanism, further simplifying the integration effort with existing applications and eliminating the necessity to change them.
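The screen-pop flow described in this section can be sketched as a desktop event handler in Python. The event class, the handler factory, and the two callbacks are hypothetical: a real application would receive the event through a computer telephony API and fetch the record through whatever retrieval mechanism the organization already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallArrived:
    """Simplified stand-in for a telephone event delivered to the desktop."""
    extension: str
    calling_number: str


def make_screen_pop_handler(lookup_customer, show_record):
    """Build a handler that retrieves and displays caller data on arrival.

    lookup_customer: callable taking a calling number, returning a record or None.
    show_record: callable taking (extension, record) that updates the display.
    """
    def on_call_arrived(event):
        record = lookup_customer(event.calling_number)
        if record is not None:
            show_record(event.extension, record)
        return record

    return on_call_arrived


# Stand-ins for the customer database and the representative's screen.
customers = {"4125550123": {"name": "A. Customer", "policy": "P-1001"}}
displayed = []

handler = make_screen_pop_handler(
    customers.get,
    lambda extension, record: displayed.append((extension, record)),
)
handler(CallArrived(extension="3417", calling_number="4125550123"))
```

Because the lookup is passed in as a function, the central database application needs no changes, which is the point of moving this logic to the desktop.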
Computer Telephony: A Wealth of Options
Computer telephony today is characterized by a wealth of choices and options. Computer-telephone applications can be small or large in scope, simple or complex in operation. Any single feature can be implemented in a staggering number of ways, with implementation choices on both the telephone and computing sides of the equation. There is no right or wrong way to build a computer-telephone system.
With all of these choices, it is essential that the systems practitioner become knowledgeable about both computing and telephony, and begin to learn ways in which these two systems environments can be linked together.
Competence in both disciplines will become essential as the two technologies become even more closely integrated. The current focus on linkage between discrete telephone and computing systems is just a transition phase. Very soon, the distinction between telephone switches and LAN servers will disappear, as hybrid telephony servers are brought to market containing both switching and application-interface functions.
Computer telephony is at an important turning point: The necessary elements of the technology have been developed; now we need to educate large numbers of insightful practitioners who can put it to productive use.