Adaptive Rate Delta Modulation for Non-Synchronous, Full Duplex, Wireless Digital Communications

 

Dr. Bruce A. Harvey

FAMU-FSU College of Engineering

Department of Electrical Engineering

2525 Pottsdamer Street

Tallahassee, FL  32310-6046

bharvey@eng.fsu.edu

(850) 410 - 6451

 

 

Abstract

This paper introduces the concept of a low-cost wireless digital communication system for voice communication based on a technique called adaptive rate delta modulation.  This technique uses a tri-level threshold to produce delta-modulated pulses only when the amplitude has changed by a predetermined amount.  This technique significantly reduces the average pulse rate required for transmission and lowers the noise generated by delta modulation under zero input conditions.  The receiver for such a modulation is reduced to simple pulse detection and a counter to recreate the original signal.  The characteristics of delta modulation result in a system that does not require headers, packets or other synchronizing schemes.  Multiple users can concurrently use the same channel for "open microphone" communication allowing for simultaneous transmit and receive, and simple voice combining.

1. Introduction

A typical push-to-talk digital communication system requires separate channels for each transmitter that must be received independently by each receiver.  The independent channels received are combined after conversion back into the original analog voice signals.  The separate channels are implemented as individual time slots (time division multiple access or TDMA), distinct transmitting frequencies (frequency division multiple access or FDMA) or individual coded sequences (code division multiple access or CDMA).  The receiver in essence must have multiple receive channels to handle multiple simultaneous transmitted signals.  This requirement greatly increases the cost of the communication system, especially if the number of required channels is large (greater than 3 or 4).

                                                                    

 

Analog push-to-talk communication systems typically use frequency modulation (FM).  An FM receiver has the inherent ability to combine simultaneously received voice signals on the same channel.  It is not particularly sensitive to the differing amplitudes of the received signals and thus is somewhat immune to near-far problems.  The drawback of this form of analog communications is its vulnerability to interference.  Characteristics of an FM push-to-talk system such as an intercom or walkie-talkie are hissing and spurious sounds due to interference.  The primary goal of this effort was to develop an inexpensive digital communications system that will maintain the natural functionality of the FM system and will improve the clarity of the received voice communications.

This paper focuses on the modulation, data encoding and error-control techniques that will allow a single receiver channel to simultaneously receive and combine multiple voice channels.  This system will employ an adaptive rate delta modulation (AR-DM) technique that will eliminate the requirement for synchronization typical of most TDMA systems.  The AR-DM technique introduces a fixed but tolerable probability of error into the system.  The resulting communication system was developed with a very simple, low-cost design.

2. Development of the Non-Synchronous, Full-Duplex Digital Communication System

2.1 Justification for Non-Synchronous Digital Communications

Synchronous digital communications efficiently utilizes the transmission channel through coordinated and cooperative use of the channel.  The channel is partitioned into time slots (TDMA), frequency bands (FDMA), uncorrelated codes (CDMA) or a combination of these techniques.  Synchronous communications has the net effect of partitioning a channel into some designated number of distinct sub-channels.  Each sub-channel is a single half-duplex communications channel that must be combined with another channel in order to achieve full-duplex communications.  The communication sub-channels must each be processed independently at the receiver.  Also, each sub-channel is a point-to-point or point-to-multi-point link.  Multi-point-to-multi-point communications can only be achieved by designing the receiver to combine independent sub-channels after receiving each channel independently.

Partial support of this research was provided by the Woodrow W. Everett, Jr. SCEEE Development Fund in cooperation with Southeastern Association of Electrical Engineering Department Heads.

Synchronous digital communications is an efficient channel user, but the cost of efficient channel use is not always warranted.  Voice communications, for instance, has relatively low data rate requirements and it is sometimes not cost effective to efficiently utilize the communications channel.  Assume an application requires a “party-line” voice communications system where some number of users need to communicate simultaneously.  The users wish to essentially carry on a conversation over an extended area.  Examples include repair or construction crews operating on large objects (planes, buildings, spacecraft, etc.), emergency services systems or intercom systems for multi-point communications.

Push-to-talk radios can achieve some semblance of conversation, but they enable only one user to talk at a time and often require the user to manually press a button.  Push-to-talk systems do not allow the full-duplex communications needed to achieve a natural conversation among multiple persons.  Full duplex digital communications between multiple users can be achieved with synchronous digital methods.  However, a separate transmit sub-channel is required for each user, and each receiver must be capable of receiving all channels simultaneously and combining the channels into a single audio output.  Such a system is very complex, requires considerable analog and/or digital processing, and is thus rather expensive.  Also, voice communications over a dedicated channel is inherently inefficient since the voice output from a single person is sporadic and has many pauses.

A non-synchronous digital system using a single channel that can be implemented in a simple, cost-effective manner while providing relatively clear voice communications would be an ideal alternative to analog FM, push-to-talk or synchronous digital communications.  In the following sections a system that achieves these goals is developed and described.  Rather than present a system and then describe its attributes, this report will instead detail the approach that led to the discovery of the non-synchronous digital system.  The resulting system is a low-cost, voice communications only system that allows a moderate number of users to carry on a conversation as if they were all together in a single room.  The system will take advantage of the natural pauses and amplitude variations of voice to efficiently utilize the transmission channel and reduce the number of errors due to collisions.  Errors due to collisions will be higher than what is acceptable for most data communication systems, but still low enough to provide clear voice communications.  An initial goal of the development was to provide a wireless system, but the resulting system is also applicable to wired systems sharing a single channel.

2.2 Collision Rates and Non-Synchronous Communications

The primary benefit of synchronous or cooperative communications schemes is the efficient utilization of bandwidth.  Such schemes cooperatively share some parameter of the communications channel: TDMA schemes share time, FDMA spectrum and CDMA orthogonal signal space.  Carrier sense schemes can be characterized as pseudo-synchronous since the algorithm attempts to cooperate by sensing other transmissions and delaying transmission in an attempt to avoid collisions.  This efficient use of the channel increases the complexity and cost of the communication system.  A non-synchronous (or non-cooperative) scheme sacrifices some efficiency for simplicity and lower cost.  The loss in efficiency of non-synchronous communications is due primarily to collisions.

For low data rates, TDMA is arguably the easiest of the synchronous multiple access schemes to implement.  A single receiver operating at a fixed frequency can differentiate (de-multiplex) multiple transmitted data streams.  A typical TDMA scheme allocates one or more time slots within a time frame to each transmitter in which the transmitter can transmit a packet of data.  A packet for a mobile or simply wireless TDMA system consists of a preamble for receiver synchronization, a header for transmitter identification, and data or content.  A master source or transmitter establishes the start of each frame and often negotiates the users of the individual time slots within each frame (e.g. slotted ALOHA).

A candidate scheme for non-synchronous (and non-cooperative) digital communications is TDMA without a master transmitter or carrier-sense capability.  Each transmitter simply transmits its packets of data at a random time within each frame.  Since there is no master, the boundaries of each frame are undefined and only the period of the frame is common to each transmitter.  In a voice application, each transmitter samples the voice signal and then transmits the packet formed from the sample at a data rate much higher (>5 times) than the rate required for a single voice channel.  Therefore each packet only uses a small fraction of the frame for a single sample.  When not transmitting a packet, each user “listens” for packets transmitted by other users.  Other users form similar packets and transmit them at random times within the frame.  A collision occurs every time two or more users attempt to send a packet at the same time.  The collision can cause the loss of one or more packets (headers corrupted) and/or erroneous reception of packet data (samples).  Collision rates must be kept low in order to minimize the effect on the received voice signal.

 

2.1.1 Pulse Code Modulation and Non-Synchronous TDMA: Assume a non-synchronous TDMA scheme is implemented for voice communications only.  Pulse-code modulation (PCM) is used to encode voice at a rate of 8 ksample/sec.  To transmit one sample per frame would require a frame length $T$ of 125 μsec.  Assume a packet consists of a 4-bit preamble, 4-bit header and 4-bit PCM sample (~25 dB SNR [1, p. 272]) for a total of 12 bits.  Transmitting just one voice channel in a frame requires a base transmit bit rate of $R_0$ or a base transmit bit length of $T_0 = 1/R_0 \approx 7.8$ μsec.  To non-synchronously transmit N users (voice channels) at the same frame rate requires a much higher transmit bit rate $R \gg R_0$ (or bit length $T_b \ll T_0$).  The packet length will be $\tau = 12\,T_b$ seconds.

The number of users and the acceptable packet error (collision) rate determine the transmission bit rate required for non-synchronous TDMA.  Since frame start time is not specified, assume the frame starts (t = 0) at the beginning of the packet for a given user (User 1).  For no collision to occur, the next packet (from User 2) to be transmitted must begin no earlier than time $\tau$, where $\tau$ is the length of a packet in seconds.  This second packet must begin before time $T - (N-1)\tau$ since there are N packets to be transmitted in a frame.  Define the start time for the second packet as $t_2$.  The third packet in the frame similarly must have a start time $t_3$ in the range $t_2 + \tau \le t_3 \le T - (N-2)\tau$.  The Nth packet must have a start time $t_N$ in the range $t_{N-1} + \tau \le t_N \le T - \tau$.  This is demonstrated for N = 4 in Figure 1.

Since each packet can be assumed to be randomly placed within a frame, the start time for each user’s packet is uniformly distributed between 0 and $T$.  Also, the start of the frame is defined by the start of a particular user’s packet.  Therefore the other packets can arrive in $(N-1)!$ possible orders.  The probability of no packet collision within a frame can be expressed as follows:

$$P_{nc} = \frac{(N-1)!}{T^{N-1}} \int_{\tau}^{T-(N-1)\tau} \int_{t_2+\tau}^{T-(N-2)\tau} \cdots \int_{t_{N-1}+\tau}^{T-\tau} dt_N \cdots dt_3 \, dt_2 \qquad (1)$$

 

 

Equation 1 requires numerical integration for values of N greater than 3 or 4.  An N-user analysis will require an (N – 1)th order numerical integration.  Therefore Equation 1 can only be of practical use for small values of N.
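Where the nested integration becomes unwieldy, the no-collision probability can also be estimated by simulation.  The following Python sketch is only an illustration under stated assumptions: User 1's packet defines t = 0, the other N - 1 start times are drawn uniformly on [0, T), frames repeat periodically so the last packet must also clear User 1's next packet, and the function name, trial count and seed are hypothetical choices, not part of the original analysis.

```python
import random


def estimate_no_collision(num_users, packet_len, frame_len, trials=100_000, seed=1):
    """Monte Carlo estimate of the probability that no packets overlap in a frame."""
    rng = random.Random(seed)
    clean = 0
    for _ in range(trials):
        # User 1 defines t = 0; the other N - 1 start times are uniform on [0, T).
        starts = sorted(
            [0.0] + [rng.uniform(0.0, frame_len) for _ in range(num_users - 1)]
        )
        # Gaps between consecutive packet starts, plus the wrap-around gap
        # from the last packet to User 1's packet in the next frame.
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        gaps.append(frame_len - starts[-1])
        # No collision if every packet finishes before the next one starts.
        if all(gap >= packet_len for gap in gaps):
            clean += 1
    return clean / trials


if __name__ == "__main__":
    frame = 125e-6          # frame length T in seconds
    packet = 12 / 50e6      # 12-bit packet at a hypothetical 50 Mbps
    for users in (2, 4, 8):
        print(users, 1.0 - estimate_no_collision(users, packet, frame))
```

The estimate converges slowly for very small collision probabilities, so a sketch like this is mainly useful as a sanity check on the analytical expressions for moderate N.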

The probability of no collision within a frame can be bounded or approximated using a number of methods.  Self and Smith developed an estimate of the probability of overlap of periodic window functions with differing periods and dwell times, and random starting delay. [2]  This method works well for many radar applications, but not as effectively for several functions of the same period and dwell, and random delay within each frame.

A rather simple bound can be derived by assuming that each packet is successively and randomly placed within each frame.  The start time for each user’s packet is uniformly distributed between 0 and $T$.  The probability of no collision with a particular previously placed packet when randomly placing a packet within the frame is equivalent to the probability of the start time of the new packet not being within $\pm\tau$ (± 1 packet length) of any previously placed packet.

The frame time is again assumed for analysis to start at the start of the first packet placed in the frame.  The start of the second packet must be placed within the range $\tau \le t \le T - \tau$.  The start of the next packet must be placed in the same range, but not within $\pm\tau$ of the start of the second packet.  Therefore the start of the third packet must be placed in a range of size $T - 4\tau$.  It follows that the bound for the probability of no collision is:

$$P_{nc} \ge \prod_{k=1}^{N-1} \frac{T - 2k\tau}{T} \qquad (2)$$

The probability of collision is therefore upper bounded by

$$P_c = 1 - P_{nc} \le 1 - \prod_{k=1}^{N-1} \left(1 - \frac{2k\tau}{T}\right) \qquad (3)$$

Figure 2 plots the upper bound of probability of collision as a function of the transmitted bit rate for the number of users N = 2, 4, 8 and 16.  The exact probabilities from Equation 1 are plotted for N = 2 and 4.  Note that the bound and exact expressions are the same for N = 2.  The results in Figure 2 demonstrate that the transmission bit rate must be greatly increased in order to change from synchronous to non-synchronous TDMA.  For example, 2 users under synchronous TDMA require a transmission bit rate of only twice the single-user base bit rate (assuming header and preamble are not reduced).  For non-synchronous TDMA the transmission bit rate must be about 10 Mbps to keep the probability of packet collision below 0.01.  That is a bit rate increase of 50 times.  For 4 users the rate increase is 150 times (from exact expression).  The bounds for 16 users indicate that the rate increase may be over 10,000 times or on the order of 1 Gbps.  This rate is unrealistically high for an inexpensive communications system.
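The trend described above can be explored numerically.  The sketch below assumes the bound takes the product form 1 - prod(1 - 2k tau / T) for k = 1 to N - 1, with each previously placed packet blocking a window of two packet lengths; the function names, the bisection search and the default 12-bit packet and 125 μsec frame are illustrative assumptions.  Absolute values depend on the assumed packet and frame lengths and are not guaranteed to match values read from Figure 2.

```python
def collision_bound(num_users, packet_len, frame_len):
    """Upper bound on collision probability: 1 - prod_{k=1}^{N-1} (1 - 2k*tau/T)."""
    p_no_collision = 1.0
    for k in range(1, num_users):
        # Each of the k packets already placed blocks a window of 2*tau.
        p_no_collision *= max(0.0, 1.0 - 2.0 * k * packet_len / frame_len)
    return 1.0 - p_no_collision


def min_bit_rate(num_users, target, bits_per_packet=12, frame_len=125e-6):
    """Smallest bit rate (bits/sec) whose collision bound is at or below target."""
    lo = bits_per_packet / frame_len   # one packet fills the whole frame
    hi = 1e13                          # far above any rate of interest
    for _ in range(200):
        mid = (lo * hi) ** 0.5         # bisect on a logarithmic scale
        if collision_bound(num_users, bits_per_packet / mid, frame_len) <= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for users in (2, 4, 8, 16):
        print(users, min_bit_rate(users, 0.01))
```

Because the bound grows roughly with the square of the number of users for small packet lengths, the required bit rate rises steeply as N increases, which is the behavior the paragraph above describes.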

The analysis above indicates that non-synchronous TDMA using PCM is not practical for an inexpensive wireless communications system.  Therefore alternatives must be sought.  One alternative is the use of delta modulation (DM).

2.1.2 Delta Modulation and Non-Synchronous TDMA:  Delta modulation (DM) is well suited to voice communications.  While voice communications contains frequency components up to 4 kHz, it has been shown that DM can assume a maximum, full-amplitude frequency of 800 Hz.  DM also is comparable to PCM for moderate signal-to-noise ratios (SNR), up to ~25 dB (assuming equivalent bandwidth and voice communication). [1]  This SNR is quite adequate for normal conversation.

DM is generally more tolerant to bit errors in communications.  Each bit transmitted from a DM system is roughly equivalent to one of the least significant bits of PCM.  An error in one of the more significant bits of PCM has a much greater impact on the output than an error in a DM transmitted bit. 

The primary advantage of DM over PCM in non-synchronous communications is that DM does not require the preamble and header overhead of a packet.  A DM signal can be detected by simple pulse detection like radar pulse detection.  The overhead required for bit synchronization in PCM is unnecessary.  A “positive” pulse reception causes the estimate of the signal to be incremented while a “negative” pulse results in a decrement of the estimate.  It is unnecessary for the receiver to know the source if the communication system is designed to combine voice signals.  The receiver simply increments or decrements according to the pulses received.

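A comparison of DM to PCM at the same quantization SNR will provide useful information.  The SNR for DM for voice communications is approximately

$$\mathrm{SNR} \approx \frac{3 f_s^3}{\omega_r^2 B} \cdot \frac{\overline{m^2}}{m_p^2} \qquad (4)$$

where $f_s$ = sampling rate, $B$ = bandwidth = 4000 Hz and $\omega_r = 2\pi(800)$ rad/sec; $\overline{m^2}$ is the mean-square value of the signal and $m_p$ its peak amplitude. [1, p. 287]  The sample rate required to achieve a 25 dB SNR follows from solving Equation 4 for $f_s$.  Though the sample rate is much higher than the required sampling rate for PCM, each DM sample results in only 1 bit.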

Figure 3 plots the upper bound of probability of collision (now of DM bits) as a function of the transmitted bit rate for the number of users N = 2, 4, 8 and 16.  Note that the probabilities of collision bounds are only moderately improved for DM.  However, a collision in DM results in the loss or corruption of a single bit while a PCM collision results in the likely loss of multiple bits or samples.  The use of DM is an improvement, but greater gains are required if the resulting system is to be implemented in a simple and inexpensive system.

The sample rate for a DM system is a function of the peak amplitude of the signal $m_p$, $\omega_r$, and the step size $\sigma$.  The equation for this relation is

$$f_s = \frac{\omega_r m_p}{\sigma} \qquad (5)$$

The variable $\omega_r$ has been determined by empirical evidence and is related to the maximum slope of voice signals.  The maximum amplitude $m_p$ can be normalized to 1 without loss of generality.  The sampling frequency and step size are inversely related and are determined by the required SNR.  First inspection of Equation 5 leads to the conclusion that there is no way to lower the sample rate without adversely impacting the quantization SNR.  However, the actual obstacle is the assumption that the sampling rate must be fixed.
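The inverse relation between sample rate and step size can be made concrete with a short calculation.  This Python sketch assumes the relation has the slope-overload form f_s = omega_r m_p / sigma with omega_r = 2 pi (800) rad/sec taken from the 800 Hz full-amplitude frequency mentioned earlier; the 64 kHz sample rate and the counter-sizing helper are hypothetical illustrations rather than values from this paper.

```python
import math

# Assumed empirical slope parameter for voice: 2*pi*800 rad/sec.
OMEGA_R = 2.0 * math.pi * 800.0


def step_size(sample_rate, peak=1.0, omega_r=OMEGA_R):
    """Step size sigma from the rearranged relation sigma = omega_r * m_p / f_s."""
    return omega_r * peak / sample_rate


def counter_bits(sample_rate, peak=1.0, omega_r=OMEGA_R):
    """Bits for an up/down counter spanning -m_p..+m_p in steps of sigma."""
    steps = math.ceil(2.0 * peak / step_size(sample_rate, peak, omega_r))
    return max(1, math.ceil(math.log2(steps + 1)))


if __name__ == "__main__":
    fs = 64_000.0  # hypothetical sample rate in Hz
    print("step size:", step_size(fs))
    print("counter bits:", counter_bits(fs))
```

Halving the sample rate doubles the step size, so under a fixed-rate assumption any reduction in pulse rate is paid for directly in quantization noise.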

A variable or adaptive sampling rate DM system will be demonstrated in the next section.  This system will lower the average sampling rate and thus greatly reduce collisions and the resulting bit losses.

3. Adaptive Rate DM for Non-Synchronous Digital Communications

The block diagram for a standard delta modulator is depicted in Figure 4.  A comparator or hard limiter follows the difference between the signal $m(t)$ and the estimate of the signal $\hat{m}(t)$.  The fixed-rate sample-and-hold then generates the pulses that are transmitted.  The length of the hold determines the width of the pulses that are transmitted.

An adaptive rate delta modulator can be implemented with only moderate changes to the standard delta modulator.  First, the comparator is replaced with a tri-level comparator which has an output of zero for an input in the range $-\alpha\sigma \le m(t) - \hat{m}(t) \le \alpha\sigma$, where $\alpha$ is a fixed fraction and $\sigma$ is the DM step size.  The output of the sample-and-hold is therefore non-zero only when $|m(t) - \hat{m}(t)| > \alpha\sigma$.  The sample rate $f_s$ therefore becomes the maximum sample rate.  The actual sample rate is determined by the variations of the input signal $m(t)$.  A block diagram of an adaptive rate delta modulator is depicted in Figure 5.
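The behavior of the tri-level comparator can be sketched in a few lines of Python.  This is a minimal discrete-time illustration, not the simulated system reported here: the function names are hypothetical, the input is assumed to be already sampled at the maximum sample rate, the dead zone is taken as the closed interval of plus or minus alpha times the step size, and every pulse is assumed to reach the receiver without collision.

```python
import math


def ardm_encode(samples, step, alpha):
    """Return one entry per sample instant: +1, -1, or 0 (no pulse sent)."""
    estimate = 0.0
    pulses = []
    for x in samples:
        error = x - estimate
        # Tri-level comparator: stay silent inside the dead zone.
        if error > alpha * step:
            pulse = 1
        elif error < -alpha * step:
            pulse = -1
        else:
            pulse = 0
        # Local estimator mirrors what the receiver will reconstruct.
        estimate += pulse * step
        pulses.append(pulse)
    return pulses


def ardm_decode(pulses, step):
    """Receiver model: an up/down counter scaled by the step size."""
    count = 0
    output = []
    for pulse in pulses:
        count += pulse
        output.append(count * step)
    return output


if __name__ == "__main__":
    fs = 64_000.0                               # hypothetical maximum sample rate
    step = 2.0 * math.pi * 800.0 / fs           # assumed step size for m_p = 1
    tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(3200)]
    pulses = ardm_encode(tone + [0.0] * 3200, step, alpha=0.5)
    print("pulses sent:", sum(abs(p) for p in pulses), "of", len(pulses))
```

With a constant input the encoder emits no pulses at all, whereas a standard delta modulator would emit one pulse at every sample instant.  Values of alpha below 0.5 would let the estimate overshoot the dead zone and oscillate, which is one reason the sketch uses 0.5.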

The adaptive-rate delta modulator (AR-DM) has several advantages when implemented in an open “party-line” voice communication system.  First, standard conversation is conducted with generally only one person speaking at a time with occasional responses or interruptions.  The AR-DM system will only transmit pulses when the input is varying and thus will produce no or few pulses when a person is not speaking.  Second, since the AR-DM does not produce pulses when there is no input, the quantization noise from a non-speaking user is removed from the receiver’s input (unlike standard DM which continues to transmit pulses when the input is constant).  Also, voice energy and amplitude vary greatly during speech [3, pp. 116-164] and thus the maximum sample rate is achieved rather infrequently even while talking.  Therefore even if several users are attempting to speak at the same time, the collision rate for the AR-DM will be lower than for standard fixed rate DM.  Simulations in Matlab have shown that AR-DM can reduce the average pulse rate by a factor of ~28 for $\alpha = 0.5$.  Analysis of voice quality and signal-to-noise ratio as a function of $\alpha$ is continuing.

The AR-DM system behaves very naturally for “party-line” conversation.  While one user is talking there is little or no interference from other users.  As more users try to speak simultaneously the collision rate increases, as does the quantization noise.  Hence the more users speaking simultaneously, the less intelligible the conversation.  The same can be said of any group of people speaking in the same room or on an old telephone party line.  Some of the advantages of AR-DM would however be lost if the users were attempting to speak simultaneously or sing in harmony since the simultaneous voicing is intentional rather than accidental.

Note also that the quantization noise for the AR-DM is at least as good as the DM for the same step size $\sigma$.  The step size $\sigma$ determines the noise power for either DM system when the user is speaking.  However, the quantization noise is removed for the AR-DM system when a user is not speaking.  In a standard DM implemented with non-synchronous pulses the quantization noise from all users is essentially summed at each receiver, even when some of the users are not speaking.  Therefore the received SNR is inversely related to the number of users.  In an AR-DM implementation, the noise is a function of the number of users simultaneously speaking.

There are a number of implementations of an AR-DM transmitter.  One of the more easily implemented designs includes an up/down counter and D/A converter for the estimator, and an FSK modulator for transmission.  A block diagram of this implementation is shown in Figure 6.

The sample & hold and 2-pulse generator samples the output of the tri-level comparator at the maximum sample rate $f_s$.  If the sample is positive then a pulse of preset width is generated on the output “+ pulses”.  If the sample is negative then a pulse is generated on the output “– pulses”.  The up/down counter simply increments on a + pulse and decrements on a – pulse.  The number of bits in the counter is determined by the number of steps in the maximum input amplitude of the delta modulation.  Finally, the + and – pulses are modulated by frequencies F1 and F2, respectively, to create a binary FSK-like signal.

The receiver for the AR-DM is shown in Figure 7.  The receiver consists of 2 pulse detectors centered at the frequencies F1 and F2 followed by the same estimator circuit as used in the transmitter.  The only difference in the estimator circuit will be extra bits in the up/down counter and D/A converter to accommodate the amplitude that may be present when multiple users are present and speaking simultaneously.  The detectors in the receiver can simply be modeled as ASK receivers with unbalanced ones and zeroes or as pulse detectors with a given false alarm rate as in a radar receiver.
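The combining behavior of the receiver can be sketched as a single counter driven by pulse events from any number of transmitters.  In this Python illustration the event format (time, sign), the saturating counter and the default counter width are assumptions made for the example, and every pulse is assumed to be detected, so collisions and false alarms are not modeled.

```python
def receiver_estimate(events, step, counter_bits=8):
    """Rebuild the combined signal from detected pulses.

    events: iterable of (time_sec, sign), sign = +1 for a pulse detected
    at F1 and -1 for a pulse detected at F2, from any transmitter.
    """
    limit = 2 ** (counter_bits - 1) - 1
    count = 0
    trace = []
    for time_sec, sign in sorted(events):
        # Saturating up/down counter; the source of a pulse is never needed.
        count = max(-limit, min(limit, count + sign))
        trace.append((time_sec, count * step))
    return trace


if __name__ == "__main__":
    user_a = [(0.0, 1), (1.0, 1), (3.0, -1)]   # hypothetical pulse times
    user_b = [(0.5, 1), (2.0, 1)]
    for time_sec, value in receiver_estimate(user_a + user_b, step=0.1):
        print(time_sec, value)
```

Because the counter only adds and subtracts steps, its output is the sum of the individual users' estimates, which is why extra counter bits are needed when several users speak at once.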

Note that the pulses generated by the AR-DM transmitter can be as simple as rectangular pulses, shaped pulses to reduce bandwidth or direct sequence spread spectrum pulses.  The choice to use AR-DM does not prevent the implementation of spread spectrum systems to make use of unlicensed bands (e.g. 2.4 GHz).  Also, since the AR-DM system is a true digital system, the system designer can take advantage of digital repeaters to extend the operating range or even use wired links to interconnect groups of users over longer distances.  The naturally efficient use of the channel is superior and much less complicated in this application than voice activation or voice activity detectors (VAD) currently being investigated for use in wireless communications. [4]

4. Conclusions and Continuing Research

The investigation of non-synchronous digital communications for voice communications resulted in the use of adaptive rate delta modulation (AR-DM).  The primary goals of simultaneous voice communications and simple (low-cost) implementation prevented the implementation of FDMA or even CDMA systems that require multiple RF receivers or considerable digital processing to implement.  TDMA became the implementation of choice because of its simple, single-receiver implementation.  The use of PCM and data packets was investigated and found to have significant implementation problems.  Bit synchronization (even under optimistic assumptions) requires significant overhead and thus more bits transmitted per pulse.  Combining multiple samples in a single packet can reduce the overhead, but this will entail buffering and additional processing.  A stochastic analysis of the probability of packet collisions under non-synchronous TDMA demonstrated that unfeasible transmit bit rates are needed to reduce the probability of packet collision to a manageable level.

Delta modulation (DM) was considered because of its simplicity, suitability to voice communications and relative tolerance to bit errors.  DM does not require bit synchronization to decode the received packet since a packet will consist of a single bit.  The receiver is therefore simply a pulse detector.  However, the high sample rates required by DM resulted in a packet (bit) collision rate only moderately better (lower) than for PCM.

An adaptive rate delta modulation (AR-DM) scheme was developed whereby a DM bit (pulse) is generated only if the input signal varies more than some fraction of the DM step size $\sigma$ from the previous estimate.  An implementation of this scheme using FSK modulation was demonstrated.  The AR-DM system transmits pulses only when the user is speaking.  Since typical conversation has one or only a few speakers at a time, this greatly reduces the probability of pulse collision.  The maximum bit (pulse) rate of the AR-DM system will be the sample rate of standard DM, but the average pulse rate will be much lower.  Simulations have demonstrated average pulse rates reduced by over a factor of 20.

The next phase of this research will be to emulate the AR-DM system to acquire some statistical parameters of DM under normal conversation.  Since human voices vary considerably, this phase will require the digital recording of multiple voices and the simulation of the AR-DM system.  The DM step size $\sigma$ and the parameter $\alpha$ can be varied to determine the parametric effects on the pulse rates generated.  Multiple voice samples can be modulated simultaneously and pulse widths varied to ascertain the probability of collision.

After determining the requirements for implementation, an AR-DM system will be implemented and tested.  The tests will include statistical measurements of packet collision as well as subjective tests to determine how effectively conversations can be carried out with this system.

References

[1]   B. P. Lathi, Modern Digital and Analog Communication Systems, Third Edition, Oxford University Press, New York, 1998.

[2]   A. G. Self and B. G. Smith, “Intercept Time and its Prediction,” IEE Proceedings, Part F: Communications, Radar and Signal Processing, Vol. 132, No. 4, July 1985, pp. 215-222.

[3]   L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, New Jersey, 1978.

[4]   F. Beritelli, “A Robust Voice Activity Detector for Wireless Communications Using Soft Computing,” IEEE Journal on Selected Areas in Communications, Vol. 16, No. 9, December 1998, pp. 1818-1829.