APPARATUS, METHODS, AND COMPUTER PROGRAM PRODUCTS

UTILIZING A GENERALIZED TURBO PRINCIPLE

TECHNICAL FIELD:

The teachings in accordance with the exemplary embodiments of this invention relate generally to systems in which the Turbo principle may be employed and, more specifically, relate to communication systems.

BACKGROUND:

In telecommunication systems, the transmission channel often causes interference to data transmission. Interference occurs in all systems, but wireless telecommunication systems in particular are affected, as the radio path attenuates and distorts the signal to be transmitted in a variety of ways. On the radio path, interference is typically caused by multipath propagation, various fades and reflections, and by other signals transmitted on the same radio path.

To reduce the effects of interference, various encoding methods have been developed which aim to protect the signal from interference and to eliminate interference-induced errors in the signal. One widely used encoding method is convolutional coding. In convolutional coding, the signal to be transmitted, consisting of symbols, is encoded into code words based on the convolution of the original signal with code polynomials. The convolutional code is determined by the coding rate and the coding polynomials. The coding rate (k/n) refers to the number (k) of symbols to be coded in relation to the number (n) of produced coded symbols. The encoder is often implemented by means of shift registers. The constraint length K of the code often refers to the length of the shift register. The encoder can be considered a state machine having 2^{K-1} states. One encoding method further developed from the convolutional code is the parallel concatenated convolutional code (PCCC), which is also known as a Turbo code.
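The shift-register encoder described above can be sketched in a few lines. This is an illustrative example, not taken from the source: a rate-1/2 binary convolutional encoder with constraint length K = 3 and the common textbook generator polynomials (7, 5) in octal, i.e. a state machine with 2^(K-1) = 4 states.

```python
# Illustrative sketch (not from the source): a rate-1/2 binary
# convolutional encoder, constraint length K = 3, generators (7, 5)
# in octal. The shift register holds K-1 = 2 bits of memory.

def conv_encode(bits, g1=0b111, g2=0b101, K=3):
    """Encode a bit sequence; emits 2 coded bits per input bit (rate 1/2)."""
    state = 0  # contents of the shift register (K-1 bits)
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state              # current input + memory
        out.append(bin(reg & g1).count("1") % 2)  # parity w.r.t. g1
        out.append(bin(reg & g2).count("1") % 2)  # parity w.r.t. g2
        state = reg >> 1                          # shift the register
    return out

coded = conv_encode([1, 0, 1, 1])
print(len(coded))  # 8 coded bits for 4 input bits (rate k/n = 1/2)
```

Puncturing some of the produced bits (as in the incremental-redundancy scheme discussed below) raises the effective rate above 1/2.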

One suggestion for minimizing the effects of multipath fading involves transmitting redundant Turbo-coded packets with new parity bits when an error occurs during decoding of the original packet at the receiver. Turbo-coding in communications systems involves coding/decoding information in stages in order to avoid retransmission of a full L-bit packet upon occurrence of a packet error. In addition to a set of code bits generated by an encoder using a Turbo-coding scheme, a punctured set of code bits is typically generated and stored in transmitter memory. The original set of code bits is transmitted as an L-bit data packet to a receiver, which stores received data samples corresponding to the original set of code bits. The receiver decodes the data packet using a Turbo-decoder and determines whether an error occurred in decoding the received data packet. If so, the received data samples are maintained in memory, and a request for more information is made. Some or all of the punctured information is then forwarded from the transmitter to the receiver. A second stage of Turbo-decoding combines the new data samples with the stored original received data samples such that there is a high likelihood that decoding is correct at this point, but additional stages of decoding may be used. This example of Turbo-coding is generally known and referred to as incremental redundancy. Turbo-coding and the Turbo principle may be applied in additional applications and for other uses, as known to one of ordinary skill in the art.
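The staged incremental-redundancy flow described above can be sketched as follows. Everything here is a toy stand-in, not the source's scheme: the `needed` threshold replaces a real CRC/decoder check, and the 4-bit retransmission size is arbitrary; the sketch only illustrates the control flow of transmitting, failing, requesting more punctured bits, and combining.

```python
# Toy sketch of the incremental-redundancy control flow: send the
# original packet, then forward punctured bits in stages until the
# (stand-in) decoder succeeds. Names and sizes are assumptions.

def ir_transmit(all_code_bits, first_tx, needed):
    """Return the number of transmission stages used before success."""
    received = list(all_code_bits[:first_tx])  # stage 1: original packet
    stages = 1
    while len(received) < needed:              # stand-in for a CRC failure
        # Receiver keeps its stored samples and requests more information;
        # transmitter forwards some of the stored punctured bits.
        extra = all_code_bits[len(received):len(received) + 4]
        received += extra                      # combine with stored samples
        stages += 1
    return stages

print(ir_transmit(list(range(20)), first_tx=10, needed=16))  # 3 stages
```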

Many signal processing challenges in communications and other fields can be efficiently solved by using the Turbo principle. According to this principle, the problem is split into two or more separate parts, each of which may be handled effectively by an individual component. So-called extrinsic information, representing the computed marginal densities, is then exchanged between the components, giving rise to an approximation of the overall joint problem. Applications of this principle include, for example, Turbo equalization for separate detection and decoding, and the decoding of Turbo-like codes.
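The component structure can be sketched minimally as below. The names (`component_a`, `component_b`) and the deliberately trivial toy components, which simply report fixed channel evidence, are assumptions for illustration; a real detector/decoder pair would compute extrinsic log-likelihood ratios (LLRs) from the prior it receives.

```python
# Minimal sketch (assumed names, not from the source) of the Turbo
# principle: two components alternate, each taking the other's
# extrinsic output as its prior.
import numpy as np

def turbo_iterate(component_a, component_b, n_bits, iters=5):
    ext_a = np.zeros(n_bits)         # extrinsic LLRs from component A
    ext_b = np.zeros(n_bits)         # extrinsic LLRs from component B
    for _ in range(iters):
        ext_a = component_a(ext_b)   # e.g. detector/equalizer
        ext_b = component_b(ext_a)   # e.g. decoder
    return ext_a + ext_b             # combined soft information

# Toy components: each just reports its own (fixed) channel evidence.
det = lambda prior: np.array([1.0, -2.0])
dec = lambda prior: np.array([0.5, -0.5])
print(turbo_iterate(det, dec, 2))
```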

General reference with respect to Turbo codes and the encoding and decoding of Turbo codes may be made to commonly-assigned U.S. Patent Nos.: 6,621,871 to Ross et al.; 6,732,327 to Heinila; 6,771,705 to Kenney et al.; and 6,889,353 to Nieminen. General reference with respect to Turbo equalization may be made to commonly-assigned U.S. Patent No. 6,996,194 to Pukkila et al.

SUMMARY:

In an exemplary aspect of the invention, a method includes: receiving an encoded signal; and decoding the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components.

In another exemplary aspect of the invention, a program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, said operations including: receiving an encoded signal; and decoding the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components.

In another exemplary aspect of the invention, an apparatus includes: means for receiving an encoded signal; and means for decoding the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components.

In another exemplary aspect of the invention, an apparatus includes: a receiver configured to receive an encoded signal; and a decoder configured to decode the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components.

In another exemplary aspect of the invention, a method for generating a graph structure for a system having a plurality of components includes: defining a set of primary regions, each primary region comprising one observation and a corresponding set of variables required for conditional independence of the observation; and defining a set of secondary regions describing intersections between two primary regions, wherein said intersections are in accordance with a component structure of the plurality of components, wherein each component specifies a subset of the set of secondary regions, wherein the set of primary regions and the set of secondary regions comprise the generated graph structure.

In another exemplary aspect of the invention, a program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations for generating a graph structure for a system having a plurality of components, said operations including: defining a set of primary regions, each primary region comprising one observation and a corresponding set of variables required for conditional independence of the observation; and defining a set of secondary regions describing intersections between two primary regions, wherein said intersections are in accordance with a component structure of the plurality of components, wherein each component specifies a subset of the set of secondary regions, wherein the set of primary regions and the set of secondary regions comprise the generated graph structure.

In another exemplary aspect of the invention, an apparatus includes: means for defining a set of primary regions, each primary region comprising one observation and a corresponding set of variables required for conditional independence of the observation; and means for defining a set of secondary regions describing intersections between two primary regions, wherein said intersections are in accordance with a component structure of a plurality of components of a system, wherein each component specifies a subset of the set of secondary regions, wherein the set of primary regions and the set of secondary regions comprise a generated graph structure.

In another exemplary aspect of the invention, an apparatus includes: a processor configured to define a set of primary regions and to define a set of secondary regions describing intersections between two primary regions, wherein the set of primary regions and the set of secondary regions comprise a generated graph structure; and a memory configured to store the generated graph structure, wherein each primary region comprises one observation and a corresponding set of variables required for conditional independence of the observation, wherein said intersections are in accordance with a component structure of a plurality of components of a system.

BRIEF DESCRIPTION OF THE DRAWINGS:

The foregoing and other aspects of exemplary embodiments of this invention are made more evident in the following Detailed Description, when read in conjunction with the attached Drawing Figures, wherein:

Figure 1 shows a simplified block diagram of various electronic devices that are suitable for use in practicing the exemplary embodiments of this invention;

Figure 2A shows an exemplary wireless system model that may be utilized in conjunction with various aspects of the exemplary embodiments of the invention;

Figure 2B depicts an exemplary factor graph representing the exemplary distribution factorization of equation (2.13);

Figure 2C illustrates an example of region definition for an exemplary factor graph;

Figure 2D shows an exemplary region graph;

Figure 2E depicts an exemplary structure for disjoint detection and decoding;

Figure 2F illustrates an exemplary structure for Turbo-based detection and decoding;

Figure 2G shows an exemplary factor graph for the Markov model;

Figure 3A illustrates an exemplary region graph found by the cluster variation method for a convolutionally coded signal over a multipath channel;

Figure 3B shows an example of a loopy region graph generated by the modified cluster variation method;

Figure 3C depicts a comparison of LLRs for the modified cluster variation method and Turbo equalization with an exemplary signal;

Figure 3D illustrates an example of the underlying graph structure of Turbo equalization;

Figure 3E depicts an exemplary region graph structure for generalized Turbo equalization in accordance with the exemplary embodiments of the invention;

Figure 3F shows a comparison of traditional and generalized Turbo equalization for an exemplary signal;

Figure 4 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention;

Figure 5 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention;

Figure 6 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention; and

Figure 7 depicts a flowchart illustrating another non-limiting example of a method for practicing the exemplary embodiments of this invention.

DETAILED DESCRIPTION:

The Turbo principle can be shown to be asymptotically optimal. That is, as the system size (i.e. interleaver length) goes to infinity, the exact marginals can be recovered as the graph approaches a tree. However, for finite systems, such a scheme will be an approximation. The exemplary embodiments of the invention generalize this principle by capturing more of the dependency between components and hence providing a better approximation for such finite systems.

One problem with the traditional Turbo principle is that dependencies between components are not correctly accounted for and, as a result of this, performance may be sub-optimal or the design may even fail to converge. This is especially a problem in smaller systems as powerful couplings are generally disregarded. This phenomenon is a significant reason why, for example, Turbo codes have not been deployed for systems with a "short" frame length. The generalized Turbo principle, as disclosed and further explained in conjunction with the exemplary embodiments of this invention, helps in such situations as a larger portion of the dependencies may be accounted for.

The exemplary embodiments of the invention generalize the traditional Turbo principle by exchanging vector-valued extrinsic information, instead of the scalar-valued extrinsic information of the ordinary Turbo principle, at the intersection points of the components. Another way of looking at the problem is from the underlying graph point-of-view: the traditional Turbo principle results in a graph where only scalar beliefs of the variables are exchanged at the intersection points of the components. In accordance with the exemplary embodiments of the invention, variables are instead grouped into regions and it is the beliefs of the regions that are exchanged at the intersection points of the components. Effectively, this boils down to performing so-called Generalized Belief Propagation (GBP) on a region graph, where the exchange of information is handled not by scalar-valued beliefs but by entire region beliefs.
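A toy numerical illustration (an assumption for exposition, not the source's algorithm) of why region beliefs carry more information than scalar beliefs: for two strongly correlated bits, the scalar marginals P(x1) and P(x2) are uninformative, while the joint region belief P(x1, x2) preserves the coupling.

```python
# Toy contrast between scalar beliefs and a region belief for two
# correlated bits (illustrative numbers, not from the source).
import numpy as np

# Joint belief over a two-variable region (rows: x1, cols: x2).
region_belief = np.array([[0.45, 0.05],
                          [0.05, 0.45]])   # strong correlation

# Traditional Turbo: exchange only the scalar marginals.
p_x1 = region_belief.sum(axis=1)           # [0.5, 0.5]: uninformative
p_x2 = region_belief.sum(axis=0)           # [0.5, 0.5]: uninformative

# The product of marginals loses the correlation entirely:
factored = np.outer(p_x1, p_x2)            # 0.25 everywhere
print(np.abs(region_belief - factored).max())  # ~0.2: information lost
```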

One exemplary embodiment operates in a manner somewhat similar to the traditional Turbo principle, however the components are modified to accept region beliefs instead of single variable beliefs in the form of extrinsic information, as explained in further detail below.

Reference is made to Figure 1 for illustrating a simplified block diagram of various electronic devices that are suitable for use in practicing the exemplary embodiments of this invention. In Figure 1, a wireless network 12 is adapted for communication with a user equipment (UE) 14 via an access node (AN) 16. The UE 14 includes a data processor (DP) 18, a memory (MEM) 20 coupled to the DP 18, and a suitable RF transceiver (TRANS) 22 (having a transmitter (TX) and a receiver (RX)) coupled to the DP 18. The MEM 20 stores a program (PROG) 24. The TRANS 22 is for bidirectional wireless communications with the AN 16. Note that the TRANS 22 has at least one antenna to facilitate communication. The UE 14 further includes an encoder (ENC) 38 coupled to the DP 18. The ENC 38 is configured to encode a signal that is to be transmitted by the TRANS 22.

The AN 16 includes a data processor (DP) 26, a memory (MEM) 28 coupled to the DP 26, and a suitable RF transceiver (TRANS) 30 (having a transmitter (TX) and a receiver (RX)) coupled to the DP 26. The MEM 28 stores a program (PROG) 32. The TRANS 30 is for bidirectional wireless communications with the UE 14. Note that the TRANS 30 has at least one antenna to facilitate communication. The AN 16 is coupled via a data path 34 to one or more external networks or systems, such as the internet 36, for example. The AN 16 further includes a decoder (DEC) 40 coupled to the DP 26. The DEC 40 is configured to decode a signal that is received by the TRANS 30.

At least one of the PROGs 24, 32 is assumed to include program instructions that, when executed by the associated DP, enable the electronic device to operate in accordance with the exemplary embodiments of this invention, as discussed herein.

In general, the various embodiments of the UE 14 can include, but are not limited to, cellular telephones, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, as well as portable units or terminals that incorporate combinations of such functions.

The embodiments of this invention may be implemented by computer software executable by one or more of the DPs 18, 26 of the UE 14 and the AN 16, or by hardware, or by a combination of software and hardware.

The MEMs 20, 28 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory, as non-limiting examples. The DPs 18, 26 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture, as non-limiting examples.

The ENC 38 and/or the DEC 40 may comprise one or more of the following: a specialized circuit, an application-specific integrated circuit (ASIC), an integrated circuit, a separate module, a function implemented by the associated DP 18, 26, a function resident in or stored by the associated DP 18, 26 (e.g., in local memory or a local storage structure), and a computer program (PROG) 24, 32 stored in the associated memory (MEM) 20, 28.

Although Figure 1 depicts the UE 14 as having an ENC 38 and the AN 16 as having a DEC 40, in other exemplary embodiments it is the UE 14 that has a decoder (DEC) 44 and the AN 16 that has an encoder (ENC) 42. That is, in such exemplary embodiments the AN 16 is configured to encode a signal and transmit the encoded signal (via TRANS 30) to the UE 14, while the UE 14 is configured to receive the encoded signal (via TRANS 22) and decode it. In further exemplary embodiments, the UE 14 and/or the AN 16 may have both an encoder and a decoder. Figure 1 illustrates this exemplary embodiment since the AN 16 has both the ENC 42 and the DEC 40 and the UE 14 has both the ENC 38 and the DEC 44. In such exemplary embodiments, the UE 14 and/or the AN 16 are configured both to encode and transmit signals and to receive and decode signals.

Although Figure 1 illustrates a wireless communication network 12, the exemplary embodiments of the invention may also be utilized in conjunction with a wired communication network. In such exemplary embodiments, the various embodiments of the UE 14 can include, but are not limited to, desktop computers having communication capabilities, gaming devices having communication capabilities, music storage and playback appliances having communication capabilities, Internet appliances permitting Internet access and browsing, as well as units or terminals that incorporate combinations of such functions.

ACRONYMS

AWGN Additive White Gaussian Noise

BER Bit Error Rate

BP Belief Propagation

BPSK Binary Phase Shift Keying

CDMA Code Division Multiple Access

CRC Cyclic Redundancy Check

DFE Decision Feedback Equalization

DFT Discrete Fourier Transform

EDGE Enhanced Data rates for GSM Evolution

EM Expectation Maximization

FBA Forward/Backward Algorithm

FDMA Frequency Division Multiple Access

FER Frame Error Rate

FFT Fast Fourier Transform

GBP Generalized BP

GMSK Gaussian Minimum Shift Keying

GSM Global System for Mobile Communications

HSPA High-Speed Packet Access

IIR Infinite Impulse Response

LAN Local Area Network

LDPC Low-Density Parity-Check

LLR Log-Likelihood Ratio

LMMSE Linear Minimum Mean-Square Error

LTI Linear Time-Invariant

MAP Maximum A-Posteriori

MIMO Multiple-Input Multiple-Output

ML Maximum Likelihood

MLSE Maximum Likelihood Sequence Estimate

MMSE Minimum Mean-Squared Error

MSE Mean-Squared Error

PSK Phase Shift Keying

QAM Quadrature Amplitude Modulation

RF Radio Frequency

RRC Root-Raised Cosine

RSSE Reduced-State Sequence Estimation

SNR Signal-to-Noise Ratio

SVD Singular Value Decomposition

TDMA Time Division Multiple Access

VBEM Variational Bayesian EM

WCDMA Wideband CDMA

WP Weighted Projected

ZF Zero-Forcing

NOTATION

General Notation

x Column vector

x_i Element i of x

X Matrix

I_M Identity matrix of size M x M

0_{M x N} All-zero matrix of size M x N

[X]_{ij} Element ij of X

[X]_{:,j} The j'th column of X

p(·) Probability density of continuous variable

P(·) Probability of discrete variable

⟨f(·)⟩_{q(·)} Average of function f(·) over posterior distribution q(·)

E[·] Ensemble average

CN(μ, Σ) Complex-valued Gaussian distribution with mean μ and covariance Σ

CW^{-1}(a, Σ) Complex-valued inverse-Wishart distribution with a degrees-of-freedom and covariance Σ

χ_a^2 Chi-Square distribution with a complex-valued degrees-of-freedom

Scalar Operators

|·| Absolute value

mod(x, y) The value of x taken modulo y

Vector Operators

diag(·) Diagonal matrix given by the vector

Matrix Operators

(·)^* Complex conjugation

(·)^T Matrix transpose

(·)^H Hermitian matrix transpose

|·| Matrix determinant

tr{·} Matrix trace, i.e. sum of diagonal elements

||·|| Matrix 2-norm

rank(·) Matrix rank

⊗ Kronecker product

diag(·) Vector given by diagonal of the matrix

Set Operators

X \ y The set found by removal of y from X

min(·) Minimum of the set

|·| Cardinality of the set

X ∪ Y Union of the sets X and Y

X ∩ Y Intersection of the sets X and Y

SECTION 2 - PRELIMINARIES

This chapter builds the basic framework in which the research has been carried out. First, the system model used is presented along with its graph representation. Next, the general topic of inference in graphs is introduced along with its application to the communication system model, including the Turbo principle.

2.1 Generic System Model

The wireless communication systems of interest are all of the classical narrowband type operating at a given carrier frequency and the equivalent complex baseband model therefore applies. For a general reference on this topic, see e.g. [Pro95, TV05]. Essentially, this permits the use of traditional linear models for many of the real-life effects on the actual signal.

A schematic overview of the system model is shown in figure 2A. The transmit structure is split into separate channel encoding and modulation; more general models with joint encoding and modulation can be constructed to account for various forms of pre-coding, but this is outside the scope of this thesis and has therefore been omitted. In addition, many alternative methods of mapping encoded bits onto a transmitted signal exist, but the linear, memoryless modulation outlined here is either used by the systems of interest or is a good approximation, and it is therefore the focus of this thesis. In the following it is assumed that user 1 is the only desired user, as this is typically the case for a mobile terminal, but the framework can easily be modified to support more than one desired user.

Assuming N_i information bits should be conveyed to the receiver as given by the binary vector i ∈ {0, 1}^{N_i}, the task of the channel encoder is to map this information to a new encoded vector c ∈ {0, 1}^{N_i/r}, where 0 < r < 1 is the rate of the code. It is often assumed that the input to the encoder is i.i.d. with a uniform distribution, but as the information bits typically come from a source encoder, residual redundancies are likely to be present, thereby violating this assumption. Additional gains can therefore be achieved by jointly performing the source decoding with the data estimation, but this has the drawback of increased complexity and dependence on the specific type of source and source encoding, and this option is therefore not pursued further. The systems of interest typically utilize convolutional codes and it is therefore assumed throughout this thesis that the encoder is a binary convolutional code of rate r having constraint length N_c.

Next, the order of the encoded bits is typically permuted by an interleaver to help make the bits appear as independent as possible to the next block, the modulator. Here, bits are collected into blocks of Q bits and mapped onto a complex-valued symbol in the set Ω out of |Ω| = 2^Q possible symbols. For example, if Q = 4 one could choose to map the bits onto e.g. a 16-QAM or a 16-PSK constellation set. Due to the symbol mapping, the number of transmitted symbols will be N_x = N_i/(rQ).
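The bit-to-symbol mapping step can be sketched as below for Q = 2 (QPSK). The Gray-coded constellation used here is an assumed example, since the text only requires some unit-power set Ω with |Ω| = 2^Q.

```python
# Sketch of the modulator: group interleaved code bits into blocks of
# Q and map each block onto a constellation symbol. The Gray-coded
# QPSK table below is an assumed example mapping.
import numpy as np

QPSK = {  # Gray mapping: adjacent symbols differ in one bit
    (0, 0): ( 1 + 1j) / np.sqrt(2),
    (0, 1): (-1 + 1j) / np.sqrt(2),
    (1, 1): (-1 - 1j) / np.sqrt(2),
    (1, 0): ( 1 - 1j) / np.sqrt(2),
}

def modulate(bits, Q=2):
    """Map a code-bit sequence to N_x = len(bits)/Q complex symbols."""
    assert len(bits) % Q == 0
    return np.array([QPSK[tuple(bits[i:i + Q])]
                     for i in range(0, len(bits), Q)])

x = modulate([0, 0, 1, 1, 0, 1, 1, 0])
print(len(x))                        # 4 symbols from 8 bits (Q = 2)
print(np.allclose(np.abs(x), 1.0))   # unit-power constellation
```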

Finally, the symbols x^{(k)} belonging to the k'th user are filtered by a pulse-shaping filter to help control the bandwidth of the transmitted signal. A typical choice of pulse-shaping filter is the Root-Raised Cosine (RRC) filter due to its theoretical properties and flexibility, but any filter can in principle be used. The spreading codes used in CDMA systems can be seen as nothing more than a special pulse-shaping filter. This will enforce special properties of the overall pulse-shaping filter that can be exploited; e.g., orthogonality between different codes may be achieved at the expense of excess bandwidth.

The signal is now transmitted across the wireless link by the antenna sub-system. This is accounted for by the time-varying multipath channel that models the effects of reflections and signal fading. However, real-life issues such as timing, frequency offsets and other RF impairments are not included in this thesis as these effects are typically not a limiting factor in the systems of interest. As discussed previously, interference from other users may occur and the model therefore includes a total of K users. In addition, thermal noise will be present as modeled by the AWGN source n ~ CN(0, σ^2 I).

In the receiver, the signal from the antenna sub-system r is filtered to produce y in such a way that all available information about the transmitted bits is preserved in y. Although the text-book answer would be to perform matched-filtering at this point, a real-life implementation depends on the actual system and the environment in which it operates. However, as all operations between the pulse-shaping and the receive filtering are linear operations, the overall transfer function is linear and can be expressed as

y = Hx + e (2.1)

The transfer matrix H ∈ C^{M x N} is the overall frequency-selective MIMO channel matrix, x ∈ Ω^N is the collection of transmitted symbols from the first K' ≥ 1 users, and e ∈ C^M is the overall noise term containing any remaining users plus filtered AWGN noise. Equation (2.1) looks deceptively simple, but further explanation will follow below in order to better understand it.

Finally, the task of the data estimator in figure 2A is to determine the posterior distribution of the information given the observations, as taking decisions based on this distribution will minimize the probability of error [Poo88]. However, for most interesting communication systems, finding this distribution is infeasible and approximations must be used instead. Such approximations are the topic of this thesis.

Returning to (2.1), the overall channel matrix H is effectively a linear convolution with temporal dispersion LT, where T is the symbol duration. Further, it is assumed that the overall channel coefficients are constant over the considered block of data, i.e. the model is a block-fading model. If the rate of change in the channel coefficients is so rapid that the block-fading approximation is not valid, this can be accounted for by e.g. a Gaussian state-space model for the channel coefficients [NP03, KFSW02], but this is not considered further in this thesis. For notational convenience, the ramp-up and ramp-down periods of the linear convolution are disregarded as they are typically not of major importance for the overall performance. However, a real-life implementation must take these boundary conditions into account. Based on these assumptions, the resulting structure for the overall channel matrix is the block-banded block-Toeplitz form

    [ H_{L-1}  ...  H_0                          ]
H = [           ...          ...                 ]   (2.2)
    [                 H_{L-1}  ...  H_0          ]

The sub-matrices H_j ∈ C^{N_r x K'N_t} are the lag-j channel matrices, with N_r and N_t being respectively the number of receive and transmit dimensions per symbol. Finally, based on the size of the sub-matrices, the size of the overall channel matrix is given by M = (N_x - L + 1)N_r and N = N_x K' N_t.
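For the single-antenna case (N_r = N_t = 1, K' = 1), building the overall channel matrix reduces to stacking shifted copies of the impulse response. The helper below is a hedged sketch of that construction, discarding the ramp-up/ramp-down periods as in the text; the tap values are arbitrary examples.

```python
# Sketch: the overall banded Toeplitz channel matrix H for a
# single-antenna link (N_r = N_t = 1), boundary periods discarded.
import numpy as np

def channel_matrix(h, N_x):
    """h: length-L impulse response; returns the (N_x-L+1) x N_x matrix
    implementing the 'valid' part of the linear convolution."""
    L = len(h)
    M = N_x - L + 1
    H = np.zeros((M, N_x), dtype=complex)
    for m in range(M):
        H[m, m:m + L] = h[::-1]   # each row: reversed taps, shifted by one
    return H

h = np.array([1.0, 0.5, 0.25])    # L = 3 multipath taps (example values)
H = channel_matrix(h, N_x=6)
print(H.shape)                    # (4, 6): M = N_x - L + 1
x = np.ones(6)
print(np.allclose(H @ x, np.convolve(h, x, mode="valid")))  # True
```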

The interference term in the overall noise e has the same structure as (2.2), only now with an overall channel matrix H^{(i)} having sub-matrices H_j^{(i)} ∈ C^{N_r x (K-K')N_t} determining the transfer function from users {K'+1, ..., K} to the overall noise. The overall noise can therefore be expressed as

e = H^{(i)} x^{(i)} + ñ (2.3)

where x^{(i)} holds the symbols from users {K'+1, ..., K} and ñ is the thermal noise after receive filtering. Assuming that all transmitted symbols in the overall noise term are i.i.d., zero-mean and unit-power, we have

Σ = E[ee^H] = H^{(i)} (H^{(i)})^H + Σ_ñ (2.4)

where Σ_ñ = E[ññ^H] is the covariance of ñ determined by the receive filter. It is then straightforward to show that Σ is a block-banded block-Toeplitz matrix with block-bandwidth L - 1. The Signal-to-Noise Ratio (SNR) of this system is defined as

SNR = tr{H H^H} / (M σ^2) (2.5)

Under the assumption that e ~ CN(0_{M x 1}, Σ), the likelihood of the symbols given the parameters is

p(y | x, H, Σ) ∝ |Σ|^{-1} e^{-(y - Hx)^H Σ^{-1} (y - Hx)} (2.6)

For finite systems, the assumption that e is Gaussian only holds for K' = K, but it can serve as a valuable approximation for weak interfering users when K' < K. The vast majority of detectors/decoders are most easily derived operating under the influence of AWGN and an equivalent system model fulfilling this requirement is therefore desired. One way of achieving this is to approximate e as being Gauss-Markov with a memory of N_m symbols, i.e. the block-bandwidth of Σ^{-1} is limited to N_m. The closest distribution in the KL-divergence to the original distribution is then found by simply setting elements outside the bandwidth of the inverse to zero [KM00]. By defining the whitening matrix F by the Cholesky factorization F^H F = Σ^{-1} and letting y ≜ Fy and H ≜ FH, we can rewrite (2.6) as

p(y | x, H, F) ∝ |F|^2 e^{-||y - Hx||^2} (2.7)

Again disregarding boundary conditions, the structure of H is the same as in (2.2), but due to the Gauss-Markov assumption on the overall noise, the effective length of the whitened channel H is now L + N_m [Chr06b]. This effectively means that any of the considered systems can be transformed into a system of the form

y = Hx + e ,  e ~ CN(0_{M x 1}, I_M) (2.8)

where e = Fe. This form of the system model is used throughout the rest of this thesis and a sufficient set of parameters for this system model is then θ = {H, Σ}.
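The whitening step can be checked numerically: given a noise covariance Σ, a Cholesky factorization of Σ^{-1} yields one valid F with F^H F = Σ^{-1}, and the whitened noise then has identity covariance as required by (2.8). The particular 3x3 banded covariance below is an arbitrary example, not from the source.

```python
# Sketch of noise whitening: F^H F = Sigma^{-1}, so F Sigma F^H = I.
# The Cholesky-based construction is one valid choice of F.
import numpy as np

# A banded (Gauss-Markov-like) noise covariance, example values.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])

L = np.linalg.cholesky(np.linalg.inv(Sigma))  # L L^H = Sigma^{-1}
F = L.conj().T                                # so F^H F = Sigma^{-1}

# Covariance of the whitened noise F e is F Sigma F^H = I:
print(np.allclose(F @ Sigma @ F.conj().T, np.eye(3)))  # True
```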

2.2 The Channel Capacity and Rate-Diversity Tradeoff

The modern research area of information theory was born with Shannon's ground-breaking theory of communication [Sha48]. Here, the channel capacity is for the first time described as the maximum amount of information carried by a channel such that it can be reliably detected and is found to be

C = log_2(1 + SNR) [bps/Hz] (2.9)

for a scalar channel with AWGN. Designing practical communication systems capable of achieving capacity while having arbitrarily small error probability has been the goal ever since. As realized by Shannon, the channel capacity is easily generalized to multipath channels by frequency-domain water-filling, but it took nearly 50 years before it was generalized to the general MIMO channel as^{1} [Tel99, XZ04]

C = N^{-1} log_2 |I_M + σ^{-2} H Q H^H| [bps/Hz] (2.10)

assuming AWGN with Q = E[xx^H] determined by water-filling. For fading channels, the so-called ergodic channel capacity can be found by averaging over the distribution of the channel.
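Equations (2.9) and (2.10) are easy to evaluate numerically. In the sketch below, Q = I (uniform power allocation) is used instead of water-filling for simplicity, so the MIMO value is the log-det term of (2.10) without the per-symbol normalization; the channel matrix is an arbitrary example.

```python
# Numerical sketch of (2.9) and of the log-det term of (2.10) with
# Q = I instead of water-filling (a simplifying assumption).
import numpy as np

# Scalar AWGN channel, (2.9):
snr = 10.0                      # linear SNR (10 dB would be 10**(10/10))
C_scalar = np.log2(1 + snr)
print(round(C_scalar, 3))       # 3.459 bps/Hz

# MIMO channel: log-det term of (2.10) with Q = I (example channel):
H = np.array([[1.0, 0.2],
              [0.3, 0.9]])
sigma2 = 0.1
M = H.shape[0]
C_mimo = np.log2(np.linalg.det(np.eye(M) + (1 / sigma2) * H @ H.conj().T))
print(C_mimo > 0)               # True: the channel carries information
```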

For a given fixed channel at high SNR, the ML estimate of the transmitted information has an exponentially vanishing error probability, i.e. P_e ∝ e^{-SNR}. However, considering a fading channel, the probability of error only decays as P_e ∝ SNR^{-d}, where d is the diversity-order [Pro95, TV05]. For example, if N different observations using independent fading realizations were available, a diversity-order of N could be achieved. There are many ways of achieving this, one possibility being the use of N receive antennas having independent fading between them. Unfortunately, sub-optimal processing may fail to take advantage of the true diversity-order of a system, resulting in sub-optimal performance. In general, maximizing the diversity-order is desired to help reliability, but it comes at the price of a reduction in channel capacity compared with that given in (2.10) [ZT03]. Hence, the maximum diversity-order cannot be achieved at the rate specified by (2.10), giving rise to a rate-diversity tradeoff. A good example of this is the use of a real-valued modulation such as BPSK on a complex-valued fading channel. To reach capacity, a complex-valued modulation must be used, but the choice of only using half the degrees-of-freedom available results in an increased diversity-order. An example of a similar rate-diversity tradeoff is the choice of using space-time block codes instead of spatial multiplexing for MIMO systems in order to have a higher diversity-order.

As mentioned, sub-optimal processing may fail to extract the available diversity. A good example of this is again the scenario of using BPSK modulation on a complex-valued fading channel. Due to the real-valued modulation, the signal only spans half the signal-space and information should therefore only be extracted from this sub-space. An ML receiver would achieve this whereas the sub-optimal LMMSE detector would not. The reason for this problem is that the complex-valued domain is constrained in the sense that it can only support circular complex-valued distributions, i.e. independent and equal-variance real and imaginary components over the complex space. A BPSK modulated signal in a complex-valued channel does not fulfill this circular constraint and the achievable diversity will therefore suffer from the incorrect model. A simple solution to this problem is to map the system onto the unconstrained real-valued domain having twice the number of output dimensions, i.e.

^{1} Channel capacity is here defined as the average channel capacity per input symbol over the considered block of data.

[ y_I ]   [ H_I  -H_Q ] [ x_I ]   [ n_I ]
[ y_Q ] = [ H_Q   H_I ] [ x_Q ] + [ n_Q ]   (2.11)

where subscripts I and Q indicate the real and imaginary parts respectively, i.e. ·_I = Re{·} and ·_Q = Im{·}. This representation correctly captures the non-circular statistics of a real-valued modulation and all processing can then be rederived for this modified system. Approximate detectors based on the statistics of the signal can thus extract a greater share of the available diversity in the system [GSL03]. Interestingly, similar structures in space-time block codes can be exploited in the same manner [GOS+04].

2.3 Graph Representations and Inference

This section will provide an overview of how the considered system model can be represented and approximated by graphs and thereby help improve the understanding of its underlying structure. The goal of doing this is to exploit the structure of the problem in such a way that inference in these models, e.g. determining hidden variables and parameters, is performed in an efficient manner. This area of research is still very much active and the quest for the ultimate representation of systems such as the one depicted in figure 2A is still ongoing. The presented graphical framework is based mainly on the work presented in [YFW05], which again builds on decades of research on structured (local) computation. To indicate the versatility of the presented framework, classical methods of increasing generality that can be derived from the framework include: the FFT, the forward/backward algorithm, the sum-product algorithm, the Bethe^{2} and Kikuchi approximations and the Generalized Distributive Law [AM00]. A related framework is that of Expectation Propagation [WMT05], but this view is not pursued further in this thesis.

2.3.1 Factor Graphs and Belief Propagation

A factor graph [KFL01] is a graphical way of expressing how a function of several variables factorizes into functions dependent only on subsets of the variables. For the purpose of this thesis, factor graphs are restricted to representing how

^{2} This is the approximation underlying the famed Turbo principle [BGT93, MMC98].

a joint probability distribution function factorizes, i.e.

p(x) ∝ ∏_a f_a(x_a)   (2.12)

Here x_a indicates the a'th subset of the variables and f_a is a positive and finite function of the subset, so that p(x) is a well-defined distribution. The factor graph contains the structure of (2.12) by a circular variable node for every variable x_i and a square factor node for every function f_a. If a given factor node f_a depends on x_i, an edge will then connect the two. An example of a distribution factorizing in this manner is

p(x_1, x_2, x_3, x_4) ∝ f_A(x_1, x_2) f_B(x_2, x_3, x_4) f_C(x_4)   (2.13)

which may be represented by the factor graph shown in figure 2B. The task of computing marginals from distributions of the form given by (2.12) is what we are interested in. For the remaining part of this thesis, it is assumed that all variables in factor graphs are discrete. Although it is possible to have factor graphs with both discrete and continuous variables, e.g. for jointly determining information bits and model parameters, this is outside the scope of this thesis. Letting S be the set of variable nodes that we wish to determine the marginal for, the desired marginal is defined by

p_S(x_S) = ∑_{x\x_S} p(x)   (2.14)

where the sum over x\x_S indicates summing over all combinations of the variables in x not in the set S. The problem with performing marginalization as shown in (2.14) is that it requires summing over an exponentially large number of combinations.
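As an illustration, the marginalization (2.14) can be carried out by brute force for the example factorization (2.13). The sketch below uses arbitrary illustrative factor tables over binary variables (the numerical values are assumptions, not taken from the text); even for this toy example, the sum already runs over 2^4 terms, and the count grows exponentially with the number of variables.

```python
import itertools

# Unnormalized factors for the example distribution (2.13):
# p(x1,x2,x3,x4) ∝ f_A(x1,x2) f_B(x2,x3,x4) f_C(x4), binary variables.
# The factor tables below are arbitrary illustrative values.
f_A = {(a, b): 1.0 + a + 2 * b for a in (0, 1) for b in (0, 1)}
f_B = {(a, b, c): 1.0 + a * b + c for a in (0, 1) for b in (0, 1) for c in (0, 1)}
f_C = {(a,): 2.0 - a for a in (0, 1)}

def p_unnorm(x):
    x1, x2, x3, x4 = x
    return f_A[(x1, x2)] * f_B[(x2, x3, x4)] * f_C[(x4,)]

# Marginal p(x2) by exhaustive summation as in (2.14): sum over x \ x_S.
def marginal_x2():
    m = [0.0, 0.0]
    for x in itertools.product((0, 1), repeat=4):   # 2^4 combinations
        m[x[1]] += p_unnorm(x)
    z = sum(m)
    return [v / z for v in m]

print(marginal_x2())
```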

The method of Belief Propagation (BP) can help reduce the amount of computation required by exploiting the structure of the problem as represented by the factor graph. This may come at the price of marginals only being approximate, but if the factor graph is loop-free^{3}, results obtained through BP are guaranteed to converge to their true values once all evidence has been distributed [KFL01]. The graph in figure 2B is an example of such a system that has no loops and exact inference can therefore be performed by BP.

^{3} This means that there is no possible route from any node and back to itself.

The BP algorithm is a message-passing algorithm based on the idea of nodes sending messages to their neighbors. The message m_{a→i}(x_i) from factor node a to variable node i indicates the relative probabilities that x_i is in a given state based on the function f_a. Similarly, the message n_{i→a}(x_i) from variable node i to factor node a indicates the relative probabilities that x_i is in a given state based on the information available to variable node i, except for that coming from the function f_a itself. The so-called beliefs, which are simply the approximations to specific marginals computed by BP, are given by the product of incoming messages and any local factors, i.e.

b_i(x_i) ∝ ∏_{a∈N(i)} m_{a→i}(x_i)
b_a(x_a) ∝ f_a(x_a) ∏_{i∈N(a)} n_{i→a}(x_i)   (2.15)

with N(i) indicating the set of neighbors of node i. By requiring consistency using the marginalization condition

b_i(x_i) = ∑_{x_a\x_i} b_a(x_a)   (2.16)

the message-updates are found to be

n_{i→a}(x_i) = ∏_{c∈N(i)\a} m_{c→i}(x_i)
m_{a→i}(x_i) = ∑_{x_a\x_i} f_a(x_a) ∏_{j∈N(a)\i} n_{j→a}(x_j)   (2.17)

The algorithm is sometimes also referred to as the sum-product algorithm due to the latter update in (2.17).
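On a loop-free graph, the updates (2.15)–(2.17) reduce to a handful of one-pass messages toward the node of interest. The sketch below computes the belief of x_2 for the example factorization (2.13) with arbitrary illustrative factor tables (the numerical values are assumptions), and checks it against exhaustive marginalization; on this tree the two agree exactly.

```python
import itertools

# Sum-product (BP) marginal of x2 for the loop-free example (2.13),
# checked against exhaustive marginalization.  Binary variables;
# the factor tables are arbitrary illustrative values.
f_A = {(a, b): 1.0 + a + 2 * b for a in (0, 1) for b in (0, 1)}
f_B = {(a, b, c): 1.0 + a * b + c for a in (0, 1) for b in (0, 1) for c in (0, 1)}
f_C = {0: 2.0, 1: 1.0}

# Messages according to (2.17); messages out of leaves are trivial.
m_A2 = [sum(f_A[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)]  # A -> 2
m_C4 = [f_C[x4] for x4 in (0, 1)]                              # C -> 4
n_4B = m_C4                                                    # 4 -> B
m_B2 = [sum(f_B[(x2, x3, x4)] * n_4B[x4]
            for x3 in (0, 1) for x4 in (0, 1)) for x2 in (0, 1)]  # B -> 2

belief = [m_A2[x2] * m_B2[x2] for x2 in (0, 1)]                # (2.15)
belief = [v / sum(belief) for v in belief]

# Brute-force reference over all 2^4 combinations.
brute = [0.0, 0.0]
for x in itertools.product((0, 1), repeat=4):
    brute[x[1]] += f_A[(x[0], x[1])] * f_B[(x[1], x[2], x[3])] * f_C[x[3]]
brute = [v / sum(brute) for v in brute]

print(belief, brute)
```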

2.3.2 Region Graphs and Generalized Belief Propagation

If the factor graph contains loops, the resulting approximation may be far from the exact result, especially if the length of the loop is short. To illustrate this problem, assume the factor graph in figure 2B also has a connection from variable node 3 to factor node C as shown in figure 2C. There is now a loop^{4} in the factor graph as there is a route from variable node 3 and back to itself, and BP is therefore not guaranteed to provide exact results. The idea of Generalized Belief Propagation (GBP) is to propagate messages between regions of nodes instead of single nodes and thereby hopefully provide a better approximation. In figure 2C, two regions R_1 = {A, 1, 2} and R_2 = {B, C, 2, 3, 4} have been

^{4} It could be argued that this factor graph is in fact loop-free in that merging factor nodes B and C will eliminate the loop without causing a larger complexity. However, this kind of loop encapsulation is not possible for general loopy graphs.

defined. Region R_2 encapsulates the loop that was causing BP problems and GBP will therefore be exact, but this comes at the price of increased complexity as the complexity scales exponentially with the region sizes. For this little example, the complexity would scale as O(2^2 + 2^3) compared with O(2^4) for exhaustive search, assuming binary variables. However, the real strength of GBP is that even for region definitions that do not encapsulate all loops in the factor graph, the GBP algorithm is still well-defined and can provide improved results compared with BP. Furthermore, through the choice of regions, GBP can scale all the way from BP to exact inference by trading off complexity for improved performance.

In defining the regions, one must ensure that all variables connected to any factor node in a region are also included in that region. In the example, this results in variable node 2 being included in two regions, and in general nodes may be included in several regions. This raises the question of how communication among regions should be performed, but the fact that nodes can occur in several regions is also a concern due to potential over-counting. Region graphs are by definition directed graphs and a possible way to allow communication between regions R_1 and R_2 is then to define the region R_3 = R_1 ∩ R_2 = {2} and let R_1 and R_2 be connected to this region. Such a region graph, as shown in figure 2D, defines the interactions between regions, and the GBP algorithm operates on such region graphs similarly to how the BP algorithm can be formulated on factor graphs. As was the case for BP on loop-free factor graphs, the GBP algorithm provides exact results when operating on loop-free region graphs [YFW05]. As mentioned before, region R_2 encapsulates the loop in the factor graph and the resulting region graph in figure 2D is therefore loop-free.

The potential over-counting of nodes in the factor graph can be dealt with through the use of so-called counting numbers. These counting numbers indicate the weight with which a given region is included in the overall approximation and, for the approximation to be well-behaved, the counting numbers of the regions involving a given node should sum to one. If ℛ is the set of all regions, each having counting number c_R, then the region-based approximation is said to be valid if for all variable nodes i and factor nodes a in the factor graph, we have

∑_{R∈ℛ} c_R I_R(a) = ∑_{R∈ℛ} c_R I_R(i) = 1   (2.18)

where I_R(x) is a set-indicator function being one if x ∈ R and zero otherwise. Given the structure of the region graph, it is easy to assign counting numbers that produce a valid approximation. If A(R) is the set of ancestors of a region R, then defining the counting numbers as

c_R = 1 − ∑_{R'∈A(R)} c_{R'}   (2.19)

will produce a valid region graph. In figure 2D, the counting numbers associated with each region are also shown and it can easily be verified that the resulting approximation is indeed valid.
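The ancestor-based assignment (2.19) is mechanical once the region graph is fixed. A minimal sketch for the three-region example (two top regions and their intersection; the labels R1, R2, R3 are illustrative names, not from the text):

```python
# Counting numbers via (2.19): c_R = 1 - sum of the counting numbers of
# all ancestors of R.  The region graph mirrors the example with two top
# regions and their shared intersection region.
parents = {"R1": [], "R2": [], "R3": ["R1", "R2"]}

def ancestors(r):
    acc = set()
    for p in parents[r]:
        acc.add(p)
        acc |= ancestors(p)
    return acc

counting = {}
for r in ("R1", "R2", "R3"):      # process parents before children
    counting[r] = 1 - sum(counting[p] for p in ancestors(r))

print(counting)
```

The shared region receives counting number 1 − 2 = −1, so a node appearing in all three regions is counted 1 + 1 − 1 = 1 time, i.e. the approximation is valid in the sense of (2.18).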

Assuming that a given region-based approximation has been specified^{5}, a GBP algorithm must now be constructed to yield the desired marginals, similar to how the sum-product algorithm may be used for regular BP. In fact, there are many such algorithms, each generalizing the sum-product algorithm, but here only the so-called parent-to-child algorithm is outlined. The reader is referred to [YFW05] for other possible algorithms.

Advantages of this algorithm are the absence of explicit reference to the counting numbers of the underlying graph and, as the name implies, that it is only necessary to define messages going from parents to their children. In this GBP algorithm, as in regular BP, the belief at any region R can be found as the product of incoming messages and local factors. However, to implicitly correct the potential over-counting, it turns out that we need to include messages into regions that are descendants of R coming from parents that are not themselves descendants of R. This is exactly the Markov blanket of region R, making the region conditionally independent of any regions other than these. As a result of this, the belief of region R is given by

b_R(x_R) ∝ ∏_{a∈A_R} f_a(x_a) ∏_{P∈P(R)} m_{P→R}(x_R) ∏_{D∈D(R)} ∏_{P'∈P(D)\E(R)} m_{P'→D}(x_D)   (2.20)

where m_{P→R}(x_R) is the message from region P to region R and A_R is the set of local factors in region R. Furthermore, P(R) is the set of parent regions of R and D(R) is the set of descendants of R, with E(R) = R ∪ D(R). From (2.20), the message-updates can be found by requiring consistency between parent and child regions, yielding

m_{P→R}(x_R) = [∑_{x_P\x_R} ∏_{a∈A_P\A_R} f_a(x_a) ∏_{(I,J)∈N(P,R)} m_{I→J}(x_J)] / [∏_{(I,J)∈D(P,R)} m_{I→J}(x_J)]   (2.21)

The set N(P,R) consists of the connected pairs of regions (I, J) where J is in E(P) but not in E(R), while I is not in E(P). Further, D(P,R) is the set of all connected pairs of regions (I, J) having J in E(R) and I in E(P), but not in E(R).

2.3.3 Graph Approximations and Free Energies

Up to this point, it has been assumed that a given graph had somehow been specified as being either an exact or approximate model. First, this section will outline the underlying cost-function that GBP, and hence also BP, minimize. Next, the Bethe and Kikuchi methods of generating approximate graphs are outlined.

^{5} How such graphs may be chosen is discussed in the next section.

To determine the cost-function of GBP, define the region energy of region R as

E_R(x_R) = − ∑_{a∈A_R} ln f_a(x_a)   (2.22)

where again A_R is the set of local factors in region R. The posterior mean of this energy term is called the region average energy and is naturally given by

U_R(b_R) = ∑_{x_R} b_R(x_R) E_R(x_R)   (2.23)

Also, let the region entropy H_{R}(b_{R}) be given by

H_R(b_R) = − ∑_{x_R} b_R(x_R) ln b_R(x_R)   (2.24)

allowing us to define the region free-energy F_R(b_R) as

F_R(b_R) = U_R(b_R) − H_R(b_R)   (2.25)

Conceptually, one simply sums up the region free-energies over the entire graph and this is then the metric to minimize. However, due to the over-counting problem, the region free-energies must be weighted by their respective counting number CR to give the region-based free-energy

F_ℛ({b_R}) = ∑_{R∈ℛ} c_R F_R(b_R)   (2.26)

where ℛ is the set of regions in the graph. From (2.26) it can be seen that if the region graph is valid, every variable and factor node from the factor graph is counted exactly once in the region-based free-energy. In [YFW05], the fixed-points of the various GBP algorithms are shown to be fixed-points of the region-based free-energy. What this means is that updating messages according to e.g. (2.21) will locally minimize the region-based free-energy. Furthermore, for the region-based free-energy minimization to make much sense, it must obey some basic constraints. First, the region beliefs b_R(x_R) must be valid probabilities, i.e. 0 ≤ b_R(x_R) ≤ 1 and summing to one. Additionally, marginals of the region beliefs should be consistent, meaning that a marginal should be the same no matter what region it is derived from. If these constraints are fulfilled, the approximation is called a constrained region-based free-energy approximation.
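The chain of definitions (2.22)–(2.25) can be traced numerically for a single region. The sketch below uses an arbitrary one-factor region over a binary variable (the factor values and the belief are illustrative assumptions) and verifies that the region free-energy is minimized by the normalized belief b*(x) = f(x)/Z, where it equals −ln Z.

```python
import math

# Region free-energy (2.22)-(2.25) for a single region containing one
# factor f over a binary variable, evaluated at a belief b.
# All numerical values are illustrative.
f = {0: 2.0, 1: 1.0}
b = {0: 0.7, 1: 0.3}

E = {x: -math.log(f[x]) for x in f}                 # region energy (2.22)
U = sum(b[x] * E[x] for x in b)                     # average energy (2.23)
H = -sum(b[x] * math.log(b[x]) for x in b)          # region entropy (2.24)
F = U - H                                           # region free-energy (2.25)

# The free-energy is minimized at the normalized belief b*(x) = f(x)/Z,
# where F equals -ln Z.
Z = sum(f.values())
b_star = {x: f[x] / Z for x in f}
F_star = sum(b_star[x] * (E[x] + math.log(b_star[x])) for x in f)
print(F, F_star, -math.log(Z))
```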

Similar to how the region-based free-energy was found by a weighted sum over the region free-energies, the region-based entropy can be defined in the same way from the region entropies. In [YFW05], it is argued that a good region graph approximation should achieve its maximum region-based entropy for uniform beliefs, as the exact region graph must have this property. If a specific region graph fulfills this criterion, it is called a maxent-normal approximation.

2.3.4 The Bethe Approximation

An important class of free-energy approximations are those generated by the Bethe method, also known simply as Bethe approximations [YFW05]. The region-based approximation generated by this method consists of two types of regions: the set of large regions ℛ_L and the set of small regions ℛ_S. Any region in ℛ_L contains exactly one factor node and all variable nodes connected to this factor node. On the other hand, regions in ℛ_S consist of only a single variable node and are used to connect large regions having variable intersections. The counting numbers guaranteeing a valid region graph are given by

c_R = 1 − ∑_{S∈S(R)} c_S   (2.27)

where S(R) is the set of super-regions of R, i.e. regions having R as a subset. Further, all Bethe approximations can be shown to be maxent-normal [YFW05]. Due to the construction of small regions handling the interactions between regions, only single-variable marginals are exchanged and GBP therefore falls back to standard BP on factor graphs. In [YFW05], the Bethe method is generalized to allow multiple factor nodes to be in a region in the large set and similarly regions in the small set are allowed to contain full intersections between regions. This way of generating the region graph is termed the junction graph method and is essentially similar to the generalized distributive law [AM00], which for tree graphs falls back to the famed junction tree algorithm.
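The Bethe construction can be sketched for the example factorization (2.13): each large region pairs one factor with its variable neighborhood, the small regions are the variables shared between large regions, and (2.27) yields the counting numbers. The factor/variable labels below mirror the example; the region-set representation itself is an illustrative choice.

```python
from collections import Counter

# Bethe-method regions for p(x1,...,x4) ∝ f_A(x1,x2) f_B(x2,x3,x4) f_C(x4).
# Each factor is listed with the set of variables it depends on.
factors = {"A": {1, 2}, "B": {2, 3, 4}, "C": {4}}

# Large regions: one factor node plus all its variable nodes.
large = {name: vars_ | {name} for name, vars_ in factors.items()}

# Small regions: variables appearing in more than one large region.
counts = Counter(v for vars_ in factors.values() for v in vars_)
small = {v for v, n in counts.items() if n > 1}

# Counting numbers via (2.27): large regions have no super-regions;
# each small region subtracts the counting numbers of its super-regions.
c = {name: 1 for name in large}
for v in small:
    c[v] = 1 - sum(c[name] for name, vars_ in factors.items() if v in vars_)

print(small, c)
```

Every variable's counting numbers again sum to one, e.g. variable 2 appears in the large regions for f_A and f_B and in its own small region: 1 + 1 − 1 = 1.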

2.3.5 The Kikuchi Approximation

In the Kikuchi approximation, we use the so-called cluster variation method for generating the regions and associated counting numbers. We start out with a set of large regions ℛ_0 such that every factor and variable node is in at least one region in ℛ_0. Furthermore, no region in ℛ_0 must be a sub-region of another region in ℛ_0. Having defined ℛ_0, the next level of regions ℛ_1 is determined by all possible intersections between regions in ℛ_0, but again making sure that no region in ℛ_1 is a sub-region of another region in ℛ_1. Finally, regions in ℛ_0 are connected to their respective sub-regions in ℛ_1. This process continues until level K where there are no more intersections and the region graph is then given by ℛ = ℛ_0 ∪ ℛ_1 ∪ ⋯ ∪ ℛ_K. The counting numbers required to make this a valid region graph are given by (2.27), as for the Bethe approximation.

Unfortunately, region graphs generated by this method are not guaranteed to be maxent-normal. Furthermore, it is argued in [YFW05] that for the free-energy approximation to be good, it should not only be valid and maxent-normal, but also have counting numbers summing to one when summed over the entire graph. This criterion is not even guaranteed by the Bethe approximation, except for the special case of the graph being loop-free. At present, designing good region-based free-energy approximations that obey even one of these criteria is more of an art than a science, but the framework of region-based free-energy approximations is indeed very general and intuitively seems to be a fruitful path for future research. In section 3.5, methods for approximate joint detection and decoding in convolutionally encoded systems are presented based on GBP on region graphs.

2.3.6 Helping GBP Converge in Loopy Region Graphs

As for BP, the GBP algorithm is only guaranteed to converge to the exact result when the region graph is loop-free and may even fail to converge for region graphs having multiple loops. A common heuristic for managing this is to let the new message be a convex^{6} combination of the update and the last message, either directly on the messages or in the logarithmic domain. There does not appear to be any known theoretical justification for this, but for the systems of interest it seems to work best in the log-domain, i.e.

m_{P→R}(x_R) = [m^{new}_{P→R}(x_R)]^{w_1} [m^{old}_{P→R}(x_R)]^{w_2}   (2.28)

where w_2 = 1 − w_1 and 0 < w_1 ≤ 1 is used for convex combining, with w_1 being a weight factor used to control the update. In fact, this can be seen to be a first-order IIR filter in the log-messages, with the IIR filter being provably stable. Obviously, as the weight w_1 approaches zero the updates become less important, thereby slowing the convergence of the overall algorithm. On the other hand, doing so stabilizes many, if not all, loopy region graphs as the couplings in the graph are relaxed. Hence, in some sense this scheme seems very similar to that of annealing, in that it might be possible to prove that exact inference may be accomplished by letting the convergence rate go to zero and thus effectively performing an exhaustive search [GG84].
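A minimal sketch of the damped update (2.28), applied to positive messages in the log-domain (the message values are illustrative): with weight w_1 = 0.5 the result is the element-wise geometric mean of the new and old messages, and with w_1 = 1 the undamped update is recovered.

```python
import math

# Log-domain damped message update (2.28):
# m' = m_new^{w1} * m_old^{w2} with w2 = 1 - w1, i.e. a convex
# combination of log-messages (a first-order IIR filter).
def damp(m_new, m_old, w1):
    assert 0 < w1 <= 1
    return [math.exp(w1 * math.log(a) + (1 - w1) * math.log(b))
            for a, b in zip(m_new, m_old)]

m_old = [0.5, 0.5]       # previous message (illustrative)
m_new = [0.9, 0.1]       # freshly computed update (illustrative)
m = damp(m_new, m_old, 0.5)
print(m)
```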

An observation that may justify the filtering in the log-domain is the over-counting of messages occurring due to loops: if a message m has counted some evidence not once, but q times, the message m^{1/q} should be used instead. This would in fact suggest that the filtering in (2.28) does not necessarily have to be convex, but this raises the question of stability of the log-domain filtering.

Developing a sound theoretical framework for achieving a high probability of convergence for GBP in loopy region graphs while retaining an acceptable complexity remains an open research area. However, the applications of such a framework seem to be numerous, as it would be applicable to e.g. general Turbo setups and LDPC codes. In [Yui02], a guaranteed convergent alternative to GBP is presented, but this also comes at the price of much slower convergence. In [LR04, LR05], a filtering scheme operating over the iterations in a Turbo setup is derived assuming that messages are Gaussian, and it is shown to provide improved performance. Interestingly, the derived filter is equivalent to the convex IIR filter in (2.28) and, based on the Gaussian assumption, an analytical expression for w_1 is further provided. Other evidence that such loop-correction schemes may help convergence and hence performance is given in [CC06], where loop-correction is applied to the BP decoding of LDPC codes. Generalizing such ideas to general region graphs and designing methods capable of adaptively compensating for loops while retaining an acceptable complexity seems to be an interesting topic for future research.

^{6} A convex combination is a weighting of terms where all weights are positive and sum to one.

2.4 Disjoint Detection and Decoding: The Turbo Principle

Determining the exact posterior of the information p(i | y) as shown in figure 2A is practically impossible. Even in the unrealistic scenario of a known system model and parameters, exact inference would be infeasible as exhaustive search is the only known method providing exact results in general. This section will therefore describe traditional disjoint detection and decoding performed under the assumption of known system model and parameters and how this can be generalized by the Turbo principle. A common building block of such schemes is the Forward/Backward Algorithm (FBA) used for efficient detection/decoding in systems with memory, and this particular algorithm is therefore briefly discussed based on the BP framework introduced earlier.

The idea of disjoint detection and decoding is to separate the two coupled operations by assuming that the other is non-existent, resulting in a structure as shown in figure 2E. First, the input y is fed into the detector, which produces either exactly or approximately the posterior q(c_π) based on the assumption that the coded and interleaved bits c_π are independent a priori. Next, the decoder operates on a deinterleaved version of the posterior called q(c), but in order for the decoder to be simple the input must be independent, i.e. the posterior must factorize as shown in the figure. Therefore, only the marginal posterior is used as this minimizes the KL-divergence^{7} under the constraint of full factorization. Based on these marginals, the decoder determines the approximate posterior distribution of the information bits q(i).

^{7} D_KL(q||p) ≜ ∑_x q(x) ln [q(x)/p(x)]

The problem with this disjoint scheme is that the detector does not utilize knowledge about the code and the decoder does not use knowledge of the channel. Hence, not all the available structure in the system is taken advantage of, and this situation is what the Turbo principle improves upon [BGT93, KST04]. Ideally, the detection and decoding should be performed jointly, but due to complexity constraints this is infeasible and the basic idea of the Turbo principle is then to iterate between the two disjoint components as illustrated by figure 2F.

For this to be possible, the detector should be able to accept prior information about the coded bits generated by the decoder. Typically, the decoder can directly produce the desired output and most detectors can be modified to accept priors without too much extra complexity. Instead of propagating the actual bit probabilities between the two components, so-called Log-Likelihood Ratios (LLRs) can be more convenient. The LLR λ_i of a bit c_i is defined as

λ_i = ln [q(c_i = 0) / q(c_i = 1)]

and is therefore just another way of parameterizing the distribution of a bit. To indicate that a fully factorized distribution q(c) = ∏_i q(c_i) is represented using LLRs, the notation q_λ(c) is used. In figure 2F, it is also shown how the prior input q_{λ,pr}(c) to either of the components is subtracted (in the LLR domain) from the posterior output q_{λ,po}(c), thereby generating the so-called extrinsic information q_{λ,ex}(c). This extrinsic information represents the additional information about the coded bits gained by exploiting the structure in the signal at that point, i.e. the channel structure in the detector and the code structure in the decoder. From a graph point of view, the Turbo principle can be seen to be a Bethe approximation [MMC98] and performing BP on this graph will then be equivalent to the structure in figure 2F. The fact that it is the extrinsic information that should be propagated comes directly from the BP updates in (2.17): messages going in the opposite direction of a message being updated should not be included in the update. The Turbo framework takes this into account by dividing out the previous information (subtracting the component prior in the LLR domain) and thereby forming the component extrinsic information. Due to the uniform prior used by the detector in the first iteration, resulting in q_{λ,pr}(c_π) = 0, stopping the Turbo iterations after the first decoding will result in the traditional disjoint result. The Turbo principle can therefore be seen as a generalization of the traditional disjoint detection and decoding in figure 2E.
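The extrinsic-information bookkeeping can be sketched in a few lines. The bit probabilities below are illustrative assumptions, and the sign convention λ = ln q(c=0)/q(c=1) is one common choice; a uniform prior maps to an LLR of zero, so in the first iteration the extrinsic output equals the posterior.

```python
import math

# Extrinsic-information bookkeeping in the LLR domain: the component's
# prior LLR is subtracted from its posterior LLR, leaving only the
# information gained inside the component.
def llr(p0, p1):
    return math.log(p0 / p1)    # LLR of a bit, lambda = ln q(c=0)/q(c=1)

prior_llr = llr(0.5, 0.5)       # uniform prior -> 0 (first iteration)
posterior_llr = llr(0.8, 0.2)   # component posterior for one bit (illustrative)
extrinsic_llr = posterior_llr - prior_llr

# Mapping an LLR back to a bit probability: q(c=0) = e^l / (1 + e^l).
q0 = math.exp(extrinsic_llr) / (1 + math.exp(extrinsic_llr))
print(extrinsic_llr, q0)
```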

An important building block for detection and decoding in systems with memory is the FBA [BGJR74], which is simply a special case of the BP algorithm. For disjoint detection and decoding, the algorithm is optimal in the sense that it can

determine any desired posterior exactly in an efficient manner by exploiting the Markov structure of a multipath channel or a convolutional code. To illustrate the algorithm, a factor graph for the detection problem can be constructed as shown in figure 2G, but a factor graph for decoding of convolutional codes will have the same structure. It should be noted that the factor graph is loop-free, meaning that using BP on this graph will be exact. In the graph, X_n is a variable sufficient to describe the state of the system at time n and, assuming the channel has a temporal length of L, a total of L symbols is therefore required. The most efficient way of distributing the evidence in this graph is to start at any one point and propagate messages to the ends and back again. This is exactly what the FBA does by defining a forward variable α_n holding information from observations going from left to right, i.e. {y_1, …, y_n}, and a backward variable β_n going in the opposite direction holding information from observations {y_{n+1}, …, y_N}. In the framework of the BP algorithm, the message leaving variable node n and going to the factor node to the right of it would be α_n under the assumption that all nodes to the left of X_n have been updated in a sequential manner. Similarly, the message going in the opposite direction at the same place would be β_n. Due to the exclusion of messages going in the opposite direction in (2.17), the forward and backward variables will not interact and can therefore be computed separately. The complexity per symbol scales with the set-size of X_n and is therefore O(N_r 2^{K'N_t LQ}), where K'N_t is the effective number of independent users/streams included in the discrete Markov model and Q is the number of bits per symbol.
Although this algorithm exploits the Markov structure of the system, the complexity of this detector often makes its implementation infeasible and approximations must be used instead, as will be discussed in chapter 3. For decoding of binary convolutional codes, the complexity per information bit scales as O(r^{-1} 2^{N_c}), where r is the rate of the code and N_c is the constraint length. Typical real-life values for r and N_c lead to a complexity which is usually implementable and no approximations are therefore required. If the target is not to minimize the information BER but the FER, the Viterbi algorithm [Vit67] should be used instead, which again can be seen as a special case of the FBA. Some cellular systems of interest to this thesis use Turbo codes [BGT93] instead of convolutional codes, but these codes are constructed from convolutional codes and the FBA is therefore also used as a component in their decoding. Systems may also employ block codes like LDPC, Reed-Solomon or CRC codes, or combinations of all these coding schemes. Such component codes may also employ the BP algorithm for decoding [MN97, EKM06], but this is outside the scope of this thesis.
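The forward/backward recursions can be sketched on a toy two-state Markov chain with per-step likelihoods; the transition matrix, prior and likelihood values are all illustrative assumptions, not parameters from the text. The per-symbol posteriors α_n β_n (after normalization) agree with exhaustive enumeration over all state sequences, mirroring the fact that the FBA is exact BP on the loop-free chain graph.

```python
import itertools

# Forward/backward algorithm (FBA) on a 2-state Markov chain.
T = [[0.9, 0.1], [0.2, 0.8]]                 # T[i][j] = p(x_{n+1}=j | x_n=i)
prior = [0.5, 0.5]                           # p(x_0)
lik = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]   # lik[n][s] = p(y_n | x_n=s)
N = len(lik)

# Forward recursion: alpha[n][s] ∝ p(x_n=s, y_1..y_n).
alpha = [[prior[s] * lik[0][s] for s in (0, 1)]]
for n in range(1, N):
    alpha.append([sum(alpha[-1][i] * T[i][s] for i in (0, 1)) * lik[n][s]
                  for s in (0, 1)])

# Backward recursion: beta[n][s] ∝ p(y_{n+1}..y_N | x_n=s).
beta = [[1.0, 1.0]]
for n in range(N - 1, 0, -1):
    beta.insert(0, [sum(T[s][j] * lik[n][j] * beta[0][j] for j in (0, 1))
                    for s in (0, 1)])

# Posterior state marginals from the product alpha * beta.
post = []
for n in range(N):
    g = [alpha[n][s] * beta[n][s] for s in (0, 1)]
    post.append([v / sum(g) for v in g])

# Brute-force check over all 2^N state sequences.
brute = [[0.0, 0.0] for _ in range(N)]
for seq in itertools.product((0, 1), repeat=N):
    p = prior[seq[0]] * lik[0][seq[0]]
    for n in range(1, N):
        p *= T[seq[n - 1]][seq[n]] * lik[n][seq[n]]
    for n in range(N):
        brute[n][seq[n]] += p
brute = [[v / sum(row) for v in row] for row in brute]
print(post[0], brute[0])
```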

Although only described here for iterative exchange of information in a disjoint detector and decoder setup commonly known as Turbo equalization, the Turbo principle is a general way of separating various components from each other such that inference in each component becomes manageable. For example, if the information bits originated from a source encoder, e.g. a voice codec, the Turbo principle could be extended to iterate not only between the detector and the decoder, but also include the source decoder in the iterations.

2.5 Summary

This chapter has introduced the signal model that is used throughout this thesis. The model is quite general in the sense that it supports most of the features of interest in today's coded multiuser MIMO systems operating in a multipath environment. For simplicity, the channel model is assumed to be block-fading, but all methods presented throughout this thesis naturally generalize to a time-varying Gaussian state-space channel model, if desired. Furthermore, the fundamental channel capacity along with the associated rate-diversity tradeoff has been introduced, including an IQ-split system model capable of capturing any non-circular properties of the signals.

Next, the concept of representing the system by a factor graph is introduced along with the sum-product algorithm for computing marginals on such graphs. As a generalization of this idea, the region graph is introduced along with a GBP algorithm for computing region marginals and methods for constructing various region graph approximations, namely the Bethe and Kikuchi approximations. In addition, a heuristic method for helping GBP converge in otherwise non-convergent loopy region graphs is presented, but the drawback of this method is slower convergence.

To see how this can be used for disjoint detection and decoding, the now well-known Turbo principle is outlined. The exchange of marginals, called extrinsic information in the Turbo framework, is a direct result of the underlying Bethe approximation leading to a manageable complexity. Having established these methods for detection and decoding, the stage is now set for improving the individual components of the Turbo scheme, but also for going beyond the Turbo framework using more advanced graph approximations.

3.5 Approximate Joint Detection and Decoding using GBP

Until now, this chapter has dealt only with detectors that can be employed in a Turbo-based receiver and has not considered any other approximations to the problem of joint detection and decoding. However, based on the region-based free-energy approximations described in section 2.3.2, this section will present methods for performing approximate joint detection and decoding of convolutionally coded signals over multipath channels, i.e. the system model shown in figure 2A. As the well-known concept of Turbo equalization for such a problem is equivalent to the Bethe approximation, the basic idea is to use a more advanced graph approximation to hopefully provide better performance without incurring the exponential complexity of an exhaustive search. A similar approach was taken in [PA06] for joint detection and decoding, but instead of convolutional codes, LDPC coding was considered.

3.5.1 The Modified Cluster Variation Method

The basic concept of this method is based on the cluster variation method for constructing a Kikuchi approximation as described in section 2.3.5. An example of a region graph generated by this method for the system of interest is shown in figure 3A. Here, the top-level regions in R_{0} each contain one observation and the corresponding set of information bits required for conditional independence of the observation, e.g. y_{i} is given by the set of information bits I_{O,i}. Naturally, the sets of information bits associated with each of the observations depend on the rate of the code, the interleaver and the length of the channel, but given these parameters the graph is deterministic and can be predetermined. The next levels of regions are then found by the cluster variation method until no regions intersect in the final R_{K} layer. The counting numbers for regions in R_{0} are all set to one and the remaining counting numbers are found by (2.27), guaranteeing a valid region graph. Due to the Markov structure in both the convolutional code and the channel, the set-size of regions in the top level is no larger than N_{c}LQ, i.e. |I_{O,j}| ≤ N_{c}LQ for all j, where N_{c} is the constraint length of the code, L is the length of the channel in symbols and Q is the number of bits per symbol. Furthermore, as a result of the cluster variation method using intersections to form regions, we have

|I_{i,j}| ≤ N_{c}LQ − i ,  ∀j        (3.19)

The maximum number of levels in the region graph is therefore N_{c}LQ, i.e. K ≤ N_{c}LQ − 1. Only counting the top-level regions, this method will therefore have a complexity in the order of O(r^{−1}2^{N_{c}LQ}) per information bit per iteration, whereas the complexity of a Turbo equalization iteration would be O(r^{−1}2^{N_{c}} + 2^{LQ}). Due to the many connections between layers, the region graph will generally be far from loop-free and the computed beliefs are therefore only approximations. However, by merging top-level regions this approximate joint detection and decoding scheme can scale all the way to exact inference, but as the complexity of the GBP algorithm scales exponentially in the region size, doing so results in a greatly increased complexity, culminating in an exhaustive search when all top-level regions are merged into one.
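The counting-number rule referred to as (2.27) is not reproduced in this excerpt; the sketch below assumes the standard region-graph rule, in which the counting number of a region is one minus the sum of the counting numbers of all regions strictly containing it:

```python
# Counting numbers for a region graph: c_R = 1 - sum of c_A over all
# ancestors A of R (regions that strictly contain R). This is the standard
# region-graph rule, assumed here to match (2.27) in the text.
def counting_numbers(regions):
    # regions: list of distinct frozensets of variable indices.
    c = {}
    # Process larger regions first so every ancestor is ready when needed.
    for r in sorted(regions, key=len, reverse=True):
        ancestors = [a for a in regions if r < a]
        c[r] = 1 - sum(c[a] for a in ancestors)
    return c

# Toy example: two top-level regions and their intersection.
R = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({2, 3})]
c = counting_numbers(R)
print(c[frozenset({2, 3})])  # -> -1, so each shared variable is counted once
```

With these counting numbers, every variable's total count over the regions containing it is one, which is the validity condition the text refers to.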

As mentioned in section 2.3.3, the beliefs should be constrained in the sense that marginals should be consistent no matter what region they are derived from. For this to be possible, the region graph must allow regions containing the same variables to communicate their intersection, i.e. they must have at least an indirect connection with the intersection being a subset. Consider the example in figure 3B where the regions in R_{1} and R_{2} are found according to the cluster variation method. However, the right-most top-level region will not be connected at all by this method, as its intersection with the other top-level regions is a sub-region of a region in R_{1}, and the marginal of i_{1} will therefore not be consistent. The cluster variation method is therefore modified to tackle this problem by connecting any unconnected variables to regions at lower levels, or by creating such regions as necessary, so that communication can take place. We will call this the modified cluster variation method and, for the example, using this modified method results in the creation of the dotted connection. It seems obvious that the unconnected region should be connected in this simple example, but in the general case one should realize that any unconnected variable should be connected to other regions involving the same variable, either directly or indirectly. When using this modified cluster variation method, the counting numbers are found as usual and, in the example, the counting number associated with each region is shown next to that region.

An important observation from figure 3B is that the region graph has a loop and that the resulting marginals are therefore not exact. For this simple example, it is possible to simply merge the regions in R_{1}, resulting in a loop-free region graph, but this is not a general solution as it leads to an exponentially increasing complexity. Generally, region graphs constructed by this method will have many loops, as can be seen from figure 3A. Unlike the graph approximation underlying the Turbo principle, the resulting loop length of this method does not increase with system size, as increasing the system size simply produces a wider graph in figure 3A. Hence, increasing the system size will not make it more probable that GBP converges, and the heuristic loop-correction proposed in section 2.3.6 must instead be relied upon to make the GBP algorithm convergent. Unfortunately, this results in slower convergence, which is only practical up to a certain point, and it therefore seems that this method is not a generally viable solution.
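Equation (2.28) is likewise not reproduced here; the loop-correction referred to is assumed below to be the usual convex (IIR-style) damping of messages, where each new message is a convex combination of the fresh update and the previous message:

```python
import numpy as np

# Convex IIR damping of messages, as assumed from (2.28): the emitted message
# is m_t = w * m_new + (1 - w) * m_(t-1). A small weight w slows convergence
# but suppresses the oscillations caused by short loops.
def damp(m_prev, m_new, w=0.1):
    m = w * m_new + (1.0 - w) * m_prev
    return m / m.sum()  # keep the message normalized

m_prev = np.array([0.5, 0.5])
m_new = np.array([0.9, 0.1])      # a large, potentially oscillating update
print(damp(m_prev, m_new))        # -> [0.54 0.46], a much gentler step
```

This makes the trade-off in the text concrete: the smaller w is, the more likely GBP is to converge on a loopy region graph, and the more iterations it takes to get there.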

However, to show that the method does in fact work when employing sufficient loop-correction, a simple system using a rate r = 1/2 convolutional code with generator polynomial g(D) = [1, 1 + D] is considered, resulting in a constraint length of N_{c} = 2. The system transmits blocks consisting of N_{i} = 12 information bits and random interleaving is employed. This is then mapped onto BPSK symbols and transmitted over a multipath channel of length L = 2 given by h = [1, 1/√2]^{T}. For this system, loop-correction using the convex IIR filtering in (2.28) with w_{1} = 0.1 seems to provide convergence with probability one. In figure 3C, the LLRs computed by the GBP algorithm on the region graph found by the modified cluster variation method are plotted against the exact values found by exhaustive search for 500 blocks. Also shown for comparison is the result found by traditional Turbo equalization for the same realizations of interleavers and observations, and it can be seen that GBP generally improves the quality of the marginals compared to that achieved by Turbo equalization. Although not shown here, the LLRs derived from GBP generally also result in a BER closer to that of the exact result, but marginals being closer to the exact result is in itself a desired quality, e.g. for parameter estimation.
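The toy transmit chain described above can be sketched as follows; the second channel tap value 1/√2 is my reading of the garbled source, and the noiseless channel output is shown for brevity:

```python
import numpy as np

# Sketch of the toy transmit chain from the text: rate-1/2 convolutional code
# g(D) = [1, 1 + D] (constraint length N_c = 2), random interleaving, BPSK,
# and a two-tap channel. The tap value 1/sqrt(2) is an assumed reading.
rng = np.random.default_rng(1)
info = rng.integers(0, 2, 12)                 # N_i = 12 information bits

# Encode: first output stream is the systematic bit b_k,
# second is b_k xor b_{k-1} (the 1 + D branch).
prev = 0
coded = []
for b in info:
    coded += [b, b ^ prev]
    prev = b
coded = np.array(coded)

pi = rng.permutation(len(coded))              # random interleaver
x = 1.0 - 2.0 * coded[pi]                     # BPSK: 0 -> +1, 1 -> -1

h = np.array([1.0, 1.0 / np.sqrt(2.0)])      # L = 2 channel taps
y = np.convolve(x, h)                         # noiseless received signal
print(len(coded), len(y))                     # -> 24 25
```

Each observation y_k depends on two consecutive BPSK symbols, and through the interleaver on up to N_{c}LQ = 4 information bits, which is exactly the top-level region size discussed above.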

It would be interesting to try out this method for larger systems having higher constraint and channel lengths, but as the complexity scales as O(r^{−1}2^{N_{c}LQ}) this is only feasible for small values of N_{c}LQ. Perhaps the biggest obstacle to such an approach, however, is the slow convergence resulting from the loop-correction required to guarantee convergence. Nevertheless, using the cluster variation method to form the region graph appears to yield good results for some special detection problems [SWS04].

3.5.2 The Generalized Turbo Principle

Inspired by the failure of the modified cluster variation method due to an excessive number of short loops in the region graph, a natural way of avoiding such loops is to take a closer look at the Turbo principle. Focusing on Turbo equalization, the structure of the underlying graph approximation can be illustrated as shown in figure 3D. Here, the lower and upper Markov chains represent the channel and the convolutional code, respectively, with the crossing connections representing the interleaved exchange of extrinsic information. An important property of this structure is that for random interleaving, the probability of short loops decreases with the system size, i.e. the interleaver length [XES01]. This is a general property of the Turbo principle and is one of the main reasons why this framework has been so successful. However, for smaller systems the over-counting resulting from loop-feedback can result in inferior performance and non-convergence.

In fact, under the assumption of random interleaving, the graph structure asymptotically approaches a tree [AV01, XES01] and the Turbo principle is therefore asymptotically optimal. For finite systems, loops of finite length will exist, and it is for such systems that we will try to improve upon the Turbo principle.

Unlike the modified cluster variation method, where the notion of Markov chains is lost, we will now explicitly preserve the two Markov chains, i.e. as given by the channel and the convolutional code. In fact, the structure in figure 3D can be viewed as two Markov chains which intersect each other in a manner determined by the interleaver. The Turbo principle then lets these two Markov chains exchange single-variable beliefs in the form of extrinsic information at the intersection points. The general idea is now to modify the exchange of information so that not only single-variable beliefs are exchanged, but entire region beliefs. This idea of exchanging region beliefs between components, in this example between Markov chains, readily generalizes to any scheme where the Turbo principle can be employed and will therefore be called the generalized Turbo principle. Random interleaving in such a system will scale the loop length with the system size in the same manner as for the ordinary Turbo principle, thus providing the same desirable features leading to a high probability of convergence. In the framework of region graphs, the graph approximation underlying such an approach can be seen to be a junction graph, as the exchange of information is no longer accomplished by single-variable beliefs, but by multi-variable beliefs [YFW05]. In essence, one thereby exchanges not single-variable beliefs (or scalar-valued extrinsic information) between components, but multi-variable beliefs (or vector-valued extrinsic information) over the defined regions.
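The following toy example (my own numbers) illustrates why exchanging a region belief can matter: a joint belief over two bits carries their correlation, which is lost when only the single-variable marginals are exchanged as in the traditional Turbo principle:

```python
import numpy as np

# A region belief b(b0, b1) over two bits with strong correlation: the bits
# are very likely equal. The numbers are illustrative, not from the text.
b_region = np.array([[0.45, 0.05],
                     [0.05, 0.45]])

# Traditional Turbo principle: exchange only the scalar marginals.
b0 = b_region.sum(axis=1)                     # belief on bit 0
b1 = b_region.sum(axis=0)                     # belief on bit 1
product = np.outer(b0, b1)                    # what the other side can rebuild

print(b0, b1)                                 # -> [0.5 0.5] [0.5 0.5]
print(np.allclose(product, b_region))         # -> False: correlation is lost
```

Here the scalar marginals are completely uninformative even though the region belief is highly informative; exchanging the full 4-valued (vector-valued) belief preserves the coupling between the two Markov chains.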

For the Turbo equalization system considered, such a method should be able to capture more of the dependency between the two Markov chains. To accomplish this, regions should ideally capture the full intersection between the Markov chains as represented by a maximum of N_{c}LQ information bits, leading to a complexity of O(r^{−1}2^{N_{c}LQ}) instead of O(r^{−1}2^{N_{c}} + 2^{LQ}) for ordinary Turbo equalization. However, an interesting option is to capture only the strongest couplings between the Markov chains, thus establishing a framework in which one can trade off performance for lower complexity. In such a framework, constraining the exchange of beliefs to single-variable beliefs recovers the traditional Turbo principle.

Setting up the region graph starts out as in the cluster variation method, i.e. defining the regions in R_{0} to be the observations and associated information bits. Here it should be noted that, as in the cluster variation method, the number of information bits required in each region in R_{0} is a maximum of N_{c}LQ bits. The next level R_{1} will be defined so as to handle the channel interactions, as can be accomplished by choosing regions to be the intersections between any two neighboring regions in R_{0}, i.e. in the time-domain Markov chain. Similarly, regions in R_{−1} represent the convolutional code Markov chain and are also found as intersections between any two neighboring regions in R_{0}. However, neighboring regions should here be seen from the convolutional code Markov chain point of view, and determining which regions are neighbors of a given region is therefore uniquely given by the code rate and the interleaver. An illustration of this method of constructing the region graph is shown in figure 3E. Here, the definition of regions in R_{−1} is not explicitly shown, due to these regions being determined by intersections of regions in R_{0} as given by the interleaver and code rate, making an illustration of this difficult. In addition, care must be taken not to include a given coupling in both R_{1} and R_{−1}, as this would result in a direct feedback effect in the graph, thereby guaranteeing that the GBP algorithm will be non-convergent. Furthermore, as in the modified cluster variation method, marginals should be made consistent by requiring communication between regions containing the same variable. If required, this can be achieved by adding appropriate regions in e.g. R_{−1}, and the region graph will therefore be given by the set of regions R = R_{0} ∪ R_{1} ∪ R_{−1} with counting numbers given by (2.27).
Using the notation in [YFW05], the resulting region graph is a junction graph with large regions R_{L} = R_{0} and small regions R_{S} = R_{1} ∪ R_{−1}. The counting numbers of regions in R_{L} must be one due to the fact that every region contains at least one factor node. The counting numbers of regions in R_{S} are given by c_{R} = 1 − d_{R}, where d_{R} is the number of neighboring large regions, which for this type of junction graph is always 2, i.e. c_{R} = −1 for R ∈ R_{S}.
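One construction step described above, forming a level of small regions as the intersections of time-adjacent top-level regions, can be sketched with hypothetical variable sets as:

```python
# Sketch of one construction step for the generalized Turbo region graph:
# the channel-side level R_1 is formed by intersecting each pair of
# time-adjacent top-level regions in R_0. The variable sets are toy values,
# not taken from the text; the code-side level R_-1 would be built the same
# way but with neighbors determined by the interleaver and code rate.
R0 = [frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({2, 3, 4})]
R1 = [a & b for a, b in zip(R0, R0[1:])]
print([sorted(r) for r in R1])                # -> [[1, 2], [2, 3]]
```

Each resulting small region has exactly the two adjacent large regions as neighbors, which is why its counting number is 1 − 2 = −1 as stated above.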

As for the modified cluster variation method, significant book-keeping is required to construct the appropriate region graph for a given system, but as real-life systems are likely to use fixed coding schemes, the graph structure can be predetermined and put into a look-up table. In figure 3F, the LLRs achieved for the same setup as in the modified cluster variation method are shown, including an approximate distribution of the LLR errors from the GBP algorithm and the traditional Turbo principle. From this it can be seen that the generalized Turbo method provides results similar to those achieved with the modified cluster variation method, but due to the longer loops in the region graph much less relaxation using the convex IIR method is required for convergence. In fact, convergence becomes less and less of a problem as the system size grows, as is also the case for the traditional Turbo principle. The generalized Turbo principle was conceived very late in the research study and, as a result, only the simulation in figure 3F is included here. However, the author firmly believes in the generality of this principle and hopes that this framework will provide a platform for future research.

3.6 Summary

This chapter has outlined several methods for approximate detection and decoding. First, the general form of linear detection is outlined, as the full posterior provided by this may be of interest, e.g. for parameter estimation. Next, a practical method for performing whitening of noise/interference is described, providing flexibility and robustness to the remaining discrete signal set. Furthermore, the sphere detection and decoding framework is presented and a previously unknown connection between the QL factorization of the channel matrix and minimum-phase prefiltering is introduced. This establishes an unrecognized coupling between sphere detection and traditional RSSE, where sphere detection can be seen as a dynamic variant of RSSE with decisions taken only when a specified level of certainty has been achieved. Finally, the concept of using GBP on region graphs for approximate joint detection and decoding in systems with convolutional codes has been introduced. A modified cluster variation method is presented for this, but an excessive number of short loops in the graph makes convergence troublesome. As a result, the Turbo principle's method of generating long loops in the underlying graph is reused in a region graph setting. This gives rise to a generalized Turbo principle where region beliefs are exchanged between components instead of single-variable beliefs, as is the case for the traditional Turbo principle. Due to time constraints it has not been possible to fully investigate the proposed method by simulations before the thesis deadline, but a simple simulation is provided to indicate the improved performance offered by such an approach.

The exemplary embodiments of the invention generalize the Turbo principle by exchanging vector-valued information or region beliefs between components. This has the advantage of providing a better result, for example, in the form of a lower Bit Error Rate (BER) and better quality of the end-result marginals. In addition, the new framework allows a greater design flexibility than with the traditional Turbo principle. For example, new approximations may be constructed which are in between traditional Turbo and full exhaustive search. While such systems may incur increased complexity, the amount of such an increase largely depends on the approximation made, that is, the underlying graph structure.

While the exemplary embodiments of the invention are particularly suitable for use in a receiver, further exemplary embodiments are also suitable for use in a transmitter wherein the transmitted signal is constructed such that a (generalized) Turbo principle may be utilized by a receiver that receives the signal.

Figure 4 depicts a flowchart illustrating one non-limiting example of a method for practicing the exemplary embodiments of this invention. The method provides improved signal quality for a transmitted signal. In box 401, a signal is encoded such that a generalized Turbo principle, that is a feature of the exemplary embodiments of this invention, can be applied to the decoding of the signal. In box 402, the encoded signal is transmitted to a receiver. In box 403, the received encoded signal is decoded using the generalized Turbo principle.

In accordance with aspects of the exemplary embodiments of the invention, apparatus, methods and computer program products are provided that utilize a generalized Turbo principle by exchanging region beliefs. In addition, the generalized Turbo principle may be employed in conjunction with Turbo equalization for separate detection and decoding of signals and the encoding or decoding of Turbo-like codes or Turbo-encoded signals. Note that the exchanged region beliefs may be viewed as vector-valued extrinsic information.

In accordance with aspects of the exemplary embodiments of the invention, apparatus, methods and computer program products are provided that utilize a generalized Turbo principle in accordance with a graph where variables are grouped into regions and the beliefs of the regions are exchanged. In further embodiments, Generalized Belief Propagation (GBP) is performed on the region graph.

In accordance with aspects of the exemplary embodiments of the invention, apparatuses, methods and computer program products are provided that afford an improved signal quality (e.g., BER) by: turbo-encoding a signal in accordance with a generalized Turbo principle; transmitting the signal to a receiver; and decoding the turbo-encoded signal.

In accordance with aspects of the exemplary embodiments of the invention, apparatus, methods and computer program products are provided that utilize a generalized Turbo principle in conjunction with the traditional Turbo principle to afford an improved signal quality (e.g., BER).

It is further noted that exemplary embodiments of the invention may also be utilized in conjunction with other applications, such as magnetic storage, for example. An exemplary magnetic storage application would be implemented very similarly to the Turbo equalization for communication systems. In this application, there is an error correcting code and a dispersive channel, and iterations are performed between a detector for the channel and a decoder for the code.

Additional possible uses of the invention include demodulation of bit-interleaved coded modulation and multi-user and/or multi-stream detection in coded systems. Furthermore, the principle may also be employed in the estimation of various parameters of the system, such as channel coefficients, noise covariances, timing and frequency offsets, and for initial synchronization, as non-limiting examples. Furthermore, the generalized Turbo principle may also be employed for joint source-channel decoding, where the structure imposed by a source encoder is included in the decoding process. Examples of such source encoders include speech, audio and video encoders, as non-limiting examples. In all of the above cases, the generalized Turbo principle may provide improved performance over that of the traditional Turbo principle by improving the quality of the marginals of the sets specified by the regions in the graph.

Below are provided further descriptions of non-limiting, exemplary embodiments. The below-described exemplary embodiments are separately numbered for clarity and identification. This numbering should not be construed as wholly separating the below descriptions since various aspects of one or more exemplary embodiments may be practiced in conjunction with one or more other aspects or exemplary embodiments.

(1) In another exemplary embodiment, and as shown in Figure 5, a method comprising: receiving an encoded signal (501); and decoding the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components (502).

A method as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multi-user system and other components for which a traditional Turbo principle may be employed. A method as in any above, wherein the components may be represented by Markov random fields. A method as in any above, wherein the components comprise at least two Markov chains. A method as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. A method as in any above, wherein at least two Markov chains correspond to a Turbo-like code. A method as in any above, wherein the method is utilized by a wireless communication system. A method as in any above, wherein the method is implemented by a computer program.

(2) In another exemplary embodiment, a program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, said operations comprising: receiving an encoded signal (501); and decoding the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components (502).

A program storage device as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multiuser system and other components for which a traditional Turbo principle may be employed. A program storage device as in any above, wherein the components may be represented by Markov random fields. A program storage device as in any above, wherein the components comprise at least two Markov chains. A program storage device as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. A program storage device as in any above, wherein at least two Markov chains correspond to a Turbo-like code. A program storage device as in any above, wherein the program storage device comprises an element of a wireless communication system.

(3) In another exemplary embodiment, an apparatus (14) comprising: means for receiving (22) an encoded signal; and means for decoding (44) the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components.

An apparatus as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multi-user system and other components for which a traditional Turbo principle may be employed. An apparatus as in any above, wherein the components may be represented by Markov random fields. An apparatus as in any above, wherein the components comprise at least two Markov chains. An apparatus as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. An apparatus as in any above, wherein at least two Markov chains correspond to a Turbo-like code.

An apparatus as in any above, wherein the apparatus comprises an element of a wireless communication system (12). An apparatus as in any above, wherein the means for receiving comprises a receiver and the means for decoding comprises a decoder. An apparatus as in any above, further comprising: means for encoding (38) a signal utilizing the generalized Turbo principle, and means for transmitting (22) the encoded signal. An apparatus as in any above, wherein the means for encoding comprises an encoder and the means for transmitting comprises a transmitter. An apparatus as in any above, further comprising means for processing (18). An apparatus as in any above, further comprising a data processor (18). An apparatus as in any above, further comprising means for storing (20) a computer program (24). An apparatus as in any above, further comprising: a memory (20) configured to store a computer program (24). An apparatus as in any above, wherein the apparatus comprises a terminal, a mobile node, a mobile phone, a cellular phone, a base station or a relay node.

(4) In another exemplary embodiment, an apparatus (14) comprising: a receiver (22) configured to receive an encoded signal; and a decoder (44) configured to decode the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components.

An apparatus as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multi-user system and other components for which a traditional Turbo principle may be employed. An apparatus as in any above, wherein the components may be represented by Markov random fields. An apparatus as in any above, wherein the components comprise at least two Markov chains. An apparatus as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. An apparatus as in any above, wherein at least two Markov chains correspond to a Turbo-like code.

An apparatus as in any above, wherein the apparatus comprises an element of a wireless communication system (12). An apparatus as in any above, wherein the apparatus comprises a terminal, a mobile node, a mobile phone, a cellular phone, a base station or a relay node. An apparatus as in any above, further comprising: an encoder (38) configured to encode a signal utilizing the generalized Turbo principle, and a transmitter (22) configured to transmit the encoded signal. An apparatus as in any above, further comprising a data processor (18). An apparatus as in any above, further comprising: a memory (20) configured to store a computer program (24).

(5) In another exemplary embodiment, and as shown in Figure 6, a method for generating a graph structure for a system having a plurality of components, comprising: defining a set of primary regions, each primary region comprising one observation and a corresponding set of variables required for conditional independence of the observation; and defining a set of secondary regions describing intersections between two primary regions, wherein said intersections are in accordance with a component structure of the plurality of components, wherein each component specifies a subset of the set of secondary regions, wherein the set of primary regions and the set of secondary regions comprise the generated graph structure.

A method as above, wherein the generated graph structure is representative of at least two intersecting Markov chains, wherein the set of secondary regions describes intersections between primary regions in accordance with the at least two Markov chains, wherein each Markov chain specifies a subset of the set of secondary regions. A method as in any above, further comprising: optimizing the generated graph structure by performing at least one of direct loop removal, merging and enlarging regions in order to minimize loop feedback while balancing an increase in complexity with at least one performance requirement, wherein said optimizing utilizes said intersections or at least one subset of said intersections. A method as in any above, further comprising: performing generalized belief propagation on the generated graph structure, wherein an exchange of information is handled by an exchange of region beliefs. A method as in any above, further comprising: utilizing the generated graph structure to decode a signal. A method as in any above, wherein the method is utilized by a wireless communication system. A method as in any above, wherein the method is implemented by a computer program. A method as in any above, wherein the set of variables comprises variables corresponding to at least one of: symbols, coded bits and information bits.

(6) In another exemplary aspect of the invention, a program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations for generating a graph structure for a system having a plurality of components, said operations comprising: defining a set of primary regions, each primary region comprising one observation and a corresponding set of variables required for conditional independence of the observation; and defining a set of secondary regions describing intersections between two primary regions, wherein said intersections are in accordance with a component structure of the plurality of components, wherein each component specifies a subset of the set of secondary regions, wherein the set of primary regions and the set of secondary regions comprise the generated graph structure.

A program storage device as above, wherein the generated graph structure is representative of at least two intersecting Markov chains, wherein the set of secondary regions describes intersections between primary regions in accordance with the at least two Markov chains, wherein each Markov chain specifies a subset of the set of secondary regions. A program storage device as in any above, said operations further comprising: optimizing the generated graph structure by performing at least one of direct loop removal, merging and enlarging regions in order to minimize loop feedback while balancing an increase in complexity with at least one performance requirement, wherein said optimizing utilizes said intersections or at least one subset of said intersections. A program storage device as in any above, the operations further comprising: performing generalized belief propagation on the generated graph structure, wherein an exchange of information is handled by an exchange of region beliefs. A program storage device as in any above, the operations further comprising: utilizing the generated graph structure to decode a signal. A program storage device as in any above, wherein the program storage device comprises an element of a wireless communication system. A program storage device as in any above, wherein the set of variables comprises variables corresponding to at least one of: symbols, coded bits and information bits.

(7) In another exemplary embodiment of the invention, an apparatus comprising: means for defining a set of primary regions, each primary region comprising one observation and a corresponding set of variables required for conditional independence of the observation; and means for defining a set of secondary regions describing intersections between two primary regions, wherein said intersections are in accordance with a component structure of a plurality of components of a system, wherein each component specifies a subset of the set of secondary regions, wherein the set of primary regions and the set of secondary regions comprise a generated graph structure.

An apparatus as above, wherein the generated graph structure is representative of at least two intersecting Markov chains, wherein the set of secondary regions describes intersections between primary regions in accordance with the at least two Markov chains, wherein each Markov chain specifies a subset of the set of secondary regions. An apparatus as in any above, further comprising: means for optimizing the generated graph structure by performing at least one of direct loop removal, merging and enlarging regions in order to minimize loop feedback while balancing an increase in complexity with at least one performance requirement, wherein said means for optimizing utilizes said intersections or at least one subset of said intersections. An apparatus as in any above, wherein the means for optimizing comprises a processor.

An apparatus as in any above, further comprising: means for performing generalized belief propagation on the generated graph structure, wherein an exchange of information is handled by an exchange of region beliefs. An apparatus as in any above, wherein the means for performing generalized belief propagation comprises a processor. An apparatus as in any above, further comprising: means for decoding a signal utilizing the generated graph structure. An apparatus as in any above, wherein the means for decoding comprises a decoder. An apparatus as in any above, wherein the apparatus comprises an element of a wireless communication system. An apparatus as in any above, further comprising: means for storing the generated graph structure. An apparatus as in any above, wherein the means for storing comprises a memory. An apparatus as in any above, wherein the apparatus comprises: a terminal, a user equipment, a mobile device, a mobile phone, a network equipment or a base station. An apparatus as in any above, wherein the means for defining a set of primary regions and the means for defining a set of secondary regions comprise a processor. An apparatus as in any above, wherein the set of variables comprises variables corresponding to at least one of: symbols, coded bits and information bits.

(8) In another exemplary embodiment of the invention, an apparatus comprising: a processor configured to define a set of primary regions and to define a set of secondary regions describing intersections between two primary regions, wherein the set of primary regions and the set of secondary regions comprise a generated graph structure; and a memory configured to store the generated graph structure, wherein each primary region comprises one observation and a corresponding set of variables required for conditional independence of the observation, wherein said intersections are in accordance with a component structure of a plurality of components of a system.

An apparatus as above, wherein the generated graph structure is representative of at least two intersecting Markov chains, wherein the set of secondary regions describes intersections between primary regions in accordance with the at least two Markov chains, wherein each Markov chain specifies a subset of the set of secondary regions. An apparatus as in any above, wherein the processor is further configured to optimize the generated graph structure by performing at least one of direct loop removal, merging and enlarging regions in order to minimize loop feedback while balancing an increase in complexity with at least one performance requirement, wherein said optimizing utilizes said intersections or at least one subset of said intersections.

An apparatus as in any above, wherein the processor is further configured to perform generalized belief propagation on the generated graph structure, wherein an exchange of information is handled by an exchange of region beliefs. An apparatus as in any above, further comprising: a decoder configured to decode a signal utilizing the generated graph structure. An apparatus as in any above, wherein the apparatus comprises an element of a wireless communication system. An apparatus as in any above, wherein the apparatus comprises: a terminal, a user equipment, a mobile device, a mobile phone, a network equipment or a base station. An apparatus as in any above, wherein the set of variables comprises variables corresponding to at least one of: symbols, coded bits and information bits.
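As a non-authoritative illustration of the region-definition steps recited in the embodiments above, the following sketch builds primary regions (one observation together with the variables required for its conditional independence) and secondary regions (the non-empty intersections between pairs of primary regions). All function and variable names are hypothetical and chosen for illustration only:

```python
def define_primary_regions(observations, dependencies):
    """One primary region per observation: the observation together with
    the set of variables it depends on (dependencies[i] = variable
    indices required for conditional independence of observation i)."""
    return [(obs, frozenset(dependencies[i]))
            for i, obs in enumerate(observations)]

def define_secondary_regions(primary_regions):
    """Secondary regions describe non-empty intersections between two
    primary regions, in accordance with the component structure."""
    secondary = set()
    for i in range(len(primary_regions)):
        for j in range(i + 1, len(primary_regions)):
            shared = primary_regions[i][1] & primary_regions[j][1]
            if shared:
                secondary.add(shared)
    return secondary

# Toy example: three observations over variables {0, 1, 2, 3} with
# chain-like overlap, as in a single Markov chain.
primary = define_primary_regions(["y0", "y1", "y2"],
                                 [{0, 1}, {1, 2}, {2, 3}])
secondary = define_secondary_regions(primary)
# secondary == {frozenset({1}), frozenset({2})}
```

For two intersecting Markov chains, each chain would contribute its own subset of such secondary regions, and the primary and secondary regions together form the generated graph structure on which generalized belief propagation operates.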

(9) In another exemplary embodiment, and as shown in Figure 7, a method comprising: encoding a signal such that the encoded signal is configured to be decoded utilizing a generalized Turbo principle wherein region beliefs are exchanged between components (701); and transmitting the encoded signal (702).

A method as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multi-user system and other components for which a traditional Turbo principle may be employed. A method as in any above, wherein the components may be represented by Markov random fields. A method as in any above, wherein the components comprise at least two Markov chains. A method as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. A method as in any above, wherein at least two Markov chains correspond to a Turbo-like code. A method as in any above, wherein the method is utilized by a wireless communication system. A method as in any above, wherein the method is implemented by a computer program.
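The exchange of region beliefs between two components, such as a Markov chain for the communication channel and a Markov chain for the convolutional code, can be sketched in log-likelihood-ratio (LLR) form. This is a deliberately simplified, hypothetical example: the channel component is memoryless (so its extrinsic output is just the channel LLRs) and a toy repetition-code rule stands in for a full per-component inference pass (e.g., BCJR over the chain); none of these names or simplifications come from the specification:

```python
def channel_component(prior_llrs, channel_llrs):
    # Extrinsic output excludes the incoming priors, the standard
    # Turbo-principle rule for avoiding double counting of information.
    # For a memoryless channel this is simply the channel evidence.
    return list(channel_llrs)

def code_component(prior_llrs):
    # Toy "repetition-code" component: all shared bits must agree, so
    # each bit's extrinsic LLR is the sum of the other bits' priors.
    total = sum(prior_llrs)
    return [total - p for p in prior_llrs]

def exchange_beliefs(channel_llrs, iterations=5):
    """Iteratively exchange extrinsic beliefs between two components."""
    ext_to_code = channel_component([0.0] * len(channel_llrs), channel_llrs)
    for _ in range(iterations):
        ext_to_channel = code_component(ext_to_code)
        ext_to_code = channel_component(ext_to_channel, channel_llrs)
    # Final beliefs combine the extrinsic information of both components.
    return [a + b for a, b in zip(ext_to_code, ext_to_channel)]

# Toy example: the weak middle observation (LLR -0.5) is corrected by
# the code component's agreement constraint.
beliefs = exchange_beliefs([2.0, -0.5, 1.0])
# all beliefs are positive, i.e. all bits are decided consistently
```

In the generalized setting of the embodiments above, the quantities exchanged are beliefs over the secondary (intersection) regions of the generated graph structure rather than per-bit LLRs, but the iteration pattern is the same.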

(10) In another exemplary embodiment, a program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, said operations comprising: encoding a signal such that the encoded signal is configured to be decoded utilizing a generalized Turbo principle wherein region beliefs are exchanged between components; and transmitting the encoded signal.

A program storage device as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multi-user system and other components for which a traditional Turbo principle may be employed. A program storage device as in any above, wherein the components may be represented by Markov random fields. A program storage device as in any above, wherein the components comprise at least two Markov chains. A program storage device as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. A program storage device as in any above, wherein at least two Markov chains correspond to a Turbo-like code. A program storage device as in any above, wherein the program storage device comprises an element of a wireless communication system.

(11) In another exemplary embodiment, an apparatus comprising: means for encoding a signal such that the encoded signal is configured to be decoded utilizing a generalized Turbo principle wherein region beliefs are exchanged between components; and means for transmitting the encoded signal.

An apparatus as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multi-user system and other components for which a traditional Turbo principle may be employed. An apparatus as in any above, wherein the components may be represented by Markov random fields. An apparatus as in any above, wherein the components comprise at least two Markov chains. An apparatus as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. An apparatus as in any above, wherein at least two Markov chains correspond to a Turbo-like code. An apparatus as in any above, wherein the apparatus comprises an element of a wireless communication system. An apparatus as in any above, wherein the means for encoding comprises an encoder and the means for transmitting comprises a transmitter. An apparatus as in any above, further comprising: means for receiving a signal and means for decoding the received signal using the generalized Turbo principle. An apparatus as in any above, wherein the means for receiving comprises a receiver and the means for decoding comprises a decoder.

(12) In another exemplary embodiment, an apparatus comprising: an encoder configured to encode a signal such that the encoded signal is configured to be decoded utilizing a generalized Turbo principle wherein region beliefs are exchanged between components; and a transmitter configured to transmit the encoded signal.

An apparatus as above, wherein the components comprise elements of a structure representing at least one of a code, a communication channel, users in a multi-user system and other components for which a traditional Turbo principle may be employed. An apparatus as in any above, wherein the components may be represented by Markov random fields. An apparatus as in any above, wherein the components comprise at least two Markov chains. An apparatus as in any above, wherein one of the at least two Markov chains corresponds to a convolutive communication channel and another of the at least two Markov chains corresponds to a convolutional code. An apparatus as in any above, wherein at least two Markov chains correspond to a Turbo-like code. An apparatus as in any above, wherein the apparatus comprises an element of a wireless communication system. An apparatus as in any above, wherein the apparatus comprises a terminal, a mobile node, a mobile phone, a cellular phone, a base station or a relay node. An apparatus as in any above, further comprising: a receiver configured to receive a signal and a decoder configured to decode the received signal using the generalized Turbo principle.

(13) A system comprising: a first apparatus and a second apparatus, said first apparatus comprising means for encoding a signal utilizing a generalized Turbo principle wherein region beliefs are exchanged between components, and means for transmitting the encoded signal, said second apparatus comprising means for receiving the encoded signal and means for decoding the received signal using the generalized Turbo principle. A system as above, further comprising one or more aspects of the exemplary embodiments of the invention as further described herein.

(14) A system comprising: a first apparatus and a second apparatus, said first apparatus comprising an encoder configured to encode a signal utilizing a generalized Turbo principle wherein region beliefs are exchanged between components, and a transmitter configured to transmit the encoded signal, said second apparatus comprising a receiver configured to receive the encoded signal and a decoder configured to decode the received signal using the generalized Turbo principle. A system as above, further comprising one or more aspects of the exemplary embodiments of the invention as further described herein.

(15) A method comprising: encoding a signal utilizing a generalized Turbo principle; and transmitting the encoded signal. A method as above, further comprising one or more aspects of the exemplary embodiments of the invention as further described herein.

(16) A method comprising: receiving an encoded signal; and decoding the received signal using a generalized Turbo principle wherein region beliefs are exchanged between components. A method as above, further comprising one or more aspects of the exemplary embodiments of the invention as further described herein.

It should be appreciated that the exemplary embodiments of the invention, and particularly as related to exemplary methods, may be implemented by a computer program, by a computer program product comprising program instructions tangibly embodied on a computer readable medium, where execution of the program instructions comprises the steps or elements of the exemplary embodiments of the invention, and/or by a program storage device (e.g., a computer readable medium, memory) readable by a machine and tangibly embodying a program of instructions executable by the machine for performing operations comprising the steps or elements of the exemplary embodiments of the invention.

While the exemplary embodiments have been described above in the context of a generalized wireless communication system, it should be appreciated that the exemplary embodiments of this invention are not limited for use with only this one particular type of system, and that they may be used to advantage in other wired and wireless communication systems.

It should be noted that the terms "connected," "coupled," or any variant thereof, mean any connection or coupling, either direct or indirect, between two or more elements, and may encompass the presence of one or more intermediate elements between two elements that are "connected" or "coupled" together. The coupling or connection between the elements can be physical, logical, or a combination thereof. As employed herein two elements may be considered to be "connected" or "coupled" together by the use of one or more wires, cables and/or printed electrical connections, as well as by the use of electromagnetic energy, such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region and the optical (both visible and invisible) region, as several non-limiting and non-exhaustive examples.

It should further be noted that any reference to a "traditional Turbo principle" should be construed as referring to a conventional application or use of the Turbo principle.

In general, the various exemplary embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

The exemplary embodiments of the invention may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design of San Jose, California, automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like), may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the invention. Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of the non-limiting and exemplary embodiments of this invention.

Furthermore, some of the features of the preferred embodiments of this invention could be used to advantage without the corresponding use of other features. As such, the foregoing description should be considered as merely illustrative of the principles, teachings and exemplary embodiments of this invention, and not in limitation thereof.