Anomaly Detection System

This invention relates to anomaly detection systems. In particular, it relates to systems for the detection of anomalies in a medium having known or predictable statistical characteristics. An anomaly in the context of this specification is an element within an observed field that does not behave in the same statistical manner as the surrounding medium.

Systems may be concerned with the detection of an anomaly on the surface of water or land, where the anomaly is an electromagnetic reflector, in which case the detection system may involve a radar, lidar, or other such

electromagnetic observing equipment. Systems may also be concerned with the propagation of signals through a heterogeneous media, such as laser light through the atmosphere, or acoustic signals propagating through gases or liquids. Other systems may be concerned with financial time-series analysis, such as share price information, where it is advantageous to be able to separate anomalous behaviour from a well understood underlying process. Many of the signals involved in such systems may be modelled as a vector diffusion process. More information on this can be found in Oskendal, B.

1998, Stochastic Differential Equations - An Introduction with Applications, 5^{th} Edition. Springer

One current method of anomaly detection often used when searching for electromagnetic reflectors is to use an adaptive clutter filter. There are many schemes for this class of detector. For example, a filter may be formed by assuming that the overall shape of the clutter spectrum is a Gaussian, and estimating of the position of the central clutter frequency and the width of the clutter spectrum. The estimates are typically made by averaging several spectra from a region thought to contain only clutter. There are

disadvantages with such schemes. Firstly, if the range of target frequencies falls within the clutter band, then the target will be suppressed after application, of the clutter filter. Secondly the estimate of the clutter spectrum may be contaminated with targets and / or that the average clutter spectrum may not adequately represent the clutter which is to be suppressed (i.e. there is a high degree of local variation in the clutter spectrum). Also, the clutter spectrum may change over the interval during which the several spectra are measured.

For more information on this, see for example Chapter 6 (MTI, MTD and adaptive clutter cancellation, Prof Y. H. Mao) in 'Advanced Radar techniques and systems', Edited by G, Galati, Published by Perter Peregrin us Ltd for IEE, 1993, ISBN 0 86341 172 X.

Another method of anomaly detection is to use a neural network, or self-organising system. Such systems are able to discover structure in data, and hence potentially detect anomalies, with no prior knowledge of the statistics of the data. One such system is the Stochastic Vector Quantiser, more details of which can be found in S. P. Luttrell, "Using Stochastic Vector Quantisers to characterise signal and noise subspaces", Proc. 5^{th} IMA Int. Conf. on

Mathematics in Signal Processing, 2000. A problem with this technique is that the neural network must be trained, which diverts time and resources from the actual detection process. Also, if a trained system is to be employed on inputs having different statistical properties, then the neural network must be retrained from scratch.

According to the present invention there is provided an anomaly detection system arranged to have as an input a signal comprising a plurality of measurements, and arranged to provide an output signal indicative of the presence of an anomaly, characterised in that the system is also arranged: a. to compute a set of differences between successive input

measurements;

b. to estimate or compute the expectation of the square of the modulus of the set of differences;

c. to correlate the expectation estimated in b. with a function of the input signal to create a correlation signal; and

d. to provide the output signal indicative of the presence of an anomaly, the output signal being generated from the correlation signal.

The invention provides the advantage that generation of an output indicative of an anomaly does not require an ensemble average of many input signals, nor does it require a training sequence. A single sequence of measurements is sufficient for the invention to provide a result as to the presence or otherwise of an anomaly. ^{~}

The invention is applicable wherever the input data can be modelled as a stochastic process via a stochastic differential equation (SDE). The volatility coefficient of the SDE, which measures the amplitude of the local time fluctuation in the process, is particularly important since it is observable without recourse to an ensemble average. Identification of the volatility coefficient in the SDE enables a comparison to be made between the theoretically predicted and empirically observed volatilities.

Mathematically, the invention may be stated as follows: for any input quantity q_{t},

l if no anomaly is present, and departs from 1 by some predetermined threshold amount to indicate an anomaly. Here, c(X,Y) is the correlation function, where X and Y are the empirically observed and theoretically predicted volatilities respectively.. The observed volatility term \δq_{t}\^{2} comprises the set of differences between successive input signals, whilst with the theoretically predicted volatility term ?(q_{t}), the function ( must be deduced. The ^{"}departure from unity of c to be regarded as indicative of an anomaly will be set according to system performance and system parameters such as desired false alarm rate etc.

Electromagnetic scattering from random media is typically r -distributed (as will be described later) in the intensity variable, and in such a -distributed domain the volatilities (as explained above) are closely correlated. This feature is utilised in the current anomaly detection system when applied to electromagnetic input signals. Such signals will generally be complex-valued. For -distributed data, the square of the observed volatility, σ_{Ψ}^{2} is

approximately equal to the input signal z_{t}, and so a departure from an approximately unity correlation between |<5Ψ,|^{2} and z_{t} is indicative of an anomaly.

Inputs from non-electromagnetic signals may be real-valued. Such an example is a financial time series, S_{t}. St may, for example, comprise of company share price values overtime. In general, S_{r} will not be r -distributed. In this case the function to be correlated with the set of differences will need to be determined from a model of the underlying financial process. Stochastic volatility models describing financial data exist, and the invention can be applied to a suitable a^{2} (St ) function derived from such a model, see e.g. the discussion of interest rate models for which the volatility function can take various forms according to the characteristics of the model, in Baxter, M. and Rennie, A. 1996 Financial Calculus - An introduction to derivative pricing, Cambridge University Press.

If the. input signal is expected to be -distributed, the system may be arranged to calculate the error between the intensity distribution of the input signal and a theoretical -distribution, and may provide an indication relating to the degree of match between the two. This may be used as an indication of the reliability of the result given by the anomaly detection system. Likewise, if the input signal is expected to follow some other intensity distribution, the intensity distribution of the actual input signal may be compared with the appropriate theoretical distribution and the comparison used to provide an indication of reliability.

The correlation function used to correlate the fluctuations with their prediction as some function of the input signal is preferably the standard statistical correlation function:

where X and Y represent the two vectors to be correlated.

The input data may be sampled at intervals that need not be specified in advance. The increment of a discrete-time sampled amplitude diffusion process, as utilised in the present invention, satisfies the equation

<5Ψ, = μ_{t}δt + σ_{t}n_{t}δt^{υι} [+ higher order terms] (Eqn 2)

as given in Oskendal, 1998. Here, nt is drawn from a normal distribution with zero mean and unit variance, so = 1 , where (^{■}} denotes the expectation.

The higher order terms will clearly comprise those of order δt^{3/2}, δt^{2}, δt^{5/2} etc.

The sample interval is preferably small enough so that δt^{112} » δt , which means that the volatility component, σ_{t} of equation 2 will dominate over the drift component μ_{t}, and so <5Ψ,^{2} » σ^{2}n^{2} δt [+ higher order terms]

For completeness, this may be restated on a general timescale as

δq_{t} = Bμ,δt + B^{υ2}σ_{l}n_{t}δt^{m} [+ higher order terms], where δq_{t} is some increment of the dimensionless quantity q_{t}, and β is a constant, having dimensions of frequency. Thus, Bδt is a dimensionless quantity, and the sampling criterion can be recast as (Bδt)^{'} » 1.

This implies a root-sampling frequency (or value in the case of the recast (Bδt ^{m}) » 1 , such as 10, 50, 100, 1000, 10000 or 100000.

According to another aspect of the invention there is provided a method of detecting an anomaly in an input signal, wherein the input signal comprises a plurality of measurements, the method comprising the steps:

a. computing a set of differences between successive input

measurements;

b. estimating or computing the expectation of the square of the modulus of the set of differences;

c. correlating the expectation estimated in b. with a function of the. input signal to create a correlation signal; and . d. comparing the correlation signal against a predetermined threshold, wherein an. anomaly is indicated should the correlation signal fall below the threshold.

The invention will now be described in more detail, by way of example only, with reference to the following Figures, in which:

Figure 1 (prior art) diagrammatically illustrates a radar system in which may be implemented the current invention;

Figure 2 shows a plot of unprocessed baseband radar intensity data comprising sea clutter and a single target;

Figure 3 shows a graph of a theoretical K-distribution overlaid with the distribution of measured data;

Figure 4 shows a graph of the correlation function with respect to range cell, for sea clutter data, according to the present invention;

Figure 5 shows a graph of an alternate representation of the correlation function, such that the persistence over time of the anomaly can be seen;

Figure 6 shows a graph of two signals input to the correlation function, the signals taken at a range where the correlation is greatest;

Figure 7 shows a graph of two signals input to the correlation function, the signals taken at a range where the correlation is least;

Figure 8 shows a graph of the permuted time representation of the smoothed data of Figure 6;

Figure 9 shows a graph of the permuted time representation of the smoothed data of Figure 7; and

Figure 10 shows a block diagram of the steps performed on input data during the anomaly detection process;

Illustrated in Figure 1 is a radar system able to transmit and receive electromagnetic waves. For the purposes of the current invention the energy received will be a portion of the energy that has been transmitted towards a scene, and scattered from that scene. The scene may typically comprise a target 1 that is floating on seawater 2. The radar system comprises an antenna 3 that receives the scattered electromagnetic waves and passes them to a front end processing system 4. The processing system 4 amplifies, mixes down and detects the energy, the detected energy comprising components reflected from the target itself, the waves and other elements of the seawater, and other stray signals and thermal noise. The detected energy is digitised at this stage to produce a digital signal before being passed to the baseband signal processing section 5. The current invention is implemented within this baseband signal processing section 5. After the digital signal has been processed in the signal processing section 5, it is passed to the display section 6 such that it may be observed by an operator.

Illustrated in Figure 2 is a plot of a baseband signal such as may be input to a baseband signal processing unit. It is raw, unprocessed data measured over a three second time interval and comprises a sequence of three thousand samples taken at 1ms intervals from a fixed viewpoint. Each sample contains information from 1024 range bins each 0.3 metres long. A tethered target is positioned at range bin 839, although this is not obvious from merely looking at Figure 2. Note that Figure 2 only shows the intensity - in reality, the data is complex, in-phase and quadrature-phase (l,Q) data that is used in the generation of an amplitude process of the form

with t and x representing sample number and range respectively.

Data generated from electromagnetic scattering from the sea surface, commonly known as sea clutter, is typically substantially r -distributed (see for example Jakeman, E & Pusey, P N, "A Model for Non-Rayleigh Sea Echo", IEEE Trans. Antennas and Propagation 1976, AP-24 No. 6, 806-814). Figure 3 shows the distribution of data taken from forward scattering measurements from a random phase screen (dotted line) overlaid with a best fit theoretical rC-distribution (solid line). The random phase screen was created by producing a turbulent, rising airflow by means of natural convection from a heating element, and phase shifts imposed on laser radiation passing through the airflow used to generate the forward scattering measurement data. The close match between these measurements and the theoretical rC-distribution is evident. The discrepancy in the tail end of the data is of minor concern for most applications, as this region represents high intensity signals, and such high intensities have exponentially small probabilities. More details of the random phase screen can be found in Ridley et al, "Heterodyne

Measurements of Laser Light Scattering by a Turbulent Phase Screen", Applied Optics, Vol 41 No 3, 2002. The forward scattering measurements are very similar to measurements taken from sea clutter in terms of their distribution. Variations between a theoretical rC-distribution and the input intensity data (given by z_{t} = l^{2}_{t} + Q^{2}i) distribution can be quantified for any given set of input data, and if the variation is found to be greater than some threshold, a warning can be flagged up to the display system (6 of Figure 1) to indicate that reduced detection accuracy is available for this data. The variation can be quantified using standard information theory techniques, such as by use of the Fisher information distance, details of which are provided in Wootters, W. K. 1981 Statistical distance and Hubert space, Phys. Rev. D 23, 357-362.

Once it has been established thatthe data is sufficiently r -distributed, the process of anomaly detection is performed. The data shown in Figure 2 is in-phase and quadrature-phase (lt,Q_{t}) data (as described in Helmstrom, C W, "Statistical Theory of Signal Detection" p. 11 Pergamon, Oxford, 1960. that is used in the generation of a complex valued amplitude process of the form Ψt(χ)~lt(χ)+iQt(x). Temporal data from each range is taken in turn and processed in the following manner.

Firstly, a set of differences between the complex data is taken at successive time steps δt:.

J Ψ, = Ψ_{/+}„ -Ψ, (eqn. 3)

Secondly, an estimate of the expectation (•) of J-5Ψ,] is evaluated. The estimation is an approximation of the expectation and is obtained by smoothing the raw

values by averaging over a time window [t - Δ, t + A], where Δ is a smoothing parameter for variable t. This means of estimation is valid because the sample paths of Ψ, are continuous with a unit probability. As part of the smoothing process, the samples may be ordered to give increased accuracy. Here, the input values are first permuted such that they are ordered according to their magnitude. Then, the same permutation is applied to the observed volatilities, and these volatilities are smoothed according to the above smoothing procedure. Finally, the permuted, smoothed samples are reordered back into their original time order.

Thirdly, a correlation of (|<5Ψ,|^{2}) with z_{r} is carried out using the standard correlation function of eqn 1. This is therefore of the form:

c> = <5Ψ, | (eqn. 4)

Figure 4 shows a graph of the results of this process carried out at each range shown in Figure 2. The smoothing parameter, Δ, used for this example was set to 50 samples. This figure should be chosen such that the correlation in known non-anomalous ranges is as high as possible. It can be seen that most ranges have a high correlation, indicating that the input data from those ranges is substantially r -distributed, and hence is not behaving in an anomalous fashion. There is one clear null centered at range cell 839, indicative of a target present at this range. The correlation plot may be input to a threshold circuit, and an output from this generated if the correlation falls below some predetermined figure.

Figure 5 again shows the correlation signal - shown as 1 -c for improved visualisation - but this time as a 3D plot, with each point on the time axis representing the average from a 1s window of data. The 20 points on the time axis thus represent 3 seconds of data. The Figure 5 illustrates the

persistence of the anomaly detection mechanism over time.

Figure 6 shows a comparison of two signals input to the correlation function, at a range where the value of the correlation is highest. The solid line is ?(z_{t}) (i.e. z_{t} whilst the dotted line represents the smoothed set of differences

<5Ψj ) . It can be seen that the two traces track each other quite closely.

Figure 7 shows a comparison of two signals as per Figure 6, but this time the signals are chosen so as to have a minimum correlation value. The very low correlation value (~9.5%) is, in this embodiment, indicative of the presence of an anomaly. This correlation value is shown as the deep null in Figure 4

Figure 8 shows the permuted time representation of predicted and observed squared volatility for the process |<5Ψ,| . The predicted values are permuted in ascending order; the required permutation is then also applied to the observed values, which are subsequently smoothed in the manner described above. The range cell having the maximum correlation value was again chosen, and so the data in Figure 8 corresponds to the data shown in Figure 6.

Figure 9 is a similar plot to that shown in Figure 8, but this time the range cell having the minimum correlation value was chosen. The data in Figure 9 therefore corresponds to the data shown in Figure 7.

The process carried out by the current invention is summarised in the block diagram of Figure 10. This shows the sequence of events occurring in the anomaly detection process specifically as applied to the radar embodiment.

Box (a) indicates the data input procedure. Here, the data should be sampled at a rate such that δt^{m} » δt , as described above. The sampling will typically be done using analogue to digital converters (ADCs), which act upon the data received from the radar front end, where the data is downconverted to baseband or to some other frequency at which it is convenient to process it digitally. Note that this sampling procedure is prior art.

Box (b) shows the optional stage of checking the input data to see how closely it complies to a theoretical -distribution. If it is known that the input data should be K-distributed, then the data can be checked using the methods described above, and the resulting information used to provide a measure of confidence in the subsequent anomaly detection result. If the system were being used on data that was not K-distributed, such as financial data, then the check would be done on the appropriate theoretical distribution for that system.

Box (c) indicates the process of generating a set of differences between subsequent input readings. The expectation of the square of the differences act as one of the inputs to the correlation stage indicated in box (d). The other input is the theoretically predicted volatility for the inputs, c?( χ Details of the calculation of the expectation are provided above. Note that a

correlation value is generated for each range cell. For non K-distributed data, the volatility function σ^{2}(qD needs to be derived as appropriate. Such volatility functions exist in many fields, such as the financial field, and those skilled in the relevant arts will be aware of how to go about deriving them^{"}. ^{'}

Following the correlation stage, the correlation value generated therein is input to a threshold circuit, which generates an output indicative of an anomaly if the threshold falls below a predetermined value. This is indicated by box (e).

The example described above can clearly be implemented with the aid of an appropriate computer program recorded on a carrier medium and running on a conventional computer system. Such a program is straightforward for a skilled programmer to implement without requiring invention, because the mathematical procedures are well known computational procedures. Such a program and system will therefore not be described further.

The skilled person will be aware that other embodiments within the scope of the invention may be envisaged, and thus the invention should not be limited to the embodiments as herein described.