## #1401 closed enhancement (worksforme)

# r.series for time series

Reported by: mmetz
Priority: normal
Milestone: 7.4.0
Component: Raster
Version: svn-trunk
Keywords: r.series, raster, time series
CPU: All
Platform: All

## Description

The attached patch adds weighting windows to r.series. This is useful for interpolating NULL cells in time series data, e.g. with

`r.series method=average window=gauss`

which provides more realistic results than the same command without a weighting window.

The weighting windows are taken from r.resamp.filter, with modifications to the gauss and normal windows to make them pseudo-finite. At the moment, only one windowing function can be selected, i.e. coupling an infinite window with a finite window is currently not possible. In contrast to r.resamp.filter, the sinc window will never have negative weights. In my tests, the hann, hamming, and blackman windows seem to produce reasonable results even without coupling with a finite window.
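The idea can be sketched in a few lines of Python (this is a sketch of the concept, not the patch itself; the function names and the exact normalisation are mine): each non-NULL value is weighted by a window function of its distance from the centre of the series, and the weighted sum is divided by the sum of the weights actually used.

```python
import math

def gauss_window(dx):
    """Gaussian weight for a normalised distance dx (0 = centre, 1 = edge)."""
    return math.exp(-2.0 * dx * dx)

def weighted_average(values, window):
    """Window-weighted mean of one cell's time series; None marks a NULL cell."""
    n = len(values)
    centre = (n - 1) / 2.0
    half = n / 2.0
    num = den = 0.0
    for i, v in enumerate(values):
        if v is None:               # NULL cells drop out of both sums
            continue
        w = window(abs(i - centre) / half)
        num += v * w
        den += w
    return num / den if den > 0.0 else None

# Maps near the middle of the series dominate; a plain average would
# weight all maps equally.
r = weighted_average([0.2, 0.4, None, 0.5, 0.3], gauss_window)
```

With `method=average window=gauss`, the aggregate is pulled toward the time steps closest to the centre of the interval, which is what makes the result more realistic for gradually changing data.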

### Attachments (1)

### Change History (17)

### by , 12 years ago

Attachment: r.series.patch added

### follow-ups: 2 3 comment:1 by , 12 years ago

Replying to mmetz:

> The attached patch adds weighting windows to r.series

A few points:

- This patch places the centre of the window in the centre of the time period; I would have expected this to be configurable. E.g. I can imagine situations where the most recent data would have the greatest weight, with weight decreasing as the data gets older. Unlike spatial data, time has an inherent asymmetry between past and future.

- I don't see the stated modifications to the gauss/normal or sinc functions. They appear to be identical to the r.series versions. Also, I'm not sure why those modifications would make sense; e.g. the fact that sinc/lanczos have negative weights is one of their advantages, as it results in sharper output than filters with only positive weights.

- Is "radius" really the correct term for one-dimensional data?

Possible enhancements:

- Allowing arbitrary weights (one per input map) to be read from a file and/or a command line option.

- Accepting a list of weight maps (one per input map), i.e. each input cell is a value/weight pair rather than just a value.

### comment:2 by , 12 years ago

Replying to glynn:

> They appear to be identical to the r.series versions.

Er, I mean the r.resamp.filter versions.

### follow-up: 4 comment:3 by , 12 years ago

Replying to glynn:

> Replying to mmetz:
>
> > The attached patch adds weighting windows to r.series
>
> A few points:
>
> - This patch places the centre of the window in the centre of the time period; I would have expected this to be configurable. E.g. I can imagine situations where the most recent data would have the greatest weight, with weight decreasing as the data gets older. Unlike spatial data, time has an inherent asymmetry between past and future.

Sometimes yes. You could also achieve this with `r.series in=map01,map02,map03,map04,map03,map02,map01`. The patch assumes, as the existing r.series does for regression parameters, that the input is ordered and that the (time) distance between maps is constant. This can be exploited by changing the order of the input maps: `r.series in=map01,map01,map01,map02,map02,map03,map04` should give more weight to map01 and map02 and less to map03 and map04, effectively skewing the weight function.

> - I don't see the stated modifications to the gauss/normal or sinc functions. They appear to be identical to the r.series versions. Also, I'm not sure why those modifications would make sense; e.g. the fact that sinc/lanczos have negative weights is one of their advantages, as it results in sharper output than filters with only positive weights.

The gauss/normal functions have a radius of 0 (zero) in r.resamp.filter/main.c:L102 and L103, to indicate, I assume, that they are infinite. I have changed this radius to 4 for gauss and 8 for normal, since normal = f_gauss(x/2) / 2. These functions now cover roughly the 99.9% interval of the distribution for the given number of input maps (see also the adjustment of the gauss window in v.kernel).
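To illustrate why those radii make the windows pseudo-finite, here is a small numeric check (a sketch: I am assuming the Gaussian form `exp(-2*x*x)` modelled on r.resamp.filter, and the relation normal = f_gauss(x/2) / 2 quoted above; the exact C definitions may differ).

```python
import math

def f_gauss(x):
    # Assumed form of the gauss window (modelled on r.resamp.filter).
    return math.exp(-2.0 * x * x)

def f_normal(x):
    # Per the relation stated above: normal = f_gauss(x/2) / 2.
    return f_gauss(x / 2.0) / 2.0

GAUSS_RADIUS = 4.0   # pseudo-finite radius chosen for gauss
NORMAL_RADIUS = 8.0  # pseudo-finite radius chosen for normal

# Relative weight remaining at the cut-off: effectively zero in both
# cases, so truncating there loses essentially none of the mass.
gauss_tail = f_gauss(GAUSS_RADIUS) / f_gauss(0.0)
normal_tail = f_normal(NORMAL_RADIUS) / f_normal(0.0)
```

Under these assumptions the tail weight at the chosen radius is far below 0.1% of the peak, which is the sense in which the truncated window still covers the quoted 99.9% interval.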

The sinc function becomes negative for x larger than an odd multiple of PI and smaller than the next even multiple of PI. In the patch, x is always in [0, PI]. In order to have a sharper sinc window, something like sinc2 and sinc3 would be needed, analogous to lanczos2 and lanczos3, or an additional radius/distance option.
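A quick check of that claim (a sketch; the helper names are mine): with the argument restricted to [0, PI], sin(x)/x never goes negative, while just beyond PI the first side lobe does.

```python
import math

def sinc(x):
    """sin(x)/x, with the removable singularity at x = 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

# Weights sampled across the patch's domain [0, PI]: all non-negative.
weights = [sinc(math.pi * i / 10.0) for i in range(11)]

# Just past PI the function dips below zero (first side lobe), which the
# patch avoids by never evaluating sinc there.
side_lobe = sinc(1.5 * math.pi)
```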

> - Is "radius" really the correct term for one-dimensional data?

Not sure. It comes from copying the r.resamp.filter code, and I kept the term to keep the code similar. The user doesn't see the term anyway.

> Possible enhancements:
>
> - Allowing arbitrary weights (one per input map) to be read from a file and/or a command line option.

I was thinking about that too, similar to r.neighbors.

> - Accepting a list of weight maps (one per input map), i.e. each input cell is a value/weight pair rather than just a value.

Hmm, I would need a new module anyway to calculate regression parameters between two time series. The second time series could also be treated as weight maps if a weighted version of the selected function is available.

The current patch already makes r.series quite complicated; that's why I did not introduce more options such as skew, radius/distance, or custom weights.

### follow-up: 5 comment:4 by , 12 years ago

Replying to mmetz:

The more I think about it, the less sense this patch appears to make.

Filtering would make sense if you were generating a (time-domain) filtered version of the data (with the filter radius on the order of the sample interval) then computing the aggregate over the filtered data (i.e. a 1-dimensional analogue of r.resamp.filter as a pre-processing step), but that isn't what's happening.

Computing a weighted aggregate would make sense if the weights were inputs, e.g. "confidence" measures for individual maps or individual cells. It might also make sense with calculated weights where the weighting function had some theoretical basis (e.g. weight decreasing with age). But that isn't what's happening either.

Maybe I'm missing something, but I can't see how any of the r.resamp.filter functions actually make sense as a weighting function for time-series data.

### follow-up: 6 comment:5 by , 12 years ago

Replying to glynn:

> Replying to mmetz:
>
> The more I think about it, the less sense this patch appears to make.
>
> Filtering would make sense if you were generating a (time-domain) filtered version of the data (with the filter radius on the order of the sample interval) then computing the aggregate over the filtered data (i.e. a 1-dimensional analogue of r.resamp.filter as a pre-processing step), but that isn't what's happening.
>
> Computing a weighted aggregate would make sense if the weights were inputs, e.g. "confidence" measures for individual maps or individual cells. It might also make sense with calculated weights where the weighting function had some theoretical basis (e.g. weight decreasing with age). But that isn't what's happening either.
>
> Maybe I'm missing something, but I can't see how any of the r.resamp.filter functions actually make sense as a weighting function for time-series data.

I came up with this patch because I want to do NULL cell interpolation in time. The time series data need to change gradually over time, e.g. NDVI or temperature, not e.g. daily rainfall.

I will try to explain the motivation for this patch with an example. MODIS 16-day NDVI comes with many NULL cells. Spatial interpolation of these NULL cells is problematic because NDVI does not necessarily change gradually in space. NDVI does change gradually in time for most natural land cover types if the time steps are not too large. I assume that NDVI at a given time step will be more similar to the next or previous time step and less similar to, say, 5 time steps before or ahead. Therefore I want to give higher weight to time steps closer to the given time step instead of using a simple arithmetic mean.

More accurate would probably be some interpolation method. I cannot use the Rast_interp_*() methods because they return NULL if any of the input values is NULL; instead I would need a non-uniform method as implemented in the RST and lidar (bspline) libs, but for 1D.
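One possible 1-D stand-in for such a method (a sketch of the general idea, not the RST/bspline approach; the helper name is mine) is to bridge interior NULL runs by linear interpolation between the nearest valid neighbours:

```python
def fill_nulls_linear(series):
    """Fill interior None runs by linear interpolation between the nearest
    non-None neighbours; leading/trailing None values are left untouched."""
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            j = i
            while j < n and out[j] is None:
                j += 1
            if i > 0 and j < n:      # interior gap: both neighbours known
                left, right = out[i - 1], out[j]
                span = j - (i - 1)
                for k in range(i, j):
                    t = (k - (i - 1)) / span
                    out[k] = left + t * (right - left)
            i = j
        else:
            i += 1
    return out
```

Unlike a Rast_interp_*()-style call, a gap-filling pass like this degrades gracefully: only the cells that actually are NULL get estimated, and runs at the series boundaries are left alone rather than extrapolated.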

Since NDVI is cyclic, an FFT approach would also be an option, but again FFT cannot handle NULL cells; NULLs would need to be replaced first with a reasonable fill value.

### follow-up: 7 comment:6 by , 12 years ago

Replying to mmetz:

> I will try to explain the motivation for this patch with an example.

Okay, so you're filtering in the time domain, then taking a single sample from the middle of the interval?

Even so, I would suggest making the "radius" (i.e. cut-off frequency) a parameter, and probably also the centre (although that can be fudged by padding with null maps).

In general, the cut-off frequency shouldn't depend upon the filtering method; the reason the various filters in r.resamp.filter have associated radii is so that the cut-off frequency doesn't vary with the filter method (i.e. the user-supplied radius value is normalised).

Also, if you're likely to want more than one output, extending r.series to support multiple outputs (with different offsets) would be significantly more efficient than multiple runs of r.series (reading the inputs could easily take more time than the actual processing).

### follow-up: 8 comment:7 by , 12 years ago

Replying to glynn:

> Okay, so you're filtering in the time domain, then taking a single sample from the middle of the interval?

I calculate the weighted average for a given point in time over a given time interval, just as r.resamp.filter calculates the weighted average for a given point in space over a given spatial interval, e.g. r.resamp.filter/main.c:L268,269,276 and w_ave() in lib/stats/c_ave.c. r.resamp.filter could just as well use w_ave() for each dimension. In the patch, the point in time is in the middle of the interval. Since there is no multiple centre option (yet), I guess this is what you mean by taking a single sample located in the middle of the interval.
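For reference, the 1-D weighted average being described can be sketched as follows (a simplified Python analogue of w_ave() in lib/stats/c_ave.c, not the actual C code):

```python
def w_ave(pairs):
    """Weighted mean of (value, weight) pairs; None marks a NULL cell."""
    num = den = 0.0
    for value, weight in pairs:
        if value is None:        # NULL values contribute to neither sum
            continue
        num += value * weight
        den += weight
    return num / den if den else None

# Each output cell is the weighted mean of one cell's time series,
# with the weights taken from the chosen window function.
```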

> Even so, I would suggest making the "radius" (i.e. cut-off frequency) a parameter, and probably also the centre (although that can be fudged by padding with null maps).

The radius is currently determined by the number of input maps. Additional radius and centre options would clearly speed up processing when several outputs are needed for different points in time. The way I currently use the patched r.series, a radius option would only make sense in combination with a multiple centre option. The number of outputs would then be n_methods x n_centres.

> In general, the cut-off frequency shouldn't depend upon the filtering method; the reason the various filters in r.resamp.filter have associated radii is so that the cut-off frequency doesn't vary with the filter method (i.e. the user-supplied radius value is normalised).

AFAICT, the patch does the same: the user-supplied radius is normalised. It's not clear to me how the term frequency can be applied to a gauss/normal distribution; for the sine- and cosine-based filters this is obvious. The associated radii for gauss and normal are chosen such that filter(dx) is nearly 0 for dx = user-radius, in order to be comparable with lanczos, hermite or cosine-based filters where cos(normalised_dx) = 0 for dx = user-radius. IOW, the gauss filter in r.resamp.filter sets the user-radius to 1 SD (soft filter), and the gauss filter in the patch for r.series sets the user-radius to 4 SD (sharper filter), granted that f_gauss() assumes SD = 1, which I am not sure about. It looks more like SD = 1/4, and for f_normal(), SD = 1/2 in the exponent and SD = 1 in the square root?

### follow-up: 9 comment:8 by , 12 years ago

Replying to mmetz:

> I calculate the weighted average for a given point in time over a given time interval, just as r.resamp.filter calculates the weighted average for a given point in space over a given spatial interval,

In the case of r.resamp.filter, each output cell is typically calculated based upon a small subset of the input cells, not the entire input.

> The way I currently use the patched r.series, a radius option would only make sense in combination with a multiple centre option.

It would make sense even for a single output, as it controls how rapidly the weight falls off with distance from the centre.

> > In general, the cut-off frequency shouldn't depend upon the filtering method; the reason the various filters in r.resamp.filter have associated radii is so that the cut-off frequency doesn't vary with the filter method (i.e. the user-supplied radius value is normalised).
>
> AFAICT, the patch does the same, the user-supplied radius is normalised.

AFAICT, the code maps the input range to the width of the window function, so wider windows will be "sharper" (have a higher cut-off frequency), e.g. lanczos3 will be sharper than lanczos2.

> It's not clear to me how the term frequency can be applied to a gauss/normal distribution,

r.resamp.filter implements FIR (finite impulse response) filtering (you get an error if you don't have at least one finite window function). If you calculate the DFT of any of the window functions, you'll get its frequency response. All of the functions are low-pass filters, as they're symmetric.

See http://en.wikipedia.org/wiki/Window_function for examples of common window functions and their frequency responses.

A piecewise-continuous function defined by sampled data can be considered a mixture (sum) of the underlying signal and quantisation noise. The intent of a low pass filter is to discard the quantisation noise while retaining the signal.

The cut-off frequency is normally chosen according to the sampling frequency, as the quantisation noise is dominated by the sampling frequency and its harmonics. In general, the cut-off frequency is inversely proportional to the width of the central "lobe" of the window function.

If you run r.resamp.filter with a specific radius, you get a specific cut-off frequency regardless of the method chosen. So while lanczos3 uses 3 times as large a window as lanczos1, the cut-off frequency remains the same. This is what I mean when I say that the radius is "normalised".

OTOH, if you map the range of inputs to the range of the filter, "long-tailed" filters with a wider domain (e.g. lanczos3) will have a narrower central lobe and thus a higher cut-off frequency.

The way that Lanczos filters are defined, the number of samples is supposed to be proportional to the order ("a" parameter), so lanczos3 should use 3 times as many samples (at the same sampling frequency, i.e. cover 3 times as large a time interval) as lanczos1 in order to get a similar frequency response (higher-order filters will fall off faster, but the frequency at which the fall-off starts should be the same).

See http://en.wikipedia.org/wiki/File:Lanczos-kernel.svg for an illustration. If both graphs were drawn on the same axes, they would have roughly the same shape, but the a=3 window would have a longer tail. By scaling the axes to the same width, the a=3 window has a narrower central lobe.
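The normalisation argument can be made concrete with a small sketch (this uses the standard Lanczos kernel definition, not code from r.resamp.filter; the helper names are mine): the first zero crossing of lanczos(x, a) sits at |x| = 1 for every order a, so if the user radius is measured in sample spacings, the central lobe, and hence the cut-off frequency, is the same for lanczos1 and lanczos3; higher orders only add longer tails.

```python
import math

def nsinc(x):
    """Normalised sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, a):
    """Standard Lanczos kernel of order a; zero outside |x| < a."""
    return nsinc(x) * nsinc(x / a) if abs(x) < a else 0.0

# The central lobe ends at |x| = 1 regardless of the order; lanczos3
# merely covers three times as many samples with its (small) tail.
first_zero_l1 = lanczos(1.0, 1)
first_zero_l3 = lanczos(1.0, 3)
```

Mapping the input range to the kernel's full width instead (as the patch does) squeezes the a=3 central lobe into a third of the a=1 lobe, which is exactly the "wider windows are sharper" effect described above.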

### follow-up: 10 comment:9 by , 11 years ago

Replying to glynn:

> Replying to mmetz:
>
> > The way I currently use the patched r.series, a radius option would only make sense in combination with a multiple centre option.
>
> It would make sense even for a single output, as it controls how rapidly the weight falls off with distance from the centre.

I was reading up a bit and thinking about it. As I understand it, the fundamental difference between r.resamp.filter and the patched r.series is that for r.resamp.filter, the window size is a function of the user-given radius and the chosen (combination of) filtering window(s), whereas for r.series, the radius is a function of the chosen window size (number of input maps) and the filtering window. The reason for my implementation is a simpler user interface. Otherwise, i.e. with a required radius option, warning/error messages are needed if the number of input maps exceeds or falls short of the window size. And I wanted an easy way to test various filters with differing sharpness (e.g. lanczos1 vs. lanczos3) using the same window size.

In short, I think that a user-controlled window size results in a simpler user interface, but a radius option would make the patch more versatile and is more in line with the concept of FIR filtering.

### comment:10 by , 11 years ago

Replying to mmetz:

> I was reading up a bit and thinking about it. As I understand it, the fundamental difference between r.resamp.filter and the patched r.series is that for r.resamp.filter, the window size is a function of the user-given radius and the chosen (combination of) filtering window(s), whereas for r.series, the radius is a function of the chosen window size (number of input maps) and the filtering window.

A more fundamental difference is that r.resamp.filter performs convolution (i.e. a weighted sum is calculated for every cell), while r.series calculates a single value. The main reason why r.resamp.filter raises an error for an unbounded window is that it's almost certainly a user error. A secondary reason is that it's ridiculously slow; if there was a use for such cases, an FFT-based implementation would be needed.

In theory, convolution calculates an unbounded integral (or, in the discrete case, an infinite sum). The window size is merely an optimisation (there's no point in summing an infinite number of zeros), and shouldn't affect the result.

> The reason for my implementation is a simpler user interface. Otherwise, i.e. with a required radius option, warning/error messages are needed if the number of input maps exceeds or falls short of the window size.

It's perfectly reasonable for the domain of the filter function to extend beyond the set of input maps, particularly for a "wide" filter such as lanczos3 or gauss. The correct solution is to simply clip the kernel to the window (and calculate the divisor accordingly), not to "squash" the kernel to fit the input range.
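A sketch of the clipping approach being suggested (the names and the kernel choice are mine, not code from the patch): weights falling outside the available maps are simply dropped, and the divisor is the sum of the weights that were actually applied, so the kernel shape itself is never squashed to fit the input range.

```python
import math

def gauss(dx):
    # Assumed window function, used here only for illustration.
    return math.exp(-2.0 * dx * dx)

def clipped_weighted_mean(values, centre, radius):
    """Weighted mean with the kernel clipped to the input range.

    `radius` is in units of the map spacing; maps outside the kernel's
    support (or NULL, marked None) contribute to neither sum.
    """
    num = den = 0.0
    for i, v in enumerate(values):
        dx = abs(i - centre) / radius
        if dx >= 1.0 or v is None:   # beyond the support, or NULL: skip
            continue
        w = gauss(dx)
        num += v * w
        den += w
    return num / den if den else None

# A wide kernel over few maps just uses the weights that fall inside;
# constant input therefore stays exactly constant in the output.
```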

> And I wanted an easy way to test various filters with differing sharpness (e.g. lanczos1 vs. lanczos3) using the same window size.

The various Lanczos filters should all have the same sharpness; higher-order filters should just be more accurate at the cost of performance. That is why the width of the filter's domain and the number of input samples shouldn't affect the result.

### comment:11 by , 9 years ago

Keywords: r.series added

### comment:12 by , 7 years ago

Milestone: 7.0.0 → 7.0.5

### comment:14 by , 6 years ago

Milestone: 7.0.5 → 7.3.0

### comment:15 by , 6 years ago

Resolution: → worksforme

Status: new → closed

Replying to martinl:

> Is it still an issue?

Not really, because r.series accepts weights as additional input, so users can pre-compute weights according to their needs.


patch to add weighting windows to r.series