Performs a fast continuous wavelet transform on long sequences by processing chunks of the input signal sequentially and keeping only low-resolution output data, obtained by averaging, to save memory. This is only useful for very long signals whose output does not fit into the available memory as a whole. It should not be used on short signals, since boundary artefacts are automatically discarded (and those potentially dominate for short signals).
Usage
fcwt_batch(
x,
x_sample_freq,
sigma = 1,
y_sample_freq = 3/sigma_res(sigma, freq_end)$time,
freq_begin = 2 * x_sample_freq/length(x),
freq_end = x_sample_freq/2,
n_freqs = 2 * ceiling(log(du(freq_end/freq_begin), base = 1 +
sigma_freq_res_rel(sigma))),
freq_scale = c("log", "linear"),
max_batch_size = ceiling(1 * 10^9/(n_freqs * 8)/2),
n_threads = 2L,
progress_bar = FALSE
)
Arguments
- x
Real-valued time series. The time steps are assumed to be evenly spaced.
- x_sample_freq
Sampling rate of input time series x. This number primarily establishes a connection to physical units, which is used in other frequency definitions as well as in the units of the output data. Expects either a value with frequency units, generated with u(), or a pure number, in which case it is interpreted in units of 'Hertz'.
- sigma
Sets a dimensionless parameter \(\Sigma\) controlling the wavelet spread. Changing this parameter adjusts the time/frequency uncertainty balance, \(\Delta t = 4 \frac{\Sigma}{f}\), \(\Delta f = 4 \frac{f}{2\pi \Sigma}\). A larger (lower) value of sigma corresponds to a better (worse) frequency resolution and a worse (better) time resolution. (For more information, see vignette("sigma", package = "fCWTr").) Defaults to 1. Note that there is no natural choice for sigma; it depends on the use case. The default can therefore be a poor choice (it probably is for audio data).
- y_sample_freq
Sampling rate of output time series, in frequency units (see u()). The default value is set so that it aligns with the physical time resolution of the highest frequency modes. The maximum allowed sampling rate is the sampling rate of the input signal, x_sample_freq.
- freq_begin, freq_end
Optionally specifies the frequency range [freq_begin, freq_end]. If not specified, the maximal meaningful frequency range, depending on the input signal, is taken. A frequency-valued number, generated with u(), or a pure number, which is interpreted in units of 'Hertz'.
- n_freqs
Number of frequency bins generated by the CWT. The frequencies are linearly or logarithmically distributed, depending on the freq_scale argument. Computation time increases when raising the number of frequency bins. The default number is chosen such that the frequency bandwidths are of the size of the physical frequency resolutions in case of a logarithmic scale. In some sense this value constitutes the "physical" limit. However, increasing n_freqs beyond this limit can still make sense: if the signal has a precise frequency and the probing frequency band is not exactly aligned with it, the output signal will be dampened. In this case it can be useful to increase n_freqs and perform appropriate averaging afterwards: the output signal will show improved response.
- freq_scale
("log" | "linear") Should the frequency scale be linear or logarithmic? "linear" / "log" for linear / logarithmic. The default scale is logarithmic since differences on a logarithmic scale are proportional to the frequency uncertainties. In this sense, the logarithmic frequency scale is actually the natural scale for the continuous wavelet transform.
- max_batch_size
The maximal batch size that is used for splitting up the input sequence. This limits the maximal memory that is used. Defaults to roughly 1GB, being conservative and taking into account that R might make copies when further processing it. The actual batch size is the largest batch size that is smaller than max_batch_size and compatible with the requested y_sample_freq. You should aim to set the batch size as large as possible given your memory constraints (boundary effects become larger the smaller the batch size).
- n_threads
Number of threads used by the computation, if supported by your platform. Defaults to 2 threads (to accommodate CRAN requirements). If openmp_enabled() returns FALSE, this argument is ignored and only a single thread is used.
- progress_bar
Monitoring progress can sometimes be useful when performing time-consuming operations. Setting progress_bar = TRUE enables printing a progress bar to the console, printing the "loss ratio" and the number of batches. The loss ratio is a number between 0 and 1 and indicates how much of the batch computation has to be thrown away due to boundary artefacts. The higher the batch size, the smaller the loss ratio will be. Defaults to FALSE.
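The resolution formulas given for sigma, and the default for n_freqs shown under Usage, can be evaluated by hand. The following is a minimal sketch in base R; the helper names time_res, freq_res_rel and n_freqs_default are hypothetical and not part of the package, and it assumes that sigma_freq_res_rel(sigma) equals \(4 / (2\pi\Sigma)\), which is consistent with the "Relative frequency resolution" printed in the Examples section but is not confirmed here.

```r
# Hypothetical helpers, not part of the package API.
# Time resolution Delta t = 4 * Sigma / f, in seconds for f in Hertz.
time_res <- function(sigma, freq) 4 * sigma / freq

# Relative frequency resolution Delta f / f = 4 / (2 * pi * Sigma).
freq_res_rel <- function(sigma) 4 / (2 * pi * sigma)

sigma <- 10
time_res(sigma, freq = 100)    # 0.4
time_res(sigma, freq = 11000)  # about 0.003636
freq_res_rel(sigma)            # about 0.06366

# Default number of frequency bins as written in the Usage section,
# ASSUMING sigma_freq_res_rel(sigma) == freq_res_rel(sigma).
n_freqs_default <- function(sigma, freq_begin, freq_end) {
  2 * ceiling(log(freq_end / freq_begin, base = 1 + freq_res_rel(sigma)))
}
n_freqs_default(sigma, freq_begin = 100, freq_end = 11000)
```

Doubling sigma doubles the time resolution value (worse localisation in time) and halves the relative frequency resolution, which is the trade-off described for the sigma argument.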
Value
The spectrogram, a numeric real-valued matrix with dimensions roughly
dim ~ c(length(x) * y_sample_freq / x_sample_freq, n_freqs).
The exact length of the output depends on boundary effect details.
This matrix is wrapped into an S3 class "fcwtr_scalogram" so that plotting and
coercion functions can be used conveniently.
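As a rough guide, the number of output rows scales with the ratio of output to input sampling rate. The sketch below uses a hypothetical helper, approx_dim, that is not part of the package; it gives an upper bound only, since the rows affected by boundary artefacts are discarded.

```r
# Hypothetical helper: upper bound on the output dimensions before
# boundary artefacts are trimmed. Frequencies are plain numbers in Hertz.
approx_dim <- function(n_samples, x_sample_freq, y_sample_freq, n_freqs) {
  c(
    time = floor(n_samples * y_sample_freq / x_sample_freq),
    freq = n_freqs
  )
}

# One second of signal sampled at 44.1 kHz, reduced to 100 Hz, 30 bins.
approx_dim(44100, x_sample_freq = 44100, y_sample_freq = 100, n_freqs = 30)
# time = 100, freq = 30
```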
Details
For input sequences that exceed a certain size, the output
sequence will not fit into local memory and the fcwt cannot be
performed in one run.
For instance, when processing a song of 10 minutes length (assuming
a sampling rate of 44100 Hz), the size of the output is
10 * 60 seconds * 44100 Hz * nfreqs * 8 bytes,
which for e.g. nfreqs = 200 equals ~ 42 GB, hence
already at the limit of the hardware of a modern personal computer.
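The arithmetic above, and the default of max_batch_size from the Usage section, can be reproduced in a few lines. This is a sketch with hypothetical helper names (output_bytes, default_max_batch_size); it only re-evaluates the expressions stated in this page.

```r
# Size in bytes of a full-resolution output with one 8-byte double per value.
output_bytes <- function(duration_s, sample_freq, n_freqs, bytes_per_value = 8) {
  duration_s * sample_freq * n_freqs * bytes_per_value
}

bytes <- output_bytes(duration_s = 10 * 60, sample_freq = 44100, n_freqs = 200)
bytes / 1e9  # 42.336, i.e. about 42 GB

# Default max_batch_size expression copied from the Usage section:
# the number of input samples per batch for a given n_freqs.
default_max_batch_size <- function(n_freqs) {
  ceiling(1 * 10^9 / (n_freqs * 8) / 2)
}
default_max_batch_size(200)  # 312500
```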
In cases where the required output time resolution is coarser than the time
resolution of the input signal, one can perform the fcwt() and reduce the
output size by averaging.
(The input signal time resolution can in general not be reduced since
high-frequency information would get lost.)
This function splits up the input sequence into batches, processes each batch separately, reduces the time resolution, and adds the outputs together.
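The split, transform, average and combine pattern can be illustrated in base R. This is a simplified sketch and not the package's implementation: toy_transform is a hypothetical stand-in for the wavelet transform, and the handling of boundary artefacts between batches is left out entirely.

```r
# Hypothetical stand-in for a per-batch transform. It returns a
# (time x frequency) matrix but is NOT a wavelet transform.
toy_transform <- function(x, n_freqs) {
  outer(abs(x), seq_len(n_freqs))
}

# Average every `factor` consecutive rows to lower the time resolution.
# Trailing rows that do not fill a complete group are dropped.
downsample_rows <- function(m, factor) {
  n_out <- nrow(m) %/% factor
  if (n_out == 0) return(m[0, , drop = FALSE])
  m <- m[seq_len(n_out * factor), , drop = FALSE]
  groups <- rep(seq_len(n_out), each = factor)
  rowsum(m, groups) / factor
}

# Process the input batch by batch and stack the reduced outputs.
batched <- function(x, batch_size, factor, n_freqs) {
  # The batch size has to be compatible with the reduction factor.
  stopifnot(batch_size %% factor == 0)
  starts <- seq(1, length(x), by = batch_size)
  parts <- lapply(starts, function(s) {
    batch <- x[s:min(s + batch_size - 1, length(x))]
    downsample_rows(toy_transform(batch, n_freqs), factor)
  })
  do.call(rbind, parts)
}

res <- batched(sin(seq_len(1000) / 10), batch_size = 200, factor = 50, n_freqs = 4)
dim(res)  # 20 4
```

Only one full-resolution batch matrix exists at any time, so peak memory is governed by the batch size rather than by the length of the input.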
Attention: In contrast to fcwt(), boundary artefacts are automatically removed,
so some information at the beginning and the end of the
sequence is lost. (The amount depends on the minimal frequency captured, freq_begin.)
Examples
fcwt_batch(
ts_sin_sin,
x_sample_freq = u(44.1, "kHz"),
sigma = 10,
y_sample_freq = u(100, "Hz"),
freq_begin = u(100, "Hz"),
freq_end = u(11000, "Hz"),
n_freqs = 30
)
#> _Scalogram_
#> * (Time/Frequency) dimension: ( 60 , 30 )
#> * Sampling rate: 100 [Hz]
#> * Frequency scale: 100 [Hz] - 11000 [Hz], log
#> * Time offset: 0.2 [s]
#> * Sigma: 10
#> o Time resolution at 100 [Hz] : 0.4 [1/Hz]
#> o Time resolution at 11000 [Hz] : 0.003636364 [1/Hz]
#> o Relative frequency resolution: 0.06366198
#> * Time/frequency matrix summary
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.000e+00 8.500e-09 1.850e-08 1.991e-05 4.640e-08 2.117e-03