I have added a lot of features to my signal processing library and I reworked the foundations. So, a lot of things to cover in this post. And a funny (but very simple) Voice Activity Detection example at the end.
1. Foundations (Fixed, Signal)
1.1. Fixed point
By better use of the types, I was able to avoid code duplication. So, it allowed me to improve the fixed point type. Here is an example of what can be expressed with the new type:
It is a fixed point word, signed, coded on 16 bits, using 14 bits for the fractional part, saturated arithmetic and no rounding (for multiplications).
You may think that perhaps it would be better to use different operators for saturated arithmetic. $R^2$ and $C$ are considered different and yet they are both implemented using pair of real numbers. I am following this philosophy. A number is not only defined by its implementation but also by what you can do with it.
The advantage of this approach is that I can easily test the same algorithm with different arithmetics.
I have started writing unitary tests. It is boring so I am progressing slowly but it has to be done because it is a critical part of this library.
It was cumbersome to track the sampling frequency so now it is included in the Signal data type. I have added a few optimization rules to be sure that list fusion is still taking place.
I have also introduced a new BSignal type for bounded (in time) signal. Using lists for that purpose was not always clear.
2. Rounding errors and fixed points dynamics
Finding the right fixed point representation for each step of an algorithm can be tricky. Some intermediate computations may require a higher accuray. So, to help for this analysis I have added a tracing mechanism.
It is as simple as writing:
Several traces can be active in the same code (and use a different string for identification)
then, once you have enough data, you can trace it using:
traceValues will display a log-log plot of the probability distribution and with a reference line corresponding to an uniform distribution. I think I'll have to write another blog post about it (or a good documentation in the package) because the resulting graph is not easy to decode.
Here is an example generated with
The minimum value which can be represented with this format is :
It log is $-9.7$. The minimum horizontal value of the graph.
The maximum value (for positive values) is : $1.99993896484375$ with a log of $0.69$. The maximum horizontal value.
But, the signal is reaching only $1.7$ in this test so with a log of $0.53$.
We see that no positive or negative saturation was detected. We also see that the lowest fractional bits are not used a lot. Indeed, a test with
would show a much better fit between the red and blue curves.
The trace function has the type:
-> Signal t a
-> Signal t a
So, yes, trace is using the horrible unsafePerformIO ! But, for tracing I think it is a perfectly acceptable use.
PDF is great but when the picture is getting to complex, vectorial drawing is no more a solution ... so the tool is now switching to a raster mode. By luck, HPDF is supporting the inclusion of raw images. So, I have used it. It is working but I have not made any attempt at accelerating my rasterization code which is currently a bit slow.
4. Sound import / export
I am relying on the WAVE package to import / export WAV files. Stereo is supported ! Importation / exportation is taking care of the conversion to/from fixed point.
I have modified the OS X hViewer to be able to play sound. It is based on a custom implementation of an AIFC encoder. So, potentially I may support 5.1 in addition to stereo. More work is needed on this and then I'll also add an API to export AIFC files.
5. Voice Activity Detection
Here is a very simplified VAD algorithm for this blog post. I have not made any attempt at writing a true VAD algorithm so don't expect that one to work on a lot of test patterns. It is just an illustration. In a future version, when I have finished writing unitary tests, I'll start adding true speech and acoustic algorithms.
I have not done any post-processing of the VAD signal (decision smoothing). It is generally done to improve the signal based on temporal properties of speech.
The implementation of the VAD is in fixed point. And I have added some non-stationary noise.
The spectrogram is also generated by this library.
Here are some examples of the code and types for the VAD:
energy = mapS (U.map bandEnergyF . fft . U.map (:+ 0)) framed
frameWithWinAndOverlap is part of the Windows module in the library.
Here is an example of how could the VAD API be. The current implementation is a bit more complex but can be simplified.
The n + n type is because the energy is a square and I keep all fractionals bits (so 2n).
=> Int -- ^ Window size
-> U.Vector (Fixed Int32 (n + n) s r) -- ^ Current Estimation of the energy spectrum
-> [U.Vector (Fixed Int32 (n + n) s r)] -- ^ Sequence of spectrum
-> [Fixed Int16 n s r] -- ^ Sequence of VAD decisions
The rasterization is slow. My functions for Signal are not lazy enough. When I write the framing algorithms using Signal it never ends. When I change some pieces to use a Data.Sream.List it is working. And yet, Signal is just a wrapper around lists. So, I probably made a mistake with my rewrite rules for optimization.
And there is a problem with Int40 so the VAD is using Int64 for intermediate computations.
I have written all this code a bit quickly. So, now it is time to make a break to write the unitary tests, document, clean the code, benchmark, optimize and correct the few big remaining bugs. So don't expect another post about this library in short term.