-------- Mailing List Message ------------------------------------
Subject: [Yoshimi-devel] PADSynth background rebuild ("padthread")
Date: Mon, 31 Jan 2022 04:20:52 +0100
From: Ichthyostega
To: Yoshimi-developers

At 15.12.21 at 13:22 Will Godfrey wrote:
> This now has an auto apply feature fairly well implemented.
> However it will crash on extreme wavetable sizes (I don't know why yet).

Hello Yoshimi developers,

As we all know, concurrent programming can be surprisingly tricky, even for seemingly simple stuff -- and so this new feature kept us busy for quite some time, until it reached a state where it works without crashes and sound glitches, and thus appears to be "feasible". Some issues (most notably XRuns) still need to be sorted out.

You can see the current state of this experimental feature in my Github
https://github.com/Ichthyostega/yoshimi.git
Branch: padthread

/Unfortunately this is a huge changeset, going deep down the rabbit hole/

===========================================================================
What do we hope to achieve?

The PADSynth is based on an exceptionally fine-grained spectrum distribution and uses a huge Fast-Fourier-Transform operation to generate a likewise large yet perfectly looped wavetable. The generated sound is conceptually equivalent to results produced by "granular synthesis". Rendering this huge spectrum is a compute-intensive task and can easily take several seconds. During that time, event processing in Yoshimi is blocked -- which leads to the idea of rebuilding the wavetable as a background operation and loading the results when ready.

At 18.12.21 12:12 Will Godfrey wrote:
> Ideally I'd want as seamless behaviour as possible.
> Looking ahead there are three scenarios that I'd ideally like to see.
>
> 1 (manual) User moves control; nothing happens until setting 'apply'.
> Nothing else is interrupted apart from the discontinuity of the actual swap.
>
> 2 (partially automatic) System tracks control changes applying them as they
> arrive. If they come too fast the build can be interrupted so only the last
> one is completed. Again, just a discontinuity at the swap time.
>
> 3 (fully automatic) As 2, but instead of a swap, maintain both original and
> new sample set while morphing between the two, then delete the old ones,
> but keep the relatively small 'framework' - so a form of double buffer.
> Morph time could be made user variable, with the proviso that further
> harmonics changes would be ignored during this time.
>
> 3 is really icing on the cake, and if it can be done would be something to shout about

To translate this feature description into a programming task:

(1) we want to move the expensive rebuilding of wavetables into a background thread, so the event handling thread is no longer blocked.
(2) we want to ensure the following conditions
    * whenever the PadSynth-Parameters are "dirty", a rebuild should happen
    * only after that rebuild is really complete, the swap-in should happen
(3) we want to prevent redundant rebuilds from happening at the same time.

Addition to (1): under some conditions (CLI Scripts) we still want to block the calling thread until the actual build is complete, in order to ensure predictable state.

===========================================================================
Challenges

The Yoshimi code base can be described as rather tightly coupled and tangled. Many parts are written in some "I know what I am about to do so get out of my way" style, leading to code that is hard to understand and maintain, and easy to break. Notably, raw buffers of various sizes are allocated and then passed through dozens and dozens of functions, to be processed in the end by some algorithm which just "magically" seems to know how to deal with that data, and which often behaves quite differently based on some implied condition detected from magic markers.
Moreover, Yoshimi uses effectively global yet mutable state even where this wouldn't be necessary, and this state is often manipulated from a totally remote code location by reaching into the innards of another, seemingly unrelated facility. There is often no notion of ownership or hierarchy; parts are mutually dependent and have to be bootstrapped and initialised in a very specific order. Thus, to extract some functionality and perform it in a different and effectively non-deterministic order, we are bound to trace down and understand lots of details meticulously, in order to identify which parts can be rearranged or need to be disentangled.

A further complication arises from constraints imposed by libFFTW3, which Yoshimi relies on to implement the Fast Fourier Transform operation. This library in itself is very elaborate and flexible, and has meanwhile been adapted to allow concurrent and re-entrant calculations, albeit with very strictly delineated prerequisites -- which the existing usage in Yoshimi did not need to fulfil, since up to now it operated on the assumption of a single deterministic computation path.

The necessary changes were especially related to the feature of an "FFT computation plan". At startup, libFFTW3 requires the user to pick the appropriate feature set and invocation scheme. Some users, for example, want to use complex numbers and multidimensional functions, while others (like Yoshimi) just need real-valued functions and prefer to work with "sine" and "cosine" coefficients in the spectrum to represent the phase of a spectral line. Actually, libFFTW3 would even be able to perform timing measurements and persist or load an FFT plan optimised for the specific setup and hardware -- an advanced feature Yoshimi does not exploit.
Unfortunately this definition of FFT plans turned out not to be thread-safe -- and Yoshimi sometimes happened to re-build those FFT plans during normal operation, especially after GUI interactions, thereby relying on the ability of libFFTW3 to detect and re-use similar plan definitions behind the scenes (and this caching seems to be one of the reasons why the setup of such FFT plans interferes with other concurrent memory management operations).

To overcome these difficulties, we had to overturn and rearrange all memory management related to spectrum and waveform data -- to get reliable control over the actual allocations and to change the point in time when allocations are performed. So the FFT plans are now prepared at first usage and shared by all further calculations, while spectrum data is now arranged in memory right from the start in the very specific order required by the Fast Fourier algorithm, and with appropriate alignment to allow for SIMD optimisation. Thus the transform calculation can now be invoked directly on the working data within OscilGen or the PADnoteParameters, instead of allocating a shared data block and copying and rearranging the spectrum coefficients for each invocation (as was done until now).

To carry out this tricky refactoring safely, we relied on help from the compiler: Spectrum and Waveform data became encapsulated into a data holder object (based on a single-ownership smart-pointer); various function signatures within OscilGen and SynthEngine have been converted step by step from using raw and unbounded float* to accepting these new data holder types.

===========================================================================
Implementation of PADSynth background builds

A background build is triggered whenever a new instrument involving PADSynth Kit-Items is loaded, when the user hits the "Apply" button in the PAD editor, or when the new »auto-Apply« feature detects relevant parameter changes.
Further changes during an ongoing build will cause it to start over afresh -- in the case of »auto-Apply«, however, only after a short delay, in order to integrate several change messages caused by dragging the sliders in the GUI.

The data storage for the PADSynth wavetables was likewise encapsulated into a new data holder type "PADtables", which can only be moved (single ownership). This result data will be handed over from the background thread to the Synth thread with the help of a C++ std::future, while the rebuild-trigger is coordinated through a std::atomic variable.

For the background tasks a rather simplistic scheduler has been added, to start a limited number of background threads, based on the number of available CPU cores, as reported by the C++ runtime system. Incoming build tasks are enqueued and picked up by those worker threads. Since these operations never interfere directly with the Synth, we can keep matters straightforward and use a simple Mutex for protection.

Within the SynthEngine thread, at the beginning of every buffer cycle when calculating sound for PADSynth notes, the readiness state of the future is probed (non-blocking), to swap in the new PADtables when actually ready.

All of this state handling logic has been encapsulated in a new component "FutureBuild", defined in Misc/BuildScheduler.h|cpp. Each PADnoteParameters instance now holds a PADtables instance and a FutureBuild instance, and delegates to the latter for all requests pertaining to wavetable builds. This FutureBuild state manager has been written in a way to remain agnostic both of the actual data type to transport (which is the PADtables) and of the actual scheduler backend implementation to use, which allows us to tweak and evolve those parts independently as we see fit.
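
To illustrate the hand-over described above, here is a minimal sketch, not the actual FutureBuild code. All names (Tables, BuildHandle, swapInWhenReady) are hypothetical, and std::async stands in for the real scheduler backend with its worker threads. It only shows the principle: an atomic "dirty" flag, a move-only result carried by a std::future, and a non-blocking readiness probe on the Synth side.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the move-only "PADtables" result type.
struct Tables
{
    std::unique_ptr<std::vector<float>> samples;   // unique_ptr member => move-only
    Tables(std::size_t n, float fill)
        : samples{std::make_unique<std::vector<float>>(n, fill)} { }
};

// Hypothetical, simplified state manager; agnostic of the result type RES.
template<class RES>
class BuildHandle
{
    std::future<RES>  pending_;
    std::atomic<bool> dirty_{false};

public:
    // Any thread: flag that parameters changed and a rebuild is due.
    void markDirty()     { dirty_.store(true, std::memory_order_release); }
    bool isDirty() const { return dirty_.load(std::memory_order_acquire); }

    // Start a build, unless one is still pending (no redundant builds).
    // ASSUMPTION: std::async replaces the real worker-thread scheduler.
    template<class FUN>
    bool launch(FUN&& buildOp)
    {
        if (pending_.valid()) return false;
        dirty_.store(false, std::memory_order_release);
        pending_ = std::async(std::launch::async, std::forward<FUN>(buildOp));
        return true;
    }

    // Synth thread: non-blocking probe of the future.
    bool isReady() const
    {
        return pending_.valid()
            && pending_.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

    // Retrieve the result; blocks when the build is not yet complete
    // (the behaviour wanted for the CLI script case).
    RES take() { return pending_.get(); }
};

// To be called at the beginning of a buffer cycle: swap in new data if ready.
template<class RES>
bool swapInWhenReady(BuildHandle<RES>& build, RES& current)
{
    if (!build.isReady()) return false;
    current = build.take();          // move assignment, no copy of sample data
    return true;
}
```

The Synth side never waits in swapInWhenReady(); only take() may block, which matches the requirement that scripts can still wait for a complete build.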
===========================================================================
Integration and Extensions: Cross-Fade and Random Walk

[30.4.2022: added these explanations]

While "a transition by cross fade" might be deemed simple at first sight, it turned out to be rather tricky on close investigation, because cross-fading is an ongoing task and needs to be interwoven with the actual sound computation in the inner processing loop. It would indeed be a simple addition within a processing architecture based on processing tasks and a scheduler -- Yoshimi however takes the opposite approach, with a single top-down compute-buffer call, handling any variations by pre-coded forking in the computation path. Moreover, the concept of a "note" was shaped rather accidentally and then extended by copy-n-paste to SUBnote and PADnote after the fact. And so, especially for PADSynth, there is no room between the triggering of a note instance within Part.cpp and the actual low-level wavetable based sample computation.

Duplicating this sample computation code into a cross-fading version was not deemed acceptable, and directly hooking the cross fade into the computation was not even considered (for obvious performance reasons). This leaves the option of abstracting out the actual computation as a wavetable interpolator component, which can then be wired either directly, or combined with a fade.

The details of this refactoring however turn out to be quite involved, since a set of wavetables is maintained for each PADSynth kit-item in its own PADnoteParameters object, which in turn can be shared by multiple note instances, which additionally can also be legato or portamento notes. The "check point" for integrating a newly built set of wavetables is at the beginning of the buffer computation cycle for some note, and this might happen at any point and for any note right in the middle of the overall calculation call.
At this point, a new XFadeManager component was added, to take hold of the old wavetable, mark an ongoing cross-fade and keep track of all users through reference counting. This could have been implemented by just using a std::shared_ptr -- but this idea was rejected, since shared_ptr uses atomic reference counting for coordination, which might add considerable overhead within the inner processing loop, and thus would require extended investigation and timing measurements to be safe. On the other hand, the actual SynthEngine code is known to run entirely single-threaded, and to fit in with that picture, even the note-initialisation of PADnotes has now been moved over into the Synth thread, to avoid any need for thread synchronisation (beyond the FutureBuild used for handing in the new wavetables).

In listening tests it turned out that a simple linear crossfade is not sufficient at this point, since the waveforms of old and new wavetables typically show very low correlation (due to the randomised phases). Thus an equal-power mix seems adequate. Moreover, even such a mix would still be noticeable as a "manipulation", and for that reason, a typical S-Fade edit curve was devised and combined with segment-wise linear interpolation, so that the expensive square root for the equal-power mix is computed only once per block.

And finally, after having built all this scaffolding, it became simple to add a user-visible new feature on top, which hopefully expands the musical viability of PADSynth: it is now possible to re-trigger this background build process periodically, and even to perform a classical random walk on some parameters, to break the subtle trait of "sameness" which arises from the fixed wavetables after playing for some time. Even though the sound might superficially seem random, the patterns in fact repeat after some time; but rebuilding a new set of wavetables will completely re-shuffle all phases and thereby randomise the patterning in the sound.
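
As an illustration of the equal-power mix with segment-wise interpolation mentioned above, here is a minimal sketch under stated assumptions. The function names are hypothetical, and the S-curve used here (the "smoothstep" polynomial 3t^2 - 2t^3) is just an assumed example; the curve actually used in the padthread branch may be different. The square roots are evaluated only at the block boundaries, and the gains are interpolated linearly for the samples in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// ASSUMPTION: smoothstep as S-shaped fade curve, with t clamped to [0,1].
inline float sCurve(float t)
{
    t = std::min(1.0f, std::max(0.0f, t));
    return t * t * (3.0f - 2.0f * t);
}

// Equal-power gains: gainOld^2 + gainNew^2 == 1 at every fade position.
inline float gainNew(float t) { return std::sqrt(sCurve(t)); }
inline float gainOld(float t) { return std::sqrt(std::max(0.0f, 1.0f - sCurve(t))); }

// Mix one block of the old and the new signal into `out`.
// `pos` counts the samples already faded, `fadeLen` is the total fade length.
// Returns the fade position after this block.
inline std::size_t crossFadeBlock(const std::vector<float>& oldSig,
                                  const std::vector<float>& newSig,
                                  std::vector<float>& out,
                                  std::size_t pos, std::size_t fadeLen)
{
    assert(fadeLen > 0 && oldSig.size() == newSig.size());
    const std::size_t n = oldSig.size();
    out.resize(n);
    if (n == 0) return pos;

    const float t0 = static_cast<float>(pos)     / static_cast<float>(fadeLen);
    const float t1 = static_cast<float>(pos + n) / static_cast<float>(fadeLen);

    // The expensive part (sqrt) happens only here, once per block boundary.
    const float gO0 = gainOld(t0), gO1 = gainOld(t1);
    const float gN0 = gainNew(t0), gN1 = gainNew(t1);

    for (std::size_t i = 0; i < n; ++i)
    {
        // Linear interpolation of both gains within the block.
        const float f  = static_cast<float>(i) / static_cast<float>(n);
        const float gO = gO0 + (gO1 - gO0) * f;
        const float gN = gN0 + (gN1 - gN0) * f;
        out[i] = gO * oldSig[i] + gN * newSig[i];
    }
    return std::min(pos + n, fadeLen);
}
```

Calling crossFadeBlock() once per buffer cycle with the returned position carries the fade across blocks; since each block starts at the gain where the previous one ended, the gain curve stays continuous.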