Postnet Layer
In addition to the decoder, some systems have a post-net, an additional network that predicts acoustic features. The post-net was originally introduced to convert the decoder's acoustic features into a different representation suited to the chosen waveform synthesis method, for example, from mel spectrograms to linear spectrograms [2] or from mel spectrograms to vocoder parameters [4]. In more recent studies, the post-net instead refines the acoustic features predicted by the decoder to further improve quality [5, 6]. In either case, the post-net introduces an additional loss term in the objective function.
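To make the "additional loss term" concrete, here is a minimal sketch in dependency-free Python. It assumes the refinement variant of the post-net, modeled as a residual correction added to the decoder output, and it assumes mean squared error for both terms; neither choice is stated in the passage above. The names `toy_postnet`, `total_loss`, and the single fixed `kernel` are hypothetical stand-ins for a learned multi-layer network.

```python
from typing import List

Frames = List[List[float]]  # acoustic features indexed as [time][feature_bin]


def mse(pred: Frames, target: Frames) -> float:
    """Mean squared error over all time steps and feature bins."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def toy_postnet(decoder_out: Frames, kernel: List[float]) -> Frames:
    """Hypothetical stand-in for a post-net (assumption, not from the source).

    Applies one 1-D convolution over the time axis and adds the result to
    the decoder output as a residual correction. A real post-net would use
    several learned layers with nonlinearities.
    """
    half = len(kernel) // 2
    n_frames = len(decoder_out)
    refined: Frames = []
    for t in range(n_frames):
        row = []
        for d in range(len(decoder_out[t])):
            residual = 0.0
            for k, w in enumerate(kernel):
                src = t + k - half
                if 0 <= src < n_frames:  # zero padding at the sequence edges
                    residual += w * decoder_out[src][d]
            row.append(decoder_out[t][d] + residual)
        refined.append(row)
    return refined


def total_loss(decoder_out: Frames, postnet_out: Frames, target: Frames) -> float:
    """Decoder loss plus the extra term contributed by the post-net."""
    return mse(decoder_out, target) + mse(postnet_out, target)


# Usage: three frames with one feature bin each.
decoder_out = [[1.0], [2.0], [3.0]]
target = [[1.0], [2.0], [4.0]]
postnet_out = toy_postnet(decoder_out, kernel=[0.0, 0.5, 0.0])
loss = total_loss(decoder_out, postnet_out, target)
```

Because both terms are computed against the same target, the decoder is still trained to produce usable features on its own, while the post-net is trained to correct whatever error remains. For the conversion variant (for example, mel to linear spectrograms), the second term would instead compare the post-net output against a target in the converted representation.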
Relative to DAR, C-DAR has three additional changes that do not significantly impact naturalness or controllability, but provide additional insights into F0 generation. First, a 5-layer postnet [3] follows the autoregressive RNN. We find that this postnet reduces autoregressive sampling errors and tightens the posterior distribution around the argmax (Figure 2).