SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization

NeurIPS 2021

Amir Hertz¹, Or Perel¹, Raja Giryes¹,
Olga Sorkine-Hornung², Daniel Cohen-Or¹

¹ Tel Aviv University, ² ETH Zurich, Switzerland

What is SAPE about?

Spectral Bias

When used to fit neural implicit functions, multilayer perceptrons (MLPs) tend to learn the global, low frequencies of a signal first, and struggle to fit local, high frequencies. Rahaman et al. (2019) termed this phenomenon "Spectral Bias".

Positional Encoding

One way to overcome "Spectral Bias" is by mapping input coordinates to a higher dimensional space via "Positional Encoding". Doing so, however, generally requires manually tuning the scale of the encoding frequency band.
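As a rough illustration of such a mapping, the sketch below follows the random Fourier feature form used by Fourier Feature Networks: each coordinate is projected onto random frequencies and passed through sine and cosine. The function names, the seed, and the plain-Python layout are assumptions made for readability, not code from the SAPE release. The `sigma` argument is the frequency scale that normally needs manual tuning.

```python
import math
import random


def make_frequencies(num_freqs, in_dim, sigma, seed=0):
    """Draw a (num_freqs x in_dim) matrix with entries ~ N(0, sigma^2).

    sigma controls the bandwidth of the encoding (hypothetical helper).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
            for _ in range(num_freqs)]


def positional_encoding(x, freqs):
    """Map a coordinate x (list of floats) to [sin, cos] pairs, one per frequency row."""
    feats = []
    for b in freqs:
        phase = 2.0 * math.pi * sum(bi * xi for bi, xi in zip(b, x))
        feats.append(math.sin(phase))
        feats.append(math.cos(phase))
    return feats


# Example: encode a 2D coordinate with 4 random frequencies at scale sigma=5.
freqs = make_frequencies(num_freqs=4, in_dim=2, sigma=5.0)
encoded = positional_encoding([0.25, 0.75], freqs)  # 8 values in [-1, 1]
```

A larger `sigma` exposes higher frequencies to the network, which helps with fine detail but can make optimization noisier.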

Progressive Positional Encoding

SAPE avoids manual tuning by employing a simple policy of progressively revealing the positional encoding to the network.
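One way to picture progressive revealing is a soft mask over encoding channels ordered from low to high frequency. The linear ramp below is an illustrative schedule chosen for this sketch; it is an assumption and not necessarily the exact schedule SAPE uses. The names `frequency_mask` and `apply_mask` are hypothetical.

```python
def frequency_mask(progress, num_freqs):
    """Return one weight in [0, 1] per frequency band.

    Band k is hidden while progress <= k, fades in linearly, and is
    fully revealed once progress >= k + 1 (assumed linear ramp).
    """
    return [min(1.0, max(0.0, progress - k)) for k in range(num_freqs)]


def apply_mask(feats, mask):
    """Scale features laid out as [sin_0, cos_0, sin_1, cos_1, ...] by their band weight."""
    return [f * mask[i // 2] for i, f in enumerate(feats)]


# Early in training only the lowest band is visible; later, all bands are.
early = frequency_mask(progress=1.5, num_freqs=4)
late = frequency_mask(progress=4.0, num_freqs=4)
```

With such a mask the network starts as a plain low-frequency fit and only later receives the high-frequency channels, so the encoding scale does not have to be chosen up front.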

Spatially Adaptive Progression

In addition, SAPE maintains a record of progression per portion of the neural implicit signal. The progression rate of each portion is driven by a feedback loop based on the loss function.

As a result, SAPE is able to learn high quality neural implicit functions with minimal manual intervention.
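The per-portion feedback loop described above could be sketched as follows. This is a simplified illustration under stated assumptions: the signal domain is split into discrete cells, each cell keeps its own progression value, and a cell keeps advancing only while some sample in it has a loss above a threshold. The threshold, rate, cap, and the function name `update_progress` are all hypothetical.

```python
def update_progress(progress, cell_ids, losses,
                    threshold=0.01, rate=0.1, max_progress=8.0):
    """Advance the progression of cells that are still poorly fit.

    progress  -- list with one progression value per spatial cell
    cell_ids  -- cell index of each training sample in the batch
    losses    -- per-sample loss values, aligned with cell_ids
    """
    # Cells containing at least one sample whose loss is above the threshold.
    lagging = {cell for cell, loss in zip(cell_ids, losses) if loss > threshold}
    return [min(max_progress, p + rate) if i in lagging else p
            for i, p in enumerate(progress)]


# Cell 0 is still poorly fit, cell 2 is already well fit, cell 1 has no samples.
progress = update_progress([0.0, 0.0, 0.0],
                           cell_ids=[0, 0, 2],
                           losses=[0.5, 0.2, 0.001])
```

Regions that are already well fit stop receiving higher frequencies, while regions with remaining error continue to progress.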

Multilayer perceptrons (MLPs) are known to struggle with learning high-frequency functions, particularly signals with wide frequency bands. We present a spatially adaptive progressive encoding (SAPE) scheme for input signals of MLP networks, which enables them to better fit a wide range of frequencies without sacrificing training stability or requiring any domain-specific preprocessing. SAPE gradually unmasks signal components with increasing frequencies as a function of time and space. The progressive exposure of frequencies is monitored by a feedback loop throughout the neural optimization process, allowing changes to propagate at different rates among local spatial portions of the signal space. We demonstrate the advantage of SAPE on a variety of domains and applications, including regression of low-dimensional signals and images, representation learning of occupancy networks, and a geometric task of mesh transfer between 3D shapes.

Representation of 1D Signals

With SAPE, multilayer perceptrons can faithfully represent implicit 1D signals of varying frequency.
In the example below, the network attempts to learn a 1D function shown as the black curve. The training samples are shown in red.

[Figure panels: MLP | Fourier Feature Networks | SAPE | Ground Truth]

Representation of 2D Images

SAPE is able to represent a wide range of natural images without tuning the positional encoding frequency scale.
Even when only 25% of the original pixels, sampled uniformly, are used as the training set, SAPE is still able to reconstruct small details of the original signal. Note that SAPE's performance is capped by the sampling rate (i.e., details smaller than the sampling rate are not guaranteed to be captured). Below we show animations comparing the optimization progress per algorithm. For SAPE, you may hover over the animation to toggle a heatmap tracking the maximal frequency unmasked per position (low to high). Further below we compare the results after convergence.
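The 25% uniform sampling described above could be set up along the following lines. This is a minimal sketch: the function names, the seed, and the choice to normalize pixel positions to [0, 1] are assumptions for illustration.

```python
import random


def split_pixels(height, width, train_fraction=0.25, seed=0):
    """Uniformly pick a fraction of pixel positions for training; the rest are held out."""
    rng = random.Random(seed)
    pixels = [(row, col) for row in range(height) for col in range(width)]
    rng.shuffle(pixels)
    n_train = round(train_fraction * len(pixels))
    return pixels[:n_train], pixels[n_train:]


def to_unit_coords(pixel, height, width):
    """Map an integer (row, col) to continuous coordinates in [0, 1].

    Assumes height > 1 and width > 1.
    """
    row, col = pixel
    return (row / (height - 1), col / (width - 1))


# Example: an 8x8 image yields 16 training pixels and 48 held-out pixels.
train, held_out = split_pixels(8, 8)
train_coords = [to_unit_coords(p, 8, 8) for p in train]
```

The network is then fit only on the training coordinates and evaluated on the full pixel grid.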

[Figure panels: MLP | Fourier Feature Net. σ=5 | Fourier Feature Net. σ=25 | SAPE σ=25]

Representation of 3D Shapes

SAPE is also useful for learning 3D occupancy implicit functions.
In the examples below, points were sampled both uniformly in space and near the shape surface.
Each point is then assigned a binary label indicating whether it falls inside the volume enclosed by the surface.
Note that due to memory constraints, the result presented is a mesh converted from a neural implicit function using Marching Cubes with finite resolution.
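To make the sampling and labeling step concrete, here is a small sketch that uses a sphere as a stand-in shape, since it has a trivial inside/outside test. The sphere, the noise level, and all function names are assumptions for illustration; a real pipeline would query a mesh instead.

```python
import math
import random


def inside_sphere(point, radius=0.5):
    """Binary occupancy label: 1 if the point lies inside the sphere, else 0."""
    return 1 if math.sqrt(sum(c * c for c in point)) < radius else 0


def sample_occupancy(n_uniform, n_surface, radius=0.5, noise=0.02, seed=0):
    """Sample points uniformly in [-1, 1]^3 and near the sphere surface, then label them."""
    rng = random.Random(seed)
    points = [[rng.uniform(-1.0, 1.0) for _ in range(3)]
              for _ in range(n_uniform)]
    for _ in range(n_surface):
        # Random direction, scaled onto the surface, then jittered slightly.
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in direction))
        points.append([radius * c / norm + rng.gauss(0.0, noise)
                       for c in direction])
    labels = [inside_sphere(p, radius) for p in points]
    return points, labels


points, labels = sample_occupancy(n_uniform=100, n_surface=100)
```

The near-surface samples concentrate supervision where the occupancy value changes, which is where high-frequency detail matters most.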

[Figure panels: MLP | Fourier Feature Networks | SAPE | Ground Truth]

Deformation of 2D Silhouettes

Finally, we demonstrate how SAPE can regularize a deformation process. In the following task, for all shapes, SAPE is first pretrained to output the coordinates of a unit circle. The network is then optimized to trace the boundary of a target shape by learning, per position, the offset from the circle boundary to the shape contour.
The progressive nature of SAPE allows it to capture the global shape first, during the early steps when Spectral Bias is present and the optimization is stable. As higher frequencies are revealed, SAPE is able to fit the finer details of the target shape.
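The circle-plus-offset parameterization can be illustrated with a few lines of code. In this sketch `offset_fn` is a hypothetical stand-in for the trained network, and the ellipse offset is only an example target; neither comes from the SAPE code.

```python
import math


def circle_point(t):
    """Point on the unit circle for a parameter t in [0, 1)."""
    angle = 2.0 * math.pi * t
    return (math.cos(angle), math.sin(angle))


def deform(t, offset_fn):
    """Deformed contour point: circle position plus a per-position offset."""
    base_x, base_y = circle_point(t)
    dx, dy = offset_fn(t)
    return (base_x + dx, base_y + dy)


def ellipse_offset(t):
    """Example offset that stretches the circle into an ellipse along x."""
    x, _ = circle_point(t)
    return (0.5 * x, 0.0)


# Trace the deformed contour at 64 evenly spaced positions.
contour = [deform(i / 64, ellipse_offset) for i in range(64)]
```

A zero offset reproduces the pretrained circle, so the optimization starts from a well-behaved shape and only gradually departs from it.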

[Figure panels: MLP | FFN | SAPE | Target]

Related Works

Baselines

Rahaman et al. (2019) observed that deep ReLU networks are biased towards low frequency functions, and identified this phenomenon as "Spectral Bias".
Tancik et al. (2020) established the groundwork for applying Fourier Feature mappings to MLPs. They provided extensive analysis of results through the lens of NTK theory.
Sitzmann et al. (2020) proposed the sinusoidal representation networks (SIREN). Unlike other works which focus on positional encoding mapping of the network input, they use sine functions as non-linear activations for all layers of the network.

Concurrent Works

Park et al. (2021) reconstruct photorealistic non-rigid deforming scenes from photos or videos. They also use coarse-to-fine positional encoding.
Lin et al. (2021) extend Neural Radiance Fields (NeRF) to train without accurate camera poses. They, too, apply coarse-to-fine registration on coordinate-based scene representations.
Mehta et al. (2021) generalize SIREN with a dual-MLP architecture, where an auxiliary network maps input latent codes to parameters that modulate the periodic activations of the synthesis network.

BibTeX

@article{hertz2021sape,
  title={SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization},
  author={Amir Hertz and Or Perel and Raja Giryes and Olga Sorkine-Hornung and Daniel Cohen-Or},
  journal={arXiv preprint arXiv:2104.09125},
  year={2021}
}