FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives
Anonymous Authors
Abstract
Reconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints. Widely adopted approaches supervise complex interactions with additional masks and control signal annotations, limiting their real-world applications. In this paper, we propose an annotation guidance-free method, dubbed FreeGaussian, that mathematically derives dynamic Gaussian motion from optical flow and camera motion using novel dynamic Gaussian constraints. By establishing a connection between 2D flows and 3D Gaussian dynamic control, our method enables self-supervised optimization and continuity of dynamic Gaussian motions from flow priors. Furthermore, we introduce a 3D spherical vector controlling scheme, which represents the state with a 3D Gaussian trajectory, thereby eliminating the need for complex 1D control signal calculations and simplifying controllable Gaussian modeling. Quantitative and qualitative evaluations on extensive experiments demonstrate the state-of-the-art visual performance and control capability of our method.
Pipeline
Overview of FreeGaussian. Given a video stream \(\{\mathbf{P}(t), \mathbf{I}(t)\}\), our method recovers controllable 3D Gaussians \(\mathbf{G}^{\ast}\) in two stages. First, we pre-train a deformable 3DGS and calculate the dynamic Gaussian flow \(\mathbf{u}^\text{GS}\) from optical and camera flow with Lemma 1. We then reproject the dynamic Gaussian flow maps, cluster the highlighted Gaussians with the DBSCAN algorithm, and calculate their trajectories. In the controllable Gaussian training stage, we optimize the Gaussians \(\mathbf{G}\) and the network \(\mathbf{\Theta}\) using the rasterization-based loss function in Sec 3.4, which measures the discrepancy between rendered images and input images, as well as dynamic Gaussian flows.
Dynamic Gaussian Flow Analysis
In interactive scenes, consider an instantaneous motion model in which the camera and the 3D Gaussians have separate velocities between consecutive frames. The projected optical flow \(\mathbf{u}\) can then be decomposed into camera flow \(\mathbf{u}^\text{Cam}\) and dynamic Gaussian flow \(\mathbf{u}^\text{GS}\), as described in Lemma 1 and Corollary 1.
Lemma 1
Dynamic Gaussian flow \(\mathbf{u}^\text{GS}\) under instantaneous motion can be derived from optical flow \(\mathbf{u}\) and camera flow \(\mathbf{u}^\text{Cam}\) with the following transform: \[ \begin{equation} \begin{aligned} \label{eq:gaussian_flow_analysis} & \mathbf{u} = \mathbf{u}^\text{Cam} + \mathbf{u}^\text{GS} + \mathbf{\Delta}, \\ & \mathbf{u}^\text{Cam} = \frac{\mathbf{A}\boldsymbol{v}}{Z} + \mathbf{B}\boldsymbol{\omega}, \quad \mathbf{u}^\text{GS} = \mathbf{A} \sum_{i=1}^{M} T_i \alpha_i \frac{\boldsymbol{v}^\text{GS}}{Z_i}, \quad \mathbf{\Delta} = \mathbf{A} \sum_{i=1}^{M} T_i \alpha_i \boldsymbol{v}(\frac{1}{Z_i} - \frac{1}{Z}), \\ & \mathbf{A} = \begin{bmatrix} -f_x & 0 & x - c_x \\ 0 & -f_y & y - c_y \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} \frac{(x - c_x)(y - c_y)}{f_y} & - f_x - \frac{(x - c_x)^2}{f_x} & \frac{(y - c_y) f_x}{f_y} \\ f_y + \frac{(y - c_y)^2}{f_y} & -\frac{(x - c_x)(y - c_y)}{f_x} & -\frac{(x - c_x)f_y}{f_x} \end{bmatrix}, \\ \end{aligned} \end{equation} \]
where \(f_x, f_y, c_x, c_y\) are the camera intrinsics and \(M\) denotes the number of Gaussians, sorted by depth \(Z_i\), whose projections intersect the pixel \(\mathbf{m}\). The flow residual term \(\mathbf{\Delta}\) is preserved to guarantee accuracy, even when it approaches zero after refined optimization.
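As a rough illustration of Lemma 1, the sketch below evaluates \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{u}^\text{Cam}\) at a single pixel, then rearranges the first line of the lemma to isolate \(\mathbf{u}^\text{GS}\). The function names, the per-pixel interface, and the default of a zero residual are assumptions made for this example, not part of the released method.

```python
import numpy as np

def flow_matrices(x, y, fx, fy, cx, cy):
    """Return the 2x3 matrices A and B of Lemma 1 at pixel (x, y)."""
    xb, yb = x - cx, y - cy  # pixel coordinates relative to the principal point
    A = np.array([[-fx, 0.0, xb],
                  [0.0, -fy, yb]])
    B = np.array([[xb * yb / fy, -fx - xb**2 / fx, yb * fx / fy],
                  [fy + yb**2 / fy, -xb * yb / fx, -xb * fy / fx]])
    return A, B

def camera_flow(x, y, Z, v, omega, intrinsics):
    """u_cam = A v / Z + B omega for camera velocity v and angular velocity omega."""
    A, B = flow_matrices(x, y, *intrinsics)
    return A @ v / Z + B @ omega

def dynamic_flow(u, u_cam, delta=None):
    """Rearranged Lemma 1: u_gs = u - u_cam - delta (delta assumed zero if omitted)."""
    if delta is None:
        delta = np.zeros(2)
    return u - u_cam - delta

# Hypothetical values: pixel at the principal point, camera translating along x.
intrinsics = (100.0, 100.0, 50.0, 50.0)  # fx, fy, cx, cy
u_cam = camera_flow(50.0, 50.0, 2.0, np.array([1.0, 0.0, 0.0]), np.zeros(3), intrinsics)
u_gs = dynamic_flow(np.array([-47.0, 3.0]), u_cam)
```

In practice the same computation would be vectorized over all pixels, with \(Z\) taken from a rendered depth map.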
Corollary 1
The dynamic Gaussian flow \(\mathbf{\tilde{u}}^\text{GS}\) on the image plane can be accumulated from the displacements of the 2D Gaussian means, \(\boldsymbol{\mu}_{i,t} - \boldsymbol{\mu}_{i,0}\): \[ \begin{align} \mathbf{u} = \mathbf{u}^\text{Cam} + \tilde{\mathbf{u}}^\text{GS} + \mathbf{\Delta}, \quad \tilde{\mathbf{u}}^\text{GS} = \sum_{i=1}^{M} T_i \alpha_i (\boldsymbol{\mu}_{i,t} - \boldsymbol{\mu}_{i,0}). \label{eq:dynamic_gs_flow} \end{align} \]
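The accumulation in Corollary 1 can be sketched for one pixel as follows. The text above does not define \(T_i\); this example assumes the usual 3DGS transmittance \(T_i = \prod_{j<i}(1 - \alpha_j)\) over depth-sorted Gaussians, and the function name is hypothetical.

```python
import numpy as np

def accumulate_gaussian_flow(alphas, mu_t, mu_0):
    """Sum T_i * alpha_i * (mu_{i,t} - mu_{i,0}) over depth-sorted Gaussians.

    alphas: (M,) opacities of the Gaussians covering the pixel, front to back.
    mu_t, mu_0: (M, 2) projected 2D means at time t and at time 0.
    """
    alphas = np.asarray(alphas, dtype=float)
    # Assumed transmittance: product of (1 - alpha_j) over all Gaussians in front.
    T = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = T * alphas
    displacement = np.asarray(mu_t, dtype=float) - np.asarray(mu_0, dtype=float)
    return (weights[:, None] * displacement).sum(axis=0)

# Two Gaussians with opacity 0.5: blending weights are 0.5 and 0.25.
flow = accumulate_gaussian_flow([0.5, 0.5],
                                mu_t=[[2.0, 0.0], [0.0, 4.0]],
                                mu_0=[[0.0, 0.0], [0.0, 0.0]])
```

The weights are the same ones used for color blending, so a rasterizer could accumulate this flow in the same pass as the rendered image.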
Dynamic Gaussian clustering and tracking
With the formulations in Corollary 1, we first pretrain a deformable 3DGS \(\mathbf{G}^{\prime}\) on a set of camera streams. The dynamic Gaussian flow \(\mathbf{u}^\text{GS}\) from Corollary 1 is then extracted frame by frame and binarized to obtain flow maps. By back-projecting the flow maps to identify dynamic 3D Gaussians, we highlight the Gaussians \(\mathcal{D} = \{g_i \mid i = 1, 2, \ldots, Q\}\) with sharp dynamics, as illustrated in the Pipeline section. Next, we use the unsupervised clustering algorithm DBSCAN to group the dynamic Gaussians into clusters \(\mathcal{C} = \{c_i \mid i = 1, 2, \ldots, K\}\), where \(K\) is the number of interactive objects. The cluster centers evolve over time, generating continuous trajectories \(\boldsymbol{\varsigma}(t, k)\), where \(k\) indexes the object to which the trajectory belongs.
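The clustering and tracking step might look like the sketch below. It uses a small brute-force DBSCAN written in NumPy for self-containment, not any particular library implementation. It also assumes that the highlighted Gaussian centers are available as a `(T, Q, 3)` array over time and that cluster labels are assigned once from the first frame. The `eps` and `min_samples` values are illustrative only.

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns a label in 0..K-1 per point, or -1 for noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbors]  # neighborhood includes the point itself
    labels = np.full(n, -1)
    k = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = k
        stack = [i]
        while stack:
            j = stack.pop()
            if not core[j]:
                continue  # border points join a cluster but do not expand it
            for nb in neighbors[j]:
                if labels[nb] == -1:
                    labels[nb] = k
                    stack.append(nb)
        k += 1
    return labels

def cluster_trajectories(positions, labels):
    """positions: (T, Q, 3) centers over time; returns (T, K, 3) cluster centers."""
    K = int(labels.max()) + 1
    return np.stack([positions[:, labels == k].mean(axis=1) for k in range(K)], axis=1)

# Hypothetical data: two compact objects plus one outlier, over two frames.
frame0 = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0],
                   [5, 5, 5], [5.1, 5, 5], [5, 5.1, 5],
                   [20, 20, 20]], dtype=float)
frame1 = frame0.copy()
frame1[3:6] += [1.0, 0.0, 0.0]  # only the second object moves
labels = dbscan(frame0, eps=0.5, min_samples=3)
trajectories = cluster_trajectories(np.stack([frame0, frame1]), labels)
```

DBSCAN suits this step because the number of interactive objects \(K\) does not have to be specified in advance, and isolated Gaussians are discarded as noise.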
3D Spherical Vector Control
In the training stage, we represent the Gaussian dynamics state using cluster trajectory coordinates \(\mathbf{v}_c^i = \boldsymbol{\varsigma}(t, k) - \boldsymbol{\varsigma}(0, k)\), concatenated with Gaussian centers \(\mathbf{X}_i\). We then encode the coordinates with \(\mathbf{E}(\mathbf{v}_{c}^i, \mathbf{X}_i)\) and jointly train the model \(\Theta\) to recover the Gaussian dynamics \(\left \langle \Delta\mathbf{X}_i, \Delta\mathbf{\Sigma}_i \right \rangle\): \[ \begin{align} \boldsymbol{f}_{\Theta}\left(\mathbf{X}_i, \mathbf{E}(\boldsymbol{\varsigma}(t, k) - \boldsymbol{\varsigma}(0, k)) \right) \mapsto \left \langle \Delta\mathbf{X}_i, \Delta\mathbf{\Sigma}_i \right \rangle. \label{eq:training} \end{align} \] Finally, we rasterize the Gaussians combined with the predicted dynamics. During the control stage, by contrast, we manually input an interactive 3D vector \(\mathbf{v}_c^\prime\) and retrieve the Gaussian dynamics from the network via \( \boldsymbol{f}_{\Theta}\left(\mathbf{X}_i, \mathbf{v}_c^\prime \right)\).
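To make the data flow concrete, here is a minimal sketch of the control vector, an encoder, and a stand-in network. Everything beyond \(\mathbf{v}_c = \boldsymbol{\varsigma}(t, k) - \boldsymbol{\varsigma}(0, k)\) is an assumption: the sinusoidal form of \(\mathbf{E}\), the two-layer MLP with random weights, and the six output values standing in for \(\Delta\mathbf{\Sigma}_i\) are hypothetical placeholders, not the architecture used in the paper.

```python
import numpy as np

def control_vector(trajectory, t, k):
    """v_c = trajectory(t, k) - trajectory(0, k) for a (T, K, 3) array."""
    return trajectory[t, k] - trajectory[0, k]

def encode(v_c, X, num_freqs=4):
    """Hypothetical E: sinusoidal features of the concatenated [v_c, X]."""
    z = np.concatenate([v_c, X])              # (6,)
    freqs = 2.0 ** np.arange(num_freqs)       # 1, 2, 4, 8
    angles = z[:, None] * freqs[None, :]      # (6, num_freqs)
    return np.concatenate([z, np.sin(angles).ravel(), np.cos(angles).ravel()])

def f_theta(features, W1, W2):
    """Placeholder MLP: returns (delta_X, delta_Sigma) with 3 and 6 values."""
    hidden = np.maximum(features @ W1, 0.0)   # ReLU
    out = hidden @ W2
    return out[:3], out[3:]

# Untrained random weights, only to show shapes and the two usage modes.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(54, 32))
W2 = rng.normal(size=(32, 9))

trajectory = np.zeros((2, 1, 3))
trajectory[1, 0] = [0.1, 0.0, 0.2]
X_i = np.array([1.0, 2.0, 3.0])

# Training stage: the control vector comes from the tracked cluster trajectory.
dX_train, dS_train = f_theta(encode(control_vector(trajectory, 1, 0), X_i), W1, W2)

# Control stage: the user supplies the 3D vector directly.
v_manual = np.array([0.0, 0.3, 0.0])
dX_ctrl, dS_ctrl = f_theta(encode(v_manual, X_i), W1, W2)
```

The point of the sketch is that the network input has the same form in both stages, so a user-specified 3D vector can replace the tracked displacement without any retraining.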
More Demos
Citation
If you want to cite our work, please use:
@misc{freegaussian2024,
  title={FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives},
  author={Anonymous Authors},
  year={2024},
}
Acknowledgements
The website template was borrowed from Michaël Gharbi. Image sliders are based on dics. We adopt code from Nerfstudio. Thanks for making the code available!