[SLAM][En] Notes on Formula Derivation and Analysis of the VINS-mono

This post is a study note on Yibin Wu's "Formula Derivation and Analysis of the VINS-mono" paper. If you see something wrong or something that needs to be corrected, please leave a comment and I will check it later.

 

1. Introduction

The camera and IMU coordinate systems are set up as follows: both sensors are fixed on a single board, and calibrating the board yields the extrinsic parameter $\mathbf{T}^{b}_{C}$, which is the relative pose between the two sensors.

 

NOMENCLATURE

  • Matrices are written in bold capital letters e.g., $\mathbf{R}$
  • Vectors are written in bold lowercase e.g., $\mathbf{a}$
  • Scalars are written in normal lowercase letters e.g., a
  • Coordinate systems related to vector transformation are denoted with superscript $^{a}$ and subscript $_{a}$ e.g., $\mathbf{R}^{b}_{w}$ (from world to body frame). $\mathbf{R}^{b}_{w}\mathbf{v}^{w} = \mathbf{v}^{b}$
  • For vectors, the superscript $^{a}$ denotes the vector viewed from that coordinate system  e.g., $\mathbf{g}^{w}$
  • $\hat{*}$ means an estimated value e.g., $\hat{\mathbf{p}}$
  • $\tilde{*}$ means an observed value e.g., $\tilde{\mathbf{p}}$
  • $\mathbf{a}_{x}$ denotes the x component of the vector
  • $[*]_{\times}$ means the skew-symmetric matrix of the vector $*$ e.g., $\mathbf{x} = \begin{bmatrix} x,y,z \end{bmatrix}^{\intercal}$, then $[\mathbf{x}]_{\times} = \begin{bmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{bmatrix}$
  • $\mathbf{R}\{*\}$ means to convert $*$ to rotation matrix e.g., $\mathbf{R}\{\mathbf{I} + [\delta \boldsymbol{\theta} ]_{\times}\}$
  • $\otimes$ means quaternion product e.g., $\mathbf{q}_{1} \otimes \mathbf{q}_{2}$
  • Given a pure quaternion $\boldsymbol{\omega} = [0, \omega_{x}, \omega_{y}, \omega_{z}]$, $\Omega_{L}(\boldsymbol{\omega})$ means the quaternion left multiplication operator. That is, $\Omega_{L}(\boldsymbol{\omega}) = \begin{bmatrix} 0 & -\boldsymbol{\omega}^{\intercal} \\ \boldsymbol{\omega} & [\boldsymbol{\omega}]_{\times} \end{bmatrix}$. Conversely, $\Omega_{R}(\boldsymbol{\omega})$ means the quaternion right multiplication operator, $\Omega_{R}(\boldsymbol{\omega}) = \begin{bmatrix} 0 & -\boldsymbol{\omega}^{\intercal} \\ \boldsymbol{\omega} & -[\boldsymbol{\omega}]_{\times} \end{bmatrix}$. For details, see (19) of \ref{ref:3}.
  • $\mathbf{b}_{g}$ means the same gyro bias as $\mathbf{b}_{w}$. In this paper, only $\mathbf{b}_{g}$ is used.
  • $\mathbf{I} \in \mathbb{R}^{3\times3}$ means a 3x3 identity matrix.

 

2. IMU Preintegration

2.1. IMU Preintegration in Continuous Time

IMU sensors used in SLAM typically acquire data much faster than camera sensors: while a camera captures images at 15-30 [Hz], an IMU samples at 100-1000 [Hz]. Therefore, dozens of IMU measurements generally exist between two image frames. If all of these IMU measurements are used directly in a tightly-coupled nonlinear optimization, a huge computational load is generated. A method that reduces this load by converting the multiple IMU measurements between two image frames (or keyframes) into a single factor has therefore been studied recently, and this method is called IMU preintegration.

Assume that an IMU sensor exists in 3D space. Since the sensor is continuously affected by gravity, the value measured by the IMU accelerometer corresponds to the actual acceleration of the moving sensor only after gravity is removed.


\begin{equation}
  \begin{aligned}
\mathbf{f}^{b} = \mathbf{a}^{b} - \mathbf{g}^{b}
  \end{aligned}
\end{equation}
- $b$ : body coordinate system (= IMU coordinate system)
- $\mathbf{f}$ : specific force measured by the accelerometer
- $\mathbf{a}$ : actual acceleration of the sensor
- $\mathbf{g}$ : gravity vector

If the IMU sensor is lying flat on the floor, parallel to the ground (right-front-up), the observed value is $\mathbf{f}^{b}_{0} = - \mathbf{g}^{b}$, with magnitude $\approx 9.8 \ m/s^{2}$.

 

2.1.1. IMU noise and bias

The data measured in the IMU coordinate system contains the acceleration and angular velocity components due to the sensor's motion together with gravity, and it is corrupted by the bias that inevitably arises during the IMU manufacturing process and by the sensor's own noise. Therefore, the values measured by the IMU sensor include the following components.


\begin{equation}
  \begin{aligned}
& \hat{\mathbf{a}}_{t} = \mathbf{a}_{t} + \mathbf{b}_{a_{t}} + \mathbf{R}^{t}_{w}\mathbf{g}^{w} + \mathbf{n}_{a} \\
& \hat{\boldsymbol{\omega}}_{t} = \boldsymbol{\omega}_{t} + \mathbf{b}_{g_{t}} + \mathbf{n}_{g}
  \end{aligned}
\end{equation}
- $\hat{\mathbf{a}}_{t}$ : measured acceleration in time $t$
- $\hat{\boldsymbol{\omega}}_{t}$ : measured angular velocity in time $t$
- $\mathbf{a}_{t}$ : actual acceleration in time $t$
- $\boldsymbol{\omega}_{t}$ : actual angular velocity in time $t$
- $\mathbf{g}^{w} = [0,0,9.8]^{\intercal}$ : Gravity vector viewed from the world coordinate system
- $\mathbf{R}^{t}_{w}$ : (from world to t) rotation matrix
- $\mathbf{b}_{at}, \mathbf{b}_{gt}$ : Acceleration bias and angular velocity bias values at time $t$.

  • $\dot{\mathbf{b}}_{at} = \mathbf{n}_{b_{a}}$, $\dot{\mathbf{b}}_{gt} = \mathbf{n}_{b_{g}}$
  • $\mathbf{n}_{b_{a}} \sim \mathcal{N}(0, \sigma^{2}_{b_{a}})$, $\mathbf{n}_{b_{g}} \sim \mathcal{N}(0, \sigma^{2}_{b_{g}})$.

- $\mathbf{n}_{a}, \mathbf{n}_{g}$ : Noise of acceleration and angular velocity.

  • $\mathbf{n}_{a} \sim \mathcal{N}(0, \sigma^{2}_{a})$, $\mathbf{n}_{g} \sim \mathcal{N}(0, \sigma^{2}_{g})$



Since the acceleration and angular velocity $\hat{\mathbf{a}}_{t}, \hat{\boldsymbol{\omega}}_{t}$ coming from the IMU sensor include gravity, noise, and bias, the actual acceleration and angular velocity of the sensor must be recovered separately. Solving for the true values $\mathbf{a}_{t}$ and $\boldsymbol{\omega}_{t}$ gives the following.
\begin{equation}
  \begin{aligned}
    & \mathbf{a}_{t} = \hat{\mathbf{a}}_{t} - \mathbf{b}_{a_{t}}  - \mathbf{n}_{a} - \mathbf{R}^{t}_{w}\mathbf{g}^{w}\\
& \boldsymbol{\omega}_{t} = \hat{\boldsymbol{\omega}}_{t} - \mathbf{b}_{g_{t}} - \mathbf{n}_{g}
  \end{aligned}
\end{equation}
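
As a quick sanity check, the measurement model above can be simulated directly. The following is a minimal sketch (not VINS-mono code; all variable names and values are illustrative) that corrupts a true acceleration with bias, gravity, and noise, and then recovers it by subtracting the known terms:

```cpp
// Minimal sketch of the IMU accelerometer measurement model above.
// Not VINS-mono code; all names and values are illustrative.
#include <Eigen/Dense>
#include <iostream>
#include <random>

int main() {
  std::mt19937 rng(42);
  std::normal_distribution<double> n(0.0, 0.01);       // assumed noise sigma

  const Eigen::Vector3d g_w(0.0, 0.0, 9.8);            // g^w, as defined above
  const Eigen::Matrix3d R_t_w = Eigen::Matrix3d::Identity();  // R^t_w
  const Eigen::Vector3d a_true(0.1, 0.0, 0.0);         // a_t (true acceleration)
  const Eigen::Vector3d b_a(0.02, -0.01, 0.03);        // b_{a_t} (bias)

  // Measurement model: a_hat = a + b_a + R^t_w g^w + n_a
  const Eigen::Vector3d n_a(n(rng), n(rng), n(rng));
  const Eigen::Vector3d a_hat = a_true + b_a + R_t_w * g_w + n_a;

  // Recover a_t; the noise is unknown at run time, so it is treated as 0,
  // exactly as noted later for the VINS-mono implementation.
  const Eigen::Vector3d a_rec = a_hat - b_a - R_t_w * g_w;
  std::cout << "recovered a = " << a_rec.transpose() << std::endl;  // ~ a_true
  return 0;
}
```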

2.1.2. IMU Preintegration

When IMU coordinate systems $b_{k}$ and $b_{k+1}$ corresponding to two image frames exist, the position, velocity, and orientation can be propagated from the IMU as follows.


\begin{equation}
  \label{eq:0}
  \begin{aligned}
    & \mathbf{p}^{w}_{b_{k+1}} && = \mathbf{p}^{w}_{b_{k}} + \mathbf{v}^{w}_{b_{k}} \Delta t_{k} + \iint_{t \in [k,k+1]} \mathbf{R}^{w}_{t}\mathbf{a}_{t} \  dt^{2} \\
    & \mathbf{v}^{w}_{b_{k+1}} && = \mathbf{v}^{w}_{b_{k}} + \int_{t \in [k,k+1]} \mathbf{R}^{w}_{t}\mathbf{a}_{t} \ dt \\
    & \mathbf{q}^{w}_{b_{k+1}} && = \mathbf{q}^{w}_{b_{k}} \otimes \int_{t \in [k,k+1]} \dot{\mathbf{q}} \ dt \\
    & && = \mathbf{q}^{w}_{b_{k}} \otimes \int_{t \in [k,k+1]} \frac{1}{2} \Omega _{R}(\boldsymbol{\omega}_{t})\mathbf{q}^{b_{k}}_{t} \ dt
  \end{aligned}
\end{equation}

 

Substituting the expressions for $\mathbf{a}_{t}$ and $\boldsymbol{\omega}_{t}$ derived above, this is rewritten as follows.
\begin{equation}
  \label{eq:1}
  \begin{aligned}
    & \mathbf{p}^{w}_{b_{k+1}} = \mathbf{p}^{w}_{b_{k}} + \mathbf{v}^{w}_{b_{k}} \Delta t_{k} + \iint_{t \in [k,k+1]} (\mathbf{R}^{w}_{t}(\hat{\mathbf{a}}_{t} - \mathbf{b}_{at} - \mathbf{n}_{a}) - \mathbf{g}^{w}) dt^{2} \\
    & \mathbf{v}^{w}_{b_{k+1}} = \mathbf{v}^{w}_{b_{k}} + \int_{t \in [k,k+1]} (\mathbf{R}^{w}_{t}(\hat{\mathbf{a}}_{t} - \mathbf{b}_{at} - \mathbf{n}_{a}) - \mathbf{g}^{w}) dt \\
    & \mathbf{q}^{w}_{b_{k+1}} = \mathbf{q}^{w}_{b_{k}} \otimes \int_{t \in [k,k+1]} \frac{1}{2} \Omega_{R} (\hat{\boldsymbol{\omega}}_{t} - \mathbf{b}_{gt} - \mathbf{n}_{g})\mathbf{q}^{b_{k}}_{t} dt
  \end{aligned}
\end{equation}
- $\mathbf{p}^{w}_{b_{k+1}}$ : The position of the $b_{k+1}$ coordinate system as seen from the world coordinate system
- $\mathbf{v}^{w}_{b_{k+1}}$ : Velocity of the $b_{k+1}$ coordinate system viewed from the world coordinate system
- $\mathbf{q}^{w}_{b_{k+1}}$ : Orientation of the $b_{k+1}$ coordinate system as seen from the world coordinate system
- $\Omega_{R}(\boldsymbol{\omega}) = [\boldsymbol{\omega}]_{R} = \begin{bmatrix} 0 & -\boldsymbol{\omega}^{\intercal} \\ \boldsymbol{\omega} & -[\boldsymbol{\omega}]_{\times} \end{bmatrix}$ where,  $\quad [\boldsymbol{\omega}]_{\times} = \begin{bmatrix} 0 & -\omega_{z} & \omega_{y} \\ \omega_{z} & 0 & -\omega_{x} \\ -\omega_{y} & \omega_{x} & 0 \end{bmatrix}$ 
- $\otimes$ : Quaternion multiplication operator (e.g., $\mathbf{q} = \mathbf{q}_{1} \otimes \mathbf{q}_{2}$)

NOTICE: Here, $\mathbf{g}^{w}$ does not represent gravity in a true global frame; as with $\mathbf{f}^{b}_{0}$ above, it is the gravity vector expressed in the initial coordinate system of the IMU system.

The term $\int_{t \in [k,k+1]} \frac{1}{2} \Omega_{R} (\hat{\boldsymbol{\omega}}_{t} - \mathbf{b}_{gt} - \mathbf{n}_{g})\mathbf{q}^{b_{k}}_{t} \ dt$ in the quaternion part of (\ref{eq:1}) has the form $\int \dot{\mathbf{q}} \ dt$, where $\dot{\mathbf{q}}$ can be derived as follows.
\begin{equation}
  \begin{aligned}
    \dot{\mathbf{q}} & = \lim_{\delta t \rightarrow 0} \frac{\mathbf{q}(t+\delta t) - \mathbf{q}(t)}{\delta t} \\
    & = \lim_{\delta t \rightarrow 0} \frac{\mathbf{q}(t) \otimes \delta \mathbf{q} - \mathbf{q}(t)}{\delta t} \\
    & = \lim_{\delta t \rightarrow 0} \frac{\mathbf{q}(t) \otimes (\begin{bmatrix} 1 \\ \frac{1}{2}\boldsymbol{\omega}_{t} \delta t \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \end{bmatrix})}{\delta t} \\
    & = \frac{1}{2} \mathbf{q} \otimes \boldsymbol{\omega}_{t} = \frac{1}{2} \Omega_{R}(\boldsymbol{\omega}_{t})\mathbf{q}
  \end{aligned}
\end{equation}
- $\delta \mathbf{q} \approx \begin{bmatrix} 1 \\ \frac{1}{2}\delta \boldsymbol{\theta} \end{bmatrix} = \begin{bmatrix} 1 \\ \frac{1}{2}\boldsymbol{\omega}_{t} \delta t \end{bmatrix}$ holds approximately for a sufficiently small rotation $\delta \boldsymbol{\theta} = \boldsymbol{\omega}_{t} \delta t$
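
The relation $\dot{\mathbf{q}} = \frac{1}{2}\Omega_{R}(\boldsymbol{\omega}_{t})\mathbf{q}$ can be verified numerically. Below is an illustrative sketch (using Eigen; not part of VINS-mono, helper names are my own) that compares the analytic derivative against a finite difference built from the small-angle quaternion:

```cpp
// Numerical check (illustrative) that q_dot = 0.5 * Omega_R(omega) * q
// matches the finite difference (q (x) [1, 0.5*omega*dt] - q) / dt.
#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <iostream>

static Eigen::Matrix3d skew(const Eigen::Vector3d& v) {
  Eigen::Matrix3d S;
  S << 0, -v.z(), v.y(),
       v.z(), 0, -v.x(),
       -v.y(), v.x(), 0;
  return S;
}

// Omega_R as defined in the nomenclature: [[0, -w^T], [w, -[w]x]].
static Eigen::Matrix4d OmegaR(const Eigen::Vector3d& w) {
  Eigen::Matrix4d O = Eigen::Matrix4d::Zero();
  O.block<1, 3>(0, 1) = -w.transpose();
  O.block<3, 1>(1, 0) = w;
  O.block<3, 3>(1, 1) = -skew(w);
  return O;
}

int main() {
  const Eigen::Quaterniond q(
      Eigen::AngleAxisd(0.3, Eigen::Vector3d(1, 2, 0.5).normalized()));
  const Eigen::Vector3d w(0.1, -0.2, 0.3);
  const double dt = 1e-6;

  const Eigen::Vector4d qv(q.w(), q.x(), q.y(), q.z());  // [w, x, y, z] order
  const Eigen::Vector4d analytic = 0.5 * OmegaR(w) * qv;

  // Finite difference with the small-angle quaternion [1, 0.5*omega*dt].
  const Eigen::Quaterniond dq(1.0, 0.5 * w.x() * dt, 0.5 * w.y() * dt, 0.5 * w.z() * dt);
  const Eigen::Quaterniond q2 = q * dq;
  const Eigen::Vector4d fd =
      (Eigen::Vector4d(q2.w(), q2.x(), q2.y(), q2.z()) - qv) / dt;

  std::cout << "analytic: " << analytic.transpose() << "\n"
            << "numeric : " << fd.transpose() << std::endl;
  return 0;
}
```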

Next, the reference coordinate system is changed from the world frame $\{w\}$ to the IMU frame $\{b_{k}\}$. Only when the reference frame is changed to $\{b_{k}\}$ can the preintegration terms, which integrate position, velocity, and orientation between $\{b_{k}\}$ and $\{b_{k+1}\}$, be obtained independently of the world-frame states.

\begin{equation}
  \label{eq:2}
  \begin{aligned}
    & \mathbf{R}^{b_{k}}_{w} \mathbf{p}^{w}_{b_{k+1}} = \mathbf{R}^{b_{k}}_{w} (\mathbf{p}^{w}_{b_{k}} + \mathbf{v}^{w}_{b_{k}}\Delta t_{k} - \frac{1}{2}\mathbf{g}^{w} \Delta t_{k}^{2}) + \alpha^{b_{k}}_{b_{k+1}} \\
    & \mathbf{R}^{b_{k}}_{w}\mathbf{v}^{w}_{b_{k+1}} = \mathbf{R}^{b_{k}}_{w} (\mathbf{v}^{w}_{b_{k}} - \mathbf{g}^{w} \Delta t_{k}) + \beta^{b_k}_{b_{k+1}} \\
    & \mathbf{q}^{b_{k}}_{w} \otimes \mathbf{q}^{w}_{b_{k+1}} = \gamma^{b_{k}}_{b_{k+1}}
  \end{aligned} 
\end{equation}
- $\mathbf{R}^{b_{k}}_{w}$ : (from world to $b_{k}$) rotation matrix

In (\ref{eq:2}), the terms involving the world-frame states at the two times $b_{k}, b_{k+1}$ are separated from the terms $\alpha^{b_{k}}_{b_{k+1}}, \beta^{b_{k}}_{b_{k+1}}, \gamma^{b_{k}}_{b_{k+1}}$, which are preintegrated over the interval $[b_{k},b_{k+1}]$. The preintegration terms are broken down in detail as follows:
\begin{equation}
  \label{eq:3}
  \begin{aligned}
    & \alpha^{b_{k}}_{b_{k+1}} = \iint_{t \in [k,k+1]} \mathbf{R}^{b_{k}}_{t} (\hat{\mathbf{a}}_{t} - \mathbf{b}_{at} - \mathbf{n}_{a}) dt^{2} \\
    & \beta^{b_{k}}_{b_{k+1}} = \int_{t \in [k,k+1]} \mathbf{R}^{b_{k}}_{t} (\hat{\mathbf{a}}_{t} - \mathbf{b}_{at} -\mathbf{n}_{a}) dt \\
    & \gamma^{b_{k}}_{b_{k+1}} = \int_{t \in [k,k+1]} \frac{1}{2} \Omega_{R}(\hat{\boldsymbol{\omega}}_{t} - \mathbf{b}_{gt} - \mathbf{n}_{g})\gamma^{b_{k}}_{t} dt
  \end{aligned}
\end{equation}

 


Once preintegration is performed, whenever the IMU pose is updated by the VIO there is no need to repropagate the numerous IMU measurements between $t \in [k,k+1]$; only the preintegrated IMU terms need to be updated, which reduces the computational cost.

 

2.2. IMU Preintegration in Discrete Time

The formula (\ref{eq:3}) derived so far applies to continuous signals. However, since the actual IMU signal arrives as a discrete signal, the differential equation must be expressed as a difference equation. Various numerical integration algorithms are used in this process, including zero-order hold (Euler), first-order hold (midpoint), and higher-order (RK4) methods. Using the midpoint method employed in VINS-mono, the difference equations are as follows.
\begin{equation}
  \begin{aligned}
    \hat{\alpha}^{b_{k}}_{i+1} & = \hat{\alpha}^{b_{k}}_{i} + \frac{1}{2}(\hat{\beta}^{b_{k}}_{i} + \hat{\beta}^{b_{k}}_{i+1})\delta t \\
    & = \hat{\alpha}^{b_{k}}_{i} + \hat{\beta}^{b_{k}}_{i} \delta t + \frac{1}{4}[\mathbf{R}\{\hat{\gamma}^{b_{k}}_{i}\}(\hat{\mathbf{a}}_{i} - \mathbf{b}_{ai}) + \mathbf{R}\{\hat{\gamma}^{b_{k}}_{i+1}\}(\hat{\mathbf{a}}_{i+1} - \mathbf{b}_{ai})]\delta t^{2} \\  
    \hat{\beta}^{b_{k}}_{i+1} & = \hat{\beta}^{b_{k}}_{i} + \frac{1}{2}[\mathbf{R}\{\hat{\gamma}^{b_{k}}_{i}\}(\hat{\mathbf{a}}_{i} - \mathbf{b}_{ai}) + \mathbf{R}\{\hat{\gamma}^{b_{k}}_{i+1}\}(\hat{\mathbf{a}}_{i+1} - \mathbf{b}_{ai})]\delta t \\
    \hat{\gamma}^{b_{k}}_{i+1} & = \hat{\gamma}^{b_{k}}_{i} \otimes \hat{\gamma}^{b_{k}}_{i, i+1} = \hat{\gamma}^{b_{k}}_{i} \otimes \begin{bmatrix} 1 \\ 1/4(\hat{\boldsymbol{\omega}}_{i} + \hat{\boldsymbol{\omega}}_{i+1} - 2 \mathbf{b}_{gi}) \delta t \end{bmatrix} \\
  \end{aligned}
\end{equation}
- $\hat{\alpha}, \hat{\beta}, \hat{\gamma}$: The discrete preintegration values are approximated by numerical integration, so they are written as $\hat{(\cdot)}$
- $i$ : discrete time step between times $[t_{k}, t_{k+1}]$
- $\delta t$ : Time interval between $i$ and $i+1$
- $\mathbf{R}\{\gamma^{b_{k}}_{i+1}\}$: an operator that converts $\gamma^{b_{k}}_{i+1}$ into a rotation matrix
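
To make the update concrete, here is a condensed sketch of one midpoint step for $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$, following the equations above. It mirrors the structure of VINS-mono's integration_base.h but is simplified (noise set to 0, no covariance or Jacobian update), and the struct and function names are my own:

```cpp
// Condensed sketch of one midpoint propagation step for (alpha, beta, gamma).
#include <Eigen/Dense>
#include <Eigen/Geometry>

struct Preint {
  Eigen::Vector3d alpha = Eigen::Vector3d::Zero();
  Eigen::Vector3d beta  = Eigen::Vector3d::Zero();
  Eigen::Quaterniond gamma = Eigen::Quaterniond::Identity();
};

void midpointStep(Preint& s,
                  const Eigen::Vector3d& acc0, const Eigen::Vector3d& gyr0,
                  const Eigen::Vector3d& acc1, const Eigen::Vector3d& gyr1,
                  const Eigen::Vector3d& ba, const Eigen::Vector3d& bg,
                  double dt) {
  // gamma_{i+1} = gamma_i (x) [1, 1/4 (w_i + w_{i+1} - 2 b_g) dt]
  const Eigen::Vector3d half = 0.25 * (gyr0 + gyr1 - 2.0 * bg) * dt;
  const Eigen::Quaterniond gamma1 =
      (s.gamma * Eigen::Quaterniond(1.0, half.x(), half.y(), half.z())).normalized();

  // Average of the two rotated, bias-corrected accelerations (midpoint rule).
  const Eigen::Vector3d a_mid =
      0.5 * (s.gamma * (acc0 - ba) + gamma1 * (acc1 - ba));

  s.alpha += s.beta * dt + 0.5 * a_mid * dt * dt;
  s.beta  += a_mid * dt;
  s.gamma  = gamma1;
}
```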

For comparison, the same equations expressed with Euler numerical integration are as follows.
\begin{equation}
  \begin{aligned}
    \hat{\alpha}^{b_{k}}_{i+1} & = \hat{\alpha}^{b_{k}}_{i} + \hat{\beta}^{b_{k}}_{i} \delta t + \frac{1}{2}\mathbf{R}\{\hat{\gamma}^{b_{k}}_{i+1}\}(\hat{\mathbf{a}}_{i} - \mathbf{b}_{ai})\delta t^{2}    \\
    \hat{\beta}^{b_{k}}_{i+1} & = \hat{\beta}^{b_{k}}_{i} + \mathbf{R}\{\hat{\gamma}^{b_{k}}_{i+1}\}(\hat{\mathbf{a}}_{i} - \mathbf{b}_{ai})\delta t \\
    \hat{\gamma}^{b_{k}}_{i+1} & = \hat{\gamma}^{b_{k}}_{i} \otimes \hat{\gamma}^{b_{k}}_{i, i+1} = \hat{\gamma}^{b_{k}}_{i} \otimes \begin{bmatrix} 1 \\  \frac{1}{2}(\hat{\boldsymbol{\omega}}_{i} - \mathbf{b}_{gi}) \delta t \end{bmatrix} \\
  \end{aligned} \label{eq:11}
\end{equation}

 



NOTICE: Since the noise values $\mathbf{n}_{a}$ and $\mathbf{n}_{g}$ are unknown, the author of VINS-mono sets them to 0 in the actual code implementation (as mentioned in the VINS-mono paper).
NOTICE: Strictly speaking, since the preintegrated position $\alpha^{b_{k}}_{b_{k+1}}$ and velocity $\beta^{b_{k}}_{b_{k+1}}$ do not include gravity, they can be understood as the position and velocity the IMU sensor would measure while moving in a zero-gravity space.

The Euler method is also implemented in the actual VINS-mono code. With it, $\hat{\alpha}, \hat{\beta}, \hat{\gamma}$ are updated according to (\ref{eq:11}) at each step $t \in [k, k+1]$, every time an IMU measurement arrives between $b_{k}$ and $b_{k+1}$.

 

2.3. Error-state Kinematics in Continuous Time

Next, when the biases $\mathbf{b}_{a}, \mathbf{b}_{g}$ are optimized by the tightly-coupled VIO, a Jacobian $\mathbf{J}^{b_{k}}_{b_{k+1}}$ must be obtained to update the existing preintegration terms $\alpha, \beta, \gamma$ accordingly. We also need the covariance $\mathbf{P}^{b_{k}}_{b_{k+1}}$ of the preintegrated state variables for later use in the VIO process. To obtain $\mathbf{J}$ and $\mathbf{P}$, the IMU state variables must be expressed as an error-state equation.

 

Let's build the error-state equation for IMU preintegration by following the explanation in the \ref{ref:3} paper. The general error-state relation can be expressed as:
\begin{equation}
  \begin{aligned}
    \hat{\mathbf{x}} = \mathbf{x} + \delta \mathbf{x}
  \end{aligned}
\end{equation}
- $\hat{\mathbf{x}}$ : true state
- $\mathbf{x}$ : nominal state
- $\delta \mathbf{x}$ : error state

The nominal state can be expressed in detail as follows.

\begin{equation}
  \begin{aligned}
    & \dot{\alpha} = \beta \\
    & \dot{\beta} = \mathbf{R}(\mathbf{a}_{m} - \mathbf{b}_a) \\
    & \dot{\gamma} = \frac{1}{2} \gamma \otimes (\boldsymbol{\omega}_{m} - \mathbf{b}_{g}) \\
    & \dot{\mathbf{b}}_{a} = 0 \\
    & \dot{\mathbf{b}}_{g} = 0 
  \end{aligned}
\end{equation}

At this time, $\mathbf{R}$ means the matrix obtained by converting the quaternion $\gamma$ into a rotation matrix, i.e., $\mathbf{R} \triangleq \mathbf{R}\{\gamma\}$. Next, the true state can be written in detail as follows.
\begin{equation}
  \begin{aligned}
    & \dot{\hat{\alpha}} = \hat{\beta} \\
    & \dot{\hat{\beta}} = \hat{\mathbf{R}}(\mathbf{a}_{m}  - \mathbf{n}_{a} - \hat{\mathbf{b}}_a) \\
    & \dot{\hat{\gamma}} = \frac{1}{2} \hat{\gamma} \otimes (\boldsymbol{\omega}_{m} - \mathbf{n}_{g} - \hat{\mathbf{b}}_{g} ) \\
    & \dot{\hat{\mathbf{b}}}_{a} = \mathbf{n}_{b_{a}} \\
    & \dot{\hat{\mathbf{b}}}_{g} = \mathbf{n}_{b_{g}} 
  \end{aligned}
\end{equation}

At this time, the acceleration bias and angular velocity bias are assumed to follow a random-walk model whose derivatives are white Gaussian noise. That is, $\mathbf{n}_{b_{a}} \sim \mathcal{N}(0, \sigma^{2}_{b_{a}}), \mathbf{n}_{b_{g}} \sim \mathcal{N}(0, \sigma^{2}_{b_{g}})$. Finally, the error state is written in detail as follows.
\begin{equation}
  \begin{aligned}
    & \dot{\delta \alpha} = \delta \beta \\
    & \dot{\delta \beta} = -\mathbf{R}[\mathbf{a}_{m} - \mathbf{b}_{a}]_{\times} \delta \boldsymbol{\theta} - \mathbf{R}\mathbf{n}_{a} - \mathbf{R}\delta \mathbf{b}_{a} \\
    & \dot{\delta \boldsymbol{\theta}} = -[\boldsymbol{\omega}_{m} - \mathbf{b}_{g}]_{\times} \delta \boldsymbol{\theta} - \delta \mathbf{b}_{g} - \mathbf{n}_{g} \\
    &\dot{\delta \mathbf{b}_{a}} = \mathbf{n}_{b_{a}} \\
    & \dot{\delta \mathbf{b}_{g}} = \mathbf{n}_{b_{g}}
  \end{aligned}
\end{equation}
- $\dot{\delta \boldsymbol{\theta}}$: Note that angular errors in error state use $\dot{\delta \boldsymbol{\theta}}$ instead of $\dot{\delta \gamma}$ .



In the above differential equations, $\dot{\delta \beta}$ and $\dot{\delta \boldsymbol{\theta}}$ can be derived as follows. Second- and higher-order terms are ignored because they are relatively small. First, $\dot{\delta \beta}$ is obtained as follows.
\begin{equation}
  \begin{aligned}
    \dot{\beta} + \dot{\delta \beta} = \boxed{\dot{\hat{\beta}}} & = \hat{\mathbf{R}}(\mathbf{a}_{m} - \mathbf{n}_{a} - \hat{\mathbf{b}}_{a}) \\
    \mathbf{R}(\mathbf{a}_{m} - \mathbf{b}_{a}) + \dot{\delta \beta} & = \mathbf{R}\{\mathbf{I} + [\delta \boldsymbol{\theta}]_{\times}\}(\mathbf{a}_{m} - \mathbf{n}_{a} - \mathbf{b}_{a} - \delta \mathbf{b}_{a}) \\
    \dot{\delta \beta} & = -\mathbf{R}[\mathbf{a}_{m} - \mathbf{b}_{a}]_{\times} \delta \boldsymbol{\theta} - \mathbf{R}\mathbf{n}_{a} - \mathbf{R}\delta \mathbf{b}_{a}
  \end{aligned}
\end{equation}
\begin{equation}
  \begin{aligned}
    \boxed{\dot{\delta \beta} = -\mathbf{R}[\mathbf{a}_{m} - \mathbf{b}_{a}]_{\times} \delta \boldsymbol{\theta} - \mathbf{R}\mathbf{n}_{a} - \mathbf{R}\delta \mathbf{b}_{a}}
  \end{aligned}
\end{equation}

Next, $\dot{\delta \boldsymbol{\theta}}$ can be obtained as follows.
\begin{equation}
  \begin{aligned}
    \dot{(\gamma \otimes \delta \gamma)} = \boxed{ \dot{\hat{\gamma}} } & = \frac{1}{2} \hat{\gamma} \otimes (\boldsymbol{\omega}_{m} - \mathbf{n}_{g} - \hat{\mathbf{b}}_{g}) \\
    \dot{\gamma} \otimes \delta \gamma + \gamma \otimes \dot{\delta \gamma} & = \frac{1}{2}\gamma \otimes \delta \gamma \otimes (\boldsymbol{\omega}_{m} - \mathbf{n}_{g} - \mathbf{b}_{g} - \delta \mathbf{b}_{g}) 
  \end{aligned}
\end{equation}
\begin{equation}
  \begin{aligned}
    \frac{1}{2} \gamma \otimes (\boldsymbol{\omega}_{m} - \mathbf{b}_{g}) \otimes \delta \gamma + \gamma \otimes \dot{\delta \gamma} = \frac{1}{2} \gamma \otimes \delta \gamma \otimes (\boldsymbol{\omega}_{m} - \mathbf{n}_{g} - \mathbf{b}_{g} - \delta \mathbf{b}_{g})
  \end{aligned}
\end{equation}
- $\hat{\boldsymbol{\omega}} = \boldsymbol{\omega}_{m} -\mathbf{n}_{g} - \mathbf{b}_{g} - \delta \mathbf{b}_{g}$ 
- $\boldsymbol{\omega} = \boldsymbol{\omega}_{m} - \mathbf{b}_{g}$ 

 

Substituting these definitions gives:
\begin{equation}
  \begin{aligned}
    \frac{1}{2} \gamma \otimes \boldsymbol{\omega} \otimes \delta \gamma + \gamma \otimes \dot{\delta \gamma} = \frac{1}{2} \gamma \otimes \delta \gamma \otimes \hat{\boldsymbol{\omega}}
  \end{aligned}
\end{equation}


Cancelling the common left factor $\gamma \ \otimes$ on both sides and rearranging gives:
\begin{equation}
  \begin{aligned}
    2 \dot{\delta \gamma}  & = \delta \gamma \otimes \hat{\boldsymbol{\omega}} - \boldsymbol{\omega} \otimes \delta \gamma \\
    & = \Omega_{R}(\hat{\boldsymbol{\omega}}) \delta \gamma - \Omega_{L}(\boldsymbol{\omega})\delta \gamma \\
    & = \begin{bmatrix} 0 & (\boldsymbol{\omega} - \hat{\boldsymbol{\omega}})^{\intercal} \\
      \hat{\boldsymbol{\omega}} - \boldsymbol{\omega} & - [\boldsymbol{\omega} + \hat{\boldsymbol{\omega}}]_{\times}
    \end{bmatrix} \delta \gamma
  \end{aligned}
\end{equation}

Substituting $\delta \gamma \approx \begin{bmatrix} 1 \\ \frac{1}{2} \delta \boldsymbol{\theta} \end{bmatrix}$ and expressing the above in matrix form gives:

\begin{equation}
  \begin{aligned}
    \begin{bmatrix} 0 \\ \dot{\delta \boldsymbol{\theta}} \end{bmatrix} =
    \begin{bmatrix} 0 & (\delta \mathbf{b}_{g} + \mathbf{n}_{g})^{\intercal} \\
      -\delta \mathbf{b}_{g} - \mathbf{n}_{g} & - [2 \boldsymbol{\omega}_{m} - \mathbf{n}_{g} - 2 \mathbf{b}_{g} - \delta \mathbf{b}_{g}]_{\times}
    \end{bmatrix}
    \begin{bmatrix} 1 \\ \frac{1}{2} \delta \boldsymbol{\theta} \end{bmatrix}
  \end{aligned}
\end{equation}

Finally, $\dot{\delta \boldsymbol{\theta}}$ is

\begin{equation}
  \boxed{
  \begin{aligned}
     \dot{\delta \boldsymbol{\theta}} = -[\boldsymbol{\omega}_{m} - \mathbf{b}_{g}]_{\times} \delta \boldsymbol{\theta} - \delta \mathbf{b}_{g} - \mathbf{n}_{g}
  \end{aligned} } 
\end{equation}

Writing the full error-state equation in matrix form gives:

\begin{equation}
  \begin{aligned}
    \begin{bmatrix}
      \dot{\delta \alpha}^{b_{k}}_{t} \\
      \dot{\delta \beta}^{b_{k}}_{t} \\
      \dot{\delta \boldsymbol{\theta}}^{b_{k}}_{t} \\
      \dot{\delta \mathbf{b}_{a_{t}}} \\
      \dot{\delta \mathbf{b}_{g_{t}}} \\
    \end{bmatrix} =
    \begin{bmatrix}
      \mathbf{0} & \mathbf{I} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
      \mathbf{0} & \mathbf{0} & -\mathbf{R}^{b_{k}}_{t}[\mathbf{a}_{m}-\mathbf{b}_{a}]_{\times} & -\mathbf{R}^{b_{k}}_{t} & \mathbf{0} \\
      \mathbf{0} & \mathbf{0} & -[\boldsymbol{\omega}_{m} - \mathbf{b}_{g}]_{\times} & \mathbf{0} & -\mathbf{I} \\
      \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
      \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0}
    \end{bmatrix}
    \begin{bmatrix}
      \delta \alpha^{b_{k}}_{t} \\
      \delta \beta^{b_{k}}_{t} \\
      \delta \boldsymbol{\theta}^{b_{k}}_{t} \\
      \delta \mathbf{b}_{a_{t}} \\
      \delta \mathbf{b}_{g_{t}} \\
    \end{bmatrix}  +
    \begin{bmatrix}
      \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
      -\mathbf{R}^{b_{k}}_{t} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
      \mathbf{0} & -\mathbf{I} & \mathbf{0} & \mathbf{0} \\
      \mathbf{0} & \mathbf{0} & \mathbf{I} & \mathbf{0} \\
      \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}
    \end{bmatrix}
    \begin{bmatrix} \mathbf{n}_{a} \\ \mathbf{n}_{g} \\ \mathbf{n}_{b_{a}} \\ \mathbf{n}_{b_{g}} \end{bmatrix}
  \end{aligned}
\end{equation}

This can be compactly expressed as:
\begin{equation}
  \begin{aligned}
    \dot{\delta \mathbf{x}}_{t} = \mathbf{F}_{t} \delta \mathbf{x}_{t} + \mathbf{G}_{t}\mathbf{n}_{t}
  \end{aligned}
\end{equation}

The derivative of a function can be expressed as:

\begin{equation}
  \begin{aligned}
    \dot{\mathbf{x}} = \lim_{\delta t \rightarrow 0} \frac{\mathbf{x}(t + \delta t) - \mathbf{x}(t)}{\delta t}
  \end{aligned}
\end{equation}

For a sufficiently small $\delta t$, the limit can be dropped; applying this to the error state $\delta \mathbf{x}$ gives $\dot{\delta \mathbf{x}} = \frac{\delta \mathbf{x}(t +\delta t) - \delta \mathbf{x}(t)}{\delta t}$, which is rearranged below. For convenience, $\mathbf{x}(t)$ is written as $\mathbf{x}_{t}$.
\begin{equation}
  \begin{aligned}
    & \delta \mathbf{x}_{t+\delta t} = \dot{\delta \mathbf{x}}\delta t + \delta \mathbf{x}_t \\
    & \delta \mathbf{x}_{t+\delta t} = \mathbf{F}_{t}\delta \mathbf{x}_{t} + \mathbf{G}_{t}\mathbf{n}_{t}\delta t + \delta \mathbf{x}_{t} \\
    & \boxed {\delta \mathbf{x}_{t+\delta t} = (\mathbf{I} + \mathbf{F}_{t}\delta t)\delta \mathbf{x}_{t} + \mathbf{G}_{t}\mathbf{n}_{t} \delta t}
  \end{aligned} \label{eq:4}
\end{equation}

According to the above equation, the covariance $\mathbf{P}^{b_{k}}_{b_{k+1}}$ is propagated as follows, starting from the initial value $\mathbf{P}^{b_{k}}_{b_{k}}=0$. Here the property $cov(\mathbf{Ax})=\mathbf{A}\Sigma \mathbf{A}^{\intercal}$ for $cov(\mathbf{x}) = \Sigma$ is used.
\begin{equation}
  \begin{aligned}
    \mathbf{P}^{b_{k}}_{t+\delta t} = (\mathbf{I} + \mathbf{F}_{t}\delta t)\mathbf{P}^{b_{k}}_{t}(\mathbf{I}+\mathbf{F}_{t}\delta t)^{\intercal} + (\mathbf{G}_{t}\delta t) \mathbf{Q} (\mathbf{G}_{t}\delta t)^{\intercal}
  \end{aligned} \label{eq:12}
\end{equation}
- $\delta t$: IMU sampling time
- $\mathbf{Q}$: covariance of the noise, $\mathbf{Q} = diag(\sigma^{2}_{a}, \sigma^{2}_{g}, \sigma^{2}_{b_{a}}, \sigma^{2}_{b_{g}})$

Next, the first-order approximation Jacobian $\mathbf{J}^{b_{k}}_{b_{k+1}} = \frac{\partial \delta \mathbf{x}^{b_{k}}_{b_{k+1}}}{\partial \delta \mathbf{x}^{b_{k}}_{b_{k}}}$ value is updated as follows, starting with the initial value $\mathbf{J}^{b_{k}}_{b_{k}}=\mathbf{I}$.
\begin{equation}
  \begin{aligned}
    \mathbf{J}_{t+\delta t} = (\mathbf{I} + \mathbf{F}_{t}\delta t)\mathbf{J}_{t}, \ \ t \in [k,k+1]
  \end{aligned} \label{eq:13}
\end{equation}

Therefore, the preintegration terms ($\alpha^{b_{k}}_{b_{k+1}}, \beta^{b_{k}}_{b_{k+1}}, \gamma^{b_{k}}_{b_{k+1}}$) can be corrected to first order as follows.
\begin{equation}
  \begin{aligned}
    & \alpha^{b_{k}}_{b_{k+1}} \approx \hat{\alpha}^{b_{k}}_{b_{k+1}} + \mathbf{J}^{\alpha}_{b_{a}} \delta \mathbf{b}_{ak} + \mathbf{J}^{\alpha}_{b_{g}}\delta \mathbf{b}_{gk} \\
    & \beta^{b_{k}}_{b_{k+1}} \approx \hat{\beta}^{b_{k}}_{b_{k+1}} + \mathbf{J}^{\beta}_{b_{a}} \delta \mathbf{b}_{ak} + \mathbf{J}^{\beta}_{b_{g}}\delta \mathbf{b}_{gk} \\
    & \gamma^{b_{k}}_{b_{k+1}} = \hat{\gamma}^{b_{k}}_{b_{k+1}} \otimes \begin{bmatrix} 1 \\ \frac{1}{2} \mathbf{J}^{\gamma}_{b_{g}} \delta \mathbf{b}_{gk} \end{bmatrix}
  \end{aligned} \label{eq:14}
\end{equation}

In the above equation, $\mathbf{J}^{\alpha}_{b_{a}}$ is the submatrix block $\frac{\partial \delta \alpha^{b_{k}}_{b_{k+1}}}{\partial \delta \mathbf{b}_{a}}$ of $\mathbf{J}^{b_{k}}_{b_{k+1}}$, and the other blocks are defined analogously. With this, when a sufficiently small bias update occurs during VIO, the preintegration terms can be corrected through the above equation instead of repropagating everything, which greatly reduces the amount of computation.
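
A sketch of this first-order correction is shown below; the block indices assume the 15-dimensional state order $[\delta\alpha, \delta\boldsymbol{\theta}, \delta\beta, \delta\mathbf{b}_{a}, \delta\mathbf{b}_{g}]$ used in the discrete matrix later in this section, and the function name is illustrative:

```cpp
// Sketch of the first-order bias correction of eq. (14).
#include <Eigen/Dense>
#include <Eigen/Geometry>

void correctBias(Eigen::Vector3d& alpha, Eigen::Vector3d& beta,
                 Eigen::Quaterniond& gamma,
                 const Eigen::Matrix<double, 15, 15>& J,
                 const Eigen::Vector3d& dba, const Eigen::Vector3d& dbg) {
  // J^alpha_{b_a} = J(0,9), J^alpha_{b_g} = J(0,12), etc.
  alpha += J.block<3, 3>(0, 9) * dba + J.block<3, 3>(0, 12) * dbg;
  beta  += J.block<3, 3>(6, 9) * dba + J.block<3, 3>(6, 12) * dbg;
  // gamma <- gamma (x) [1, 1/2 J^gamma_{b_g} dbg]
  const Eigen::Vector3d half = 0.5 * J.block<3, 3>(3, 12) * dbg;
  gamma = (gamma * Eigen::Quaterniond(1.0, half.x(), half.y(), half.z())).normalized();
}
```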

 

2.4. Error-state Kinematics in Discrete Time

The error-state equations derived so far are equations for continuous signals. However, unlike the theory, the actual IMU data is a discrete signal arriving at a sampling interval $\delta t > 0$, so the differential equations described above must be converted into difference equations. Various numerical integration methods such as the Euler, midpoint, and Runge-Kutta (RK4) methods can be used for this discretization. In this section, the error-state equation is converted with the midpoint method, the method used in the actual VINS-fusion code.

Midpoint numerical integration method: 
In the midpoint numerical integration method, existing variables are converted as follows.

\begin{equation}
  \begin{aligned}
    & \mathbf{n}_{g_{t}} \rightarrow \frac{\mathbf{n}_{g_{t}} + \mathbf{n}_{g_{t+1}}}{2} \\
    & \boldsymbol{\omega}_{t} \rightarrow \frac{\boldsymbol{\omega}_{t} + \boldsymbol{\omega}_{t+1}}{2} \\
    & \mathbf{R}^{b_{k}}_{t} \rightarrow \frac{1}{2}(\mathbf{R}^{b_{k}}_{t} + \mathbf{R}^{b_{k}}_{t+1}) \\
    & \delta \beta^{b_{k}}_{t} \rightarrow  \frac{\delta \beta^{b_{k}}_{t} +\delta \beta^{b_{k}}_{t+1}}{2}
  \end{aligned}
\end{equation}

Using the above conversion and the previously described (\ref{eq:4}), the existing error state equation is discretized as follows. First, discretizing $\dot{\delta \boldsymbol{\theta}}$ gives:
\begin{equation}
  \begin{aligned}
    \dot{\delta \boldsymbol{\theta}} = -[\frac{\boldsymbol{\omega}_{t} + \boldsymbol{\omega}_{t+1}}{2} - \mathbf{b}_{g_{t}}]_{\times} \delta \boldsymbol{\theta}^{b_{k}}_{t} -\delta \mathbf{b}_{g_{t}} - \frac{\mathbf{n}_{g_{t}} + \mathbf{n}_{g_{t+1}}}{2}
  \end{aligned}
\end{equation}

Putting the above expression into (\ref{eq:4}) and expanding it, we get the following.
\begin{equation}
  \begin{aligned}
    \boxed{ \delta \boldsymbol{\theta}^{b_{k}}_{t+1} = \bigg( \mathbf{I} - \delta t [\frac{\boldsymbol{\omega}_{t} + \boldsymbol{\omega}_{t+1}}{2} - \mathbf{b}_{g_{t}}]_{\times} \bigg) \delta \boldsymbol{\theta}^{b_{k}}_{t} - \delta \mathbf{b}_{g_{t}} \delta t - \frac{\mathbf{n}_{g_{t}} + \mathbf{n}_{g_{t+1}}}{2} \delta t }
  \end{aligned} \label{eq:10}
\end{equation}

Next, discretizing $\dot{\delta \beta^{b_{k}}_{t}}$ gives:

\begin{equation}
  \begin{aligned}
    \dot{\delta \beta^{b_{k}}_{t}} = & -\frac{1}{2}\mathbf{R}^{b_{k}}_{t}[\mathbf{a}_{t} - \mathbf{b}_{a_{t}}]_{\times} \delta \boldsymbol{\theta}^{b_{k}}_{t} -\frac{1}{2}\mathbf{R}^{b_{k}}_{t+1}[\mathbf{a}_{t+1} - \mathbf{b}_{a_{t}}]_{\times} \delta \boldsymbol{\theta}^{b_{k}}_{t+1} - \frac{1}{2}(\mathbf{R}^{b_{k}}_{t} + \mathbf{R}^{b_{k}}_{t+1}) \delta \mathbf{b}_{a_{t}} \\ & - \frac{1}{2}\big(\mathbf{R}^{b_{k}}_{t}\mathbf{n}_{a_{t}} + \mathbf{R}^{b_{k}}_{t+1}\mathbf{n}_{a_{t+1}}\big)
  \end{aligned}
\end{equation}

By substituting the previously obtained $\delta \boldsymbol{\theta}^{b_{k}}_{t+1}$ into the above expression and expanding, the following expression can be obtained.

\begin{equation}
    \boxed{
  \begin{aligned}
    \delta \beta^{b_{k}}_{t+1} = & \ \delta \beta^{b_{k}}_{t} \\
    & + \bigg( -\frac{\delta t}{2}\mathbf{R}^{b_{k}}_{t}[\mathbf{a}_{t} - \mathbf{b}_{a_{t}}]_{\times} - \frac{\delta t}{2} \mathbf{R}^{b_{k}}_{t+1}[\mathbf{a}_{t+1} - \mathbf{b}_{a_{t}}]_{\times} \big( \mathbf{I} - \delta t[\frac{\boldsymbol{\omega}_{t} + \boldsymbol{\omega}_{t+1}}{2} - \mathbf{b}_{g_{t}}]_{\times} \big) \bigg) \delta \boldsymbol{\theta}^{b_{k}}_{t} \\
      & + \frac{\delta t^{2}}{2} \mathbf{R}^{b_{k}}_{t+1}[\mathbf{a}_{t+1} - \mathbf{b}_{a_{t}}]_{\times} \delta \mathbf{b}_{g_{t}} + \frac{\delta t^{2}}{4}\mathbf{R}^{b_{k}}_{t+1}[\mathbf{a}_{t+1}-\mathbf{b}_{a_{t}}]_{\times} (\mathbf{n}_{g_{t}} + \mathbf{n}_{g_{t+1}}) - \frac{\delta t}{2}(\mathbf{R}^{b_{k}}_{t} + \mathbf{R}^{b_{k}}_{t+1})\delta \mathbf{b}_{a_{t}} \\
      & - \frac{\delta t}{2} \mathbf{R}^{b_{k}}_{t} \mathbf{n}_{a_{t}}  - \frac{\delta t}{2} \mathbf{R}^{b_{k}}_{t+1} \mathbf{n}_{a_{t+1}} 
  \end{aligned} }
\end{equation}

Finally, discretizing $\dot{\delta \alpha^{b_{k}}_{t}}$ gives
\begin{equation}
  \begin{aligned}
    \dot{\delta \alpha^{b_{k}}_{t}} = \frac{1}{2} (\delta \beta^{b_{k}}_{t} + \delta \beta^{b_{k}}_{t+1})
  \end{aligned}
\end{equation}

Using the above formula and (\ref{eq:4}), $\delta \alpha^{b_{k}}_{t+1}$ is obtained as follows.
\begin{equation}
  \begin{aligned}
    \boxed{ \delta \alpha^{b_{k}}_{t+1} = \delta \alpha^{b_{k}}_{t}  + \frac{1}{2} \Big(\delta \beta^{b_{k}}_{t} + \delta \beta^{b_{k}}_{t+1} \Big)\delta t   }
  \end{aligned}
\end{equation}

The error-state system equation obtained so far is expressed in matrix form as follows.
\begin{equation}
  \begin{aligned}
    \begin{bmatrix}
      \delta \alpha_{k+1} \\
      \delta \boldsymbol{\theta}_{k+1} \\
      \delta \beta_{k+1} \\
      \delta \mathbf{b}_{a_{k+1}} \\
      \delta \mathbf{b}_{g_{k+1}} \\
    \end{bmatrix}
    = \begin{bmatrix}
      \mathbf{I} & \mathbf{F}_{01} & \delta t \mathbf{I} & \mathbf{F}_{03} & \mathbf{F}_{04} \\
      & \mathbf{F}_{11} &  &  & -\delta t \mathbf{I} \\
       & \mathbf{F}_{21} & \mathbf{I} & \mathbf{F}_{23} & \mathbf{F}_{24} \\
      &&&\mathbf{I}& \\
      &&&&\mathbf{I}
    \end{bmatrix} 
    \begin{bmatrix}
      \delta \alpha_{k} \\
      \delta \boldsymbol{\theta}_{k} \\
      \delta \beta_{k} \\
      \delta \mathbf{b}_{a_{k}} \\
      \delta \mathbf{b}_{g_{k}} \\
    \end{bmatrix} +
    \begin{bmatrix}
      \mathbf{G}_{00} & \mathbf{G}_{01} & \mathbf{G}_{02} & \mathbf{G}_{03} &  &  \\
      & -\delta t /2 \mathbf{I} &  & -\delta t /2 \mathbf{I} &  &  \\
      -\frac{\mathbf{R}_{k}\delta t}{2} & \mathbf{G}_{21} & -\frac{\mathbf{R}_{k+1}\delta t}{2} & \mathbf{G}_{23} &  &  \\
      &&&&\delta t \mathbf{I} &  \\
      &&&& & \delta t \mathbf{I}
    \end{bmatrix}
    \begin{bmatrix}
      \mathbf{n}_{a_{k}} \\
      \mathbf{n}_{g_{k}} \\
      \mathbf{n}_{a_{k+1}} \\
      \mathbf{n}_{g_{k+1}} \\
      \mathbf{n}_{b_{a}} \\
      \mathbf{n}_{b_{g}} \\
    \end{bmatrix} 
  \end{aligned}
\end{equation}

Each submatrix is as follows.
\begin{equation}
  \begin{aligned}
    & \mathbf{F}_{01} = -\frac{\delta t^{2}}{4} \mathbf{R}_{k}[\hat{\mathbf{a}}_{k} -\mathbf{b}_{a_{k}}]_{\times} - \frac{\delta t^{2}}{4} \mathbf{R}_{k+1} [\hat{\mathbf{a}}_{k+1} - \mathbf{b}_{a_{k}}]_{\times} \bigg( \mathbf{I} - [\frac{\hat{\boldsymbol{\omega}}_{k} + \hat{\boldsymbol{\omega}}_{k+1}}{2} - \mathbf{b}_{g_{k}}]_{\times} \delta t \bigg) \\
    & \mathbf{F}_{03} = -\frac{\delta t^{2}}{4}(\mathbf{R}_{k} + \mathbf{R}_{k+1}) \\
    & \mathbf{F}_{04} = \frac{\delta t^{3}}{4} \mathbf{R}_{k+1}[\hat{\mathbf{a}}_{k+1} -\mathbf{b}_{a_{k}}]_{\times} \\
    & \mathbf{F}_{11} = \mathbf{I} - [\frac{\hat{\boldsymbol{\omega}}_{k} + \hat{\boldsymbol{\omega}}_{k+1}}{2} - \mathbf{b}_{g_{k}}]_{\times} \delta t \\
    & \mathbf{F}_{21} = -\frac{\delta t}{2} \mathbf{R}_{k}[\hat{\mathbf{a}}_{k} -\mathbf{b}_{a_{k}}]_{\times} - \frac{\delta t}{2} \mathbf{R}_{k+1} [\hat{\mathbf{a}}_{k+1} - \mathbf{b}_{a_{k}}]_{\times} \bigg( \mathbf{I} - [\frac{\hat{\boldsymbol{\omega}}_{k} + \hat{\boldsymbol{\omega}}_{k+1}}{2} - \mathbf{b}_{g_{k}}]_{\times} \delta t \bigg) \\
    & \mathbf{F}_{23} = -\frac{\delta t}{2} (\mathbf{R}_{k} + \mathbf{R}_{k+1}) \\
    & \mathbf{F}_{24} = \frac{\delta t^{2}}{2} \mathbf{R}_{k+1}[\hat{\mathbf{a}}_{k+1} - \mathbf{b}_{a_{k}}]_{\times} \\ \\ 
    & \mathbf{G}_{00} = -\frac{\delta t^{2}}{4} \mathbf{R}_{k} \\
    & \mathbf{G}_{01} = \mathbf{G}_{03} = \frac{\delta t^{3}}{8} \mathbf{R}_{k+1} [\hat{\mathbf{a}}_{k+1} - \mathbf{b}_{a_{k}}]_{\times} \\
    & \mathbf{G}_{02} = -\frac{\delta t^{2}}{4} \mathbf{R}_{k+1} \\
    & \mathbf{G}_{21} = \mathbf{G}_{23} = \frac{\delta t^{2}}{4} \mathbf{R}_{k+1}[\hat{\mathbf{a}}_{k+1} - \mathbf{b}_{a_{k}}]_{\times}
  \end{aligned}
\end{equation}


The Jacobians $\mathbf{F}, \mathbf{G}$ above are implemented in the integration_base.h :: midPointIntegration() function of the VINS-mono code.
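
For reference, the sketch below assembles the discrete $\mathbf{F}$ (15x15) and $\mathbf{G}$ (15x18) block by block, following the submatrices listed above. It is written in the spirit of midPointIntegration() but simplified; the helper names are my own:

```cpp
// Assembles the discrete transition matrices shown above, with state order
// [da, dtheta, dbeta, dba, dbg] and noise order [na_k, ng_k, na_k1, ng_k1,
// n_ba, n_bg]. Illustrative sketch, not the exact VINS-mono code.
#include <Eigen/Dense>

static Eigen::Matrix3d skew(const Eigen::Vector3d& v) {
  Eigen::Matrix3d S;
  S << 0, -v.z(), v.y(),
       v.z(), 0, -v.x(),
       -v.y(), v.x(), 0;
  return S;
}

void buildFG(const Eigen::Matrix3d& Rk,  const Eigen::Matrix3d& Rk1,
             const Eigen::Vector3d& ak,  const Eigen::Vector3d& ak1,
             const Eigen::Vector3d& wk,  const Eigen::Vector3d& wk1,
             const Eigen::Vector3d& ba,  const Eigen::Vector3d& bg,
             double dt,
             Eigen::Matrix<double, 15, 15>& F,
             Eigen::Matrix<double, 15, 18>& G) {
  const Eigen::Matrix3d I   = Eigen::Matrix3d::Identity();
  const Eigen::Matrix3d A0  = skew(ak - ba);    // [a_k  - b_a]x
  const Eigen::Matrix3d A1  = skew(ak1 - ba);   // [a_k1 - b_a]x
  const Eigen::Matrix3d F11 = I - skew(0.5 * (wk + wk1) - bg) * dt;

  F.setZero();
  F.block<3,3>(0,0)   = I;
  F.block<3,3>(0,3)   = -0.25 * dt * dt * (Rk * A0 + Rk1 * A1 * F11); // F_01
  F.block<3,3>(0,6)   = I * dt;
  F.block<3,3>(0,9)   = -0.25 * dt * dt * (Rk + Rk1);                 // F_03
  F.block<3,3>(0,12)  =  0.25 * dt * dt * dt * Rk1 * A1;              // F_04
  F.block<3,3>(3,3)   = F11;                                          // F_11
  F.block<3,3>(3,12)  = -I * dt;
  F.block<3,3>(6,3)   = -0.5 * dt * (Rk * A0 + Rk1 * A1 * F11);       // F_21
  F.block<3,3>(6,6)   = I;
  F.block<3,3>(6,9)   = -0.5 * dt * (Rk + Rk1);                       // F_23
  F.block<3,3>(6,12)  =  0.5 * dt * dt * Rk1 * A1;                    // F_24
  F.block<3,3>(9,9)   = I;
  F.block<3,3>(12,12) = I;

  G.setZero();
  G.block<3,3>(0,0)   = -0.25 * dt * dt * Rk;                         // G_00
  G.block<3,3>(0,3)   =  0.125 * dt * dt * dt * Rk1 * A1;             // G_01
  G.block<3,3>(0,6)   = -0.25 * dt * dt * Rk1;                        // G_02
  G.block<3,3>(0,9)   = G.block<3,3>(0,3);                            // G_03
  G.block<3,3>(3,3)   = -0.5 * dt * I;
  G.block<3,3>(3,9)   = -0.5 * dt * I;
  G.block<3,3>(6,0)   = -0.5 * dt * Rk;
  G.block<3,3>(6,3)   =  0.25 * dt * dt * Rk1 * A1;                   // G_21
  G.block<3,3>(6,6)   = -0.5 * dt * Rk1;
  G.block<3,3>(6,9)   = G.block<3,3>(6,3);                            // G_23
  G.block<3,3>(9,12)  = I * dt;
  G.block<3,3>(12,15) = I * dt;
}
```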

 

VINS-mono uses the error state equation obtained so far to perform the following tasks.

  • At every instant $t$ in the interval $[b_{k}, b_{k+1}]$, the covariance $\mathbf{P}^{b_{k}}_{t}$ is propagated via (\ref{eq:12}), using the discretized $\mathbf{F}, \mathbf{G}$ matrices obtained above. The covariance is propagated until the next keyframe is created at $b_{k+1}$, and the final $\mathbf{P}^{b_{k}}_{b_{k+1}}$ is used when optimizing the VIO.
  • At every instant $t$ in the interval $[b_{k}, b_{k+1}]$, the Jacobian $\mathbf{J}^{b_{k}}_{t}$ is propagated via (\ref{eq:13}), using the same discretized $\mathbf{F}, \mathbf{G}$ matrices. When the bias values $\mathbf{b}_{a}, \mathbf{b}_{g}$ are optimized by the VIO, the existing preintegration values are updated through (\ref{eq:14}).

 

3. Initialization

The initialization process of VINS-mono consists of the following 4 processes.



3.1. Vision-Only SfM in Sliding Window

The initialization process starts by performing vision-only SfM within a sliding window. First, suppose that two frames $F_{t}$ and $F_{t+1}$ are given in time order.

  1. If more than 30 feature points are extracted from $F_{t}$ and, at the same time, the average parallax between $F_{t} \leftrightarrow F_{t+1}$ is more than 20 pixels, the 3D relative pose between the two frames is calculated with the five-point method. Here, parallax means the pixel-distance difference between the two corresponding points on the 2D image plane. The relative pose can be estimated only up to scale.
  2. Next, set $F_{t}$ as the reference frame $c_0$. $F_{t}$ does not always mean the first frame; it means the earlier of the two frames $F_{t}$ and $F_{t+1}$.
  3. Next, triangulate all feature points observed in the two frames.
  4. EPnP is performed to obtain the poses of the other frames within the sliding window using the triangulated 3D points.
  5. Perform bundle adjustment to optimize all poses within the sliding window.

 

3.2. Visual-Inertial Alignment

1) Gyroscope Bias Calibration:  After vision-only SfM has been performed, various parameters can be initialized using the IMU preintegration and SfM values. First, given two keyframe coordinate systems $\{b_{k}\}$ and $\{b_{k+1}\}$, the gyroscope bias can be updated using $\mathbf{q}^{c_{0}}_{b_{k+1}}, \mathbf{q}^{c_{0}}_{b_{k}}$ obtained through SfM and $\gamma^{b_{k}}_{b_{k+1}}$ obtained through IMU preintegration.
\begin{equation}
  \begin{aligned}
    & \min_{\delta \mathbf{b}_{g}} \sum_{k \in \mathcal{B}} \left\| (\mathbf{q}^{c_{0}}_{b_{k+1}})^{-1} \otimes \mathbf{q}^{c_{0}}_{b_{k}} \otimes \gamma^{b_{k}}_{b_{k+1}} \right\|^{2} \\
    & = \min_{\delta \mathbf{b}_{g}} \sum_{k \in \mathcal{B}} \left\| \mathbf{q}^{b_{k+1}}_{b_{k}} \otimes \gamma^{b_{k}}_{b_{k+1}} \right\|^{2} \\
    & = \min_{\delta \mathbf{b}_{g}} \sum_{k \in \mathcal{B}} \left\| \mathbf{r}_{gyr} \right\|^{2}
  \end{aligned}
\end{equation}

Since the Jacobian $\mathbf{J}^{\gamma}_{\mathbf{b}_{g}}$ is known (from (\ref{eq:10}), each step contributes a $-\delta t \, \mathbf{I}$ block), the optimal $\delta \mathbf{b}_{g}$ can be found by substituting the first-order update below into the cost above.

\begin{equation}
  \begin{aligned}
\gamma^{b_{k}}_{b_{k+1}} \approx \hat{\gamma}^{b_{k}}_{b_{k+1}} \otimes \begin{bmatrix} 1 \\ \frac{1}{2} \mathbf{J}^{\gamma}_{\mathbf{b}_{g}} \delta \mathbf{b}_{g}  \end{bmatrix}
  \end{aligned} 
\end{equation}

In the code implementation, an equation of the form $\mathbf{J}^{\intercal}\mathbf{J} \delta \mathbf{b}_{g}^{*} = \mathbf{J}^{\intercal}\mathbf{r}_{gyr}$ is assembled over all keyframes, and the optimal value is obtained at once through Cholesky decomposition.
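
A minimal sketch of that accumulate-and-solve step is shown below (illustrative structure with my own field names, not the exact VINS-mono code):

```cpp
// Accumulate A = J^T J and b = J^T r over all keyframe pairs and solve
// A * dbg = b with a Cholesky-type (LDLT) decomposition.
#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <vector>

struct KeyframePair {
  Eigen::Quaterniond q_c0_bk, q_c0_bk1;  // rotations from SfM
  Eigen::Quaterniond gamma;              // preintegrated rotation
  Eigen::Matrix3d J_gamma_bg;            // Jacobian block w.r.t. b_g
};

Eigen::Vector3d solveGyroBias(const std::vector<KeyframePair>& pairs) {
  Eigen::Matrix3d A = Eigen::Matrix3d::Zero();
  Eigen::Vector3d b = Eigen::Vector3d::Zero();
  for (const auto& p : pairs) {
    // r_gyr = 2 * vec( gamma^{-1} (x) q_{bk}^{-1} (x) q_{bk+1} )
    const Eigen::Quaterniond q_ij = p.q_c0_bk.inverse() * p.q_c0_bk1;
    const Eigen::Vector3d r = 2.0 * (p.gamma.inverse() * q_ij).vec();
    A += p.J_gamma_bg.transpose() * p.J_gamma_bg;
    b += p.J_gamma_bg.transpose() * r;
  }
  return A.ldlt().solve(b);  // optimal delta b_g
}
```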


2) Velocity, Gravity Vector, and Metric Scale Initialization :
Using the calibrated extrinsic parameters ${\color{red}\mathbf{p}^{b}_{c}}, {\color{blue} \mathbf{q}^{b}_{c}}$, the camera poses obtained from SfM are converted into the corresponding IMU (body) poses as follows.

\begin{equation}
  \begin{aligned}
    & s \bar{\mathbf{p}}^{c_{0}}_{b_{k}} = s \bar{\mathbf{p}}^{c_{0}}_{c_{k}} - \mathbf{R}^{c_{0}}_{b_{k}} {\color{red}\mathbf{p}^{b}_{c}} \\
    & \mathbf{q}^{c_0}_{b_{k}} = \mathbf{q}^{c_{0}}_{c_{k}} \otimes ({\color{blue}\mathbf{q}^{b}_{c}})^{-1}
  \end{aligned} \label{eq:5}
\end{equation}

- $c_{0}$: The first coordinate system at the time initialization started
- $\bar{\mathbf{p}}$: position vector $\mathbf{p}$ up to scale
- $s$: scale value

In (\ref{eq:2}), if the reference frame is changed from the world frame $\{w\}$ to the $\{c_0\}$ frame and the preintegration terms are moved to one side, the result can be expressed as follows.
\begin{equation}
  \begin{aligned}
    & \hat{\alpha}^{b_{k}}_{b_{k+1}} = \mathbf{R}^{b_{k}}_{c_{0}} \bigg( s\Big( \bar{\mathbf{p}}^{c_{0}}_{b_{k+1}} - \bar{\mathbf{p}}^{c_{0}}_{b_{k}} \Big) + \frac{1}{2}\mathbf{g}^{c_{0}}\delta t^{2} - \mathbf{R}^{c_{0}}_{b_{k}} \mathbf{v}^{b_{k}}_{b_{k}} \delta t \bigg) \\
    & \hat{\beta}^{b_{k}}_{b_{k+1}} = \mathbf{R}^{b_{k}}_{c_{0}} \Big( \mathbf{R}^{c_{0}}_{b_{k+1}}\mathbf{v}^{b_{k+1}}_{b_{k+1}} + \mathbf{g}^{c_{0}}\delta t - \mathbf{R}^{c_{0}}_{b_{k}}\mathbf{v}^{b_{k}}_{b_{k}} \Big)
  \end{aligned} \label{eq:6}
\end{equation}

From (\ref{eq:5}) and (\ref{eq:6}) we can obtain a system of linear equations.
\begin{equation}
  \begin{aligned}
    \hat{\alpha}^{b_{k}}_{b_{k+1}} & = \mathbf{R}^{b_{k}}_{c_{0}} \bigg( s\Big( \bar{\mathbf{p}}^{c_{0}}_{b_{k+1}} - \bar{\mathbf{p}}^{c_{0}}_{b_{k}} \Big) + \frac{1}{2}\mathbf{g}^{c_{0}}\delta t^{2} - \mathbf{R}^{c_{0}}_{b_{k}} \mathbf{v}^{b_{k}}_{b_{k}} \delta t \bigg) \\
    & = \mathbf{R}^{b_{k}}_{c_{0}} \bigg( s \bar{\mathbf{p}}^{c_{0}}_{c_{k+1}} - \mathbf{R}^{c_{0}}_{b_{k+1}}\mathbf{p}^{b}_{c} - (s \bar{\mathbf{p}}^{c_{0}}_{c_{k}} - \mathbf{R}^{c_{0}}_{b_{k}}\mathbf{p}^{b}_{c})  + \frac{1}{2}\mathbf{g}^{c_{0}}\delta t^{2} - \mathbf{R}^{c_{0}}_{b_{k}} \mathbf{v}^{b_{k}}_{b_{k}} \delta t  \bigg) \\
    & = \mathbf{R}^{b_{k}}_{c_{0}} (\bar{\mathbf{p}}^{c_{0}}_{c_{k+1}} - \bar{\mathbf{p}}^{c_{0}}_{c_{k}})s  + \frac{1}{2} \mathbf{R}^{b_{k}}_{c_{0}}\mathbf{g}^{c_{0}}\delta t^{2} - \mathbf{v}^{b_{k}}_{b_{k}} \delta t + \mathbf{p}^{b}_{c} - \mathbf{R}^{b_{k}}_{c_{0}} \mathbf{R}^{c_{0}}_{b_{k+1}} \mathbf{p}^{b}_{c}
  \end{aligned} \label{eq:66}
\end{equation}

A linear system has the advantage of being able to obtain a solution relatively easily through various matrix decomposition methods. If the above equation is rearranged in the form of a linear system $\mathbf{Ax} = \mathbf{b}$, we get:
\begin{equation}
  \begin{aligned}
    \begin{bmatrix} \hat{\alpha}^{b_{k}}_{b_{k+1}} - \mathbf{p}^{b}_{c} + \mathbf{R}^{b_{k}}_{c_{0}}\mathbf{R}^{c_{0}}_{b_{k+1}} \mathbf{p}^{b}_{c} \\
      \hat{\beta}^{b_{k}}_{b_{k+1}}
    \end{bmatrix} = \begin{bmatrix}
      -\delta t \mathbf{I} & 0 & \frac{1}{2}\mathbf{R}^{b_{k}}_{c_{0}}\delta t^{2} & \mathbf{R}^{b_{k}}_{c_{0}}(\bar{\mathbf{p}}^{c_{0}}_{c_{k+1}} - \bar{\mathbf{p}}^{c_{0}}_{c_{k}}) \\
      -\mathbf{I} & \mathbf{R}^{b_{k}}_{c_{0}} \mathbf{R}^{c_{0}}_{b_{k+1}} & \delta t \mathbf{R}^{b_{k}}_{c_{0}} & 0 
    \end{bmatrix}
    \begin{bmatrix}
        \mathbf{v}^{b_{k}}_{b_{k}} \\ \mathbf{v}^{b_{k+1}}_{b_{k+1}} \\ \mathbf{g}^{c_{0}} \\ s
    \end{bmatrix} \\
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}} = \mathbf{H}^{b_{k}}_{b_{k+1}} \mathcal{X}_{I} + \mathbf{n}^{b_{k}}_{b_{k+1}} 
  \end{aligned}
\end{equation}
- $\mathcal{X}_{I} = [\mathbf{v}^{b_{0}}_{b_{0}}, \mathbf{v}^{b_{1}}_{b_{1}}, \cdots, \mathbf{v}^{b_{n}}_{b_{n}}, \mathbf{g}^{c_{0}}, s]$ 
- $\mathbf{n}^{b_{k}}_{b_{k+1}}$: all noise terms between the frames $[b_{k}, b_{k+1}]$

This corresponds to equations (10) and (11) of the VINS-mono paper. In the above formula, $\mathbf{R}^{c_{0}}_{b_{k}}, \mathbf{R}^{c_{0}}_{b_{k+1}}, \bar{\mathbf{p}}^{c_{0}}_{c_{k}}, \bar{\mathbf{p}}^{c_{0}}_{c_{k+1}}$ are values obtained from SfM.
Therefore, the final optimization problem to be solved can be expressed as:
\begin{equation}
  \begin{aligned}
    \min_{\mathcal{X}_{I}} \sum_{k \in \mathcal{B}} \left\|  \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}} - \mathbf{H}^{b_{k}}_{b_{k+1}} \mathcal{X}_{I} \right\|^{2}
  \end{aligned}
\end{equation}
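
Structurally, this solve is again an accumulate-and-decompose step. The sketch below is illustrative; it assumes each per-keyframe block has already been expanded to the full state width $N = 3(n+1)+3+1$, and solves the stacked least-squares problem via the normal equations:

```cpp
// Stack z = H * X_I over all keyframe pairs and minimize ||z - H X_I||^2.
#include <Eigen/Dense>
#include <vector>

Eigen::VectorXd solveAlignment(const std::vector<Eigen::MatrixXd>& Hs,  // 6 x N each
                               const std::vector<Eigen::VectorXd>& zs,  // 6 x 1 each
                               int N) {
  Eigen::MatrixXd A = Eigen::MatrixXd::Zero(N, N);
  Eigen::VectorXd b = Eigen::VectorXd::Zero(N);
  for (std::size_t k = 0; k < Hs.size(); ++k) {
    A += Hs[k].transpose() * Hs[k];   // accumulate H^T H
    b += Hs[k].transpose() * zs[k];   // accumulate H^T z
  }
  return A.ldlt().solve(b);           // X_I = [v's, g^{c0}, s]
}
```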

3) Gravity Refinement: The gravity vector estimated above is then fine-tuned and optimized again. Gravity is a constant with a fixed magnitude ($9.8 \ m/s^{2}$), so if the estimated magnitude is greater than 9.8 it can be reduced, and if it is smaller it can be increased. A gravity vector of known magnitude has only two degrees of freedom: if two of the three components are known, the remaining one is determined automatically. Therefore, the gravity vector can be parameterized on a manifold with a two-dimensional tangent plane.
\begin{equation}
  \begin{aligned}
    \mathbf{g}^{c_0} = \left\| \mathbf{g} \right\| \frac{\mathbf{g}^{c_{0}}}{\left\| \mathbf{g}^{c_{0}} \right\| } + \begin{bmatrix} \mathbf{b}_{1} & \mathbf{b}_{2} \end{bmatrix} \begin{bmatrix} w_{1} \\ w_{2} \end{bmatrix}
  \end{aligned}
\end{equation}

In the above expression, $\mathbf{g}^{c_{0}}$, $\mathbf{b}_{1}$, and $\mathbf{b}_{2}$ are orthogonal to each other; $\mathbf{b}_{1}, \mathbf{b}_{2}$ are two basis vectors spanning the tangent plane. $w_{1}$ and $w_{2}$ are perturbations (errors) relative to the initially estimated gravity vector, and the purpose of gravity refinement is to drive $w_{1}, w_{2}$ toward 0 through optimization.
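
One common way to construct such a basis, sketched below (similar in spirit to the tangent-basis helper in the VINS-mono initialization code, but illustrative), is to project an arbitrary non-parallel vector onto the plane orthogonal to $\mathbf{g}$ and complete it with a cross product:

```cpp
// Build the tangent-plane basis {b1, b2} orthogonal to g.
#include <Eigen/Dense>
#include <cmath>

void tangentBasis(const Eigen::Vector3d& g,
                  Eigen::Vector3d& b1, Eigen::Vector3d& b2) {
  const Eigen::Vector3d n = g.normalized();
  // Pick any vector not parallel to n and project it onto the tangent plane.
  const Eigen::Vector3d tmp = (std::abs(n.z()) < 0.9)
                                  ? Eigen::Vector3d::UnitZ()
                                  : Eigen::Vector3d::UnitX();
  b1 = (tmp - n * n.dot(tmp)).normalized();
  b2 = n.cross(b1);  // unit by construction
}
```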

4) Completing Initialization:  When the fine adjustment of the gravity vector is complete, a world frame $(\cdot)^{w}$ whose z-axis is aligned with the gravity vector is created, and all quantities expressed in the $(\cdot)^{c_0}$ frame, including the velocities of the IMU frames, are rotated into the world frame $(\cdot)^{w}$. Next, the metric scale $s$ is applied to the values produced by visual SfM so that they have metric units. At this point, the initialization process is finished and all parameters are fed as inputs to the VIO.

4. Tightly-Coupled Nonlinear Optimization

4.1. Basics of State Estimation

The state estimation problem can generally be interpreted as finding the Maximum A Posteriori (MAP) estimate under Bayesian probability. This is equivalent to computing the conditional probability distribution of the robot's state given the observations.
\begin{equation}
  \begin{aligned}
    P(\mathbf{x} | \mathbf{z}) 
  \end{aligned}
\end{equation}
- $\mathbf{x}$: state variable
- $\mathbf{z}$: observed sensor data

Applying the Bayesian law to the above equation, we get the following equation:

\begin{equation}
  \begin{aligned}
    P(\mathbf{x} | \mathbf{z}) = \frac{P(\mathbf{z} | \mathbf{x}) P(\mathbf{x})}{P(\mathbf{z})} \propto P(\mathbf{z} | \mathbf{x})P(\mathbf{x})
  \end{aligned}
\end{equation}

In the above expression, $P(\mathbf{x}|\mathbf{z})$ is called the posterior, $P(\mathbf{x})$ the prior, and $P(\mathbf{z} | \mathbf{x})$ the likelihood. In general, it is known to be difficult to obtain the posterior directly. Therefore, the method of finding the optimal state variable that maximizes the posterior through the likelihood and the prior is widely used, and this is called Maximum A Posteriori (MAP) estimation. In most cases, state estimation proceeds without prior information, so the MAP problem can be written as follows.
\begin{equation}
  \begin{aligned}
    \mathcal{X}^{*} = \arg\max P(\mathbf{x} | \mathbf{z}) \propto \arg\max P(\mathbf{z} | \mathbf{x})P(\mathbf{x})\propto \arg\max P(\mathbf{z} | \mathbf{x})
  \end{aligned} \label{eq:7}
\end{equation}

As the above equation shows, the MAP problem becomes finding the state $\arg\max P(\mathbf{z} | \mathbf{x})$ that maximizes the likelihood of the observations. If the observation $\mathbf{z}$ follows a Gaussian distribution, then $\mathbf{z} \sim \mathcal{N}(\bar{\mathbf{z}}, \mathbf{Q})$ and (\ref{eq:7}) can be expressed as:
\begin{equation}
  \begin{aligned}
    \mathcal{X}^{*} = \arg\max P(\mathbf{z} | \mathbf{x}) = \arg\min \sum \left\| \mathbf{z} - h(\mathbf{x}) \right\|^{2}_{\mathbf{Q}}
  \end{aligned} \label{eq:8}
\end{equation}

In this case, $h(\cdot)$ represents the observation function and $\left\| \mathbf{x} \right\|^{2}_{\mathbf{Q}} = \mathbf{x}^{\intercal}\mathbf{Q}^{-1}\mathbf{x}$ is the squared Mahalanobis norm.

NOTICE:  When a multidimensional Gaussian distribution satisfies $\mathbf{x} \sim \mathcal{N}(\mu, \mathbf{Q})$, the probability density function (pdf) of $\mathbf{x}$ can be written as
\begin{equation}
  \begin{aligned}
    & P(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^{N} \det(\mathbf{Q})}} \exp \bigg(-\frac{1}{2}(\mathbf{x} - \mu)^{\intercal} \mathbf{Q}^{-1} (\mathbf{x} - \mu) \bigg) \\
    & -\ln P(\mathbf{x}) = \cancel{\ln \sqrt{ (2\pi)^{N} \det(\mathbf{Q}) }} + \frac{1}{2}(\mathbf{x}-\mu)^{\intercal} \mathbf{Q}^{-1} (\mathbf{x}- \mu)
  \end{aligned}
\end{equation}

In the above expression, the $\ln\sqrt{\cdots}$ term does not depend on $\mathbf{x}$, so only the $\frac{1}{2}(\cdots)$ term is used when optimizing over $\mathbf{x}$.

The optimal solution of (\ref{eq:8}) can then be found using nonlinear least squares. The representative methods for this are the Gauss-Newton (GN) method and the Levenberg-Marquardt (LM) method. Here, let's solve the optimization problem using the GN method.
\begin{equation}
  \begin{aligned}
    \mathcal{X} = \arg\min_{\mathbf{x}} \left\| f(\mathbf{x}) \right\|^{2}_{\mathbf{P}}
  \end{aligned}
\end{equation}

In the above expression, $\mathbf{P}$ means the covariance matrix of the error, and $\left\| \cdot \right\|^{2}_{\mathbf{P}}$ the squared Mahalanobis norm. When $f(\mathbf{x})$ is expressed as a first-order Taylor approximation, it becomes $f(\mathbf{x} +\delta \mathbf{x}) \approx f(\mathbf{x}) + \mathbf{J}(\mathbf{x})\delta \mathbf{x}$, and the expression can be expanded as follows for the optimal increment $\delta \mathbf{x}^{*}$.
\begin{equation}
  \begin{aligned}
    \delta \mathbf{x}^{*} = \arg\min_{\delta \mathbf{x}} \left\| f(\mathbf{x}) + \mathbf{J}(\mathbf{x}) \delta \mathbf{x} \right\|^{2}_{\mathbf{P}} 
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
   \left\| f(\mathbf{x}) + \mathbf{J}(\mathbf{x}) \delta \mathbf{x} \right\|^{2}_{\mathbf{P}}  & = \Big( f(\mathbf{x}) + \mathbf{J}(\mathbf{x}) \delta \mathbf{x} \Big)^{\intercal} \mathbf{P}^{-1} \Big( f(\mathbf{x}) + \mathbf{J}(\mathbf{x}) \delta \mathbf{x} \Big) \\
    & = f(\mathbf{x})^{\intercal}\mathbf{P}^{-1}f(\mathbf{x}) + 2 f(\mathbf{x})^{\intercal} \mathbf{P}^{-1} \mathbf{J}(\mathbf{x}) \delta \mathbf{x} + \delta \mathbf{x}^{\intercal} \mathbf{J}(\mathbf{x})^{\intercal} \mathbf{P}^{-1}\mathbf{J}(\mathbf{x})\delta \mathbf{x}
  \end{aligned}
\end{equation}

Differentiating the above expression with respect to $\delta \mathbf{x}$ and setting the derivative to zero, $(\cdot)' = 0$, gives the following expression.
\begin{equation}
  \begin{aligned}
    \mathbf{J}(\mathbf{x})^{\intercal} \mathbf{P}^{-1}\mathbf{J}(\mathbf{x}) \delta \mathbf{x}^{*}  & = -\mathbf{J}(\mathbf{x})^{\intercal} \mathbf{P}^{-1} f(\mathbf{x}) \\
    \mathbf{H}\delta \mathbf{x}^{*} & = \mathbf{b}
  \end{aligned} \label{eq:9}
\end{equation}

The above equation can be solved for $\delta \mathbf{x}^{*}$ through various matrix decomposition methods such as SVD or Cholesky decomposition. One GN cycle is completed by finding the optimal increment and then updating the existing state.
\begin{equation}
  \begin{aligned}
    \mathbf{x} \leftarrow \mathbf{x} + \delta \mathbf{x}^{*}
  \end{aligned}
\end{equation}
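
Putting the pieces together, one GN iteration computes the residual and Jacobian, solves (\ref{eq:9}), and applies the update. Below is a generic, illustrative sketch; `f`, `Jac`, and `P_inv` are placeholders for the problem-specific residual function, Jacobian function, and inverse covariance (information) matrix:

```cpp
// Generic Gauss-Newton sketch for min ||f(x)||^2_P.
#include <Eigen/Dense>
#include <functional>

Eigen::VectorXd gaussNewton(
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& f,
    const std::function<Eigen::MatrixXd(const Eigen::VectorXd&)>& Jac,
    const Eigen::MatrixXd& P_inv, Eigen::VectorXd x, int iters = 10) {
  for (int i = 0; i < iters; ++i) {
    const Eigen::VectorXd r = f(x);
    const Eigen::MatrixXd J = Jac(x);
    const Eigen::MatrixXd H = J.transpose() * P_inv * J;   // H = J^T P^-1 J
    const Eigen::VectorXd b = -J.transpose() * P_inv * r;  // b = -J^T P^-1 f(x)
    x += H.ldlt().solve(b);                                // x <- x + dx*
  }
  return x;
}
```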

4.2. Cost Function

When SLAM initialization is complete, tightly-coupled monocular VIO based on a sliding window is performed. The VIO optimizes, all at once, the inverse depths $\lambda$ of the feature points, the velocities $\mathbf{v}_{k}$ of all frames, the poses $\mathbf{p}_{k}, \mathbf{q}_{k}$, the IMU biases $\mathbf{b}_{a}, \mathbf{b}_{g}$, and the extrinsic parameters $\mathbf{x}^{b}_{c}$.
\begin{equation}
  \begin{aligned}
    & \mathcal{X} = [\mathbf{x}_{0}, \mathbf{x}_{1}, \cdots, \mathbf{x}_{n}, \mathbf{x}^{b}_{c}, \lambda_{0}, \lambda_{1}, \cdots, \lambda_{m}] \\
    & \mathbf{x}_{k} = [\mathbf{p}^{w}_{b_{k}}, \mathbf{v}^{w}_{b_{k}}, \mathbf{q}^{w}_{b_{k}}, \mathbf{b}_{a}, \mathbf{b}_{g}] \\
    & \mathbf{x}^{b}_{c} = [\mathbf{p}^{b}_{c}, \mathbf{q}^{b}_{c}]
  \end{aligned}
\end{equation}

In VINS-mono, the optimization function of VIO is defined as

\begin{equation}
  \begin{aligned}
    \min_{\mathcal{X}} \bigg\{ \left\| \mathbf{r}_{p} - \mathbf{J}_{p}\mathcal{X} \right\|^{2}_{\mathbf{P}_{M}} + \sum_{k \in \mathcal{B}} \left\| \mathbf{r}_{\mathcal{B}} \Big( \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}}, \mathcal{X} \Big) \right\|^{2}_{\mathbf{P}_{\mathcal{B}}} + \sum_{(l,j) \in \mathcal{C}} \left\| \mathbf{r}_{\mathcal{C}} \Big( \hat{\mathbf{z}}^{c_{j}}_{l}, \mathcal{X} \Big) \right\|^{2}_{\mathbf{P}^{c_{j}}_{l}} \bigg\}
  \end{aligned}
\end{equation}

In the above expression, the leading $\mathbf{r}_{p}, \mathbf{J}_{p}$ are the prior information from marginalization, $\mathbf{r}_{\mathcal{B}}$ is the residual of the IMU, and $\mathbf{r}_{\mathcal{C}}$ is the residual of the camera image. $\mathcal{B}$ is the set of all IMU measurements within the sliding window, and $\mathcal{C}$ is the set of all feature points observed more than once within the sliding window. $\mathbf{P}_{M}, \mathbf{P}_{\mathcal{B}}, \mathbf{P}_{\mathcal{C}}$ are the covariances of the prior, IMU, and visual residuals, respectively. 

4.3. IMU Model

From (\ref{eq:2}), the residual of the IMU measurement can be expressed as follows. Here, residual has the same meaning as error and denotes observed value minus predicted value.
\begin{equation}
  \begin{aligned}
    \mathbf{r}_{\mathcal{B}} \Big( \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}}, \mathcal{X} \Big) = \begin{bmatrix}
      \delta \alpha^{b_{k}}_{b_{k+1}} \\
      \delta \boldsymbol{\theta}^{b_{k}}_{b_{k+1}} \\
      \delta \beta^{b_{k}}_{b_{k+1}} \\
      \delta \mathbf{b}_{a}\\
      \delta \mathbf{b}_{g}\\
    \end{bmatrix} = \begin{bmatrix}
      \mathbf{R}^{b_{k}}_{w}(\mathbf{p}^{w}_{b_{k+1}} - \mathbf{p}^{w}_{b_{k}} - \mathbf{v}^{w}_{b_{k}}\Delta t_{k} + \frac{1}{2} \mathbf{g}^{w}\Delta t_{k}^{2}) - \hat{\alpha}^{b_{k}}_{b_{k+1}} \\
      2 \Big[ \Big( \hat{\gamma}^{b_{k}}_{b_{k+1}} \Big)^{-1} \otimes (\mathbf{q}^{w}_{b_{k}})^{-1} \otimes \mathbf{q}^{w}_{b_{k+1}}  \Big]_{xyz} \\
      \mathbf{R}^{b_{k}}_{w} ( \mathbf{v}^{w}_{b_{k+1}} - \mathbf{v}^{w}_{b_{k}} + \mathbf{g}^{w}\Delta t_{k}) - \hat{\beta}^{b_{k}}_{b_{k+1}} \\
      \mathbf{b}_{a_{k+1}} - \mathbf{b}_{a_{k}} \\
      \mathbf{b}_{g_{k+1}} - \mathbf{b}_{g_{k}} \\
    \end{bmatrix}
  \end{aligned}
\end{equation}
- $[\cdot]_{xyz}$: An operator that extracts only the vector part $(x,y,z)$ from a quaternion separately
- $\delta \boldsymbol{\theta}^{b_{k}}_{b_{k+1}}$: Quaternion error state representation. Angular errors are marked with $\boldsymbol{\theta}$ instead of $\mathbf{q}$.
- $(\delta \alpha^{b_{k}}_{b_{k+1}}, \delta \beta^{b_{k}}_{b_{k+1}}, \delta \gamma^{b_{k}}_{b_{k+1}})$: IMU measurements preintegrated between the two images

The state variables to be optimized in the IMU residual are as follows.

\begin{equation}
  \begin{aligned}
      & \mathbf{x}[0] = [\mathbf{p}^{w}_{b_{k}}, \mathbf{q}^{w}_{b_{k}}] \\
      & \mathbf{x}[1] = [\mathbf{v}^{w}_{b_{k}}, \mathbf{b}_{a_{k}}, \mathbf{b}_{g_{k}} ] \\
      & \mathbf{x}[2] = [\mathbf{p}^{w}_{b_{k+1}}, \mathbf{q}^{w}_{b_{k+1}} ] \\
      & \mathbf{x}[3] = [\mathbf{v}^{w}_{b_{k+1}}, \mathbf{b}_{a_{k+1}}, \mathbf{b}_{g_{k+1}} ]
  \end{aligned}
\end{equation}

The above state variables are divided into four groups based on what is implemented in the code. The dimensions of each state variable are 7, 9, 7, and 9, respectively.


NOTICE: When the quaternion $\mathbf{q}$ is used as the orientation representation during optimization, it is not easy to handle due to problems such as over-parameterization. Therefore, by using only the imaginary part $(x,y,z)$ of $\mathbf{q}$, excluding the real part $w$, optimization based on the Lie algebra so(3) can be performed without additional constraints. A quaternion using only the imaginary part is a pure quaternion, so it has the same physical meaning as the rotation vector $\boldsymbol{\theta}$. When updating the pose after optimization, it is converted back into a unit quaternion $\mathbf{q}$. This is implemented in the 'LocalParameterization' of the ceres solver and is converted and updated automatically. The same applies to the rotations in the visual residual as well as the IMU residual.

Thus, the nominal dimensions of the IMU residual Jacobians are $<15\times7, \ 15\times9, \ 15\times 7, \ 15\times 9>$, but since only 3 parameters instead of 4 are used when updating the rotation, the last column of $\mathbf{J}[0], \mathbf{J}[2]$ is always 0. Therefore, for convenience of expression, $\mathbf{J}[0]$ and $\mathbf{J}[2]$ are written as matrices of size $<15 \times 6>$.

The Jacobians are then as follows; for the detailed derivation process, refer to the Appendix.
\begin{equation}
  \begin{aligned}
    \mathbf{J}[0]_{15\times 6} = \frac{\partial \mathbf{r}_{\mathcal{B}} \Big( \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}}, \mathcal{X} \Big)}{\partial [\mathbf{p}^{w}_{b_{k}}, \mathbf{q}^{w}_{b_{k}}]} = \begin{bmatrix}
      -\mathbf{R}^{b_{k}}_{w} & [\mathbf{R}^{b_{k}}_{w}(\mathbf{p}^{w}_{b_{k+1}} - \mathbf{p}^{w}_{b_{k}} - \mathbf{v}^{w}_{b_{k}}\Delta t_{k} + \frac{1}{2} \mathbf{g}^{w}\Delta t_{k}^{2})]_{\times} \\
       0 & -[\hat{\gamma}^{b_{k}}_{b_{k+1}}]_{R}[(\mathbf{q}^{w}_{b_{k+1}})^{-1} \otimes \mathbf{q}^{w}_{b_{k}}]_{L, 3\times3} \\
      0 & [\mathbf{R}^{b_{k}}_{w} (\mathbf{v}^{w}_{b_{k+1}} - \mathbf{v}^{w}_{b_{k}} + \mathbf{g}^{w} \Delta t_{k})]_{\times} \\
      0&0\\
      0&0
    \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \mathbf{J}[1]_{15\times9} = \frac{\partial \mathbf{r}_{\mathcal{B}} \Big( \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}}, \mathcal{X} \Big)}{\partial [\mathbf{v}^{w}_{b_{k}}, \mathbf{b}_{a_{k}}, \mathbf{b}_{g_{k}}]} = \begin{bmatrix}
      -\mathbf{R}^{b_{k}}_{w} \Delta t_{k} & -\mathbf{J}^{\alpha}_{b_{a}} & -\mathbf{J}^{\alpha}_{b_{g}} \\
      0&0& -[(\hat{\gamma}^{b_{k}}_{b_{k+1}})^{-1} \otimes (\mathbf{q}^{w}_{b_{k}})^{-1} \otimes \mathbf{q}^{w}_{b_{k+1}}]_{R,3\times3} \mathbf{J}^{\gamma}_{b_{g}} \\
      -\mathbf{R}^{b_{k}}_{w} & -\mathbf{J}^{\beta}_{b_{a}} & -\mathbf{J}^{\beta}_{b_{g}} \\
      0 & - \mathbf{I} & 0 \\
      0 & 0 &-\mathbf{I}
    \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \mathbf{J}[2]_{15\times 6} = \frac{\partial \mathbf{r}_{\mathcal{B}} \Big( \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}}, \mathcal{X} \Big)}{\partial [\mathbf{p}^{w}_{b_{k+1}}, \mathbf{q}^{w}_{b_{k+1}}]} = \begin{bmatrix}
      \mathbf{R}^{b_{k}}_{w} & 0 \\
      0 & [(\hat{\gamma}^{b_{k}}_{b_{k+1}})^{-1} \otimes (\mathbf{q}^{w}_{b_{k}})^{-1} \otimes \mathbf{q}^{w}_{b_{k+1}}]_{L, 3\times3} \\
      0&0\\
      0&0\\
      0&0
    \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \mathbf{J}[3]_{15\times9} = \frac{\partial \mathbf{r}_{\mathcal{B}} \Big( \hat{\mathbf{z}}^{b_{k}}_{b_{k+1}}, \mathcal{X} \Big)}{\partial [\mathbf{v}^{w}_{b_{k+1}}, \mathbf{b}_{a_{k+1}}, \mathbf{b}_{g_{k+1}}]} = \begin{bmatrix}
      0&0&0\\
      0&0&0\\
      \mathbf{R}^{b_{k}}_{w} &0&0 \\
      0&\mathbf{I}&0 \\
      0&0&\mathbf{I}
    \end{bmatrix}
  \end{aligned}
\end{equation}
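
Analytic Jacobians like $\mathbf{J}[0]$ to $\mathbf{J}[3]$ are easy to get subtly wrong, so a common sanity check is to compare them against central finite differences taken on the local (box-plus) perturbation. A generic sketch, assuming the caller wraps the perturbation and residual evaluation into one function:

```cpp
#include <Eigen/Dense>
#include <functional>

// r_of_delta(dx) should apply the 6- or 9-dim local perturbation dx to the
// nominal state and return the 15-dim residual; comparing the result with an
// analytic block column by column catches sign and frame-convention errors.
Eigen::MatrixXd numericJacobian(
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& r_of_delta,
    int residual_dim, int local_dim, double eps = 1e-6) {
  Eigen::MatrixXd J(residual_dim, local_dim);
  for (int k = 0; k < local_dim; ++k) {
    Eigen::VectorXd dx = Eigen::VectorXd::Zero(local_dim);
    dx(k) = eps;
    J.col(k) = (r_of_delta(dx) - r_of_delta(-dx)) / (2.0 * eps);  // central diff
  }
  return J;
}
```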

 

4.4. Vision Model

When the $l$th feature point is first observed in the $i$th image, the visual residual for the same feature point observed in the $j$th image is as follows (a code sketch of this transform chain follows the symbol list below).
\begin{equation}
  \begin{aligned}
    & \hat{\bar{\mathbf{P}}}^{c_{j}}_{l} && = \pi^{-1}_{c} \bigg( \begin{bmatrix} \hat{u}^{c_{j}}_{l} \\ \hat{v}^{c_{j}}_{l} \end{bmatrix} \bigg) \\
    & && = \begin{bmatrix} \hat{x}^{c_{j}}_{l} & \hat{y}^{c_{j}}_{l} & \hat{z}^{c_{j}}_{l} \end{bmatrix}^{\intercal} \\
    & \tilde{\mathbf{P}}^{c_{j}}_{l} && = \mathbf{R}^{c}_{b} \bigg\{ \mathbf{R}^{b_{j}}_{w} \bigg[ \mathbf{R}^{w}_{b_{i}} \bigg( \mathbf{R}^{b}_{c} \frac{1}{\lambda_{l}} \pi^{-1}_{c}  \begin{bmatrix} \hat{u}^{c_{i}}_{l} \\ \hat{v}^{c_{i}}_{l} \end{bmatrix}  + \mathbf{P}^{b}_{c} \bigg) + \mathbf{P}^{w}_{b_{i}} \bigg] + {\color{red}\mathbf{P}^{b_{j}}_{w}} \bigg\} + {\color{blue}\mathbf{P}^{c}_{b}} \\
    & && = \mathbf{R}^{c}_{b} \bigg\{ \mathbf{R}^{b_{j}}_{w} \bigg[ \mathbf{R}^{w}_{b_{i}} \bigg( \mathbf{R}^{b}_{c} \frac{1}{\lambda_{l}} \pi^{-1}_{c}  \begin{bmatrix} \hat{u}^{c_{i}}_{l} \\ \hat{v}^{c_{i}}_{l} \end{bmatrix}  + \mathbf{P}^{b}_{c} \bigg) + \mathbf{P}^{w}_{b_{i}} {\color{red}- \mathbf{P}^{w}_{b_{j}}} \bigg] {\color{blue}- \mathbf{P}^{b}_{c}}  \bigg\}  \\
    & && = \begin{bmatrix} x^{c_{j}}_{l} & y^{c_{j}}_{l} & z^{c_{j}}_{l} \end{bmatrix}^{\intercal}
  \end{aligned}
\end{equation}
- $\hat{\bar{\mathbf{P}}}^{c_{j}}_{l} \in \mathbb{R}^{3}$: predicted value of the $l$th feature point in the $j$th image. For optimization, it is normalized to a unit vector.
- $\tilde{\mathbf{P}}^{c_{j}}_{l} \in \mathbb{R}^{3}$: observed value of $l$th feature point in $j$th image
- $\pi^{-1}_{c}(\cdot)$: back-projection function. Converts a 2D point on the image plane into a 3D ray. In the VINS-mono paper, the vector produced by this function is a unit vector; this appears to be because the tangent-plane optimization described below requires unit vectors.
- $\begin{bmatrix} \hat{u}^{c_{j}}_{l} \\ \hat{v}^{c_{j}}_{l} \end{bmatrix}$: Pixel coordinates of the $l$th feature point seen in the $j$th image
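
The transform chain above reads directly as code. A minimal sketch (illustrative names; $\mathbf{R}^{b}_{c}, \mathbf{P}^{b}_{c}$ are the camera-IMU extrinsics):

```cpp
#include <Eigen/Dense>

// Map the unit ray of feature l seen in camera i through
// camera_i -> body_i -> world -> body_j -> camera_j.
Eigen::Vector3d predictInCamJ(
    const Eigen::Vector3d& ray_ci,  // pi^{-1}([u,v]) in camera i (unit vector)
    double lambda,                  // inverse depth of feature l in camera i
    const Eigen::Matrix3d& R_wbi, const Eigen::Vector3d& p_wbi,   // T^w_{b_i}
    const Eigen::Matrix3d& R_wbj, const Eigen::Vector3d& p_wbj,   // T^w_{b_j}
    const Eigen::Matrix3d& R_bc,  const Eigen::Vector3d& p_bc) {  // T^b_c
  const Eigen::Vector3d P_ci = ray_ci / lambda;          // 3D point in camera i
  const Eigen::Vector3d P_bi = R_bc * P_ci + p_bc;       // -> body i
  const Eigen::Vector3d P_w  = R_wbi * P_bi + p_wbi;     // -> world
  const Eigen::Vector3d P_bj = R_wbj.transpose() * (P_w - p_wbj);  // -> body j
  return R_bc.transpose() * (P_bj - p_bc);               // -> camera j
}
```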

The above formula is identical to equation (17) of the VINS-mono paper. It is an expansion of $\mathbf{T}^{a}_{b}\mathbf{P}^{b} = \mathbf{P}^{a}$ for an arbitrary transformation matrix $\mathbf{T}^{a}_{b} = \begin{bmatrix} \mathbf{R}^{a}_{b} & \mathbf{P}^{a}_{b} \\ 0 & 1 \end{bmatrix}$.
\begin{equation}
  \begin{aligned}
    \mathbf{R}^{a}_{b}\mathbf{P}^{b} + \mathbf{P}^{a}_{b} = \mathbf{P}^{a}
  \end{aligned}
\end{equation}

In addition, the colored parts in the above equation use the property of the inverse transformation matrix $(\mathbf{T}^{a}_{b})^{-1} = \mathbf{T}^{b}_{a}  = \begin{bmatrix} \mathbf{R}^{b}_{a} & -\mathbf{R}^{b}_{a}\mathbf{P}^{a}_{b} \\ 0 & 1 \end{bmatrix} =\begin{bmatrix} \mathbf{R}^{b}_{a} & \mathbf{P}^{b}_{a} \\ 0 & 1 \end{bmatrix}$.
\begin{equation}
  \begin{aligned}
    -\mathbf{R}^{b}_{a}\mathbf{P}^{a}_{b} = \mathbf{P}^{b}_{a} \quad \Rightarrow \quad \mathbf{P}^{a}_{b} = -\mathbf{R}^{a}_{b}\mathbf{P}^{b}_{a}
  \end{aligned}
\end{equation}
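
Both properties are easy to verify numerically. A short check with Eigen's `Isometry3d`, just confirming the block form of the inverse:

```cpp
#include <Eigen/Geometry>
#include <cassert>

int main() {
  Eigen::Isometry3d T = Eigen::Isometry3d::Identity();
  T.linear() = Eigen::AngleAxisd(0.3, Eigen::Vector3d::UnitZ()).toRotationMatrix();
  T.translation() = Eigen::Vector3d(1.0, 2.0, 3.0);
  // R^b_a = (R^a_b)^T and P^b_a = -R^b_a P^a_b
  const Eigen::Matrix3d R_inv = T.linear().transpose();
  const Eigen::Vector3d p_inv = -R_inv * T.translation();
  assert((T.inverse().linear() - R_inv).norm() < 1e-12);
  assert((T.inverse().translation() - p_inv).norm() < 1e-12);
  return 0;
}
```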

Referring to the fact that the visual residual has two degrees of freedom due to the nature of image pixel coordinates, the author of VINS-mono used a method similar to the gravity vector optimization: $\hat{\bar{\mathbf{P}}}^{c_{j}}_{l}$ is treated as a point on the unit-sphere manifold, a tangent plane spanned by two orthonormal basis vectors $[\mathbf{b}_{1}, \mathbf{b}_{2}]$ is attached at that point, and the optimization minimizes the error components $w_{1},w_{2}$ in that plane.

 

First of all, if the basis vectors are not used, the visual residual is as follows.
\begin{equation}
  \begin{aligned}
    \mathbf{r}_{c}(\hat{\mathbf{z}}^{c_{j}}_{l}, \mathcal{X}) & = \bigg( \frac{\tilde{\mathbf{P}}^{c_{j}}_{l}}{(\tilde{\mathbf{P}}^{c_{j}}_{l})_{z}}  - \hat{\bar{\mathbf{P}}}^{c_{j}}_{l} \bigg)_{xy} \in \mathbb{R}^{2\times 1}
  \end{aligned}
\end{equation}

The visual residual using the basis vectors $[\mathbf{b}_{1}, \mathbf{b}_{2}]$ can be expressed as follows (a sketch for constructing $\mathbf{b}_{1}, \mathbf{b}_{2}$ is given after the list below).
\begin{equation}
  \begin{aligned}
    \mathbf{r}_{c}(\hat{\mathbf{z}}^{c_{j}}_{l}, \mathcal{X}) & = \begin{bmatrix} \mathbf{b}_{1} & \mathbf{b}_{2} \end{bmatrix}^{\intercal} \cdot \bigg( \frac{\tilde{\mathbf{P}}^{c_{j}}_{l}}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|}  - \hat{\bar{\mathbf{P}}}^{c_{j}}_{l} \bigg) \\
  \end{aligned}
\end{equation}
- $\frac{\tilde{\mathbf{P}}^{c_{j}}_{l}}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|}$: normalize the magnitude of a vector to make it a unit vector
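
One way to construct the two orthonormal basis vectors, similar in spirit to the tangent-basis code in VINS-mono (details illustrative): take any vector not parallel to the unit vector $\mathbf{a}$, project it onto the tangent plane, normalize it, and complete the pair with a cross product.

```cpp
#include <Eigen/Dense>
#include <cmath>

// Tangent-plane basis at a unit vector a: b1, b2 with {a, b1, b2} orthonormal.
void tangentBasis(const Eigen::Vector3d& a,
                  Eigen::Vector3d& b1, Eigen::Vector3d& b2) {
  Eigen::Vector3d tmp(0.0, 0.0, 1.0);
  if (std::abs(a.z()) > 0.9) tmp = Eigen::Vector3d(1.0, 0.0, 0.0);
  b1 = (tmp - a * a.dot(tmp)).normalized();  // remove the component along a
  b2 = a.cross(b1);                          // completes the orthonormal set
}
```

The 2-dimensional residual is then $\begin{bmatrix} \mathbf{b}_{1}^{\intercal}\mathbf{e} & \mathbf{b}_{2}^{\intercal}\mathbf{e} \end{bmatrix}^{\intercal}$ with $\mathbf{e}$ the difference between the two unit vectors.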

The variables to be optimized in the visual residual are as follows.
\begin{equation}
  \begin{aligned}
    & \mathbf{x}[0] = [\mathbf{p}^{w}_{b_{i}}, \mathbf{q}^{w}_{b_{i}}] \\
      & \mathbf{x}[1] = [\mathbf{p}^{w}_{b_{j}}, \mathbf{q}^{w}_{b_{j}}] \\
      & \mathbf{x}[2] = [\mathbf{p}^{b}_{c}, \mathbf{q}^{b}_{c}] \\
      & \mathbf{x}[3] = [\lambda_{l}]
  \end{aligned}
\end{equation}

The Jacobian can be calculated using the chain rule as follows.

\begin{equation}
  \begin{aligned}
    \frac{\partial \mathbf{r}_{c}}{\partial \mathbf{x}} = \frac{\partial \mathbf{r}_{c}}{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}} \cdot \frac{\partial \tilde{\mathbf{P}}^{c_{j}}_{l}}{\partial \mathbf{x}}
  \end{aligned}
\end{equation}

Without the basis vectors, the first factor $\frac{\partial \mathbf{r}_{c}}{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}}$ is as follows (written out in code after the list below):
\begin{equation}
  \begin{aligned}
    \frac{\partial \mathbf{r}_{c}}{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}} = \begin{bmatrix} \frac{1}{z^{c_{j}}_{l}} & 0 & -\frac{x^{c_{j}}_{l}}{(z^{c_{j}}_{l})^{2}} \\ 0 & \frac{1}{z^{c_{j}}_{l}} & -\frac{y^{c_{j}}_{l}}{(z^{c_{j}}_{l})^{2}} \end{bmatrix}
  \end{aligned}
\end{equation}
- $\mathbf{r}_{c} = \begin{bmatrix} \frac{x^{c_{j}}_{l}}{z^{c_{j}}_{l}} - \hat{u}^{c_{j}}_{l} & \frac{y^{c_{j}}_{l}}{z^{c_{j}}_{l}} - \hat{v}^{c_{j}}_{l} \end{bmatrix}^{\intercal}$ 
- $\tilde{\mathbf{P}}_{l}^{c_{j}} = \begin{bmatrix} x^{c_{j}}_{l} & y^{c_{j}}_{l} & z^{c_{j}}_{l} \end{bmatrix}^{\intercal}$  
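
Written out in code, this $2\times3$ matrix is a direct transcription:

```cpp
#include <Eigen/Dense>

// 2x3 Jacobian of the pinhole normalization [x/z, y/z] w.r.t. P = (x, y, z).
Eigen::Matrix<double, 2, 3> projectionJacobian(const Eigen::Vector3d& P) {
  const double x = P.x(), y = P.y(), z = P.z();
  Eigen::Matrix<double, 2, 3> J;
  J << 1.0 / z,     0.0, -x / (z * z),
           0.0, 1.0 / z, -y / (z * z);
  return J;
}
```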

On the other hand, when the basis $\mathbf{b}_{1}, \mathbf{b}_{2}$ is applied, $\frac{\partial \mathbf{r}_{c}}{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}} = \begin{bmatrix} \mathbf{b}_{1} & \mathbf{b}_{2} \end{bmatrix}^{\intercal} \frac{\partial}{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}} \Big( \frac{\tilde{\mathbf{P}}^{c_{j}}_{l}}{\| \tilde{\mathbf{P}}^{c_{j}}_{l} \|} \Big)$, since the predicted term $\hat{\bar{\mathbf{P}}}^{c_{j}}_{l}$ does not depend on $\tilde{\mathbf{P}}^{c_{j}}_{l}$. The Jacobian of the normalization term can be obtained as follows.
\begin{equation}
  \begin{aligned}
    \frac{\partial \frac{\tilde{\mathbf{P}}^{c_{j}}_{l}}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|}}{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}}
    & = \frac{1}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|} \mathbf{I} -  \frac{\tilde{\mathbf{P}}^{c_{j}}_{l}  \frac{\partial \left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|}{\partial \tilde{\mathbf{P}}^{c_{j}}_{l}}}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{2}} \\
    & = \begin{bmatrix}
      \frac{1}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|} - \frac{x^{2}}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}} &  -\frac{xy}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}} & -\frac{xz}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}}\\
      -\frac{xy}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}} & \frac{1}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|} - \frac{y^{2}}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}} & -\frac{yz}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}} \\
      - \frac{xz}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}} & -\frac{yz}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}} & \frac{1}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|} - \frac{z^{2}}{\left\| \tilde{\mathbf{P}}^{c_{j}}_{l} \right\|^{3}}
    \end{bmatrix}
  \end{aligned}
\end{equation}

Next, $\frac{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}}{\partial \mathbf{x}}$ can be obtained as follows. For detailed Jacobian derivation, refer to the Appendix.
\begin{equation}
  \begin{aligned}
    \mathbf{J}[0]_{3\times 6} = \frac{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}}{\partial \begin{bmatrix} \mathbf{p}^{w}_{b_{i}} & \mathbf{q}^{w}_{b_{i}} \end{bmatrix}} = \begin{bmatrix} \mathbf{R}^{c}_{b}\mathbf{R}^{b_{j}}_{w} & -\mathbf{R}^{c}_{b}\mathbf{R}^{b_{j}}_{w}\mathbf{R}^{w}_{b_{i}}\bigg(\Big[ \frac{1}{\lambda_{l}}\mathbf{R}^{b}_{c}\hat{\bar{\mathbf{P}}}^{c_{i}}_{l} + \mathbf{P}^{b}_{c} \Big]_{\times} \bigg) \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \mathbf{J}[1]_{3\times 6} = \frac{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}}{\partial \begin{bmatrix} \mathbf{p}^{w}_{b_{j}} & \mathbf{q}^{w}_{b_{j}} \end{bmatrix}} = \begin{bmatrix}
      -\mathbf{R}^{c}_{b} \mathbf{R}^{b_{j}}_{w} & 
      \mathbf{R}^{c}_{b} \bigg[ \mathbf{R}^{b_{j}}_{w} \bigg( \mathbf{R}^{w}_{b_{i}}\bigg( \mathbf{R}^{b}_{c} \frac{1}{\lambda_{l}}\hat{\bar{\mathbf{P}}}^{c_{i}}_{l} + \mathbf{P}^{b}_{c} \bigg) + \mathbf{P}^{w}_{b_{i}}  - \mathbf{P}^{w}_{b_{j}}\bigg) \bigg]_{\times}
    \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \mathbf{J}[2]_{3\times6}  = \frac{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}}{\partial \begin{bmatrix} \mathbf{p}^{b}_{c} & \mathbf{q}^{b}_{c} \end{bmatrix}} = \begin{bmatrix}
      \mathbf{R}^{c}_{b}(\mathbf{R}^{b_{j}}_{w}\mathbf{R}^{w}_{b_{i}}-\mathbf{I}) & \bigg[ \mathbf{R}^{c}_{b}\bigg(\mathbf{R}^{b_{j}}_{w} \bigg( \mathbf{R}^{w}_{b_{i}}\mathbf{P}^{b}_{c} + \mathbf{P}^{w}_{b_{i}} - \mathbf{P}^{w}_{b_{j}} \bigg) -\mathbf{P}^{b}_{c} \bigg) \bigg]_{\times} + \\
      &  \bigg[ \mathbf{R}^{c}_{b}\mathbf{R}^{b_{j}}_{w}\mathbf{R}^{w}_{b_{i}} \mathbf{R}^{b}_{c} \frac{1}{\lambda_{l}} \hat{\bar{\mathbf{P}}}^{c_{i}}_{l} \bigg]_{\times} - \mathbf{R}^{c}_{b}\mathbf{R}^{b_{j}}_{w}\mathbf{R}^{w}_{b_{i}}\mathbf{R}^{b}_{c} \bigg( \frac{1}{\lambda_{l}} [\hat{\bar{\mathbf{P}}}^{c_{i}}_{l}]_{\times} \bigg)
    \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \mathbf{J}[3]_{3\times1} =  \frac{\partial \tilde{\mathbf{P}}_{l}^{c_{j}}}{\partial \lambda_{l}} = \begin{bmatrix} -\frac{1}{\lambda_{l}^{2}} \mathbf{R}^{c}_{b}\mathbf{R}^{b_{j}}_{w}\mathbf{R}^{w}_{b_{i}}\mathbf{R}^{b}_{c} \hat{\bar{\mathbf{P}}}^{c_{i}}_{l} \end{bmatrix}
  \end{aligned}
\end{equation}

5. Marginalization

As time goes by, the number of state variables in VIO grows, and the amount of computation grows with it. In order to reduce the computation without losing information, the information from previous (old) observations must be transferred into a prior term before those states are removed; this process is called marginalization. Let $\mathcal{X}_{m}$ be the states that are removed by marginalization and $\mathcal{X}_{r}$ the states that remain.

In (\ref{eq:9}), the old observations are marked separately as the marginalization part, and then the marginalized states and the remaining states are reordered as shown below.
\begin{equation}
  \begin{aligned}
    \begin{bmatrix} \mathbf{H}_{mm} & \mathbf{H}_{mr} \\ \mathbf{H}_{rm} & \mathbf{H}_{rr} \end{bmatrix}
    \begin{bmatrix} \delta \mathbf{x}_{m} \\ \delta \mathbf{x}_{r} \end{bmatrix} =
    \begin{bmatrix} \mathbf{b}_{m} \\ \mathbf{b}_{r} \end{bmatrix}
  \end{aligned}
\end{equation}

To perform the marginalization, applying the Schur complement to the above system transforms it as follows.
\begin{equation}
  \begin{aligned}
    \begin{bmatrix} \mathbf{I} & 0 \\ -\mathbf{H}_{rm}\mathbf{H}_{mm}^{-1} & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{H}_{mm} & \mathbf{H}_{mr} \\ \mathbf{H}_{rm} & \mathbf{H}_{rr} \end{bmatrix} \begin{bmatrix} \delta \mathbf{x}_{m} \\ \delta \mathbf{x}_{r} \end{bmatrix} =
    \begin{bmatrix} \mathbf{I} & 0 \\ -\mathbf{H}_{rm}\mathbf{H}_{mm}^{-1} & \mathbf{I} \end{bmatrix}
    \begin{bmatrix} \mathbf{b}_{m} \\ \mathbf{b}_{r} \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \begin{bmatrix} \mathbf{H}_{mm} & \mathbf{H}_{mr} \\ 0 & \underbrace{\mathbf{H}_{rr} -\mathbf{H}_{rm}\mathbf{H}_{mm}^{-1}\mathbf{H}_{mr} }_{\mathbf{H}_{p}}  \end{bmatrix}
    \begin{bmatrix} \delta \mathbf{x}_{m} \\ \delta \mathbf{x}_{r} \end{bmatrix} =
    \begin{bmatrix} \mathbf{b}_{m} \\ \underbrace{\mathbf{b}_{r}-\mathbf{H}_{rm}\mathbf{H}_{mm}^{-1}\mathbf{b}_{m} }_{\mathbf{b_{p}}} \end{bmatrix}
  \end{aligned}
\end{equation}

\begin{equation}
  \begin{aligned}
    \therefore  ( \mathbf{H}_{rr}-\mathbf{H}_{rm}\mathbf{H}_{mm}^{-1}\mathbf{H}_{mr} )\delta \mathbf{x}_{r} & = \mathbf{b}_{r}-\mathbf{H}_{rm}\mathbf{H}_{mm}^{-1}\mathbf{b}_{m}  \\
     \mathbf{H}_{p} \delta \mathbf{x}_{r} & = \mathbf{b}_{p}
  \end{aligned}
\end{equation}
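
A compact sketch of this elimination with Eigen (in real marginalization code, e.g. VINS-mono, $\mathbf{H}_{mm}$ is symmetrized and inverted via an eigendecomposition with small-eigenvalue thresholding since it can be near-singular; an LDLT solve is used here for brevity):

```cpp
#include <Eigen/Dense>

// Eliminate the marginalized block (first m rows/cols) from H dx = b and
// return the prior system (H_p, b_p) on the remaining r states.
void schurComplement(const Eigen::MatrixXd& H, const Eigen::VectorXd& b,
                     int m, Eigen::MatrixXd& H_p, Eigen::VectorXd& b_p) {
  const int r = static_cast<int>(H.rows()) - m;
  const Eigen::MatrixXd H_mr = H.topRightCorner(m, r);
  const Eigen::MatrixXd H_rm = H.bottomLeftCorner(r, m);
  const Eigen::MatrixXd H_rr = H.bottomRightCorner(r, r);
  const Eigen::LDLT<Eigen::MatrixXd> ldlt(H.topLeftCorner(m, m));
  H_p = H_rr - H_rm * ldlt.solve(H_mr);            // H_rr - H_rm H_mm^{-1} H_mr
  b_p = b.tail(r) - H_rm * ldlt.solve(b.head(m));  // b_r - H_rm H_mm^{-1} b_m
}
```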

If only instances $[m,n]$ are kept to maintain the number of keyframes in the sliding window, the states before $m$ are marginalized and become a prior term. Therefore, the MAP problem can be written as:
\begin{equation}
  \begin{aligned}
    \mathcal{X}_{m:n}^{*} = \arg\min_{\mathcal{X}_{m:n}} \bigg\{ \sum^{n}_{t=m}\sum_{k \in \mathcal{S}} \left\| \mathbf{z}^{k}_{t} - h^{k}_{t}(\mathcal{X}_{m:n}) \right\|^{2}_{\Omega^{k}_{t}}  + \left\| \mathbf{H}_{p} \delta \mathbf{x}_{r} - \mathbf{b}_{p} \right\|^{2} \bigg\}
  \end{aligned}
\end{equation}
- $\mathcal{S}$: set of observations
- $\mathcal{X}_{m:n}$: state variable between instances $m$ and $n$

 

6. Global Optimization in the VINS-Fusion

In 2019, VINS-fusion was proposed; it uses the framework of VINS-mono as its basis and estimates a global pose. The locally estimated VIO pose is fused with global sensor data such as GPS by pose graph optimization (PGO). When performing PGO, the transformation matrix $\mathbf{T}^{G}_{L}$ between the local coordinate system $\{L\}$ and the global coordinate system $\{G\}$ is estimated, and the local state variables are converted to the global coordinate system. The state variables estimated by PGO are as follows.
\begin{equation}
  \begin{aligned}
    \mathcal{X} = \begin{bmatrix} \mathbf{q}^{G}_{i} & \mathbf{P}^{G}_{i} & \cdots & \mathbf{q}^{G}_{n} & \mathbf{P}^{G}_{n} \end{bmatrix}
  \end{aligned}
\end{equation}

In VINS-fusion, every node holds the robot's pose in the global coordinate system (GPS coordinate system), and the edge between two consecutive nodes represents their relative pose, estimated through VIO. The local pose and global pose can be expressed as follows.
\begin{equation}
  \begin{aligned}
    & \mathbf{T}^{L}_{i} = \begin{bmatrix} \mathbf{R}^{L}_{i} & \mathbf{P}^{L}_{i} \\ 0 &1 \end{bmatrix} \\
    & \mathbf{T}^{G}_{i} = \begin{bmatrix} \mathbf{R}^{G}_{i} & \mathbf{P}^{G}_{i} \\ 0 & 1 \end{bmatrix}
  \end{aligned}
\end{equation}

Here, $i$ denotes the $i$th node, $L$ the local reference frame, and $G$ the global reference frame. Given two consecutive nodes $i,j$, the relative pose between them can be derived in two ways as follows.
\begin{equation}
  \begin{aligned}
    & _{L}\mathbf{T}^{i}_{j} = \begin{bmatrix} \mathbf{R}^{L}_{i} & \mathbf{P}^{L}_{i} \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{R}^{L}_{j} & \mathbf{P}^{L}_{j} \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} _{L}\mathbf{R}^{i}_{j} & _{L}\mathbf{P}^{i}_{ij} \\ 0 &1 \end{bmatrix} \\
    & _{G}\mathbf{T}^{i}_{j} = \begin{bmatrix} \mathbf{R}^{i}_{G} \mathbf{R}^{G}_{j} & \mathbf{R}^{i}_{G}(\mathbf{P}^{G}_{j} - \mathbf{P}^{G}_{i}) \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} _{G}\mathbf{R}^{i}_{j} & _{G}\mathbf{P}^{i}_{ij} \\ 0 & 1 \end{bmatrix}
  \end{aligned}
\end{equation}

At this time, the author of VINS-fusion set the point where the first GPS signal was captured as the origin of the global coordinate system and set the orientation using the ENU (East-North-Up) convention. From the above equations, the residual is (a code sketch follows the list below):
\begin{equation}
  \begin{aligned}
    \begin{bmatrix} \delta \mathbf{P}_{ij} \\ \delta \boldsymbol{\theta}_{ij} \end{bmatrix} = \begin{bmatrix} _{G}\mathbf{P}^{i}_{ij} - \ _{L}\mathbf{P}^{i}_{ij} \\ 2\big[ (_{L}\mathbf{q}^{i}_{j})^{-1} \otimes \ _{G}\mathbf{q}^{i}_{j} \big]_{xyz} \end{bmatrix}
  \end{aligned}
\end{equation}
- $_{L}\mathbf{T}^{i}_{j}$: Relative pose of $i$ node and $j$ node viewed from local coordinate system
- $_{G}\mathbf{T}^{i}_{j}$: Relative pose of $i$ node and $j$ node as seen from the global coordinate system
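
A sketch of this residual in code (illustrative names; quaternions stand in for the rotation blocks above):

```cpp
#include <Eigen/Geometry>

// Compare the VIO (local) relative pose between nodes i and j with the one
// implied by the current global estimates.
Eigen::Matrix<double, 6, 1> relativePoseResidual(
    const Eigen::Quaterniond& q_L_i, const Eigen::Vector3d& p_L_i,  // local
    const Eigen::Quaterniond& q_L_j, const Eigen::Vector3d& p_L_j,
    const Eigen::Quaterniond& q_G_i, const Eigen::Vector3d& p_G_i,  // global
    const Eigen::Quaterniond& q_G_j, const Eigen::Vector3d& p_G_j) {
  // Relative poses T^i_j expressed in each frame
  const Eigen::Quaterniond q_ij_L = q_L_i.inverse() * q_L_j;
  const Eigen::Vector3d    p_ij_L = q_L_i.inverse() * (p_L_j - p_L_i);
  const Eigen::Quaterniond q_ij_G = q_G_i.inverse() * q_G_j;
  const Eigen::Vector3d    p_ij_G = q_G_i.inverse() * (p_G_j - p_G_i);
  Eigen::Matrix<double, 6, 1> r;
  r.head<3>() = p_ij_G - p_ij_L;                          // position residual
  r.tail<3>() = 2.0 * (q_ij_L.inverse() * q_ij_G).vec();  // small-angle part
  return r;
}
```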

In addition, the GPS position constraint gives the following residual.
\begin{equation}
  \begin{aligned}
    \delta \mathbf{P}_{i} = \begin{bmatrix} \hat{\mathbf{P}}^{G}_{i} - \tilde{\mathbf{P}}^{G}_{i} \end{bmatrix}
  \end{aligned}
\end{equation}
- $\hat{\mathbf{P}}^{G}_{i}$: Position estimate of node $i$ in global coordinate system
- $\tilde{\mathbf{P}}^{G}_{i}$: Observed position of node $i$ in global coordinate system

The standard deviation (stdev) of the global position error can be obtained from the GPS positioning algorithm. The standard deviations for the local position and rotation are fixed at 0.1 [m] and 0.01 [rad], respectively, in the VINS-fusion code.

PGO is solved using ceres-solver and runs at a much sparser cycle than VIO. The paper mentions that a drift-free pose is estimated by increasing the window size, but an analysis of the code suggests this is not actually implemented. In PGO, many nodes are updated at once, but the transformation matrix $\mathbf{T}^{G}_{L}$ between the local and global coordinate systems is estimated only for the most recent local frame.

The author of this paper (Yibin Wu) expects that estimating the transformation matrix for every node would be effective in reducing errors. He also points out that time synchronization between the VIO and GPS measurements is not properly implemented. This time synchronization problem can cause serious pose errors, and since VIO and GPS operate in a loosely-coupled manner, the GPS measurements have the limitation that they cannot correct the IMU sensor states.

7. References

[1] Qin, Tong, Peiliang Li, and Shaojie Shen. "Vins-mono: A robust and versatile monocular visual-inertial state estimator." IEEE Transactions on Robotics 34.4 (2018): 1004-1020. 

[2] Wu, Yibin. "Formula Derivation and Analysis of the VINS-Mono." arXiv preprint arXiv:1912.11986 (2019). 

[3] Sola, Joan. "Quaternion kinematics for the error-state Kalman filter." arXiv preprint arXiv:1711.02508 (2017). 

 

8. Closure

The PDF version will be available soon.