칼만 필터(Kalman Filter) 개념 정리 (+ KF, EKF, ESKF, IEKF, IESKF) Part 1

본 포스트는 필자가 Kalman filter 공부하면서 배운 내용을 정리한 포스트이다.

다음 포스팅 칼만 필터(Kalman filter) 개념 정리 Part 2에서 더 많은 내용을 확인할 수 있다.
Kalman filter를 이해하는데 필요한 기반 지식은 확률 이론(Probability Theory) 개념 정리 포스팅을 참조하면 된다.
Particle filter에 대해 알고 싶으면 파티클 필터(Particle Filter) 개념 정리 포스팅을 참조하면 된다.

arXiv preprint

English version

Notes on Kalman Filter (KF, EKF, ESKF, IEKF, IESKF)

The Kalman Filter (KF) is a powerful mathematical tool widely used for state estimation in various domains, including Simultaneous Localization and Mapping (SLAM). This paper presents an in-depth introduction to the Kalman Filter and explores its several e

arxiv.org

Preliminaries

칼만 필터(Kalman filter)는 시간에 따라 변하는 시스템의 상태를 추정하는 방법 중 하나로써 시스템 모델의 예측값과 노이즈가 포함된 관측값을 바탕으로 현재 상태를 재귀적으로 예측하고 업데이트하는 알고리즘을 말한다.

해당 섹션에서는 칼만 필터를 본격적으로 설명하기 앞서 기반이 되는 지식들을 간단히 소개한다.

Estimation theory

추정 이론(estimation theory)은 관측된 데이터를 바탕으로 모델의 파라미터나 상태를 예측하는 다양한 방법을 정리한 이론이다. 데이터 분석, 신호처리, 기계학습, 금융, 로봇공학 등 다양한 분야에서 널리 쓰이고 있으며 주로 불확실성을 다루는 과정에서 정확한 결정을 내리기 위한 필수적인 도구로 사용되고 있다. 보다 자세한 내용은 해당 포스팅을 참조하면 된다.

예를 들어 위 그림과 같이 평균 몸무게가 70kg인 사람을 1년 동안 측정한 데이터가 주어졌다고 가정하자. 그래프를 이루고 있는 초록색 데이터들은 $x[n]$이며 데이터들의 중심값은 추정하고자 하는 파라미터 $\theta$가 된다. 그리고 $\theta$로부터 위 아래로 벌어진 정도(=분산)는 $w[n]$ 을 통해 수학적으로 표현할 수 있다.

위와 같이 관측된 데이터 $x[n]$을 바탕으로 파라미터 $\theta$를 추정하는 문제는 다음과 같은 간단한 수학적 모델로 표현할 수 있다.
\begin{equation}
\begin{aligned}
x[n] = \theta + w[n] \quad n=0,1,\cdots,N-1
\end{aligned}
\end{equation}
- $x[n]$: 관측된 데이터
- $\theta$: 추정하고자 하는 파라미터
- $w[n]$: 랜덤 노이즈

Bayesian philosophy

추정 이론에는 추정해야 하는 파라미터 $\theta$를 보는 관점에 따라 크게 빈도주의와 베이지안 관점으로 나뉜다.

Frequentist: 추정해야 하는 파라미터 $\theta$가 미지의 결정론적(deterministic) 파라미터로 보는 빈도주의적 관점
Bayesian: 추정해야 하는 파라미터 $\theta$가 사전 확률분포(prior)를 가지는 확률 변수(random variable, r.v.)로 간주하는 관점

\begin{equation}
\begin{aligned}
& \text{Frequentist: } \quad \underbrace{x[n]}_{\text{r.v.}} = \underbrace{\theta}_{\text{deterministic}}+ w[n] \\
& \text{Bayesian: } \quad \underbrace{x[n]}_{\text{r.v.}} = \underbrace{\theta}_{\text{r.v.}} + w[n] \\
\end{aligned}
\end{equation}

베이지안 철학은 만약 우리가 파라미터 $\theta$에 대한 사전(prior) 정보를 알고 있다면 이는 더 나은 추정에 활용될 수 있다는 모티브에서 출발하였으며 이를 위해서는 $\theta$에 대한 prior pdf가 미리 주어져 있거나 계산할 수 있어야 한다는 특징이 존재한다.

베이지안 철학 관점에서는 파라미터 $\theta$ 또한 확률변수로 모델링할 수 있으므로 다음과 같은 베이지안 룰이 성립한다.
\begin{equation}
\begin{aligned}
p(\theta|x) &  = \frac{ p(x|\theta) p(\theta) }{p(x)}
\end{aligned}
\end{equation}
- $p(\theta|x)$: 관측된 데이터 $x$가 주어졌을 때 파라미터 $\theta$의 사후 조건부 확률 분포(posterior)
- $p(x|\theta)$:  $\theta$가 주어졌을 때 $x$의 조건부 확률 분포, 또는 가능도(likelihood)

- $p(\theta)$: $\theta$ 사전 확률 분포(prior)
- $p(x)$: $x$의 확률 분포. 추정하고자 하는 파라미터 $\theta$와 무관하므로 일반적으로 고려되지 않는다. 분모에 곱해져서 전체 확률의 크기를 1로 정규화해주는 값으로 $\eta$로 치환되기도 한다.

Estimation problem

추정 문제는 일반적으로 다음과 같이 도식화하여 나타낼 수 있다.

추정 문제의 목적은 주어진 데이터 $(t_i, \mathbf{x}_i)$를 사용하여 최적의 모델 $\boldsymbol{\theta}(t)$를 찾는 것이다. 이 때, 어느 시점을 추정하느냐에 따라 다른 추정 문제가 된다.

Filtering : 필터링은 관측 데이터 $\{x[0],x[1],\cdots,x[N-1]\}$가 주어졌을 때 $\theta=x[N-1]$을 추정하는 문제를 말한다. 최적의 파라미터를 추정함으로써 우리는 신호에서 노이즈를 필터링하고자 한다. 필터링에서는 파라미터가 현재와 과거 데이터에만 의존하는 것에 유의하자.}
Smoothing: 스무딩은 관측 데이터 $\{x[0],x[1],\cdots,x[N-1]\}$이 주어졌을 때 중간에 있는 $\theta=s[n]$을 추정하는 경우를 말한다. 예를 들어 $s[1]$을 추정하기 위해 모든 관측 데이터가 사용된다. 당연하게도 스무딩은 모든 데이터가 관측되기 전에는 수행할 수 없다.
Prediction: 예측은 관측 데이터 $\{x[0],x[1],\cdots,x[N-1]\}$가 주어졌을 때 $\theta = x[N-1+l]$을 추정하는 경우를 말한다. 이 때, $l$은 임의의 양수이다.
Interpolation: 보간은 관측 데이터 $\{x[0], \cdots ,x[n-1], x[n+1,\cdots,x[N-1]\}$이 주어졌을 떄 $\theta=x[n]$을 추정하는 경우를 말한다.

Dynamic system

시간에 따라 상태 변수가 변하는 시스템은 다음과 같이 모델링할 수 있다.
\begin{equation}
\begin{aligned}
& \text{Motion Model: } & \mathbf{x}_{t} = \mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t} ) + \mathbf{w}_{t} \\
& \text{Observation Model: } & \mathbf{z}_{t} = \mathbf{h}(\mathbf{x}_{t}) + \mathbf{v}_{t}
\end{aligned}
\end{equation}
- $\mathbf{x}_{t}$: $t$ 시점에서 모델의 상태 변수
- $\mathbf{u}_{t}$: $t$ 시점에서 모델의 제어 입력
- $\mathbf{z}_{t}$: $t$ 시점에서 관측값
- $\mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t} )$: 이전 상태 $\mathbf{x}_{t-1}$와 현재 제어 입력 $\mathbf{u}_{t}$으로부터 현재 상태 $\mathbf{x}_t$를 예측하는 모션 모델 함수
- $\mathbf{h}(\mathbf{x}_{t})$: 현재 상태 변수 $\mathbf{x}_{t}$를 관측값 $\mathbf{z}_{t}$으로 변환해주는 관측 모델 함수
- $\mathbf{w}_{t}$: $t$ 시점에서 모션 모델의 노이즈
- $\mathbf{v}_{t}$: $t$ 시점에서 관측 모델의 노이즈

위 표기법을 사용하여 앞서 추정 문제를 다시 그려보면 다음과 같다.

위와 같은 동적 시스템을 그래프로 그려보면 다음과 같다.

동적 시스템은 일반적으로 상태 예측(prediction) 단계와 업데이트(update, 또는 correction) 단계로 나뉜다. 예측 단계는 다음과 같이 현재까지 측정한 모든 관측값 $\mathbf{z}_{0:t}$과 현재 상태 $\mathbf{x}_{t}$를 통해 다음 상태 $\mathbf{x}_{t+1}$를 예측하는 것을 말한다.

업데이트 단계는 다음과 같이 다음 상태 $\mathbf{x}_{t+1}$로부터 새로운 관측값 $\mathbf{z}_{t+1}$을 얻는 과정을 말한다.

Recursive bayes filter

제어입력과 관측값이 주어졌을 때 현재 상태 $\mathbf{x}_{t}$의 믿을만한 정도, 또는 $\mathbf{x}_{t}$에 대한 Belief $\text{bel}(\mathbf{x}_{t})$은 같이 정의한다.
\begin{equation}
  \begin{aligned}
    \text{bel}(\mathbf{x}_{t}) = p(\mathbf{x}_{t} \ | \ \mathbf{z}_{1:t}, \mathbf{u}_{1:t})
  \end{aligned}
\end{equation}
- $\mathbf{x}_{t}$ : t 시간에서 상태 변수
- $\mathbf{z}_{1:t} = \{\mathbf{z}_{1}, \cdots, \mathbf{z}_{t} \}$ : 1~t 시간에서 관측값
- $\mathbf{u}_{1:t} = \{\mathbf{u}_{1}, \cdots, \mathbf{u}_{t} \}$ : 1~t 시간에서 제어입력
- $\text{bel}(\mathbf{x}_{t})$ : Belief of $\mathbf{x}_{t}$라고 불리며 시작 시간부터 $t$초까지 센서를 통한 관측 $ \mathbf{z}_{1:t}$과 제어입력 $\mathbf{u}_{1:t}$으로 인해 현재 로봇이 $\mathbf{x}_{t}$에 위치할 확률(믿을만한 정도)을 의미

이전 섹션에서 언급한 그래프로 설명하면 다음 부분이 $\text{bel}(\mathbf{x}_{t})$에 해당한다.

$\text{bel}(\cdot)$은 bayesian rule에 따라 표현되고 전개되기 때문에 bayes filter라고도 불린다. 위 식을 markov assumption과 bayesian rule을 사용하여 전개하면 재귀적인 필터를 유도할 수 있고 이를 recursive bayes filter라고 한다.
\begin{equation}
  \begin{aligned}
    & \text{bel}(\mathbf{x}_{t}) = \eta \cdot p( \mathbf{z}_{t} \ | \  \mathbf{x}_{t})\overline{\text{bel}}(\mathbf{x}_{t}) \\
    & \overline{\text{bel}}(\mathbf{x}_{t}) = \int p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1}, \mathbf{u}_{t})\text{bel}(\mathbf{x}_{t-1}) d \mathbf{x}_{t-1}\\
  \end{aligned}
\end{equation}
- $\eta = 1/p(\mathbf{z}_{t}|\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})$ : 확률분포의 넓이를 1로 정규화하여 확률분포의 정의를 유지시켜주는 값
- $p( \mathbf{z}_{t} \ | \ \mathbf{x}_{t})$ : 관측 모델(observation model)
- $\int p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1},\mathbf{u}_{t}) d \mathbf{x}_{t-1}$ : 모션 모델(motion model)

Recursive bayes filter는 위와 같이 이전 스텝의 $\text{bel}(\mathbf{x}_{t-1})$로부터 현재 스텝의 $\text{bel}(\mathbf{x}_{t})$를 구할 수 있으므로 재귀 필터라고 부른다.

Derivation of recursive bayes filter

Recursive bayes filter의 수식은 다음과 같이 유도된다.
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p( \mathbf{x}_{t}  | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1}\\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t-1}) d\mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \text{bel}(\mathbf{x}_{t-1}) d \mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \overline{\text{bel}}(\mathbf{x}_{t}) \\
\end{aligned}
\end{equation}

Step 1:
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p({\color{Cyan} \mathbf{x}_{t} } | {\color{Cyan} \mathbf{z}_{1:t}}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p({\color{Cyan} \mathbf{z}_{t}} | {\color{Cyan} \mathbf{x}_{t}}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p( {\color{Cyan} \mathbf{x}_{t}} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})
\end{aligned}
\end{equation}
Bayesian rule을 적용한다. $p(x|y) = \frac{p(y|x)p(x)}{p(y)}$

Step 2:
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p( \mathbf{x}_{t}  | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  {\color{Cyan} \mathbf{x}_{t}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t} } ) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  {\color{Cyan} \mathbf{x}_{t} } ) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
\end{aligned}
\end{equation}

현재 상태는 바로 이전 상태에만 의존성을 지니는 Markov Assumption을 적용한다.

Step 3:
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p( \mathbf{x}_{t}  | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot {\color{Cyan} p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) } \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot {\color{Cyan}  \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1} } \\
\end{aligned}
\end{equation}

총합의 법칙(Law of total probability) 또는 Marginalization를 적용한다. $p(x) = \int_{y} p(x|y) \cdot p(y) dy$

Step 4:
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p( \mathbf{x}_{t}  | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},{\color{Cyan} \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t} } ) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1}\\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1}, {\color{Cyan} \mathbf{u}_{t} } ) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1} \\
\end{aligned}
\end{equation}
현재 상태는 바로 이전 상태에만 의존성을 지니는 Markov Assumption을 적용한다.

Step 5:
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p( \mathbf{x}_{t}  | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1}\\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, {\color{Cyan}  \mathbf{u}_{1:t} } ) d\mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, {\color{Cyan} \mathbf{u}_{1:t-1} } ) d\mathbf{x}_{t-1} \\
\end{aligned}
\end{equation}
$t$ 시점에서 제어 입력 $\mathbf{u}_t$는 $t-1$ 시점에서 상태 변수 $\mathbf{x}_{t-1}$에 영향을 주지 않으므로 생략한다.

Step 6:
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p( \mathbf{x}_{t}  | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1}\\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot {\color{Cyan} p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t-1})} d\mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot {\color{Cyan} \text{bel}(\mathbf{x}_{t-1}) } d\mathbf{x}_{t-1} \\
\end{aligned}
\end{equation}

$p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t-1})$는 $\text{bel}(\mathbf{x}_{t-1})$의 정의와 동일하므로 치환한다.

Step 7:
\begin{equation}
\begin{aligned}
\text{bel}(\mathbf{x}_{t}) & = p( \mathbf{x}_{t}  | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot p( \mathbf{x}_{t} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1}\\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) d\mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t-1}) d\mathbf{x}_{t-1} \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot {\color{Cyan} \int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot \text{bel}(\mathbf{x}_{t-1}) d \mathbf{x}_{t-1} } \\
& = \eta \cdot p(\mathbf{z}_{t} |  \mathbf{x}_{t}) \cdot  {\color{Cyan}  \overline{\text{bel}}(\mathbf{x}_{t}) }\\
\end{aligned}
\end{equation}

$\int_{\mathbf{x}_{t-1}} p( \mathbf{x}_{t} | \mathbf{x}_{t-1},\mathbf{u}_{t}) \cdot \text{bel}(\mathbf{x}_{t-1}) d \mathbf{x}_{t-1}$를 $\overline{\text{bel}}(\mathbf{x}_{t})$로 치환한다.

Gaussian belief case

$\text{bel}(\mathbf{x}_{t})$가 가우시안 분포를 따르는 경우 이를 특별히 칼만 필터(kalman filter)라고 한다.
\begin{equation}
  \begin{aligned}
     & \overline{\text{bel}}(\mathbf{x}_{t}) \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t-1}, \mathbf{P}_{t|t-1}) \quad \text{(Kalman Filter Prediction)} \\
   & \text{bel}(\mathbf{x}_{t}) \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t}) \quad \text{(Kalman Filter Correction)}
  \end{aligned}
\end{equation}

평균과 분산은 $(\hat{\mathbf{x}}$, $\mathbf{P})$ 또는 $(\hat{\boldsymbol{\mu}}, \boldsymbol{\Sigma})$로 표현하기도 한다. 이는 표기법만 다를 뿐 동일한 의미를 지닌다.

Kalman filter (KF)

NOMENCLATURE of kalman filter

스칼라는 일반 소문자로 표기한다 e.g., a
벡터는 굵은 소문자로 표기한다 e.g., $\mathbf{a}$
행렬은 굵은(bold) 대문자로 표기한다 e.g., $\mathbf{R}$
prediction: $\overline{\text{bel}}(\mathbf{x}_{t}) \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t-1}, {\mathbf{P}}_{t|t-1})$
- $\hat{\mathbf{x}}_{t|t-1}$ : $t-1$ 스텝의 correction 값이 주어졌을 때 $t$ 스텝의 평균. 일부 문헌은 $\mathbf{x}^{-}_{t}$로도 표기함.
- $\hat{\mathbf{P}}_{t|t-1}$ : $t-1$ 스텝의 correction 값이 주어졌을 때 $t$ 스텝의 공분산. 일부 문헌은 $\mathbf{P}^{-}_{t}$로도 표기함.
correction: $\text{bel}(\mathbf{x}_{t}) \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t})$
- $\hat{\mathbf{x}}_{t|t}$ : $t$ 스텝의 prediction 값이 주어졌을 때 $t$ 스텝의 평균. 일부 문헌은 $\mathbf{x}^{+}_{t}$로도 표기함.
- $\hat{\mathbf{P}}_{t|t}$ : $t$ 스텝의 prediction 값이 주어졌을 때 $t$ 스텝의 공분산. 일부 문헌은 $\mathbf{P}^{+}_{t}$로도 표기함.

시간 $t$에 로봇의 위치를 $\mathbf{x}_{t}$, 로봇의 센서로 부터 관측한 값을 $\mathbf{z}_{t}$, 로봇의 제어입력을 $\mathbf{u}_{t}$라고 하면 이를 통해 모션 모델(motion model)과 관측 모델(observation model)을 정의할 수 있다. 이 때, 모션 모델과 관측 모델은 선형이어야(linear model) 한다는 제약조건이 있다. 모션 모델과 관측 모델은 다음과 같다.
\begin{equation}
  \begin{aligned}
    & \text{Motion Model: } & \mathbf{x}_{t} = \mathbf{F}_{t}\mathbf{x}_{t-1} + \mathbf{B}_{t}\mathbf{u}_{t} + \mathbf{w}_{t} \\
    & \text{Observation Model: } & \mathbf{z}_{t} = \mathbf{H}_{t}\mathbf{x}_{t} + \mathbf{v}_{t}
  \end{aligned}
\end{equation}
- $\mathbf{x}_{t}$: 모델의 상태 변수(state variable)
- $\mathbf{u}_{t}$: 모델의 입력(input)
- $\mathbf{z}_{t}$: 모델의 관측값(measurement)
- $\mathbf{F}_{t}$: 모델의 상태천이(state transition) 행렬
- $\mathbf{B}_{t}$: 모델에 입력 $\mathbf{u}_{t}$가 주어졌을 때 $\mathbf{u}_{t}$를 상태 변수로 변환해주는 행렬
- $\mathbf{H}_{t}$: 모델의 관측(observation) 행렬
- $\mathbf{w}_{t} \sim \mathcal{N}(0, \mathbf{Q}_{t})$: 모션 모델의 노이즈. $\mathbf{Q}_{t}$는 $\mathbf{w}_{t}$의 공분산 행렬을 의미한다.
- $\mathbf{v}_{t} \sim \mathcal{N}(0, \mathbf{R}_{t})$: 관측 모델의 노이즈. $\mathbf{R}_{t}$는 $\mathbf{v}_{t}$의 공분산 행렬을 의미한다.

확률변수가 모두 가우시안 분포를 따른다고 가정하면 $p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1}, \mathbf{u}_{t}), p(\mathbf{z}_{t} \ | \ \mathbf{x}_{t})$는 다음과 같이 나타낼 수 있다.
\begin{equation} \label{eq:kf1}
  \begin{aligned}
     & p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1}, \mathbf{u}_{t}) && \sim \mathcal{N}(\mathbf{F}_{t}\mathbf{x}_{t-1} + \mathbf{B}_{t}\mathbf{u}_{t}, \mathbf{Q}_{t}) \\
     & && = \frac{1}{ \sqrt{\text{det}(2\pi \mathbf{Q}_t)}}\exp\bigg( -\frac{1}{2}(\mathbf{x}_{t} -\mathbf{F}_{t}\mathbf{x}_{t-1}-\mathbf{B}_{t}\mathbf{u}_t)^{\intercal}\mathbf{Q}_t^{-1}(\mathbf{x}_{t}-\mathbf{F}_{t}\mathbf{x}_{t-1}-\mathbf{B}_{t}\mathbf{u}_{t}) \bigg)\\
  \end{aligned}
\end{equation}
\begin{equation} \label{eq:kf2}
  \begin{aligned}
    & p(\mathbf{z}_{t} \ | \ \mathbf{x}_{t})  && \sim \mathcal{N}(\mathbf{H}_{t}\mathbf{x}_{t}, \mathbf{R}_{t}) \\
    & && = \frac{1}{ \sqrt{\text{det}(2\pi \mathbf{R}_t)}}\exp\bigg( -\frac{1}{2}(\mathbf{z}_{t}-\mathbf{H}_{t}\mathbf{x}_{t})^{\intercal}\mathbf{R}_{t}^{-1}(\mathbf{z}_{t}-\mathbf{H}_{t}\mathbf{x}_{t}) \bigg)
  \end{aligned}
\end{equation}

다음으로 칼만 필터를 통해 구해야 하는 $\overline{\text{bel}}(\mathbf{x}_{t}), \text{bel}(\mathbf{x}_{t})$은 아래와 같이 나타낼 수 있다.
\begin{equation} \label{eq:kf3}
    \begin{aligned}
        & \overline{\text{bel}}(\mathbf{x}_{t}) = \int p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1}, \mathbf{u}_{t})\text{bel}(\mathbf{x}_{t-1}) d \mathbf{x}_{t-1} \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t-1}, \mathbf{P}_{t|t-1}) \\
        & \text{bel}(\mathbf{x}_{t}) = \eta \cdot p( \mathbf{z}_{t} \ | \  \mathbf{x}_{t})\overline{\text{bel}}(\mathbf{x}_{t}) \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t})
    \end{aligned}
\end{equation}

칼만 필터는 prediction 스텝에서 이전 스텝의 posterior 값과 모션 모델을 사용하여 예측값 $\overline{\text{bel}}(\mathbf{x}_{t})$을 먼저 구한 후 correction 스텝에서 예측값과 관측 모델을 사용하여 보정된 값 $\text{bel}(\mathbf{x}_{t})$을 구하는 방식으로 동작한다. (\ref{eq:kf1}), (\ref{eq:kf2})를 (\ref{eq:kf3})에 대입하여 정리하면 prediction, correction 스텝의 평균과 공분산 $(\hat{\mathbf{x}}_{t|t-1}, \mathbf{P}_{t|t-1}), (\hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t})$을 각각 구할 수 있다. 자세한 유도 과정은 "Derivation of Kalman filter" 섹션을 참조하면 된다

초기값 $\text{bel}(\mathbf{x}_{0})$은 다음과 같이 주어진다.
\begin{equation}
  \begin{aligned}
    \text{bel}(\mathbf{x}_{0}) \sim \mathcal{N}(\hat{\mathbf{x}}_{0}, \mathbf{P}_{0})
  \end{aligned}
\end{equation}
- $\hat{\mathbf{x}}_{0}$: 일반적으로 0으로 설정한다
- $\mathbf{P}_{0}$ : 일반적으로 작은 값(<1e-2)으로 설정한다.

Prediction step

Prediction은 $\overline{\text{bel}}(\mathbf{x}_{t})$를 구하는 과정을 말한다. $\overline{\text{bel}}(\mathbf{x}_{t})$는 가우시안 분포를 따르므로 평균 $\hat{\mathbf{x}}_{t|t-1}$과 분산 $\mathbf{P}_{t|t-1}$을 각각 구해보면 아래와 같이 구할 수 있다.
\begin{equation}
  \boxed{ \begin{aligned}
    & \hat{\mathbf{x}}_{t|t-1} = \mathbf{F}_{t}\hat{\mathbf{x}}_{t-1|t-1} + \mathbf{B}_{t}\mathbf{u}_{t} \\
    & \mathbf{P}_{t|t-1} = \mathbf{F}_{t}\mathbf{P}_{t-1|t-1}\mathbf{F}_{t}^{\intercal} + \mathbf{Q}_{t}
  \end{aligned} }
\end{equation}

Correction step

Correction은 $\text{bel}(\mathbf{x}_{t})$를 구하는 과정을 말한다. $\text{bel}(\mathbf{x}_{t})$ 또한 가우시안 분포를 따르므로 평균 $\hat{\mathbf{x}}_{t|t}$과 분산 $\mathbf{P}_{t|t}$을 각각 구해보면 다음과 같다. 이 때, $\mathbf{K}_{t}$는 칼만 게인(kalman gain)을 의미한다.
\begin{equation}
\boxed{ \begin{aligned}
  &     \mathbf{K}_{t} = \mathbf{P}_{t|t-1}\mathbf{H}_{t}^{\intercal}(\mathbf{H}_{t}\mathbf{P}_{t|t-1}\mathbf{H}_{t}^{\intercal} + \mathbf{R}_{t})^{-1} \\
    & \hat{\mathbf{x}}_{t|t}  = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_{t}( \mathbf{z}_{t} - \mathbf{H}_{t}\hat{\mathbf{x}}_{t|t-1}) \\
    &     \mathbf{P}_{t|t}    = (\mathbf{I} - \mathbf{K}_{t}\mathbf{H}_{t})\mathbf{P}_{t|t-1}
  \end{aligned} }
\end{equation}

1D Kalman filter

지금까지 설명한 칼만 필터는 상태 변수가 벡터인(=$\mathbf{x}_t$) 경우에 대한 내용이었다. 다음으로 상태 변수가 스칼라인(=$x_t$) 1D 칼만 필터를 살펴보자. 1D 칼만 필터는 기존 nD 칼만 필터와 모든 내용이 동일하지만 행렬이 아닌 스칼라 값을 통해 식이 구성된다는 점이 다르다. 1D 칼만 필터 버전으로 나타낸 Belief는 다음과 같다.
\begin{equation}
\begin{aligned}
& \overline{\text{bel}}(x_{t}) \sim \mathcal{N}(\bar{\mu}_t, \bar{\sigma}^2_t) \\
& \text{bel}(x_{t}) \sim \mathcal{N}(\mu_{t}, \sigma^2_{t})
\end{aligned}
\end{equation}
- $\bar{\mu}_t$:  prediction 스텝의 평균
- $\bar{\sigma}^2_t$:  prediction 스텝의 분산
- ${\mu}_t$:  correction 스텝의 평균
- ${\sigma}^2_t$:  correction 스텝의 분산

모션 모델과 관측 모델은 다음과 같다.
\begin{equation}
\begin{aligned}
& \text{Motion Model: } & x_{t} =  x_{t-1} +  u_{t} + \sigma^2_{\text{motion},t} \\
& \text{Observation Model: } & z_{t} =  x_{t} + \sigma^2_{\text{obs},t}
\end{aligned}
\end{equation}
- $u_{t}$:  모션 모델의 제어 입력
- $\sigma^2_{\text{motion},t}$: 모션 모델의 노이즈
- $\sigma^2_{\text{obs},t}$: 관측 모델의 노이즈

1D 칼만 필터의 precition 스텝은 다음과 같다.
\begin{equation}
\boxed{ \begin{aligned}
& \bar{\mu}_{t} = \mu_{t-1} + u_{t}  \\
& \bar{\sigma}^2_{t} = \sigma^2_{t-1} + \sigma_{\text{motion},t}^2
\end{aligned} }
\end{equation}

다음으로 1D 칼만 필터의 correction 스텝은 다음과 같다.
\begin{equation}
\boxed{ \begin{aligned}
&     K_{t} = \frac{\bar{\sigma}^2_{t}}{\bar{\sigma}^2_{t} + {\sigma}^2_{\text{obs},t}} \\
& \mu_{t}  = \bar{\mu}_t + K_t(\mu_{\text{obs,t}} - \bar{\mu}_t) = \frac{\mu_{\text{obs,t}}\bar{\sigma}^2_{t} + \bar{\mu}_t \sigma^2_{\text{obs,t}}}{\bar{\sigma}^2_{t} + {\sigma}^2_{\text{obs},t}}\\
&     \sigma^2_{t}    = (1-K_t) \bar{\sigma}^2_{t} = \frac{\bar{\sigma}^2_{t}{\sigma}^2_{\text{obs},t}}{\bar{\sigma}^2_{t} + {\sigma}^2_{\text{obs},t}}
\end{aligned} }
\end{equation}
- $\mu_{\text{obs,t}}$: 관측 모델의 평균
- ${\sigma}^2_{\text{obs},t}$: 관측 모델의 노이즈

식을 자세히 살펴보면 앞서 언급했던 벡터 버전 칼만 필터와 구조가 동일한 것을 확인할 수 있다. $\mathbf{F}_t, \mathbf{B}_t,  \mathbf{H}_t$는 $1$이 되어 생략되고 $\mathbf{Q}_t \leftrightarrow \sigma_{\text{motion},t}$와 대응하며 $\mathbf{R}_t \leftrightarrow \sigma_{\text{obs},t}$와 대응하는 것을 알 수 있다. 보다 쉽게 한눈에 비교하기 위해 그림을 첨부하였다.

Discussion

Discussion about KF and posterior pdf

다음과 같이 간단한 1차원에서 로봇의 위치 $x_t$를 KF로 추정하는 문제가 주어졌다고 하자. 세로축은 확률밀도함수(probability density function, pdf), 가로축은 1차원 위치 $x$라고 하고 $t$ 시점의 로봇의 위치를 상태 변수 $x_t$라고 하자.

위 그림을 단계 별로 자세히 살펴보면 다음과 같다.

로봇은 이전 스텝의 위치 $x_{t-1}$로부터 모션 모델에 의해 prediction을 수행하여 prior $\overline{\text{bel}}(x_t)$를 예측한다.
다음으로 로봇의 센서로부터 현재의 위치를 측정하여 likelihood $p(z_t | x_t)$를 얻는다.
prior $\overline{\text{bel}}(x_t)$와 likelihood $p(z_t | x_t)$로부터 correction 스텝을 수행하여 posterior $\text{bel}(x_t)$를 얻는다.
1$\sim$3 과정을 다음 스텝에 대해 반복하여 $x_{t+1}$에 대한 posterior $\text{bel}(x_{t+1})$를 얻는다.

따라서 KF의 한 스텝은 이전 상태 변수로부터 prior를 예측(=prediction)하고 관측값이 주어지면 이를 통해 likelihood를 구한 후 bayesian rule에 따라 posterior pdf를 구하는(=correction) 전형적인 Bayesian 필터의 특성을 지님을 알 수 있다. Bayesian 필터링을 가우시안 분포를 따르는 Belief에 대하여 재귀적으로 수행하는 것이 칼만 필터이다.

Discussion about Kalman gain

Correction 스텝을 자세히 살펴 보면  평균 $\hat{\mathbf{x}}_{t|t}$을 구할 때 칼만 게인 $\mathbf{K}_t$에 따라 가중치가 다르게 더해진다.
\begin{equation}
\begin{aligned}
\hat{\mathbf{x}}_{t|t}  = \underbrace{\hat{\mathbf{x}}_{t|t-1}}_{\text{prediction}} + \mathbf{K}_{t} \underbrace{ ( \mathbf{z}_{t} - \mathbf{H}_{t}\hat{\mathbf{x}}_{t|t-1}) }_{\text{innovation}}
\end{aligned}
\end{equation}
- innovation: 관측값($\mathbf{z}_t$)와 예측값($\mathbf{H}_t \hat{\mathbf{x}}_{t|t-1}$)의 차이, 즉 에러(error)값

$\mathbf{K}_t \rightarrow \mathbf{0}$이면 센서의 관측값 $\mathbf{z}_t$를 전혀 신뢰하지 않고 오직 시스템의 예측값만을 반영하겠다는 것을 의미한다(관측값 $\mathbf{z}_t$ 반영 안함). 반면에 $\mathbf{K}_t \rightarrow \mathbf{1}$이면 센서의 관측값 $\mathbf{z}_t$를 100% 신뢰하여 센서 관측값만을 반영하겠다는 것을 의미한다(예측값 $\hat{\mathbf{x}}_{t|t-1}$ 반영 안함).

또한, 센서 노이즈 ${\color{Red}\mathbf{R}}$이 큰 경우 $\mathbf{K}_t$는 감소하게 되고 이는 센서의 관측값보다 시스템 모델의 예측값을 더 반영하겠다는 것을 의미한다. 반면에 시스템 노이즈 $\color{Cyan} \mathbf{Q}$가 큰 경우 $\mathbf{P}_{t|t-1}$이 증가하여 $\mathbf{K}_t$ 또한 증가하게 되고 이는 곧 시스템 모델의 예측값보다 관측값을 더 반영하겠다는 것을 의미한다.
\begin{equation}
\begin{aligned}
& \mathbf{P}_{t|t-1} = \mathbf{F}_{t}\mathbf{P}_{t-1|t-1}\mathbf{F}_{t}^{\intercal} + {\color{Cyan} \mathbf{Q}_{t}} \\
& \mathbf{K}_{t} = \mathbf{P}_{t|t-1}\mathbf{H}_{t}^{\intercal}(\mathbf{H}_{t}\mathbf{P}_{t|t-1}\mathbf{H}_{t}^{\intercal} + {\color{Red} \mathbf{R}_{t}})^{-1}  \\
\end{aligned}
\end{equation}
- ${\color{Red}\mathbf{R}_{t}}(\uparrow) \Rightarrow (\ast + {\color{Red}\mathbf{R}_{t}})^{-1}(\downarrow)  \Rightarrow \mathbf{K}_{t}(\downarrow)$
- ${\color{Cyan}\mathbf{Q}_{t}}(\uparrow) \Rightarrow \mathbf{P}_{t|t-1}(\uparrow)  \Rightarrow \mathbf{K}_{t}(\uparrow)$

Summary

칼만 필터를 함수로 표현하면 다음과 같다.
\begin{equation}
\boxed{ \begin{aligned}
& \text{KalmanFilter}(\hat{\mathbf{x}}_{t-1|t-1}, \mathbf{P}_{t-1|t-1}, \mathbf{u}_{t}, \mathbf{z}_{t}) \{ \\
& \quad\quad \text{(Prediction Step)}\\
& \quad\quad \hat{\mathbf{x}}_{t|t-1} = \mathbf{F}_{t}\hat{\mathbf{x}}_{t-1|t-1} + \mathbf{B}_{t}\mathbf{u}_{t} \\
& \quad\quad \mathbf{P}_{t|t-1} = \mathbf{F}_{t}\mathbf{P}_{t-1|t-1}\mathbf{F}_{t}^{\intercal} + \mathbf{Q}_{t} \\
& \\
& \quad\quad \text{(Correction Step)} \\
& \quad\quad \mathbf{K}_{t} = \mathbf{P}_{t|t-1}\mathbf{H}_{t}^{\intercal}(\mathbf{H}_{t}\mathbf{P}_{t|t-1}\mathbf{H}_{t}^{\intercal} + \mathbf{R}_{t})^{-1} \\
& \quad\quad \hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_{t}( \mathbf{z}_{t} - \mathbf{H}_{t}\hat{\mathbf{x}}_{t|t-1}) \\
& \quad\quad \mathbf{P}_{t|t} = (\mathbf{I} - \mathbf{K}_{t}\mathbf{H}_{t})\mathbf{P}_{t|t-1} \\
& \quad\quad \text{return} \ \ \hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t} \\
& \}
\end{aligned} }
\end{equation}

Extended kalman filter (EKF)

칼만 필터는 모션 모델과 관측 모델이 선형이라는 가정 하에 상태를 추정한다. 하지만 현실세계의 대부분의 현상들은 비선형으로 모델링되므로 앞서 정의한 칼만 필터를 그대로 적용하면 정상적으로 동작하지 않는다. 비선형의 모션 모델과 관측 모델에서도 칼만 필터를 사용하기 위해 확장칼만필터(extended kalman filter, EKF)가 제안되었다. EKF는 테일러 1차 근사(talyor 1st approximation)을 사용하여 비선형 모델을 선형 모델로 근사한 후 칼만 필터를 적용하는 방법을 사용한다. EKF에서 모션 모델과 관측 모델은 다음과 같다.
\begin{equation}
  \begin{aligned}
    & \text{Motion Model: } & \mathbf{x}_{t} = \mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t} , \mathbf{w}_{t}) \\
    & \text{Observation Model: } &  \mathbf{z}_{t} = \mathbf{h}(\mathbf{x}_{t}, \mathbf{v}_{t})
  \end{aligned}
\end{equation}
- $\mathbf{x}_{t}$: 모델의 상태 변수(state variable)
- $\mathbf{u}_{t}$: 모델의 입력(input)
- $\mathbf{z}_{t}$: 모델의 관측값(measurement)
- $\mathbf{f}(\cdot)$: 비선형 모션(motion) 모델 함수
- $\mathbf{h}(\cdot)$: 비선형 관측(observation) 모델 함수
- $\mathbf{w}_{t} \sim \mathcal{N}(0, \mathbf{Q}_{t})$: 모션 모델의 노이즈. $\mathbf{Q}_{t}$는 $\mathbf{w}_{t}$의 공분산 행렬을 의미
- $\mathbf{v}_{t} \sim \mathcal{N}(0, \mathbf{R}_{t})$: 관측 모델의 노이즈. $\mathbf{R}_{t}$는 $\mathbf{v}_{t}$의 공분산 행렬을 의미

위 식에서 $\mathbf{f}(\cdot)$은 비선형 모션 모델을 의미하고 $\mathbf{h}(\cdot)$은 비선형 관측 모델을 의미한다. $\mathbf{f}(\cdot), \mathbf{h}(\cdot)$에 각각 1차 테일러 근사를 수행하면 다음과 같다.
\begin{equation} \label{eq:20}
  \begin{aligned}
    & \mathbf{x}_{t} \approx \mathbf{f}(\hat{\mathbf{x}}_{t-1|t-1}, \mathbf{u}_{t} , 0 ) + \mathbf{F}_{t}(\mathbf{x}_{t-1} - \hat{\mathbf{x}}_{t-1|t-1}) + \mathbf{F}_{\mathbf{w}} \mathbf{w}_{t} \\
    &  \mathbf{z}_{t} \approx \mathbf{h}(\hat{\mathbf{x}}_{t|t-1}, 0) + \mathbf{H}_{t}(\mathbf{x}_{t} - \hat{\mathbf{x}}_{t|t-1}) + \mathbf{H}_{\mathbf{v}}\mathbf{v}_{t}
  \end{aligned}
\end{equation}

이 때, $\mathbf{F}_{t}$는 $\hat{\mathbf{x}}_{t-1|t-1}$에서 계산한 모션 모델의 자코비안 행렬을 의미하며 $\mathbf{H}_{t}$는 $\hat{\mathbf{x}}_{t|t-1}$에서 계산한 관측 모델의 자코비안 행렬을 의미한다. 그리고 $\mathbf{F}_{\mathbf{w}}, \mathbf{H}_{\mathbf{v}}$는 각각 $\mathbf{w}_{t}=0, \mathbf{v}_{t}=0$에서 노이즈에 대한 자코비안 행렬을 의미한다. 자코비안에 대한 자세한 내용은 해당 포스트 를 참조하면된다.
\begin{equation}
\begin{aligned}
    & \mathbf{F}_{t} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}_{t-1}}\bigg|_{\mathbf{x}_{t-1}=\hat{\mathbf{x}}_{t-1|t-1}} & \mathbf{F}_{\mathbf{w}} = \frac{\partial \mathbf{f}}{\partial \mathbf{w}_{t}} \bigg|_{\substack{ & \mathbf{x}_{t-1}=\hat{\mathbf{x}}_{t-1|t-1} \\ & \mathbf{w}_{t}=0}} \\
  \end{aligned}
\end{equation}
\begin{equation}
  \begin{aligned}
    & \mathbf{H}_{t} = \frac{\partial \mathbf{h}}{\partial \mathbf{x}_{t}}\bigg|_{\mathbf{x}_{t}=\hat{\mathbf{x}}_{t|t-1}} & \mathbf{H}_{\mathbf{v}} = \frac{\partial \mathbf{h}}{\partial \mathbf{v}_{t}} \bigg|_{\substack{& \mathbf{x}_{t}=\hat{\mathbf{x}}_{t|t-1} \\ & \mathbf{v}_{t} = 0}} \\
  \end{aligned}
\end{equation}

($\ref{eq:20})$ 식을 전개하면 다음과 같다.
\begin{equation} \label{eq:ekf1}
\boxed{  \begin{aligned}
      \mathbf{x}_{t} & = \mathbf{F}_{t}\mathbf{x}_{t-1} +
\mathbf{f}(\hat{\mathbf{x}}_{t-1|t-1}, \mathbf{u}_{t} , 0 ) - \mathbf{F}_{t} \hat{\mathbf{x}}_{t-1|t-1} + \mathbf{F}_{\mathbf{w}} \mathbf{w}_{t} \\
     & =  \mathbf{F}_{t}\mathbf{x}_{t-1} + \tilde{\mathbf{u}}_{t} + \tilde{\mathbf{w}}_{t}
  \end{aligned} }
\end{equation}
- $\tilde{\mathbf{u}}_{t} = \mathbf{f}(\hat{\mathbf{x}}_{t-1|t-1}, \mathbf{u}_{t} , 0 ) - \mathbf{F}_{t} \hat{\mathbf{x}}_{t-1|t-1}$
- $\tilde{\mathbf{w}}_{t} = \mathbf{F}_{\mathbf{w}} \mathbf{w}_{t} \sim \mathcal{N}(0,  \mathbf{F}_{\mathbf{w}}\mathbf{Q}_{t} \mathbf{F}_{\mathbf{w}}^{\intercal})$

\begin{equation} \label{eq:ekf2}
\boxed{ \begin{aligned}
      \mathbf{z}_{t} & = \mathbf{H}_{t}\mathbf{x}_{t} + \mathbf{h}(\hat{\mathbf{x}}_{t|t-1}, 0) - \mathbf{H}_{t}\hat{\mathbf{x}}_{t|t-1} + \mathbf{H}_{\mathbf{v}}\mathbf{v}_{t} \\
& = \mathbf{H}_{t}\mathbf{x}_{t}  + \tilde{\mathbf{z}}_{t} + \tilde{\mathbf{v}}_{t}
  \end{aligned} }
\end{equation}
- $\tilde{\mathbf{z}}_{t} = \mathbf{h}(\hat{\mathbf{x}}_{t|t-1}, 0) - \mathbf{H}_{t}\hat{\mathbf{x}}_{t|t-1}$
- $\tilde{\mathbf{v}}_{t} = \mathbf{H}_{\mathbf{v}}\mathbf{v}_{t} \sim \mathcal{N}(0,  \mathbf{H}_{\mathbf{v}}\mathbf{R}_{t} \mathbf{H}_{\mathbf{v}}^{\intercal})$

확률변수가 모두 가우시안 분포를 따른다고 가정하면 $p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1}, \mathbf{u}_{t}), p(\mathbf{z}_{t} \ | \ \mathbf{x}_{t})$는 다음과 같이 나타낼 수 있다.
\begin{equation}
  \begin{aligned}
     & p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1}, \mathbf{u}_{t}) && \sim \mathcal{N}(\mathbf{F}_{t}\mathbf{x}_{t-1} + \tilde{\mathbf{u}}_{t}, \mathbf{F}_{\mathbf{w}}\mathbf{Q}_{t} \mathbf{F}_{\mathbf{w}}^{\intercal} ) \\
     & && = \frac{1}{ \sqrt{\text{det}(2\pi \mathbf{F}_{\mathbf{w}}\mathbf{Q}_{t} \mathbf{F}_{\mathbf{w}}^{\intercal} )}}\exp\bigg( -\frac{1}{2}(\mathbf{x}_{t} - \mathbf{F}_{t}\mathbf{x}_{t-1} - \tilde{\mathbf{u}}_{t})^{\intercal}( \mathbf{F}_{\mathbf{w}}\mathbf{Q}_{t} \mathbf{F}_{\mathbf{w}}^{\intercal} )^{-1}(\mathbf{x}_{t} - \mathbf{F}_{t}\mathbf{x}_{t-1} - \tilde{\mathbf{u}}_{t}) \bigg)\\
  \end{aligned}
\end{equation}
\begin{equation} \label{eq:ekf4}
  \begin{aligned}
    & p(\mathbf{z}_{t} \ | \ \mathbf{x}_{t})  && \sim \mathcal{N}(\mathbf{H}_{t}\mathbf{x}_{t} + \tilde{\mathbf{z}}_{t},  \mathbf{H}_{\mathbf{v}}\mathbf{R}_{t} \mathbf{H}_{\mathbf{v}}^{\intercal} ) \\
    & && = \frac{1}{ \sqrt{\text{det}(2\pi \mathbf{H}_{\mathbf{v}}\mathbf{R}_{t} \mathbf{H}_{\mathbf{v}}^{\intercal} )}}\exp\bigg( -\frac{1}{2}(\mathbf{z}_{t} - \mathbf{H}_{t}\mathbf{x}_{t} - \tilde{\mathbf{z}}_{t})^{\intercal}(  \mathbf{H}_{\mathbf{v}}\mathbf{R}_{t} \mathbf{H}_{\mathbf{v}}^{\intercal} )^{-1}(\mathbf{z}_{t}- \mathbf{H}_{t}\mathbf{x}_{t} - \tilde{\mathbf{z}}_{t}) \bigg)
  \end{aligned}
\end{equation}

$\mathbf{F}_{\mathbf{w}}\mathbf{Q}_{t} \mathbf{F}_{\mathbf{w}}^{\intercal}$는 선형화된 모션 모델의 노이즈를 의미하며 $\mathbf{H}_{\mathbf{v}}\mathbf{R}_{t} \mathbf{H}_{\mathbf{v}}^{\intercal}$은 선형화된 관측 모델의 노이즈를 의미한다. 다음으로 칼만 필터를 통해 구해야 하는 $\overline{\text{bel}}(\mathbf{x}_{t}), \text{bel}(\mathbf{x}_{t})$은 아래와 같이 나타낼 수 있다.
\begin{equation}
    \begin{aligned}
        & \overline{\text{bel}}(\mathbf{x}_{t}) = \int p(\mathbf{x}_{t} \ | \ \mathbf{x}_{t-1}, \mathbf{u}_{t})\text{bel}(\mathbf{x}_{t-1}) d \mathbf{x}_{t-1} \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t-1}, \mathbf{P}_{t|t-1}) \\
        & \text{bel}(\mathbf{x}_{t}) = \eta \cdot p( \mathbf{z}_{t} \ | \  \mathbf{x}_{t})\overline{\text{bel}}(\mathbf{x}_{t}) \sim \mathcal{N}(\hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t})
    \end{aligned}
\end{equation}

EKF 또한 KF와 동일하게 prediction에서 이전 스텝의 값과 모션 모델을 사용하여 예측값 $\overline{\text{bel}}(\mathbf{x}_{t})$을 먼저 구한 후 correction에서 관측값과 관측 모델을 사용하여 보정된 값 $\text{bel}(\mathbf{x}_{t})$을 구하는 방식으로 동작한다. $p(\mathbf{x}_{t} | \mathbf{x}_{t-1}, \mathbf{u}_{t})$, $p(\mathbf{z}_{t} | \mathbf{x}_{t})$를 위 식에 대입하여 정리하면 prediction, correction 스텝의 평균과 공분산 $(\hat{\mathbf{x}}_{t|t-1}, \mathbf{P}_{t|t-1}), (\hat{\mathbf{x}}_{t|t}, \mathbf{P}_{t|t})$을 각각 구할 수 있다. 자세한 유도 과정은"Derivation of Kalman filter"섹션을 참조하면 된다