Vision-based State Estimation Relative to Dynamic Environments

Unmanned aerial vehicles (UAVs) have taken on significant roles on the battlefield over the last decade, aiding the military by performing missions such as reconnaissance, surveillance, and target tracking with the aid of human operators. These vehicles are now being considered for more complex missions that involve increased decision making in order to operate in cluttered environments. This approach attempts to alleviate the human workload by designing a more autonomous vehicle. One avenue being explored is the use of vision to sense the environment. Image processing makes decisions based on the type of mission and passes the resulting commands to the control system to navigate the aircraft.

With UAVs becoming more prevalent in the aerospace community, researchers are striving to extend their capabilities while making them more reliable. The ability to operate in various surroundings and over long ranges is an important resource for military applications. For example, imagine a scenario where a UAV is tasked with a surveillance mission to fly a long distance and then through a city where targets of interest are located. The first step in this mission is completing the long-range flight to the desired location, which may require an aerial refueling maneuver. The UAV uses sensor fusion of an inertial navigation system (INS) and vision to complete this task autonomously. The mission continues as the UAV proceeds on course for the desired city. Ultimately, the UAV arrives at the desired location and maneuvers through the city to track the target and estimate its position and orientation using visual measurements. This type of mission requires both accurate knowledge of the aircraft states and reliable image processing for state estimation.

Problem Statement
The problem addressed in this work is state estimation of a target with unknown stochastic motion for autonomous systems using a moving monocular camera. The estimation of 3-dimensional points in space given two perspective views relies heavily on camera configuration, accurate camera calibration, and perfect image processing; however, practical camera systems involve limited configurations, correspondence issues, and significant uncertainty and noise. Therefore, in order to estimate the states of a moving object in the presence of uncertainty, several key issues have to be addressed. These technical challenges include:
  • segmenting moving targets from stationary targets within the scene
  • classifying moving targets into deterministic and stochastic motions
  • coupling the vehicle dynamics into the sensor observations (i.e. images)
  • formulating the homography equations between a moving camera and the viewable targets
  • propagating the effects of uncertainty through the state estimation equations
  • establishing confidence bounds on target state estimation

State Estimation
An image processing technique used to estimate the relative translation and rotation in 3D space between two consecutive image frames is called a homography. The 3D scene reconstruction of a moving target is determined using the known motion of a camera and a moving reference frame. Therefore, combining vision with traditional sensors such as a global positioning system (GPS) and an inertial measurement unit (IMU) facilitates estimating the states of a moving target for a single camera configuration.
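
As a minimal sketch of the relation a homography encodes, assume a set of feature points lying on a plane with unit normal n at distance d from the first camera frame, and a camera motion (R, t) such that a point X1 in the first frame becomes X2 = R X1 + t in the second. The Euclidean homography H = R + (t n^T)/d then maps the normalized image coordinates of a planar point from the first view to the second. All numeric values and function names below are hypothetical and chosen only for illustration.

```python
import math

def mat_vec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def rot_z(angle_rad):
    """Rotation matrix about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def euclidean_homography(R, t, n, d):
    """H = R + (t n^T) / d for the plane n . X = d in the first camera frame."""
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

def normalize(p):
    """Normalized image coordinates (x/z, y/z, 1) of a 3D point."""
    return [p[0] / p[2], p[1] / p[2], 1.0]

# Hypothetical camera motion and plane (illustrative values only).
R = rot_z(math.radians(10.0))
t = [0.5, -0.2, 0.1]
n = [0.0, 0.0, 1.0]   # plane normal expressed in the first camera frame
d = 20.0              # distance from the first camera to the plane

H = euclidean_homography(R, t, n, d)

# A feature point on the plane (n . X1 == d), seen in both views.
X1 = [3.0, -1.0, 20.0]
X2 = [a + b for a, b in zip(mat_vec(R, X1), t)]

m1 = normalize(X1)
m2_predicted = normalize(mat_vec(H, m1))  # view 2 coordinates via H
m2_true = normalize(X2)                   # view 2 coordinates via the rigid motion
print(m2_predicted, m2_true)
```

The two printed vectors agree because every point on the plane satisfies n . X1 / d = 1, which is what lets the translation be folded into a single 3x3 matrix.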

In general, a single moving camera alone is unable to reconstruct a 3D scene containing moving objects. This restriction is due to the loss of the epipolar constraint: the plane formed by the position vectors of the target and the camera translation vector is no longer valid once the target itself moves between frames. The contribution of this work is to establish the Euclidean homography between the target and the reference object from a single image through transformations that keep the reference object stationary in the image across two frames. By relating this information to known measurements from GPS and IMU, the target's motion can be reconstructed regardless of its dynamics. Several assumptions are required for this approach to work: the objects must remain in the image at all times, the feature point distance must be known, and the motion of both the camera and the reference object must be known. These relationships can then be related back to the vehicle's frame through a known transformation and used in control strategies that perform either homing or docking maneuvers.
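
The idea of relating the target to the reference object can be illustrated with plain rigid-body algebra. The sketch below is not the homography construction described above; it only shows, under the assumption that the poses of both objects in the camera frame are already available, how the target pose is expressed relative to the reference object. A pose (R, t) here means X_cam = R X_obj + t, and the function names and numeric values are hypothetical.

```python
import math

def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[A[j][i] for j in range(3)] for i in range(3)]

def mat_vec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def rot_z(angle_rad):
    """Rotation matrix about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def relative_pose(R_cr, t_cr, R_ct, t_ct):
    """Pose of the target in the reference frame, given both poses in the camera frame."""
    R_rc = transpose(R_cr)
    R_rt = mat_mul(R_rc, R_ct)
    t_rt = mat_vec(R_rc, [a - b for a, b in zip(t_ct, t_cr)])
    return R_rt, t_rt

# Hypothetical poses in the camera frame (illustrative values only).
R_cr, t_cr = rot_z(math.radians(30.0)), [10.0, 0.0, 50.0]   # reference vehicle
R_ct, t_ct = rot_z(math.radians(75.0)), [14.0, 3.0, 50.0]   # target vehicle

R_rt, t_rt = relative_pose(R_cr, t_cr, R_ct, t_ct)
relative_heading_deg = math.degrees(math.atan2(R_rt[1][0], R_rt[0][0]))
separation = math.sqrt(sum(c * c for c in t_rt))
print(relative_heading_deg, separation)
```

The same composition, applied with the known camera-to-vehicle transformation, would carry these quantities back into the vehicle's frame.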

Simulation and Results
A simulation was executed in MATLAB and replayed in a virtual environment to test the state estimation algorithm. The setup consisted of three vehicles: a UAV flying above with a mounted camera, a reference ground vehicle, and a target vehicle. The camera setup considered in this problem consists of a single camera attached to the UAV with fixed position and orientation. While in flight, the camera measures and tracks feature points on both the target vehicle and the reference vehicle. This simulation assumes perfect camera calibration, feature point extraction, and tracking so that the state estimation algorithm can be verified. More realistic aspects of the camera system will later be included in this simulation to create a practical scenario.
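
Under the perfect-calibration assumption, generating a synthetic feature point measurement reduces to an ideal pinhole projection. The sketch below assumes a camera frame with the z-axis along the optical axis; the focal length, principal point, and point coordinates are hypothetical values, not parameters of the simulation described here.

```python
def project(point_cam, focal_px, cx, cy):
    """Ideal pinhole projection of a 3D point in the camera frame to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)

# Hypothetical intrinsics and feature points (illustrative values only).
FOCAL_PX, CX, CY = 800.0, 320.0, 240.0
feature_points = [
    (1.0, 0.5, 100.0),
    (-1.0, 0.5, 100.0),
    (1.0, -0.5, 100.0),
    (-1.0, -0.5, 100.0),
]

pixels = [project(p, FOCAL_PX, CX, CY) for p in feature_points]
for point, pixel in zip(feature_points, pixels):
    print(point, "->", pixel)
```

Noise, lens distortion, and tracking errors are deliberately absent, which matches the idealized setup used to verify the estimator.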

The motion of the vehicles was generated to cover a wide range of situations to test the algorithm. The UAV's motion was generated from a nonlinear aircraft model. Meanwhile, the reference vehicle and the target vehicle followed a standard car model with similar velocities. Sinusoidal disturbances were added to the target's position and heading to add some complexity to its motion. The three trajectories are plotted below for illustration.

[Figures: trajectories of the UAV, reference vehicle, and target vehicle]
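
A ground vehicle trajectory of the kind described above can be sketched as follows. The kinematic bicycle model is used here as a stand-in for the "standard car model", which is an assumption, as are the disturbance amplitudes, frequency, speed, and wheelbase. The sinusoidal disturbance is applied as a lateral offset to the position and as an additive offset to the heading.

```python
import math

def simulate_car(speed, steer, wheelbase, dt, steps,
                 pos_amp=0.0, heading_amp=0.0, freq_hz=0.0):
    """Integrate a kinematic bicycle model and overlay sinusoidal disturbances.

    Returns a list of (time, x, y, heading) samples.
    """
    x, y, psi = 0.0, 0.0, 0.0
    states = []
    for k in range(steps):
        time = k * dt
        wave = math.sin(2.0 * math.pi * freq_hz * time)
        lateral = pos_amp * wave  # offset perpendicular to the nominal heading
        states.append((
            time,
            x - lateral * math.sin(psi),
            y + lateral * math.cos(psi),
            psi + heading_amp * wave,
        ))
        # Forward Euler step of the nominal (undisturbed) model.
        x += speed * math.cos(psi) * dt
        y += speed * math.sin(psi) * dt
        psi += speed / wheelbase * math.tan(steer) * dt
    return states

# Hypothetical parameters (illustrative values only).
nominal = simulate_car(10.0, 0.0, 2.5, 0.1, 101)
disturbed = simulate_car(10.0, 0.0, 2.5, 0.1, 101,
                         pos_amp=0.5, heading_amp=math.radians(5.0), freq_hz=0.2)
print(nominal[-1], disturbed[12])
```

With zero steering the nominal path is a straight line, so the disturbed path shows the sinusoidal offsets in isolation.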

The homography was computed for this simulation to find the relative rotation and translation between the ground vehicles. These results were then used to find the relative motion from the UAV to the target of interest. The translation and rotation errors of this motion are depicted in the plots below. These results indicate that, with synthetic images and perfect tracking of the target, nearly perfect motion can be extracted.

[Figures: translation and rotation errors of the estimated motion]
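
One common way to quantify such errors, shown here as a sketch rather than as the metric used for the plots, is the Euclidean norm of the translation difference and the angle of the residual rotation R_est^T R_true. The estimated and true values below are hypothetical.

```python
import math

def rot_z(angle_rad):
    """Rotation matrix about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_error_deg(R_est, R_true):
    """Angle (degrees) of the residual rotation R_est^T R_true."""
    # trace(R_est^T R_true) equals the element-wise product summed over all entries.
    trace = sum(R_est[i][j] * R_true[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(cos_angle))

def translation_error(t_est, t_true):
    """Euclidean norm of the translation difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_true)))

# Hypothetical true and estimated relative motion (illustrative values only).
R_true, t_true = rot_z(math.radians(30.0)), [4.0, 3.0, 0.0]
R_est, t_est = rot_z(math.radians(31.0)), [4.1, 3.0, -0.2]

rot_err = rotation_error_deg(R_est, R_true)
trans_err = translation_error(t_est, t_true)
print(rot_err, trans_err)
```

Evaluating both quantities at every frame yields error histories of the kind summarized in the plots.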

The simulation was then replayed in our hardware-in-the-loop simulation (HILS) facility to enhance the graphics and illustrate the application of this algorithm. Snapshots from the camera view depict the surrounding scene and the two vehicles, with red designating the reference vehicle and grey the target vehicle. The next step in this process is to implement an actual feature tracking algorithm that follows the vehicles in synthetic images. This modification alone is expected to degrade the homography results considerably because of the troublesome characteristics of a feature point tracker. Filtering techniques will be explored to alleviate these problems.

[Figures: camera-view snapshots from the HILS facility]

Ryan Causey (
Last Modified : November 2006