ICPR2010 "Image Retrieval of First-Person Vision for Pedestrian Navigation in Urban Area"

Massive Sensing, Research Index, kameda-lab.org, 2010/01/31, 2010/08/30

Presented at ICPR (International Conference on Pattern Recognition) 2010.
Authors: Yoshinari Kameda and Yuichi Ohta
DOI: 10.1109/ICPR.2010.1140
Note: ICPR 2010 supplementals

Reference

Panasonic DMC-FX37, 28 mm focal length (35 mm film equivalent)
Duration = 12:13.50 [min:sec]

Original Video: VGA, 30fps. P1180643.MOV [1.0GB]
QVGA version, 30 fps, for checking only [179MB]
A set of reference images: QVGA, 2100 images (30fps AVI) [17MB]
Thumbnail 1400x1800 (40x30 tiles, 35 cols by 60 rows, JPG) [867KB]
Thumbnail 5600x7200 (160x120 tiles, 35 cols by 60 rows, JPG) [9.4MB]
Map of the path
1. Path 1: underground (red arrow)
2. Path 2: ground (blue arrow)
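
To look up a single reference image inside one of the thumbnail sheets listed above, the tile position can be computed from the sheet and tile sizes. The following is a minimal sketch, not part of the original materials: the function name `tile_box` is hypothetical, and it assumes the tiles are laid out row by row, left to right, starting at reference ID 0. The grid shape is derived from the pixel sizes rather than hard-coded.

```python
def tile_box(ref_id, sheet_size, tile_size):
    """Return (left, top, right, bottom) of one tile in a thumbnail sheet.

    Assumption: tiles are stored row by row, left to right, from ID 0.
    """
    sheet_w, sheet_h = sheet_size
    tile_w, tile_h = tile_size
    cols = sheet_w // tile_w  # tiles per row, derived from the pixel sizes
    rows = sheet_h // tile_h  # number of tile rows
    if not 0 <= ref_id < cols * rows:
        raise ValueError("reference ID outside the sheet")
    row, col = divmod(ref_id, cols)
    left, top = col * tile_w, row * tile_h
    return (left, top, left + tile_w, top + tile_h)


# First tile of the small sheet and last tile of the large sheet.
print(tile_box(0, (1400, 1800), (40, 30)))        # (0, 0, 40, 30)
print(tile_box(2099, (5600, 7200), (160, 120)))   # (5440, 7080, 5600, 7200)
```

The returned box can be passed to any image library's crop function. If the sheets turn out to be ordered column by column, swap the roles of `row` and `col`.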

Query

Video ID	1	2	3	4
Duration	11:47	15:53	13:34.500	14:38.020
Camera	DMC-FX37	DMC-FX37	DMC-FX37	iPod nano
[A] original video: VGA	P1180644.MOV(982MB)	P1150934.MOV(1.3GB)	P1180758.MOV(1.1GB)	IMG_0005.mp4(281MB)
[B] query images QVGA	t020-query_643-644.avi	t020-query_643-934.avi	t020-query_643-758.avi	t020-query_643-I0005.avi
[C] result video (QVGA x 3 + figures), (=Fig2)	t020-20_643-644.avi(172MB)	t020-20_643-934.avi(231MB)	t020-20_643-758.avi (197MB)	t020-20_643-I0005.avi (205MB)
Graph (Fig-1 + extra) / www	Result-1(4.0MB)	Result-2(4.1MB)	Result-3(4.1MB)	Result-4(3.9MB)
Graph (Fig-1 + extra) / pdf	Result-1(830KB)	Result-2(1.0MB)	Result-3(1.0MB)	Result-4(871KB)

Result video

Top-left (QVGA) : Query image (same as [B])
- Green lines, when shown, mark the points projected by the estimated 2D affine matrix (with more than 4 pairs).
- The (three) blue lines, when shown, mark the points of the matched keys in the reference images.
Top-right (QVGA) : Top candidate
- Left-down cross line: (2) Rejected due to the size consistency
- Left-up cross line : (3) Rejected due to the direction consistency
- Upper horizontal cross line: (4) Rejected due to the 2D affine residual limit
- Lower horizontal cross line: (5) Rejected due to the area condition (too small)
- 1st (at 1/5th) vertical cross line: (-) No key in query
- 2nd (at 2/5th) vertical cross line: (1) Only one or two pairs
- 3rd (at 3/5th) vertical cross line: (6) Rejected due to axis inversion
- 4th (at 4/5th) vertical cross line: (7) Rejected due to the triangular vector direction
- The (three) blue lines, when shown, mark the points of the matched keys in the reference images.
Bottom-left (QVGA) : last answered reference image
- This is a very simple example application. It should give a feel for how this approach could serve pedestrian navigation. (We would not use it by itself, though, but would integrate it with a pedometer, GPS, and so on.)
Bottom-right : some figures
- Query: Query image ID
- Result: Top candidate ID (reference image ID), shown in the top-right picture
- Shown: Last answered reference image ID, shown in the bottom-left picture
- (Number) after "Shown": Number of consecutive frames for which the top candidates have been rejected (= number of frames the same picture has stayed at bottom-left)
- XXX <=> YYY features found: XXX keys in query, YYY features in reference (top candidate)
- X pairs: Number of the pairs found with the top candidate (= number of red lines)
- size: Esize
- dir: Edir [unsigned degree]
- ptdist: Eaffine [pixel]
- ai: 1 = axis inversion found, 0 otherwise
- tv: 1 = triangular vector direction error found, 0 otherwise
- * after tv: appears when only 3 pairs are found (the condition under which the triangular vector direction check is applied)
- triarea: approximately estimated Earea
- dist: average Euclidean distance of descriptor pairs between query and top candidate (not used, for checking only)
- [A0 A1 A2]
- [A3 A4 A5]: the two rows of the estimated 2D affine matrix
- Threshold: threshold of Esize and Edir
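
The affine matrix [A0 A1 A2; A3 A4 A5] and the ptdist residual listed above can be illustrated with a small least-squares fit. This is only a sketch under stated assumptions, not the implementation used for the result videos: the function names are hypothetical, the residual is taken here as the mean Euclidean distance between projected and matched points (the exact definition of Eaffine is given in the paper), and the determinant sign test is one common way to detect a mirrored mapping, which may differ from the "ai" check of the paper. The mapping direction (source to destination) is also left generic.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 system m @ x = b with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point configuration")
    out = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = b[i]
        out.append(det3(mk) / d)
    return out


def fit_affine(src, dst):
    """Least-squares [A0..A5] such that dst ~ [[A0 A1 A2], [A3 A4 A5]] @ (x, y, 1)."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 point pairs")
    m = [[0.0] * 3 for _ in range(3)]  # normal-equation matrix
    bu = [0.0] * 3                     # right-hand side for the first row
    bv = [0.0] * 3                     # right-hand side for the second row
    for (x, y), (u, v) in zip(src, dst):
        p = (x, y, 1.0)
        for i in range(3):
            bu[i] += p[i] * u
            bv[i] += p[i] * v
            for j in range(3):
                m[i][j] += p[i] * p[j]
    return solve3(m, bu) + solve3(m, bv)


def mean_residual(a, src, dst):
    """Mean pixel distance between projected source points and their matches."""
    total = 0.0
    for (x, y), (u, v) in zip(src, dst):
        pu = a[0] * x + a[1] * y + a[2]
        pv = a[3] * x + a[4] * y + a[5]
        total += math.hypot(pu - u, pv - v)
    return total / len(src)


def flips_axis(a):
    """True when the linear part has a negative determinant (mirrored mapping)."""
    return a[0] * a[4] - a[1] * a[3] < 0


# Made-up point pairs: scale by 1.2 and shift by (10, 5).
src = [(0, 0), (100, 0), (0, 100), (100, 100)]
dst = [(10, 5), (130, 5), (10, 125), (130, 125)]
a = fit_affine(src, dst)
print([round(c, 3) for c in a], round(mean_residual(a, src, dst), 3), flips_axis(a))
```

With exactly 3 pairs the fit is exact and the residual is zero, so a residual limit only becomes informative from 4 pairs upward.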

Graph

Top left (ID based result)
Horizontal: Query image ID (up to the total number of frames in the query video).
Vertical: Verified answer ID (= verified reference image ID) for each query (reference IDs range from 0 to 2099).
Blue dots: All the top candidates, including rejected ones.
Red dots: Verified (accepted) candidates.
Top right (Path distance based result, similar to Figure 1 in the paper)
Horizontal: Path distance of each query image ID (up to about 900 meters).
Vertical: Verified path distance of the reference image ID of the corresponding top candidate (up to about 900 meters).
Blue dots : all the estimated distances, including rejected ones
Red dots: Verified answers.
Bottom right (Comparison with basic pair-counting method)
Horizontal: Same as Top right fig.
Vertical: Same as Top right fig.
Black dots: Both the proposed method and the pair-counting method returned the same ID.
Red dots: The pair-counting method returned an ID while the proposed method did not, because the candidate did not pass the verification.
Green dots: The proposed method returned an ID while the pair-counting method could not, because the number of pairs found in the query did not reach the threshold.

The pair-counting method:
For the verification step, it just checks the number of pairs. If the number of pairs found is equal to or larger than the threshold, the candidate is accepted as the answer.
The threshold is set to keep (almost) the same answer ratio as the proposed method.
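
The pair-counting baseline and its threshold choice can be sketched as follows. This is an illustrative sketch, not the code behind the graphs: the function names and the sample pair counts are made up, the threshold search simply picks the value whose answer ratio is closest to the target, and the dot colours assume that both methods judge the same top candidate for each query.

```python
def answer_ratio(pair_counts, threshold):
    """Fraction of queries whose top candidate has at least `threshold` pairs."""
    accepted = sum(1 for n in pair_counts if n >= threshold)
    return accepted / len(pair_counts)


def match_threshold(pair_counts, target_ratio):
    """Threshold whose answer ratio is closest to target_ratio (smallest on ties)."""
    candidates = range(0, max(pair_counts) + 2)
    return min(candidates,
               key=lambda t: abs(answer_ratio(pair_counts, t) - target_ratio))


def dot_colour(proposed_accepts, n_pairs, threshold):
    """Colour of a query in the comparison graph, or None if neither method answers."""
    baseline_accepts = n_pairs >= threshold
    if proposed_accepts and baseline_accepts:
        return "black"
    if baseline_accepts:
        return "red"
    if proposed_accepts:
        return "green"
    return None


# Made-up pair counts for ten queries; suppose the proposed method answers half of them.
pair_counts = [0, 2, 3, 5, 8, 8, 12, 1, 4, 9]
threshold = match_threshold(pair_counts, target_ratio=0.5)
print(threshold)                       # 5
print(dot_colour(True, 3, threshold))  # green
```

Matching the answer ratio keeps the comparison fair: both methods answer roughly the same share of queries, so the difference lies in which queries they answer.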

In the top-left figure you see some flat areas or vertical jumps caused by "staying" periods (such as waiting at traffic signals) during the walks, in both the reference and the query. They disappear in the top-right figure because they map to the same position in path-distance notation. (Actually, even while waiting at a signal, the camera sometimes moves a little.)
The blue dots in the top-left and top-right figures show how many false-positive top candidates (blue dots far from red dots) are found if we just run generic image retrieval.
There are some unsuccessful sections, probably because of strong back light from the sun and a darker sky (which increases blur and loses SURF keys). Even so, you find fewer false negatives with our approach.

kameda[at]iit.tsukuba.ac.jp, kameda.aa[at]gmail.com