The contents of this page are obsolete. The page is preserved at this URL for historical reasons only. The original URL was http://www.mm.media.kyoto-u.ac.jp/members/kameda/...
Please visit www.kameda-lab.org for recent information. (2002/12/06, kameda@ieee.org)

VISPRU Group-04




Video Information Summary Mechanism with Non-linear Spatio-Temporal Compression for Status Understanding of Human Behaviour

Group leader:     Michihiko MINOH (Kyoto University, Japan)
Research fellows: Takashi MATSUYAMA (Kyoto University, Japan)
                  Tatsuya KAWAHARA (Kyoto University, Japan)
                  Yoshinari KAMEDA (Kyoto University, Japan)
                  Shogo TOKAI (Fukui University, Japan)

Abstract
    As all media come to be delivered in digital form over the Internet, people will receive a tremendous amount of information and will find it difficult to pick out what is valuable. It is therefore necessary to automatically summarize the information that each person really needs.
    Video is intrinsically spatio-temporal (4D) data. It is important to summarize it according to the information it contains, rather than by temporally linear compression such as MPEG.
    We introduce certain constraints to realize such a summarization method.
    Video records a 3D scene as a sequence of 2D images. To frame the 3D scene in two dimensions, one places a camera at a certain location and specifies a zoom value; then, as time passes, he or she moves the camera through the 3D space. We call this activity camera-work, and we regard camera-work as being determined by the situation of the scene. Camera-works are common to some extent in the field of movie making and correlate strongly with the situation. Extracting the camera-works from a video is therefore very valuable for information summary methods such as video editing and processing, because they are a clue for inferring the situation.
    Sometimes video is shot following a story that is prepared in advance. For example, TV programs are shot following scripts. In universities, lectures can be regarded as following a teaching text, which plays the role of a script. Because the story of a lecture can be inferred by automatically understanding the teaching material, it is possible to frame the lecture scene by interpreting that story. This approach may lead to extensive research on teaching material evaluation. The story will also give us hints for summarizing the lecture video.
    We consider two situations: one in which a video is given in advance, and one in which we can control the cameras according to the scene. We first clarify the common methods that can be applied to both situations. Then we analyze the difference between the bottom-up approach and the top-down approach. The bottom-up approach extracts constraints (camera-works) from data (video), whereas the top-down approach shoots the video and produces a video summary according to a constraint (story) that is given in a top-down way. We will evaluate the proposed method by implementing it in a prototype system and letting people use it. We also aim to determine theoretically how the number of constraints and their strength influence non-linear spatio-temporal compression of video.

    [Translated from Japanese script by Kameda, 2000/06.]


Last modified: May 31, 2000