As all media come to be delivered in digital form over the
internet, people will receive a tremendous amount of information.
As a result, they will find it difficult to pick out what is valuable.
It is therefore necessary to automatically summarize the information
that each person actually needs.
Video is intrinsically spatio-temporal (4D) data.
It is important to summarize it according to the information it
contains, not by temporally linear compression such as MPEG.
We introduce certain constraints to realize such a summarization method.
Video records a 3D scene as a sequence of 2D images.
To frame the 3D scene in two dimensions,
one places a camera at a certain location and specifies a zoom value.
Then, as time goes on, one moves the camera in the 3D space.
We call this operation camera-work, and we regard camera-work as
something determined by the situation of the scene.
Camera-works are common to some extent across the field of movie
making and correlate strongly with the situation.
Extracting the camera-works from a video is therefore very valuable
for information summarization tasks such as video editing and
processing, because they are a reliable clue for inferring the situation.
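
As an illustration of what extracting camera-work could look like, the
following minimal Python sketch labels frame-to-frame changes in camera
state as coarse camera-works and groups them into runs. Everything in it
is an assumption made for illustration and not part of the proposed
method: the CameraState fields (pan, tilt, zoom), the label names, and
the thresholds angle_eps and zoom_eps are hypothetical. For an existing
video, the per-frame camera state would first have to be estimated from
the images, which this sketch does not cover.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CameraState:
    """Hypothetical per-frame camera parameters (an assumption, not from the text)."""
    pan: float   # horizontal angle in degrees
    tilt: float  # vertical angle in degrees
    zoom: float  # zoom factor, larger means narrower field of view


def label_camera_work(prev: CameraState, cur: CameraState,
                      angle_eps: float = 0.5, zoom_eps: float = 0.02) -> str:
    """Label the change between two consecutive frames with a coarse name."""
    d_pan = cur.pan - prev.pan
    d_tilt = cur.tilt - prev.tilt
    d_zoom = cur.zoom - prev.zoom
    # Zoom changes take priority over rotation in this simple scheme.
    if abs(d_zoom) > zoom_eps:
        return "zoom-in" if d_zoom > 0 else "zoom-out"
    if abs(d_pan) > angle_eps and abs(d_pan) >= abs(d_tilt):
        return "pan"
    if abs(d_tilt) > angle_eps:
        return "tilt"
    return "fix"


def segment_camera_work(states: List[CameraState]) -> List[Tuple[str, int, int]]:
    """Group consecutive transitions with the same label into (label, start, end) runs."""
    segments: List[Tuple[str, int, int]] = []
    for i in range(1, len(states)):
        label = label_camera_work(states[i - 1], states[i])
        if segments and segments[-1][0] == label:
            # Extend the current run up to frame i.
            segments[-1] = (label, segments[-1][1], i)
        else:
            segments.append((label, i - 1, i))
    return segments
```

Each resulting run (label, start frame, end frame) could then serve as
one constraint when inferring the situation of the scene.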
Sometimes video is shot following a story prepared in advance.
For example, TV programs are shot following scripts.
In universities, lectures can be considered to be delivered
following teaching materials, which play the role of a script.
It is possible to frame the scene of a lecture by interpreting the
story of the lecture, because the story can be inferred by
automatically analyzing the teaching material.
This approach may lead to extensive research on teaching material
evaluation.
The story will also give us hints for summarizing the lecture video.
We consider two situations: one in which a video is given
in advance, and one in which we can control the cameras according
to the scene.
We first clarify the common methods that can be applied to both
situations.
Then, we analyze the difference between the bottom-up approach and
the top-down approach. The bottom-up approach extracts constraints
(camera-works) from data (video), whereas the top-down approach
shoots the video and produces the video summary according to a
constraint (story) that is given in a top-down way.
We will evaluate the proposed method by implementing it in a
prototype system and letting people use it.
We also aim to clarify theoretically how the number of
constraints and their strength influence non-linear, spatio-temporal
compression of video.
[Translated from Japanese script by Kameda, 2000/06.]