ICME Conference Summary
雷娟
2014 IEEE International Conference on Multimedia and Expo
Outline
1. Overview
2. Agenda
3. Awards
Outline
1. Overview
ICME is the premier forum for the presentation of the latest advances in multimedia technologies, systems, and applications from both academic and industrial perspectives.
It is sponsored by four IEEE societies: Signal Processing, Circuits and Systems, Computer, and Communications.
ICME 2014 is the 15th in the series that has been held annually since 2000.
Main program (including special sessions): 716 submissions, 212 accepted (29.6% acceptance rate); 72 oral papers and 26 special-session papers (13.6% acceptance rate for oral presentation).
Review Process: double-blind review process + author rebuttal + reviewer discussions
ICME 2014 has 14 associated workshops which received a total of 243 submissions with 168 papers being accepted. ICME 2014 also has a separate demo program consisting of 16 demos.
500+ Participants
General Chairs
Shipeng Li, Microsoft Research Asia, China
Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Houjun Wang, University of Electronic Science and Technology of China (UESTC), China
Jie Yang, National Science Foundation, USA
Program Chairs
Dong Xu, Nanyang Technological University, Singapore
Xuelong Li, Chinese Academy of Sciences, China
Zicheng Liu, Microsoft Research, USA
Eckehard Steinbach, Munich U of Technology, Germany
Chengcui Zhang, University of Alabama at Birmingham, USA
Shao-Yi Chien, National Taiwan University, Taiwan
Outline
2. Agenda
2.1 Tutorial (day 1)
2.2 Keynote (days 2, 3, 4)
2.3 Oral & Poster & Special Sessions (days 2, 3, 4)
2.4 Industry Forum & Grand Challenge (days 2, 3, 4)
2.5 Workshop (days 1 & 5)
2.1 Tutorials
1. Hashing Large-scale Data with Applications to Cross-media Analysis
Wei Liu (IBM), Fei Wu (Zhejiang University, China)
2. Deep Learning in Image and Video Understanding
Xiaogang Wang, Wanli Ouyang (CUHK)
3. Social and Geographic-Aware Multimedia Applications and Technologies
Jiebo Luo (U. Rochester), Tao Mei (MSRA), Roger Zimmermann, Yi Yu (NUS)
4. Learning-based Feature Extraction for Social Media Analysis
Xudong Jiang (NTU), Jiwen Lu (ADSC, Singapore), Weihong Deng (BUPT)
5. A Tutorial on Nonnegative Matrix Factorisation with Applications to Audiovisual Content Analysis
Slim Essid (Telecom ParisTech, France), Alexey Ozerov (Technicolor, France)
6. A Tutorial on Online Learning Methods for Multimedia Big Data Analytics
Steven C.H. Hoi (NTU)
2.2 Keynote talks
1. Multimedia Technologies for Multimodal Interaction and Immersive Telecommunications Zhengyou Zhang (Microsoft, USA)
2. Towards Online Visual Search Wen Gao (Peking University, China)
3. Behavioral Imaging and the Study of Autism James M. Rehg (Georgia Institute of Technology, USA)
Keynote-1
Multimedia Technologies for Multimodal Interaction and Immersive Telecommunications
Zhengyou Zhang , Microsoft, USA
Motivation:
• Natural human-computer interaction
• Immersive human-human telecommunications and collaboration.
Solution:
• Capturing and rendering 3D dynamic environments in order to create the illusion that the remote participants are in the same room.
• A number of projects involving multi-camera systems, RGBD sensors, microphone arrays, spatial audio, large electronic whiteboards, and mobile devices.
Avatar Kinect Virtual Environment
TeleConference
Viewport: A Fully Distributed Immersive Teleconferencing System
Remote Collaboration
ViiBoard: Vision-enhanced Immersive Interaction with Touch Board
Conclusion
Keynote-2
Wen Gao , Peking University, China
Towards Online Visual Search
Outline
1. Two Kinds of visual search
2. Remote Visual Search (IEEE Std 1857)
3. Mobile Visual Search (CDVS)
4. Summary
1. Two Kinds of visual search
1) Remote Visual Search: a remote camera captures image/video; the image/video is encoded and sent to the server site via a wideband network; the server decodes the image/video and performs the search.
2) Mobile Visual Search: a mobile phone captures image/video; a compact feature descriptor is extracted on the device and sent to the server via a wireless network; the server performs the search using the compact descriptor.
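The bandwidth trade-off between the two pipelines can be sketched as follows. This is an illustrative toy: `extract_descriptor` and `server_search` are hypothetical stand-ins, not real CDVS or IEEE 1857 APIs, and the 512-byte descriptor size is made up for the example.

```python
# Illustrative contrast between remote and mobile visual search.

def server_search(payload: bytes, kind: str) -> str:
    """Pretend server: report what kind of query arrived and how big it was."""
    return f"matched with {len(payload)}-byte {kind} query"

def extract_descriptor(image_bytes: bytes) -> bytes:
    """Stand-in for a compact-descriptor extractor (e.g. CDVS features)."""
    return image_bytes[:512]   # a real extractor computes features, not a prefix

def remote_visual_search(image_bytes: bytes) -> str:
    """Remote search: ship the whole compressed image over the network."""
    return server_search(image_bytes, kind="image")

def mobile_visual_search(image_bytes: bytes) -> str:
    """Mobile search: extract the descriptor on-device, ship only that."""
    return server_search(extract_descriptor(image_bytes), kind="descriptor")
```

For a 100 KB photo, the mobile pipeline uploads 512 bytes instead of 100,000, which is the point of transmitting compact descriptors over a wireless link.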
2. Remote Visual Search (IEEE Std 1857)
The state of the art is IEEE Std 1857 / IEEE P1857.4.
It combines visual data encoding and analysis in one standard: machine learning is used to model the background, objects are detected in the image or video, and the object and background are encoded in different layers of the data stream.
3. Mobile Visual Search (CDVS)
• The state of the art is CDVS (Compact Descriptors for Visual Search). A package of CDVS parameters is transmitted and used for search instead of the compressed image.
More information: http://imre.idm.pku.edu.cn/ (Institute of Digital Media, Peking University)
Summary
Keynote-3
James M. Rehg
Georgia Institute of Technology, USA
Behavioral Imaging and the Study of Autism
Outline
1. Rapid-ABC Protocol and MMDB dataset
2. Analysis of engagement
3. Detection of eye contact from wearable camera
4. Behavior retrieval from multi-camera classroom video
1. Rapid-ABC Protocol and MMDB dataset
• Rapid-ABC is a behavioral screening instrument for early detection of risk for Autism and related developmental disabilities based on a scripted sequence of interaction between a clinician and child.
• Multimodal Dyadic Behavior (MMDB) dataset: a unique collection of multimodal (video, audio, and physiological) recordings of the social and communicative behavior of toddlers.
Dataset available: www.cbi.gatech.edu/mmdb/
2. Analysis of engagement
1) Eye Contact Detection
2) Monitoring Problem Behaviors
3. Detection of eye contact from wearable camera
• Key Idea #1: Detect the child's face to interpret the examiner's point of gaze
• Key Idea #2: Detect the child's gaze direction relative to the camera (a proxy for the examiner)
Monitoring Classroom Behaviors
• Goal: Enable caregivers to quickly assess the frequency and duration of problem behaviors
• Challenges: Behaviors are often unique to the individual; no resources to support large-scale annotation
• Approach: Behavior retrieval from a multimedia repository
Conclusions
• Children's social behaviors are a challenging and novel topic for the multimedia community
-- MMDB dataset of adult-child interactions
-- Identification and treatment of developmental disorders
• Wearable cameras are a promising approach to behavior measurement
• Help us create the science of Behavioral Imaging!
2.3 Oral & Poster Sessions (day 2)
• 1. Image Recognition and Image Retrieval
• 2. High Efficiency Video Coding
• 3. Image and Video Coding
• 4. Human Computer Interaction and Graphics
• 5. Image Filtering, Deblurring and Superresolution
• 6. Visual Tracking
• 7. Compressed Sensing, Low Rank, and Deep Learning
• 8. Social Multimedia and Cloud
2.3 Oral & Poster Sessions (day 3)
• 9. Multimedia Security and Forensics, and Face Recognition
• 10. 3D and Augmented Reality
• 11. Video Analysis, Event Recognition, and Segmentation
• 12. Image Processing and Quality Assessment
2.3 Oral & Poster Sessions (day 4)
2.3 Special Sessions
• 1. Cross-media Computing
• 2. Visual Saliency: Emerging Models and Applications in Multimedia Processing
• 3. Geo-Social Media Mining, Analysis, Recommendation and Retrieval
• 4. Neuroimaging-guided Multimedia Analysis
• 5. Human Action and Activity Understanding from Rich Media and Sensors
2.4 Industry Forum & Panel
Topic: Big Data and Deep Learning
Chairs: Jian Lu, Yinglong Xia
Speakers: Yunwen Chen (Shanda Literature, China), Ching-Yung Lin (IBM, USA), Qian Lin (HP, USA), Gokhan Tur (Microsoft, USA), Kai Yu (Baidu, China)
Topic: Mobile Multimedia: Challenges and Opportunities
Chairs: Jian Lu, Yinglong Xia
Panelists: Sanjeev Mehrotra (Microsoft, USA), Tao Mei (Microsoft, China), Yimin Zhang (Intel, China), Aidong Zhang (Huawei, China), Hanning Zhou (Zhigu, China)
2.4 Grand Challenges
• Microsoft: MSR-Bing Image Retrieval Grand Challenge
Winner: Cross-Media Relevance Mining for Evaluating Text-Based Image Search Engine
Zhongwen Xu, Yi Yang, Ashraf Kassim, Shuicheng Yan (The University of Queensland & National University of Singapore)
• Huawei: Accurate and Fast Mobile Video Annotation Challenge
Winner: Fusing Multimodal Features With Deep Neural Networks for Mobile Video Annotation
Jian Tu, Zuxuan Wu, Qi Dai, Yu-Gang Jiang, Xiangyang Xue (Fudan University, China)
2.5 Workshop
1. Visualization of Heterogeneous Multimedia Content
2. Cloud Gaming Systems and Networks
3. Multimedia Big Data Computing
4. Cross-media Analysis from Social Multimedia
5. Frontier of Crowdsourcing for Multimedia Computing
6. Multimedia Services and Technologies for E-Health
7. Emerging Multimedia Systems and Applications
8. Management Information Systems in Multimedia Art, Education, Entertainment, and Culture
9. Human Identification in Multimedia
10. Ambient Multimedia and Sensory Environment
11. Hot Topics in 3D Multimedia
12. Mobile Multimedia Computing
13. Multimedia Affective Computing
14. Audio and Video Coding Standardization
Hot3D workshop:
5th IEEE International Workshop on Hot Topics in 3D (Hot3D)
Keynote speech: Towards Multidimensional & Multiscale Visual Computing Prof Qionghai Dai, Tsinghua University, China (Yebin Liu)
http://media.au.tsinghua.edu.cn/liuyebin.jsp
• 1. A Multi-camera and Multi-Lighting Dome for 3D Reconstruction and Relighting
20 PointGrey Flea2 cameras spaced on a ring. The camera resolution is 1024 by 768 and the capture rate is 25fps. The models are constructed based on the 20 view images using point cloud based multi-view stereo (PCMVS). Textures are mapped on the models using view-independent rendering.
• 2. Video-based Hand Manipulation Capture Through Composite Motion Control
A motion capture method for acquiring physically realistic hand grasping and manipulation data from multiple video streams. The key idea is to introduce a composite motion control to simultaneously model hand articulation, object movement, and subtle interaction between the hand and object.
3. Awards
Best Paper
1. Find You from Your Friends: Graph-Based Residence Location Prediction for Users in Social Media (Dan Xu*, Peng Cui, Shiqiang Yang, Tsinghua University)
2. High Resolution Free-View Interpolation of Planar Structure (Jie Hu*, Dongqing Zhang, Heather Yu, Chang Wen Chen, University at Buffalo, USA & Futurewei Technologies Inc.)
Best Student Paper
1. Robust Visual Tracking Using Latent Subspace Projection Pursuit (Wei Jin, Risheng Liu*, Zhixun Su, Changcheng Zhang, Shanshan Bai, Dalian University of Technology)
2. An Expressive Deep Model for Parsing Human Action from a Single Image (Zhujin Liang, Xiaolong Wang, Rui Huang, and Liang Lin, Sun Yat-Sen University)
Best Paper: High Resolution Free-View Interpolation of Planar Structure
Problem: Synthesize the image of a planar structure as seen from a novel viewpoint Vf, given images taken from viewpoints V1~V10.
Image representation
Denote the n input images as
    y^(k) = {w^(k), g^(k)}, k = 1, …, n,
and the synthesized image as
    x = {v, h},
where v and w^(k) are homogeneous coordinates, and h and g^(k) are the corresponding intensities.
Then for each view we have the geometric mapping (a planar homography):
    w^(k) = H^(k) v
Also, we assume the photometric mapping:
    g^(k) = λα^(k) h + λβ^(k)
Thus, we can get an initial result x′ = {v, h} [1].
[1] L. Pickup, D. Capel, S. Roberts, and A. Zisserman, "Bayesian image super-resolution, continued," in Advances in Neural Information Processing Systems, pp. 1089–1096, 2006.
Proposed solution:
Objective function: the posterior p(x | y^(k), x′).
After initialization, novel-view optimization and parameter calculation are executed iteratively.
The objective function can be optimized using the bounded quasi-Newton method L-BFGS-B [2].
[2] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization," ACM Transactions on Mathematical Software, vol. 23, no. 4, pp. 550-560, 1997.
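The alternating scheme can be illustrated with SciPy's implementation of L-BFGS-B [2]. The following is a toy 1-D analogue, not the paper's actual model: each "view" is a photometrically transformed copy of an unknown signal, the homography step is omitted, and all data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

# Toy analogue of the alternating optimization: each observed view y_k is
#     y_k = lam_a[k] * x + lam_b[k] + noise,
# and we alternate between estimating the "novel view" x and the per-view
# photometric parameters, each step solved with bounded L-BFGS-B.

rng = np.random.default_rng(0)
x_true = np.array([0.2, 0.8, 0.5, 0.9])            # unknown novel-view intensities
lam_true = [(1.2, 0.1), (0.8, -0.05), (1.0, 0.2)]  # per-view photometric params
views = [a * x_true + b + rng.normal(0, 0.01, x_true.size) for a, b in lam_true]

def neg_log_post(x, lams):
    """Negative log-likelihood: sum of squared residuals over all views."""
    return sum(np.sum((a * x + b - y) ** 2) for (a, b), y in zip(lams, views))

x = np.full(4, 0.5)                     # initialization (the paper's x')
lams = [(1.0, 0.0)] * len(views)
for _ in range(5):                      # alternate novel-view / parameter steps
    x = minimize(lambda v: neg_log_post(v, lams), x,
                 method="L-BFGS-B", bounds=[(0.0, 1.0)] * 4).x
    lams = [tuple(minimize(lambda p: np.sum((p[0] * x + p[1] - y) ** 2),
                           lam, method="L-BFGS-B").x)
            for lam, y in zip(lams, views)]
```

After a few alternations the residual drops to the noise level. Note that this toy model has a global scale/offset ambiguity between x and the photometric parameters, so x itself is only recovered up to an affine transform.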
The Jigsaw puzzle dataset contains 24 images of size 400×300 of a completed jigsaw puzzle hung on a white wall. All 24 images are captured from varying angles and depths.
Fig. 7 presents views with the focal axis rotated along the vertical and horizontal axes, starting from perpendicular to the wall.
Results
The Street in Jerusalem sequence consists of 1800 frames, each of size 360×240. We extract frames 984 to 1005 for the experiments.
Best Student Paper Candidate:
Cost-Volume Filtering-Based Stereo Matching with Improved Matching Cost and Secondary Refinement
Problem: Approaches to stereo correspondence can be classified into global and local methods. Global methods usually achieve more accurate disparity maps at higher computational complexity, while local methods are more efficient.
Contribution: Improves local methods by proposing a cost-volume filtering-based local stereo matching method that employs a new combined matching cost and a novel secondary disparity refinement mechanism.
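The generic cost-volume filtering recipe this paper builds on can be sketched on a toy 1-D scanline. Plain absolute-difference cost and box-filter aggregation stand in for the paper's combined cost and guided-filter aggregation; the secondary refinement step is omitted.

```python
# Toy 1-D cost-volume stereo matcher: build a matching cost for every pixel
# and candidate disparity, aggregate (filter) costs over a local window,
# then pick the lowest-cost disparity per pixel (winner-take-all).
# Borders are unreliable in this sketch.

BIG = 255.0  # cost for out-of-range matches (finite, so aggregation stays sane)

def stereo_match_1d(left, right, max_disp, radius=1):
    n = len(left)
    # 1) cost volume: cost[d][i] = |left[i] - right[i - d]|
    cost = [[abs(left[i] - right[i - d]) if i - d >= 0 else BIG
             for i in range(n)] for d in range(max_disp + 1)]
    # 2) aggregation: mean cost over a window of +-radius pixels
    def box(c, i):
        w = c[max(0, i - radius):i + radius + 1]
        return sum(w) / len(w)
    agg = [[box(c, i) for i in range(n)] for c in cost]
    # 3) winner-take-all: lowest aggregated cost wins
    return [min(range(max_disp + 1), key=lambda d: agg[d][i]) for i in range(n)]

# Right view is the left view shifted by a true disparity of 2 pixels.
left = [10, 20, 30, 40, 50, 60, 70, 80]
right = left[2:] + [80, 80]            # pad the vacated border
disparities = stereo_match_1d(left, right, max_disp=3)
```

Away from the left border the estimated disparity is 2 everywhere, matching the true shift; a real local method replaces the box filter with an edge-preserving (e.g. guided) filter to keep disparity edges sharp.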
Results
3D Reconstruction
Active Key Frame Selection For 3d Model Reconstruction From Crowdsourced Geo-Tagged Videos
• Guanfeng Wang* (NUS), Ying Lu (USC), Luming Zhang (NUS), Abdullah Alfarrarjeh (USC), Roger Zimmermann (NUS), Seon Ho Kim (USC), Cyrus Shahabi (USC)
ICME 2015-2018
• Italy
• USA
• Hong Kong, China
• USA
• Thanks!
The ICME 2014 electronic proceedings are available.