ICME Conference Summary
雷娟
2014 IEEE International Conference on Multimedia and Expo
Outline
1. Overview
2. Agenda
3. Awards
Outline
1. Overview
ICME is the premier forum for the presentation of the latest advances in multimedia technologies, systems, and applications from both academic and industrial perspectives.
It is sponsored by four IEEE societies: Signal Processing, Circuits and Systems, Computer, and Communications.
ICME 2014 is the 15th in the series that has been held annually since 2000.
Main program (including special sessions): 716 submissions, 212 accepted (29.6% acceptance rate); 72 oral papers and 26 special-session papers (13.6% acceptance rate for oral presentation).
Review Process: double-blind review process + author rebuttal + reviewer discussions
ICME 2014 has 14 associated workshops which received a total of 243 submissions with 168 papers being accepted. ICME 2014 also has a separate demo program consisting of 16 demos.
500+ Participants
General Chairs
Shipeng Li, Microsoft Research Asia, China
Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Houjun Wang, University of Electronic Science and Technology of China (UESTC), China
Jie Yang, National Science Foundation, USA
Program Chairs
Dong Xu, Nanyang Technological University, Singapore
Xuelong Li, Chinese Academy of Sciences, China
Zicheng Liu, Microsoft Research, USA
Eckehard Steinbach, Munich U of Technology, Germany
Chengcui Zhang, University of Alabama at Birmingham, USA
Shao-Yi Chien, National Taiwan University, Taiwan
Outline
2. Agenda
2.1 Tutorial (day 1)
2.2 Keynote (days 2, 3, 4)
2.3 Oral & Poster & Special Sessions (days 2, 3, 4)
2.4 Industry Forum & Grand Challenge (days 2, 3, 4)
2.5 Workshop (days 1 & 5)
2.1 Tutorials
1. Hashing Large-scale Data with Applications to Cross-media Analysis
Wei Liu (IBM), Fei Wu (Zhejiang University, China)
2. Deep Learning in Image and Video Understanding
Xiaogang Wang, Wanli Ouyang (CUHK)
3. Social and Geographic-Aware Multimedia Applications and Technologies
Jiebo Luo (U. Rochester), Tao Mei (MSRA), Roger Zimmermann, Yi Yu (NUS)
4. Learning-based Feature Extraction for Social Media Analysis
Xudong Jiang (NTU), Jiwen Lu (ADSC, Singapore), Weihong Deng (BUPT)
5. A Tutorial on Nonnegative Matrix Factorisation with Applications to Audiovisual Content Analysis
Slim Essid (Telecom ParisTech, France), Alexey Ozerov (Technicolor, France)
6. A Tutorial on Online Learning Methods for Multimedia Big Data Analytics
Steven C.H. Hoi (NTU)
2.2 Keynote talks
1. Multimedia Technologies for Multimodal Interaction and Immersive Telecommunications Zhengyou Zhang (Microsoft, USA)
2. Towards Online Visual Search Wen Gao (Peking University, China)
3. Behavioral Imaging and the Study of Autism James M. Rehg (Georgia Institute of Technology, USA)
Keynote-1
Multimedia Technologies for Multimodal Interaction and Immersive Telecommunications
Zhengyou Zhang , Microsoft, USA
Motivation:
• Natural human-computer interaction
• Immersive human-human telecommunications and collaboration.
Solution:
• Capturing and rendering 3D dynamic environments in order to create the illusion that the remote participants are in the same room.
• A number of projects involving multi-camera systems, RGBD sensors, microphone arrays, spatial audio, large electronic whiteboards, and mobile devices.
Avatar Kinect Virtual Environment
TeleConference
Viewport: A Fully Distributed Immersive Teleconferencing System
Remote Collaboration
ViiBoard: Vision-enhanced Immersive Interaction with Touch Board
Conclusion
Keynote-2
Wen Gao , Peking University, China
Towards Online Visual Search
Outline
1. Two Kinds of visual search
2. Remote Visual Search (IEEE Std 1857)
3. Mobile Visual Search (CDVS)
4. Summary
1. Two Kinds of visual search
1) Remote Visual Search: a remote camera captures image/video; the image/video is encoded and sent to the server site via a wideband network; the server decodes the image/video and performs the search.
2) Mobile Visual Search: a mobile phone captures image/video; a compact feature descriptor is extracted on the device and sent to the server via a wireless network; the server performs the search using the compact descriptor.
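The bandwidth trade-off between the two pipelines can be sketched as follows. This is an illustrative toy: `extract_descriptor` and `server_search` are hypothetical stand-ins, not real CDVS or IEEE 1857 APIs, and the 512-byte descriptor size is made up for the example.

```python
# Illustrative contrast between remote and mobile visual search.

def server_search(payload: bytes, kind: str) -> str:
    """Pretend server: report what kind of query arrived and how big it was."""
    return f"matched with {len(payload)}-byte {kind} query"

def extract_descriptor(image_bytes: bytes) -> bytes:
    """Stand-in for a compact-descriptor extractor (e.g. CDVS features)."""
    return image_bytes[:512]   # a real extractor computes features, not a prefix

def remote_visual_search(image_bytes: bytes) -> str:
    """Remote search: ship the whole compressed image over the network."""
    return server_search(image_bytes, kind="image")

def mobile_visual_search(image_bytes: bytes) -> str:
    """Mobile search: extract the descriptor on-device, ship only that."""
    return server_search(extract_descriptor(image_bytes), kind="descriptor")
```

For a 100 KB photo, the mobile pipeline uploads 512 bytes instead of 100,000, which is the point of transmitting compact descriptors over a wireless link.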
2. Remote Visual Search (IEEE Std 1857)
The state of the art is IEEE Std 1857 / IEEE P1857.4.
It combines visual data encoding and analysis in one standard: machine learning is used to model the background, objects are detected in the image or video, and the object and background are encoded in different layers of the data stream.
3. Mobile Visual Search (CDVS)
• The state of the art is CDVS (Compact Descriptors for Visual Search). A package of CDVS parameters is transmitted and used for search instead of the compressed image.
More information: http://imre.idm.pku.edu.cn/ (Institute of Digital Media, Peking University)
Summary
Keynote-3
James M. Rehg
Georgia Institute of Technology, USA
Behavioral Imaging and the Study of Autism
Outline
1. Rapid-ABC Protocol and MMDB dataset
2. Analysis of engagement
3. Detection of eye contact from wearable camera
4. Behavior retrieval from multi-camera classroom video
1. Rapid-ABC Protocol and MMDB dataset
• Rapid-ABC is a behavioral screening instrument for early detection of risk for Autism and related developmental disabilities based on a scripted sequence of interaction between a clinician and child.
• Multimodal Dyadic Behavior (MMDB) dataset: a unique collection of multimodal (video, audio, and physiological) recordings of the social and communicative behavior of toddlers.
Dataset available: www.cbi.gatech.edu/mmdb/
2. Analysis of engagement
1) Eye Contact Detection
2) Monitoring Problem Behaviors
3. Detection of eye contact from wearable camera
• Key Idea #1: Detect the child's face to interpret the examiner's point of gaze
• Key Idea #2: Detect the child's gaze direction relative to the camera (a proxy for the examiner)
Monitoring Classroom Behaviors
• Goal: Enable caregivers to quickly assess the frequency and duration of problem behaviors
• Challenges: Behaviors are often unique to the individual; no resources to support large-scale annotation
• Approach: Behavior retrieval from a multimedia repository
Conclusions
• Children's social behaviors are a challenging and novel topic for the multimedia community
-- MMDB dataset of adult-child interactions
-- Identification and treatment of developmental disorders
• Wearable cameras are a promising approach to behavior measurement
• Help us create the science of Behavioral Imaging!
2.3 Oral & Poster Sessions (day 2)
• 1. Image Recognition and Image Retrieval
• 2. High Efficiency Video Coding
• 3. Image and Video Coding
• 4. Human Computer Interaction and Graphics
• 5. Image Filtering, Deblurring and Superresolution
• 6. Visual Tracking
• 7. Compressed Sensing, Low Rank, and Deep Learning
• 8. Social Multimedia and Cloud
2.3 Oral & Poster Sessions (day 3)
• 9. Multimedia Security and Forensics, and Face Recognition
• 10. 3D and Augmented Reality
• 11. Video Analysis, Event Recognition, and Segmentation
• 12. Image Processing and Quality Assessment
2.3 Oral & Poster Sessions (day 4)
2.3 Special Sessions
• 1. Cross-media Computing
• 2. Visual Saliency: Emerging Models and Applications in Multimedia Processing
• 3. Geo-Social Media Mining, Analysis, Recommendation and Retrieval
• 4. Neuroimaging-guided Multimedia Analysis
• 5. Human Action and Activity Understanding from Rich Media and Sensors
2.4 Industry Forum & Panel
Topic: Big Data and Deep Learning
Chairs: Jian Lu, Yinglong Xia
Speakers: Yunwen Chen (Shanda Literature, China), Ching-Yung Lin (IBM, USA), Qian Lin (HP, USA), Gokhan Tur (Microsoft, USA), Kai Yu (Baidu, China)
Topic: Mobile Multimedia: Challenges and Opportunities
Chairs: Jian Lu, Yinglong Xia
Panelists: Sanjeev Mehrotra (Microsoft, USA), Tao Mei (Microsoft, China), Yimin Zhang (Intel, China), Aidong Zhang (Huawei, China), Hanning Zhou (Zhigu, China)
2.4 Grand Challenges
• Microsoft: MSR-Bing Image Retrieval Grand Challenge
Winner: Cross-Media Relevance Mining for Evaluating Text-Based Image Search Engine
Zhongwen Xu, Yi Yang, Ashraf Kassim, Shuicheng Yan (The University of Queensland & National University of Singapore)
• Huawei: Accurate and Fast Mobile Video Annotation Challenge
Winner: Fusing Multimodal Features With Deep Neural Networks for Mobile Video Annotation
Jian Tu, Zuxuan Wu, Qi Dai, Yu-Gang Jiang, Xiangyang Xue (Fudan University, China)
2.5 Workshop
1. Visualization of Heterogeneous Multimedia Content
2. Cloud Gaming Systems and Networks
3. Multimedia Big Data Computing
4. Cross-media Analysis from Social Multimedia
5. Frontier of Crowdsourcing for Multimedia Computing
6. Multimedia Services and Technologies for E-Health
7. Emerging Multimedia Systems and Applications
8. Management Information Systems in Multimedia Art, Education, Entertainment, and Culture
9. Human Identification in Multimedia
10. Ambient Multimedia and Sensory Environment
11. Hot Topics in 3D Multimedia
12. Mobile Multimedia Computing
13. Multimedia Affective Computing
14. Audio and Video Coding Standardization
Hot3D workshop:
5th IEEE International Workshop on Hot Topics in 3D (Hot3D)
Keynote speech: Towards Multidimensional & Multiscale Visual Computing Prof Qionghai Dai, Tsinghua University, China (Yebin Liu)
http://media.au.tsinghua.edu.cn/liuyebin.jsp
• 1. A Multi-camera and Multi-Lighting Dome for 3D Reconstruction and Relighting
20 PointGrey Flea2 cameras spaced on a ring. The camera resolution is 1024 by 768 and the capture rate is 25fps. The models are constructed based on the 20 view images using point cloud based multi-view stereo (PCMVS). Textures are mapped on the models using view-independent rendering.
• 2. Video-based Hand Manipulation Capture Through Composite Motion Control
A motion capture method for acquiring physically realistic hand grasping and manipulation data from multiple video streams. The key idea is to introduce a composite motion control to simultaneously model hand articulation, object movement, and subtle interaction between the hand and object.
3. Awards
Best Paper
1. Find You from Your Friends: Graph-Based Residence Location Prediction for Users in Social Media (Dan Xu*, Peng Cui, Shiqiang Yang, Tsinghua University)
2. High Resolution Free-View Interpolation of Planar Structure (Jie Hu*, Dongqing Zhang, Heather Yu, Chang Wen Chen, University at Buffalo, USA & Futurewei Technologies Inc.)
Best Student Paper
1. Robust Visual Tracking Using Latent Subspace Projection Pursuit (Wei Jin, Risheng Liu*, Zhixun Su, Changcheng Zhang, Shanshan Bai, Dalian University of Technology)
2. An Expressive Deep Model for Parsing Human Action from a Single Image (Zhujin Liang, Xiaolong Wang, Rui Huang, and Liang Lin, Sun Yat-Sen University)
Best Paper: High Resolution Free-View Interpolation of Planar Structure
Problem: Synthesize the image of a planar structure as seen from a novel viewpoint Vf, given images taken from viewpoints V1~V10.
Image representation
Denote the n input images as
    y^(k) = {w^(k), g^(k)}, k = 1, …, n,
and the synthesized image as
    x = {v, h},
where v and w^(k) are homogeneous coordinates, and h and g^(k) are the corresponding intensities.
Then for each view we have the geometric mapping (a planar homography):
    w^(k) = H^(k) v
Also, we assume the photometric mapping:
    g^(k) = λα^(k) h + λβ^(k)
Thus, we can get an initial result x′ = {v, h} [1].
[1] L. Pickup, D. Capel, S. Roberts, and A. Zisserman, "Bayesian image super-resolution, continued," in Advances in Neural Information Processing Systems, pp. 1089–1096, 2006.
Proposed solution:
Objective function: the posterior p(x | y^(k), x′).
After initialization, novel-view optimization and parameter calculation are executed iteratively.
The objective function can be optimized using the bounded quasi-Newton method L-BFGS-B [2].
[2] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization," ACM Transactions on Mathematical Software, vol. 23, no. 4, pp. 550-560, 1997.
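The alternating scheme can be illustrated with SciPy's implementation of L-BFGS-B [2]. The following is a toy 1-D analogue, not the paper's actual model: each "view" is a photometrically transformed copy of an unknown signal, the homography step is omitted, and all data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

# Toy analogue of the alternating optimization: each observed view y_k is
#     y_k = lam_a[k] * x + lam_b[k] + noise,
# and we alternate between estimating the "novel view" x and the per-view
# photometric parameters, each step solved with bounded L-BFGS-B.

rng = np.random.default_rng(0)
x_true = np.array([0.2, 0.8, 0.5, 0.9])            # unknown novel-view intensities
lam_true = [(1.2, 0.1), (0.8, -0.05), (1.0, 0.2)]  # per-view photometric params
views = [a * x_true + b + rng.normal(0, 0.01, x_true.size) for a, b in lam_true]

def neg_log_post(x, lams):
    """Negative log-likelihood: sum of squared residuals over all views."""
    return sum(np.sum((a * x + b - y) ** 2) for (a, b), y in zip(lams, views))

x = np.full(4, 0.5)                     # initialization (the paper's x')
lams = [(1.0, 0.0)] * len(views)
for _ in range(5):                      # alternate novel-view / parameter steps
    x = minimize(lambda v: neg_log_post(v, lams), x,
                 method="L-BFGS-B", bounds=[(0.0, 1.0)] * 4).x
    lams = [tuple(minimize(lambda p: np.sum((p[0] * x + p[1] - y) ** 2),
                           lam, method="L-BFGS-B").x)
            for lam, y in zip(lams, views)]
```

After a few alternations the residual drops to the noise level. Note that this toy model has a global scale/offset ambiguity between x and the photometric parameters, so x itself is only recovered up to an affine transform.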
The Jigsaw puzzle dataset contains 24 images of size 400×300 of a completed jigsaw puzzle hung on a white wall. All 24 images are captured from varying angles and depths.
Fig. 7 presents views with the focal axis rotated along the vertical and horizontal axes, starting from perpendicular to the wall.
Results
The Street in Jerusalem sequence consists of 1800 frames, each of size 360×240. We extract frames 984 to 1005 for the experiments.
Best Student Paper Candidate:
Cost-Volume Filtering-Based Stereo Matching with Improved Matching Cost and Secondary Refinement
Problem: Approaches to stereo correspondence can be classified into global and local methods. Global methods usually achieve more accurate disparity maps at higher computational complexity, while local methods are more efficient.
Contribution: Improves local methods by proposing a cost-volume filtering-based local stereo matching method that employs a new combined matching cost and a novel secondary disparity refinement mechanism.
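The generic cost-volume filtering recipe this paper builds on can be sketched on a toy 1-D scanline. Plain absolute-difference cost and box-filter aggregation stand in for the paper's combined cost and guided-filter aggregation; the secondary refinement step is omitted.

```python
# Toy 1-D cost-volume stereo matcher: build a matching cost for every pixel
# and candidate disparity, aggregate (filter) costs over a local window,
# then pick the lowest-cost disparity per pixel (winner-take-all).
# Borders are unreliable in this sketch.

BIG = 255.0  # cost for out-of-range matches (finite, so aggregation stays sane)

def stereo_match_1d(left, right, max_disp, radius=1):
    n = len(left)
    # 1) cost volume: cost[d][i] = |left[i] - right[i - d]|
    cost = [[abs(left[i] - right[i - d]) if i - d >= 0 else BIG
             for i in range(n)] for d in range(max_disp + 1)]
    # 2) aggregation: mean cost over a window of +-radius pixels
    def box(c, i):
        w = c[max(0, i - radius):i + radius + 1]
        return sum(w) / len(w)
    agg = [[box(c, i) for i in range(n)] for c in cost]
    # 3) winner-take-all: lowest aggregated cost wins
    return [min(range(max_disp + 1), key=lambda d: agg[d][i]) for i in range(n)]

# Right view is the left view shifted by a true disparity of 2 pixels.
left = [10, 20, 30, 40, 50, 60, 70, 80]
right = left[2:] + [80, 80]            # pad the vacated border
disparities = stereo_match_1d(left, right, max_disp=3)
```

Away from the left border the estimated disparity is 2 everywhere, matching the true shift; a real local method replaces the box filter with an edge-preserving (e.g. guided) filter to keep disparity edges sharp.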
Results
3D Reconstruction
Active Key Frame Selection For 3d Model Reconstruction From Crowdsourced Geo-Tagged Videos
• Guanfeng Wang* (NUS), Ying Lu (USC), Luming Zhang (NUS), Abdullah Alfarrarjeh (USC), Roger Zimmermann (NUS), Seon Ho Kim (USC), Cyrus Shahabi (USC)
ICME 2015-2018
• Italy
• USA
• Hong Kong, China
• USA
• Thanks!
The ICME 2014 electronic proceedings are available.