
Advanced Data Analytics for Scene Understanding in Large Scale Multimedia Datasets

Project Member(s): Wu, Q.

Funding or Partner Organisation: National ICT Australia

Start year: 2016

Summary: This project targets advanced analysis of large multimedia data (text, image and video) for scene understanding. Scene understanding goes beyond classic object recognition, which focuses only on learning and recognising patterns of individual objects; it also explores the potential interactions among those objects. The project focuses on developing a holistic model that systematically integrates several learning tasks: low-level object representation, middle-level scene categorisation, saliency detection, and high-level semantic event sensing and inference. The coupled relations among these tasks are explored so that the tasks can be optimised jointly, seeking a joint optimum for scene understanding. Graph theory will be studied in depth as the fundamental building block for representing the scene. Nodes represent the isolated objects in the scene, including both typical object entities and abstract objects such as motion or action in a video sequence, and the links between nodes measure the interactive relationships between them. Graph cut is then explored to divide the original graph into several sub-graphs, each of which more explicitly represents one semantic component (i.e. an interaction between objects), so that together they best represent the scene. In this process, a context-driven approach is considered for measuring and predicting the importance of relationships between objects (we assume that context is scene dependent). In this way, instead of learning a fixed structure from the training dataset, the model learns the space of allowable structures in a scene and then predicts a structure for a test scene based on its global scene and local features. Unlike existing work, this structure is built on high-level interactive entities (i.e. interactions) rather than typical object entities, in order to describe a scene more precisely.
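
As a rough illustration of the graph representation described in the summary, the sketch below builds a small scene graph whose nodes are object entities and abstract action nodes, and whose link weights stand for interaction strength. It is not the project's method: in place of a true graph cut it uses a deliberately simplified stand-in that drops links below a threshold and takes the connected components as sub-graphs. The node names, weights, threshold and the function name `split_scene_graph` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical scene: nodes are object entities plus abstract action nodes.
# Weights are made-up interaction strengths in [0, 1], not project data.
EDGES = [
    ("person", "riding", 0.9),
    ("riding", "bicycle", 0.8),
    ("dog", "running", 0.7),
    ("person", "dog", 0.2),
    ("bicycle", "running", 0.1),
]


def split_scene_graph(edges, threshold):
    """Drop links weaker than `threshold`, then return connected components.

    This is a simplified stand-in for graph cut, used only for illustration.
    """
    adjacency = defaultdict(set)
    nodes = set()
    for a, b, weight in edges:
        nodes.update((a, b))
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    components, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collecting every node reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


subgraphs = split_scene_graph(EDGES, threshold=0.5)
for component in subgraphs:
    print(sorted(component))
```

With these made-up weights, the weak person-dog and bicycle-running links are dropped, so each remaining sub-graph groups the entities that take part in one interaction. A context-driven approach as described in the summary would predict the link weights per scene rather than fix them by hand.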


Zhang, Z, Wu, Q, Wang, Y & Chen, F 2018, 'Size-Invariant Attention Accuracy Metric for Image Captioning with High-Resolution Residual Attention', 2018 Digital Image Computing: Techniques and Applications (DICTA), IEEE, Canberra, Australia, pp. 1-8.

Edwards, D, Cheng, M, Wong, IA, Zhang, J & Wu, Q 2017, 'Ambassadors of knowledge sharing', International Journal of Contemporary Hospitality Management, vol. 29, no. 2, pp. 690-708.

Wang, Y, Zhang, J, Liu, Z, Wu, Q, Chou, PA, Zhang, Z & Jia, Y 2016, 'Handling Occlusion and Large Displacement Through Improved RGB-D Scene Flow Estimation', IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 7, pp. 1265-1278.

FOR Codes: Computer Vision, Pattern Recognition and Data Mining, Application Tools and System Utilities, Information Processing Services (incl. Data Entry and Capture), Data engineering and data science, Information systems, technologies and services not elsewhere classified