Continuous Deployment of Data Analytics Applications

Funding or Partner Organisation: National ICT Australia

Start year: 2015

Summary: More data are being collected and made available by organizations and governments, and there is a growing need to explore data easily and to develop and deploy data analytics applications quickly. Many existing efforts focus on the big data processing framework itself or on wrangling existing data. However, two key pieces of the end-to-end picture are still missing. The first is the continuous ingestion of data from sources such as sensors, computerized visual images and other IoT (Internet of Things) devices, together with its continuous and rapid wrangling. The second is support for data scientists to explore subsets of data locally, to have successful exploration actions automatically transformed into large-scale deployments for both testing and production environments, and to continuously serve the learned insights (e.g. learned models) to other systems that require them, sometimes under real-time requirements. This project aims to fill these gaps by focusing on the continuous nature of data ingestion, data exploration and deployment: from the source, to the local exploration environment, then to the cluster environment, and finally to the systems that use the learned insights or models. We will propose novel approaches for continuous ingestion, and automated ways of transforming local data wrangling and manipulation for a larger cluster environment with enhanced remote debugging.

Keywords: Big Data, Data Analytics, Data Science

FOR Codes: Internet Hosting Services (incl. Application Hosting Services); Pattern Recognition and Data Mining; Information Processing Services (incl. Data Entry and Capture); Data models, storage and indexing; Information systems, technologies and services not elsewhere classified