Anderson, C & Ryan, L 2017, 'A Comparison of Spatio-Temporal Disease Mapping Approaches Including an Application to Ischaemic Heart Disease in New South Wales, Australia', International Journal of Environmental Research and Public Health, vol. 14, no. 2, pp. 146-146.
© 2017 by the authors; licensee MDPI, Basel, Switzerland. The field of spatio-temporal modelling has witnessed a recent surge as a result of developments in computational power and increased data collection. These developments allow analysts to model the evolution of health outcomes in both space and time simultaneously. This paper models the trends in ischaemic heart disease (IHD) in New South Wales, Australia over an eight-year period between 2006 and 2013. A number of spatio-temporal models are considered, and we propose a novel method for determining the goodness-of-fit for these models by outlining a spatio-temporal extension of the Moran’s I statistic. We identify an overall decrease in the rates of IHD, but note that the extent of this health improvement varies across the state. In particular, we identify a number of remote areas in the north and west of the state where the risk stayed constant or even increased slightly.
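As background (a standard definition, not taken from the abstract), the classical cross-sectional Moran's I statistic that the paper extends to space-time is:

```latex
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \;
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}
         {\sum_{i=1}^{n} (x_i-\bar{x})^2}
```

where $w_{ij}$ is the spatial weight between areas $i$ and $j$ and $\bar{x}$ is the mean rate; the paper's spatio-temporal extension is its own contribution and is not reproduced here.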
Avilés-Ochoa, E, Perez-Arellano, LA, León-Castro, E & Merigó, JM 2017, 'Prioritized induced probabilistic distances in transparency and access to information laws', Fuzzy Economic Review, vol. 22, no. 1, pp. 45-55.
© 2016 Int. Association for Fuzzy-Set Management and Economy. All rights reserved. In this paper, a new extension of the ordered weighted average (OWA) operator is developed using four different methods: prioritized operators, induced operators, probabilistic operators and distance techniques. This new operator is called the prioritized induced probabilistic ordered weighted average distance (PIPOWAD) operator. Its primary advantage is that it combines, in a single formulation, the different characteristics and information provided by a group of decision makers to compare actual and ideal situations. Finally, an example of transparency and access to information law in Mexico is presented to forecast the score based on the expectations of decision makers.
Awwad, S & Piccardi, M 2017, 'Prototype-based budget maintenance for tracking in depth videos', Multimedia Tools and Applications, vol. 76, no. 20, pp. 21117-21132.
© 2016 Springer Science+Business Media New York. The use of conventional video tracking based on color or gray-level videos often raises concerns about the privacy of the tracked targets. To alleviate this issue, this paper presents a novel tracker that operates solely from depth data. The proposed tracker is designed as an extension of the popular Struck algorithm, which leverages the effective framework of structural SVM. The main contributions of our paper are: i) a dedicated depth feature based on local depth patterns, ii) a heuristic for handling view occlusions in depth frames, and iii) a technique for keeping the number of support vectors within a given “budget” so as to limit computational costs. Experimental results over the challenging Princeton Tracking Benchmark (PTB) dataset show remarkable accuracy compared to the original Struck tracker and other state-of-the-art trackers using depth and RGB data.
Bakirov, R, Gabrys, B & Fay, D 2017, 'Multiple adaptive mechanisms for data-driven soft sensors', Computers & Chemical Engineering, vol. 96, pp. 42-54.
Recent data-driven soft sensors often use multiple adaptive mechanisms to cope with non-stationary environments. These mechanisms are usually deployed in a fixed, prescribed order. In this work we use real-world data from the process industry to compare deploying adaptive mechanisms in a fixed manner with deploying them in a flexible way, which results in varying adaptation sequences. We demonstrate that flexible deployment of the available adaptive methods, coupled with techniques such as cross-validatory selection and retrospective model correction, can improve predictive accuracy over time. As a vehicle for this study, we use a soft sensor for batch processes based on an adaptive ensemble method which employs several adaptive mechanisms to react to changes in the data.
Blanco-Mesa, F & Merigó, JM 2017, 'Bonferroni distances with hybrid weighted distance and immediate weighted distance', Fuzzy Economic Review, vol. 22, no. 2, pp. 2274-2274.
© 2017 Int. Association for Fuzzy-Set Management and Economy. All rights reserved. The aim of the paper is to develop new aggregation operators using Bonferroni means, ordered weighted averaging (OWA) operators and distance measures. We introduce the Bonferroni hybrid weighted distance (BON-HWD) and Bonferroni distances with OWA operators and weighted averages (BON-IWOWAD). The main advantage of these operators is that they allow different aggregation contexts and multiple comparisons between each argument and distance measure to be considered in the same formulation. We develop a mathematical application to show the versatility of the new models. This new family of distances can be used in a wide range of management and economic fields.
Blanco-Mesa, F, Merigó, JM & Gil-Lafuente, AM 2017, 'Fuzzy decision making: A bibliometric-based review', Journal of Intelligent & Fuzzy Systems, vol. 32, no. 3, pp. 2033-2050.
© 2017 IOS Press and the authors. All rights reserved. Fuzzy decision making consists of making decisions in complex and uncertain environments where the information can be assessed with fuzzy sets and systems. The aim of this study is to review the main contributions in this field using a bibliometric approach. To do so, the article uses a wide range of bibliometric indicators, including citations and the h-index. It also uses the VOSviewer software to map the main trends in the area. The work considers the leading journals, articles, authors and institutions. The results indicate that the USA has been the traditional leader in this field, with the most significant researchers. In recent years, however, the field has received increasing attention from Asian authors, who are starting to lead it. The discipline has strong potential and the expectation is that it will continue to grow.
Cancino, C, Merigo, JM, Coronado, F, Dessouky, Y & Dessouky, M 2017, 'Forty years of computers and industrial engineering: A bibliometric analysis', Proceedings of International Conference on Computers and Industrial Engineering, CIE, vol. 0, pp. 614-629.
Computers & Industrial Engineering is a leading international journal in the field of industrial engineering that published its first issue in 1976. In 2016, the journal celebrated its 40th anniversary. Motivated by this event, the aim of this study is to develop a bibliometric overview of the journal's publications between 1976 and 2015. The objective is to identify the leading trends occurring in the journal in terms of productivity and influence of topics, authors, universities and countries. To do so, the work uses the Web of Science Core Collection database to analyse the bibliometric data. The results show the strong position of the USA in the journal, although China and other Asian countries are becoming very significant.
Cancino, C, Merigó, JM, Coronado, F, Dessouky, Y & Dessouky, M 2017, 'Forty years of Computers & Industrial Engineering: A bibliometric analysis', Computers & Industrial Engineering, vol. 113, pp. 614-629.
Computers & Industrial Engineering is a leading international journal in the field of industrial engineering that published its first issue in 1976. In 2016, the journal celebrated its 40th anniversary. Motivated by this event, the aim of this study is to develop a bibliometric overview of the journal's publications between 1976 and 2015. The objective is to identify the leading trends occurring in the journal in terms of productivity and influence of topics, authors, universities and countries. To do so, the work uses the Web of Science Core Collection database to analyse the bibliometric data. The results show the strong position of the USA in the journal, although China and other Asian countries are becoming very significant.
Cancino, CA, Merigo, JM & Coronado, FC 2017, 'Big Names in Innovation Research: A Bibliometric Overview', Current Science, vol. 113, no. 8, pp. 1507-1507.
Over the last few years, an increasing number of scientific studies related to innovation research have been carried out. The present study analyses innovation research developed between 1989 and 2013. It uses the Web of Science database and provides several author-level bibliometric indicators, including the total number of publications and citations and the h-index. The results indicate that the most influential professors over the last 25 years, according to their h-index, are David Audretsch, Michael Hitt, Shaker Zahra, Rajshree Agarwal, Eric Von Hippel, David Teece, Will Mitchell and Robert Cooper. These authors are not necessarily the most productive, with the highest number of publications; they are, however, the most influential, with the highest number of citations. The incorporation of a larger number of journals into the Web of Science has given more authors an outlet for publishing their work on innovation research.
Cancino, CA, Merigó, JM & Coronado, FC 2017, 'A bibliometric analysis of leading universities in innovation research', Journal of Innovation & Knowledge, vol. 2, no. 3, pp. 106-124.
© 2017 Journal of Innovation & Knowledge. The number of innovation studies with a management perspective has grown considerably over the last 25 years. This study identified the universities that are most productive and influential in innovation research. The leading innovation research journals were also studied individually to identify the most productive universities for each journal. Data from the Web of Science were analyzed. Studies that were published between 1989 and 2013 were filtered first by the keyword “innovation” and second by 18 management-related research areas. The results indicate that US universities are the most productive and influential because they account for the most publications with a high number of citations and high h-index. Following advances in the productivity of numerous European journals, however, universities from the UK and the Netherlands are the most involved in publishing in journals that specialize in innovation research.
Chandrakanthan, V, Kang, YC, Knezevic, K, Qiao, Q, Oliver, RA, Unnikrishnan, A, Beck, D, Lee, B, Brownlee, C, Power, C & Pimanda, JE 2017, 'Genetic Fate Mapping of Mesenchymal Stem-Like Cells in the Aorta-Gonad Mesonephros (AGM) and Their Contribution to Definitive Hematopoiesis', Mechanisms of Development, vol. 145, pp. S55-S56.
Chen, Q, Lan, C, Chen, B, Wang, L, Li, J & Zhang, C 2017, 'Exploring Consensus RNA Substructural Patterns Using Subgraph Mining', IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 14, no. 5, pp. 1134-1146.
Chen, Q, Wang, Y, Chen, B, Zhang, C, Wang, L & Li, J 2017, 'Using propensity scores to predict the kinases of unannotated phosphopeptides', Knowledge-Based Systems, vol. 135, pp. 60-76.
© 2017. Protein phosphorylation is the process of binding a protein kinase to a specific site in a protein substrate for post-translational modification. Thousands of distinct phosphorylation sites have been identified, but most of them are not annotated with any kinase information. This work proposes a novel kinase-subgrouping propensity method (kiSP) to predict the binding kinases for phosphopeptides. Existing methods do not distinguish the residue conservation properties of the kinase family subgroups for annotation. Our method exploits maximum entropy variance to prune non-conserved sites from the subset of phosphopeptides that bind to the same kinase family. We also use maximal mutual information to estimate an appropriate upstream-downstream window size for this subset. A propensity score for every kinase family is calculated from its positive and negative data, which indicates its effectiveness as a site for each test phosphopeptide. Experimental results demonstrate that our method outperforms current algorithms in specificity and sensitivity under cross-validation. kiSP is also demonstrated to correctly predict kinase families for phosphopeptides with unknown kinase information.
Chen, Y, Yue, X, Xu, RYD & Fujita, H 2017, 'Region scalable active contour model with global constraint', Knowledge-Based Systems, vol. 120, pp. 57-73.
© 2016. Existing active contour methods suffer from initialization sensitivity, slow convergence, and insufficient robustness to image noise and inhomogeneity. To address these problems, this paper proposes a region scalable active contour model with a global constraint (RSGC). The energy function is formulated by incorporating local and global constraints. The local constraint is a region scalable fitting term that draws on local region information at controllable scales. The global constraint is constructed by estimating the global intensity distribution of the image content. Specifically, the global intensity distribution is approximated with a Gaussian mixture model (GMM) and estimated by the expectation maximization (EM) algorithm as a prior. The segmentation process is implemented by optimizing the improved energy function. Compared with two other representative models, i.e. the region-scalable fitting model (RSF) and the active contour model without edges (CV), the proposed RSGC model achieves more efficient, stable and precise results on most test images under the joint action of local and global constraints.
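For reference (standard background, not from the abstract, and assuming scalar pixel intensities), the Gaussian mixture model used for the global intensity prior has the form:

```latex
p(x) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right),
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1,
```

with the mixing weights $\pi_k$, means $\mu_k$ and variances $\sigma_k^2$ estimated by the EM algorithm, which alternates an E-step (computing posterior responsibilities of each component for each pixel) and an M-step (responsibility-weighted parameter updates).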
Cheng, H, Zhang, J, Wu, Q, An, P & Liu, Z 2017, 'Stereoscopic visual saliency prediction based on stereo contrast and stereo focus', EURASIP Journal on Image and Video Processing, vol. 2017, no. 1.
© 2017, The Author(s). In this paper, we exploit two characteristics of stereoscopic vision: the pop-out effect and the comfort zone. We propose a visual saliency prediction model for stereoscopic images based on stereo contrast and stereo focus models. The stereo contrast model measures stereo saliency based on the color/depth contrast and the pop-out effect. The stereo focus model describes the degree of focus based on monocular focus and the comfort zone. After obtaining the values of the stereo contrast and stereo focus models in parallel, an enhancement based on clustering is performed on both values. We then apply a multi-scale fusion to form the respective maps of the two models. Last, we use a Bayesian integration scheme to integrate the two maps (the stereo contrast and stereo focus maps) into the stereo saliency map. Experimental results on two eye-tracking databases show that our proposed method outperforms the state-of-the-art saliency models.
Deng, S, Huang, L, Xu, G, Wu, X & Wu, Z 2017, 'On Deep Learning for Trust-Aware Recommendations in Social Networks', IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 5, pp. 1164-1177.
© 2016 IEEE. With the emergence of online social networks, social network-based recommendation approaches have become popular. The major benefit of this approach is its ability to deal with cold-start users. In addition to social networks, user trust information also plays an important role in obtaining reliable recommendations. Although matrix factorization (MF) has become dominant in recommender systems, the recommendation largely relies on the initialization of the user and item latent feature vectors. To address these challenges, we develop a novel trust-based approach for recommendation in social networks. In particular, we leverage deep learning to determine the initialization in MF for trust-aware social recommendations and to differentiate the community effect in users' trusted friendships. A two-phase recommendation process is proposed to utilize deep learning for initialization and to synthesize users' interests and their trusted friends' interests, together with the impact of the community effect, for recommendations. We perform extensive experiments on real-world social network data to demonstrate the accuracy and effectiveness of our proposed approach in comparison with other state-of-the-art methods.
Edwards, D, Cheng, M, Wong, IA, Zhang, J & Wu, Q 2017, 'Ambassadors of knowledge sharing', International Journal of Contemporary Hospitality Management, vol. 29, no. 2, pp. 690-708.
Purpose: The aim of this study is to understand the knowledge-sharing structure and co-production of trip-related knowledge through online travel forums.
Design/methodology/approach: The travel forum threads were collected from TripAdvisor's Sydney travel forum for the period from 2010 to 2014, which contains 115,847 threads from 8,346 conversations. The data analytical technique was based on a novel methodological approach, visual analytics, including semantic pattern generation and network analysis.
Findings: Findings indicate that the knowledge structure is created by community residents who camouflage as local experts and serve as ambassadors of a destination. The knowledge structure presents collective intelligence co-produced by community residents and tourists. Further findings reveal how these community residents associate with each other and form a knowledge repertoire with information covering various travel domain areas.
Practical implications: The study offers valuable insights to help destination-management organizations and tour operators identify existing and emerging tourism issues to achieve a competitive destination advantage.
Originality/value: This study highlights the process of social-media-mediated travel knowledge co-production. It also discovers how community residents engage in reaching out to tourists by camouflaging as ordinary users.
Fan, X, Xu, RYD, Cao, L & Song, Y 2017, 'Learning Nonparametric Relational Models by Conjugately Incorporating Node Information in a Network', IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 589-599.
© 2013 IEEE. Relational model learning is useful for numerous practical applications. Many algorithms have been proposed in recent years to tackle this important yet challenging problem. Existing algorithms utilize only binary directional link data to recover hidden network structures, yet far richer and more meaningful information exists in other parts of a network which one can (and should) exploit. The attributes associated with each node, for instance, contain crucial information to help practitioners understand the underlying relationships in a network. For this reason, in this paper, we propose two models and their solutions, namely the node-information involved mixed-membership model and the node-information involved latent-feature model, in an effort to systematically incorporate additional node information. To achieve this aim effectively, node information is used to generate individual sticks of a stick-breaking process. In this way, not only can we avoid the need to prespecify the number of communities beforehand, but the algorithm also encourages nodes exhibiting similar information to have a higher chance of being assigned the same community membership. Substantial efforts have been made toward achieving the appropriateness and efficiency of these models, including the use of conjugate priors. We evaluate our framework and its inference algorithms using real-world data sets, which show the generality and effectiveness of our models in capturing implicit network structures.
Ge, XJ, Livesey, P, Wang, J, Huang, S, He, X & Zhang, C 2017, 'Deconstruction waste management through 3D reconstruction and BIM: a case study', Visualization in Engineering, vol. 5, no. 1.
The construction industry is responsible for 50% of the solid waste generated worldwide. Governments around the world formulate legislation and regulations concerning recycling and re-using building materials, aiming to reduce waste and environmental impact. Researchers have also been developing strategies and models of waste management for the construction and demolition of buildings. The application of Building Information Modeling (BIM) is an example of this. BIM is an emerging technology commonly used to maximize the efficiency of design, construction and maintenance throughout the entire lifecycle. However, BIM is not commonly used for deconstruction or demolition; in particular, the fixtures and fittings of buildings are not considered in BIM models. The development of BIM is based on two-dimensional drawings or sketches, which may not be accurately converted to 3D BIM models. In addition, previous research has mainly focused on construction waste management; there are few studies of deconstruction waste management focusing on demolition. To fill this gap, this paper aims to develop a framework using a reconstructed 3D model with BIM, for the purpose of improving BIM accuracy and thus developing a deconstruction waste management system to improve demolition efficiency, effective recycling and cost savings. In particular, the developed as-built BIM will be used to identify and measure recyclable materials, as well as to develop a plan for the recycling process.
Gheisari, S, Charlton, A, Catchpoole, DR & Kennedy, PJ 2017, 'Computers can classify neuroblastic tumours from histopathological images using machine learning', Pathology, vol. 49, pp. S72-S73.
Ghosh, S, Li, J, Cao, L & Ramamohanarao, K 2017, 'Septic shock prediction for ICU patients via coupled HMM walking on sequential contrast patterns', Journal of Biomedical Informatics, vol. 66, pp. 19-31.
© 2016.
Background and objective: Critical care patient events like sepsis or septic shock in intensive care units (ICUs) are dangerous complications which can cause multiple organ failure and eventual death. Preventive prediction of such events will allow clinicians to stage effective interventions to avert these critical complications.
Methods: It is widely understood that physiological variables such as blood pressure and heart rate undergo gradual changes over a certain period of time prior to the occurrence of a septic shock. This work investigates the performance of a novel machine learning approach for the early prediction of septic shock. The approach combines highly informative sequential patterns extracted from multiple physiological variables and captures the interactions among these patterns via coupled hidden Markov models (CHMM). In particular, the patterns are extracted from three non-invasive waveform measurements: the mean arterial pressure levels, heart rates and respiratory rates of septic shock patients from a large clinical ICU dataset called MIMIC-II.
Evaluation and results: For baseline estimation, SVM and HMM models on the continuous time series data for the given patients, using MAP (mean arterial pressure), HR (heart rate) and RR (respiratory rate), are employed. A single-channel patterns based HMM (SCP-HMM) and a multi-channel patterns based coupled HMM (MCP-HMM) are compared against the baseline models using 5-fold cross-validation accuracies over multiple rounds. In particular, the results of MCP-HMM are statistically significant, with a p-value of 0.0014, in comparison to the baseline models. Our experiments demonstrate strong competitive accuracy in the prediction of septic shock, especially when the interactions between the multiple variables are coupled by the learning model.
Conclusions: It can be concluded that the novelty of the approach stems from the integration of sequence-based phy...
Gill, AQ, Braytee, A & Hussain, FK 2017, 'Adaptive service e-contract information management reference architecture', VINE Journal of Information and Knowledge Management Systems, vol. 47, no. 3, pp. 395-410.
Purpose: The aim of this paper is to report on the adaptive e-contract information management reference architecture using the systematic literature review (SLR) method. Enterprises need to effectively design and implement complex adaptive e-contract information management architecture to support dynamic service interactions or transactions.
Design/methodology/approach: The SLR method is three-fold and was adopted as follows. First, a customized literature search with relevant selection criteria was developed, which was then applied to initially identify a set of 1,573 papers. Second, 55 of the 1,573 papers were selected for review based on an initial review of each identified paper's title and abstract. Finally, based on the second review, 24 papers relevant to this research were selected and reviewed in detail.
Findings: This detailed review resulted in the adaptive e-contract information management reference architecture elements, including structure, life cycle and supporting technology.
Research limitations/implications: The reference architecture elements could serve as a taxonomy for researchers and practitioners to develop context-specific service e-contract information management architecture to support dynamic service interactions for value co-creation. The results are limited to the number of selected databases and papers reviewed in this study.
Originality/value: This paper offers a review of the body of knowledge and novel e-contract information management reference architecture, ...
Goodswen, SJ, Kennedy, PJ & Ellis, JT 2017, 'On the application of reverse vaccinology to parasitic diseases: a perspective on feature selection and ranking of vaccine candidates', International Journal for Parasitology, vol. 47, no. 12, pp. 779-790.
Reverse vaccinology has the potential to rapidly advance vaccine development against parasites, but it is unclear which features studied in silico will advance vaccine development. Here we consider Neospora caninum which is a globally distributed protozoan parasite causing significant economic and reproductive loss to cattle industries worldwide. The aim of this study was to use a reverse vaccinology approach to compile a worthy vaccine candidate list for N. caninum, including proteins containing pathogen-associated molecular patterns to act as vaccine carriers. The in silico approach essentially involved collecting a wide range of gene and protein features from public databases or computationally predicting those for every known Neospora protein. This data collection was then analysed using an automated high-throughput process to identify candidates. The final vaccine list compiled was judged to be the optimum within the constraints of available data, current knowledge, and existing bioinformatics programs. We consider and provide some suggestions and experience on how ranking of vaccine candidate lists can be performed. This study is therefore important in that it provides a valuable resource for establishing new directions in vaccine research against neosporosis and other parasitic diseases of economic and medical importance.
Guo, D, Xu, J, Zhang, J, Xu, M, Cui, Y & He, X 2017, 'User relationship strength modeling for friend recommendation on Instagram', Neurocomputing, vol. 239, pp. 9-18.
© 2017 Elsevier B.V. Social strength modeling in the social media community has attracted increasing research interest. Unlike Flickr, which has been explored by many researchers, Instagram is more popular with mobile users and is oriented toward likes and comments, yet it has seldom been investigated. On Instagram, a user can post photos/videos, follow other users, and comment on and like other users' posts. These actions generate diverse forms of data that result in multiple user relationship views. In this paper, we propose a new framework to discover the underlying social relationship strength. User relationship learning under multiple views and relationship strength modeling are coupled into one framework. In addition, given the learned relationship strength, a coarse-to-fine method is proposed for friend recommendation. Experiments on friend recommendation for Instagram are presented to show the effectiveness and efficiency of the proposed framework. As exhibited by our experimental results, it achieves better performance than other related methods. Although our method has been proposed for Instagram, it can easily be extended to other social media communities.
Hasan, MAM, Li, J, Ahmad, S & Molla, MKI 2017, 'predCar-site: Carbonylation sites prediction in proteins using support vector machine with resolving data imbalanced issue', Analytical Biochemistry, vol. 525, pp. 107-113.
Carbonylation is an irreversible post-translational modification and is considered a biomarker of oxidative stress. It plays a major role not only in orchestrating various biological processes but is also associated with diseases such as Alzheimer's disease, diabetes, and Parkinson's disease. However, since experimental technologies for detecting carbonylation sites in proteins are costly and time-consuming, an accurate computational method for predicting carbonylation sites is an urgent need and can be useful for drug development. In this study, a novel computational tool termed predCar-Site has been developed to predict protein carbonylation sites by (1) incorporating sequence-coupled information into the general pseudo amino acid composition, (2) balancing the effect of the skewed training dataset with the Different Error Costs method, and (3) constructing a predictor using a support vector machine as the classifier. The predCar-Site predictor achieves average AUC (area under curve) scores of 0.9959, 0.9999, 1, and 0.9997 in predicting the carbonylation sites of K, P, R, and T, respectively. All of the experimental results, including AUC, are averages of 5 complete runs of 10-fold cross-validation, and they indicate significantly better performance than existing predictors. A user-friendly web server for predCar-Site is available at http://research.ru.ac.bd/predCar-Site/.
He, X, Wu, Y, Yu, D & Merigó, JM 2017, 'Exploring the Ordered Weighted Averaging Operator Knowledge Domain: A Bibliometric Analysis', International Journal of Intelligent Systems, vol. 32, no. 11, pp. 1151-1166.
© 2017 Wiley Periodicals, Inc. The ordered weighted averaging (OWA) operator has received increasingly widespread interest since its appearance in 1988. Recently, a topic search with the keywords “ordered weighted averaging operator” or “OWA operator” on Web of Science (WOS) found 1231 documents. As publications about the OWA operator increase rapidly, a scientometric analysis of this research field and discovery of its knowledge domain become important and necessary. This paper studies the publications about the OWA operator between 1988 and 2015, based on 1213 bibliographic records obtained via topic search from WOS. The disciplinary distribution, most cited papers, influential journals and influential authors are analyzed through citation and co-citation analysis. Emerging trends in OWA operator research are explored through keyword and reference burst detection analysis. The research methods and results in this paper will help researchers in the OWA operator field understand its knowledge domain and establish their own future research directions.
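For context (the standard definition from Yager's 1988 proposal, not taken from the abstract), the OWA operator aggregates arguments by weighting their ordered positions:

```latex
\mathrm{OWA}(a_1,\dots,a_n) = \sum_{j=1}^{n} w_j\, b_j,
\qquad w_j \in [0,1],\quad \sum_{j=1}^{n} w_j = 1,
```

where $b_j$ is the $j$-th largest of the arguments $a_1,\dots,a_n$, so the weights attach to ranked positions rather than to particular arguments; special weight choices recover the maximum, minimum and arithmetic mean.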
Hou, S, Chen, L, Tao, D, Zhou, S, Liu, W & Zheng, Y 2017, 'Multi-layer multi-view topic model for classifying advertising video', Pattern Recognition, vol. 68, pp. 66-81.
View/Download from: Publisher's site
View description>>
© 2017 Elsevier Ltd The recent proliferation of advertising (ad) videos has driven the research in multiple applications, ranging from video analysis to video indexing and retrieval. Among them, classifying ad video is a key task because it allows automatic organization of videos according to categories or genres, and this further enables ad video indexing and retrieval. However, classifying ad video is challenging compared to other types of video classification because of its unconstrained content. While many studies focus on embedding ads relevant to videos, to our knowledge, few focus on ad video classification. In order to classify ad video, this paper proposes a novel ad video representation that aims to sufficiently capture the latent semantics of video content from multiple views in an unsupervised manner. In particular, we represent ad videos from four views, including bag-of-features (BOF), vector of locally aggregated descriptors (VLAD), Fisher vector (FV) and object bank (OB). We then devise a multi-layer multi-view topic model, mlmv_LDA, which models the topics of videos from different views. A topical representation for video, supporting category-related tasks, is finally achieved by the proposed method. Our empirical classification results on 10,111 real-world ad videos demonstrate that the proposed approach effectively differentiates ad videos.
Hu, L, Cao, L, Cao, J, Gu, Z, Xu, G & Wang, J 2017, 'Improving the Quality of Recommendations for Users and Items in the Tail of Distribution', ACM Transactions on Information Systems, vol. 35, no. 3, pp. 1-37.
View/Download from: Publisher's site
View description>>
Short-head and long-tail distributed data are widely observed in the real world. The same is true of recommender systems (RSs), where a small number of popular items dominate the choices and feedback data while the rest only account for a small amount of feedback. As a result, most RS methods tend to learn user preferences from popular items since they account for most data. However, recent research in e-commerce and marketing has shown that future businesses will obtain greater profit from long-tail selling. Yet, although the number of long-tail items and users is much larger than that of short-head items and users, in reality, the amount of data associated with long-tail items and users is much less. As a result, user preferences tend to be popularity-biased. Furthermore, insufficient data makes long-tail items and users more vulnerable to shilling attacks. To improve the quality of recommendations for items and users in the tail of the distribution, we propose a coupled regularization approach that consists of two latent factor models: C-HMF, for enhancing credibility, and S-HMF, for emphasizing specialty in user choices. Specifically, the estimates learned from C-HMF and S-HMF recurrently serve as the empirical priors to regularize one another. Such coupled regularization allows the final estimates to benefit from both models, producing higher-quality predictions for both tail users and tail items. To assess the effectiveness of our model, we conduct empirical evaluations on large real-world datasets with various metrics. The results show that our approach significantly outperforms the compared methods.
Hu, L, Cao, L, Cao, J, Gu, Z, Xu, G & Yang, D 2017, 'Learning Informative Priors from Heterogeneous Domains to Improve Recommendation in Cold-Start User Domains', ACM Transactions on Information Systems, vol. 35, no. 2, pp. 1-37.
View/Download from: Publisher's site
View description>>
In the real-world environment, users have sufficient experience in their focused domains but lack experience in other domains. Recommender systems are very helpful for recommending potentially desirable items to users in unfamiliar domains, and cross-domain collaborative filtering is therefore an important emerging research topic. However, it is inevitable that the cold-start issue will be encountered in unfamiliar domains due to the lack of feedback data. The Bayesian approach shows that priors play an important role when there are insufficient data, which implies that recommendation performance can be significantly improved in cold-start domains if informative priors can be provided. Based on this idea, we propose a Weighted Irregular Tensor Factorization (WITF) model to leverage multi-domain feedback data across all users to learn the cross-domain priors w.r.t. both users and items. The features learned from WITF serve as the informative priors on the latent factors of users and items in terms of weighted matrix factorization models. Moreover, WITF is a unified framework for dealing with both explicit feedback and implicit feedback. To prove the effectiveness of our approach, we studied three typical real-world cases in which a collection of empirical evaluations were conducted on real-world datasets to compare the performance of our model and other state-of-the-art approaches. The results show the superiority of our model over comparison models.
Hu, S-S, Chen, P, Wang, B & Li, J 2017, 'Protein binding hot spots prediction from sequence only by a new ensemble learning method', Amino Acids, vol. 49, no. 10, pp. 1773-1785.
View/Download from: Publisher's site
View description>>
Hot spots are interfacial core areas of binding proteins, which have been used as targets in drug design. Experimental methods for locating hot spot areas are costly in both time and expense. Recently, in-silico computational methods have been widely used for hot spot prediction through sequence or structure characterization. As the structural information of proteins is not always available, hot spot identification from amino acid sequences alone is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers based on the IBk (instance-based k-nearest neighbour) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. Top-performing classifiers are then selected to form an ensemble via a majority voting technique. The ensemble classifier outperforms state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set. The method is available at http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm.
Huang, S, Zhang, J, Schonfeld, D, Wang, L & Hua, X-S 2017, 'Two-Stage Friend Recommendation Based on Network Alignment and Series Expansion of Probabilistic Topic Model', IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1314-1326.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Precise friend recommendation is an important problem in social media. Although most social websites provide some kinds of auto friend searching functions, their accuracies are not satisfactory. In this paper, we propose a more precise auto friend recommendation method with two stages. In the first stage, by utilizing the information of the relationship between texts and users, as well as the friendship information between users, we align different social networks and choose some 'possible friends.' In the second stage, with the relationship between image features and users, we build a topic model to further refine the recommendation results. Because some traditional methods, such as variational inference and Gibbs sampling, have their limitations in dealing with our problem, we develop a novel method to find out the solution of the topic model based on series expansion. We conduct experiments on the Flickr dataset to show that the proposed algorithm recommends friends more precisely and faster than traditional methods.
Hussain, W, Hussain, FK, Hussain, OK, Damiani, E & Chang, E 2017, 'Formulating and managing viable SLAs in cloud computing from a small to medium service provider's viewpoint: A state-of-the-art review', Information Systems, vol. 71, pp. 240-259.
View/Download from: Publisher's site
View description>>
In today's competitive world, service providers need to be customer-focused and proactive in their marketing strategies to create consumer awareness of their services. Cloud computing provides an open and ubiquitous computing feature in which a large random number of consumers can interact with providers and request services. In such an environment, there is a need for intelligent and efficient methods that increase confidence in the successful achievement of business requirements. One such method is the Service Level Agreement (SLA), which is comprised of service objectives, business terms, service relations, obligations and the possible action to be taken in the case of SLA violation. Most of the emphasis in the literature has, until now, been on the formation of meaningful SLAs by service consumers, through which their requirements will be met. However, in an increasingly competitive market based on the cloud environment, service providers too need a framework that will form a viable SLA, predict possible SLA violations before they occur, and generate early warning alarms that flag a potential lack of resources. This is because when a provider and a consumer commit to an SLA, the service provider is bound to reserve the agreed amount of resources for the entire period of that agreement – whether the consumer uses them or not. It is therefore very important for cloud providers to accurately predict the likely resource usage for a particular consumer and to formulate an appropriate SLA before finalizing an agreement. This problem is more important for a small to medium cloud service provider which has limited resources that must be utilized in the best possible way to generate maximum revenue. A viable SLA in cloud computing is one that intelligently helps the service provider to determine the amount of resources to offer to a requesting consumer, and there are a number of studies on SLA management in the literature. The aim of this paper is two-fold. First, it pr...
Hussein, F & Piccardi, M 2017, 'V-JAUNE', ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 13, no. 2, pp. 1-19.
View/Download from: Publisher's site
View description>>
Video summarization and action recognition are two important areas of multimedia video analysis. While these two areas have been tackled separately to date, in this article, we present a latent structural SVM framework to recognize the action and derive the summary of a video in a joint, simultaneous fashion. Efficient inference is provided by a submodular score function that accounts for the action and summary jointly. In this article, we also define a novel measure to evaluate the quality of a predicted video summary against the annotations of multiple annotators. Quantitative and qualitative results over two challenging action datasets—the ACE and MSR DailyActivity3D datasets—show that the proposed joint approach leads to higher action recognition accuracy and equivalent or better summary quality than comparable approaches that perform these tasks separately.
Jiang, Y, Tsai, P, Yeh, W-C & Cao, L 2017, 'A honey-bee-mating based algorithm for multilevel image segmentation using Bayesian theorem', Applied Soft Computing, vol. 52, pp. 1181-1190.
View/Download from: Publisher's site
View description>>
© 2016 Elsevier B.V. Image thresholding techniques are essential for object segmentation, compression and target recognition, and they have been widely studied for the last few decades. Multi-level thresholding methods in particular pose great challenges for image segmentation, as they become computationally more expensive when the number of thresholds is increased. We therefore propose an algorithm based on Bayes' theorem and the so-called honey-bee-mating algorithm (HBMA), called the Bayesian honey-bee-mating algorithm (BHBMA). It not only reduces the computational time and the curse of dimensionality, but also runs more reliably and stably. This enhanced capability is technically accomplished by embedding a new population initialization strategy based on the characteristics of multi-level thresholding in pixel-intensity images, arranged from lower grey levels to higher ones. Extensive experiments have shown that the proposed method empirically outperforms other state-of-the-art algorithms in terms of effectiveness and efficiency when applied to complex image processing scenarios such as automatic target recognition.
Khuat, TT & Le, MH 2017, 'A genetic algorithm with multi-parent crossover using quaternion representation for numerical function optimization', Applied Intelligence, vol. 46, no. 4, pp. 810-826.
View/Download from: Publisher's site
Khuat, TT & Le, MH 2017, 'Applying teaching-learning to artificial bee colony for parameter optimization of software effort estimation model', Journal of Engineering Science and Technology, vol. 12, no. 5, pp. 1178-1190.
View description>>
Artificial Bee Colony, inspired by the foraging behaviour of honey bees, is a novel meta-heuristic optimization algorithm in the community of swarm intelligence algorithms. Nevertheless, it is still insufficient in its speed of convergence and the quality of its solutions. This paper proposes an approach to tackle these downsides by combining the positive aspects of Teaching-Learning-based optimization and Artificial Bee Colony. The performance of the proposed method is assessed on the software effort estimation problem, a complex and important issue in project management. Software developers often carry out effort estimation in the early stages of the software development life cycle to derive the required cost and schedule for a project. Among the many methods for effort estimation, COCOMO II is one of the most widely used models. However, this model has some restrictions because its parameters have not been optimized. In this work, therefore, we present an approach to overcome this limitation of the COCOMO II model. Experiments were conducted on the NASA software project dataset, and the results indicate that the optimized parameters provide better estimation capability than the original COCOMO II model.
Laengle, S, Loyola, G & Merigo, JM 2017, 'Mean-Variance Portfolio Selection With the Ordered Weighted Average', IEEE Transactions on Fuzzy Systems, vol. 25, no. 2, pp. 350-362.
View/Download from: Publisher's site
View description>>
© 1993-2012 IEEE. Portfolio selection is the theory that studies the process of selecting the optimal proportions of different assets. The first approach, introduced by Harry Markowitz, was based on a mean-variance framework. This paper introduces the ordered weighted average (OWA) in the mean-variance model. The main idea is to replace the classical mean and variance with the OWA operator. By doing so, the new model can study different degrees of optimism and pessimism in the analysis, developing an approach that considers the decision maker's attitude in the selection process. This paper also suggests a new framework for dealing with the attitudinal character of the decision maker based on the numerical values of the available arguments. The main advantage of this method is its ability to adapt to many situations, offering a more complete representation of the available data from the most pessimistic situation to the most optimistic one. An illustrative example with fictitious data and a real example are studied.
Laengle, S, Merigó, JM, Miranda, J, Słowiński, R, Bomze, I, Borgonovo, E, Dyson, RG, Oliveira, JF & Teunter, R 2017, 'Forty years of the European Journal of Operational Research: A bibliometric overview', European Journal of Operational Research, vol. 262, no. 3, pp. 803-816.
View/Download from: Publisher's site
View description>>
© 2017 Elsevier B.V. The European Journal of Operational Research (EJOR) published its first issue in 1977. This paper presents a general overview of the journal over its lifetime by using bibliometric indicators. We discuss its performance compared to other journals in the field and identify key contributing countries/institutions/authors as well as trends in research topics based on the Web of Science Core Collection database. The results indicate that EJOR is one of the leading journals in the area of operational research (OR) and management science (MS), with a wide range of authors from institutions and countries from all over the world publishing in it. Graphical visualization of similarities (VOS) provides further insights into how EJOR links to other journals and how it links researchers across the globe.
Le, M, Gabrys, B & Nauck, D 2017, 'A hybrid model for business process event and outcome prediction', Expert Systems, vol. 34, no. 5, pp. 1-11.
View/Download from: Publisher's site
View description>>
Large service companies run complex customer service processes to provide communication services to their customers. The flawless execution of these processes is essential because customer service is an important differentiator. Companies must also be able to predict whether processes will complete successfully or run into exceptions in order to intervene at the right time, preempt problems and maintain customer service. Business process data are sequential in nature and can be very diverse, so there is a need for an efficient sequential forecasting methodology that can cope with this diversity. This paper proposes two approaches, a sequential k-nearest neighbour and an extension of Markov models, both with an added component based on sequence alignment. The proposed approaches exploit temporal categorical features of the data to predict the next steps of a process using higher-order Markov models, and the process outcomes using a sequence alignment technique. The diversity of the data is also addressed by considering subsets of similar process sequences based on k nearest neighbours. We show, via a set of experiments, that our sequential k-nearest neighbour offers better results when compared with the original methods, and that our extended Markov model outperforms random guessing, Markov models and hidden Markov models.
Lee, JYL, Brown, JJ & Ryan, LM 2017, 'Sufficiency Revisited: Rethinking Statistical Algorithms in the Big Data Era', The American Statistician, vol. 71, no. 3, pp. 202-208.
View/Download from: Publisher's site
View description>>
© 2017 American Statistical Association. The big data era demands new statistical analysis paradigms, since traditional methods often break down when datasets are too large to fit on a single desktop computer. Divide and Recombine (D&R) is becoming a popular approach for big data analysis, where results are combined over subanalyses performed in separate data subsets. In this article, we consider situations where unit record data cannot be made available by data custodians due to privacy concerns, and explore the concept of statistical sufficiency and summary statistics for model fitting. The resulting approach represents a type of D&R strategy, which we refer to as summary statistics D&R; as opposed to the standard approach, which we refer to as horizontal D&R. We demonstrate the concept via an extended Gamma–Poisson model, where summary statistics are extracted from different databases and incorporated directly into the fitting algorithm without having to combine unit record data. By exploiting the natural hierarchy of data, our approach has major benefits in terms of privacy protection. Incorporating the proposed modelling framework into data extraction tools such as TableBuilder by the Australian Bureau of Statistics allows for potential analysis at a finer geographical level, which we illustrate with a multilevel analysis of the Australian unemployment data. Supplementary materials for this article are available online.
Li, J, Deng, C, Da Xu, RY, Tao, D & Zhao, B 2017, 'Robust Object Tracking With Discrete Graph-Based Multiple Experts', IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2736-2750.
View/Download from: Publisher's site
View description>>
© 1992-2012 IEEE. Variations of target appearances due to illumination changes, heavy occlusions, and target deformations are the major factors for tracking drift. In this paper, we show that the tracking drift can be effectively corrected by exploiting the relationship between the current tracker and its historical tracker snapshots. Here, a multi-expert framework is established by the current tracker and its historical trained tracker snapshots. The proposed scheme is formulated into a unified discrete graph optimization framework, whose nodes are modeled by the hypotheses of the multiple experts. Furthermore, an exact solution of the discrete graph exists giving the object state estimation at each time step. With the unary and binary compatibility graph scores defined properly, the proposed framework corrects the tracker drift via selecting the best expert hypothesis, which implicitly analyzes the recent performance of the multi-expert by only evaluating graph scores at the current frame. Three base trackers are integrated into the proposed framework to validate its effectiveness. We first integrate the online SVM on a budget algorithm into the framework with significant improvement. Then, the regression correlation filters with hand-crafted features and deep convolutional neural network features are introduced, respectively, to further boost the tracking performance. The proposed three trackers are extensively evaluated on three data sets: TB-50, TB-100, and VOT2015. The experimental results demonstrate the excellent performance of the proposed approaches against the state-of-the-art methods.
Liu, B, Xiao, Y & Cao, L 2017, 'SVM-based multi-state-mapping approach for multi-class classification', Knowledge-Based Systems, vol. 129, pp. 79-96.
View/Download from: Publisher's site
View description>>
© 2017. Traditional SVM-based multi-class classification algorithms mainly adopt the strategy of mapping the data set with all classes into a single feature space via a kernel function, in which an SVM is constructed for each decomposed binary classification problem. However, it is not always possible to find an appropriate kernel function to render all the classes distinguishable in a single feature space, since each class is always derived from different data distributions. Consequently, the performance is not always as good as expected. To improve the performance of multi-class classification, this paper proposes an improved approach, called multi-state-mapping (MSM) with SVM based on a hierarchical architecture, which maps the data set with all classes into different feature spaces at the different states of the decomposition of a multi-class classification problem in terms of a binary tree architecture. We prove that the computational complexity of MSM at its worst lies between that of the one-against-all scheme and the one-against-one scheme. Substantial experiments have been conducted on sixteen UCI data sets to show the performance of our method. The statistical results show that MSM outperforms state-of-the-art methods in terms of accuracy and standard deviation.
Liu, K, Beck, D, Thoms, JAI, Liu, L, Zhao, W, Pimanda, JE & Zhou, X 2017, 'Annotating function to differentially expressed LincRNAs in myelodysplastic syndrome using a network-based method', Bioinformatics, vol. 33, no. 17, pp. 2622-2630.
View/Download from: Publisher's site
View description>>
Motivation: Long non-coding RNAs (lncRNAs) have been implicated in the regulation of diverse biological functions. The number of newly identified lncRNAs has increased dramatically in recent years, but their expression and function have not yet been described in most diseases. To elucidate lncRNA function in human disease, we have developed a novel network-based method (NLCFA) integrating correlations between lncRNAs, protein-coding genes and noncoding miRNAs. We have also integrated target gene associations and protein-protein interactions, and designed our model to provide information on the combined influence of mRNAs, lncRNAs and miRNAs on cellular signal transduction networks. Results: We have generated lncRNA expression profiles from CD34+ haematopoietic stem and progenitor cells (HSPCs) from patients with myelodysplastic syndromes (MDS) and healthy donors. We report, for the first time, aberrantly expressed lncRNAs in MDS and further prioritize biologically relevant lncRNAs using the NLCFA. Taken together, our data suggest that aberrant levels of specific lncRNAs are intimately involved in network modules that control multiple cancer-associated signalling pathways and cellular processes. Importantly, our method can be applied to prioritize aberrantly expressed lncRNAs for functional validation in other diseases and biological contexts. Availability and implementation: The method is implemented in R and Matlab. Supplementary information ...
Liu, W, Luo, X, Zhang, J, Xue, R & Xu, RYD 2017, 'Semantic summary automatic generation in news event', Concurrency and Computation: Practice and Experience, vol. 29, no. 24, pp. e4287-e4287.
View/Download from: Publisher's site
View description>>
How to generate a summary with more novel and rich semantics is a challenging issue in the area of multi‐document automatic summarization. In this paper, a core semantics extraction model (CSEM) is proposed to improve the novelty and richness of the semantics of multi‐document summaries. Firstly, to improve semantic richness, semantic units, which are groups of association relations of keywords, are used to express the semantics of texts. Secondly, to improve semantic novelty, an attenuation function is introduced to adjust the importance of semantic units according to the number of times they appear in the candidate summary sentences. Thirdly, in order to maximize the novel and rich semantics of the summary, the generation of the summary is converted into an optimization problem of finding a set of sentences with higher importance. Finally, CSEM extracts the smallest number of sentences that cover the most core semantics in the corpus as the summary. Experimental results on the benchmark DUC 2004 show that our model outperforms state‐of‐the‐art approaches (e.g., OCCAMS_V, JS‐Gen‐2) under the official metric. In particular, the recall of our model in ROUGE‐1 is 40.684%, which is better than other approaches (e.g., OCCAMS_V 38.497% and JS‐Gen‐2 36.739%).
Llopis-Albert, C, Merigó, JM, Xu, Y & Liao, H 2017, 'Improving Regional Climate Projections by Prioritized Aggregation via Ordered Weighted Averaging Operators', Environmental Engineering Science, vol. 34, no. 12, pp. 880-886.
View/Download from: Publisher's site
View description>>
© Copyright 2017, Mary Ann Liebert, Inc. 2017. Decision makers express a strong need for reliable information on future climate changes to develop the best mitigation and adaptation strategies to address impacts. These decisions are based on future climate projections that are simulated by using different Representative Concentration Pathways (RCPs), General Circulation Models (GCMs), and downscaling techniques to obtain high-resolution Regional Climate Models. RCPs defined by the Intergovernmental Panel on Climate Change entail a certain combination of the underlying driving forces behind climate and land use/land cover changes, which leads to different anthropogenic Greenhouse Gases concentration trajectories. Projections of global and regional climate change should also take into account relevant sources of uncertainty and stakeholders' risk attitudes when defining climate policies. The goal of this article is to improve regional climate projections by their prioritized aggregation through the ordered weighted averaging (OWA) operator. The aggregated projection is achieved by considering the similarity of the projections obtained by combining different GCMs, RCPs, and downscaling techniques. Relative weights of the different projections to be aggregated by the OWA operator are obtained by regular increasing monotone fuzzy quantifiers, which enables modeling the stakeholders' risk attitudes. The methodology provides a robust decision-making tool to evaluate the performance of future climate projections and to design sustainable policies under uncertainty and risk tolerance, and it has been successfully applied to a real-case study.
Meng, Q, Catchpoole, D, Skillicorn, D & Kennedy, PJ 2017, 'DBNorm: normalizing high-density oligonucleotide microarray data based on distributions', BMC Bioinformatics, vol. 18, no. 1.
View/Download from: Publisher's site
View description>>
© 2017 The Author(s). Background: Data from patients with rare diseases are often produced using different platforms and probe sets because patients are widely distributed in space and time. Aggregating such data requires a method of normalization that makes patient records comparable. Results: This paper proposes DBNorm, implemented as an R package, an algorithm that normalizes arbitrarily distributed data to a common, comparable form. Specifically, DBNorm merges data distributions by fitting functions to each of them, and using the probability of each element drawn from the fitted distribution to merge it into a global distribution. DBNorm contains state-of-the-art fitting functions including polynomial, Fourier and Gaussian distributions, and also allows users to define their own fitting functions if required. Conclusions: The performance of DBNorm is compared with z-score, average difference, quantile normalization and ComBat on a set of datasets, including several that are publicly available. The performance of these normalization methods is compared using statistics, visualization, and classification when class labels are known, on a number of self-generated and public microarray datasets. The experimental results show that DBNorm achieves better normalization results than conventional methods. Finally, the approach has the potential to be applicable beyond bioinformatics analysis.
Meng, X, Cao, L, Zhang, X & Shao, J 2017, 'Top-k coupled keyword recommendation for relational keyword queries', Knowledge and Information Systems, vol. 50, no. 3, pp. 883-916.
View/Download from: Publisher's site
View description>>
© 2016 Springer-Verlag London. Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are preserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from the query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of the query typicality analysis and the top-k query selection algorithm is also demonstrated.
Meo, PD, Musial-Gabrys, K, Rosaci, D, Sarnè, GML & Aroyo, L 2017, 'Using Centrality Measures to Predict Helpfulness-Based Reputation in Trust Networks', ACM Transactions on Internet Technology, vol. 17, no. 1, pp. 1-20.
View/Download from: Publisher's site
View description>>
In collaborative Web-based platforms, user reputation scores are generally computed according to two orthogonal perspectives: (a) helpfulness-based reputation (HBR) scores and (b) centrality-based reputation (CBR) scores. In HBR approaches, the most reputable users are those who post the most helpful reviews according to the opinion of the members of their community. In CBR approaches, a “who-trusts-whom” network—known as a trust network—is available, and the most reputable users occupy the most central positions in the trust network, according to some definition of centrality. The identification of users featuring large HBR scores is one of the most important research issues in the field of social networks, and it is a critical success factor of many Web-based platforms such as e-marketplaces, product review Web sites, and question-and-answering systems. Unfortunately, user reviews/ratings are often sparse, and this makes the calculation of HBR scores inaccurate. In contrast, CBR scores are relatively easy to calculate provided that the topology of the trust network is known. In this article, we investigate whether CBR scores are effective in predicting HBR ones. To perform our study, we used real-life datasets extracted from CIAO and Epinions (two product review Web sites) and Wikipedia, and applied five popular centrality measures—Degree Centrality, Closeness Centrality, Betweenness Centrality, PageRank and Eigenvector Centrality—to calculate CBR scores. Our analysis provides a positive answer to our research question: CBR scores allow for predicting HBR ones, and Eigenvector Centrality was found to be the most important predictor. Our findings prove that we can leverage trust relationships to spot those users producing the most helpful reviews for the whole community.
Merigó, JM & Yang, J 2017, 'Accounting Research: A Bibliometric Analysis', Australian Accounting Review, vol. 27, no. 1, pp. 71-100.
View/Download from: Publisher's site
View description>>
Bibliometrics is a fundamental field of information science that studies bibliographic material quantitatively. It is very useful for organising available knowledge within a specific scientific discipline. This study presents a bibliometric overview of accounting research using the Web of Science database, identifying the most relevant research in the field classified by papers, authors, journals, institutions and countries. The results show that the most influential journals are: The Journal of Accounting and Economics, Journal of Accounting Research, The Accounting Review and Accounting, Organizations and Society. It also shows that US institutions are the most influential worldwide. However, it is important to note that some very good research in this area, represented by only a small number of papers and citations, may not show up in this study due to the specific characteristics of different subtopics.
Merigó, JM & Yang, J-B 2017, 'A bibliometric analysis of operations research and management science', Omega, vol. 73, pp. 37-48.
View/Download from: Publisher's site
View description>>
© 2016. Bibliometric analysis is the quantitative study of bibliographic material. It provides a general picture of a research field that can be classified by papers, authors and journals. This paper presents a bibliometric overview of research published in operations research and management science in recent decades. The main objective of this study is to identify some of the most relevant research in this field and some of the newest trends according to the information found in the Web of Science database. Several classifications are made, including an analysis of the most influential journals, the two hundred most cited papers of all time and the most productive and influential authors. The results obtained are in accordance with the common wisdom, although some variations are found.
Merigó, JM, Blanco-Mesa, F, Gil-Lafuente, AM & Yager, RR 2017, 'Thirty Years of the International Journal of Intelligent Systems: A Bibliometric Review', International Journal of Intelligent Systems, vol. 32, no. 5, pp. 526-554.
View/Download from: Publisher's site
View description>>
© 2016 Wiley Periodicals, Inc. The International Journal of Intelligent Systems was created in 1986. Today, the journal is 30 years old. To celebrate this anniversary, this study develops a bibliometric review of all of the papers published in the journal between 1986 and 2015. The results are largely based on the Web of Science Core Collection, which classifies leading bibliographic material by using several indicators including total number of publications and citations, the h-index, cites per paper, and citing articles. The work also uses the VOSviewer software for visualizing the main results through bibliographic coupling and co-citation. The results show a general overview of leading trends that have influenced the journal in terms of highly cited papers, authors, journals, universities and countries.
Merigó, JM, Linares-Mustarós, S & Ferrer-Comalat, JC 2017, 'Guest editorial', Kybernetes, vol. 46, no. 1, pp. 2-7.
View/Download from: Publisher's site
Merigó, JM, Palacios-Marqués, D & Soto-Acosta, P 2017, 'Distance measures, weighted averages, OWA operators and Bonferroni means', Applied Soft Computing, vol. 50, pp. 356-366.
View/Download from: Publisher's site
View description>>
© 2016 Elsevier B.V. The ordered weighted average (OWA) is an aggregation operator that provides a parameterized family of operators between the minimum and the maximum. This paper presents the OWA weighted average distance operator. The main advantage of this new approach is that it unifies the weighted Hamming distance and the OWA distance in the same formulation, while considering the degree of importance that each concept has in the analysis. This operator includes a wide range of particular cases from the minimum to the maximum distance. Some further generalizations are also developed with generalized and quasi-arithmetic means. The use of Bonferroni means under this framework is also studied. The paper ends with an application of the new approach to a group decision making problem with Dempster-Shafer belief structure regarding the selection of strategies.
Mirtalaie, MA, Hussain, OK, Chang, E & Hussain, FK 2017, 'A decision support framework for identifying novel ideas in new product development from cross-domain analysis', Information Systems, vol. 69, pp. 59-80.
View/Download from: Publisher's site
View description>>
In current competitive times, product manufacturers need not only to retain their existing customer base, but also to increase their market share. One way they can achieve this is by generating new ideas and developing novel products with new features. As highlighted in the literature, in generating new ideas to develop novel and innovative products, it is important that product designers satisfy the needs of both current customers and new customers. However, despite the large number of existing studies that identify novel features in the ideation phase, product designers do not have a systematic framework that utilises additional information relating to products from either far-field or related domains to generate such new ideas in the ideation phase. This paper presents our proposed framework, FEATURE, which provides just such a systematic framework for product designers in the ideation phase of new product development. FEATURE has three phases. The first phase identifies and recommends to the product designers novel features that can be added to the next version of a reference product. In order to incorporate the customer's voice into the ideation phase, the second phase ascertains the popularity of the proposed features by using social media. The third phase ranks the proposed features based on the designer's decision criteria to select those that should be considered further in the next phases of new product development. We explain the importance of each phase of FEATURE and show the working of its first module in detail.
Musial, K, Bródka, P & De Meo, P 2017, 'Analysis and Applications of Complex Social Networks', Complexity, vol. 2017, pp. 1-2.
View/Download from: Publisher's site
Pan, S, Wu, J, Zhu, X, Long, G & Zhang, C 2017, 'Boosting for graph classification with universum', Knowledge and Information Systems, vol. 50, no. 1, pp. 53-77.
View/Download from: Publisher's site
View description>>
© 2016, Springer-Verlag London. Recent years have witnessed extensive studies of graph classification due to the rapid increase in applications involving structural data and complex relationships. To support graph classification, all existing methods require that training graphs be relevant (or belong) to the target class, but they cannot integrate graphs irrelevant to the class of interest into the learning process. In this paper, we study a new universum graph classification framework which leverages additional “non-example” graphs to help improve the graph classification accuracy. We argue that although universum graphs do not belong to the target class, they may contain meaningful structure patterns that help enrich the feature space for graph representation and classification. To support universum graph classification, we propose a mathematical programming algorithm, ugBoost, which integrates discriminative subgraph selection and margin maximization into a unified framework to fully exploit the universum. Because informative subgraph exploration in a universum setting requires searching a large space, we derive an upper bound on the discriminative score for each subgraph and employ a branch-and-bound scheme to prune the search space. By using the explored subgraphs, our graph classification model intends to maximize the margin between positive and negative graphs and minimize the loss on the universum graph examples simultaneously. The subgraph exploration and the learning are integrated and performed iteratively so that each can be beneficial to the other. Experimental results and comparisons on real-world datasets demonstrate the performance of our algorithm.
Peng, H, Lan, C, Liu, Y, Liu, T, Blumenstein, M & Li, J 2017, 'Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes', Oncotarget, vol. 8, no. 45, pp. 78901-78916.
View/Download from: Publisher's site
View description>>
© Peng et al. Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors, one is composed of 45 elements, characterizing the information entropies of the disease genes distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements with the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Peng, H, Lan, C, Zheng, Y, Hutvagner, G, Tao, D & Li, J 2017, 'Cross disease analysis of co-functional microRNA pairs on a reconstructed network of disease-gene-microRNA tripartite', BMC Bioinformatics, vol. 18, no. 1, pp. 1-17.
View/Download from: Publisher's site
View description>>
© 2017 The Author(s). Background: MicroRNAs always function cooperatively in their regulation of gene expression. Dysfunctions of these co-functional microRNAs can play significant roles in disease development. We are interested in those multi-disease associated co-functional microRNAs that regulate their common dysfunctional target genes cooperatively in the development of multiple diseases. The research is potentially useful for human disease studies at the transcriptional level and for the study of multi-purpose microRNA therapeutics. Methods and results: We designed a computational method to detect multi-disease associated co-functional microRNA pairs and conducted cross disease analysis on a reconstructed disease-gene-microRNA (DGR) tripartite network. The construction of the DGR tripartite network is by the integration of newly predicted disease-microRNA associations with those relationships of diseases, microRNAs and genes maintained by existing databases. The prediction method uses a set of reliable negative samples of disease-microRNA association and a pre-computed kernel matrix instead of kernel functions. From this reconstructed DGR tripartite network, multi-disease associated co-functional microRNA pairs are detected together with their common dysfunctional target genes and ranked by a novel scoring method. We also conducted proof-of-concept case studies on cancer-related co-functional microRNA pairs as well as on non-cancer disease-related microRNA pairs. Conclusions: With the prioritization of the co-functional microRNAs that relate to a series of diseases, we found that the co-function phenomenon is not unusual. We also confirmed that the regulation of the microRNAs for the development of cancers is more complex and have more unique properties than those of non-cancer diseases.
Peris-Ortiz, M, Gómez, JA, Merigó, JM & Rueda-Armengot, C 2017, 'Preface', Innovation, Technology and Knowledge Management, pp. ix-xiii.
Qiao, M, Xu, RYD, Bian, W & Tao, D 2017, 'Fast Sampling for Time-Varying Determinantal Point Processes', ACM Transactions on Knowledge Discovery from Data, vol. 11, no. 1, pp. 1-24.
View/Download from: Publisher's site
View description>>
Determinantal Point Processes (DPPs) are stochastic models that assign each subset of a base dataset a probability proportional to the subset’s degree of diversity. It has been shown that DPPs are particularly appropriate for data subset selection and summarization (e.g., news display, video summarization), as they prefer diverse subsets, which conventional models cannot offer. However, DPP inference algorithms have a polynomial time complexity, which makes it difficult to handle large and time-varying datasets, especially when real-time processing is required. To address this limitation, we developed a fast sampling algorithm for DPPs which takes advantage of the nature of some time-varying data (e.g., news corpora updating, communication networks evolving), where the data changes between time stamps are relatively small. The proposed algorithm is built upon the simplification of marginal density functions over successive time stamps and the sequential Monte Carlo (SMC) sampling technique. Evaluations on both a real-world news dataset and the Enron Corpus confirm the efficiency of the proposed algorithm.
Romeo, M, Yepes-Baldó, M, Boria-Reverter, S & Merigó, JM 2017, 'Twenty-five years of research on work and organizational psychology: A bibliometric perspective', Anuario de Psicología, vol. 47, no. 1, pp. 32-44.
View/Download from: Publisher's site
View description>>
© 2017 Universitat de Barcelona. The research aims to analyze the scientific productivity in the field of work/organizational psychology (WOP) in the last 25 years. We focus our analysis on the most influential journals and articles, overall and by 5-year periods, as well as on the structures of co-citation among the highest quality journals based on their h-index. We found, first, that a high percentage of papers published each year receive between 5 and 10 citations. Second, we observe an exponential increase in the number of papers published, citations, and h-index. Additionally, the number of self-citations increases significantly in the last 5 years. In this sense, we consider that the most recent papers need more time to increase their level of citation and, subsequently, to correct the bias on self-citation. This research shows the status of research in the field of work/organizational psychology, analyzing the scientific journals and papers published in the Web of Science.
Schwarzer, A, Emmrich, S, Schmidt, F, Beck, D, Ng, M, Reimer, C, Adams, FF, Grasedieck, S, Witte, D, Käbler, S, Wong, JWH, Shah, A, Huang, Y, Jammal, R, Maroz, A, Jongen-Lavrencic, M, Schambach, A, Kuchenbauer, F, Pimanda, JE, Reinhardt, D, Heckl, D & Klusmann, J-H 2017, 'The non-coding RNA landscape of human hematopoiesis and leukemia', Nature Communications, vol. 8, no. 1, pp. 1-17.
View/Download from: Publisher's site
View description>>
Non-coding RNAs have emerged as crucial regulators of gene expression and cell fate decisions. However, their expression patterns and regulatory functions during normal and malignant human hematopoiesis are incompletely understood. Here we present a comprehensive resource defining the non-coding RNA landscape of the human hematopoietic system. Based on highly specific non-coding RNA expression portraits per blood cell population, we identify unique fingerprint non-coding RNAs—such as LINC00173 in granulocytes—and assign these to critical regulatory circuits involved in blood homeostasis. Following the incorporation of acute myeloid leukemia samples into the landscape, we further uncover prognostically relevant non-coding RNA stem cell signatures shared between acute myeloid leukemia blasts and healthy hematopoietic stem cells. Our findings highlight the importance of the non-coding transcriptome in the formation and maintenance of the human blood hierarchy.
Shen, F, Yang, Y, Liu, L, Liu, W, Tao, D & Shen, HT 2017, 'Asymmetric Binary Coding for Image Search', IEEE Transactions on Multimedia, vol. 19, no. 9, pp. 2022-2032.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Learning to hash has attracted broad research interests in recent computer vision and machine learning studies, due to its ability to accomplish efficient approximate nearest neighbor search. However, the closely related task, maximum inner product search (MIPS), has rarely been studied in this literature. To facilitate the MIPS study, in this paper, we introduce a general binary coding framework based on asymmetric hash functions, named asymmetric inner-product binary coding (AIBC). In particular, AIBC learns two different hash functions, which can reveal the inner products between original data vectors by the generated binary vectors. Although conceptually simple, the associated optimization is very challenging due to the highly nonsmooth nature of the objective that involves sign functions. We tackle the nonsmooth optimization in an alternating manner, by which each single coding function is optimized in an efficient discrete manner. We also simplify the objective by discarding the quadratic regularization term which significantly boosts the learning efficiency. Both problems are optimized in an effective discrete way without continuous relaxations, which produces high-quality hash codes. In addition, we extend the AIBC approach to the supervised hashing scenario, where the inner products of learned binary codes are forced to fit the supervised similarities. Extensive experiments on several benchmark image retrieval databases validate the superiority of the AIBC approaches over many recently proposed hashing algorithms.
Unanue, IJ, Borzeshi, EZ & Piccardi, M 2017, 'Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition', Journal of Biomedical Informatics, vol. 76, no. December 2017, pp. 102-109.
View/Download from: Publisher's site
View description>>
Background. Previous state-of-the-art systems for Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have focused on a combination of text 'feature engineering' and conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently heavily time-consuming. Conversely, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features from either random assignments or automated word 'embeddings'. Objectives. (i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health-domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets. Methods. Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models. Results. We have obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DDI-DrugBank and DDI-MedLine, but not in the 2010 i2b2/VA IRB Revision dataset. Conclusion. We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained over datasets that are adequate for the domain, in order to adequately cover the domain-specific vocabulary.
Valenzuela Fernández, L, Merigó, JM & Nicolas, C 2017, 'Universidades influyentes en investigación sobre orientación al mercado. Una visión general entre 1990 y 2014', Estudios Gerenciales, vol. 33, no. 144, pp. 221-227.
View/Download from: Publisher's site
View description>>
The aim of this study is to identify the most productive and influential universities for the scientific community on the topic of market orientation. This is done mainly through bibliometric indicators, such as the h-index, and the ratio of total citations to total articles for the period 1990-2014, based on the information found in the Web of Science. Among the findings, the interest of the scientific community in this topic stands out, as reflected in the considerable increase in contributions generated over the last 25 years. In addition, a ranking of the 30 most influential universities is determined, together with a ranking relating the universities and journals with the greatest influence on market orientation topics.
Valenzuela, LM, Merigó, JM, Johnston, WJ, Nicolas, C & Jaramillo, JF 2017, 'Thirty years of the Journal of Business & Industrial Marketing: a bibliometric analysis', Journal of Business & Industrial Marketing, vol. 32, no. 1, pp. 1-17.
View/Download from: Publisher's site
View description>>
Purpose: The aim of this study is to reveal the contribution that the Journal of Business & Industrial Marketing has made to scientific research, and its most influential thematic work in B-to-B, from its beginning in 1986 until 2015, in commemoration of its 30th anniversary. Design/methodology/approach: The paper begins with a qualitative introduction: the emergence of the journal, its origins, editorial line and positioning. Subsequently, bibliometric methodologies are used to develop a quantitative analysis. The distribution of annual publications is analyzed, along with the most cited papers, the most used keywords, the influence on the publishing industry, and the authors, universities and countries with the most publications. Findings: The predominant role of the USA at all levels is highlighted. The presence of the countries of Northern Europe (given their size and population) also stands out. The number of publications has evolved with a consistent upward trend, which demonstrates the growing and sustained interest in these types of articles, with certain periods of retreat (often coinciding with economic crises). Research limitations/implications: The Scopus database gives one unit to each author, university or country involved in a paper, without distinguishing whether there were one or more authors in the study. Therefore, this may introduce some deviations in the analysis. However, the study considers some figures with fractional counting to partially address these limitations.
Wang, H, Wu, J, Pan, S, Zhang, P & Chen, L 2017, 'Towards large-scale social networks with online diffusion provenance detection', Computer Networks, vol. 114, pp. 154-166.
View/Download from: Publisher's site
View description>>
© 2016 Elsevier B.V. In this paper we study the new problem of online discovery of diffusion provenances in large networks. Existing work on network diffusion provenance identification focuses on offline learning, where data collected from network detectors are static and a snapshot of the network is available before learning. However, an offline learning model does not meet the need for early warning, real-time awareness, or a real-time response to malicious information spreading in networks. To this end, we propose an online regression model for real-time diffusion provenance identification. Specifically, we first use offline collected network cascades to infer the edge transmission weights, and then use an online l1 non-convex regression model as the identification model. The proposed methods are empirically evaluated on both synthetic and real-world networks. Experimental results demonstrate the effectiveness of the proposed model.
Wang, H, Zhang, P, Zhu, X, Tsang, IW-H, Chen, L, Zhang, C & Wu, X 2017, 'Incremental Subgraph Feature Selection for Graph Classification', IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 128-142.
View/Download from: Publisher's site
View description>>
Graph classification is an important tool for analyzing data with structure dependency, where subgraphs are often used as features for learning. In reality, the dimension of the subgraphs crucially depends on the threshold setting of the frequency support parameter, and the number may become extremely large. As a result, subgraphs may be incrementally discovered to form a feature stream, requiring the underlying graph classifier to effectively discover representative subgraph features from the subgraph feature stream. In this paper, we propose a primal-dual incremental subgraph feature selection algorithm (ISF) based on a max-margin graph classifier. The ISF algorithm constructs a sequence of solutions that are both primal and dual feasible. Each primal-dual pair shrinks the dual gap and renders a better solution for the optimal subgraph feature set. To avoid bias of the ISF algorithm toward short-pattern subgraph features, we present a new incremental subgraph join feature selection algorithm (ISJF) that forces graph classifiers to join short-pattern subgraphs and generate long-pattern subgraph features. We evaluate the performance of the proposed models on both synthetic networks and real-world social network datasets. Experimental results demonstrate the effectiveness of the proposed methods.
Wang, J, Merigó, JM & Jin, L 2017, 'S-H OWA Operators with Moment Measure', International Journal of Intelligent Systems, vol. 32, no. 1, pp. 51-66.
View/Download from: Publisher's site
View description>>
© 2016 Wiley Periodicals, Inc. Step-like or Hurwicz-like ordered weighted averaging (OWA) (S-H OWA) operators connect two fundamental OWA operators, step OWA operators and Hurwicz OWA operators. S-H OWA operators also generalize them and some other well-known OWA operators such as median and centered OWA operators. Generally, there are two types of determination methods for S-H OWA operators: one is motivated by some existing mathematical results; the other is by a set of “nonstrict” definitions and often via some intermediate elements. For the second type, in this study we define two sets of strict definitions for the Hurwicz/step degree, which are more effective and necessary for theoretical studies and practical usages. Both sets of definitions are useful in different situations. In addition, they are based on the same concept, the moment of OWA operators, proposed in this study, and therefore they become identical in limit forms. However, the Hurwicz/step degree (HD/SD) puts more concern on its numerical measure and physical meaning, whereas the relative Hurwicz/step degree (rHD/rSD), while still being accurate numerically, is sometimes more reasonable intuitively and has larger potential in further studies and practical applications.
Wang, JJJ, Bartlett, M & Ryan, L 2017, 'Non‐ignorable missingness in logistic regression', Statistics in Medicine, vol. 36, no. 19, pp. 3005-3021.
View/Download from: Publisher's site
View description>>
Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non‐ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non‐identifiable under a non‐ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow‐up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality‐of‐life. Copyright © 2017 John Wiley & Sons, Ltd.
Wang, JJJ, Bartlett, M & Ryan, L 2017, 'On the impact of nonresponse in logistic regression: application to the 45 and Up study', BMC Medical Research Methodology, vol. 17, no. 1, pp. 1-13.
View/Download from: Publisher's site
View description>>
© 2017 The Author(s). Background: In longitudinal studies, nonresponse to follow-up surveys poses a major threat to the validity, interpretability and generalisation of results. The problem of nonresponse is further complicated by the possibility that nonresponse may depend on the outcome of interest. We identified sociodemographic, general health and wellbeing characteristics associated with nonresponse to the follow-up questionnaire and assessed the extent and effect of nonresponse on statistical inference in a large-scale population cohort study. Methods: We obtained the data from the baseline and first wave of the follow-up survey of the 45 and Up Study. Of those who were invited to participate in the follow-up survey, 65.2% responded. A logistic regression model was used to identify baseline characteristics associated with follow-up response. A Bayesian selection model approach with sensitivity analysis was implemented to model nonignorable nonresponse. Results: Characteristics associated with a higher likelihood of responding to the follow-up survey include female gender, age categories 55-74, high educational qualification, being married/de facto, working part-time or being partially or fully retired, and higher household income. Parameter estimates and conclusions are generally consistent across different assumptions on the missing data mechanism. However, we observed some sensitivity for variables that are strong predictors of both the outcome and nonresponse. Conclusions: Results indicated that, in the context of the binary outcome under study, nonresponse did not result in substantial bias and did not alter the interpretation of results in general. Conclusions were still largely robust under a nonignorable missing data mechanism. Use of a Bayesian selection model is recommended as a useful strategy for assessing the potential sensitivity of results to missing data.
Wang, W, Yin, H, Chen, L, Sun, Y, Sadiq, S & Zhou, X 2017, 'ST-SAGE', ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 3, pp. 1-25.
View/Download from: Publisher's site
View description>>
With the rapid development of location-based social networks (LBSNs), spatial item recommendation has become an important mobile application, especially when users travel away from home. However, this type of recommendation is very challenging compared to traditional recommender systems. A user may visit only a limited number of spatial items, leading to a very sparse user-item matrix. This matrix becomes even sparser when the user travels to a distant place, as most of the items visited by a user are usually located within a short distance from the user’s home. Moreover, user interests and behavior patterns may vary dramatically across different time and geographical regions. In light of this, we propose ST-SAGE, a spatial-temporal sparse additive generative model for spatial item recommendation in this article. ST-SAGE considers both personal interests of the users and the preferences of the crowd in the target region at the given time by exploiting both the co-occurrence patterns and content of spatial items. To further alleviate the data-sparsity issue, ST-SAGE exploits the geographical correlation by smoothing the crowd’s preferences over a well-designed spatial index structure called the spatial pyramid. To speed up the training process of ST-SAGE, we implement a parallel version of the model inference algorithm on the GraphLab framework. We conduct extensive experiments; the experimental results clearly demonstrate that ST-SAGE outperforms the state-of-the-art recommender systems in terms of recommendation effectiveness, model training efficiency, and online recommendation efficiency.
Wang, Z & Cao, L 2017, 'Coupled Attribute Similarity Learning on Categorical Data for Multi-Label Classification', Journal of Beijing Institute of Technology (English Edition), vol. 26, no. 3, pp. 404-410.
View/Download from: Publisher's site
View description>>
In this paper, a novel coupled attribute similarity learning method on multi-label categorical data (CASonMLCD) is proposed. The CASonMLCD method not only computes the correlations between different attributes and multi-label sets using information gain, which can be regarded as the degree of importance of each attribute in the attribute learning method, but also further analyzes the intra-coupled and inter-coupled interactions between an attribute value pair for different attributes and multiple labels. The paper compares the CASonMLCD method with the OF distance and Jaccard similarity, based on the MLKNN algorithm, according to five common evaluation criteria. The experimental results demonstrate that the CASonMLCD method can mine the similarity relationships more accurately and comprehensively, and that it can obtain better performance than the compared methods.
Wu, J, Pan, S, Zhu, X, Zhang, C & Wu, X 2017, 'Positive and Unlabeled Multi-Graph Learning', IEEE Transactions on Cybernetics, vol. 47, no. 4, pp. 818-829.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. In this paper, we advance graph classification to handle multi-graph learning for complicated objects, where each object is represented as a bag of graphs and the label is only available to each bag but not individual graphs. In addition, when training classifiers, users are only given a handful of positive bags and many unlabeled bags, and the learning objective is to train models to classify previously unseen graph bags with maximum accuracy. To achieve the goal, we propose a positive and unlabeled multi-graph learning (puMGL) framework to first select informative subgraphs to convert graphs into a feature space. To utilize unlabeled bags for learning, puMGL assigns a confidence weight to each bag and dynamically adjusts its weight value to select 'reliable negative bags.' A number of representative graphs, selected from positive bags and identified reliable negative graph bags, form a 'margin graph pool' which serves as the base for deriving subgraph patterns, training graph classifiers, and further updating the bag weight values. A closed-loop iterative process helps discover optimal subgraphs from positive and unlabeled graph bags for learning. Experimental comparisons demonstrate the performance of puMGL for classifying real-world complicated objects.
Wu, Z, Lei, L, Li, G, Huang, H, Zheng, C, Chen, E & Xu, G 2017, 'A topic modeling based approach to novel document automatic summarization', Expert Systems with Applications, vol. 84, pp. 12-23.
View/Download from: Publisher's site
Wu, Z, Zhu, H, Li, G, Cui, Z, Huang, H, Li, J, Chen, E & Xu, G 2017, 'An efficient Wikipedia semantic matching approach to text document classification', Information Sciences, vol. 393, pp. 15-28.
View/Download from: Publisher's site
Xu, Z, Yan, J, Xu, RY & Mei, L 2017, 'Guest Editorial: Visual Multimedia Learning from Big Surveillance Data', Multimedia Tools and Applications, vol. 76, no. 13, pp. 14557-14557.
View/Download from: Publisher's site
Xuan, J, Lu, J, Zhang, G, Xu, RYD & Luo, X 2017, 'A Bayesian nonparametric model for multi-label learning', Machine Learning, vol. 106, no. 11, pp. 1787-1815.
View/Download from: Publisher's site
View description>>
© 2017, The Author(s). Multi-label learning has become a significant learning paradigm in the past few years due to its broad application scenarios and the ever-increasing number of techniques developed by researchers in this area. Among existing state-of-the-art works, generative statistical models are characterized by their good generalization ability and robustness on a large number of labels through learning a low-dimensional label embedding. However, one issue of this branch of models is that the number of dimensions needs to be fixed in advance, which is difficult and inappropriate in many real-world settings. In this paper, we propose a Bayesian nonparametric model to resolve this issue. More specifically, we extend a Gamma-negative binomial process to three levels in order to capture the label-instance-feature structure. Furthermore, a mixing strategy for Gamma processes is designed to account for the multiple labels of an instance. The mixed process also leads to a difficulty in model inference, so an efficient Gibbs sampling inference algorithm is then developed to resolve it. Experiments on several real-world datasets show the performance of the proposed model on multi-label learning tasks, compared with three state-of-the-art models from the literature.
Xuan, J, Lu, J, Zhang, G, Xu, RYD & Luo, X 2017, 'Bayesian Nonparametric Relational Topic Model through Dependent Gamma Processes', IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 7, pp. 1357-1369.
View/Download from: Publisher's site
View description>>
© 2016 IEEE. Traditional relational topic models provide a successful way to discover the hidden topics from a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, could benefit from this revealed knowledge. However, existing relational topic models are based on an assumption that the number of hidden topics is known a priori, which is impractical in many real-world applications. Therefore, in order to relax this assumption, we propose a nonparametric relational topic model using stochastic processes instead of fixed-dimensional probability distributions in this paper. Specifically, each document is assigned a Gamma process, which represents the topic interest of this document. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics are shared by all the documents. In order to resolve these challenges, we use a subsampling strategy to assign each document a different Gamma process from the global Gamma process, and the subsampling probabilities of documents are assigned with a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the capabilities of learning the hidden topics and, more importantly, the number of topics.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2017, 'A new web-supervised method for image dataset constructions', Neurocomputing, vol. 236, pp. 23-31.
View/Download from: Publisher's site
View description>>
© 2017 The goal of this work is to automatically collect a large number of highly relevant natural images from the Internet for given queries. A novel automatic image dataset construction framework is proposed by employing multiple query expansions. Specifically, the given queries are first expanded by searching in the Google Books Ngrams Corpora to obtain richer semantic descriptions, from which the visually non-salient and less relevant expansions are then filtered. After retrieving images from the Internet with the filtered expansions, we further filter noisy images by clustering and progressive Convolutional Neural Network (CNN)-based methods. To evaluate the performance of our proposed method for image dataset construction, we build an image dataset with 10 categories. We then run object detection on our image dataset and on three other image datasets constructed by weakly supervised, web-supervised and fully supervised learning; the experimental results indicate that the effectiveness of our method is superior to state-of-the-art weakly supervised and web-supervised methods. In addition, we perform a cross-dataset classification to evaluate the performance of our dataset against the two publicly available manually labelled datasets STL-10 and CIFAR-10.
Yin, H, Wang, W, Wang, H, Chen, L & Zhou, X 2017, 'Spatial-Aware Hierarchical Collaborative Deep Learning for POI Recommendation', IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 11, pp. 2537-2551.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Point-of-interest (POI) recommendation has become an important way to help people discover attractive and interesting places, especially when they travel out of town. However, the extreme sparsity of user-POI matrix and cold-start issues severely hinder the performance of collaborative filtering-based methods. Moreover, user preferences may vary dramatically with respect to the geographical regions due to different urban compositions and cultures. To address these challenges, we stand on recent advances in deep learning and propose a Spatial-Aware Hierarchical Collaborative Deep Learning model (SH-CDL). The model jointly performs deep representation learning for POIs from heterogeneous features and hierarchically additive representation learning for spatial-aware personal preferences. To combat data sparsity in spatial-aware user preference modeling, both the collective preferences of the public in a given target region and the personal preferences of the user in adjacent regions are exploited in the form of social regularization and spatial smoothing. To deal with the multimodal heterogeneous features of the POIs, we introduce a late feature fusion strategy into our SH-CDL model. The extensive experimental analysis shows that our proposed model outperforms the state-of-the-art recommendation models, especially in out-of-town and cold-start recommendation scenarios.
Yusoff, B, Merigó, JM & Ceballos, D 2017, 'OWA-based aggregation operations in multi-expert MCDM model', Economic Computation and Economic Cybernetics Studies and Research, vol. 51, no. 2, pp. 211-230.
View description>>
This paper presents an analysis of a multi-expert multi-criteria decision making (ME-MCDM) model based on the ordered weighted averaging (OWA) operators. Two methods of modeling the majority opinion, based on the induced OWA operators, are studied for aggregating the experts’ judgments. Then, an overview of OWA with the inclusion of different degrees of importance is provided for aggregating the criteria. An alternative OWA operator with a new weighting method is proposed, termed the alternative OWAWA (AOWAWA) operator. Some extensions of the ME-MCDM model with respect to two-stage aggregation processes are developed based on the classical and alternative schemes. A comparison of the results of the different decision schemes is then conducted. Moreover, with respect to the alternative scheme, a further comparison is given for different techniques for integrating the degrees of importance. A numerical example in the selection of investment strategies is used to exemplify the model and for analysis purposes.
Zeng, S, Merigó, JM, Palacios-Marqués, D, Jin, H & Gu, F 2017, 'Intuitionistic fuzzy induced ordered weighted averaging distance operator and its application to decision making', Journal of Intelligent & Fuzzy Systems, vol. 32, no. 1, pp. 11-22.
View/Download from: Publisher's site
View description>>
© 2017 - IOS Press and the authors. In this paper, we develop a new method for intuitionistic fuzzy decision making problems with induced aggregation operators and distance measures. Firstly, we introduce the intuitionistic fuzzy induced ordered weighted averaging distance (IFIOWAD) operator. It is an extension of the ordered weighted averaging (OWA) operator that uses the main characteristics of the induced OWA (IOWA), the distance measures and uncertain information represented by intuitionistic fuzzy numbers. The main advantage of this operator is that it is able to consider complex attitudinal characters of the decision-maker by using order-inducing variables in the aggregation of the distance measures. We further generalize the IFIOWAD by using the weighted average. The result is the intuitionistic fuzzy induced ordered weighted averaging weighted average distance (IFIOWAWAD) operator. Finally, a practical example about the selection of investments is provided to illustrate the developed intuitionistic fuzzy aggregation operators.
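As background to the OWA-based entries above (standard definitions from the aggregation-operator literature, not reproduced from these papers), the OWA operator reorders its arguments before weighting, and the distance variant (OWAD) aggregates componentwise distances the same way:

```latex
% OWA: b_j is the j-th largest of the arguments a_1,...,a_n,
% with weights w_j >= 0 and \sum_j w_j = 1
\mathrm{OWA}(a_1,\dots,a_n) = \sum_{j=1}^{n} w_j\, b_j
% OWAD: D_j is the j-th largest of the componentwise
% distances |x_i - y_i| between two tuples x and y
\mathrm{OWAD}(x, y) = \sum_{j=1}^{n} w_j\, D_j
```

The induced, probabilistic and intuitionistic fuzzy extensions discussed in these abstracts modify how the reordering is driven (order-inducing variables) and how the weights are formed, but keep this weighted reordered sum as the core aggregation step.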
Zhai, T, Gao, Y, Wang, H & Cao, L 2017, 'Classification of high-dimensional evolving data streams via a resource-efficient online ensemble', Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1242-1265.
View/Download from: Publisher's site
View description>>
© 2017, The Author(s). A novel online ensemble strategy, ensemble BPegasos (EBPegasos), is proposed to solve the problems simultaneously caused by concept drift and the curse of dimensionality in classifying high-dimensional evolving data streams, which has not been addressed in the literature. First, EBPegasos uses BPegasos, an online kernelized SVM-based algorithm, as the component classifier to address the scalability and sparsity of high-dimensional data. Second, EBPegasos takes full advantage of the characteristics of BPegasos to cope with various types of concept drift. Specifically, EBPegasos constructs diverse component classifiers by controlling the budget size of BPegasos; it also equips each component with a drift detector to monitor and evaluate its performance, and modifies the ensemble structure only when large performance degradation occurs. Such a conditional structural modification strategy makes EBPegasos strike a good balance between exploiting and forgetting old knowledge. Lastly, we prove experimentally that EBPegasos is more effective and resource-efficient than tree ensembles on high-dimensional data; comprehensive experiments on synthetic and real-life datasets also show that EBPegasos can cope with various types of concept drift significantly better than the state-of-the-art ensemble frameworks when all ensembles use BPegasos as the base learner.
Zhang, Q, Wu, J, Zhang, P, Long, G & Zhang, C 2017, 'Collective Hyping Detection System for Identifying Online Spam Activities', IEEE Intelligent Systems, vol. 32, no. 5, pp. 1-1.
View/Download from: Publisher's site
View description>>
© 2001-2011 IEEE. Although existing antispam strategies detect traditional spam activities effectively, evolving spam schemes can successfully cheat conventional testing by buying the comments that are written by genuine users and sold by specific web markets. Such spam activities turn into a kind of advertising campaign among business owners to maintain their rank in top positions. This article proposes a new collaborative marketing hyping detection solution that aims to identify spam comments generated by the Spam Reviewer Cloud and detect products that adopt an evolving spam strategy for promotion. The authors propose an unsupervised learning model that combines heterogeneous product review networks in an attempt to discover collective hyping activities. Their experiments validate the existence of the collaborative marketing hyping activities on a real-life ecommerce platform and demonstrate that their model can effectively and accurately identify these advanced spam activities.
Zhao, Y, Di, H, Zhang, J, Lu, Y, Lv, F & Li, Y 2017, 'Region-based Mixture Models for human action recognition in low-resolution videos', Neurocomputing, vol. 247, pp. 1-15.
View/Download from: Publisher's site
View description>>
© 2017 State-of-the-art performance in human action recognition is achieved by the use of dense trajectories which are extracted by optical flow algorithms. However, optical flow algorithms are far from perfect in low-resolution (LR) videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode the layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); then we present a hybrid feature representation to integrate both the shape and motion features; and finally we propose a Region-based Mixture Model (RMM) to be utilized for action classification. The RMM encodes the spatial layout of features without any need for body-part segmentation. Experimental results show that the approach is effective and, more importantly, the approach is more general for LR recognition tasks.
Zhou, X, Chen, L, Zhang, Y, Qin, D, Cao, L, Huang, G & Wang, C 2017, 'Enhancing online video recommendation using social user interactions', The VLDB Journal, vol. 26, no. 5, pp. 637-656.
View/Download from: Publisher's site
View description>>
© 2017, Springer-Verlag Berlin Heidelberg. The creation of media sharing communities has resulted in the astonishing increase of digital videos, and their wide applications in the domains like online news broadcasting, entertainment and advertisement. The improvement of these applications relies on effective solutions for social user access to videos. This fact has driven the research interest in the recommendation in shared communities. Though effort has been put into social video recommendation, the contextual information on social users has not been well exploited for effective recommendation. Motivated by this, in this paper, we propose a novel approach based on the video content and user information for the recommendation in shared communities. A new solution is developed by allowing batch video recommendation to multiple new users and optimizing the subcommunity extraction. We first propose an effective technique that reduces the subgraph partition cost based on graph decomposition and reconstruction for efficient subcommunity extraction. Then, we design a summarization-based algorithm which groups the clicked videos of multiple unregistered users and simultaneously provide recommendation to each of them. Finally, we present a nontrivial social updates maintenance approach for social data based on user connection summarization. We evaluate the performance of our solution over a large dataset considering different strategies for group video recommendation in sharing communities.
Al-Jubouri, B & Gabrys, B 2017, 'Diversity and Locality in Multi-Component, Multi-Layer Predictive Systems: A Mutual Information Based Approach', ADVANCED DATA MINING AND APPLICATIONS, ADMA 2017, International Conference on Advanced Data Mining and Applications (ADMA), Springer International Publishing, Singapore, SINGAPORE, pp. 313-325.
View/Download from: Publisher's site
Alkalbani, AM, Gadhvi, L, Patel, B, Hussain, FK, Ghamry, AM & Hussain, OK 2017, 'Analysing Cloud Services Reviews Using Opining Mining', 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA), IEEE, Tamkang Univ, Taipei, TAIWAN, pp. 1124-1129.
View/Download from: Publisher's site
Anaissi, A, Khoa, NLD, Mustapha, S, Alamdari, MM, Braytee, A, Wang, Y & Chen, F 2017, 'Adaptive One-Class Support Vector Machine for Damage Detection in Structural Health Monitoring', Advances in Knowledge Discovery and Data Mining (LNAI), Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Springer International Publishing, Jeju, South Korea, pp. 42-57.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. Machine learning algorithms have been employed extensively in the area of structural health monitoring to compare new measurements with baselines to detect any structural change. One-class support vector machine (OCSVM) with Gaussian kernel function is a promising machine learning method which can learn only from one class data and then classify any new query samples. However, generalization performance of OCSVM is profoundly influenced by its Gaussian model parameter σ. This paper proposes a new algorithm named Appropriate Distance to the Enclosing Surface (ADES) for tuning the Gaussian model parameter. The semantic idea of this algorithm is based on inspecting the spatial locations of the edge and interior samples, and their distances to the enclosing surface of OCSVM. The algorithm selects the optimal value of σ which generates a hyperplane that is maximally distant from the interior samples but close to the edge samples. The sets of interior and edge samples are identified using a hard margin linear support vector machine. The algorithm was successfully validated using sensing data collected from the Sydney Harbour Bridge, in addition to five public datasets. The designed ADES algorithm is an appropriate choice to identify the optimal value of σ for OCSVM especially in high dimensional datasets.
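For reference, the Gaussian kernel whose width parameter σ the ADES algorithm tunes, and the standard OCSVM decision function, take the following textbook forms (standard definitions, not reproduced from the paper):

```latex
% Gaussian (RBF) kernel with width parameter \sigma
k(x, y) = \exp\!\left( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \right)
% One-class SVM decision function: query samples with
% f(x) = +1 are classified as belonging to the learned class
f(x) = \operatorname{sgn}\!\left( \sum_{i} \alpha_{i}\, k(x_{i}, x) - \rho \right)
```

Too small a σ overfits the training samples and too large a σ underfits them, which is why a principled selection rule of the kind the abstract describes matters for damage detection.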
Braytee, A, Liu, W & Kennedy, PJ 2017, 'Supervised context-aware non-negative matrix factorization to handle high-dimensional high-correlated imbalanced biomedical data', 2017 International Joint Conference on Neural Networks (IJCNN), 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, Anchorage, AK, USA, pp. 4512-4519.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Traditional feature selection techniques are used to identify a subset of the most useful features, and consider the rest as unimportant, redundant or noisy. In the presence of highly correlated features, many variable selection methods consider correlated features to be redundant and remove them. In this paper, a novel supervised feature selection algorithm, SCANMF, is proposed by jointly integrating correlation analysis and structural analysis of the balanced supervised non-negative matrix factorization (NMF). Furthermore, an ℓ2,1-norm minimization constraint is incorporated into the objective function to guarantee sparsity in the feature matrix rows and reduce noisy features. Our algorithm exploits the discriminative information, feature combinations, and the original features in the context of a supervised NMF method which can be beneficial for both classification and interpretation. An efficient iterative algorithm is designed to solve the constrained optimization problem with guaranteed convergence. Finally, a series of extensive experiments are conducted on 8 complex datasets. Promising results using multiple classifiers demonstrate the effectiveness and efficiency of our algorithm over state-of-the-art methods.
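As background to the ℓ2,1-norm regularizer used in this and the following entry, the norm has a standard row-sparsity form (generic definition, not the papers' full objective functions):

```latex
% \ell_{2,1}-norm of a feature matrix W \in \mathbb{R}^{d \times k}:
% the \ell_1-norm of the row-wise \ell_2-norms, where w^{i}
% denotes the i-th row of W
\lVert W \rVert_{2,1}
  = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{k} W_{ij}^{2}}
  = \sum_{i=1}^{d} \lVert w^{i} \rVert_{2}
```

Penalizing this quantity drives entire rows of W toward zero, which is what lets these NMF-based objectives discard whole features rather than individual matrix entries.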
Braytee, A, Liu, W, Catchpoole, DR & Kennedy, PJ 2017, 'Multi-Label Feature Selection using Correlation Information', Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17: ACM Conference on Information and Knowledge Management, ACM, Singapore, Singapore, pp. 1649-1656.
View/Download from: Publisher's site
View description>>
© 2017 ACM. High-dimensional multi-labeled data contain instances, where each instance is associated with a set of class labels and has a large number of noisy and irrelevant features. Feature selection has been shown to have great benefits in improving the classification performance in machine learning. In multi-label learning, to select the discriminative features among multiple labels, several challenges should be considered: interdependent labels, different instances may share different label correlations, correlated features, and missing and flawed labels. This work is part of a project at The Children's Hospital at Westmead (TB-CHW), Australia to explore the genomics of childhood leukaemia. In this paper, we propose CMFS (a Correlated- and Multi-label Feature Selection method), based on non-negative matrix factorization (NMF), for simultaneously performing feature selection and addressing the aforementioned challenges. Significantly, a major advantage of our research is to exploit the correlation information contained in features, labels and instances to select the relevant features among multiple labels. Furthermore, ℓ2,1-norm regularization is incorporated in the objective function to undertake feature selection by imposing sparsity on the feature matrix rows. We employ CMFS to decompose the data and multi-label matrices into a low-dimensional space. To solve the objective function, an efficient iterative optimization algorithm is proposed with guaranteed convergence. Finally, extensive experiments are conducted on high-dimensional multi-labeled datasets. The experimental results demonstrate that our method significantly outperforms state-of-the-art multi-label feature selection methods.
Butler, A, Xu, G & He, X 2017, 'What comes first the Co-authorship Network or the Citation?', Proceedings of the 4th Multidisciplinary International Social Networks Conference, MISNC '17: 4th Multidisciplinary International Social Networks Conference, ACM, Bangkok, Thailand.
View/Download from: Publisher's site
View description>>
For many decades citation counting has been used as the way to quantify the nebulous notion of research "quality". Indeed, in conversation the terms "research quality", "impact" or "excellence in research" are simply a reference to a scientific document’s citation count. Moreover, the commonly used journal "impact" factors are simply manipulated forms of citation counting. In recent times, the word "impact" has morphed into the new ’mot du jour’. This paper investigates and discusses the association between co-authorship networks and citations of institutions within an arbitrary, but defined, subject area. The data examined are readily available and the analytical techniques employed are deliberately simple. The simplicity of this analysis is driven by the desire to show that citation counts are not explicitly related to the quality of research but that citations are a result of the multifaceted author networks that are inherent in scientific endeavor. The paper presents an argument that the improved ability to conduct effective network analysis and related research shows that the notion of high citations being the same as "research quality" has run its course. Citation performance is more likely to be a result of co-authorship network dynamics than of any perceived notion of "quality". Moreover, it is time the folly of citation counting is put to rest: if one wants to know what "impact" one is having, one need look no further than one's co-authorship network and the reach it has across whatever subject area one is interested in. The discussion and results herein highlight that, rather than being a matter of counting citations, the "impact" of research is driven by connections through networks of people. © 2017 ACM.
Cao, L 2017, 'Behavior Informatics to Discover Behavior Insight for Active and Tailored Client Management', Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17: The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM.
View/Download from: Publisher's site
Chou, K-P, Prasad, M, Li, D-L, Bharill, N, Lin, Y-F, Hussain, F, Lin, C-T & Lin, W-C 2017, 'Automatic Multi-view Action Recognition with Robust Features', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Neural Information Processing, Springer International Publishing, Guangzhou, China, pp. 554-563.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. This paper proposes view-invariant features to address multi-view action recognition for different actions performed in different views. The view-invariant features are obtained from clouds of varying temporal scale by extracting holistic features, which are modeled to explicitly take advantage of the global, spatial and temporal distribution of interest points. The proposed view-invariant features are highly discriminative and robust for recognizing actions as the view changes. This paper proposes a mechanism for real-world applications which can follow the actions of a person in a video based on image sequences and can separate these actions according to given training data. Using the proposed mechanism, the beginning and ending of an action sequence can be labeled automatically without the need for manual setting. It is not necessary in the proposed approach to re-train the system if there are changes in the scenario, which means the trained database can be applied in a wide variety of environments. The experimental results show that the proposed approach outperforms existing methods on the KTH and WEIZMANN datasets.
Chu, C, Brownlow, J, Meng, Q, Fu, B, Culbert, B, Zhu, M, Xu, G & He, X 2017, 'Combining heterogeneous features for time series prediction', Proceedings of 4th International Conference on Behavioral, Economic, and Socio-Cultural Computing, BESC 2017, 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC), IEEE, Krakow, Poland, pp. 1-2.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Time series prediction is a challenging task in reality, and various methods have been proposed for it. However, only the historical series of values is exploited in most existing methods. Therefore, the predictive models might not be effective in some cases, because: (1) the historical series of values is usually not sufficient, and (2) features from heterogeneous sources, such as the intrinsic features of the data samples themselves, which could be very useful, are not taken into consideration. To address these issues, we propose a novel method in this paper which learns the predictive model based on the combination of dynamic features extracted from the series of historical values and static features of the data samples. To evaluate the performance of our proposed method, we compare it with linear regression and boosted trees, and the experimental results validate our method's superiority.
Diesner, J, Ferrari, E & Xu, G 2017, 'Welcome from the ASONAM 2017 program chairs', Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017, p. xviii.
ElShaweesh, O, Hussain, FK, Lu, H, Al-Hassan, M & Kharazmi, S 2017, 'Personalized Web Search Based on Ontological User Profile in Transportation Domain', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 24th International Conference on Neural Information Processing 2017, Springer International Publishing, Guangzhou, China, pp. 239-248.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. Current conventional search engines deliver similar results to all users for the same query. Because of the variety of user interests and preferences, personalized search engines, based on semantics, hold the promise of providing more efficient information that better reflects users’ needs. The main feature of building a personalized web search is to represent user interests in terms of user profiles. This paper proposes a personalized search approach using an ontology-based user profile. The aim of this approach is to build user profiles based on user browsing behavior and semantic knowledge of specific domain ontology to enhance the quality of the search results. The proposed approach utilizes a re-ranked algorithm to sort the results returned by the search engine to provide a search result that best relates to the user query. This algorithm evaluates the similarity between a user query, the retrieved search results and the ontological concepts. This similarity is computed by taking into account a user’s explicit browsing behavior, semantic knowledge of concepts, and synonyms of term-based vectors extracted from the WordNet API. A set of experiments using a case study from a transport service domain validates the effectiveness of the proposed approach and demonstrates promising results.
Gao, F, Musial, K & Gabrys, B 2017, 'A Community Bridge Boosting Social Network Link Prediction Model', Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM '17: Advances in Social Networks Analysis and Mining 2017, ACM, pp. 683-689.
View/Download from: Publisher's site
View description>>
Link prediction in social networks is a very challenging research problem. The majority of existing approaches are based on the assumption that a given network evolves following a single phenomenon, e.g. “rich get richer” or “friend of my friend is my friend”. However, network dynamics change over time and different parts of the network evolve in different manners. Because of that, we hypothesise that prediction accuracy can be improved by treating different nodes and links differently. Building on that assumption, we propose a Community Bridge Boosting Prediction Model (CBBPM) that treats bridge nodes differently depending on their structural position. For such bridge nodes, the similarity score obtained using traditional link-based prediction methods is boosted. Doing so increases the importance of these nodes while ensuring that CBBPM can be used with any existing link prediction method. Our experimental results show that such a bridge node similarity boosting mechanism can improve the accuracy of traditional link prediction methods.
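The boosting mechanism can be pictured with a small sketch. This is not the authors' implementation: the toy graph, community labels and boost factor are invented for illustration, and a classic common-neighbours score stands in for whichever traditional predictor is being boosted.

```python
# Sketch of a bridge-boosted common-neighbours link predictor.
# Graph, communities and the boost factor are illustrative assumptions.

def common_neighbours(adj, u, v):
    """Number of shared neighbours of u and v (a classic similarity score)."""
    return len(adj[u] & adj[v])

def is_bridge(adj, community, node):
    """A node is a community bridge if it has a neighbour in a foreign community."""
    return any(community[n] != community[node] for n in adj[node])

def boosted_score(adj, community, u, v, boost=2.0):
    """Amplify the similarity score when either endpoint bridges communities."""
    score = common_neighbours(adj, u, v)
    if is_bridge(adj, community, u) or is_bridge(adj, community, v):
        score *= boost
    return score

# Toy network: two triangles joined through node 2 (the bridge).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}

print(boosted_score(adj, community, 0, 1))  # 1 — neither endpoint is a bridge
print(boosted_score(adj, community, 2, 4))  # 2.0 — node 2 bridges A and B, score doubled
```

Because the boost is applied on top of any base similarity score, the same wrapper works with other predictors (e.g. Adamic-Adar or Jaccard) in place of `common_neighbours`.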
Ghamry, AM, Alkalbani, AM, Tran, V, Tsai, Y-C, Hoang, ML & Hussain, FK 2017, 'Towards a Public Cloud Services Registry', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 18th International Conference Web Information Systems Engineering, Springer International Publishing, Puschino, Russia, pp. 290-295.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. A cloud services registry is a cloud services database containing thousands of records of cloud consumers’ reviews and cloud services, such as Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). The data set is harvested from a web portal called www.serchen.com. Each record holds detailed information about a service, such as the service name, service description, categories, key features, service provider link and review list. Each review contains the reviewer name, review date and review content. This work is an extension of our previous Blue Pages data set [6]. The data set is valuable for future research in cloud service identification, discovery, comparison and selection.
Gu, S, Lu, Y, Zhang, L & Zhang, J 2017, 'RGB-D Tracking Based on Kernelized Correlation Filter with Deep Features', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Neural Information Processing, Springer International Publishing, Guangzhou, China, pp. 105-113.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. This paper proposes a new RGB-D tracker built upon the Kernelized Correlation Filter (KCF) with deep features. KCF is a high-speed target tracker. However, the HOG feature used in KCF has some weaknesses, such as a lack of robustness to noise. Therefore, we consider using RGB-D deep features in KCF, which refer to deep features of RGB and depth images; these deep features contain abundant, discriminative information for tracking. The mixture of deep features greatly improves the performance of the tracker. Besides, KCF is sensitive to scale variations, while depth images are beneficial for handling this problem. According to the principle of similar triangles, the ratio of scale variation can be observed simply. Tested on the Princeton RGB-D Tracking Benchmark, our RGB-D tracker achieves the highest accuracy when no occlusion happens. Meanwhile, we maintain high-speed tracking even though deep features are calculated during tracking, with an average speed of 10 FPS.
Guo, J, Yue, B, Xu, G, Yang, Z & Wei, J-M 2017, 'An Enhanced Convolutional Neural Network Model for Answer Selection', Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion, the 26th International Conference, ACM Press, Perth, Western Australia, pp. 789-790.
View/Download from: Publisher's site
View description>>
Answer selection is an important task in question answering (QA) from the Web. To address the intrinsic difficulty in encoding sentences with semantic meanings, we introduce a general framework, i.e., Lexical Semantic Feature based Skip Convolution Neural Network (LSF-SCNN), with several optimization strategies. The intuitive idea is that the granular representations with more semantic features of sentences are deliberately designed and estimated to capture the similarity between question-answer pairwise sentences. The experimental results demonstrate the effectiveness of the proposed strategies and our model outperforms the state-of-the-art ones by up to 3.5% on the metrics of MAP and MRR.
Han, Y, Wan, Y, Chen, L, Xu, G & Wu, J 2017, 'Exploiting Geographical Location for Team Formation in Social Coding Sites', PAKDD 2017: Advances in Knowledge Discovery and Data Mining, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer International Publishing, Jeju Island, Korea, pp. 499-510.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. Social coding sites (SCSs) such as GitHub and BitBucket are collaborative platforms where developers from different backgrounds (e.g., culture, language, location, skills) form a team to contribute to a shared project collaboratively. One essential task in such collaborative development is how to form an optimal team in which each member makes his/her greatest contribution, which may have a great effect on the efficiency of collaboration. To the best of our knowledge, all existing related works model the team formation problem as minimizing the communication cost among developers or taking the workload of individuals into account, ignoring the impact of the geographical location of each developer. In this paper, we aim to exploit the geographical proximity factor to improve the performance of team formation in social coding sites. Specifically, we incorporate communication cost and geographical proximity into a unified objective function and propose a genetic algorithm to optimize it. Comprehensive experiments on a real-world dataset (GitHub) demonstrate the performance of the proposed model in comparison with some state-of-the-art ones.
Hu, L, Cao, L, Wang, S, Xu, G, Cao, J & Gu, Z 2017, 'Diversifying Personalized Recommendation with User-session Context', Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Twenty-Sixth International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, Melbourne, Australia, pp. 1858-1864.
View/Download from: Publisher's site
View description>>
Recommender systems (RS) have become an integral part of our daily life. However, most current RS often repeatedly recommend items to users with similar profiles. We argue that recommendation should be diversified by leveraging session contexts with personalized user profiles. For this, current session-based RS (SBRS) often assume a rigidly ordered sequence over the data, which does not fit many real-world cases. Moreover, personalization is often omitted in current SBRS. Accordingly, a personalized SBRS over relaxedly ordered user-session contexts is more pragmatic. However, deep-structured models tend to be too complex to serve online SBRS owing to the large number of users and items. Therefore, we design an efficient SBRS with shallow wide-in-wide-out networks, inspired by successful experience in modern language modelling. Experiments on a real-world e-commerce dataset show the superiority of our model over state-of-the-art methods.
Hussein, F & Piccardi, M 2017, 'Minimum-Risk Structured Learning of Video Summarization', 2017 IEEE International Symposium on Multimedia (ISM), 2017 IEEE International Symposium on Multimedia (ISM), IEEE, Taichung, Taiwan.
View/Download from: Publisher's site
View description>>
Video summarization is an important multimedia task for applications such as video indexing and retrieval, video surveillance, human-computer interaction and video "storyboarding". In this paper, we present a new approach for automatic summarization of video collections that leverages a structured minimum-risk classifier and efficient submodular inference. To test the accuracy of the predicted summaries we utilize a recently-proposed measure (V-JAUNE) that considers both the content and frame order of the original video. Qualitative and quantitative tests over two action video datasets - the ACE and the MSR DailyActivity3D datasets - show that the proposed approach delivers more accurate summaries than the compared minimum-risk and syntactic approaches.
Jian, S, Cao, L, Pang, G, Lu, K & Gao, H 2017, 'Embedding-based Representation of Categorical Data by Hierarchical Value Coupling Learning', Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Twenty-Sixth International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, Melbourne, Australia, pp. 1937-1943.
View/Download from: Publisher's site
View description>>
Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, i.e., a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods which can capture only a part or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings which are then used to cluster values with different granularities. It further models the couplings in value clusters within the same granularity and with different granularities to embed feature values into a new numerical space with independent dimensions. Substantial experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.
Jiang, Z, Huynh, DQ, Zhang, J & Wu, Q 2017, 'Part-Based Data Association for Visual Tracking', 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Sydney, NSW, Australia, pp. 1-8.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. We present a method that integrates a part-based sparse appearance model in a Bayesian inference framework for tracking targets in video sequences. We formulate the sparse appearance model as a set of smoothed colour histograms corresponding to the object windows detected by the Deformable Part Model (DPM) detector. The data association of each body part between frames is solved based on the position constraint, appearance coherence, and motion consistency. To deal with missing and noisy observations, the part detection window in the following frame is also predicted using an interacting multiple model (IMM) tracker. We have tested our tracking method on all the video sequences that involve people in upright poses from the TB-50 and TB-100 benchmark video datasets. Our experimental results show that our tracking method outperforms six state-of-the-art tracking techniques.
Kuppili Venkata, S, Musial, K, Mahmoud, S & Keppens, J 2017, 'Demonstration: Multi-agent System for Distributed Cache Maintenance', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer International Publishing, pp. 364-368.
View/Download from: Publisher's site
View description>>
© Springer International Publishing AG 2017. Innovations in science and technology are increasing the demand for huge data transfers and hence the number of data caches. In this paper, we consider the community caching solution, CommCache, in which many groups of users work together on related projects distributed all over the world. We demonstrate the use of proactive caches for the data placement problem with the help of multi-agent coordination.
Kuppili Venkata, S, Musial, K, Mahmoud, S & Keppens, J 2017, 'Multi-Agent System for Distributed Cache Maintenance', ADVANCES IN PRACTICAL APPLICATIONS OF CYBER-PHYSICAL MULTI-AGENT SYSTEMS: THE PAAMS COLLECTION, PAAMS 2017, 15th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS), Springer International Publishing, Porto, PORTUGAL, pp. 157-169.
View/Download from: Publisher's site
Kusakunniran, W, Wu, Q, Ritthipravad, P & Zhang, J 2017, 'Three-stages hard exudates segmentation in retinal images', 2017 9th International Conference on Information Technology and Electrical Engineering (ICITEE), 2017 9th International Conference on Information Technology and Electrical Engineering (ICITEE), IEEE, Phuket, Thailand, pp. 1-6.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. This paper proposes a three-stage method for hard exudate segmentation in retinal images. The first stage is pre-processing. Color transfer is applied so that all retinal images have the same color characteristics, based on statistical analysis. Then, only the yellow channel of each image is used in the further analysis. The second stage is blob initialization. Blob detection based on color, size, and shape, including circularity and convexity, is used to identify initial pixels of hard exudates. The detected blobs must not lie inside the optic disk. The third stage is segmentation. Graph cut is iteratively applied on partitions of the image. Fine-tuned segmentation in sub-images is necessary because the portion of hard exudates is significantly smaller than that of non-hard exudates. The proposed method is evaluated using two well-known datasets, namely e-ophtha and DIARETDB1, at both the pixel level and the image level. Based on comprehensive comparisons with existing works, the proposed method is shown to be very promising. At the image level, it achieves 96% sensitivity and 94% specificity for the e-ophtha dataset, and 96% sensitivity and 98% specificity for the DIARETDB1 dataset.
Kusakunniran, W, Wu, Q & Zhang, J 2017, 'Action Recognition Based on Correlated Codewords of Body Movements', 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Sydney, Australia, pp. 1-8.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Using spatio-temporal features is popular for action recognition. However, existing methods embed these local features into a global representation, so the orders and correlations among the local motions of each action are lost. This can make it difficult to distinguish closely related actions. This paper proposes a solution to this challenge by encoding the correlations of movements. Space-time interest points are detected in each action video. Feature descriptors are then extracted from these key points and clustered into codewords implicitly representing different characteristics of motion. The final representation of each action video combines a bag of words with correlations between codewords. The support vector machine is then used as the classification tool. The experimental results show that the proposed method achieves very promising performance and, in particular, outperforms other existing methods that rely on spatio-temporal features.
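The representation idea can be sketched roughly as follows. The codeword names and the choice of counting consecutive co-occurrences are illustrative assumptions, not the paper's exact encoding: a video's codeword sequence becomes a histogram concatenated with pairwise correlation counts.

```python
# Sketch: bag-of-words histogram + codeword correlation counts.
# Vocabulary and sequence are toy examples, not from the paper.
from collections import Counter
from itertools import product

def bow_with_correlations(codeword_seq, vocab):
    """Represent a video as its codeword histogram concatenated with
    ordered co-occurrence counts of consecutive codeword pairs."""
    hist = Counter(codeword_seq)
    pairs = Counter(zip(codeword_seq, codeword_seq[1:]))
    bow = [hist[w] for w in vocab]
    corr = [pairs[(a, b)] for a, b in product(vocab, vocab)]
    return bow + corr

vocab = ['arm_up', 'arm_down', 'step']
seq = ['arm_up', 'arm_down', 'arm_up', 'step']
print(bow_with_correlations(seq, vocab))
# → [2, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
```

The first three entries are the plain bag of words; the remaining nine capture which motions follow which, which is exactly the ordering information a pure bag of words discards.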
Lian, D, Liu, R, Ge, Y, Zheng, K, Xie, X & Cao, L 2017, 'Discrete Content-aware Matrix Factorization', Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17: The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Halifax, CANADA, pp. 325-334.
View/Download from: Publisher's site
View description>>
© 2017 ACM. Precisely recommending relevant items from massive candidates to a large number of users is an indispensable yet computationally expensive task on many online platforms (e.g., Amazon.com and Netflix.com). A promising approach is to project users and items into a Hamming space and then recommend items via Hamming distance. However, previous studies did not address the cold-start challenge and could not make the best use of preference data such as implicit feedback. To fill this gap, we propose a Discrete Content-aware Matrix Factorization (DCMF) model: 1) to derive compact yet informative binary codes in the presence of user/item content information; 2) to support the classification task based on a local upper bound of the logit loss; and 3) to introduce an interaction regularization for dealing with the sparsity issue. We further develop an efficient discrete optimization algorithm for parameter learning. Based on extensive experiments on three real-world datasets, we show that DCMF outperforms the state of the art on both regression and classification tasks.
Liu, B, Chen, L, Zhu, X, Zhang, Y, Zhang, C & Qiu, W 2017, 'Protecting location privacy in spatial crowdsourcing using encrypted data', Advances in Database Technology - EDBT, International Conference on Extending Database Technology, Open Proceedings, Venice, Italy, pp. 478-481.
View/Download from: Publisher's site
View description>>
In spatial crowdsourcing, spatial tasks are outsourced to a set of workers in proximity of the task locations for efficient assignment. It usually requires workers to disclose their locations, which inevitably raises security concerns about the privacy of the workers’ locations. In this paper, we propose a secure SC framework based on encryption, which ensures that workers’ location information is never released to any party, yet the system can still assign tasks to workers situated in proximity of each task’s location. We solve the challenge of assigning tasks based on encrypted data using homomorphic encryption. Moreover, to overcome the efficiency issue, we propose a novel secure indexing technique with a newly devised SKD-tree to index encrypted worker locations. Experiments on real-world data evaluate various aspects of the performance of the proposed SC platform.
Liu, S, Pang, N, Xu, G & Liu, H 2017, 'Collaborative Filtering via Different Preference Structures', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Knowledge Science, Engineering and Management, Springer International Publishing, Melbourne, Australia, pp. 309-321.
View/Download from: Publisher's site
View description>>
© Springer International Publishing AG 2017. Recently, social network websites have started to provide third-party sign-in options via the OAuth 2.0 protocol. For example, users can log in to the Netflix website using their Facebook accounts. Through this service, accounts of the same user are linked together, and so is their information. This provides an opportunity to create more complete user profiles, leading to improved recommender systems. However, user opinions distributed over different platforms come in different preference structures, such as ratings, rankings, pairwise comparisons and votes. As existing collaborative filtering techniques assume a homogeneous preference structure, how to learn from different preference structures simultaneously remains a challenging task. In this paper, we propose a fuzzy preference relation-based approach to enable collaborative filtering via different preference structures. Experimental results on public datasets demonstrate that our approach can effectively learn from different preference structures, and shows strong resistance to the noise and biases introduced by cross-structure preference learning.
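One way to picture a fuzzy preference relation — a minimal sketch, not the paper's formulation; the linear mapping, the assumed 1-5 rating scale and the item names are all illustrative — is to map each pair of rated items to a degree of preference in [0, 1] with additive reciprocity:

```python
# Sketch: ratings → fuzzy pairwise preference relation (illustrative mapping).

def preference_relation(ratings, scale=(1, 5)):
    """Map a user's item ratings to a fuzzy preference relation:
    p[(a, b)] in [0, 1] is the degree to which item a is preferred to b,
    with additive reciprocity p[(a, b)] + p[(b, a)] = 1."""
    lo, hi = scale
    items = list(ratings)
    return {(a, b): 0.5 + (ratings[a] - ratings[b]) / (2 * (hi - lo))
            for a in items for b in items}

ratings = {'movie_x': 5, 'movie_y': 3, 'movie_z': 1}
pr = preference_relation(ratings)
print(pr[('movie_x', 'movie_y')])  # 0.75 — x is preferred to y
print(pr[('movie_y', 'movie_x')])  # 0.25 — the reciprocal degree
```

Rankings or direct pairwise comparisons can be mapped into the same [0, 1] representation, which is what makes it possible to combine heterogeneous preference structures in one model.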
Liu, W, Chang, X, Chen, L & Yang, Y 2017, 'Early Active Learning with Pairwise Constraint for Person Re-identification', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer International Publishing, Skopje, Macedonia, pp. 103-118.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. Research on person re-identification (re-id) has attracted much attention in the machine learning field in recent years. With sufficient labeled training data, supervised re-id algorithms can obtain promising performance. However, producing labeled data for training supervised re-id models is an extremely challenging and time-consuming task because it requires every pair of images across non-overlapping camera views to be labeled. Moreover, in the early stage of experiments, when labor resources are limited, only a small amount of data can be labeled. Thus, it is essential to design an effective algorithm to select the most representative samples. This is referred to as early active learning or the early-stage experimental design problem. The pairwise relationship plays a vital role in the re-id problem, but most existing early active learning algorithms fail to consider this relationship. To overcome this limitation, we propose a novel and efficient early active learning algorithm with a pairwise constraint for person re-identification. By introducing the pairwise constraint, the closeness of similar representations of instances is enforced in active learning, which benefits the performance of active learning for re-id. Extensive experimental results on four benchmark datasets confirm the superiority of the proposed algorithm.
Maggs, CA & Musial-Gabrys, K 2017, 'LINNEAN SYSTEMATICS IN THE AGE OF BIG DATA', PHYCOLOGIA, INT PHYCOLOGICAL SOC, pp. 123-124.
Meng, Q, Catchpoole, D, Skillicorn, D & Kennedy, PJ 2017, 'Relational autoencoder for feature extraction', 2017 International Joint Conference on Neural Networks (IJCNN), 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, Anchorage, AK, USA, pp. 364-371.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Feature extraction becomes increasingly important as data grows high-dimensional. The autoencoder, as a neural network-based feature extraction method, has achieved great success in generating abstract features of high-dimensional data. However, it fails to consider the relationships among data samples, which may affect the results of using the original and new features. In this paper, we propose a Relational Autoencoder model that considers both data features and their relationships. We also extend it to work with other major autoencoder models, including the Sparse Autoencoder, Denoising Autoencoder and Variational Autoencoder. The proposed relational autoencoder models are evaluated on a set of benchmark datasets, and the experimental results show that considering data relationships can generate more robust features that achieve lower reconstruction loss and hence a lower error rate in subsequent classification compared to other autoencoder variants.
Meng, Q, Wu, J, Ellis, J & Kennedy, PJ 2017, 'Dynamic island model based on spectral clustering in genetic algorithm', 2017 International Joint Conference on Neural Networks (IJCNN), 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, Anchorage, AK, USA, pp. 1724-1731.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Maintaining relatively high diversity is important to avoid premature convergence in population-based optimization methods. The island model is widely considered a major approach to achieving this because of its flexibility and high efficiency. The model maintains a group of sub-populations on different islands and allows sub-populations to interact with each other via predefined migration policies. However, the current island model has some drawbacks. One is that, after a certain number of generations, different islands may retain quite similar, converged sub-populations, thereby losing diversity and decreasing efficiency. Another is that determining the number of islands to maintain is very challenging, while initializing many sub-populations increases the randomness of the island model. To address these issues, we propose a dynamic island model (DIM-SP) which can force each island to maintain a different sub-population, control the number of islands dynamically, and start from a single sub-population. The proposed island model outperforms three other state-of-the-art island models on three baseline optimization problems: job shop scheduling, the travelling salesman problem and the quadratic multiple knapsack problem.
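The classic fixed-size island model that DIM-SP improves on can be sketched as below. The fitness function, island count, migration interval and mutation scheme are illustrative assumptions; the dynamic island control and spectral clustering of DIM-SP are not reproduced here.

```python
# Sketch of a classic island-model GA with ring migration (toy 1-D problem).
import random

def evolve(population, fitness, mutation=0.1):
    """One generation: keep the fitter half, refill with mutated copies."""
    population.sort(key=fitness, reverse=True)
    survivors = population[:len(population) // 2]
    children = [x + random.uniform(-mutation, mutation) for x in survivors]
    return survivors + children

def island_model_ga(fitness, islands=3, size=10, generations=60, migrate_every=10):
    """Independent sub-populations with periodic migration between islands.
    The island count is fixed here; DIM-SP would adapt it dynamically."""
    random.seed(0)
    pops = [[random.uniform(-5, 5) for _ in range(size)] for _ in range(islands)]
    for gen in range(1, generations + 1):
        pops = [evolve(p, fitness) for p in pops]
        if gen % migrate_every == 0:
            # ring migration: each island receives the previous island's best
            bests = [max(p, key=fitness) for p in pops]
            for i, p in enumerate(pops):
                p[-1] = bests[(i - 1) % islands]
    return max((max(p, key=fitness) for p in pops), key=fitness)

# Maximize f(x) = -(x - 2)^2; the best individual converges near x = 2.
best = island_model_ga(lambda x: -(x - 2) ** 2)
print(round(best, 3))
```

Migration is what distinguishes the island model from simply running several independent GAs: good solutions spread between sub-populations while each island still explores on its own.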
Mihăiţă, AS, Tyler, P, Menon, A, Wen, T, Ou, Y, Cai, C & Chen, F 2017, 'An investigation of positioning accuracy transmitted by connected heavy vehicles using DSRC', Transportation Research Board - 96th Annual Meeting, Washington, D.C.
Mustapha, S, Braytee, A & Ye, L 2017, 'Detection of surface cracking in steel pipes based on vibration data using a multi-class support vector machine classifier', SPIE Proceedings, SPIE Smart Structures and Materials + Nondestructive Evaluation and Health Monitoring, SPIE, Portland, OR.
View/Download from: Publisher's site
Pan, P, Feng, J, Chen, L & Yang, Y 2017, 'Online compressed robust PCA', 2017 International Joint Conference on Neural Networks (IJCNN), 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, Anchorage, AK, USA, pp. 1041-1048.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. In this work, we consider the problem of robust principal component analysis (RPCA) for streaming noisy data that has been highly compressed. This problem is prominent when one deals with high-dimensional and large-scale data and data compression is necessary. To solve this problem, we propose an online compressed RPCA algorithm to efficiently recover the low-rank components of raw data. Though data compression incurs severe information loss, we provide a deep analysis of the proposed algorithm and prove that the low-rank component can be asymptotically recovered under mild conditions. Compared with other recent works on compressed RPCA, our algorithm reduces the memory cost significantly by processing data in an online fashion and reduces the communication cost by accepting sequential compressed data as input.
Pang, G, Cao, L, Chen, L & Liu, H 2017, 'Learning Homophily Couplings from Non-IID Data for Joint Feature Selection and Noise-Resilient Outlier Detection', Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Twenty-Sixth International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, Melbourne, Australia, pp. 2585-2591.
View/Download from: Publisher's site
View description>>
This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent - they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
Pang, G, Xu, H, Cao, L & Zhao, W 2017, 'Selective Value Coupling Learning for Detecting Outliers in High-Dimensional Categorical Data', Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17: ACM Conference on Information and Knowledge Management, ACM, Singapore, Singapore, pp. 807-816.
View/Download from: Publisher's site
View description>>
© 2017 Association for Computing Machinery. This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on a full data space or feature subspaces that are identified independently from subsequent outlier scoring. As a result, they are significantly challenged by overwhelming irrelevant features in high-dimensional data due to the noise brought by the irrelevant features and its huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective value couplings by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process to capture the selective value couplings. POP further defines a top-k outlying value selection method to ensure its scalability to the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full space or subspace-based outlier detectors and their combinations with three feature selection methods on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) obtains good scalability, stable performance w.r.t. k, and fast convergence rate.
Qin, M, Jin, D, He, D, Gabrys, B & Musial, K 2017, 'Adaptive Community Detection Incorporating Topology and Content in Social Networks', Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM '17: Advances in Social Networks Analysis and Mining 2017, ACM, pp. 675-682.
View/Download from: Publisher's site
View description>>
In social network analysis, community detection is a basic step towards understanding the structure, function and semantics of networks. Some conventional community detection methods may have limited performance because they merely focus on the topological structure of networks. In addition to topology, content information is another significant aspect of social networks. Some state-of-the-art methods have started to combine these two aspects of information, but they often assume that topology and content share the same characteristics. However, in some social networks, content may mismatch the topological structure. In order to better cope with such situations, we introduce a novel community detection method under the framework of nonnegative matrix factorization (NMF). Our proposed method integrates the topology and content of networks, and introduces a novel adaptive parameter for controlling the contribution of content with respect to the identified mismatch degree between the topological and content information. A case study using real social networks shows that our new method can simultaneously obtain the community partition and the corresponding semantic descriptions. Experiments on both artificial networks and real social networks further indicate that our method outperforms some state-of-the-art methods while exhibiting more robust behaviour when a mismatch between topological and content information is observed.
Salvador, MM, Budka, M & Gabrys, B 2017, 'Modelling multi-component predictive systems as petri nets', 15th International Industrial Simulation Conference 2017, ISC 2017, pp. 17-23.
View description>>
Building reliable data-driven predictive systems requires a considerable amount of human effort, especially in the data preparation and cleaning phase. In many application domains, multiple preprocessing steps need to be applied in sequence, constituting a 'workflow' and facilitating reproducibility. The concatenation of such a workflow with a predictive model forms a Multi-Component Predictive System (MCPS). Automatic MCPS composition can speed up this process by taking the human out of the loop, at the cost of model transparency (i.e. not being comprehensible by human experts). In this paper, we adopt and suitably re-define Well-handled with Regular Iterations Work Flow (WRI-WF) Petri nets to represent MCPSs. The use of such WRI-WF nets helps to increase the transparency of MCPSs required in industrial applications and makes it possible to automatically verify the composed workflows. We also present our experience and the results of applying this representation to model soft sensors in chemical production plants.
Schwarzer, A, Emmrich, S, Beck, D, Schmidt, F, Ng, M, Adams, FF, Witte, D, Kaebler, S, Wong, JWH, Shah, A, Jammal, R, Maroz, A, Reimer, C, Pimanda, JE, Reinhardt, D, Heckl, D & Klusmann, J-H 1970, 'Mapping the LncRNA-Landscape of Human Hematopoiesis and Leukemia Reveals Stem Cell Non-Coding Gene Expression Programs', ANNALS OF HEMATOLOGY, SPRINGER, pp. S51-S51.
Shen, T, Zhou, T, Long, G, Jiang, J, Pan, S & Zhang, C 2018, 'DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding', 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, AAAI Conference on Artificial Intelligence, AAAI, New Orleans, USA, pp. 5446-5455.
View description>>
Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, 'Directional Self-Attention Network (DiSAN)', is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.
Shi, Y, Li, W, Gao, Y, Cao, L & Shen, D 2017, 'Beyond IID: Learning to Combine Non-IID Metrics for Vision Tasks', Proceedings of the AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), San Francisco, USA, pp. 1524-1531.
View/Download from: Publisher's site
View description>>
Metric learning has been widely employed, especially in various computer vision tasks, with the fundamental assumption that all samples (e.g., regions/superpixels in images/videos) are independent and identically distributed (IID). However, since the samples are usually spatially-connected or temporally-correlated with their physically-connected neighbours, they are not IID (non-IID for short), which cannot be directly handled by existing methods. Thus, we propose to learn and integrate non-IID metrics (NIME). To incorporate the non-IID spatial/temporal relations, instead of directly using non-IID features and metric learning as previous methods do, NIME first builds several non-IID representations on the original (non-IID) features using various graph kernel functions, and then automatically learns the metric under the best combination of the various non-IID representations. NIME is applied to solve two typical computer vision tasks: interactive image segmentation and histology image identification. The results show that learning and integrating non-IID metrics improves performance compared to IID methods. Moreover, our method achieves results comparable to or better than those of the state of the art.
Song, H, Wu, Q & Dong, H 2017, 'EMI-based Diagnosis to Grounding Grids by Combining Ensemble Empirical Mode Decomposition and ICA', Proceedings of the 8th International Conference on Computer Modeling and Simulation, ICCMS '17: 8th International Conference on Computer Modeling and Simulation, ACM, Canberra, Australia, pp. 196-200.
View/Download from: Publisher's site
View description>>
© 2017 ACM. Grounding grids perform an essential role in electric transformer substations. The nondestructive diagnosis system transforms the condition of the underground conductors into the surficial induced electric signal in a sensing coil. However, the induced signal cannot be used directly for diagnosis, because the raw measurement is a mixture of responses from the signal of interest, strong interference and other unknown noises. Therefore, the separation of individual signatures from the mixture is posed as a blind source separation (BSS) problem. To extract the induced signal corrupted by noise, the independent component analysis (ICA) method is considered. By combining ensemble empirical mode decomposition (EEMD) and FastICA, the single-channel signal is decomposed into its independent components (ICs). The desired signal is then reconstructed to visualize the break point of the grounding grid. The results show this approach can be used to effectively diagnose grounding grids in harsh electromagnetic environments.
Sun, Y, Li, L, Xie, Z, Xie, Q, Li, X & Xu, G 2017, 'Co-training an Improved Recurrent Neural Network with Probability Statistic Models for Named Entity Recognition', Database Systems for Advanced Applications (LNCS), International Conference on Database Systems for Advanced Applications, Springer International Publishing, Suzhou, China, pp. 545-555.
View/Download from: Publisher's site
View description>>
Named Entity Recognition (NER) is a subtask of information extraction in the Natural Language Processing (NLP) field and has thus been widely studied. Currently, the Recurrent Neural Network (RNN) has become a popular way to perform NER, but it needs a large amount of training data. The lack of labeled training data is one of the hard problems, and the traditional co-training strategy is a way to alleviate it. In this paper, we consider this situation and focus on performing NER with co-training using an RNN and two probability statistic models, i.e. the Hidden Markov Model (HMM) and the Conditional Random Field (CRF). We propose a modified RNN model by redefining its activation function. Compared to the traditional sigmoid function, our new function avoids saturation to some degree and makes its output scope very close to [0, 1], thus improving recognition accuracy. Our experiments are conducted on the ATIS benchmark. First, supervised learning using these models is compared with different training data sizes. The experimental results show that it is not necessary to use the whole data set; even a small part of the training data can achieve good performance. Then, we compare the results of our modified RNN with the original RNN, obtaining a 0.5% improvement. Last, we compare the co-training results. HMM and CRF gain higher improvement than RNN after co-training. Moreover, using our modified RNN in co-training, their performance is improved further.
Thuy Do, QN, Hussain, FK & Nguyen, BT 2017, 'A fuzzy approach to detect spammer groups', 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, Naples, Italy, pp. 1-6.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Cloud computing has been advancing at an impressive rate in recent years and is likely to grow further in the near future. New services are being developed constantly, such as cloud infrastructure, security and platform as a service, to name just a few. Due to the vast pool of available services, review websites have been created to help customers make decisions for their business. This leads some reviewers to take advantage of these tools to promote the providers that hire them or to discredit competitors. These reviewers can act either individually or in cooperation with each other. When reviewers collude to promote one product or defame another, they are called spammer groups. In this paper, we present an approach to identify spammer groups. First, a network-based method is used to identify individual spam reviewers. Then, a fuzzy k-means clustering algorithm is used to find the group that they belong to. A case study that suggests which group an incorrect review belongs to is provided to further the understanding of the new method.
Venkata, SK, Keppens, J & Musial, K 1970, 'Adaptive Caching Using Sub-query Fragmentation for Reduction in Data Transfers from Distributed Databases', ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XXV, 25th Annual Conference on Astronomical Data Analysis Software and Systems (ADASS XXV), ASTRONOMICAL SOC PACIFIC, ARC Ctr Excellence All Sky Astrophys (CAASTRO), Sydney, AUSTRALIA, pp. 85-88.
Verma, S, Liu, W, Wang, C & Zhu, L 2017, 'Extracting highly effective features for supervised learning via simultaneous tensor factorization', 31st AAAI Conference on Artificial Intelligence, AAAI 2017, AAAI Conference on Artificial Intelligence, AAAI, San Francisco, USA, pp. 4995-4996.
View description>>
Real world data is usually generated over multiple time periods associated with multiple labels, which can be represented as multiple labeled tensor sequences. These sequences are linked together, sharing some common features while exhibiting their own unique features. Conventional tensor factorization techniques are limited to extract either common or unique features, but not both simultaneously. However, both types of these features are important in many machine learning systems as they inherently affect the systems' performance. In this paper, we propose a novel supervised tensor factorization technique which simultaneously extracts ordered common and unique features. Classification results using features extracted by our method on CIFAR-10 database achieves significantly better performance over other factorization methods, illustrating the effectiveness of the proposed technique.
Vo, NNY & Xu, G 2017, 'The volatility of Bitcoin returns and its correlation to financial markets', 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC), 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC), IEEE, Krakow, Poland.
View/Download from: Publisher's site
View description>>
The 2008 financial crisis scattered incredulity around the globe regarding traditional financial systems, which made investors and non-financial customers turn to alternatives such as digital banking systems. The existence and development of blockchain technology have, in recent years, made cryptocurrency a credible alternative to traditional currencies. Bitcoin is the world's first peer-to-peer and decentralized digital cash system, initiated by Nakamoto [1]. Though it is the most prominent cryptocurrency, Bitcoin is not a legal trading currency in various countries. Its exchange rate has proven to be an exceptionally high-risk portfolio with extreme volatility, which requires a more detailed evaluation before making any decision. This paper utilizes knowledge of statistics for financial time series and machine learning to (i) fit the parametric distribution, (ii) model and forecast the volatility of Bitcoin returns, and (iii) analyze its correlation to other financial market indicators. The fitted parametric time series model significantly outperforms other standard models in explaining the stylized facts and statistical variances in the behavior of Bitcoin returns. The model forecast also outperforms some machine learning methodologies, which would benefit policy makers, banks and financial investors in trading activities for both long-term and short-term strategies.
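For orientation only, the basic historical-volatility statistic underlying such analyses can be sketched as follows. This is a minimal stdlib sketch, not the paper's model; the function names and the 365-day annualization (Bitcoin trades every day, unlike the ~252 trading days of stock indices) are assumptions, and the paper's parametric and forecasting models go well beyond it:

```python
import math
import statistics

def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def annualized_volatility(prices, periods_per_year=365):
    """Sample standard deviation of log returns, scaled by
    sqrt(periods_per_year) to express it on an annual basis."""
    return statistics.stdev(log_returns(prices)) * math.sqrt(periods_per_year)
```

A series of daily closes such as `[100.0, 110.0, 99.0, 105.0, 120.0]` yields four log returns, whose scaled standard deviation is the kind of volatility estimate that GARCH-style models then refine with time-varying structure.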
Wang, D, Xu, G & Deng, S 2017, 'Music recommendation via heterogeneous information graph embedding', 2017 International Joint Conference on Neural Networks (IJCNN), 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, Anchorage, Alaska, USA, pp. 596-603.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Traditional music recommendation techniques suffer from limited performance due to the sparsity of user-music interaction data, which is addressed by incorporating auxiliary information. In this paper, we study the problem of personalized music recommendation that takes different kinds of auxiliary information into consideration. To achieve this goal, a Heterogeneous Information Graph (HIG) is first constructed to encode different kinds of heterogeneous information, including the interactions between users and music pieces, music playing sequences, and the metadata of music pieces. Based on HIG, a Heterogeneous Information Graph Embedding method (HIGE) is proposed to learn the latent low-dimensional representations of music pieces. Then, we further develop a context-aware music recommendation method. Extensive experiments have been conducted on real-world datasets to compare the proposed method with other state-of-the-art recommendation methods. The results demonstrate that the proposed method significantly outperforms those baselines, especially on sparse datasets.
Wang, J, Huang, S, Zhao, L, Ge, J, He, S, Zhang, C & Wang, X 2017, 'High quality 3D reconstruction of indoor environments using RGB-D sensors', 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), IEEE, Siem Reap, Cambodia, pp. 1739-1744.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. High-quality 3D reconstruction of large-scale indoor scenes is the key to combining Simultaneous Localization And Mapping (SLAM) with other applications, such as building inspection and construction monitoring. However, the requirement of global consistency brings challenges to both localization and mapping. In particular, significant localization and mapping errors can occur when standard SLAM techniques are used in areas of featureless walls and roofs. This paper proposes a novel framework aiming to reconstruct a high-quality, globally consistent 3D model of indoor environments using only an RGB-D sensor. We first introduce sparse and dense feature constraints in the local bundle adjustment. Then, planar constraints are incorporated in the global bundle adjustment. We fuse the point clouds in a truncated signed distance function volume, from which a high-quality mesh can be extracted. Our framework leads to a comprehensive 3D scanning solution for indoor scenes, enabling high-quality results and potential applications in building information systems. A video of 3D models reconstructed by the proposed method is available at https://youtu.be/DWMP4YfeNeY.
Wang, J, Huang, S, Zhao, L, Ge, J, He, S, Zhang, C & Wang, X 2017, 'High Quality 3D Reconstruction of Indoor Environments using RGB-D Sensors', PROCEEDINGS OF THE 2017 12TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), IEEE, CAMBODIA, Siem Reap, pp. 1739-1744.
Wang, S, Hu, L & Cao, L 2017, 'Perceiving the Next Choice with Comprehensive Transaction Embeddings for Online Recommendation', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer International Publishing, Skopje, Macedonia, pp. 285-302.
View/Download from: Publisher's site
View description>>
© 2017, Springer International Publishing AG. Predicting a customer's next choice in the context of what he/she has bought in a session is interesting and critical in the transaction domain, especially for online shopping. Precise prediction leads to high-quality recommendations and thus high benefit. Such recommendation is usually formalized as a transaction-based recommender system (TBRS). Existing TBRSs either tend to recommend popular items while ignoring infrequent and newly-released ones (e.g., pattern-based RSs), or assume a rigid order between items within a transaction (e.g., Markov Chain-based RSs), which does not match real-world cases most of the time. In this paper, we propose a neural network-based comprehensive transaction embedding model (NTEM) which can effectively perceive the next choice in a transaction context. Specifically, we learn comprehensive embeddings of both items and their features from relaxed ordered transactions. The relevance between items revealed by the transactions is encoded into these embeddings. With rich information embedded, such embeddings are powerful for predicting the next choices given the already-bought items. NTEM is a shallow wide-in-wide-out network, which is more efficient than deep networks considering the large numbers of items and transactions. Experimental results on real-world datasets show that NTEM outperforms three typical TBRS models, FPMC, PRME and GRU4Rec, in terms of recommendation accuracy and novelty. Our implementation is available at https://github.com/shoujin88/NTEM-model.
Wang, Z & Piccardi, M 2017, 'Dissimilarity-based action recognition with the pair hidden Markov support vector machine', 2017 IEEE 19th International Workshop on Multimedia Signal Processing, IEEE International Workshop on Multimedia Signal Processing, IEEE, Luton, UK, pp. 1-6.
View/Download from: Publisher's site
View description>>
Human action recognition in video is highly challenging due to the substantial variations in motion performance, recording settings and inter-personal differences. Most current research focuses on the extraction of effective features and the design of suitable classifiers. Conversely, in this paper we tackle this problem with a dissimilarity-based approach where classification is performed in terms of minimum distance from templates. To measure the dissimilarity between any two action instances, we propose leveraging the Pair Hidden Markov Support Vector Machine (PHMM-SSVM) that was recently proposed for tasks of video alignment. The main advantages of PHMM-SSVM are its ability to learn optimal alignment models from training sets of manually-aligned action pairs and to provide alignment scores that can be used for action classification. The experimental results over two popular action datasets show that the proposed approach achieves accuracy higher than many existing methods and comparable to a state-of-the-art algorithm.
Wen, D, Qin, L, Lin, X, Zhang, Y & Chang, L 1970, 'Enumerating k-Vertex Connected Components in Large Graphs.', CoRR, International Conference on Data Engineering, IEEE, Macao, Macao, pp. 52-63.
View/Download from: Publisher's site
View description>>
In social network analysis, structural cohesion (or vertex connectivity) is a fundamental metric in measuring the cohesion of social groups. Given an undirected graph, a k-vertex connected component (k-VCC) is a maximal connected subgraph whose structural cohesion is at least k. A k-VCC has many outstanding structural properties, such as high cohesiveness, high robustness, and subgraph overlapping. In this paper, given a graph G and an integer k, we study the problem of computing all k-VCCs in G. The general idea for this problem is to recursively partition the graph into overlapped subgraphs. We prove an upper bound on the number of partitions, which implies a polynomial-time algorithm for k-VCC enumeration. However, the basic solution is costly in computing the vertex cut. To improve the algorithmic efficiency, we observe that the key is to reduce the number of local connectivity tests. We propose two effective optimization strategies, namely neighbor sweep and group sweep, to significantly reduce the number of local connectivity tests. We conduct extensive performance studies using ten large real datasets to demonstrate the efficiency of our proposed algorithms. The experimental results demonstrate that our approach can achieve a speedup of up to two orders of magnitude compared to the state-of-the-art algorithm.
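To make the structural cohesion metric concrete, vertex connectivity can be computed by brute force on tiny graphs: find the smallest set of vertices whose removal disconnects the graph. This sketch is illustrative only; all names are hypothetical, and the paper's algorithms exist precisely to avoid this exponential search:

```python
import itertools

def connected_after_removal(adj, removed):
    """True if the graph stays connected once `removed` vertices are deleted."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def vertex_connectivity(adj):
    """Smallest number of vertices whose removal disconnects the graph
    (n - 1 for a complete graph). adj maps each vertex to its neighbours."""
    n = len(adj)
    for k in range(n - 1):
        for cut in itertools.combinations(adj, k):
            if not connected_after_removal(adj, set(cut)):
                return k
    return n - 1
```

A path a-b-c has connectivity 1 (removing b disconnects it), while a triangle has connectivity 2; a k-VCC is a maximal subgraph whose value under this metric is at least k.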
Wu, R, Xu, G, Chen, E, Liu, Q & Ng, W 2017, 'Knowledge or Gaming?', Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion, the 26th International Conference, ACM Press, Perth, Western Australia, pp. 321-329.
View/Download from: Publisher's site
View description>>
© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Recent decades have witnessed the rapid growth of intelligent tutoring systems (ITS), in which personalized adaptive techniques are successfully employed to improve the learning of each individual student. However, the problem of using cognitive analysis to distill the knowledge and gaming factors from students' learning histories is still underexplored. To this end, we propose a Knowledge Plus Gaming Response Model (KPGRM) based on multiple-attempt responses. Specifically, we first measure the explicit gaming factor in each multiple-attempt response. Next, we utilize collaborative filtering methods to infer the implicit gaming factor of one-attempt responses. Then we model student learning cognitively by considering both gaming and knowledge factors simultaneously based on a signal detection model. Extensive experiments on two real-world datasets prove that KPGRM can model student learning more effectively as well as obtain a more reasonable analysis.
Wu, W, Li, B, Chen, L & Zhang, C 2017, 'Consistent Weighted Sampling Made More Practical', Proceedings of the 26th International Conference on World Wide Web, WWW '17: 26th International World Wide Web Conference, International World Wide Web Conferences Steering Committee, Perth, Australia, pp. 1035-1043.
View/Download from: Publisher's site
View description>>
© 2017 International World Wide Web Conference Committee (IW3C2) Min-Hash, which is widely used for efficiently estimating similarities of bag-of-words represented data, plays an increasingly important role in the era of big data. It has been extended to deal with real-value weighted sets – Improved Consistent Weighted Sampling (ICWS) is considered as the state-of-the-art for this problem. In this paper, we propose a Practical CWS (PCWS) algorithm. We first transform the original form of ICWS into an equivalent expression, based on which we find some interesting properties that inspire us to make the ICWS algorithm simpler and more efficient in both space and time complexities. PCWS is not only mathematically equivalent to ICWS and preserves the same theoretical properties, but also saves 20% memory footprint and substantial computational cost compared to ICWS. The experimental results on a number of real-world text data sets demonstrate that PCWS obtains the same (even better) classification and retrieval performance as ICWS with 1/5 ∼ 1/3 reduced empirical runtime.
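For context, the ICWS construction that PCWS reformulates can be sketched as follows. This is a minimal illustration based on the published ICWS scheme, not the PCWS code; the seeding scheme and function names are assumptions:

```python
import math
import random

def icws_sample(weights, seed=0):
    """One Improved Consistent Weighted Sampling (ICWS) draw for a
    non-negative weighted set {feature: weight}. Two sets receive the
    same (feature, y) sample with probability equal to their
    generalized (weighted) Jaccard similarity."""
    best_k, best_y, best_a = None, None, float("inf")
    for k, s in weights.items():
        if s <= 0:
            continue
        # Per-feature randomness, re-derived from (seed, feature) so the
        # draws stay consistent across different input sets.
        r = random.Random(f"{seed}|{k}|r").gammavariate(2.0, 1.0)
        c = random.Random(f"{seed}|{k}|c").gammavariate(2.0, 1.0)
        beta = random.Random(f"{seed}|{k}|b").random()
        t = math.floor(math.log(s) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        if a < best_a:
            best_k, best_y, best_a = k, y, a
    return best_k, best_y
```

Repeating the draw for seeds 0..m-1 gives an m-dimensional sketch, and the fraction of matching coordinates between two sketches estimates their weighted Jaccard similarity; PCWS's contribution, per the abstract, is a mathematically equivalent but cheaper formulation of this sampling step.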
Xin, J-N, Du, X & Zhang, J 2017, 'Deep learning for robust outdoor vehicle visual tracking', 2017 IEEE International Conference on Multimedia and Expo (ICME), 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE, Hong Kong, China, pp. 613-618.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Robust visual tracking of outdoor vehicles is still a challenging problem due to large appearance variations caused by illumination variation, occlusion and scale variation, etc. In this paper, a deep-learning-based approach for robust outdoor vehicle tracking is proposed. Firstly, a stacked denoising auto-encoder is pre-trained to learn feature representations of images. Then, a k-sparse constraint is added to the stacked denoising auto-encoder, and the encoder of the k-sparse stacked denoising auto-encoder (kSSDAE) is connected with a classification layer to construct a classification neural network. After fine-tuning, the classification neural network is applied to online tracking under a particle filter framework. Extensive tracking experiments are conducted on a challenging single-object online tracking evaluation benchmark to verify the effectiveness of our tracker. Experiments show that our tracker outperforms most state-of-the-art trackers.
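As a toy illustration of the k-sparse idea the abstract mentions (this helper is hypothetical, not the authors' kSSDAE code): keep only the k largest activations of a layer and zero out the rest, which forces each code vector to use at most k units (ties at the threshold may keep slightly more):

```python
def k_sparse(activations, k):
    """k-sparse constraint: retain the k largest activations, zero the rest."""
    if k >= len(activations):
        return list(activations)
    # k-th largest value acts as the cut-off threshold.
    thresh = sorted(activations, reverse=True)[k - 1]
    return [a if a >= thresh else 0.0 for a in activations]
```

For example, `k_sparse([0.1, 0.9, 0.5, 0.3], 2)` keeps only the 0.9 and 0.5 activations.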
Xu, J, Wei, W & Cao, L 2017, 'Copula-Based High Dimensional Cross-Market Dependence Modeling', 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), IEEE, Tokyo, Japan, pp. 734-743.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Dependence across multiple financial markets, such as stock and foreign exchange rate markets, is high-dimensional, contains various relationships, and often presents complicated dependence structures and characteristics such as asymmetrical dependence. Modeling such dependence structures is very challenging. Although copula has been demonstrated to be effective in describing dependence between variables in recent studies, building effective dependence structures to address the above complexities significantly challenges existing copula models. In this paper, we propose a new D vine-based model with a bottom-up strategy to construct high-dimensional dependence structures. The new modeling outcomes are applied to trade 15 stock market indices and 10 currency rates over 16 years as a case study. Extensive experimental results show that this model and its intrinsic design significantly outperform typical models and industry baselines, as shown by the log-likelihood and Vuong test, and Value at Risk - a widely used industrial benchmark. Our model provides interpretable knowledge and profound insights into the high-dimensional dependence structures across data sources.
Yao, L, Kusakunniran, W, Wu, Q, Zhang, J & Tang, Z 2017, 'Robust Gait Recognition under Unconstrained Environments Using Hybrid Descriptions', 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Sydney, Australia, pp. 1-7.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Gait is one of the key biometric features and has been widely applied to human identification. Appearance-based features and motion-based features are the two main representations used in gait recognition. However, appearance-based features are sensitive to body shape changes, and silhouette extraction from real-world images and videos also remains a challenge. As for motion features, due to the difficulty of extracting the underlying models from gait sequences, the localization of human joints lacks high reliability and strong robustness. This paper proposes a new approach which utilizes Two-Point Gait (TPG) as the motion feature to remedy the deficiency of the appearance feature based on the Gait Energy Image (GEI), in order to increase the robustness of gait recognition under unconstrained environments with view changes and clothing changes. Another contribution of this paper is that this is the first time that TPG has been applied to view change and clothing change issues since it was proposed. Extensive experiments show that the proposed method is more invariant to view and clothing changes, and can significantly improve the robustness of gait recognition.
Zhang, J, Wu, Q, Zhang, J, Shen, C & Lu, J 2018, 'Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement', The Thirty-Second AAAI Conference on Artificial Intelligence, The Thirty-Second AAAI Conference on Artificial Intelligence, AAAI Press, USA, pp. 7550-7557.
View description>>
The number of social images has exploded with the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags. However, it is well-known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as the image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. The deep neural network is utilized as the image feature learning and backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as the constraints at the batch level to alleviate the tag noise. Therefore, our model is highly flexible and stable to handle large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.
Zhang, J, Wu, Q, Xu, J, Lu, J, Phua, R, Curr, K & Tang, Z 2017, 'Historical Image Annotation by Exploring the Tag Relevance', 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), IEEE, Nanjing, China, pp. 646-651.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Historical images usually carry enormous historical research value and are highly related to historical objects, events, background stories, etc. Therefore, annotating these images always requires selecting tags from a large set. In this paper, we propose to annotate historical images by exploring tag relevance. We measure tag relevance from three different perspectives: a tag's visual relevance, its dependencies with other tags, and its relationship with location-based meta-data. Using tag relevance as guidance, we generate three tag sub-sets and use them to perform the annotation. Experimental results on the benchmark dataset indicate the significance of exploring tag relevance in comparison with the baseline experiments.
Zhao, M, Zhang, J, Porikli, F, Zhang, C & Zhang, W 2017, 'Learning a perspective-embedded deconvolution network for crowd counting', 2017 IEEE International Conference on Multimedia and Expo (ICME), 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE, Hong Kong, China, pp. 403-408.
View/Download from: Publisher's site
View description>>
We present a novel deep learning framework for crowd counting by learning a perspective-embedded deconvolution network. Perspective is an inherent property of most surveillance scenes. Unlike the traditional approaches that exploit the perspective as a separate normalization, we propose to fuse the perspective into a deconvolution network, aiming to obtain a robust, accurate and consistent crowd density map. Through layer-wise fusion, we merge perspective maps at different resolutions into the deconvolution network. With the injection of perspective, our network is driven to learn to combine the underlying scene geometric constraints adaptively, thus enabling an accurate interpretation from high-level feature maps to the pixel-wise crowd density map. In addition, our network allows generating density maps for arbitrary-sized input in an end-to-end fashion. The proposed method achieves competitive results on the WorldExpo2010 crowd dataset.
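The layer-wise fusion the abstract describes can be pictured as attaching a resolution-matched copy of the scene's perspective map as an extra channel at each deconvolution stage. A minimal numpy sketch of one such fusion step (function names, nearest-neighbour resizing, and toy shapes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def fuse_perspective(features, perspective):
    """Append a perspective map as an extra channel to a feature map.

    features:    (C, H, W) activations at one deconvolution resolution.
    perspective: (H0, W0) scene perspective map; resized here to (H, W)
                 by nearest-neighbour sampling so it can be concatenated
                 channel-wise with the activations.
    """
    c, h, w = features.shape
    h0, w0 = perspective.shape
    rows = np.arange(h) * h0 // h          # nearest-neighbour row indices
    cols = np.arange(w) * w0 // w          # nearest-neighbour column indices
    resized = perspective[np.ix_(rows, cols)]
    return np.concatenate([features, resized[None]], axis=0)

feats = np.zeros((4, 8, 8))                       # toy feature maps
pmap = np.linspace(0.0, 1.0, 16 * 16).reshape(16, 16)  # toy perspective map
fused = fuse_perspective(feats, pmap)             # shape (5, 8, 8)
```

Because the perspective channel is re-injected at every resolution, later layers can weight local features by scene geometry instead of relying on a single global normalization.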
Zhou, Z, Xu, G, Zhu, W, Li, J & Zhang, W 2017, 'Structure embedding for knowledge base completion and analytics', 2017 International Joint Conference on Neural Networks (IJCNN), 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, Anchorage, Alaska, USA.
View/Download from: Publisher's site
View description>>
To explore the latent information of human knowledge, analysis of Knowledge Bases (KBs) (e.g. WordNet, Freebase) is essential. Some previous KB element embedding frameworks are used for KB structure analysis and completion. These embedding frameworks use a low-dimensional vector space representation for the large numbers of entities and relations in a KB. Based on that, the vector space representation of entities and relations not contained in the KB can be measured. The embedding idea is reasonable, but current embedding methods have some issues in obtaining proper embeddings for KB elements. The embedding methods use entity-relation-entity triplets, contained in most current KBs, as training data to output the embedding representation of entities and relations. To measure the truth of a triplet (whether the knowledge represented by the triplet is true or false), some current embedding methods such as Structured Embedding (SE) project entity vectors into subspaces, but the meaning of such subspaces is not clear for knowledge reasoning. Other embedding methods such as TransE use a simple linear vector transform to represent a relation (such as vector addition or subtraction), which cannot deal with the multiple-relation-match or multiple-entity-match problem: for example, when there are multiple relations between two entities, or multiple entities have the same relation with one entity. Inspired by previous KB element structured embedding methods, we propose a new method, Bipartite Graph Network Structured Embedding (BGNSE). BGNSE combines current KB embedding methods with a bipartite graph network model, which is widely used in many fields including image data compression and collaborative filtering. BGNSE embeds each entity-relation-entity KB triplet into a bipartite graph network structure model, represents each entity by one bipartite graph layer, and represents the relation by the link weights matrix of the bipartite graph network. Based on the bipartite graph model, our proposed method has followi...
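The TransE limitation mentioned in the abstract follows directly from its scoring rule: a triplet (h, r, t) is scored by the distance between h + r and t, so any relation linking a fixed entity pair is forced toward the same vector t − h. A minimal sketch (function name and toy vectors are illustrative):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score: lower ||h + r - t|| means more plausible."""
    return np.linalg.norm(h + r - t)

h = np.array([1.0, 0.0])   # head entity embedding
t = np.array([1.0, 1.0])   # tail entity embedding
r = np.array([0.0, 1.0])   # relation embedding; here r = t - h exactly

score = transe_score(h, r, t)

# Multiple-relation match problem: if a second, semantically different
# relation also links h to t, its only zero-score embedding is the same
# vector t - h, so TransE cannot tell the two relations apart.
r2_best = t - h
```

This is the gap the paper's bipartite-graph formulation targets: a full link-weight matrix per relation is strictly more expressive than a single translation vector.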
Zuo, Y, Wu, Q, Zhang, J & An, P 2017, 'Minimum spanning forest with embedded edge inconsistency measurement for color-guided depth map upsampling', 2017 IEEE International Conference on Multimedia and Expo (ICME), 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE, Hong Kong, China, pp. 211-216.
View/Download from: Publisher's site
View description>>
© 2017 IEEE. Color-guided depth map upsampling, such as Markov-Random-Field-based (MRF-based) methods, is a popular depth map enhancement solution, which normally assumes edge consistency between the color image and the corresponding depth map, and calculates the coefficients of the smoothness term in the MRF according to this assumption. However, such consistency does not always hold, which leads to texture-copying artifacts and blurred depth edges. In this paper, we propose a novel coefficient computing scheme for the smoothness term in the MRF, based on the distance between pixels in Minimum Spanning Trees (a Forest), to better preserve depth edges. An explicit edge-inconsistency measurement is embedded into the weights of edges in the Minimum Spanning Trees, which significantly mitigates texture-copying artifacts. The proposed method is evaluated on the Middlebury and ToF-Mark datasets, demonstrating improved results compared with state-of-the-art methods.
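For context, the MRF objective that such color-guided upsampling methods optimize is typically a data term (agreement with the sparse observed depths) plus a weighted smoothness term over neighbouring pixels; the paper's contribution is how the smoothness weights are computed (from MST pixel distances with an edge-inconsistency measure). A minimal sketch that only evaluates such an energy, with the weights supplied directly as stand-ins for the MST-derived coefficients (function name and toy data are illustrative):

```python
import numpy as np

def mrf_energy(depth, observed, mask, weights, lam=1.0):
    """Evaluate a generic MRF energy for depth upsampling.

    depth:    (H, W) candidate high-resolution depth map.
    observed: (H, W) observed depths, valid where mask is True.
    mask:     (H, W) boolean map of pixels with depth observations.
    weights:  dict mapping a pixel pair ((i, j), (k, l)) to its smoothness
              coefficient -- in the paper these would come from MST
              distances with the edge-inconsistency measure embedded.
    """
    data = np.sum((depth[mask] - observed[mask]) ** 2)
    smooth = sum(w * (depth[p] - depth[q]) ** 2
                 for (p, q), w in weights.items())
    return data + lam * smooth

# Toy 2x2 example: candidate matches observations, one weighted neighbour pair.
depth = np.array([[1.0, 2.0], [1.0, 2.0]])
observed = depth.copy()
mask = np.array([[True, False], [False, True]])
weights = {((0, 0), (0, 1)): 0.5}
energy = mrf_energy(depth, observed, mask, weights)  # 0 + 0.5 * (1-2)^2 = 0.5
```

Lowering a pair's weight where the color edge is inconsistent with the depth edge is precisely what suppresses texture-copying: the smoothness penalty no longer forces depth to follow spurious color texture.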