Mark Maloof's Research

Research

A Bayesian approach to concept drift
To cope with concept drift, we placed a probability distribution over the location of the most-recent drift point and used Bayesian model comparison to update this distribution from the predictions of models trained on substrings of the training data. We present a new upper bound on the Kullback-Leibler divergence from a model to an identically structured model trained on a substring of that model's training data, and use it to motivate a pruning method for the models under consideration. Finally, we present the results of an empirical evaluation, in which our approach generally yielded improved accuracy and/or speed over other learning methods. This is joint work with Steve Bach (C '10), who is now studying at the University of Maryland, College Park. Funding came from the Georgetown Undergraduate Research Opportunities Program (GUROP).
Publications: NIPS '10
Paired learners for concept drift
To cope with concept drift, we paired a stable learner with a reactive learner. The stable learner predicts based on all of its experience, whereas the reactive learner predicts based on its experience over a short, recent window of time. If the stable learner predicts incorrectly and the reactive learner predicts correctly sufficiently often, then the method copies the concept representation of the reactive learner to the stable learner, and the pair continues to learn. Using both synthetic problems and real-world data sets, the method of paired learning outperformed or performed comparably to other ensemble methods for concept drift that use an unweighted ensemble of batch learners, a weighted ensemble of batch learners, or a weighted ensemble of online learners. Furthermore, using what we call a retractable learner, which can assert and retract examples, the paired learner was more efficient in time and in space than the other methods considered. This is joint work with Steve Bach (C '10), who is now studying at the University of Maryland, College Park. Funding came from the Georgetown Undergraduate Research Opportunities Program (GUROP).
Publications: ICDM '08
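As a rough illustration of the pairing scheme described above, the following Python sketch keeps a stable learner, a reactive learner trained on a sliding window, and a record of the recent cases where the stable learner was wrong and the reactive learner was right. It is not the implementation from the paper: the toy `CountLearner`, the window size, the threshold, and all names are hypothetical and chosen only to keep the example short.

```python
from collections import Counter, defaultdict, deque
import copy


class CountLearner:
    """Toy retractable learner (hypothetical): predicts the label seen most often for x."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def learn(self, x, y):       # assert an example
        self.counts[x][y] += 1

    def retract(self, x, y):     # retract a previously asserted example
        self.counts[x][y] -= 1

    def predict(self, x):
        seen = self.counts[x]
        if not seen or max(seen.values()) <= 0:
            return None          # no usable evidence for this x
        return max(seen, key=seen.get)


class PairedLearner:
    """Sketch of a stable/reactive pair; window and threshold are assumed values."""

    def __init__(self, window=10, threshold=0.3):
        self.stable = CountLearner()         # learns from everything
        self.reactive = CountLearner()       # learns from the last `window` examples
        self.recent = deque()                # examples currently held by the reactive learner
        self.window = window
        self.threshold = threshold
        self.flags = deque(maxlen=window)    # True: stable wrong and reactive right
        self.replacements = 0

    def predict(self, x):
        return self.stable.predict(x)

    def learn(self, x, y):
        stable_wrong = self.stable.predict(x) != y
        reactive_right = self.reactive.predict(x) == y
        self.flags.append(stable_wrong and reactive_right)
        # If the reactive learner has been beating the stable learner often enough,
        # copy the reactive learner's concept to the stable learner.
        if sum(self.flags) > self.threshold * self.window:
            self.stable = copy.deepcopy(self.reactive)
            self.flags.clear()
            self.replacements += 1
        self.stable.learn(x, y)
        self.reactive.learn(x, y)
        self.recent.append((x, y))
        if len(self.recent) > self.window:
            # Retract the oldest example rather than retraining from scratch.
            self.reactive.retract(*self.recent.popleft())


# Usage: the target concept flips halfway through the stream.
pair = PairedLearner()
for t in range(100):
    x = t % 2
    y = x if t < 50 else 1 - x
    pair.learn(x, y)
```

Because the toy learner can retract examples, the reactive learner never has to be rebuilt from its window, which is the kind of saving the retractable learner is meant to provide.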
Ensemble methods for concept drift
We have developed two ensemble methods for tracking concept drift: dynamic weighted majority (DWM) and additive experts (AddExp). Generally, these methods use a global algorithm to maintain a weighted collection of experts, each of which is an on-line learning method. If an expert makes a mistake, then the global algorithm decreases its weight. If an expert's weight falls below a threshold, then the global algorithm removes it from the ensemble. If the global algorithm makes a mistake, then it adds a new expert to the ensemble. With on-line versions of naive Bayes and a tree learner as base learners, we have evaluated these algorithms empirically on a variety of data sets, in some cases obtaining the best published results. For AddExp, we proved formal bounds on the number of mistakes it will make and on the loss it will incur during learning. This is joint work with Zico Kolter (C '05), who is now a post-doc at MIT. Funding came from the Georgetown Undergraduate Research Opportunities Program (GUROP).
Publications: ICDM '03, ICML '05, JMLR '07, Java source for DWM
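The weight-update, removal, and creation rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published DWM or AddExp code: the base learner is a toy stand-in for on-line naive Bayes, the values of `beta` (the weight-decrease factor) and `theta` (the removal threshold) are arbitrary, the weights are rescaled so the largest is 1, and refinements such as updating only every few examples are omitted.

```python
from collections import Counter, defaultdict


class CountLearner:
    """Toy on-line base learner (hypothetical stand-in for on-line naive Bayes)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def learn(self, x, y):
        self.counts[x][y] += 1

    def predict(self, x):
        seen = self.counts[x]
        return max(seen, key=seen.get) if seen else None


class WeightedEnsemble:
    """Sketch of a dynamically weighted ensemble; beta and theta are assumed values."""

    def __init__(self, beta=0.5, theta=0.01):
        self.beta = beta
        self.theta = theta
        self.experts = [[CountLearner(), 1.0]]   # [expert, weight] pairs

    def predict(self, x):
        votes = Counter()
        for expert, weight in self.experts:
            votes[expert.predict(x)] += weight
        return max(votes, key=votes.get)

    def learn(self, x, y):
        votes = Counter()
        for entry in self.experts:
            guess = entry[0].predict(x)
            if guess != y:
                entry[1] *= self.beta            # expert made a mistake: decrease its weight
            votes[guess] += entry[1]
        global_guess = max(votes, key=votes.get)
        # Rescale so the best expert has weight 1, then drop experts below the threshold.
        top = max(w for _, w in self.experts)
        self.experts = [[e, w / top] for e, w in self.experts if w / top >= self.theta]
        if global_guess != y:
            # The ensemble as a whole made a mistake: add a new expert.
            self.experts.append([CountLearner(), 1.0])
        for expert, _ in self.experts:
            expert.learn(x, y)                   # every expert learns on-line
        return global_guess


# Usage: the target concept flips halfway through the stream.
ensemble = WeightedEnsemble()
for t in range(100):
    x = t % 2
    y = x if t < 50 else 1 - x
    ensemble.learn(x, y)
```

After the flip, the experts trained on the old concept keep losing weight until they fall below the threshold, while the experts created after the flip take over the vote.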
Detecting insider threats
Malicious insiders often pose a greater threat to information than do outsiders. The problem of detecting malicious insiders has subtleties that distinguish it from detecting intruders. For example, one has much more contextual information about insiders than one typically has about intruders. As with intrusion detection, machine learning and data mining methods have an important role to play, especially given the volume and complexity of the activity and information involved. During my sabbatical, I worked with Greg Stephens of the MITRE Corporation on an internally funded research project to develop methods of detecting insider threats. Using real-world data derived from over 16 terabytes of network traffic and consisting of over 90 million events for thousands of users, we investigated machine learning and data mining methods for anomaly detection, clustering, and learning across a variety of insider behaviors, including printing, e-mailing, browsing, and searching.
Publications: RAID '07, IEEE S&P '09
Applications of machine learning to detecting unknown malicious executables
Existing software uses static signatures to identify individual malicious programs. This means we have already detected the malicious program, which often occurs through its activation, and these static signatures, once extracted, are useless for detecting new "malware." In this project, we investigated methods of machine learning and data mining to extract characteristic properties of malicious executables. These new methods may lead to a new generation of software capable of identifying new, previously unknown malware before it activates, propagates, and destroys data and legitimate software. This was joint work with Zico Kolter (C '05) and Nancy Houdek (C '07), both undergraduates in the College. The MITRE Corporation funded this project.
Publications: KDD '04, JMLR '06
Learners with partial instance memory
What happens when inductive learners store some of the examples they encounter, and then use them in future training episodes? Which examples should they keep? Should they keep them forever, or for a period of time? What happens when concepts change? What are the trade-offs in performance between a partial memory learner and learners that store none or all of the examples in the input stream? These are some of the questions that arose while I worked on my dissertation, questions I still pursue. Compared to lesioned systems, these learners have slightly lower predictive accuracies, but they store significantly fewer examples, learn more quickly, and can have simpler concept descriptions. We have also evaluated these systems on the STAGGER Concepts (a synthetic problem involving concept drift), and our learners generally outperformed STAGGER and FLORA2 in terms of predictive accuracy and memory requirements. This project was based on work with Ryszard Michalski.
Publications: ML '00, IJCNN '03, AI '04
Analysis of algorithms using ROC components of variance
In this project, we investigated how the variance of a classifier's error scales with sample size. We used bootstrap techniques to estimate the components of variance of the area under ROC curves. For the case of a single classifier, our results support Fukunaga and Hayes' theory of the effect of sample size; however, for the case of two competing classifiers, we showed that variance comes predominantly from the training set rather than from the test sample, as Fukunaga and Hayes assert. Furthermore, since we examined components of variance rather than total variance, we have been able to provide a much more detailed analysis than has previously existed. This was joint work with Sergey Beiden and Bob Wagner.
Publications: PAMI '03
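To give a flavor of the bootstrap idea mentioned above, the Python sketch below estimates how much the area under the ROC curve varies when only the test sample is resampled. It covers just one component and is not the procedure from the paper: separating the training-set component would also require retraining the classifier on resampled training sets. The function names, the stratified resampling, and the number of rounds are assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_variance(scores, labels, rounds=200, seed=0):
    """Variance of the AUC over bootstrap resamples of the test sample only.

    Positives and negatives are resampled separately (an assumed choice) so that
    every resample contains both classes.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    estimates = []
    for _ in range(rounds):
        boot_pos = rng.choices(pos, k=len(pos))   # sample with replacement
        boot_neg = rng.choices(neg, k=len(neg))
        estimates.append(auc(boot_pos + boot_neg,
                             [1] * len(boot_pos) + [0] * len(boot_neg)))
    mean = sum(estimates) / rounds
    return sum((e - mean) ** 2 for e in estimates) / (rounds - 1)


# Usage with made-up classifier scores for six test cases.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
test_variance = bootstrap_auc_variance(scores, labels)
```

Repeating the same resampling over training sets, with the classifier retrained each time, would give the corresponding training-set component.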
Machine learning for improving building detection in overhead imagery
In this work, we investigated how to use machine learning to improve a rooftop detection task in an existing hierarchical vision system for detecting buildings in overhead imagery. Experimental results demonstrated that a naive Bayesian classifier outperformed a manually constructed linear classifier using area under the ROC curve as the performance metric. Further experimental studies investigated how robust various learning methods were when generalizing to unseen images that differed in aspect and in location. A key issue is how to label large amounts of training data, a problem that arises in vision and in other domains, such as text classification. Therefore, we also examined visual tools to aid in labeling examples and studied the consistency of labeling across individuals. This was joint work with Pat Langley, Tom Binford, and Ram Nevatia.
Publications: ML '03