Publications
|
Recipe Attribute Prediction using Review Text as Supervision
Gregory Druck
In Proceedings of the IJCAI Workshop on Cooking with Computers, 2013.
[abstract] [bib] [pdf]
Online recipes are often accompanied by user reviews. In addition to numeric ratings and descriptions of modifications, these reviews frequently contain detailed information about the cooking process, the taste and texture of the dish, and occasions or situations for which the dish is suited. In this paper, we aim to
leverage this information to build a system that predicts what users would say about a recipe. Specifically, we annotate recipes with attributes that are applied to them in reviews. Then, we train models to predict these attributes using information about the ingredients, preparation steps, and recipe title. For
example, we aim to predict whether a salad would be described as "refreshing" in reviews. We demonstrate that it is possible to make such predictions accurately and that the factors that are important in these predictions are intuitive. We also discuss potential downstream applications of this method to recipe
recommendation, recipe retrieval, and guided recipe modification.
@inproceedings{druck13review,
Author = {Gregory Druck},
Booktitle = {Proceedings of the IJCAI Workshop on Cooking with Computers},
Title = {Recipe Attribute Prediction using Review Text as Supervision},
Year = {2013}}
|
Spice it up? Mining Refinements to Online Instructions from User Generated Content
Gregory Druck and Bo Pang
In Proceedings of ACL 2012.
[abstract] [bib] [pdf]
There are a growing number of popular web sites where users submit and review instructions for completing tasks as varied as building a table and baking a pie. In addition to providing their subjective evaluation, reviewers often provide actionable
refinements. These refinements clarify, correct, improve, or provide alternatives to the original instructions. However, identifying and reading all relevant reviews is a daunting task for a user. In this paper, we propose a generative model that jointly
identifies user-proposed refinements in instruction reviews at multiple granularities, and aligns them to the appropriate steps in the original instructions. Labeled data is not readily available for these tasks, so we focus on the unsupervised setting. In
experiments in the recipe domain, our model provides 90.1% F1 for predicting refinements at the review level, and 77.0% F1 for predicting refinement segments within reviews.
@inproceedings{recipes_acl12,
Author = {Gregory Druck and Bo Pang},
Booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012)},
Title = {Spice it up? Mining Refinements to Online Instructions from User Generated Content},
Year = {2012},
Pages = {545--553}}
|
Generalized Expectation Criteria for Lightly Supervised Learning
Gregory Druck
Ph.D. Thesis, 2011
[bib] [pdf] [note]
@phdthesis{druck11thesis,
title = {Generalized Expectation Criteria for Lightly Supervised Learning},
author = {Gregory Druck},
school = {University of Massachusetts Amherst},
month = {September},
year = {2011}}
This version differs from the official version. It contains updated experiments in Chapters 8 and 10, and several corrections, edits, and formatting improvements.
|
Toward Interactive Training and Evaluation
Gregory Druck, Andrew McCallum.
In Proceedings of CIKM 2011
[abstract] [bib] [pdf]
Machine learning often relies on costly labeled data, and this impedes its application to new classification and information extraction problems. This has motivated
the development of methods for leveraging abundant prior knowledge about these problems, including methods for lightly supervised learning using model expectation
constraints. Building on this work, we envision an interactive training paradigm in which practitioners perform evaluation, analyze errors, and provide and refine
expectation constraints in a closed loop. In this paper, we focus on several key subproblems in this paradigm that can be cast as selecting a representative sample of the
unlabeled data for the practitioner to inspect. To address these problems, we propose stratified sampling methods that use model expectations as a proxy for latent output
variables. In classification and sequence labeling experiments, these sampling strategies reduce accuracy evaluation effort by as much as 53%, provide more reliable
estimates of F1 for rare labels, and aid in the specification and refinement of constraints.
@inproceedings{druck11cikm,
Author = {Gregory Druck and Andrew McCallum},
Booktitle = {Proceedings of the ACM Conference on Information and Knowledge Management (CIKM)},
Title = {Toward Interactive Training and Evaluation},
Pages = {947--956},
Year = {2011}}
|
TUTORIAL: Rich Prior Knowledge in Learning for Natural Language Processing
Gregory Druck, Kuzman Ganchev, João Graça.
Presented at ACL 2011, Interspeech 2011
[abstract]
|
High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models
Gregory Druck, Andrew McCallum.
In Proceedings of ICML 2010
[abstract] [bib] [pdf]
We develop a semi-supervised learning method that constrains the posterior distribution of latent variables under a generative model to satisfy a rich set of feature expectation constraints estimated with labeled data. This approach
encourages the generative model to discover latent structure that is relevant to a prediction task. We estimate parameters with a coordinate ascent algorithm, one step of which involves training a discriminative log-linear model with
an embedded generative model. This hybrid model can be used for test time prediction. Unlike other high-performance semi-supervised methods, the proposed algorithm converges to a stationary point of a single objective function, and
affords additional flexibility, for example to use different latent and output spaces. We conduct experiments on three sequence labeling tasks, achieving the best reported results on two of them, and showing promising results on
CoNLL03 NER.
@inproceedings{druck10high,
Author = {Gregory Druck and Andrew McCallum},
Booktitle = {Proceedings of the International Conference on Machine Learning (ICML 2010)},
Title = {High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models},
Pages = {319--326},
Year = {2010}}
|
Computing Conditional Feature Covariance in Non-Projective Tree Conditional Random Fields
Gregory Druck, David Smith.
University of Massachusetts Technical Report # UM-CS-2009-060.
[abstract] [bib] [pdf]
We present an O(N^4) time algorithm for computing conditional feature covariance in edge-factored conditional random fields (CRFs) over non-projective dependency trees. Applications of this
algorithm include more efficient Generalized Expectation (GE) parameter estimation.
@techreport{druck09covariance,
Author = {Gregory Druck and David Smith},
Institution = {University of Massachusetts},
Number = {UM-CS-2009-060},
Title = {Computing Conditional Feature Covariance in Non-Projective Tree Conditional Random Fields},
Year = {2009}}
|
Active Learning by Labeling Features
Gregory Druck, Burr Settles, Andrew McCallum.
In Proceedings of EMNLP 2009.
[abstract] [bib] [pdf]
Methods that learn from prior information about input features such as generalized expectation (GE) have been used to train accurate models with very little effort. In this paper, we
propose an active learning approach in which the machine solicits "labels" on features rather than instances. In both simulated and real user experiments on two sequence labeling
tasks we show that our active learning method outperforms passive learning with features as well as traditional active learning with instances. Preliminary experiments suggest that novel
interfaces which intelligently solicit labels on multiple features facilitate more efficient annotation.
@inproceedings{druck09active,
Author = {Gregory Druck and Burr Settles and Andrew McCallum},
Booktitle = {Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP 2009)},
Title = {Active Learning by Labeling Features},
Pages = {81--90},
Year = {2009}}
|
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria.
Gregory Druck, Gideon Mann, Andrew McCallum.
In Proceedings of ACL 2009.
This technical report describes a more efficient training algorithm.
[abstract] [bib] [pdf]
In this paper, we propose a novel method for semi-supervised learning of non-projective log-linear dependency parsers using directly expressed linguistic prior knowledge (e.g. a noun's
parent is often a verb). Model parameters are estimated using a generalized expectation (GE) objective function that penalizes the mismatch between model predictions and linguistic
expectation constraints. In a comparison with two prominent "unsupervised" learning methods that require indirect biasing toward the correct syntactic structure, we show that GE can attain
better accuracy with as few as 20 intuitive constraints. We also present positive experimental results on longer sentences in multiple languages.
@inproceedings{druck09semi,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language
Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 09)},
Title = {Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria},
Pages = {360--368},
Year = {2009}}
|
Alternating Projections for Learning with Expectation Constraints.
Kedar Bellare, Gregory Druck, Andrew McCallum.
In Proceedings of UAI 2009.
[abstract] [bib] [pdf]
We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between
information and moment projections. Our method provides an alternate interpretation of the posterior regularization framework (Graca et al., 2008), maintains uncertainty
during optimization unlike constraint-driven learning (Chang et al., 2007), and is more efficient than generalized expectation criteria (Mann and McCallum, 2008).
Applications of this framework include minimally
supervised learning, semi-supervised learning, and learning with constraints that are more expressive than the underlying model. In experiments, we demonstrate comparable accuracy to
generalized expectation criteria for minimally supervised learning, and use expressive structural constraints to guide semi-supervised learning, providing a 3%-6% improvement over
state-of-the-art constraint-driven learning.
@inproceedings{bellare09alternating,
Author = {Kedar Bellare and Gregory Druck and Andrew McCallum},
Booktitle = {Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 09)},
Title = {Alternating Projections for Learning with Expectation Constraints},
Pages = {35--42},
Year = {2009}}
|
Learning from Labeled Features using Generalized Expectation Criteria.
Gregory Druck, Gideon Mann, Andrew McCallum.
In Proceedings of SIGIR 2008.
A version of this paper appeared in the Proceedings of NESCAI 2008.
A version of this paper appeared as U. of Massachusetts Amherst Tech. Report UM-CS-2007-62.
An implementation of this method is now part of
MALLET. See the tutorial.
[abstract] [bib] [pdf]
It is difficult to apply machine learning to new domains because often we lack labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in
the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the
presence of the word puck is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. In this paper, we propose a method for
training discriminative probabilistic models with labeled features and unlabeled instances. Unlike previous approaches that use labeled features to create labeled pseudo-instances, we use
labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria --- terms in a parameter
estimation objective function that express preferences on values of a model expectation. In this paper we train multinomial logistic regression models using GE criteria, but the method we
develop is applicable to other discriminative probabilistic models. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading
parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled
features. Experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features rather than labeling instances. For example, after only one minute
of labeling features, we can achieve 80% accuracy on the ibm vs. mac text classification problem using GE-FL, whereas ten minutes of labeling documents results in an accuracy of
only 77%.
@inproceedings{druck08learning,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},
Pages = {595--602},
Title = {Learning from Labeled Features using Generalized Expectation Criteria},
Year = {2008}}
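As a rough sketch of the objective described in the abstract (names and toy data are ours, not the paper's implementation): one GE criterion scores the KL divergence between a reference label distribution for a labeled feature and the model's average predicted distribution over the unlabeled documents containing that feature.

```python
import numpy as np

def ge_term(model_probs, reference):
    """One GE criterion: KL(reference || model expectation).

    model_probs: (n_docs, n_classes) predicted p(y|x) for the unlabeled
        documents that contain the labeled feature.
    reference: (n_classes,) reference label distribution for that feature.
    """
    expectation = model_probs.mean(axis=0)  # model expectation over the docs
    return float(np.sum(reference * np.log(reference / expectation)))

# Toy example: the labeled feature "puck" with a reference skewed toward hockey.
probs = np.array([[0.8, 0.2], [0.7, 0.3], [0.9, 0.1]])  # p(hockey, baseball | x)
ref = np.array([0.9, 0.1])
print(ge_term(probs, ref))  # small positive number; 0 iff expectation matches ref
```

The full GE-FL objective sums such terms over all labeled features (and adds the Gaussian prior on parameters); gradients flow through `model_probs` back into the classifier.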
|
Learning to Predict the Quality of Contributions to Wikipedia.
Gregory Druck, Gerome Miklau, Andrew McCallum.
In Proceedings of the AAAI Workshop on Wikipedia and AI, 2008.
[abstract] [bib] [pdf]
Although some have argued that Wikipedia's open edit policy is one of the primary reasons for its success, it also raises concerns about quality --- vandalism, bias, and errors can be problems.
Despite these challenges, Wikipedia articles are often (perhaps surprisingly) of high quality, which many attribute to both the dedicated Wikipedia community and "good Samaritan" users. As
Wikipedia continues to grow, however, it becomes more difficult for these users to keep up with the increasing number of articles and edits. This motivates the development of tools to assist
users in creating and maintaining quality. In this paper, we propose metrics that quantify the quality of contributions to Wikipedia through implicit feedback from the community. We then learn
discriminative probabilistic models that predict the quality of a new edit using features of the changes made, the author of the edit, and the article being edited. Through estimating parameters
for these models, we also gain an understanding of factors that influence quality. We advocate using edit quality predictions and information gleaned from model analysis not to place
restrictions on editing, but to instead alert users to potential quality problems, and to facilitate the development of additional incentives for contributors. We evaluate the edit quality
prediction models on the Spanish Wikipedia. Experiments demonstrate that the models perform better when given access to content-based features of the edit, rather than only features of
contributing user. This suggests that a user-based solution to the Wikipedia quality problem may not be sufficient.
@inproceedings{druck08wikiai,
Author = {Gregory Druck and Gerome Miklau and Andrew McCallum},
Booktitle = {Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI 08)},
Pages = {7--12},
Title = {Learning to Predict the Quality of Contributions to Wikipedia},
Year = {2008}}
|
Leveraging Existing Resources using Generalized Expectation Criteria
Gregory Druck, Gideon Mann, Andrew McCallum.
In NIPS Workshop on Learning Problem Design, 2007
Updated: 12/17/07
[abstract] [bib] [pdf]
It is difficult to apply machine learning to many real-world tasks because there are no existing labeled instances. In one solution to this problem, a human expert provides instance labels that
are used in traditional supervised or semi-supervised training. Instead, we want a solution that allows us to leverage existing resources other than complete labeled instances. We propose the
use of generalized expectation (GE) criteria to achieve this goal. A GE criterion is a term in a training objective function that assigns a score to values of a
model expectation. In this paper, the expectations are model predicted class distributions conditioned on the presence of selected features, and the score function is the Kullback-Leibler
divergence from reference distributions that are estimated using existing resources. We apply this method to the problem of named-entity-recognition, leveraging available lexicons. Using no
conventionally labeled instances, we learn a sliding-window multinomial logistic regression model that obtains an F1 score of 0.692 on the CoNLL 2003 data. To attain the same accuracy, a
supervised classifier requires 4,000 labeled instances.
@inproceedings{druck07leveraging,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the Neural Information Processing Systems (NIPS) Workshop on Learning Problem Design},
Title = {Leveraging Existing Resources using Generalized Expectation Criteria},
Year = {2007}}
|
Generalized Expectation Criteria
Andrew McCallum, Gideon Mann, Gregory Druck.
U. of Massachusetts Amherst Tech. Report UM-CS-2007-60
This working note has not been updated recently. The 2008 SIGIR and 2009 ACL and EMNLP papers provide up-to-date descriptions of GE.
[abstract] [bib] [pdf]
This note describes generalized expectation (GE) criteria, a
framework for incorporating preferences about model expectations into
parameter estimation objective functions. We discuss relations to
other methods, various learning paradigms it supports, and
applications that can leverage its flexibility.
@techreport{mccallum07generalized,
Author = {Andrew McCallum and Gideon Mann and Gregory Druck},
Institution = {University of Massachusetts Amherst},
Number = {UM-CS-2007-60},
Title = {Generalized Expectation Criteria},
Year = {2007}}
|
Semi-Supervised Classification with Hybrid Generative/Discriminative Methods
Gregory Druck, Chris Pal, Xiaojin Zhu, Andrew McCallum.
In Proceedings of KDD 2007.
[abstract] [bib] [pdf]
We compare two recently proposed frameworks for combining generative and discriminative probabilistic classifiers and apply them to semi-supervised classification. In both cases we explore the
tradeoff between maximizing a discriminative likelihood of labeled data and a generative likelihood of labeled and unlabeled data. While prominent semi-supervised learning methods assume low
density regions between classes or are subject to generative modeling assumptions, we conjecture that hybrid generative/discriminative methods allow semi-supervised learning in the presence of
strongly overlapping classes and reduce the risk of modeling structure in the unlabeled data that is irrelevant for the specific classification task of interest. We apply both hybrid approaches
within naively structured Markov random field models and provide a thorough empirical comparison with two well-known semi-supervised learning methods on six text classification tasks. A
semi-supervised hybrid generative/discriminative method provides the best accuracy in 75% of the experiments, and the multi-conditional learning hybrid approach achieves the highest
overall mean accuracy across all tasks.
@inproceedings{druck07semi,
Author = {Gregory Druck and Chris Pal and Andrew McCallum and Xiaojin Zhu},
Booktitle = {Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 07)},
Pages = {280--289},
Title = {Semi-supervised classification with hybrid generative/discriminative methods},
Year = {2007}}
|
Learning A* Underestimates: Using Inference to Guide Inference
Gregory Druck, Mukund Narasimhan, Paul Viola.
In Proceedings of AISTATS 2007
[abstract] [bib] [pdf]
We present a technique for speeding up inference of structured variables using a priority-driven search algorithm rather than the more conventional dynamic programming. A priority-driven
search algorithm is guaranteed to return the optimal answer if the priority function is an underestimate of the true cost function. We introduce the notion of a probable approximate
underestimate, and show that it can be used to compute a probable approximate solution to the inference problem when used as a priority function. We show that we can learn probable
approximate underestimate functions which have the functional form of simpler, easy to decode models. These models can be learned from unlabeled data by solving a linear/quadratic
optimization problem. As a result, we get a priority function that can be computed quickly, and results in solutions that are (provably) almost optimal most of the time. Using these ideas,
discriminative classifiers such as semi-Markov CRFs and discriminative parsers can be sped up using a generalization of the A* algorithm. Further, this technique resolves one of the biggest
obstacles to the use of A* as a general decoding procedure, namely that of coming up with an admissible priority function. Applying this technique results in an algorithm that is more than 3
times as fast as the Viterbi algorithm for decoding semi-Markov CRFs.
@inproceedings{druck07learning,
Author = {Gregory Druck and Mukund Narasimhan and Paul Viola},
Booktitle = {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 07)},
Pages = {99--106},
Title = {Learning A* underestimates: Using inference to guide inference},
Year = {2007}}
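The priority-driven search the abstract builds on can be sketched generically (a minimal illustration, not the paper's learned-underestimate decoder): expand states in order of cost-so-far plus an underestimate of the remaining cost; when the underestimate never exceeds the true remaining cost, the first goal popped is optimal.

```python
import heapq

def astar(start, goal, neighbors, underestimate):
    """Priority-driven search, optimal when `underestimate` is admissible.

    neighbors(node) yields (next_node, step_cost) pairs;
    underestimate(node) lower-bounds the true cost from node to goal.
    """
    frontier = [(underestimate(start), 0.0, start, [start])]
    best = {}  # cheapest cost at which each node was expanded
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if best.get(node, float("inf")) <= cost:
            continue  # already expanded more cheaply
        best[node] = cost
        for nxt, step in neighbors(node):
            heapq.heappush(
                frontier,
                (cost + step + underestimate(nxt), cost + step, nxt, path + [nxt]),
            )
    return None

# Toy graph: direct edge a->c costs 4, but a->b->c costs 2.
graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.0)], "c": []}
h = {"a": 2.0, "b": 1.0, "c": 0.0}  # admissible: equals the true remaining cost
print(astar("a", "c", lambda n: graph[n], lambda n: h[n]))  # (2.0, ['a', 'b', 'c'])
```

The paper's contribution is learning a *probable approximate* underestimate from unlabeled data, trading the strict admissibility guarantee above for near-optimal answers most of the time at much lower decoding cost.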
|
Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification
Andrew McCallum, Chris Pal, Gregory Druck, Xuerui Wang.
In Proceedings of AAAI 2006.
[abstract] [bib] [pdf]
This paper presents multi-conditional learning (MCL), a training
criterion based on a product of multiple conditional likelihoods.
When combining the traditional conditional probability of "label given
input" with a generative probability of "input given label", the
latter acts as a surprisingly effective regularizer. When applied to
models with latent variables, MCL combines the structure-discovery
capabilities of generative topic models, such as latent Dirichlet
allocation and the exponential family harmonium, with the accuracy and robustness of
discriminative classifiers, such as logistic regression and
conditional random fields. We present results on several standard
text data sets showing significant reductions in classification error
due to MCL regularization, and substantial gains in precision and
recall due to the latent structure discovered under MCL.
@inproceedings{mccallum06multi,
Author = {Andrew McCallum and Chris Pal and Gregory Druck and Xuerui Wang},
Booktitle = {Proceedings of the American Association for Artificial Intelligence National Conference on Artificial Intelligence (AAAI 06)},
Pages = {433--439},
Title = {Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification},
Year = {2006}}
|