I am the Chief AI Officer at Graphite, where I lead a team of scientists and engineers building AI tools for organic growth, including agentic AI, LLMs, RAG, MCPs, NLP, ML, web search, recommender systems, and data science.
I also research how AI is reshaping marketing.
Previously, I was Chief Data Scientist at Yummly, where I led a team of ten scientists and engineers in the research and development of NLP and computer vision systems for the smart kitchen.
Prior to Yummly, I was an academic researcher in the NLP and search research group at Yahoo Research, with internships at Google and Microsoft. I earned a Ph.D. in 2011 from the University of Massachusetts Amherst, where I worked on semi-supervised and active machine learning for text data with Andrew McCallum.
Online recipes are often accompanied by user reviews. In addition to numeric ratings and descriptions of modifications, these reviews frequently contain detailed information about the cooking process, the taste and texture of the dish, and occasions or situations for which the dish is suited. In this paper, we aim to leverage this information to build a system that predicts what users would say about a recipe. Specifically, we annotate recipes with attributes that are applied to them in reviews. Then, we train models to predict these attributes using information about the ingredients, preparation steps, and recipe title. For example, we aim to predict whether a salad would be described as "refreshing" in reviews. We demonstrate that it is possible to make such predictions accurately and that the factors that are important in these predictions are intuitive. We also discuss potential downstream applications of this method to recipe recommendation, recipe retrieval, and guided recipe modification.
@inproceedings{druck13review,
Author = {Gregory Druck},
Booktitle = {Proceedings of the IJCAI Workshop on Cooking with Computers},
Title = {Recipe Attribute Prediction using Review Text as Supervision},
Year = {2013}}
There are a growing number of popular web sites where users submit and review instructions for completing tasks as varied as building a table and baking a pie. In addition to providing their subjective evaluation, reviewers often provide actionable refinements. These refinements clarify, correct, improve, or provide alternatives to the original instructions. However, identifying and reading all relevant reviews is a daunting task for a user. In this paper, we propose a generative model that jointly identifies user-proposed refinements in instruction reviews at multiple granularities, and aligns them to the appropriate steps in the original instructions. Labeled data is not readily available for these tasks, so we focus on the unsupervised setting. In experiments in the recipe domain, our model provides 90.1% F1 for predicting refinements at the review level, and 77.0% F1 for predicting refinement segments within reviews.
@inproceedings{recipes_acl12,
Author = {Gregory Druck and Bo Pang},
Booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012)},
Title = {Spice it up? Mining Refinements to Online Instructions from User Generated Content},
Year = {2012},
Pages = {545--553}}
@phdthesis{druck11thesis,
title = {Generalized Expectation Criteria for Lightly Supervised Learning},
author = {Gregory Druck},
school = {University of Massachusetts Amherst},
month = {September},
year = {2011}}
This version differs from the official version. It contains updated experiments in Chapters 8 and 10, and several corrections, edits, and formatting improvements.
Machine learning often relies on costly labeled data, and this impedes its application to new classification and information extraction problems. This has motivated the development of methods for leveraging abundant prior knowledge about these problems, including methods for lightly supervised learning using model expectation constraints. Building on this work, we envision an interactive training paradigm in which practitioners perform evaluation, analyze errors, and provide and refine expectation constraints in a closed loop. In this paper, we focus on several key subproblems in this paradigm that can be cast as selecting a representative sample of the unlabeled data for the practitioner to inspect. To address these problems, we propose stratified sampling methods that use model expectations as a proxy for latent output variables. In classification and sequence labeling experiments, these sampling strategies reduce accuracy evaluation effort by as much as 53%, provide more reliable estimates of F1 for rare labels, and aid in the specification and refinement of constraints.
@inproceedings{druck11cikm,
Author = {Gregory Druck and Andrew McCallum},
Booktitle = {Proceedings of the ACM Conference on Information and Knowledge Management (CIKM)},
Title = {Toward Interactive Training and Evaluation},
Pages = {947--956},
Year = {2011}}
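The stratified sampling idea above can be illustrated with a small sketch (hypothetical code, not the paper's implementation): the model's predicted probabilities stand in for the unknown true labels, unlabeled examples are binned into probability strata, and the evaluation sample is drawn from each stratum in proportion to its size.

```python
import random

def stratified_sample(examples, predict_proba, n_strata=5, n_samples=10, seed=0):
    """Bin examples by model-predicted positive-class probability,
    then sample from each bin in proportion to its size."""
    rng = random.Random(seed)
    strata = [[] for _ in range(n_strata)]
    for x in examples:
        p = predict_proba(x)
        idx = min(int(p * n_strata), n_strata - 1)
        strata[idx].append(x)
    sample = []
    total = len(examples)
    for stratum in strata:
        if not stratum:
            continue
        k = max(1, round(n_samples * len(stratum) / total))
        sample.extend(rng.sample(stratum, min(k, len(stratum))))
    return sample

# Toy "model": predicted probability proportional to the example's value.
examples = list(range(100))
picked = stratified_sample(examples, lambda x: x / 100.0)
print(len(picked))  # 10: two examples drawn from each of five strata
```

Because each stratum groups examples the model treats similarly, a small sample per stratum gives more reliable estimates for rare, high- or low-confidence regions than uniform sampling would.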
We develop a semi-supervised learning method that constrains the posterior distribution of latent variables under a generative model to satisfy a rich set of feature expectation constraints estimated with labeled data. This approach encourages the generative model to discover latent structure that is relevant to a prediction task. We estimate parameters with a coordinate ascent algorithm, one step of which involves training a discriminative log-linear model with an embedded generative model. This hybrid model can be used for test time prediction. Unlike other high-performance semi-supervised methods, the proposed algorithm converges to a stationary point of a single objective function, and affords additional flexibility, for example to use different latent and output spaces. We conduct experiments on three sequence labeling tasks, achieving the best reported results on two of them, and showing promising results on CoNLL03 NER.
@inproceedings{druck10high,
Author = {Gregory Druck and Andrew McCallum},
Booktitle = {Proceedings of the International Conference on Machine Learning (ICML 2010)},
Title = {High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models},
Pages = {319--326},
Year = {2010}}
We present an O(N^4) time algorithm for computing conditional feature covariance in edge-factored conditional random fields (CRFs) over non-projective dependency trees. Applications of this algorithm include more efficient Generalized Expectation (GE) parameter estimation.
@techreport{druck09covariance,
Author = {Gregory Druck and David Smith},
Institution = {University of Massachusetts},
Number = {UM-CS-2009-060},
Title = {Computing Conditional Feature Covariance in Non-Projective Tree Conditional Random Fields},
Year = {2009}}
Methods that learn from prior information about input features such as generalized expectation (GE) have been used to train accurate models with very little effort. In this paper, we propose an active learning approach in which the machine solicits "labels" on features rather than instances. In both simulated and real user experiments on two sequence labeling tasks we show that our active learning method outperforms passive learning with features as well as traditional active learning with instances. Preliminary experiments suggest that novel interfaces which intelligently solicit labels on multiple features facilitate more efficient annotation.
@inproceedings{druck09active,
Author = {Gregory Druck and Burr Settles and Andrew McCallum},
Booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2009)},
Title = {Active Learning by Labeling Features},
Pages = {81--90},
Year = {2009}}
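The idea of soliciting "labels" on features can be illustrated with a toy sketch (an illustration of the selection principle, not the paper's exact criterion): score each feature by a hypothetical uncertainty-weighted coverage — its document frequency times the entropy of the model's mean predicted label distribution over documents containing it — so frequent features the model is unsure about are queried first.

```python
import math
from collections import defaultdict

def feature_query_scores(docs, predict_dist):
    """Score features for annotator queries: coverage times the entropy of
    the mean predicted label distribution over documents containing the
    feature (a hypothetical stand-in for the paper's selection criterion)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc in docs:
        dist = predict_dist(doc)
        for f in set(doc):
            counts[f] += 1
            for label, p in dist.items():
                sums[f][label] += p
    scores = {}
    for f, c in counts.items():
        mean = [v / c for v in sums[f].values()]
        entropy = -sum(p * math.log(p) for p in mean if p > 0)
        scores[f] = c * entropy
    return scores

# Toy model: confident on documents containing "puck", uncertain otherwise.
def model(doc):
    if "puck" in doc:
        return {"hockey": 0.9, "baseball": 0.1}
    return {"hockey": 0.5, "baseball": 0.5}

docs = [["puck", "ice"], ["bat", "ball"], ["puck"]]
scores = feature_query_scores(docs, model)
print(round(scores["puck"], 3))  # 0.65: frequent but already-confident feature
```

Features like "ball", on which the toy model is maximally uncertain, outscore "puck" despite its higher frequency, which is the behavior an annotation-efficient query strategy wants.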
In this paper, we propose a novel method for semi-supervised learning of non-projective log-linear dependency parsers using directly expressed linguistic prior knowledge (e.g. a noun's parent is often a verb). Model parameters are estimated using a generalized expectation (GE) objective function that penalizes the mismatch between model predictions and linguistic expectation constraints. In a comparison with two prominent "unsupervised" learning methods that require indirect biasing toward the correct syntactic structure, we show that GE can attain better accuracy with as few as 20 intuitive constraints. We also present positive experimental results on longer sentences in multiple languages.
@inproceedings{druck09semi,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009)},
Title = {Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria},
Pages = {360--368},
Year = {2009}}
We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternate interpretation of the posterior regularization framework (Graca et al., 2008), maintains uncertainty during optimization unlike constraint-driven learning (Chang et al., 2007), and is more efficient than generalized expectation criteria (Mann and McCallum, 2008). Applications of this framework include minimally supervised learning, semi-supervised learning, and learning with constraints that are more expressive than the underlying model. In experiments, we demonstrate comparable accuracy to generalized expectation criteria for minimally supervised learning, and use expressive structural constraints to guide semi-supervised learning, providing a 3%-6% improvement over state-of-the-art constraint-driven learning.
@inproceedings{bellare09alternating,
Author = {Kedar Bellare and Gregory Druck and Andrew McCallum},
Booktitle = {Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 09)},
Title = {Alternating Projections for Learning with Expectation Constraints},
Pages = {35--42},
Year = {2009}}
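The alternation described above can be written schematically (notation assumed rather than copied from the paper): an I-projection finds the distribution in the constraint set closest to the current model's posterior, and an M-projection moves model parameters toward that distribution, in the spirit of the posterior regularization view the abstract mentions.

```latex
% I-projection: project the model posterior onto the constraint set Q
q^{(t+1)} = \arg\min_{q \in Q} \, \mathrm{KL}\!\left(q \,\middle\|\, p_{\theta^{(t)}}(y \mid x)\right),
\qquad Q = \left\{\, q : \mathbb{E}_q[f(x, y)] \le b \,\right\}

% M-projection: move the model toward the projected distribution
\theta^{(t+1)} = \arg\max_{\theta} \; \mathbb{E}_{q^{(t+1)}}\!\left[\log p_{\theta}(y \mid x)\right]
```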
It is difficult to apply machine learning to new domains because often we lack labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the presence of the word puck is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. In this paper, we propose a method for training discriminative probabilistic models with labeled features and unlabeled instances. Unlike previous approaches that use labeled features to create labeled pseudo-instances, we use labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria --- terms in a parameter estimation objective function that express preferences on values of a model expectation. In this paper we train multinomial logistic regression models using GE criteria, but the method we develop is applicable to other discriminative probabilistic models. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled features. Experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features rather than labeling instances. For example, after only one minute of labeling features, we can achieve 80% accuracy on the ibm vs. mac text classification problem using GE-FL, whereas ten minutes labeling documents results in an accuracy of only 77%.
@inproceedings{druck08learning,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},
Pages = {595--602},
Title = {Learning from Labeled Features using Generalized Expectation Criteria},
Year = {2008}}
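The GE criterion described above can be sketched in notation (a paraphrase of the setup, with symbols assumed rather than copied from the paper): for each labeled feature $k$ with reference label distribution $\hat{p}_k$, the objective penalizes the KL divergence from the model's mean predicted label distribution on unlabeled instances containing $k$, plus the Gaussian prior on parameters:

```latex
\max_{\theta} \;\; -\sum_{k \in K} \mathrm{KL}\!\left( \hat{p}_k \,\middle\|\, \tilde{p}_{\theta}(y \mid x_k > 0) \right) \;-\; \sum_{j} \frac{\theta_j^2}{2\sigma^2}
```

Here $\tilde{p}_{\theta}(y \mid x_k > 0)$ averages the model's predicted label distributions over the unlabeled documents in which feature $k$ appears, so labeling the feature "puck" as indicating hockey directly constrains predictions on every document containing it.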
Although some have argued that Wikipedia's open edit policy is one of the primary reasons for its success, it also raises concerns about quality --- vandalism, bias, and errors can be problems. Despite these challenges, Wikipedia articles are often (perhaps surprisingly) of high quality, which many attribute to both the dedicated Wikipedia community and "good Samaritan" users. As Wikipedia continues to grow, however, it becomes more difficult for these users to keep up with the increasing number of articles and edits. This motivates the development of tools to assist users in creating and maintaining quality. In this paper, we propose metrics that quantify the quality of contributions to Wikipedia through implicit feedback from the community. We then learn discriminative probabilistic models that predict the quality of a new edit using features of the changes made, the author of the edit, and the article being edited. Through estimating parameters for these models, we also gain an understanding of factors that influence quality. We advocate using edit quality predictions and information gleaned from model analysis not to place restrictions on editing, but to instead alert users to potential quality problems, and to facilitate the development of additional incentives for contributors. We evaluate the edit quality prediction models on the Spanish Wikipedia. Experiments demonstrate that the models perform better when given access to content-based features of the edit, rather than only features of the contributing user. This suggests that a user-based solution to the Wikipedia quality problem may not be sufficient.
@inproceedings{druck08wikiai,
Author = {Gregory Druck and Gerome Miklau and Andrew McCallum},
Booktitle = {Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI 08)},
Pages = {7--12},
Title = {Learning to Predict the Quality of Contributions to Wikipedia},
Year = {2008}}
It is difficult to apply machine learning to many real-world tasks because there are no existing labeled instances. In one solution to this problem, a human expert provides instance labels that are used in traditional supervised or semi-supervised training. Instead, we want a solution that allows us to leverage existing resources other than complete labeled instances. We propose the use of generalized expectation (GE) criteria to achieve this goal. A GE criterion is a term in a training objective function that assigns a score to values of a model expectation. In this paper, the expectations are model predicted class distributions conditioned on the presence of selected features, and the score function is the Kullback-Leibler divergence from reference distributions that are estimated using existing resources. We apply this method to the problem of named-entity-recognition, leveraging available lexicons. Using no conventionally labeled instances, we learn a sliding-window multinomial logistic regression model that obtains an F1 score of 0.692 on the CoNLL 2003 data. To attain the same accuracy a supervised classifier requires 4,000 labeled instances.
@inproceedings{druck07leveraging,
Author = {Gregory Druck and Gideon Mann and Andrew McCallum},
Booktitle = {Proceedings of the Neural Information Processing Systems (NIPS) Workshop on Learning Problem Design},
Title = {Leveraging Existing Resources using Generalized Expectation Criteria},
Year = {2007}}
This note describes generalized expectation (GE) criteria, a framework for incorporating preferences about model expectations into parameter estimation objective functions. We discuss relations to other methods, various learning paradigms it supports, and applications that can leverage its flexibility.
@techreport{mccallum07generalized,
Author = {Andrew McCallum and Gideon Mann and Gregory Druck},
Institution = {University of Massachusetts Amherst},
Number = {UM-CS-2007-60},
Title = {Generalized Expectation Criteria},
Year = {2007}}
We compare two recently proposed frameworks for combining generative and discriminative probabilistic classifiers and apply them to semi-supervised classification. In both cases we explore the tradeoff between maximizing a discriminative likelihood of labeled data and a generative likelihood of labeled and unlabeled data. While prominent semi-supervised learning methods assume low density regions between classes or are subject to generative modeling assumptions, we conjecture that hybrid generative/discriminative methods allow semi-supervised learning in the presence of strongly overlapping classes and reduce the risk of modeling structure in the unlabeled data that is irrelevant for the specific classification task of interest. We apply both hybrid approaches within naively structured Markov random field models and provide a thorough empirical comparison with two well-known semi-supervised learning methods on six text classification tasks. A semi-supervised hybrid generative/discriminative method provides the best accuracy in 75% of the experiments, and the multi-conditional learning hybrid approach achieves the highest overall mean accuracy across all tasks.
@inproceedings{druck07semi,
Author = {Gregory Druck and Chris Pal and Andrew McCallum and Xiaojin Zhu},
Booktitle = {Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 07)},
Pages = {280--289},
Title = {Semi-supervised classification with hybrid generative/discriminative methods},
Year = {2007}}
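The labeled/unlabeled tradeoff explored above can be written schematically (notation assumed, not the paper's): a weight $\alpha$ interpolates between a discriminative conditional likelihood on the labeled set $\mathcal{L}$ and a generative likelihood over both $\mathcal{L}$ and the unlabeled set $\mathcal{U}$:

```latex
\max_{\theta} \;\; \alpha \sum_{(x, y) \in \mathcal{L}} \log p_{\theta}(y \mid x)
\;+\; (1 - \alpha) \left[ \sum_{(x, y) \in \mathcal{L}} \log p_{\theta}(x, y)
\;+\; \sum_{x \in \mathcal{U}} \log p_{\theta}(x) \right]
```

Larger $\alpha$ leans on discriminative accuracy; smaller $\alpha$ leans on the generative terms, which are the only ones that can exploit $\mathcal{U}$.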
We present a technique for speeding up inference of structured variables using a priority-driven search algorithm rather than the more conventional dynamic programming. A priority-driven search algorithm is guaranteed to return the optimal answer if the priority function is an underestimate of the true cost function. We introduce the notion of a probable approximate underestimate, and show that it can be used to compute a probable approximate solution to the inference problem when used as a priority function. We show that we can learn probable approximate underestimate functions which have the functional form of simpler, easy-to-decode models. These models can be learned from unlabeled data by solving a linear/quadratic optimization problem. As a result, we get a priority function that can be computed quickly, and results in solutions that are (provably) almost optimal most of the time. Using these ideas, discriminative classifiers such as semi-Markov CRFs and discriminative parsers can be sped up using a generalization of the A* algorithm. Further, this technique resolves one of the biggest obstacles to the use of A* as a general decoding procedure, namely that of coming up with an admissible priority function. Applying this technique results in an algorithm that is more than 3 times as fast as the Viterbi algorithm for decoding semi-Markov Conditional Markov Models.
@inproceedings{druck07learning,
Author = {Gregory Druck and Mukund Narasimhan and Paul Viola},
Booktitle = {Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 07)},
Pages = {99--106},
Title = {Learning A\* underestimates: Using inference to guide inference},
Year = {2007}}
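The priority-driven decoding scheme above can be sketched in miniature (hypothetical code, not the paper's system): the priority of a partial labeling is its cost so far plus an underestimate of the remaining cost. Here the underestimate is simply the sum of per-position minimum local costs, a trivially admissible heuristic standing in for the learned probable approximate underestimates.

```python
import heapq
import itertools

def astar_decode(costs, trans):
    """A*-style decoding of the min-cost label sequence for a chain model.
    costs[t][y]: local cost of label y at position t (non-negative);
    trans[y][y2]: transition cost from label y to y2 (non-negative)."""
    T, L = len(costs), len(costs[0])
    # Admissible underestimate of remaining cost: per-position minima for
    # the suffix, ignoring (non-negative) transition costs.
    suffix = [0.0] * (T + 1)
    for t in range(T - 1, -1, -1):
        suffix[t] = suffix[t + 1] + min(costs[t])
    tie = itertools.count()  # break priority ties without comparing labels
    # Heap entries: (cost-so-far + underestimate, tiebreak, cost-so-far,
    #                next position, last label, partial labeling)
    heap = [(suffix[0], next(tie), 0.0, 0, None, ())]
    while heap:
        _, _, g, t, last, path = heapq.heappop(heap)
        if t == T:
            return list(path)  # first completed path popped is optimal
        for y in range(L):
            step = costs[t][y] + (trans[last][y] if last is not None else 0.0)
            heapq.heappush(heap, (g + step + suffix[t + 1], next(tie),
                                  g + step, t + 1, y, path + (y,)))

costs = [[1, 4], [2, 1], [3, 0]]
trans = [[0, 2], [1, 0]]
print(astar_decode(costs, trans))  # [0, 1, 1]
```

Because the heuristic never overestimates, the first complete sequence popped from the queue is exactly the Viterbi answer; the paper's contribution is learning tighter (probably approximate) underestimates so far fewer partial hypotheses get expanded.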
This paper presents multi-conditional learning (MCL), a training criterion based on a product of multiple conditional likelihoods. When combining the traditional conditional probability of "label given input" with a generative probability of "input given label", the latter acts as a surprisingly effective regularizer. When applied to models with latent variables, MCL combines the structure-discovery capabilities of generative topic models, such as latent Dirichlet allocation and the exponential family harmonium, with the accuracy and robustness of discriminative classifiers, such as logistic regression and conditional random fields. We present results on several standard text data sets showing significant reductions in classification error due to MCL regularization, and substantial gains in precision and recall due to the latent structure discovered under MCL.
@inproceedings{mccallum06multi,
Author = {Andrew McCallum and Chris Pal and Gregory Druck and Xuerui Wang},
Booktitle = {Proceedings of the American Association for Artificial Intelligence National Conference on Artificial Intelligence (AAAI 06)},
Pages = {433--439},
Title = {Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification},
Year = {2006}}
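The "product of multiple conditional likelihoods" above can be sketched in log form (weights and notation assumed): a discriminative term and a generative term over the same data, with the generative term acting as the regularizer described in the abstract.

```latex
\mathcal{L}_{\mathrm{MCL}}(\theta) \;=\; \sum_{i} \Big[ \, \alpha \log p_{\theta}(y_i \mid x_i) \;+\; \beta \log p_{\theta}(x_i \mid y_i) \, \Big]
```

Setting $\beta = 0$ recovers purely discriminative training; increasing $\beta$ pulls the parameters toward explaining the inputs as well as the labels.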