Model Misspecification in Multiple Weak Supervision
Salva Rühling Cachay

Mon Dec 07 09:21 AM - 09:31 AM (PST)

Data programming has proven to be an attractive alternative to costly hand-labeling of data. In this paradigm, users encode domain knowledge into labeling functions (LFs), heuristics that label a subset of the data noisily and may have complex dependencies. The effects of label model misspecification on the test set performance of a downstream classifier are understudied, which leaves practitioners with a serious knowledge gap, particularly because LF dependencies are frequently ignored. In this paper, we focus on modeling errors due to structure over-specification. Based on novel theoretical bounds on the modeling error, we empirically show that this error can be substantial, even when modeling a seemingly sensible structure.

Author Information

Salva Rühling Cachay (Technical University of Darmstadt)
