Allen School Data Science Lab

Data Science researchers within the Paul G. Allen School of Computer Science & Engineering at the University of Washington.

Publications

Countrywide natural experiment reveals impact of built environment on physical activity
Tim Althoff, Boris Ivanovic, J. Hicks, Scott L. Delp, Abby C. King, J. Leskovec
arXiv.org 2024

Optimizing Dataflow Systems for Scalable Interactive Visualization
Junran Yang, Kevin Hyekang Joo, Sai Yerramreddy, Dominik Moritz, L. Battle
Proc. ACM Manag. Data 2024

Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction
Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, David Wadden, Khendra G. Lucas, Adam S. Miner, et al.
Annual Meeting of the Association for Computational Linguistics 2023

How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study
Ken Gu, Madeleine Grunde-McLaughlin, Andrew M. McNutt, Jeffrey Heer, Tim Althoff
International Conference on Human Factors in Computing Systems 2023

Scaling Expert Language Models with Unsupervised Domain Discovery
Suchin Gururangan, Margaret Li, M. Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, et al.
arXiv.org 2023

How Do Analysts Understand and Verify AI-Assisted Data Analyses?
Ken Gu, Ruoxi Shang, Tim Althoff, Chenglong Wang, S. Drucker
International Conference on Human Factors in Computing Systems 2023

Approximation and Progressive Display of Multiverse Analyses
Yang Liu, Tim Althoff, Jeffrey Heer
arXiv.org 2023

What Exactly is an Insight? A Literature Review
L. Battle, Alvitta Ottley
Visual .. 2023

Lodestar: Supporting rapid prototyping of data science workflows through data-driven analysis recommendations
Deepthi Raghunandan, Zhe Cui, Kartik Krishnan, Segen Tirfe, Shenzhi Shi, Tejaswi Darshan Shrestha, et al.
Information Visualization 2023

WhaleVis: Visualizing the History of Commercial Whaling
Ameya S. Patil, Zoe R Rand, T. Branch, L. Battle
Visual .. 2023

Using Graphical Perception in Visualization Recommendation
Zehua Zeng, L. Battle
Interactions 2023

Measuring How Data Science Notebooks Evolve Over Time
Deepthi Raghunandan, N. Elmqvist, L. Battle
Interactions 2023

User-Driven Support for Rapid Visualization Prototyping in D3
Hannah K. Bako, Alisha Varma, Anuoluwapo Faboro, Mahreen Haider, Favour Nerrise, B. Kenah, et al.
2023

Toward a Scalable Census of Dashboard Designs in the Wild: A Case Study with Tableau Public
Joanna Purich, Arjun Srinivasan, M. Correll, L. Battle, V. Setlur, Anamaria Crisan
arXiv.org 2023

Visualizing Historical Whaling Voyages over Time
L. Battle, Ameya Patil, T. Branch, Zoe R Rand
Interactions 2023

Too Many Cooks: Exploring How Graphical Perception Studies Influence Visualization Recommendations in Draco
Zehua Zeng, Junran Yang, Dominik Moritz, Jeffrey Heer, L. Battle
IEEE Transactions on Visualization and Computer Graphics 2023

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh, Sibei Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander J. Ratner, Chen-Yu Lee, et al.
arXiv.org 2023

On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training
Jieyu Zhang, Bohan Wang, Zhengyu Hu, Pang Wei Koh, Alexander J. Ratner
Neural Information Processing Systems 2023

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander J. Ratner, Ranjay Krishna, et al.
Neural Information Processing Systems 2023

MaskSearch: Querying Image Masks at Scale
Dong He, Jieyu Zhang, Maureen Daum, Alexander J. Ratner, M. Balazinska
arXiv.org 2023

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander J. Ratner, et al.
Annual Meeting of the Association for Computational Linguistics 2023

DataComp: In search of the next generation of multimodal datasets
S. Gadre, Gabriel Ilharco, Alex Fang, J. Hayase, Georgios Smyrnis, Thao Nguyen, et al.
Neural Information Processing Systems 2023

Self-supervised Pretraining and Transfer Learning Enable Flu and COVID-19 Predictions in Small Mobile Sensing Datasets
Michael Merrill, Tim Althoff
ACM Conference on Health, Inference, and Learning 2022

Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support
Ashish Sharma, Inna Wanyin Lin, Adam S. Miner, David C. Atkins, Tim Althoff
Nature Machine Intelligence 2022

GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization
Xuhai Xu, Han Zhang, Yasaman S. Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, et al.
Neural Information Processing Systems 2022

Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Margaret Li, Suchin Gururangan, Tim Dettmers, M. Lewis, Tim Althoff, Noah A. Smith, et al.
arXiv.org 2022

Disparate impacts on online information access during the Covid-19 pandemic
J. Suh, E. Horvitz, Ryen W. White, Tim Althoff
Nature Communications 2022

Grand Challenges for Personal Informatics and AI
Lena Mamykina, Daniel A. Epstein, P. Klasnja, Donna Spruijt-Metz, J. Meyer, M. Czerwinski, et al.
CHI Extended Abstracts 2022

Leveraging Mobile Technology for Public Health Promotion: A Multidisciplinary Perspective.
J. Hicks, M. Boswell, Tim Althoff, A. Crum, J. P. Ku, J. Landay, et al.
Annual Review of Public Health 2022

A computational approach to measure the linguistic characteristics of psychotherapy timing, responsiveness, and consistency
Adam S. Miner, S. Fleming, Albert Haque, Jason Alan Fries, Tim Althoff, D. Wilfley, et al.
npj Mental Health Research 2022

An Integrative Engagement Model of Digital Psychotherapy: Exploratory Focus Group Findings
Jamie M. Zech, Morgan Johnson, M. Pullmann, T. D. Hull, Tim Althoff, Sean A Munson, et al.
JMIR Formative Research 2022

Gendered Mental Health Stigma in Masked Language Models
Inna Wanyin Lin, Lucille Njoo, Anjalie Field, Ashish Sharma, K. Reinecke, Tim Althoff, et al.
Conference on Empirical Methods in Natural Language Processing 2022

Large-scale diet tracking data reveal disparate associations between food environment and diet
Tim Althoff, H. Nilforoshan, J. Hua, J. Leskovec
Nature Communications 2022

Mapping User Engagement in Digital Psychotherapy: An Integrative Engagement Model
Jamie M. Zech, Morgan Johnson, M. Pullmann, T. D. Hull, Tim Althoff, Sean A Munson, et al.
2022

Understanding and Supporting Debugging Workflows in Multiverse Analysis
Ken Gu, Eunice Jun, Tim Althoff
International Conference on Human Factors in Computing Systems 2022

Estimating the Burden of Influenza-like Illness on Daily Activity at the Population Scale Using Commercial Wearable Sensors
A. Mezlini, A. Shapiro, E. Daza, E. Caddigan, E. Ramirez, Tim Althoff, et al.
JAMA Network Open 2022

A Programmatic Approach to Applying Visualization Taxonomies to Interaction Logs
Sneha Gathani, S. Monadjemi, Alvitta Ottley, L. Battle
arXiv.org 2022

The DB Community vis-à-vis Environmental, Health, and Societal Grand Challenges: Innovation Engine, Plumber, or Bystander?
A. Ailamaki, L. Battle, J. Gehrke, Masaru Kitsuregawa, David Maier, Christopher Ré, et al.
SIGMOD Conference 2022

A Programmatic Definition of Visualization Tasks, Insights, and Objectives
L. Battle, Alvitta Ottley
arXiv.org 2022

Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time
Deepthi Raghunandan, Aayushi Roy, Shenzhi Shi, N. Elmqvist, L. Battle
International Conference on Human Factors in Computing Systems 2022

Demonstration of VegaPlus: Optimizing Declarative Visualization Languages
Junran Yang, Kevin Hyekang Joo, Sai Yerramreddy, Siyao Li, Dominik Moritz, L. Battle
SIGMOD Conference 2022

What Do We Mean When We Say “Insight”? A Formal Synthesis of Existing Theory
L. Battle, Alvitta Ottley
IEEE Transactions on Visualization and Computer Graphics 2022

User-Driven Support for Rapid Visualization Programming in D3
Hannah K. Bako, Alisha Varma, Anuoluwapo Faboro, Mahreen Haider, Favour Nerrise, John P. Dickerson, et al.
2022

Behavior-driven testing of big data exploration tools
L. Battle
Interactions 2022

Lodestar: Supporting Independent Learning and Rapid Experimentation Through Data-Driven Analysis Recommendations
Deepthi Raghunandan, Zhe Cui, K. Sivaramakrishnan, Segen Tirfe, Shenzhi Shi, Tejaswi Darshan Shrestha, et al.
arXiv.org 2022

Letter from the Rising Star Award Winner
L. Battle
IEEE Data Engineering Bulletin 2022

Toward an Interaction-Driven Framework for Modeling Big Data Visualization Systems
D. Benvenuti, G. Fiordeponti, Hao Cheng, T. Catarci, Jean-Daniel Fekete, G. Santucci, et al.
Eurographics Conference on Visualization 2022

Understanding how Designers Find and Use Data Visualization Examples
Hannah K. Bako, Xinyi Liu, L. Battle, Zhicheng Liu
IEEE Transactions on Visualization and Computer Graphics 2022

A Grammar-Based Approach for Applying Vis Taxonomies to Interaction Logs
Tobias Schreck, Sneha Gathani, S. Monadjemi, Alvitta Ottley, L. Battle
2022

Analyzing online programming communities to enhance visualization languages
L. Battle
Interactions 2022

Recommendations for Visualization Recommendations: Exploring Preferences and Priorities in Public Health
Calvin Bao, Siyao Li, Sarah Flores, M. Correll, L. Battle
International Conference on Human Factors in Computing Systems 2022

How Do Data Science Workers Communicate Intermediate Results?
Rock Yuren Pang, Ruotong Wang, Joely Nelson, L. Battle
IEEE Visualization in Data Science (VDS) 2022

Are We There Yet? A Systematic Review of Visual Perception Knowledge for Visualization Recommendation Systems
Zehua Zeng, L. Battle
2022

A Grammar‐Based Approach for Applying Visualization Taxonomies to Interaction Logs
Sneha Gathani, S. Monadjemi, Alvitta Ottley, L. Battle
Computer Graphics Forum 2022

An Adaptive Benchmark for Modeling User Exploration of Large Datasets
Joanna Purich, H. Mahmood, Diana Chou, Chidi Udeze, L. Battle
arXiv.org 2022

Testing theories of task in visual analytics
L. Battle, Alvitta Ottley
Interactions 2022

Kyrix-J: Visual Discovery of Connected Datasets in a Data Lake
Wenbo Tao, Adam Sah, L. Battle, Remco Chang, M. Stonebraker
Conference on Innovative Data Systems Research 2022

Scalable Vega: Optimizing Declarative Visualization Languages
Junran Yang, Kevin Hyekang Joo, Sai Yerramreddy, Siyao Li, Dominik Moritz, L. Battle
arXiv.org 2022

Streamlining Visualization Authoring in D3 Through User-Driven Templates
Hannah K. Bako, Alisha Varma, Anuoluwapo Faboro, Mahreen Haider, Favour Nerrise, B. Kenah, et al.
Visual .. 2022

A Survey on Programmatic Weak Supervision
Jieyu Zhang, Cheng-Yu Hsieh, Yue Yu, Chao Zhang, Alexander J. Ratner
arXiv.org 2022

Understanding Programmatic Weak Supervision via Source-aware Influence Function
Jieyu Zhang, Hong Wang, Cheng-Yu Hsieh, Alexander J. Ratner
Neural Information Processing Systems 2022

Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision
Jieyu Zhang, Linxin Song, Alexander J. Ratner
International Conference on Artificial Intelligence and Statistics 2022

Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming
Cheng-Yu Hsieh, Jieyu Zhang, Alexander J. Ratner
Proceedings of the VLDB Endowment 2022

Binary Classification with Positive Labeling Sources
Jieyu Zhang, Yujing Wang, Yaming Yang, Yang Luo, Alexander J. Ratner
International Conference on Information and Knowledge Management 2022

What Makes Online Communities ‘Better’? Measuring Values, Consensus, and Conflict across Thousands of Subreddits
Galen Cassebeer Weld, Amy X. Zhang, Tim Althoff
International Conference on Web and Social Media 2021

Online Mobile App Usage as an Indicator of Sleep Behavior and Job Performance
Chunjong Park, Morelle S. Arian, Xin Liu, Leon Sasson, Jeffrey Kahn, Shwetak N. Patel, et al.
The Web Conference 2021

Leveraging Community and Author Context to Explain the Performance and Bias of Text-Based Deception Detection Models
Galen Cassebeer Weld, Ellyn Ayton, Tim Althoff, M. Glenski
NLP4IF 2021

Efficient and Explainable Risk Assessments for Imminent Dementia in an Aging Cohort Study
Nicasia Beebe-Wang, Alex Okeson, Tim Althoff, Su-In Lee
IEEE Journal of Biomedical and Health Informatics 2021

Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach
Ashish Sharma, Inna Wanyin Lin, Adam S. Miner, David C. Atkins, Tim Althoff
The Web Conference 2021

Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis
Martin Schweinsberg, Michael Feldman, N. Staub, O. V. D. Akker, R. V. Aert, M. V. Assen, et al.
Organizational Behavior and Human Decision Processes 2021

Political Bias and Factualness in News Sharing Across more than 100,000 Online Communities
Galen Cassebeer Weld, M. Glenski, Tim Althoff
International Conference on Web and Social Media 2021

MULTIVERSE: Mining Collective Data Science Knowledge from Code on the Web to Suggest Alternative Analysis Approaches
Michael Merrill, Ashley Ge Zhang, Tim Althoff
Knowledge Discovery and Data Mining 2021

Daily, weekly, seasonal and menstrual cycles in women’s mood, behaviour and vital signs
E. Pierson, Tim Althoff, Daniel Thomas, P. Hillard, J. Leskovec
Nature Human Behaviour 2021

Widening Disparities in Online Information Access during the COVID-19 Pandemic
J. Suh, E. Horvitz, R. White, Tim Althoff
medRxiv 2021

Estimating the Burden of Influenza on Daily Activity at Population Scale Using Commercial Wearable Sensors
A. Mezlini, A. Shapiro, E. Daza, E. Caddigan, E. Ramirez, Tim Althoff, et al.
medRxiv 2021

Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets
Michael Merrill, Tim Althoff
arXiv.org 2021

Making Online Communities ‘Better’: A Taxonomy of Community Values on Reddit
Galen Cassebeer Weld, Amy X. Zhang, Tim Althoff
International Conference on Web and Social Media 2021

Vis Ex Machina: An Analysis of Trust in Human versus Algorithmically Generated Visualization Recommendations
Rachael Zehrung, A. Singhal, M. Correll, L. Battle
International Conference on Human Factors in Computing Systems 2021

A Review and Collation of Graphical Perception Knowledge for Visualization Recommendation
Zehua Zeng, L. Battle
International Conference on Human Factors in Computing Systems 2021

Guided Hyperparameter Tuning Through Visualization and Inference
Kevin Hyekang Joo, Calvin Bao, Ishan Sen, Furong Huang, L. Battle
arXiv.org 2021

Are We There Yet? A Review on Existing Perceptual Theory and Experiment Support for Visualization Recommendation Systems
Zehua Zeng, Minhui Xie, Matthew Gouzoulis, L. Battle
arXiv.org 2021

Exploring D3 Implementation Challenges on Stack Overflow
L. Battle, Danni Feng, Kelli Webber
Visual .. 2021

User-Driven Support for Visualization Prototyping in D3
Hannah K. Bako, Alisha Varma, Anuoluwapo Faboro, Mahreen Haider, Favour Nerrise, B. Kenah, et al.
International Conference on Intelligent User Interfaces 2021

An Evaluation-Focused Framework for Visualization Recommendation Algorithms
Zehua Zeng, Phoebe Moh, F. Du, J. Hoffswell, Tak Yeon Lee, Sana Malik, et al.
IEEE Transactions on Visualization and Computer Graphics 2021

Exploring Visualization Implementation Challenges Faced by D3 Users Online
L. Battle, Danni Feng, Kelli Webber
arXiv.org 2021

WRENCH: A Comprehensive Benchmark for Weak Supervision
Jieyu Zhang, Yue Yu, Yinghao Li, Yujing Wang, Yaming Yang, Mao Yang, et al.
NeurIPS Datasets and Benchmarks 2021

Proceedings of the First Workshop on Weakly Supervised Learning (WeaSuL)
Michael A. Hedderich, Benjamin Roth, Katharina Kann, Barbara Plank, Alexander J. Ratner, D. Klakow
arXiv.org 2021

Creating Training Sets via Weak Indirect Supervision
Jieyu Zhang, Bohan Wang, Xiangchen Song, Yujing Wang, Yaming Yang, Jing Bai, et al.
International Conference on Learning Representations 2021

Characterizing COVID-19 and Influenza Illnesses in the Real World via Person-Generated Health Data
A. Shapiro, N. Marinsek, I. Clay, B. Bradshaw, E. Ramirez, J. Min, et al.
Patterns 2020

Assessing the relationship between routine and schizophrenia symptoms with passively sensed measures of behavioral stability
Joy He-Yueya, B. Buck, Andrew T. Campbell, Tanzeem Choudhury, J. Kane, Dror Ben-Zeev, et al.
npj Schizophrenia 2020

A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support
Ashish Sharma, Adam S. Miner, David C. Atkins, Tim Althoff
Conference on Empirical Methods in Natural Language Processing 2020

Data-Driven Implications for Translating Evidence-Based Psychotherapies into Technology-Delivered Interventions
J. Schroeder, J. Suh, Chelsey R. Wilks, M. Czerwinski, Sean A Munson, J. Fogarty, et al.
International Conference on Pervasive Computing Technologies for Healthcare 2020

Population-Scale Study of Human Needs During the COVID-19 Pandemic: Analysis and Implications
J. Suh, E. Horvitz, Ryen W. White, Tim Althoff
Web Search and Data Mining 2020

Boba: Authoring and Visualizing Multiverse Analyses
Yang Liu, Alex Kale, Tim Althoff, Jeffrey Heer
IEEE Transactions on Visualization and Computer Graphics 2020

Boba: Supplemental Material
Yang Liu, Alex Kale, Tim Althoff, Jeffrey Heer
2020

The Effect of Moderation on Online Mental Health Conversations
David Wadden, Tal August, Qisheng Li, Tim Althoff
International Conference on Web and Social Media 2020

CORAL: COde RepresentAtion learning with weakly-supervised transformers for analyzing data analysis
Ashley Ge Zhang, Michael Merrill, Yang Liu, Jeffrey Heer, Tim Althoff
EPJ Data Science 2020

Measuring COVID-19 and Influenza in the Real World via Person-Generated Health Data
N. Marinsek, A. Shapiro, I. Clay, B. Bradshaw, E. Ramirez, J. Min, et al.
medRxiv 2020

Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference
Galen Cassebeer Weld, Peter West, M. Glenski, D. Arbour, Ryan A. Rossi, Tim Althoff
International Conference on Web and Social Media 2020

Decision points and selective reporting in end-to-end data analysis: an interview study
Yang Liu, Tim Althoff, Jeffrey Heer
2020

Engagement Patterns of Peer-to-Peer Interactions on Mental Health Platforms
Ashish Sharma, M. Choudhury, Tim Althoff, Amit Sharma
International Conference on Web and Social Media 2020

How Food Environment Impacts Dietary Consumption and Body Weight: A Country-wide Observational Study of 2.3 Billion Food Logs
Tim Althoff, H. Nilforoshan, J. Hua, J. Leskovec
medRxiv 2020

If you want more women in your workforce, here’s how to recruit.
E. Pierson, Elissa M. Redmiles, L. Battle, J. Hullman
Nature 2020

The Role of Latency and Task Complexity in Predicting Visual Search Behavior
L. Battle, R. Crouser, Audace Nakeshimana, Ananda Montoly, Remco Chang, M. Stonebraker
IEEE Transactions on Visualization and Computer Graphics 2020

A Structured Review of Data Management Technology for Interactive Visualization and Analysis
L. Battle, C. Scheidegger
IEEE Transactions on Visualization and Computer Graphics 2020

Debugging Database Queries: A Survey of Tools, Techniques, and Users
Sneha Gathani, Peter Lim, L. Battle
International Conference on Human Factors in Computing Systems 2020

Kyrix-S: Authoring Scalable Scatterplot Visualizations of Big Data
Wenbo Tao, Xinli Hou, Adam Sah, L. Battle, Remco Chang, M. Stonebraker
IEEE Transactions on Visualization and Computer Graphics 2020

Database Benchmarking for Supporting Real-Time Interactive Querying of Large Data
L. Battle, P. Eichmann, M. Angelini, T. Catarci, G. Santucci, Yukun Zheng, et al.
SIGMOD Conference 2020

Extracting chemical reactions from text using Snorkel
Emily K. Mallory, Matthieu de Rochemonteix, Alexander J. Ratner, Ambika Acharya, Christopher Ré, R. Bright, et al.
BMC Bioinformatics 2020

AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature
J. Birgmeier, M. Haeussler, C. A. Deisseroth, E. Steinberg, K. Jagadeesh, Alexander J. Ratner, et al.
Science Translational Medicine 2020

Paths Explored, Paths Omitted, Paths Obscured: Decision Points & Selective Reporting in End-to-End Data Analysis
Yang Liu, Tim Althoff, Jeffrey Heer
International Conference on Human Factors in Computing Systems 2019

Human in Focus: Future Research and Applications of Ubiquitous User Monitoring
N. Lau, Michael Hildebrandt, Tim Althoff, L. Boyle, Shamsi T. Iqbal, John D. Lee, et al.
Proceedings of the Human Factors and Ergonomics Society Annual Meeting 2019

Goal-setting And Achievement In Activity Tracking Apps: A Case Study Of MyFitnessPal
Mitchell L. Gordon, Tim Althoff, J. Leskovec
The Web Conference 2019

Best practices for analyzing large-scale health data from wearables and smartphone apps
J. Hicks, Tim Althoff, R. Sosič, Peter Kuhar, B. Bostjancic, A. King, et al.
npj Digital Medicine 2019

The menstrual cycle is a primary contributor to cyclic variation in women’s mood, behavior, and vital signs
E. Pierson, Tim Althoff, Daniel Thomas, P. Hillard, J. Leskovec
bioRxiv 2019

Passively-sensed Behavioral Correlates of Discrimination Events in College Students
Yasaman S. Sefidgar, Woosuk Seo, K. Kuehn, Tim Althoff, Anne Browning, E. Riskin, et al.
Proc. ACM Hum. Comput. Interact. 2019

Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students
Xuhai Xu, Prerna Chikersal, Afsaneh Doryab, Daniella K. Villalba, Janine M. Dutcher, Michael J. Tumminia, et al.
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies 2019

Characterizing Exploratory Visual Analysis: A Literature Review and Evaluation of Analytic Provenance in Tableau
L. Battle, Jeffrey Heer
Computer Graphics Forum 2019

Smile: A System to Support Machine Learning on EEG Data at Scale
Lei Cao, Wenbo Tao, Sungtae An, Jing Jin, Yizhou Yan, Xiaoyu Liu, et al.
Proceedings of the VLDB Endowment 2019

A novel approach to task abstraction to make better sense of provenance data
C. Bors, S. Attfield, L. Battle, Michelle Dowling, A. Endert, Steffen Koch, et al.
2019

crossfilter-benchmark-submission
L. Battle, Yukun Zheng
2019

Kyrix: Interactive Pan/Zoom Visualizations at Scale
Wenbo Tao, Xiaoyu Liu, Yedi Wang, L. Battle, Çağatay Demiralp, Remco Chang, et al.
Computer Graphics Forum 2019

International Workshop on Human-In-the-Loop Data Analytics (HILDA)
L. Battle, S. Chaudhuri, Arnab Nandi
SIGMOD Conference 2019

A Provenance Task Abstraction Framework
C. Bors, John E. Wenskovitch, Michelle Dowling, S. Attfield, L. Battle, A. Endert, et al.
IEEE Computer Graphics and Applications 2019

Towards a Customizable Framework for Evaluating Visualization Recommendations
Kelsey R. Fulton, Debjani Saha, Katherine Scola, L. Battle
2019

Doubly Weak Supervision of Deep Learning Models for Head CT
Khaled Saab, Jared A. Dunnmon, R. Goldman, Alexander J. Ratner, H. Sagreiya, Christopher Ré, et al.
International Conference on Medical Image Computing and Computer-Assisted Intervention 2019

Learning Dependency Structures for Weak Supervision Models
P. Varma, Frederic Sala, A. He, Alexander J. Ratner, C. Ré
International Conference on Machine Learning 2019

Improving Sample Complexity with Observational Supervision
Khaled Saab, Jared A. Dunnmon, Alexander J. Ratner, D. Rubin, Christopher Ré
2019

Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Jared A. Dunnmon, Alexander J. Ratner, Nishith Khandwala, Khaled Saab, Matthew Markert, H. Sagreiya, et al.
Patterns 2019

A machine-compiled database of genome-wide association studies
Volodymyr Kuleshov, Jialin Ding, Christopher Vo, Braden Hancock, Alexander J. Ratner, Yang Li, et al.
Nature Communications 2019

SysML: The New Frontier of Machine Learning Systems
Alexander J. Ratner, Dan Alistarh, G. Alonso, D. Andersen, Peter D. Bailis, Sarah Bird, et al.
arXiv.org 2019

MLSys: The New Frontier of Machine Learning Systems
Alexander J. Ratner, Dan Alistarh, G. Alonso, D. Andersen, Peter D. Bailis, Sarah Bird, et al.
2019

Accelerating machine learning with training data management
Alexander J. Ratner
2019

The Role of Massively Multi-Task and Weak Supervision in Software 2.0
Alexander J. Ratner, Braden Hancock, C. Ré
Conference on Innovative Data Systems Research 2019

Snorkel: rapid training data creation with weak supervision
Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason Alan Fries, Sen Wu, C. Ré
The VLDB Journal 2019

Osprey: Weak Supervision of Imbalanced Extraction Problems without Code
E. Bringer, Abraham Israeli, Y. Shoham, Alexander J. Ratner, C. Ré
DEEM@SIGMOD 2019

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices
V. Chen, Sen Wu, Zhenzhen Weng, Alexander J. Ratner, C. Ré
Neural Information Processing Systems 2019

AMELIE 2 speeds up Mendelian diagnosis by matching patient phenotype & genotype to primary literature
J. Birgmeier, M. Haeussler, C. A. Deisseroth, E. Steinberg, K. Jagadeesh, Alexander J. Ratner, et al.
bioRxiv 2019

Data science for human well-being
Tim Althoff
2018

I’ll Be Back: On the Multiple Lives of Users of a Mobile Activity Tracking Application
Zhiyuan Jerry Lin, Tim Althoff, J. Leskovec
The Web Conference 2018

Modeling Interdependent and Periodic Real-World Action Sequences
Takeshi Kurashima, Tim Althoff, J. Leskovec
The Web Conference 2018

Psychomotor function measured via online activity predicts motor vehicle fatality risk
Tim Althoff, E. Horvitz, Ryen W. White
npj Digital Medicine 2018

Learning Individualized Cardiovascular Responses from Large-scale Wearable Sensors Data
H. T. Hallgrímsson, Filip Jankovic, Tim Althoff, L. Foschini
arXiv.org 2018

Evaluating Visual Data Analysis Systems: A Discussion Report
L. Battle, M. Angelini, Carsten Binnig, T. Catarci, P. Eichmann, Jean-Daniel Fekete, et al.
HILDA@SIGMOD 2018

2017 Reviewer Thank you
G. Acton, S. Aguiñaga, Sami Al-Rawashdeh, D. Allen, S. Allen, Stephanie Alley, et al.
Western Journal of Nursing Research 2018

A Kernel Theory of Modern Data Augmentation
Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, C. Ré
International Conference on Machine Learning 2018

Snorkel MeTaL: Weak Supervision for Multi-Task Learning
Alexander J. Ratner, Braden Hancock, Jared A. Dunnmon, R. Goldman, C. Ré
DEEM@SIGMOD 2018

Knowledge Base Construction in the Machine-learning Era
Alexander J. Ratner, C. Ré
Queue 2018

Training Complex Models with Multi-Task Weak Supervision
Alexander J. Ratner, Braden Hancock, Jared A. Dunnmon, Frederic Sala, Shreyash Pandey, C. Ré
AAAI Conference on Artificial Intelligence 2018

Research for practice
Alexander J. Ratner, Christopher Ré, Peter D. Bailis
Communications of the ACM 2018

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale
Stephen H. Bach, Daniel Rodríguez, Yintao Liu, Chong Luo, Haidong Shao, Cassandra Xia, et al.
SIGMOD Conference 2018

Modeling Individual Cyclic Variation in Human Behavior
E. Pierson, Tim Althoff, J. Leskovec
The Web Conference 2017

How Gamification Affects Physical Activity: Large-scale Analysis of Walking Challenges in a Mobile Application
A. Shameli, Tim Althoff, A. Saberi, J. Leskovec
The Web Conference 2017

Large-scale physical activity data reveal worldwide activity inequality
Tim Althoff, R. Sosič, J. Hicks, A. King, S. Delp, J. Leskovec
Nature 2017

Harnessing the Web for Population-Scale Physiological Sensing: A Case Study of Sleep and Performance
Tim Althoff, E. Horvitz, Ryen W. White, J. Zeitzer
The Web Conference 2017

Position statement: The case for a visualization performance benchmark
L. Battle, Remco Chang, Jeffrey Heer, M. Stonebraker
Workshop on Data Systems for Interactive Analysis 2017

Behavior-driven optimization techniques for scalable data exploration
L. Battle
2017

Beagle: Automated Extraction and Interpretation of Visualizations from the Web
L. Battle, Peitong Duan, Zachery Miranda, Dana Mukusheva, Remco Chang, M. Stonebraker
International Conference on Human Factors in Computing Systems 2017

Snorkel: Rapid Training Data Creation with Weak Supervision
Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason Alan Fries, Sen Wu, C. Ré
Proceedings of the VLDB Endowment 2017

Learning to Compose Domain-Specific Transformations for Data Augmentation
Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared A. Dunnmon, C. Ré
Neural Information Processing Systems 2017

Learning the Structure of Generative Models without Labeled Data
Stephen H. Bach, Bryan D. He, Alexander J. Ratner, C. Ré
International Conference on Machine Learning 2017

Snorkel: A System for Lightweight Extraction
Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason Alan Fries, Sen Wu, C. Ré
Conference on Innovative Data Systems Research 2017

AMELIE accelerates Mendelian patient diagnosis directly from the primary literature
J. Birgmeier, M. Haeussler, C. A. Deisseroth, K. Jagadeesh, Alexander J. Ratner, H. Guturu, et al.
bioRxiv 2017

SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data
Jason Alan Fries, Sen Wu, Alexander J. Ratner, Christopher Ré
arXiv.org 2017

Snorkel: Fast Training Set Generation for Information Extraction
Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, C. Ré
SIGMOD Conference 2017

Influence of Pokémon Go on Physical Activity: Study and Implications
Tim Althoff, Ryen W. White, E. Horvitz
Journal of Medical Internet Research 2016

Online Actions with Offline Impact: How Online Social Networks Influence Online and Offline User Behavior
Tim Althoff, Pranav Jindal, J. Leskovec
Web Search and Data Mining 2016

Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health
Tim Althoff, Kevin Clark, J. Leskovec
Transactions of the Association for Computational Linguistics 2016

Natural Language Processing for Mental Health: Large Scale Discourse Analysis of Counseling Conversations
Tim Althoff, Kevin Clark, J. Leskovec
arXiv.org 2016

Making Sense of Temporal Queries with Interactive Visualization
L. Battle, Danyel Fisher, R. Deline, Mike Barnett, Badrish Chandramouli, J. Goldstein
International Conference on Human Factors in Computing Systems 2016

Data Warehouse and OLAP: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Jennie Duggan, Olga Papaemmanouil, L. Battle, Michael Stonebraker
2016

Dynamic Prefetching of Data Tiles for Interactive Visualization
L. Battle, Remco Chang, M. Stonebraker
SIGMOD Conference 2016

Data programming with DDLite: putting humans in a different part of the loop
Henry R. Ehrenberg, Jaeho Shin, Alexander J. Ratner, Jason Alan Fries, C. Ré
HILDA ‘16 2016

Data Programming: Creating Large Training Sets, Quickly
Alexander J. Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, C. Ré
Neural Information Processing Systems 2016

(a) PaleoDeepDive Dataflow (b) Quality Assessment (c) Scientific Application: Biodiversity
Christopher De Sa, Alexander J. Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, et al.
2016

DeepDive: Declarative Knowledge Base Construction
Christopher De Sa, Alexander J. Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, et al.
SGMD 2016

TimeMachine: Timeline Generation for Knowledge-Base Entities
Tim Althoff, X. Dong, K. Murphy, Safa Alai, Van Dang, Wei Zhang
Knowledge Discovery and Data Mining 2015

Donor Retention in Online Crowdfunding Communities: A Case Study of DonorsChoose.org
Tim Althoff, J. Leskovec
The Web Conference 2015

Skew-Aware Join Optimization for Array Databases
Jennie Duggan, Olga Papaemmanouil, L. Battle, M. Stonebraker
SIGMOD Conference 2015

Incremental Knowledge Base Construction Using DeepDive
Christopher De Sa, Alexander J. Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, et al.
The VLDB Journal 2015

Authorship Attribution in Multi-author Documents
Tim Althoff, D. Britz, Zifei Shan
2014

Pervasive Health
Tim Althoff
Human–Computer Interaction Series 2014

How to Ask for a Favor: A Case Study on the Success of Altruistic Requests
Tim Althoff, Cristian Danescu-Niculescu-Mizil, D. Jurafsky
International Conference on Web and Social Media 2014

Indexing Cost Sensitive Prediction
L. Battle, Edward Benson, Aditya G. Parameswaran, Eugene Wu
arXiv.org 2014

Dynamic Generation and Prefetching of Data Chunks for Exploratory Visualization
L. Battle, Remco Chang, M. Stonebraker
2014

The Case for Data Visualization Management Systems
Eugene Wu, L. Battle, S. Madden
Proceedings of the VLDB Endowment 2014

Leveraging Document Structure for Better Classification of Complex Legal Documents
Alexander J. Ratner
2014

Random Acts of Pizza: Success Factors of Online Requests
Tim Althoff, Niloufar Salehi, Tuan Nguyen
2013

Analysis and forecasting of trending topics in online media streams
Tim Althoff, Damian Borth, Jörn Hees, A. Dengel
ACM Multimedia 2013

Exploiting and Introducing Parallelism for Efficient Object Detection
Hyun-Oh Song, S. Zickler, Tim Althoff, Ross B. Girshick, Christopher Geyer, Mario Fritz, et al.
2013

SciDB DBMS Research at M.I.T
M. Stonebraker, Jennie Duggan, L. Battle, Olga Papaemmanouil
IEEE Data Engineering Bulletin 2013

Interactive visualization of big data leveraging databases for scalable computation
L. Battle
2013

SciDB DBMS Research at
M. Stonebraker, Jennie Duggan, L. Battle, Olga Papaemmanouil
2013

Dynamic reduction of query result sets for interactive visualization
L. Battle, M. Stonebraker, Remco Chang
BigData Congress [Services Society] 2013

Sparselet Models for Efficient Multiclass Object Detection
Hyun Oh Song, S. Zickler, Tim Althoff, Ross B. Girshick, Mario Fritz, Christopher Geyer, et al.
European Conference on Computer Vision 2012

Don't Look Back: Post-hoc Category Detection via Sparse Reconstruction
Hyun Oh Song, Mario Fritz, Tim Althoff, Trevor Darrell
2012

CS246: Mining Massive Datasets
J. Leskovec, Tim Althoff
2012

Detection bank: an object detection based video representation for multimedia event recognition
Tim Althoff, Hyun Oh Song, Trevor Darrell
ACM Multimedia 2012

Balanced Clustering for Content-based Image Browsing
Tim Althoff, A. Ulges
Informatiktage 2011

Automatic example queries for ad hoc databases
Bill Howe, Garrett Cole, Nodira Khoussainova, L. Battle
ACM SIGMOD Conference 2011

Database-as-a-Service for Long-Tail Science
Bill Howe, Garrett Cole, Emad Souroush, Paraschos Koutris, A. Key, Nodira Khoussainova, et al.
International Conference on Statistical and Scientific Database Management 2011

Building the Internet of Things Using RFID: The RFID Ecosystem Experience
Evan Welbourne, L. Battle, Garrett Cole, K. Gould, Kyle Rector, Samuel Raymer, et al.
IEEE Internet Computing 2009

Building the Internet of Things Using Rfid
Evan Welbourne, L. Battle, Garrett Cole, K. Gould, Kyle Rector, Samuel Raymer, et al.
2009

Machine Learning for Health (ML4H) - What Parts of Healthcare are Ripe for Disruption by Machine Learning Right Now?
Andrew Beam, M. Fiterau, Peter F. Schulam, J. Fries, Michael C. Hughes, Alexander B. Wiltschko, et al.

Bulletin of the Technical Committee on Data Engineering Special Issue on Scientific Data Management Conference and Journal Notices Editorial Board Editor-in-chief
Michael Stonebraker, Jennie Duggan, L. Battle, Olga Papaemmanouil, Colin Talbert, Marian Talbert, et al.

Human-Centered Approaches for Provenance in Automated Data Science
Anamaria Crisan, Lars Kotthoff, Marc Streit, Kai Xu, A. Endert, Alexander Lex, et al.