Dr Kobi Leins - Recommended Read - Image by Stas Knop
Dr Kobi Leins - Recommended Read - Image by Engin Akyurt

Playing with the Data: What Legal Scholars Should Learn About Machine Learning

David Lehr & Paul Ohm

Legal scholars have begun to focus intently on machine learning — the name for a large family of techniques used for sophisticated new forms of data analysis that are becoming key tools of prediction and decision-making. We think this burgeoning scholarship has tended to treat machine learning too much as a monolith and an abstraction, largely ignoring some of its most consequential stages. As a result, many potential harms and benefits of automated decision-making have not yet been articulated, and policy solutions for addressing those impacts remain underdeveloped.


To fill these gaps in legal scholarship, in this Article we provide a rich breakdown of the process of machine learning. We divide this process roughly into eight steps: problem definition, data collection, data cleaning, summary statistics review, data partitioning, model selection, model training, and model deployment. Far from a straight linear path, most machine learning dances back and forth across these steps, whirling through successive passes of model building and refinement.
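To make the eight steps concrete, here is a minimal sketch in Python using only the standard library. It is not drawn from the Article: the prediction task, the synthetic data, the one-feature threshold rule, and every function name are hypothetical choices made purely for illustration.

```python
import random
import statistics

# Step 1, problem definition (hypothetical): predict a binary label y
# from a single numeric feature x.

# Step 2, data collection: synthetic records stand in for collected data.
def collect(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 1 if x + rng.gauss(0, 1) > 5 else 0
        # Blank out the feature about 5% of the time to mimic messy data.
        rows.append((None if rng.random() < 0.05 else x, y))
    return rows

# Step 3, data cleaning: drop records with a missing feature.
def clean(rows):
    return [(x, y) for x, y in rows if x is not None]

# Step 4, summary statistics review.
def summarize(rows):
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {
        "n": len(rows),
        "mean_x": statistics.mean(xs),
        "share_positive": statistics.mean(ys),
    }

# Step 5, data partitioning: hold out a share of records for evaluation.
def partition(rows, test_share=0.25, seed=1):
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_share))
    return shuffled[:cut], shuffled[cut:]

# Step 6, model selection: a one-parameter threshold rule,
# predicting 1 when x exceeds the threshold.
def predict(threshold, x):
    return int(x > threshold)

def accuracy(threshold, rows):
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)

# Step 7, model training: pick the threshold on a 0.1 grid that
# maximizes accuracy on the training records.
def train(rows):
    candidates = [i / 10 for i in range(101)]
    return max(candidates, key=lambda t: accuracy(t, rows))

rows = clean(collect(400))
stats = summarize(rows)
train_rows, test_rows = partition(rows)
threshold = train(train_rows)
test_accuracy = accuracy(threshold, test_rows)

# Step 8, model deployment: apply the trained rule to a new, unseen input.
decision = predict(threshold, 7.3)
```

The sketch runs the steps once, in order. In practice, as the abstract notes, a result at one step (a surprising summary statistic, say, or weak held-out accuracy) would send the analyst back to an earlier step such as cleaning or model selection.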