Using public data to measure diversity in computer science research communities: A critical data governance perspective
Encouraging and supporting diversity and inclusion in computer science research communities is a critical issue for many reasons, including the ethical and robust design, delivery and publication of research that addresses real-world situations ranging from the use of digital tools in health to predictive policing to workplace hiring practices. One way to measure diversity is to apply analytical research methods to data sourced from the public domain. However, attempts to measure diversity using public data may themselves raise legal and ethical questions about the provenance of the data, the research methods adopted, and the treatment of diversity in the publication of results. This article interrogates the challenges of measuring diversity using public data, examining an illustrative case study framed around an academic research project at an Australian university that used a public data set to identify gender representation in computer science communities. Employing a critical data governance perspective, we point to a range of ethical and legal concerns and recommend greater regulatory guardrails to better balance public interests in research with the privacy, data protection and other ethical interests of research subjects.