CS 5024 Ethics and Professionalism in Computer (Data) Science

Fall 2020
Instructor: Chang-Tien Lu
Meeting Time: Monday 4:00-6:45 PM
Office: NVC 312
Office Hours: Tuesday 11AM-noon, Thursday 4-5PM, or by appointment

Course Synopsis: As computing, data science, and machine learning technologies become more pervasive, we must grapple with the ethical implications and consequences of algorithmic decision making. This is an opportune time to study ethical issues in data science: seldom does a week go by without a new instance of egregious ethical violations or abuse by Silicon Valley companies coming to light. This course will cover frameworks for studying ethical issues in data science, case studies of current events, and ideas on practicing professionalism in data science. As with other thorny issues, there is sometimes no clear right or wrong answer; this course will help you reason about the underlying issues more effectively.

Topics covered include: privacy, disclosure, and security implications; confidentiality and declassification of data; record linkage algorithms; data uniqueness and profiling; privacy-preserving data mining; unintended consequences of data mining; ethical boundaries in conducting large scale studies and making inferences; responsible behavior as a data scientist; case studies and critical analysis of scenarios from science, engineering, social media, law, business, humanities, and other practical contexts.

Tentative Course Schedule

Week Date Lecture Topics Due
1 8/24 Introduction  
2 8/31

Defining our scope of study/ Ethical Frameworks
R. Abebe, S. Barocas, J. Kleinberg, K. Levy, M. Raghavan, D. Robinson. Roles for Computing in Social Change. Proc. ACM Conference on Fairness, Accountability, and Transparency (FAT*), 2020.
Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights, Executive Office of the President, May 2016.
M. Luca, J. Kleinberg, and S. Mullainathan, Algorithms Need Managers, Too, Harvard Business Review, Jan-Feb 2016.
Available at: https://hbr.org/2016/01/algorithms-need-managers-too
E. Vayena, U. Gasser, A. Wood, D. O'Brien, and M. Altman. 2016. Elements of a New Ethical Framework for Big Data Research. Washington and Lee Law Review 72 (3): Article 5.
O. Tene and J. Polonetsky, Taming the Golem: Challenges of Ethical Algorithmic Decision Making, North Carolina Journal of Law and Technology, Jun 2017.

Research Presentation:
Veale, Michael, Max Van Kleek, and Reuben Binns. "Fairness and accountability design needs for algorithmic support in high-stakes public sector decision-making." Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 2018.

Bostrom, Nick, and Eliezer Yudkowsky. "The ethics of artificial intelligence." The Cambridge handbook of artificial intelligence 1 (2014): 316-334.
Jordan, M. I. (2019). Artificial Intelligence - The Revolution Hasn’t Happened Yet. Harvard Data Science Review, 1(1).
Cath, Corinne. "Governing artificial intelligence: ethical, legal and technical opportunities and challenges." Philosophical Transactions of the Royal Society A 376 (2018): 20180080.

Q1, Q2
3 9/7 Labor Day (No Class)  
4 9/14

Background about data science/overview of CS/ML scenarios
Sacrificial Dilemmas, Moral Judgements, The Trolley Problem (e.g., see: http://moralmachine.mit.edu/)
Edmond Awad, Sohan Dsouza, Azim Shariff, Iyad Rahwan, and Jean-François Bonnefon, Universals and variations in moral decisions made in 42 countries by 70,000 participants, PNAS, Jan 2020.
P. Domingos, “A Few Useful Things to Know About Machine Learning”, Communications of the ACM, Vol. 55, No. 10, Oct 2012.

Research Presentation:
Zhu, Xiaojin, et al. "An overview of machine teaching." arXiv preprint arXiv:1801.05927 (2018).
Javaid Nabi. “Machine Learning — Fundamentals.”  Medium, towards data science, 2018.
Sangeet Moy Das. Machine Learning 101 for Dummies like Me. Medium, 2019.
Qiu, Junfei, et al. "A survey of machine learning for big data processing." EURASIP Journal on Advances in Signal Processing 2016.1 (2016): 67.

5 9/21

Discrimination in Data Science, Protected Categories, and Potential Solutions
C.C. Miller, When algorithms discriminate, New York Times, July 9, 2015.
Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. In Advances in neural information processing systems (pp. 3315-3323). 
S. DeDeo, Wrong side of the tracks: Big Data and Protected Categories, https://arxiv.org/abs/1412.4643
M. Wattenberg, F. Viegas, M. Hardt, Attacking discrimination with smarter machine learning, Google Research Blog post, https://research.google.com/bigpicture/attacking-discrimination-in-ml/

Research Presentation
Veale, Michael, and Reuben Binns. "Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data." Big Data & Society 4.2 (2017): 2053951717743530.
Lepri, Bruno, et al. "Fair, transparent, and accountable algorithmic decision-making processes." Philosophy & Technology 31.4 (2018): 611-627.
Erica Kochi. How to Prevent Discriminatory Outcomes in Machine Learning. Medium, 2018.
Lee, N. T. (2018). Detecting racial bias in algorithms and machine learning. Journal of Information, Communication and Ethics in Society.

Project Proposal
6 9/28

Predictive Policing
A. Mardigal, Predictive Policing: the future of crime-fighting, or the future of racial profiling?
S. Brayne, A. Rosenblat, and D. Boyd, Predictive Policing,
Available at: http://www.datacivilrights.org/pubs/2015-1027/Predictive_Policing.pdf
K. Lum and W. Isaac, To Predict and Serve, Significance, Oct 2016. Available at:

Research Presentation:
Kutnowski, Moish. "The ethical dangers and merits of predictive policing." Journal of community safety and well-being 2.1 (2017).
Mara Hvistendahl. “Can ‘predictive policing’ prevent crime before it happens?” AAAS, 2016.
Nushi, Besmira, Ece Kamar, and Eric Horvitz. "Towards accountable AI: Hybrid human-machine analyses for characterizing system failure." arXiv preprint arXiv:1809.07424 (2018).
Asaro, Peter M. "AI ethics in predictive policing: From models of threat to an ethics of care." IEEE Technology and Society Magazine 38.2 (2019): 40-53.

7 10/5

Criminal Sentencing and Bail Decisions
J. Angwin, J. Larson, S. Mattu, and L. Kirchner, Machine Bias, ProPublica, May 2016.
J. Kleinberg, H. Lakkaraju, J. Leskovec, J. Ludwig, and S. Mullainathan, Human Decisions and Machine Predictions
Available at: http://www.cs.cornell.edu/home/kleinber/w23180.pdf
J. Dressel and H. Farid, The accuracy, fairness, and limits of predicting recidivism, Science Advances, 17 Jan 2018.

Research Presentation:
Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1.5 (2019): 206-215.
Berk, Richard, and Jordan Hyatt. "Machine learning forecasts of risk to inform sentencing decisions." Federal Sentencing Reporter 27.4 (2015).
Kleinberg, Jon, et al. "Human decisions and machine predictions." The quarterly journal of economics 133.1 (2018): 237-293.

8 10/12

Privacy and Anonymity
B. Schneier, Anonymity and the Netflix Dataset, Wired.com,
N. Ramakrishnan, B. J. Keller, B. Mirza, A. Grama, G. Karypis, Privacy Risks in Recommender Systems, IEEE Internet Computing, November 2001.
A. Tockar, Differential Privacy: The Basics,
A. Tockar, Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset

Research Presentation:
Tianqing Zhu. Explainer: what is differential privacy and how can it protect your data? The Conversation, 2018.
Mathew, Binny, et al. "Deep Dive into Anonymity: Large Scale Analysis of Quora Questions." International Conference on Social Informatics. Springer, Cham, 2019.
E. Horvitz, D. Mulligan. "Data, privacy, and the greater good." Science, 17 Jul 2015, Vol. 349, Issue 6245, pp. 253-255. DOI: 10.1126/science.aac4520
Mooney, Stephen J., and Vikas Pejaver. "Big data in public health: terminology, machine learning, and privacy." Annual review of public health 39 (2018): 95-112.
A. Narayanan and V. Shmatikov, "Robust De-anonymization of Large Sparse Datasets," 2008 IEEE Symposium on Security and Privacy (SP 2008), 2008, pp. 111-125, doi: 10.1109/SP.2008.33.

9 10/19

Gender Bias and Potential Solutions
A. Caliskan, J.J. Bryson, and A. Narayanan, Semantics Derived Automatically from Language Corpora Contain Human-like Biases, Science, 14 Apr 2017. Available at http://science.sciencemag.org/content/356/6334/183.full
T. Bolukbasi, K-W. Chang, J. Zou, V. Saligrama, and A. Kalai, Man Is to Computer Programmer as Woman Is to Homemaker? Available at https://arxiv.org/abs/1607.06520
J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K-W. Chang, Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. Available at https://arxiv.org/abs/1707.09457

Research Presentation:
Jeffrey Dastin. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters, 2018.
Sun, Tony, et al. "Mitigating gender bias in natural language processing: Literature review." Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, July 2019.
Zhao, Jieyu, et al. "Gender bias in coreference resolution: Evaluation and debiasing methods." Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 2, June 2018.
J. Feine, U. Gnewuch, S. Morana, A. Maedche. "Gender Bias in Chatbot Design." Chatbot Research and Design: Third International Workshop, CONVERSATIONS 2019, Amsterdam, The Netherlands, November 19-20, 2019, Revised Selected Papers. Vol. 11970. Springer Nature, 2020.

Project Checkpoint
10 10/26 Project Checkpoint Presentation
11 11/2

The Surveillance Economy
S. Muthiah et al., EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System
Available at: http://people.cs.vt.edu/naren/papers/embersExp_kdd16.pdf.
R. Botsman, Big data meets Big Brother as China moves to rate its citizens, Wired Magazine, Available at: http://www.wired.co.uk/article/chinese-government-social-credit-score-privacy-invasion
L. Sweeney, Discrimination in Online Ad Delivery,
Available at: https://arxiv.org/abs/1301.6822.

Research Presentation:
VIVUFO. The United States Mass Surveillance is a Necessary Evil. 2017.
Allen, Chris, et al. "Applying GIS and machine learning methods to Twitter data for multiscale surveillance of influenza." PloS one 11.7 (2016): e0157734.
Santillana, Mauricio, et al. "Combining search, social media, and traditional data sources to improve influenza surveillance." PLoS Comput Biol 11.10 (2015): e1004513.

CITI, VT Training (Released 10/26)
12 11/9 Social Media, Auditing Internet Platforms, Terms of Service
R.M. Bond, A 61-million-person experiment in social influence and political mobilization, Nature. Available at: http://fowler.ucsd.edu/massive_turnout.pdf
A.D.I. Kramer et al., Experimental evidence of massive-scale emotional contagion through social networks, PNAS
Available at: http://www.pnas.org/content/111/24/8788.full
C. Sandvig, K. Hamilton, K. Karahalios, and C. Langbort, Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms

E. Felten, On the Ethics of A/B Testing. Available at: https://freedom-to-tinker.com/2014/07/08/on-the-ethics-of-ab-testing/
S. Zhang, Scientists are just as confused about the ethics of big data research as you, Wired
Available at: https://www.wired.com/2016/05/scientists-just-confused-ethics-big-data-research/

Research Presentation:
Jay McGregor. Reddit Is Being Manipulated by Big Financial Services Companies. Forbes, 2017.
Zubiaga, Arkaitz, et al. "Analysing how people orient to and spread rumours in social media by looking at conversational threads." PloS one 11.3 (2016): e0150989.
Aigerim Berzinya. Data Privacy in Social Media: Who Takes Responsibility and Data Protection as a Priority Feature. Turtler, 2018.
13 11/16 Best Practices/Law
Guest Lecture 1: Raimundo Dos Santos (U.S. Army Corps of Engineers)
Guest Lecture 2: Abdulaziz Alhamadani (Virginia Tech)
Guest Lecture 3: Shailik Sarkar (Virginia Tech)
FATML Consortium, Principles for Accountable Algorithms
Available at: https://www.fatml.org/resources/principles-for-accountable-algorithms
M. Zook et al., Ten Simple Rules for Responsible Big Data Research, PLoS Computational Biology, March 30, 2017
Available at: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005399

Research Presentation:
Ethical ML Network (BETA). “The Responsible Machine Learning Principles.” The Institute for Ethical AI & Machine Learning. 
Morley, Jessica, et al. "From What to How: An Initial Review of Publicly Available AI Ethics Tools, Methods and Research to Translate Principles into Practices," Science and Engineering Ethics, Vol. 26, pp. 2141-2168, 2020.
D. Greene, A. L. Hoffmann, and L. Stark. "Better, nicer, clearer, fairer: A critical assessment of the movement for ethical artificial intelligence and machine learning." Proceedings of the 52nd Hawaii International Conference on System Sciences. 2019.
14 11/23 Thanksgiving holiday (No Class)  
15 11/30 Project Presentation - Part I
SPOT Evaluation
16 12/7 Project Presentation - Part II CITI, VT Training (Due 12/7)
Project Report (Due 12/14, 4PM)


20% graduate school required reading and quizzes
30% discussion participation on Canvas
15% in-class research presentation
35% final project (final presentation: 10%, final report: 25%)
5% in-class discussion and participation

Online class participation: 

For each week, online class participation is scored based on the number of “substantial interactions” (defined below) on Canvas pertaining to that week’s material. A “week” here runs from Monday 7pm of week x (i.e., just after class) to Monday 8am of week x+1. During that window, students are expected to discuss week x+1’s material on the Canvas Discussion board. Since this is an evaluative component, the instructor will not participate in the online Canvas discussion. If you are the first person to contribute in a given week, please start a new thread for that week’s discussion; everyone else should post in that thread rather than opening a new one. You can interact at any time during the week (as defined above). The first scored week runs from Aug 24 to Aug 31.

Interactions should be substantial, not superficial. A superficial interaction is one that simply agrees with that week’s material, regurgitates or summarizes content, or otherwise looks like something an AI could have produced. A substantial interaction is one into which notable thought and effort have been put. It can take various forms, e.g., it i) takes a stance, contributes an opinion, or judges the week’s material in a positive or negative light, with supporting information; ii) identifies external URLs, papers, examples, current events, or references related to the week’s material and argues for their relevance; or iii) builds on another contributor’s interaction in a substantial manner.

No judgment is made on the stances you take; you are evaluated only on your ability to reason about ethical and professional issues and to provide evidence for your opinions. For instance, n-1 students may agree with a paper’s view while the nth student disagrees. The n-1 students might think some idea is risky while the nth student judges the risks worth the rewards; as long as everybody argues diligently, they all receive equal credit.

Scoring happens every week on a scale of 0-5. Scores for all weeks are added together, and the total is then scaled to the weight given to discussion participation on Canvas in the grading breakdown above. A week with no participation counts as a zero for that week.
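The weekly scoring rule amounts to a simple proportional scaling. The Python sketch below is a hypothetical illustration, not the official grading script: it assumes the summed score is scaled against the maximum possible total (5 points per scored week) and uses the 30% Canvas discussion weight from the grading breakdown as its default. The function name participation_score and the sample scores are invented for illustration.

```python
def participation_score(weekly_scores, weight=30.0, max_per_week=5):
    """Scale summed weekly scores (0-5 each) to the participation weight.

    Hypothetical sketch: assumes proportional scaling against the maximum
    possible total. A week with no participation is recorded as 0.
    """
    for s in weekly_scores:
        if not 0 <= s <= max_per_week:
            raise ValueError(f"weekly score out of range: {s}")
    if not weekly_scores:
        return 0.0
    total = sum(weekly_scores)
    max_total = max_per_week * len(weekly_scores)
    return weight * total / max_total


# Invented example: 10 scored weeks, one missed week counted as 0.
scores = [5, 4, 5, 0, 3, 5, 5, 4, 5, 4]
print(participation_score(scores))  # 40 of 50 points -> 24.0 of 30
```

Under these assumptions, each missed week costs the same share of the weight as a perfect week earns, which is why consistent weekly participation matters.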

In-class presentation: 

Each student should sign up for one presentation, presenting that week’s reading material and leading discussion during the class session from 5:20-6:45 PM. A Google spreadsheet for sign-ups is at: https://tinyurl.com/cs5024signups.

Final project:

The final project is intended to be done in groups of 2-3 people. Students must choose a topic that is related to current events and that has not been, and will not be, discussed in class according to the schedule. The project report should ideally include a discussion of the ethical considerations, some data analysis, and some recommendations.


Any student who needs special accommodations due to a disability, as recognized by the Americans with Disabilities Act, should contact Services for Students with Disabilities (SSD) in the Dean of Students Office. Students with disabilities are responsible for self-identification. To be eligible for services, documentation of the disability from a qualified professional must be presented to SSD upon request. Academic adjustments may include, but are not limited to: priority registration, auxiliary aids, program and course adjustment, exam modifications, oral or sign language interpreters, cassette taping of text/materials, note takers/readers, or assistive technology.

If you need adaptations or accommodations because of a disability (learning disability, attention deficit disorder, psychological, physical, etc.), if you have emergency medical information to share with me, or if you need special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible. If you need captioning for videos, please let me know at least two weeks before the date on which the syllabus schedules them for review.