CAS CS 591 - Fall 2019 - The Data Science of Electronic Commerce
Syllabus
Course Overview:
Surprisingly, until recently, the data science of electronic commerce
had received relatively little attention in academia.
Beginning with the advent of Internet platforms like eBay that employ online
auctions, many fascinating and innovative new markets have emerged: pay-per-click advertising markets,
prediction markets, and two-sided platforms such as Uber and Airbnb. All of these
application domains draw deeply on established methodologies that are highly familiar
to computer scientists, notably graph theory and algorithms. However, they also build
on theoretical foundations that are often unfamiliar territory to computer scientists,
such as auction design, mechanism design, and the theory of matching markets.
At the same time, these markets provide a unique opportunity: unlike traditional markets,
many aspects of the electronic commerce marketplace are not only publicly
observable, but are readily available for online measurement and data collection.
Therefore, questions such as the prevalence of "sniping" on eBay,
the effectiveness of Groupon's personalization of daily deals for its subscribers, or
how landlords learn to price inventory on Airbnb can all be
studied via large-scale measurement, enabling research that was not previously
possible.
In this class, we will consider the data science of electronic
commerce from a broad and inter-disciplinary perspective,
drawing primarily on insights from the Computer Science,
Economics, and Marketing communities.
This course is designed for students who are potentially
interested in pursuing a career in or conducting research related
to electronic commerce and online platforms.
Our goal will be to focus on quantitative
evaluation of the e-commerce marketplace, and to enable students to conduct
research in this area. Please note that this course is not about
entrepreneurship per se, but will provide useful background for prospective entrepreneurs.
A core competency that we will develop is fluency with big
data: experimental methods; best practices and techniques for data collection, data
mining, and statistical analysis; effective presentation of findings; as well as the
ethics of data collection. The capstone project of the course will be a research
project, conducted in teams of up to three, in which students conduct a quantitative
measurement-driven analysis of a computational aspect of an e-commerce firm or of consumer
behavior with respect to an e-commerce marketplace.
Prerequisites:
This course is designed for CS seniors who have completed all required coursework
except for electives, as well as Masters students and entering Ph.D. students.
While students' backgrounds will vary, it is expected that students have completed or
are nearing completion of an undergraduate major in CS or in an area closely related
to the course topics (such as Economics or Marketing).
Seniors who are not CS majors should seek the instructor's permission to enroll.
Instructor:
Prof. John W. Byers
Email: byers @ cs . bu . edu [preferred]
Phone: 617-353-8925 [please do not leave voice-mail; use e-mail instead]
Office Hours:
Room: MCS 101C
Open hours: Tues 10 - 11:30
By prior appointment only: Wed 2 - 3:30 [depending on the week, either in MCS 101C, or CAS 115]
Instructor Bio:
Academic: John is Professor of Computer Science at Boston University. His academic research interests are broadly focused on algorithmic and economic aspects of e-commerce, networking, and large-scale data management. His work strikes a balance between
theoretical foundations and rigorous data-driven experimentation.
Entrepreneurial: John is founding Chief Scientist and Board Member at Cogo Labs, a start-up based in Cambridge, MA, where
he has had an executive role since the company's founding in 2005. Cogo leverages a unique proprietary
technology platform for algorithmic marketing, data mining, and quantitative business analytics to guide incubated
portfolio companies from inception to profitability and beyond.
Additional Staff:
Senior Ph.D. student Harshal Chaudhari
has graciously volunteered to serve as a resource for students later in the course.
He will help advise students on brainstorming and research directions, provide technical guidance,
and assist me with project meetings and evaluations.
Class meeting time: Tues/Thurs 2-3:15, MCS B29.
*We hope to use the Hariri Institute Conference Room, MCS 180, when available, and especially
for guest speakers*
Course Requirements and Grading:
There will be three components of the grade in the class:
- In-class discussion of the homework assignments and readings in the course (20%).
- Quizzes and homework assignments (30%).
- Comprehensive, semester-long research project (50%).
For class, we will be drawing on some material from the Easley-Kleinberg textbook (see below), but more often, we will be reading and discussing research papers. I will also be giving some lectures on technical background material for methods used in the papers. In the paper-reading portion of the course, students will be required to read and digest
approximately two papers per week, prior to lecture.
Students will submit short summaries and provide answers to basic questions about the papers prior to discussion.
For each major topic of the course, a group of students chosen in advance will serve as specialists on the topic --
they will be experts on the papers we are discussing, and will be expected to help facilitate the discussion,
brainstorm about research directions, and help with the presentation of the material (or with supplemental material).
We will have periodic short assignments, two in-class quizzes comprising short answer problems, and perhaps a few
longer homework problems.
The capstone project for the course will be a semester-long research project, culminating in a writeup in the
style of a conference paper, and a presentation to the class, which most likely will take the form of a poster
at a class-wide poster session. In the research project, students will conduct a
quantitative measurement-driven analysis of a computational aspect of an e-commerce firm or of consumer behavior
with respect to an e-commerce marketplace. Students may work alone or in teams of two, with teams expected
to produce commensurately more. Suggested project topics and project deadlines will be announced
after the first few weeks of the course. I expect students in this class to take the project very
seriously, and there will be regular interaction with the instructor outside of class to work on the projects;
ideally, several of the projects in the class will eventually lead to publishable papers. A strong venue
for Computer Science students to target could be the experimental track of
the ACM Symposium on Economics and Computation.
For economics students, the goal of the project would be to write a paper that could develop into a chapter of
the dissertation and potentially a job market paper. Ideally, the ideas in the paper could be developed into
work publishable at a top field or general interest journal.
Course Topics
- Course overview. Technical backdrop; e-commerce challenges and opportunities.
- Data analytics and useful statistical methods.
- Methods and practice of large-scale data collection.
- Auction design and mechanism design.
- User-generated content: mining, valuing, securing.
- Reputation and branding.
- Recommender systems and personalization.
- Social networks and user data.
- Ad auctions and search engine advertising.
- Cookie tracking, targeted advertising.
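The ad auction topic above (Chapter 15 of Easley-Kleinberg) centers on two pricing rules, GSP and VCG. The following Python sketch compares them in the standard model where each slot has a fixed clickthrough rate and each advertiser has a private value per click. It is a minimal illustration under stated assumptions: bids are treated as truthful (so the VCG computation uses bids as values), there is no reserve price, ties are broken arbitrarily, and the function names and example numbers are hypothetical. Under GSP, the advertiser in slot i pays, per click, the bid of the advertiser ranked just below it; under VCG, each advertiser pays the total harm it imposes on the advertisers ranked below it, who would each move up one slot in its absence. With a single slot, both rules reduce to a second-price auction.

```python
def rank_by_bid(bids):
    """Advertiser indices in decreasing order of bid; position i gets slot i."""
    return sorted(range(len(bids)), key=lambda a: bids[a], reverse=True)

def slot_rate(rates, position):
    """Clickthrough rate of a position; positions past the last slot get no clicks."""
    return rates[position] if position < len(rates) else 0

def gsp_payments(bids, rates):
    """GSP: each slot winner pays the next-highest bid per click (totals returned)."""
    order = rank_by_bid(bids)
    payments = {a: 0 for a in order}
    for pos in range(min(len(rates), len(order))):
        if pos + 1 < len(order):
            payments[order[pos]] = slot_rate(rates, pos) * bids[order[pos + 1]]
    return payments

def vcg_payments(bids, rates):
    """VCG: each advertiser pays the harm it causes to everyone ranked below it.

    Without the advertiser at position pos, each advertiser at position j > pos
    moves up one slot and gains (rate(j - 1) - rate(j)) clicks, valued at its bid.
    """
    order = rank_by_bid(bids)
    payments = {a: 0 for a in order}
    for pos, adv in enumerate(order):
        payments[adv] = sum(
            bids[order[j]] * (slot_rate(rates, j - 1) - slot_rate(rates, j))
            for j in range(pos + 1, len(order))
        )
    return payments

# Hypothetical example: three advertisers, three slots.
bids = [7, 6, 1]      # values per click, bid truthfully
rates = [10, 5, 2]    # clicks per hour in slots 1, 2, 3
print(gsp_payments(bids, rates))  # {0: 60, 1: 5, 2: 0}
print(vcg_payments(bids, rates))  # {0: 33, 1: 3, 2: 0}
```

At truthful bids GSP collects more revenue here than VCG, but GSP does not make truthful bidding a dominant strategy, which is why its equilibria are studied separately.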
Readings
- We will be reading several chapters from the textbook
"Networks, Crowds, and Markets: Reasoning About a Highly Connected World",
David Easley and Jon Kleinberg. Cambridge University Press, 2010. Especially:
Chapter 9: Auctions
Chapter 10: Matching Markets
Chapter 15: Search Engine Advertising
- A/B testing and running controlled experiments:
"Controlled experiments on the web: survey and practical guide",
R. Kohavi, R. Longbotham, D. Sommerfield, R. Henne, DMKD (18) 140-181, 2009.
- Dan Ariely, Axel Ockenfels, and Alvin E. Roth,
"An Experimental Analysis of Ending Rules in Internet Auctions", RAND Journal of Economics 36(4), Winter 2005, pp. 891-908.
- Background on linear regression methods. Theoretical basis for ordinary least squares (OLS) and maximum likelihood estimation (MLE) approaches. Handling binary, categorical and ordinal variables.
- Michael Luca, "Reviews, Reputation, and Revenue: The Case of Yelp.com." Harvard Business School Working Paper, No. 12-016, September 2011.
- "The Groupon Effect on Yelp Ratings: A Root Cause Analysis",
John W. Byers, Michael Mitzenmacher and Georgios Zervas.
In Proc. of the 13th ACM Conference on Electronic Commerce (EC '12), Valencia, Spain, June 2012.
- Michael Luca and Georgios Zervas, "Fake It Till You Make It: Reputation, Competition, and Yelp Review Fraud,"
Management Science 62(12), pp. 3412–3427, 2016.
- "An Empirical Analysis of Search Engine Advertising: Sponsored Search in Electronic Markets",
by Ghose and Yang, Management Science 55(10), 2009, pp. 1605-22.
- "Measuring and Fingerprinting Click-Spam in Ad Networks", by Dave, Guha, and Zhang,
Proc. of ACM SIGCOMM 2012. Authors' talk slides.
- Recommender systems: overview of objectives and basic methods.
The 2005 survey
by Adomavicius and Tuzhilin, IEEE TKDE 17(6), June 2005, still serves as a useful
basic reference.
- Methods used by the BellKor team to win the Netflix Prize, starting from their KDD '07 paper:
"Modeling Relationships at Multiple Scales to Improve Accuracy of Large Recommender Systems",
by R. Bell, Y. Koren, and C. Volinsky.
Several good slide decks about the Netflix Prize are out there, including from
Smyth (UCI) and Leskovec (Stanford).
- "The Rise of the Sharing Economy: Estimating the Impact of Airbnb on the Hotel Industry",
by G. Zervas, D. Proserpio and J. W. Byers, Journal of Marketing Research, 2017.
- "The Visible Hand: Race and Online Market Outcomes", by J. Doleac and L. Stein.
- Prediction Markets: "Using Prediction Markets to Track Information Flows",
by Cowgill, Wolfers, and Zitzewitz.
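The Kohavi et al. reading and the 9/19 lecture both rely on simple significance tests for A/B experiments. The following standard-library Python sketch shows one common choice, a pooled two-proportion z-test for comparing conversion rates between two variants, together with the textbook normal-approximation formula for the number of users needed per variant. It is an illustration under assumptions, not a prescribed course method: the function names and example counts are hypothetical, and the defaults (two-sided alpha of 0.05, power of 0.8) are conventional choices.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test of H0: p_a == p_b.

    Returns (z, two-sided p-value); z > 0 means variant B converted better.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common conversion rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value

def sample_size_per_variant(p: float, delta: float,
                            z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Users per variant to detect an absolute lift `delta` over baseline rate `p`.

    Normal approximation; defaults correspond to two-sided alpha = 0.05, power = 0.8.
    """
    variance = p * (1 - p)
    return math.ceil(2 * variance * (z_alpha + z_beta) ** 2 / delta ** 2)

# Hypothetical counts: 200/10,000 conversions on A versus 260/10,000 on B.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
print(sample_size_per_variant(0.05, 0.01))  # users per variant for a 5% -> 6% lift
```

Dividing the required sample size by the expected daily traffic per variant gives a rough estimate of experiment duration.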
Material Covered
- [Tues, 9/3]: Course syllabus and overview of E-commerce.
Assignment 1.
- [Thurs, 9/5]: Basics of auction theory, following Easley-Kleinberg Chapter 9. Ascending and descending-price auctions, first and second-price sealed-bid auctions. Auctions as a game: players, payoffs, strategies. Truthfulness as a dominant strategy in 2nd price auctions.
- [Tues, 9/10]: Discussion of slide decks and review of selected HW #1 submissions from the class. Revenue equivalence across auction formats. Deconstructing a research paper, using the Ariely et al. paper:
"An Experimental Analysis of Ending Rules in Internet Auctions".
- [Thurs, 9/12]: Close reading and discussion of the Ariely et al. paper. We covered the motivation, the experimental design, the main findings, the implications, and the basis for the probit regression in Appendix I.
- [Tues, 9/17]: Discussion of the "A/B testing and running controlled experiments" paper, linked above.
Please have a copy handy in class.
- [Thurs, 9/19]: Useful statistical tests, some included in the A/B testing paper, some not.
t-test, z score, experiment duration, Chernoff bounds.
Assignment 2 (Website A data, Website B data).
- [Tues, 9/24]: Discussion of linear regressions, motivation and why they are
widely used. Simple and multivariate linear regressions, ordinary least squares fit,
interpreting model parameters, examples.
Slide deck posted under Resources on our Piazza page.
- [Thurs, 9/26] The next paper is "Reviews, Reputation, and Revenue:
The Case of Yelp.com", by Michael Luca, linked above. We're reading this one back-to-front, using
the tables and figures as a basis for understanding the contributions.
- [Tues, 10/1] We carefully studied the regression discontinuity design in the Luca paper, then
motivated the Groupon Effect paper, linked above.
Here's a local version of the paper:
handout.
- [Thurs, 10/3] Groupon effect paper. Slide deck available on Piazza.
- [Tues, 10/8] Course project discussion, based on this
handout.
Start of Chapter 10, Easley-Kleinberg.
- [Thurs, 10/10] Matching markets: basic definitions and setup, market-clearing prices (MCPs),
efficient algorithm to compute MCPs, solution properties. Chapter 10, Easley-Kleinberg.
- [Tues, 10/15] BU on a Monday schedule due to Columbus Day. No class.
- [Thurs, 10/17] Intro to ad auctions. Chapter 15, Easley-Kleinberg.
- [Tues, 10/22] In-class quiz covering class discussions up through 10/3.
Class average: 29/40, stdev = 5.4
- [Thurs, 10/24] Quiz answers.
Ad auctions, continued, from Chapter 15, Easley-Kleinberg. Formalizing VCG and GSP pricing rules, equilibria and outcomes. Additional considerations used by search engines in practice.
- [Tues, 10/29] Discussion of the next paper, describing the buy side of SEM:
"An Empirical Analysis of Search Engine
Advertising: Sponsored Search in Electronic Markets", by Ghose and Yang, Management Science, 55(10),
2009, pp. 1605-22.
- [Thurs, 10/31] Initial project proposals due. Discussion of click spam:
"Measuring and Fingerprinting Click-Spam in Ad Networks", by Dave, Guha, and Zhang,
Proc. of ACM SIGCOMM 2012.
- [Tues, 11/5] Class time used for project proposal discussions with students.
- [Thurs, 11/7] Project proposal presentations and in-class discussion.
- [Tues, 11/12] I'll be presenting this recent paper:
"The Rise of the Sharing Economy: Estimating the Impact of Airbnb on the Hotel Industry", Georgios Zervas, Davide Proserpio and John W. Byers,
Journal of Marketing Research (JMR), October 2017, Vol. 54, No. 5, pp. 687-705.
Here's a local version of the paper:
airbnb.pdf.
- [Thurs, 11/14] Class was cancelled.
- [Tues, 11/19] Professor Zervas from Questrom will be visiting us and giving a guest lecture on his
recent working paper "The Welfare Impact of Consumer Reviews: A Case Study of the Hotel Industry".
- [Thurs, 11/21] We'll be wrapping up the Airbnb paper and moving on to recommender systems.
Our overview of objectives and basic methods follows this survey paper
by Adomavicius and Tuzhilin. A PDF of the paper is under Resources on the class Piazza page.
- [Tues, 11/26] Recommender systems.
- [Thurs, 11/28] Thanksgiving.
- [Tues, 12/3] Recommender systems wrapup [Netflix prize slides, on Piazza]. Course evaluations.
Guest lecture on "How to make a great poster". Draft project report due.
- [Thurs, 12/5] In-class quiz 2, covering class discussions up through 11/23, but not material on Quiz 1.
Class average: 33/40, stdev = 4.8
- [Tues, 12/10] Last day of class. We will not have lecture, but will use this time for extended office hours
and to let students finish up and submit posters for printing at FedEx. Reimbursement forms available in CS Dept office.
- [Thurs, 12/12] Poster session in MCS 101-107. Poster stands will be available in John's office, MCS 101C.
- [Tues, 12/17] Our final exam slot. Class has no final exam, but project writeup will be due.
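The market-clearing price computation from the 10/10 lecture (Easley-Kleinberg Chapter 10) can be sketched in Python as follows. This is a minimal illustration that assumes equal numbers of buyers and sellers and integer valuations. Starting from zero prices, each round builds the preferred-seller graph. If the graph has a constricted set of buyers, the prices of that set's neighboring sellers rise by one and are then shifted down so the minimum price is zero. Otherwise, by Hall's theorem, a perfect matching exists and the current prices are market-clearing. The constricted-set and matching searches are brute force, so the sketch only suits small classroom examples; the function names and valuations are hypothetical.

```python
from itertools import combinations, permutations

def preferred_sellers(values, prices):
    """For each buyer, the set of sellers that maximize value minus price."""
    prefs = []
    for row in values:
        payoffs = [v - p for v, p in zip(row, prices)]
        best = max(payoffs)
        prefs.append({j for j, u in enumerate(payoffs) if u == best})
    return prefs

def find_constricted_set(prefs):
    """Return (S, N(S)) for some buyer set S with |N(S)| < |S|, or None (brute force)."""
    n = len(prefs)
    for size in range(1, n + 1):
        for buyers in combinations(range(n), size):
            neighbors = set().union(*(prefs[i] for i in buyers))
            if len(neighbors) < len(buyers):
                return buyers, neighbors
    return None

def perfect_matching(prefs):
    """Assign each buyer a distinct preferred seller by brute force, or return None."""
    n = len(prefs)
    for perm in permutations(range(n)):
        if all(perm[i] in prefs[i] for i in range(n)):
            return {buyer: perm[buyer] for buyer in range(n)}
    return None

def market_clearing_prices(values):
    """Ascending-price procedure; values[i][j] is buyer i's integer valuation for seller j."""
    prices = [0] * len(values)
    while True:
        prefs = preferred_sellers(values, prices)
        constricted = find_constricted_set(prefs)
        if constricted is None:
            # No constricted set, so by Hall's theorem a perfect matching exists.
            return prices, perfect_matching(prefs)
        _, neighbors = constricted
        for j in neighbors:
            prices[j] += 1
        lowest = min(prices)
        prices = [p - lowest for p in prices]  # normalize so the cheapest seller is free

# Hypothetical valuations: rows are buyers, columns are sellers.
values = [[12, 4, 2],
          [8, 7, 6],
          [7, 5, 2]]
prices, matching = market_clearing_prices(values)
print(prices, matching)  # [3, 1, 0] {0: 0, 1: 2, 2: 1}
```

A production implementation would replace the brute-force searches with a polynomial-time bipartite matching algorithm, which can also extract a constricted set when no perfect matching exists.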
Academic Conduct
Academic standards and the code of academic conduct are taken very seriously
by our university, by the College of Arts and Sciences, and by the Department of
Computer Science. Course participants must adhere to the CAS Academic Conduct Code;
please take the time to review this document if you are unfamiliar with its contents.
Collaboration Policy
The collaboration policy for this class is as follows.
- You are encouraged to
collaborate with one another in studying the textbook and lecture material.
- As long as it satisfies the following conditions, collaboration on the homework assignments is permitted and will not reduce your grade:
- Before discussing each homework problem with anyone
else, you must give it an honest half-hour of serious thought.
- You may discuss ideas and approaches with other students in the class, but not share any
written solutions. In other words, the writeups you submit must be entirely your own work.
You must also acknowledge clearly in the appropriate portion of your solutions
(e.g., at the top of your writeups) people with whom you discussed ideas for that portion.
- You may not work with people outside this class (but come and talk to us if you
have a tutor), seek on-line solutions, get someone else to do it for you, etc.
- You are not permitted to collaborate on exams.