CAS CS 591/791 - Fall 2012 - Electronic Commerce

Assignment 2 Instructions (Updated and Simplified)

Datasets:

The datafiles for websites A and B are located at WebsiteA.csv and WebsiteB.csv, respectively.

Each datafile is a CSV file (comma-separated entries, one record per line) in which each row corresponds to a customer visit. Website A has 10000 rows, and Website B has 8000.

Each row is a tuple of the form (timestamp, dwell_time, conversion?, revenue), where timestamp is the arrival time in seconds, dwell_time is the time the user spent on the site in seconds, conversion? is a boolean indicating whether the user converted, and revenue is the revenue (in cents) from a converting user. I kept every value integral for simplicity, but your routines may of course use floating point.
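To make the row format concrete, here is a minimal sketch that turns one CSV row (a list of strings) into typed fields. The names Visit and parse_visit are hypothetical helpers, not part of the assignment. The sample row is made up, and the text encoding of the conversion? column is an assumption (0/1 or True/False); check it against the actual files.

```python
from collections import namedtuple

# Hypothetical record type mirroring (timestamp, dwell_time, conversion?, revenue).
Visit = namedtuple("Visit", ["timestamp", "dwell_time", "converted", "revenue"])

def parse_visit(row):
    """Convert one CSV row (a list of four strings) into a typed Visit."""
    timestamp, dwell_time, converted, revenue = [field.strip() for field in row]
    return Visit(
        timestamp=int(timestamp),      # arrival time in seconds
        dwell_time=int(dwell_time),    # time on site in seconds
        # Assumption: the boolean is stored as 1/0 or True/False text.
        converted=converted.lower() in ("1", "true"),
        revenue=int(revenue),          # revenue in cents
    )

# Made-up sample row, not taken from WebsiteA.csv or WebsiteB.csv.
sample = parse_visit(["120", "45", "1", "1999"])
print(sample.revenue / 100.0)  # revenue converted from cents to dollars
```

Keeping revenue in integer cents until the final step avoids accumulating floating-point rounding error in sums.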

Example Sage notebook:

Recall that we took a look at an example Sage notebook in class. Just download this .sws file, fire up Sage Notebook, click the File option, and upload the worksheet.

Getting started:

To quickly import the data into your Sage notebook, create a new worksheet and use the Data... pulldown to upload WebsiteA.csv and WebsiteB.csv, which will then automatically be linked to your worksheet. The notebook exposes the path to the uploaded files through the variable DATA. Now, in the first worksheet cell, try the following to scan through your data:

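import csv

# DATA is the Sage notebook's path to this worksheet's uploaded files
reader = csv.reader(open(DATA+'WebsiteA.csv'))
for row in reader:
    print(row)   # each row is a list of strings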
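Once the rows scan correctly, a natural next step is a single pass that accumulates a few summary numbers. The following is only a sketch under stated assumptions: summarize is a hypothetical helper, the conversion? column is assumed to hold 1/0 or True/False text, the files are assumed to have no header row, and the four-row in-memory CSV is made-up stand-in data.

```python
import csv
import io

def summarize(rows):
    """Single pass over CSV rows; returns basic per-site statistics."""
    visits = 0
    conversions = 0
    total_dwell = 0
    total_revenue = 0
    for timestamp, dwell_time, converted, revenue in rows:
        visits += 1
        total_dwell += int(dwell_time)
        # Assumption: the boolean is stored as 1/0 or True/False text.
        if converted.strip().lower() in ("1", "true"):
            conversions += 1
            total_revenue += int(revenue)
    return {
        "visits": visits,
        "conversion_rate": conversions / float(visits) if visits else 0.0,
        "mean_dwell_time": total_dwell / float(visits) if visits else 0.0,
        "total_revenue_cents": total_revenue,
    }

# Made-up stand-in data; the real files are WebsiteA.csv and WebsiteB.csv.
fake_csv = io.StringIO(u"10,30,0,0\n25,90,1,1500\n40,60,1,500\n55,20,0,0\n")
stats = summarize(csv.reader(fake_csv))
print(stats)
```

In the notebook, the same function could be called as summarize(csv.reader(open(DATA+'WebsiteA.csv'))), and the visits count compared against the expected 10000 rows for Website A.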

The Python reference manual and tutorial are available at docs.python.org.