Datasets:
The datafiles for websites A and B are located at
WebsiteA.csv and WebsiteB.csv,
respectively.
Each of the datafiles is a CSV (set of comma-separated entries, one per line), where
each row corresponds to a customer visit.
Website A has 10000 rows, and Website B has 8000.
Each row is a tuple of the format (timestamp, dwell_time, conversion?, revenue), where
timestamp is the arrival time in seconds, dwell_time is the time the user spent on the site
in seconds,
conversion? is a boolean indicating whether or not the user converted, and
revenue is the revenue amount (in cents) for converting users.
I kept everything integral for simplicity, but your routines may use floating point,
of course.
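As a rough illustration of the row format above, here is a minimal sketch that turns each CSV row into a named record. It assumes the files have no header row and that conversion? is stored as the integer 0 or 1 (consistent with everything being integral, but not confirmed here). The names Visit, parse_visit, and read_visits are made up for this example.

```python
import csv
from collections import namedtuple

# Hypothetical record type mirroring (timestamp, dwell_time, conversion?, revenue).
Visit = namedtuple("Visit", ["timestamp", "dwell_time", "converted", "revenue"])

def parse_visit(row):
    """Convert one CSV row (a list of four strings) into a Visit.

    Assumption: all four fields are integers, with conversion? encoded as 0 or 1.
    """
    timestamp, dwell_time, converted, revenue = (int(field) for field in row)
    return Visit(timestamp, dwell_time, bool(converted), revenue)

def read_visits(lines):
    """Parse an iterable of CSV lines (e.g. an open file), skipping blank lines."""
    return [parse_visit(row) for row in csv.reader(lines) if row]

# Two made-up rows, for illustration only (not taken from the real datafiles).
sample = ["100,35,1,1999", "160,12,0,0"]
visits = read_visits(sample)
```

In the Sage notebook, the same function could be fed an open file instead of the sample list, for example read_visits(open(DATA+'WebsiteA.csv')).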
Example Sage notebook:
Recall that we took a look at an
example Sage notebook in class.
Just download this .sws file, fire up Sage Notebook, click the File option,
and upload the worksheet.
Getting started:
To quickly import the data into your Sage notebook, create a new notebook,
and use the Data... pulldown to upload WebsiteA.csv and WebsiteB.csv, which
will then automatically be linked to your worksheet. Now in the first
worksheet cell, try the following to scan through your data:
import csv
# DATA is the path prefix the Sage notebook provides for uploaded files
o = csv.reader(open(DATA+'WebsiteA.csv'))
for row in o:
    print(row)
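Once the data scans correctly, a quick sanity check is to confirm the row counts match the sizes stated above (10000 for Website A, 8000 for Website B). The sketch below is one possible way to do that. The helper name count_rows is hypothetical, and it assumes the files contain no header row.

```python
import csv

def count_rows(lines):
    """Count the non-empty CSV rows in an iterable of lines (e.g. an open file)."""
    return sum(1 for row in csv.reader(lines) if row)

# Inside the Sage notebook, where DATA is defined, one would call:
#   count_rows(open(DATA + 'WebsiteA.csv'))   # expected: 10000 per the description
#   count_rows(open(DATA + 'WebsiteB.csv'))   # expected: 8000 per the description

# Made-up lines for illustration; the blank line is skipped.
demo = ["100,35,1,1999", "160,12,0,0", "", "205,80,0,0"]
n_rows = count_rows(demo)
```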