{ "cells": [ { "cell_type": "markdown", "id": "5178af69-b066-4729-a89c-2633cae8cded", "metadata": { "id": "5178af69-b066-4729-a89c-2633cae8cded" }, "source": [ "# CS640 Project Writeup\n", "\n", "In this project, you will work on a Kaggle competition titled [ISIC 2024 - Skin Cancer Detection with 3D-TBP](https://www.kaggle.com/competitions/isic-2024-challenge/overview). This is a binary classification task in which you need to predict whether a skin lesion is cancerous. The competition has already closed, but in this class we will explore the rich training dataset it provides.\n", "\n", "***" ] }, { "cell_type": "markdown", "id": "bb7ed497-90a5-4d68-8210-ac90d630e2fd", "metadata": { "id": "bb7ed497-90a5-4d68-8210-ac90d630e2fd" }, "source": [ "## Data\n", "\n", "We will be using the training dataset from the original competition for this class project. The dataset has been downloaded and preprocessed. You can find it on SCC at */projectnb/cs640grp/materials/ISIC-2024_CS640*. **You should use this downloaded dataset only, not the original one on the website.**\n", "\n", "The directory contents are listed below." ] }, { "cell_type": "code", "execution_count": null, "id": "579e20cd-7ee7-4df8-8e01-e37d5e5969d1", "metadata": { "id": "579e20cd-7ee7-4df8-8e01-e37d5e5969d1", "outputId": "7017652e-c18f-4023-c45d-c8fa99d7fa1e" }, "outputs": [ { "data": { "text/plain": [ "['test_metadata.csv',\n", " 'submission.csv',\n", " 'train_metadata.csv',\n", " 'train_image',\n", " 'test_image']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "\n", "project_dir = os.path.join(os.sep, 'projectnb', 'cs640grp', 'materials', 'ISIC-2024_CS640')\n", "os.listdir(project_dir)" ] }, { "cell_type": "markdown", "id": "8f03792d-b675-4c2f-a424-54b3a4cb1921", "metadata": { "id": "8f03792d-b675-4c2f-a424-54b3a4cb1921" }, "source": [ "The CSV files store a list of attributes for each sample, and the image folders store a JPEG image per sample. 
The image names are sample IDs, which can be found in the corresponding CSV files. The submission.csv file is a template for your submission.\n", "\n", "Let's first take a peek at the CSV files and a few sample images.\n", "\n", "***" ] }, { "cell_type": "markdown", "id": "2358a162-9760-4798-85ec-9630c819b87b", "metadata": { "id": "2358a162-9760-4798-85ec-9630c819b87b" }, "source": [ "### Training Metadata\n", "\n", "Note that in the metadata file, the **target** column is the label column." ] }, { "cell_type": "code", "execution_count": null, "id": "b6262145-3a06-4397-ac0c-ddb1d4ebf0d1", "metadata": { "id": "b6262145-3a06-4397-ac0c-ddb1d4ebf0d1", "outputId": "8e47a84c-220d-4714-8914-1c05bc39fba6" }, "outputs": [ { "data": { "text/html": [ "
|  | id | target | age_approx | sex | anatom_site_general | clin_size_long_diam_mm | tbp_tile_type | tbp_lv_A | tbp_lv_Aext | tbp_lv_B | ... | tbp_lv_norm_color | tbp_lv_perimeterMM | tbp_lv_radial_color_std_max | tbp_lv_stdL | tbp_lv_stdLExt | tbp_lv_symm_2axis | tbp_lv_symm_2axis_angle | tbp_lv_x | tbp_lv_y | tbp_lv_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 55.0 | male | upper extremity | 2.58 | 3D: white | 21.989610 | 18.149720 | 26.138980 | ... | 3.207238 | 7.162229 | 1.181736 | 2.552678 | 2.169827 | 0.230000 | 45 | -439.338600 | 1230.412000 | 22.647890 |
| 1 | 1 | 0 | 50.0 | female | posterior torso | 2.90 | 3D: XP | 21.153528 | 17.243578 | 28.471102 | ... | 2.749542 | 7.242474 | 1.014255 | 2.979940 | 1.937938 | 0.292453 | 65 | -59.504822 | 1047.626465 | 109.244873 |
| 2 | 2 | 0 | 40.0 | female | lower extremity | 4.38 | 3D: XP | 20.569130 | 14.896040 | 24.978840 | ... | 4.339059 | 14.451710 | 1.233737 | 5.317332 | 1.839798 | 0.158025 | 0 | -223.811100 | 770.993000 | 29.067170 |
| 3 | 3 | 0 | 50.0 | female | upper extremity | 2.76 | 3D: white | 23.365559 | 18.483379 | 30.853418 | ... | 1.650849 | 7.870664 | 0.496438 | 2.770145 | 2.381648 | 0.254237 | 90 | -440.008942 | 1140.614502 | -14.935974 |
| 4 | 4 | 0 | 60.0 | male | posterior torso | 3.31 | 3D: XP | 23.061540 | 18.730060 | 29.790280 | ... | 4.174303 | 10.950840 | 1.521283 | 1.608716 | 1.997881 | 0.461111 | 0 | -108.822000 | 1215.113000 | -101.404500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 320842 | 320842 | 0 | 70.0 | NaN | posterior torso | 3.60 | 3D: XP | 19.985520 | 15.393206 | 35.482277 | ... | 1.317773 | 9.854949 | 0.359343 | 1.018512 | 2.205429 | 0.307692 | 115 | 30.655060 | 1204.034302 | 165.979797 |
| 320843 | 320843 | 0 | 45.0 | male | posterior torso | 5.88 | 3D: white | 17.846150 | 11.566220 | 24.022090 | ... | 6.996178 | 16.388990 | 1.827318 | 8.247600 | 2.282445 | 0.208081 | 140 | -109.284200 | 1228.212000 | 155.480600 |
| 320844 | 320844 | 0 | 40.0 | male | anterior torso | 11.41 | 3D: XP | 16.364410 | 6.870663 | 20.882192 | ... | 3.671125 | 28.208751 | 1.136926 | 3.310037 | 1.509960 | 0.181329 | 0 | -170.062561 | 1129.213257 | 28.841248 |
| 320845 | 320845 | 0 | 40.0 | male | lower extremity | 4.02 | 3D: XP | 13.500010 | 10.076300 | 23.654770 | ... | 2.443795 | 11.177810 | 0.847317 | 2.623507 | 3.329334 | 0.401914 | 10 | 249.819500 | 254.294600 | 55.758790 |
| 320846 | 320846 | 0 | 50.0 | male | anterior torso | 3.15 | 3D: XP | 17.482757 | 12.255344 | 34.196833 | ... | 2.113237 | 8.872541 | 0.645124 | 3.542404 | 2.282705 | 0.525926 | 20 | -128.891724 | 1038.359863 | -63.770935 |

320847 rows × 41 columns
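Here is a minimal sketch of loading the metadata with pandas and checking how the **target** labels are distributed. It assumes the CSV names shown in the directory listing. `load_metadata`, `image_path`, and `class_balance` are hypothetical helper names introduced for this sketch. `image_path` in particular assumes each image file is named `<id>.jpg` inside `train_image` or `test_image`, so verify the real file names with `os.listdir` before relying on it.

```python
import os

import pandas as pd

# Dataset root on SCC, the same path used in the directory listing above.
PROJECT_DIR = os.path.join(os.sep, 'projectnb', 'cs640grp', 'materials', 'ISIC-2024_CS640')


def load_metadata(split, root=PROJECT_DIR):
    """Read '<split>_metadata.csv', where split is 'train' or 'test'."""
    return pd.read_csv(os.path.join(root, f'{split}_metadata.csv'))


def image_path(sample_id, split, root=PROJECT_DIR, ext='.jpg'):
    """Build the path of one sample's image.

    ASSUMPTION: images are named '<id><ext>' inside '<split>_image'.
    Check the real file names with os.listdir before relying on this.
    """
    return os.path.join(root, f'{split}_image', f'{sample_id}{ext}')


def class_balance(df, label_col='target'):
    """Count each label value and its fraction of all labeled (non-NaN) rows."""
    counts = df[label_col].value_counts().sort_index()
    return pd.DataFrame({'count': counts, 'fraction': counts / counts.sum()})
```

For example, `class_balance(load_metadata('train'))` reports the count and fraction of each label. If positive cases turn out to be rare, keep that in mind when you choose an evaluation metric and split off a validation set.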
|  | id | target | age_approx | sex | anatom_site_general | clin_size_long_diam_mm | tbp_tile_type | tbp_lv_A | tbp_lv_Aext | tbp_lv_B | ... | tbp_lv_norm_color | tbp_lv_perimeterMM | tbp_lv_radial_color_std_max | tbp_lv_stdL | tbp_lv_stdLExt | tbp_lv_symm_2axis | tbp_lv_symm_2axis_angle | tbp_lv_x | tbp_lv_y | tbp_lv_z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | NaN | 30.0 | male | upper extremity | 2.52 | 3D: white | 20.739760 | 17.346250 | 23.604410 | ... | 2.013941 | 9.113276 | 0.793600 | 1.368380 | 3.130576 | 0.392593 | 85 | -352.631000 | 1024.501000 | 21.431270 |
| 1 | 1 | NaN | 75.0 | male | upper extremity | 2.63 | 3D: white | 21.498600 | 17.128050 | 26.919320 | ... | 3.554856 | 6.968501 | 1.322546 | 2.980941 | 2.610491 | 0.342857 | 150 | 317.008100 | 1296.112000 | 85.410520 |
| 2 | 2 | NaN | 30.0 | male | lower extremity | 18.31 | 3D: XP | 21.261867 | 15.949655 | 36.927874 | ... | 3.685572 | 67.921989 | 1.323685 | 1.912243 | 3.394053 | 0.385400 | 145 | -185.792664 | 680.623718 | -21.791901 |
| 3 | 3 | NaN | 45.0 | female | upper extremity | 3.55 | 3D: XP | 21.087236 | 15.657230 | 31.419333 | ... | 2.082827 | 10.582854 | 0.691356 | 1.349557 | 1.570233 | 0.250000 | 155 | 443.583984 | 1213.412598 | 39.409851 |
| 4 | 4 | NaN | 55.0 | male | anterior torso | 7.06 | 3D: white | 22.121790 | 14.444030 | 30.308130 | ... | 3.691011 | 19.856620 | 0.989644 | 3.126280 | 2.467318 | 0.227068 | 70 | -162.127900 | 1043.082000 | -44.661830 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 80207 | 80207 | NaN | 75.0 | male | posterior torso | 2.88 | 3D: white | 20.565030 | 15.228920 | 30.234170 | ... | 1.458585 | 8.111398 | 0.618510 | 2.274461 | 1.914292 | 0.232000 | 10 | -72.637900 | 1487.536000 | 138.852900 |
| 80208 | 80208 | NaN | 50.0 | male | upper extremity | 4.20 | 3D: white | 16.314590 | 14.611030 | 25.403000 | ... | 1.941789 | 11.952720 | 0.599103 | 1.422653 | 2.196585 | 0.331522 | 45 | -477.687100 | 1121.040000 | -38.915830 |
| 80209 | 80209 | NaN | 40.0 | female | upper extremity | 2.90 | 3D: XP | 21.597580 | 17.705739 | 27.266721 | ... | 3.355798 | 8.872541 | 1.076741 | 3.248064 | 1.624508 | 0.359477 | 25 | 442.464355 | 1128.834351 | 31.510681 |
| 80210 | 80210 | NaN | 75.0 | male | posterior torso | 3.32 | 3D: white | 22.596327 | 20.186998 | 30.480790 | ... | 0.000000 | 9.033031 | 0.000000 | 1.321416 | 2.082772 | 0.495050 | 10 | -110.265747 | 1429.494385 | 156.874146 |
| 80211 | 80211 | NaN | 70.0 | male | posterior torso | 3.14 | 3D: white | 19.856977 | 16.932049 | 22.454427 | ... | 3.738181 | 9.340242 | 1.366712 | 3.948254 | 2.571864 | 0.313609 | 55 | -126.982124 | 1117.368408 | 166.916687 |

80212 rows × 41 columns
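The test metadata has the same columns as the training metadata, but its **target** column is empty (NaN). Here is a minimal sketch of separating features from labels under that assumption. `split_features` is a hypothetical helper. It does not encode categorical columns such as `sex` or `anatom_site_general`, and it does not handle missing values (for example, `NaN` in `sex`), so both are left to you.

```python
import pandas as pd


def split_features(train_df, test_df, id_col='id', label_col='target'):
    """Return (X_train, y_train, X_test) using only columns present in both frames.

    The id and label columns are excluded from the features. Column order
    follows train_df so the two feature matrices line up.
    """
    # The test label column is expected to be empty (all NaN); fail loudly otherwise.
    if label_col in test_df and test_df[label_col].notna().any():
        raise ValueError(f'{label_col!r} in the test metadata is not all NaN')
    shared = [c for c in train_df.columns
              if c in test_df.columns and c not in (id_col, label_col)]
    return train_df[shared], train_df[label_col], test_df[shared]
```

For example, if `train_df` and `test_df` are the two metadata frames read with `pd.read_csv`, then `X_train, y_train, X_test = split_features(train_df, test_df)` yields feature matrices with identical columns in the same order.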
|  | id | target |
|---|---|---|
| 0 | 0 | NaN |
| 1 | 1 | NaN |
| 2 | 2 | NaN |
| 3 | 3 | NaN |
| 4 | 4 | NaN |
| ... | ... | ... |
| 80207 | 80207 | NaN |
| 80208 | 80208 | NaN |
| 80209 | 80209 | NaN |
| 80210 | 80210 | NaN |
| 80211 | 80211 | NaN |

80212 rows × 2 columns
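The submission template lists each test `id` with an empty **target**. Here is a minimal sketch of filling it in. It assumes the **target** column should hold one predicted probability (or score) per `id`, so confirm the expected format against the course instructions. `fill_submission` is a hypothetical helper that matches rows by `id`, which preserves the template's row order even if the predictions arrive in a different order.

```python
import pandas as pd


def fill_submission(template, ids, scores, id_col='id', label_col='target'):
    """Return a copy of the submission template with label_col filled in.

    ids and scores are aligned sequences (e.g. test ids and model outputs).
    Rows are matched on id_col, so the template's row order is preserved.
    """
    preds = pd.Series(list(scores), index=list(ids), name=label_col)
    if preds.index.has_duplicates:
        raise ValueError('duplicate ids in predictions')
    out = template.copy()
    # Look up each template id in the predictions; unmatched ids become NaN.
    out[label_col] = out[id_col].map(preds)
    missing = out[label_col].isna().sum()
    if missing:
        raise ValueError(f'{missing} template ids have no prediction')
    return out
```

For example, `fill_submission(template, test_ids, scores).to_csv('submission.csv', index=False)` writes the result. `index=False` keeps the pandas index out of the file, so the file contains only the `id` and `target` columns. Save it in your own directory rather than the shared project folder.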
\n", "