ML/AI Image Classifier for Skin Cancer Detection
Skin cancer is among the most prevalent cancers of the past decade. Since the skin is the body's largest organ, it is not surprising that skin cancer is the most common cancer in humans. It is generally classified into two major categories: nonmelanoma (benign) and melanoma (malignant) skin cancer.
Melanoma can usually be cured only when diagnosed early; otherwise, it spreads to other parts of the body and is often fatal.
Early diagnosis is therefore the critical factor in skin cancer treatment.
Yet diagnosis is still a visual process that relies on a lengthy pipeline of clinical screening, dermoscopic analysis, a biopsy, and finally a histopathological examination. This process can take months, involves many medical professionals, and is still only about 77% accurate.
Methods that use AI and deep learning to classify lesions show potential to save time and reduce errors, which could save many lives in the long run.
Using the TensorFlow library in Python, we can implement an image-recognition skin disease classifier that tries to distinguish benign skin diseases (nevus and seborrheic keratosis) from a malignant one (melanoma) using only 2D RGB photographs.
Step 1: Installing and Importing Essential Libraries
!pip3 install tensorflow tensorflow_hub matplotlib seaborn numpy pandas scikit-learn imbalanced-learn
Collecting tensorflow
  Using cached tensorflow-2.8.0-cp39-cp39-win_amd64.whl (438.0 MB)
Collecting tensorflow_hub
  Downloading tensorflow_hub-0.12.0-py2.py3-none-any.whl (108 kB)
Requirement already satisfied: matplotlib in c:\users\adrou\anaconda3\lib\site-packages (3.4.3)
Requirement already satisfied: seaborn in c:\users\adrou\anaconda3\lib\site-packages (0.11.2)
Requirement already satisfied: numpy in c:\users\adrou\anaconda3\lib\site-packages (1.20.3)
Requirement already satisfied: pandas in c:\users\adrou\anaconda3\lib\site-packages (1.4.2)
[... dependency resolution output truncated ...]
Successfully installed absl-py-1.0.0 astunparse-1.6.3 cachetools-5.0.0 flatbuffers-2.0 gast-0.5.3 google-auth-2.6.6 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.44.0 imbalanced-learn-0.9.0 keras-2.8.0 keras-preprocessing-1.1.2 libclang-14.0.1 markdown-3.3.6 oauthlib-3.2.0 opt-einsum-3.3.0 protobuf-3.20.1 pyasn1-0.4.8 pyasn1-modules-0.2.8 requests-oauthlib-1.3.1 rsa-4.8 tensorboard-2.8.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.8.0 tensorflow-hub-0.12.0 tensorflow-io-gcs-filesystem-0.25.0 termcolor-1.1.0 tf-estimator-nightly-2.8.0.dev2021122109
# import the required libraries
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tensorflow.keras.utils import get_file
from sklearn.metrics import roc_curve, auc, confusion_matrix
from imblearn.metrics import sensitivity_score, specificity_score
import os
import glob
import zipfile
import random
# to get consistent results after multiple runs
tf.random.set_seed(7)
np.random.seed(7)
random.seed(7)
# 0 for benign, 1 for malignant
class_names = ["benign", "malignant"]
Step 2: Reading and Processing Data
We'll be using only a small part of the ISIC archive dataset; the function below downloads and extracts it into a new data folder:
def download_and_extract_dataset():
    # dataset from https://github.com/udacity/dermatologist-ai
    # 5.3GB
    train_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/train.zip"
    # 824.5MB
    valid_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/valid.zip"
    # 5.1GB
    test_url = "https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/test.zip"
    for i, download_link in enumerate([valid_url, train_url, test_url]):
        temp_file = f"temp{i}.zip"
        data_dir = get_file(origin=download_link, fname=os.path.join(os.getcwd(), temp_file))
        print("Extracting", download_link)
        with zipfile.ZipFile(data_dir, "r") as z:
            z.extractall("data")
        # remove the temp file
        os.remove(temp_file)

# call the above function to download the dataset
download_and_extract_dataset()
Downloading data from https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/valid.zip
864550912/864538487 [==============================] - 66s 0us/step
Extracting https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/valid.zip
Downloading data from https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/train.zip
5736570880/5736557430 [==============================] - 381s 0us/step
Extracting https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/train.zip
Downloading data from https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/test.zip
5528649728/5528640507 [==============================] - 389s 0us/step
Extracting https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/skin-cancer/test.zip
# preparing data
# generate CSV metadata file to read img paths and labels from it
def generate_csv(folder, label2int):
    folder_name = os.path.basename(folder)
    labels = list(label2int)
    # generate CSV file
    df = pd.DataFrame(columns=["filepath", "label"])
    i = 0
    for label in labels:
        print("Reading", os.path.join(folder, label, "*"))
        for filepath in glob.glob(os.path.join(folder, label, "*")):
            df.loc[i] = [filepath, label2int[label]]
            i += 1
    output_file = f"{folder_name}.csv"
    print("Saving", output_file)
    df.to_csv(output_file)
# generate CSV files for all data portions, labeling nevus and seborrheic keratosis
# as 0 (benign) and melanoma as 1 (malignant)
# replace the "data" path with the path to your extracted dataset
# (no need to change it if you used the download_and_extract_dataset() function)
generate_csv("data/train", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/valid", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
generate_csv("data/test", {"nevus": 0, "seborrheic_keratosis": 0, "melanoma": 1})
Reading data/train\nevus\*
Reading data/train\seborrheic_keratosis\*
Reading data/train\melanoma\*
Saving train.csv
Reading data/valid\nevus\*
Reading data/valid\seborrheic_keratosis\*
Reading data/valid\melanoma\*
Saving valid.csv
Reading data/test\nevus\*
Reading data/test\seborrheic_keratosis\*
Reading data/test\melanoma\*
Saving test.csv
# loading data
train_metadata_filename = "train.csv"
valid_metadata_filename = "valid.csv"
# load CSV files as DataFrames
df_train = pd.read_csv(train_metadata_filename)
df_valid = pd.read_csv(valid_metadata_filename)
n_training_samples = len(df_train)
n_validation_samples = len(df_valid)
print("Number of training samples:", n_training_samples)
print("Number of validation samples:", n_validation_samples)
train_ds = tf.data.Dataset.from_tensor_slices((df_train["filepath"], df_train["label"]))
valid_ds = tf.data.Dataset.from_tensor_slices((df_valid["filepath"], df_valid["label"]))
Number of training samples: 2000
Number of validation samples: 150
# preprocess data
def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # use `convert_image_dtype` to convert to floats in the [0, 1] range
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to 299x299, the input size InceptionV3 expects
    return tf.image.resize(img, [299, 299])

def process_path(filepath, label):
    # load the raw data from the file as a string
    img = tf.io.read_file(filepath)
    img = decode_img(img)
    return img, label
valid_ds = valid_ds.map(process_path)
train_ds = train_ds.map(process_path)
# inspect the shape of one preprocessed sample
for image, label in train_ds.take(1):
    print("Image shape:", image.shape)
    print("Label:", label.numpy())
Image shape: (299, 299, 3)
Label: 0
The input is an image tensor of shape (299, 299, 3), and the output is a single integer: 0 for benign and 1 for malignant.
# training parameters
batch_size = 64
optimizer = "rmsprop"
def prepare_for_training(ds, cache=True, batch_size=64, shuffle_buffer_size=1000):
    if cache:
        if isinstance(cache, str):
            # cache preprocessed images to a file so later epochs skip decoding
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    # shuffle the dataset
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    # repeat forever
    ds = ds.repeat()
    # split into batches
    ds = ds.batch(batch_size)
    # `prefetch` lets the dataset fetch batches in the background
    # while the model is training
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds
valid_ds = prepare_for_training(valid_ds, batch_size=batch_size, cache="valid-cached-data")
train_ds = prepare_for_training(train_ds, batch_size=batch_size, cache="train-cached-data")
batch = next(iter(valid_ds))
def show_batch(batch):
    plt.figure(figsize=(12, 12))
    for n in range(25):
        ax = plt.subplot(5, 5, n + 1)
        plt.imshow(batch[0][n])
        plt.title(class_names[batch[1][n].numpy()].title())
        plt.axis('off')

show_batch(batch)
That is a sample of our training image dataset. Let's start building the model. We'll use transfer learning: the TensorFlow Hub library downloads and loads the InceptionV3 architecture along with its ImageNet pre-trained weights.
Step 3: Building the Model
# building the model
# InceptionV3 model & pre-trained weights
module_url = "https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/4"
m = tf.keras.Sequential([
    # frozen InceptionV3 feature extractor, producing a 2048-dim vector per image
    hub.KerasLayer(module_url, output_shape=[2048], trainable=False),
    # single sigmoid unit for binary (benign vs. malignant) classification
    tf.keras.layers.Dense(1, activation="sigmoid")
])
m.build([None, 299, 299, 3])
m.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
m.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= keras_layer (KerasLayer) (None, 2048) 21802784 dense (Dense) (None, 1) 2049 ================================================================= Total params: 21,804,833 Trainable params: 2,049 Non-trainable params: 21,802,784 _________________________________________________________________
model_name = f"benign-vs-malignant_{batch_size}_{optimizer}"
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=os.path.join("logs", model_name))
# saves model checkpoint whenever we reach better weights
modelcheckpoint = tf.keras.callbacks.ModelCheckpoint(model_name + "_{val_loss:.3f}.h5", save_best_only=True, verbose=1)
history = m.fit(train_ds, validation_data=valid_ds,
                steps_per_epoch=n_training_samples // batch_size,
                validation_steps=n_validation_samples // batch_size,
                verbose=1, epochs=100,
                callbacks=[tensorboard, modelcheckpoint])
Epoch 1/100
Epoch 1: val_loss improved from inf to 0.57842, saving model to benign-vs-malignant_64_rmsprop_0.578.h5
31/31 [==============================] - 123s 3s/step - loss: 0.4600 - accuracy: 0.7707 - val_loss: 0.5784 - val_accuracy: 0.7734
Epoch 2/100
Epoch 2: val_loss improved from 0.57842 to 0.50465, saving model to benign-vs-malignant_64_rmsprop_0.505.h5
31/31 [==============================] - 82s 3s/step - loss: 0.4077 - accuracy: 0.8070 - val_loss: 0.5047 - val_accuracy: 0.7969
Epoch 3/100
Epoch 3: val_loss improved from 0.50465 to 0.47657, saving model to benign-vs-malignant_64_rmsprop_0.477.h5
31/31 [==============================] - 80s 3s/step - loss: 0.3871 - accuracy: 0.8251 - val_loss: 0.4766 - val_accuracy: 0.8203
Epoch 4/100
Epoch 4: val_loss did not improve from 0.47657
31/31 [==============================] - 79s 3s/step - loss: 0.3714 - accuracy: 0.8261 - val_loss: 0.4893 - val_accuracy: 0.7812
Epoch 5/100
Epoch 5: val_loss improved from 0.47657 to 0.45809, saving model to benign-vs-malignant_64_rmsprop_0.458.h5
31/31 [==============================] - 76s 2s/step - loss: 0.3565 - accuracy: 0.8322 - val_loss: 0.4581 - val_accuracy: 0.7891
Epoch 6/100
Epoch 6: val_loss did not improve from 0.45809
31/31 [==============================] - 72s 2s/step - loss: 0.3497 - accuracy: 0.8402 - val_loss: 0.4607 - val_accuracy: 0.7969
Epoch 7/100
Epoch 7: val_loss improved from 0.45809 to 0.44589, saving model to benign-vs-malignant_64_rmsprop_0.446.h5
31/31 [==============================] - 72s 2s/step - loss: 0.3469 - accuracy: 0.8407 - val_loss: 0.4459 - val_accuracy: 0.7969
Epoch 8/100
Epoch 8: val_loss improved from 0.44589 to 0.43190, saving model to benign-vs-malignant_64_rmsprop_0.432.h5
31/31 [==============================] - 71s 2s/step - loss: 0.3390 - accuracy: 0.8422 - val_loss: 0.4319 - val_accuracy: 0.8125
Epoch 9/100
Epoch 9: val_loss did not improve from 0.43190
31/31 [==============================] - 71s 2s/step - loss: 0.3491 - accuracy: 0.8417 - val_loss: 0.4413 - val_accuracy: 0.8047
Epoch 10/100
Epoch 10: val_loss did not improve from 0.43190
31/31 [==============================] - 71s 2s/step - loss: 0.3164 - accuracy: 0.8639 - val_loss: 0.4825 - val_accuracy: 0.7
Step 4: Model Evaluation
# evaluation
# load testing set
test_metadata_filename = "test.csv"
df_test = pd.read_csv(test_metadata_filename)
n_testing_samples = len(df_test)
print("Number of testing samples:", n_testing_samples)
test_ds = tf.data.Dataset.from_tensor_slices((df_test["filepath"], df_test["label"]))
def prepare_for_testing(ds, cache=True, shuffle_buffer_size=1000):
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    return ds

test_ds = test_ds.map(process_path)
test_ds = prepare_for_testing(test_ds, cache="test-cached-data")
Number of testing samples: 600
# convert the testing set to NumPy arrays so it fits in memory
# (don't do this when the testing set is too large)
y_test = np.zeros((n_testing_samples,))
X_test = np.zeros((n_testing_samples, 299, 299, 3))
for i, (img, label) in enumerate(test_ds.take(n_testing_samples)):
    # print(img.shape, label.shape)
    X_test[i] = img
    y_test[i] = label.numpy()
print("y_test.shape:", y_test.shape)
y_test.shape: (600,)
# load the weights with the least loss
m.load_weights("Yourpath/benign-vs-malignant_64_rmsprop_0.371.h5")
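The loss and accuracy reported below were presumably produced by an evaluation cell along these lines (a minimal sketch; the exact code isn't shown in the original, and it assumes the X_test and y_test arrays built above):
# evaluate the restored best checkpoint on the testing set
print("Evaluating the model...")
loss, accuracy = m.evaluate(X_test, y_test, verbose=0)
print("Loss:", loss, " Accuracy:", accuracy)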
Evaluating the model...
Loss: 0.45362651348114014
Accuracy: 0.7983333468437195
def get_predictions(threshold=None):
    """
    Returns predictions for binary classification given `threshold`.
    For instance, if threshold is 0.3, a sample is labeled 1 (malignant)
    when its predicted probability of being malignant is 30% or more
    (instead of the default 50%).
    """
    y_pred = m.predict(X_test)
    if not threshold:
        threshold = 0.5
    result = np.zeros((n_testing_samples,))
    for i in range(n_testing_samples):
        # test melanoma probability
        if y_pred[i][0] >= threshold:
            result[i] = 1
        # else, it stays 0 (benign)
    return result
threshold = 0.23
# get predictions with a 23% threshold: if the model is 23% sure or more
# that a sample is malignant, it is classified as malignant; otherwise benign
y_pred = get_predictions(threshold)
def plot_confusion_matrix(y_test, y_pred):
    cmn = confusion_matrix(y_test, y_pred)
    # normalize each row so the cells show per-class rates
    cmn = cmn.astype('float') / cmn.sum(axis=1)[:, np.newaxis]
    # print it
    print(cmn)
    fig, ax = plt.subplots(figsize=(10, 10))
    sns.heatmap(cmn, annot=True, fmt='.2f',
                xticklabels=[f"pred_{c}" for c in class_names],
                yticklabels=[f"true_{c}" for c in class_names],
                cmap="Blues")
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    # show the resulting confusion matrix
    plt.show()

plot_confusion_matrix(y_test, y_pred)
[[0.63768116 0.36231884]
 [0.32478632 0.67521368]]
sensitivity = sensitivity_score(y_test, y_pred)
specificity = specificity_score(y_test, y_pred)
print("Melanoma Sensitivity:", sensitivity)
print("Melanoma Specificity:", specificity)
Melanoma Sensitivity: 0.6752136752136753
Melanoma Specificity: 0.6376811594202898
def plot_roc_auc(y_true, y_pred):
    """
    This function plots the ROC curve and prints the AUC score.
    """
    # prepare the figure
    plt.figure()
    fpr, tpr, _ = roc_curve(y_true, y_pred)
    # obtain ROC AUC
    roc_auc = auc(fpr, tpr)
    # print the score
    print(f"ROC AUC: {roc_auc:.3f}")
    # plot the ROC curve
    plt.plot(fpr, tpr, color="blue", lw=2,
             label=f"ROC curve (area = {roc_auc:.2f})")
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve')
    plt.legend(loc="lower right")
    plt.show()

plot_roc_auc(y_test, y_pred)
ROC AUC: 0.656
# compare the distributions of true labels and predictions
plt.hist(y_test)
plt.hist(y_pred)
plt.show()
Thus, the sensitivity (i.e., the probability of a positive test given that the patient has the disease) is 67%, whereas the specificity (i.e., the probability of a negative test given that the patient is healthy) is 63% at a threshold of 0.23. The area under the ROC curve (ROC AUC) is 0.66; an AUC of 1.0 would correspond to an ideal model.
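As a quick sanity check, both rates can also be read off the raw (unnormalized) confusion matrix; a minimal sketch, reusing the y_test and y_pred arrays from above:
# recover the four counts from the unnormalized confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("Sensitivity (TPR):", tp / (tp + fn))  # should match sensitivity_score
print("Specificity (TNR):", tn / (tn + fp))  # should match specificity_score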
We can improve the model by increasing the number of training samples. We can also tune hyperparameters such as the decision threshold we set earlier and check whether that yields better sensitivity and specificity scores, as in the sketch below.
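One simple way to explore that trade-off is to sweep candidate thresholds and print the resulting scores; a minimal sketch, reusing m, X_test, and y_test from above:
# scan candidate thresholds and report the sensitivity/specificity trade-off
probs = m.predict(X_test).ravel()
for t in np.arange(0.1, 0.95, 0.05):
    preds = (probs >= t).astype(int)
    print(f"threshold={t:.2f}  "
          f"sensitivity={sensitivity_score(y_test, preds):.3f}  "
          f"specificity={specificity_score(y_test, preds):.3f}")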