![](https://crypto4nerd.com/wp-content/uploads/2023/11/1g7IXkKiHrBvZ6Tj6xEXPLQ-1024x768.jpeg)
Detecting breast cancer early is crucial for effective treatment. In the previous project, I used logistic regression and achieved an accuracy of 92.9%. While this was a solid outcome, the ever-evolving field of machine learning prompted me to explore more advanced techniques. In this new project, I’ve turned to neural networks to improve predictive accuracy and gain deeper insights. Let’s compare the results of logistic regression with the capabilities of neural networks, aiming to enhance our ability to predict breast cancer accurately.
The choice of machine learning algorithm matters for accurate and reliable breast cancer prediction. Logistic regression served as a robust tool in the previous project; I turned to neural networks to see whether deep learning could improve on that result.
Advantages of Neural Networks
1. Non-Linearity and Complex Patterns: Neural networks excel in capturing intricate, non-linear relationships within breast cancer data due to their layered architecture and activation functions.
2. Feature Learning and Representation: Neural networks autonomously learn hierarchical features, automatically extracting relevant information from raw data, particularly beneficial for diverse medical datasets.
3. Scalability and Adaptability: Neural networks efficiently handle large datasets and adapt to various data distributions, making them well-suited for the variability in medical datasets.
4. Performance on Image Data: Neural networks, especially Convolutional Neural Networks (CNNs), excel in analyzing medical images like mammograms, enhancing predictive power in breast cancer diagnosis.
5. Improved Generalization: Appropriately configured neural networks can generalize well on diverse datasets, a crucial adaptability in medical applications with heterogeneous patient data.
Dataset Overview:
The dataset employed in both projects is the breast cancer dataset available in the sklearn library. This dataset comprises features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, and it is commonly used for binary classification tasks related to breast cancer.
In this project, we explored the entire set of features available in the breast cancer dataset, aiming to leverage the power of neural networks in capturing complex patterns. The preprocessing steps remained consistent with the logistic regression project, including the same train-test split ratio.
#Importing the Dependencies
import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
#Data Collection and Processing
breast_cancer_data=sklearn.datasets.load_breast_cancer()
print(breast_cancer_data)
#loading the dataset to a dataframe
data=pd.DataFrame(breast_cancer_data.data, columns=breast_cancer_data.feature_names)
#adding the target column to the dataframe
data['label']=breast_cancer_data.target
#checking the distribution of target variables
data['label'].value_counts()
data.groupby('label').mean()
#separating the features (x) from the target (y)
x=data.drop(columns='label')
y=data['label']
#Splitting data into training and testing data
x_train,x_test,y_train,y_test=train_test_split(x,y, test_size=0.2,random_state=2)
print(x.shape,x_train.shape,x_test.shape)
#Standardise the data
from sklearn.preprocessing import StandardScaler
scaler= StandardScaler()
x_train_std=scaler.fit_transform(x_train)
x_test_std=scaler.transform(x_test)
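To make the standardization step concrete, here is a minimal pure-Python sketch of what `StandardScaler` does to a single feature column: it learns the mean and population standard deviation from the training data only, then reuses those same statistics on the test data. The column values and the helper names `fit_scaler` and `transform` are made up for illustration.

```python
import statistics

def fit_scaler(column):
    """Learn the mean and population standard deviation (ddof=0) of one column."""
    return statistics.fmean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Rescale values to z-scores using previously learned statistics."""
    return [(value - mean) / std for value in column]

# Hypothetical values for one feature, not taken from the real dataset
train_column = [10.0, 12.0, 14.0, 16.0]
test_column = [13.0, 20.0]

# Statistics come from the training data only
mean, std = fit_scaler(train_column)

train_scaled = transform(train_column, mean, std)
# The test data reuses the training statistics, so no test information leaks in
test_scaled = transform(test_column, mean, std)

print(train_scaled)
print(test_scaled)
```

This is why the code calls `fit_transform` on the training set but only `transform` on the test set.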
I used logistic regression to predict breast cancer with the sklearn dataset. Here are the main points:
- Model: Logistic Regression
- Training Accuracy: Achieved 94.7% accuracy on the training data.
- Test Accuracy: Maintained strong performance with 92.98% accuracy on the test data.
- Focus: Emphasized accuracy as the primary measure of success.
The logistic regression model performed well, especially considering its simplicity. The selected features played a crucial role in accurately predicting breast cancer, providing a solid foundation for comparison with the neural network project.
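For reference, a baseline along these lines could be sketched as follows. This is not the exact code from the previous project: training on unscaled features and raising `max_iter` are assumptions, so the numbers it prints will not necessarily match the accuracies reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same dataset and the same 80/20 split used for the neural network
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# max_iter is raised above the default (an assumption) so the solver
# has room to converge on unscaled features
baseline = LogisticRegression(max_iter=10000)
baseline.fit(X_train, y_train)

# Accuracy on the data the model saw and on the held-out data
train_accuracy = accuracy_score(y_train, baseline.predict(X_train))
test_accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"train accuracy: {train_accuracy:.4f}")
print(f"test accuracy: {test_accuracy:.4f}")
```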
The neural network employed in this project is a feedforward neural network, carefully crafted to capture complex relationships within the breast cancer dataset. The architecture consists of the following layers:
Input Layer:
Neurons: The number of input features in the dataset.
Purpose: Responsible for receiving the feature values of each data point.
Hidden Layers:
Number of Layers: One densely connected hidden layer.
Neurons: 20, a number chosen through experimentation to balance model complexity and performance.
Activation Function: Rectified Linear Unit (ReLU) was chosen as the activation function for the hidden layer, enabling the model to learn non-linear patterns in the data.
Output Layer:
Neurons: Two neurons, one per class, as we are dealing with a binary classification task (malignant or benign).
Activation Function: Sigmoid activation function was employed to map each output to a score between 0 and 1; the class with the higher score is taken as the prediction.
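The way values flow through such layers can be sketched in plain Python. This is a toy network with made-up weights (3 inputs, 2 hidden neurons, 2 outputs), not the trained model; the helper names `dense`, `relu` and `sigmoid` are illustrative only.

```python
import math

def dense(inputs, weights, biases):
    """Weighted sum plus bias; weights[j] holds the incoming weights of neuron j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def relu(values):
    """Keep positive values, replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def sigmoid(values):
    """Squash each value into the open interval (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

# Hypothetical input and parameters, chosen only for illustration
features = [0.5, -1.2, 2.0]
hidden_weights = [[0.4, -0.6, 0.1], [-0.3, 0.2, -0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0], [-1.0, 1.0]]
output_biases = [0.0, 0.0]

# Hidden layer: the second neuron's weighted sum is negative, so ReLU zeroes it
hidden = relu(dense(features, hidden_weights, hidden_biases))
# Output layer: one sigmoid score per output neuron
scores = sigmoid(dense(hidden, output_weights, output_biases))

print(hidden)
print(scores)
```

ReLU is what introduces the non-linearity: without it, stacking dense layers would collapse into a single linear transformation.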
Training and Optimization:
The model was trained with the Adam optimizer, using sparse categorical cross-entropy as the loss function to quantify the difference between predicted and actual labels. Training iteratively adjusts the weights and biases to minimize this loss, here over 10 epochs with 10% of the training data held out for validation.
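To give a feel for what the loss measures, cross-entropy for a single sample can be sketched as the negative log of the probability the model assigns to the true class. The sketch below assumes the class scores already sum to 1 and uses made-up probabilities; Keras additionally averages this quantity over each batch.

```python
import math

def cross_entropy(probabilities, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[true_class])

# Hypothetical predictions for a sample whose true label is 1
confident_right = cross_entropy([0.05, 0.95], true_class=1)
unsure = cross_entropy([0.50, 0.50], true_class=1)
confident_wrong = cross_entropy([0.95, 0.05], true_class=1)

print(confident_right, unsure, confident_wrong)
```

The loss is small when the model is confidently right and grows sharply when it is confidently wrong, which is the signal the optimizer uses to adjust the weights.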
#importing tensorflow and keras
import tensorflow as tf
tf.random.set_seed(3)
from tensorflow import keras
# Setting up the layers of the Neural Network
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(30,)),     # 30 input features per sample
    keras.layers.Dense(20, activation='relu'),   # hidden layer with 20 neurons
    keras.layers.Dense(2, activation='sigmoid')  # one output neuron per class
])
# compiling the Neural Network
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# training the Neural Network
history = model.fit(x_train_std, y_train, validation_split=0.1, epochs=10)
loss, accuracy = model.evaluate(x_test_std, y_test)
print(accuracy)
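Because the output layer has two neurons, each prediction is a pair of scores rather than a single label. A minimal sketch of turning such rows into class labels is shown below; the rows are made-up stand-ins for what `model.predict(x_test_std)` would return, and `to_label` is a hypothetical helper.

```python
# Hypothetical score pairs shaped like the rows returned by model.predict(...)
predictions = [[0.91, 0.12], [0.08, 0.97], [0.40, 0.55]]

# sklearn's encoding for this dataset: 0 = malignant, 1 = benign
class_names = ["malignant", "benign"]

def to_label(scores):
    """Return the index of the highest score (an argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])

labels = [to_label(row) for row in predictions]
names = [class_names[label] for label in labels]

print(labels)
print(names)
```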
The neural network achieved a test accuracy of 95.6%, an improvement over the logistic regression model’s test accuracy of 92.98%.
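To put these percentages in perspective, the arithmetic below works out how many test samples each accuracy corresponds to. It assumes the 569-row sklearn dataset and the 80/20 split used above, where `train_test_split` rounds the test set size up; the helper `implied_correct` is purely illustrative.

```python
import math

n_samples = 569                      # rows in sklearn's breast cancer dataset
n_test = math.ceil(0.2 * n_samples)  # test size is rounded up
n_train = n_samples - n_test

def implied_correct(accuracy, n):
    """Number of correct predictions implied by an accuracy on n samples."""
    return round(accuracy * n)

lr_correct = implied_correct(0.9298, n_test)  # logistic regression
nn_correct = implied_correct(0.956, n_test)   # neural network

print(n_train, n_test)
print(lr_correct, nn_correct, nn_correct - lr_correct)
```

On a test set of this size, the gap between the two models comes down to a handful of samples, so a different `random_state` could plausibly shift it.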
In conclusion, this exploration of breast cancer prediction followed two distinct paths: logistic regression and neural networks. The initial project with logistic regression yielded a commendable 92.9% accuracy, showcasing the strength of traditional methods. Shifting to a neural network raised test accuracy to 95.6%, suggesting that it can pick up patterns the linear model misses. While not without challenges, working through the complexities of neural networks provided valuable insights for medical diagnostics. Looking ahead, combining models or exploring more advanced architectures could further enhance predictive capabilities. Thank you for joining me on this exploration; the intersection of machine learning and healthcare promises exciting advancements on the horizon.