![](https://crypto4nerd.com/wp-content/uploads/2023/07/08EGMovfyknKNlO-b-1024x534.jpg)
Methodology
The present project focuses on the analysis of driver behavior using deep learning techniques. Deep learning makes it possible to identify the complex and volatile patterns in the data collected from drivers, leading to a better understanding of the causes and consequences of aggressive driving behavior. A system that understands and predicts these causes and consequences will enable the design of preventive and corrective measures to reduce incidents related to aggressive driving.
Objectives:
- Develop a recurrent neural network (RNN) model using the PyTorch library to process accelerometer and gyroscope data sequences and accurately classify driving behavior into slow, normal, and aggressive categories.
- Evaluate the feasibility of using deep learning techniques to automatically detect aggressive driving from accelerometer and gyroscope data, analyzing the accuracy and performance of the model in classifying aggressive behavior.
- Analyze and compare different architectures and Deep Learning model configurations using accelerometer and gyroscope data for the detection of aggressive driving, analyzing their performance in terms of accuracy, recall, and F1-score, with the objective of identifying the optimal model configuration.
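Once a model produces predictions, the metrics named in these objectives can be computed with scikit-learn. The sketch below uses hypothetical labels, and macro averaging is one reasonable choice for the three classes (both are illustrative assumptions, not the project's actual evaluation code):

```python
from sklearn.metrics import accuracy_score, recall_score, f1_score

# Hypothetical ground-truth and predicted labels for the three classes
y_true = ["SLOW", "NORMAL", "AGGRESSIVE", "NORMAL", "AGGRESSIVE"]
y_pred = ["SLOW", "NORMAL", "NORMAL", "NORMAL", "AGGRESSIVE"]

acc = accuracy_score(y_true, y_pred)                    # fraction correct
rec = recall_score(y_true, y_pred, average="macro")     # macro-averaged recall
f1 = f1_score(y_true, y_pred, average="macro")          # macro-averaged F1
print(acc, rec, f1)
```

Macro averaging weights each class equally, which is useful here because the AGGRESSIVE class is the one of primary interest even if it is less frequent.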
Dataset:
The dataset was obtained from Kaggle. It contains 7 quantitative variables (three acceleration axes, three rotation axes, and a timestamp) and 1 qualitative variable (driving behavior).
The variables are described below:
- Acceleration: axis values X, Y, Z (m/s²)
- Rotation: axis values X, Y, Z (°/s)
- Event Timestamp: event time record (s)
- Driver Behavior: SLOW, NORMAL, or AGGRESSIVE
The dataset was collected using a Samsung Galaxy S10 smartphone in a Dacia Sandero 1.4 MPI (75 hp). See the accompanying paper for more information. The dataset is composed of two files, one for training and one for testing, each with the same structure.
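The two CSV files can be loaded with pandas. The sketch below builds a tiny synthetic frame with the same eight columns for illustration; the commented file names are assumptions, not necessarily the dataset's actual names:

```python
import pandas as pd

# Loading the real Kaggle files would look like this (file names assumed):
# df_train = pd.read_csv("train_motion_data.csv")
# df_test = pd.read_csv("test_motion_data.csv")

# A tiny synthetic frame with the same schema, for illustration:
df_train = pd.DataFrame({
    "AccX": [0.1, -0.2], "AccY": [0.0, 0.3], "AccZ": [9.8, 9.7],
    "GyroX": [0.01, 0.02], "GyroY": [-0.01, 0.0], "GyroZ": [0.0, 0.05],
    "Class": ["NORMAL", "AGGRESSIVE"],
    "Timestamp": [1, 2],
})
print(df_train.dtypes)  # six float64 sensor columns, object Class, int64 Timestamp
```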
```
>df_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3644 entries, 0 to 3643
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   AccX       3644 non-null   float64
 1   AccY       3644 non-null   float64
 2   AccZ       3644 non-null   float64
 3   GyroX      3644 non-null   float64
 4   GyroY      3644 non-null   float64
 5   GyroZ      3644 non-null   float64
 6   Class      3644 non-null   object
 7   Timestamp  3644 non-null   int64
dtypes: float64(6), int64(1), object(1)
memory usage: 227.9+ KB
```

```
>df_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3084 entries, 0 to 3083
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   AccX       3084 non-null   float64
 1   AccY       3084 non-null   float64
 2   AccZ       3084 non-null   float64
 3   GyroX      3084 non-null   float64
 4   GyroY      3084 non-null   float64
 5   GyroZ      3084 non-null   float64
 6   Class      3084 non-null   object
 7   Timestamp  3084 non-null   int64
dtypes: float64(6), int64(1), object(1)
memory usage: 192.9+ KB
```
ETL
Data extraction:
First, the appropriate data sources mentioned in the previous section will be identified and selected. These sources may include driving event logs, vehicle sensors, and other variables on driver behavior. The dataset will be obtained in an ethical manner and in compliance with all applicable privacy and data security regulations.
Data transformation:
Once the data has been extracted, the transformation phase will proceed. At this stage, various operations will be performed to clean and prepare the data for analysis. Issues such as format normalization, error correction, elimination of duplicate data, and identification of inconsistent or atypical data will be addressed. In addition, specific data cleaning techniques for handling null values, missing values, or incomplete data that may affect the quality of the results will be explored.
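A minimal pandas sketch of these cleaning steps; the toy frame and the integer label mapping are illustrative assumptions:

```python
import pandas as pd

# Toy frame with a duplicate row and a missing sensor value
df = pd.DataFrame({
    "AccX": [0.1, 0.1, None],
    "Class": ["NORMAL", "NORMAL", "AGGRESSIVE"],
})

df = df.drop_duplicates()  # remove exact duplicate rows
df = df.dropna()           # drop rows with missing sensor values

# Encode the qualitative target as integers (mapping is an assumption)
label_map = {"SLOW": 0, "NORMAL": 1, "AGGRESSIVE": 2}
df["label"] = df["Class"].map(label_map)
```

In practice, outlier handling (e.g. clipping extreme accelerometer spikes) and per-feature normalization would also be applied at this stage.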
Data analysis:
Once the data have been cleaned and properly prepared, exploratory data analysis will proceed. A detailed study will be conducted to identify patterns, trends, and measures that may influence the study of aggressive driver behavior. Statistical and visual techniques will be used to better understand the distribution of the data and to detect possible correlations or significant relationships.
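As a small illustration, class balance and per-class sensor statistics can be inspected with pandas (toy data shown, not the real measurements):

```python
import pandas as pd

# Toy data standing in for the cleaned sensor frame
df = pd.DataFrame({
    "AccX": [0.1, 2.5, -0.3, 3.1],
    "Class": ["NORMAL", "AGGRESSIVE", "NORMAL", "AGGRESSIVE"],
})

counts = df["Class"].value_counts()                           # class balance
per_class = df.groupby("Class")["AccX"].agg(["mean", "std"])  # per-class stats
print(counts)
print(per_class)
```

Comparing per-class means and spreads like this is a first hint of which sensor axes separate aggressive from normal driving.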
Dataset validation and quality:
During the whole ETL and analysis process, particular consideration will be placed on the quality of the resulting dataset. Testing and validation will be applied to ensure that the data are consistent and representative of the target population. In addition, the quality of the dataset will be evaluated with the objective of minimizing biases or errors that may influence the final results.
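Such consistency checks can be expressed as simple assertions; a sketch with toy frames (the checks themselves, not the data, are the point):

```python
import pandas as pd

# Toy stand-ins for the cleaned train/test frames
df_train = pd.DataFrame({"Class": ["SLOW", "NORMAL", "AGGRESSIVE"]})
df_test = pd.DataFrame({"Class": ["NORMAL", "AGGRESSIVE"]})

# No missing labels in either split
assert df_train["Class"].notna().all()
assert df_test["Class"].notna().all()

# The test split must not contain classes unseen during training
assert set(df_test["Class"]) <= set(df_train["Class"])
```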
The use of ETL and data quality analysis will provide a reliable, clean dataset, which will lay the groundwork for a rigorous and exhaustive study of aggressive driver behavior using deep learning techniques.
LSTM Neural Network Model in PyTorch
Before proceeding with the training of the LSTM model, we define the following parameters:
```python
# Parameter configuration
input_size = X_train.shape[1]
hidden_size = 64
batch_size = 32
learning_rate = 0.001
num_epochs = 10
num_frames = 1
```
- `input_size`: the input dimension, i.e. the number of features in each training sample.
- `hidden_size`: the size of the hidden space, i.e. the number of hidden units in the LSTM layer, which controls the complexity and ability of the model to capture patterns in the data.
- `batch_size`: the number of samples per batch used for forward and backward propagation during training, which affects training speed and stability.
- `learning_rate`: the learning rate of the Adam optimizer, which determines the size of the steps taken in the opposite direction of the gradient during training.
- `num_epochs`: the number of times the model will traverse the entire training set during the training process.
- `num_frames`: the number of time frames used to construct a continuous data sequence, which is relevant in the context of time-series prediction.
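Since `nn.LSTM` with `batch_first=True` expects input of shape `(batch, seq_len, input_size)`, the flat feature matrix has to be reshaped into sequences of `num_frames` steps before training. A sketch with random data (the eight samples and six features are illustrative assumptions):

```python
import numpy as np
import torch

num_frames = 1  # sequence length, as in the configuration above
X = np.random.randn(8, 6).astype("float32")  # 8 samples, 6 sensor features

# Reshape to (batch, seq_len, input_size) as expected by batch_first LSTMs
X_seq = torch.from_numpy(X).view(-1, num_frames, X.shape[1])
print(X_seq.shape)  # torch.Size([8, 1, 6])
```

With `num_frames = 1` each sample is a length-one sequence; a larger value would group consecutive readings into longer windows at the cost of fewer training samples.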
The implementation of the LSTM neural network model in PyTorch and the training procedure are described in this part of the methodological process.
```python
# Define the model
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(LSTMModel, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initialize the hidden and cell states with zeros
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        # Pass the initial states explicitly to the LSTM layer
        out, _ = self.lstm(x, (h0, c0))
        # Classify using the output of the last time step
        out = self.fc(out[:, -1, :])
        return out

# Instantiate the model
model = LSTMModel(input_size, hidden_size, num_classes)
```
We've defined a class called LSTMModel, which represents the LSTM neural network model used in the analysis. This class inherits from nn.Module, the base class for all PyTorch models, and consists of an LSTM layer and a linear (fully connected) layer for the final classification.
The forward function defines the sequence of operations executed when passing data through the model. The hidden and cell states are initialized with zeros, the input sequence x is passed through the LSTM layer, and the output of the last time step is taken to perform the classification using the linear layer.
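A quick way to verify this flow is to push a dummy batch through the model and check the output shape. The sketch below re-declares the class so it is self-contained; the six input features and three classes are assumptions based on the dataset description:

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])  # classify from the last time step

model = LSTMModel(input_size=6, hidden_size=64, num_classes=3)
dummy = torch.randn(32, 1, 6)  # (batch, seq_len, features)
logits = model(dummy)
print(logits.shape)  # torch.Size([32, 3]): one logit per class per sample
```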
After defining the model, we proceed to configure the loss function (criterion) and the Adam optimizer that will be used during the training process:
```python
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
```
The training process is carried out for a specific number of epochs (num_epochs) in an iterative loop. During each epoch, the model is adjusted using the training set (train_loader) and the training loss is calculated:
```python
train_loss_history = []  # track the training loss per epoch

for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    train_loss_history.append(loss.item())
```
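After training, an evaluation pass over the test set would typically look like the sketch below. The model and tensors here are synthetic placeholders standing in for the trained LSTM and the real test loader, so only the structure of the loop is meaningful:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the real test tensors (shapes are assumptions)
X_test = torch.randn(64, 1, 6)
y_test = torch.randint(0, 3, (64,))
test_loader = DataLoader(TensorDataset(X_test, y_test), batch_size=32)

# Placeholder model; in the project this would be the trained LSTMModel
model = nn.Sequential(nn.Flatten(), nn.Linear(6, 3))
model.eval()

correct = total = 0
with torch.no_grad():  # no gradient tracking during evaluation
    for inputs, labels in test_loader:
        preds = model(inputs).argmax(dim=1)  # predicted class per sample
        correct += (preds == labels).sum().item()
        total += labels.size(0)

accuracy = correct / total
print(f"test accuracy: {accuracy:.3f}")
```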