![](https://crypto4nerd.com/wp-content/uploads/2023/08/0yExniM9wK_TQ3_gr-1024x683.jpeg)
We may not enjoy those moments when we hit errors during model building, but believe me, errors are one of the most fascinating ways to learn how an internal system or program works. Machine learning is no exception.
While tackling varied problems and building solutions around them, we encounter many issues, and that is how we gain experience and become better versions of ourselves. I may sound a little philosophical, but it's true, and you may agree with me at least on this point. In this article, I will describe such scenarios. I intend to make this an ongoing article, meaning I will keep adding occurrences as I run into them. I would also be grateful if you suggest some in the comments.
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Occurrence: Walmart Sales Forecast. While creating the FeatureTransformer pipeline, I hit this error when I performed missing-value imputation before the actual feature transformation. It worked fine once I placed the imputation after the transformation.
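In general, this IndexError means an array was indexed with non-integer values. A minimal sketch reproducing the message (a hypothetical example with float column indices, not the Walmart pipeline itself):

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)

# Float-valued indices are rejected by NumPy with this exact IndexError
cols = np.array([0.0, 2.0])
try:
    X[:, cols]
except IndexError as e:
    print(e)

# Casting to integers (or using a boolean mask) resolves it
print(X[:, cols.astype(int)])
```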
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
StratifiedKFold is not a valid choice for a regression problem, hence this error: stratification needs discrete class labels, but a regression target is continuous. I used KFold instead and the issue was resolved. Occurrence: Walmart Sales Forecast.
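A minimal sketch of the fix on synthetic data: StratifiedKFold rejects a continuous target, while KFold splits purely by position and works fine.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)                        # continuous target

# StratifiedKFold needs discrete class labels and raises the ValueError
try:
    list(StratifiedKFold(n_splits=5).split(X, y))
except ValueError as e:
    print(e)

# KFold does not look at the target values at all, so it works for regression
scores = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=5))
```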
R2 negative
An awkward implementation of the model could cause this, but that is very unlikely. The majority of the time it happens due to a human typo. What I mean is: check the previous line before calculating R2 and verify whether you have a "-(np.mean(rmse)" calculation for RMSE and pasted the same line for R2 as well, completely oblivious to the negative (-) sign in front of np.mean(). There is one more scenario: it can happen if your test and training data are on different scales, i.e., you have performed scaling on your training set via StandardScaler, for example, but have done nothing to the test data while evaluating your model.
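A short sketch of the second scenario on a synthetic linear target: the scaler is fitted on the training set only, and R2 collapses to a strongly negative value if the test features are left unscaled at evaluation time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=100, scale=20, size=(200, 2))      # features far from zero mean
y = X @ np.array([1.5, -2.0]) + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)                 # fit on training data only
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# Wrong: predicting on unscaled test data gives a strongly negative R2
r2_bad = r2_score(y_test, model.predict(X_test))

# Right: apply the same fitted scaler to the test set before predicting
r2_good = r2_score(y_test, model.predict(scaler.transform(X_test)))
```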
TypeError: ‘SimpleImputer’ object is not iterable
Happens particularly when you miss the brackets and tuples during pipeline creation: Pipeline expects a list of (name, estimator) steps, so passing a bare SimpleImputer makes it try, and fail, to iterate over the estimator itself.
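A minimal reproduction, assuming scikit-learn's Pipeline: passing the imputer directly instead of a list of (name, estimator) tuples triggers the TypeError at fit time.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Wrong: the estimator is passed directly instead of a list of steps
err = None
try:
    Pipeline(SimpleImputer()).fit([[1.0], [2.0]], [0.0, 1.0])
except TypeError as e:
    err = e
    print(e)

# Right: a list of (name, estimator) tuples
pipe = Pipeline([("impute", SimpleImputer()), ("model", LinearRegression())])
pipe.fit([[1.0], [2.0], [np.nan]], [1.0, 2.0, 3.0])
```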
MemoryError: Unable to allocate 23.5 GiB for an array with shape (56209, 56209) and data type float64
I encountered this error when I tried to fit a QuantileRegressor() model: the solver attempted to allocate a dense (56209, 56209) matrix, i.e., n_samples by n_samples, which requires 23.5 GiB of memory.
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
rf.fit(X_train, y_train.values.ravel()) # use y_train.values.ravel() instead of just y_train
cross_val_score error: ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
cross_val_score returns nan values for a multiclass classifier if scoring is 'recall', 'precision' or 'f1', because those string scorers default to average='binary'. Use the averaged variants such as 'f1_macro' instead. Check the sklearn documentation for details.
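A sketch on the iris dataset (three classes): the plain "f1" scorer assumes a binary target, while the "f1_macro" variant handles multiclass.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)          # three-class target

# scoring="f1" implies average="binary" and fails for multiclass targets;
# "f1_macro" averages the per-class F1 scores instead
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="f1_macro", cv=5)
print(scores)
```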
Choosing cross-validation technique for a regression problem
When selecting a cross-validation scheme for a regression problem, most people go for plain KFold because the target values are continuous. This leads to a random split into train and validation sets and fails to ensure a similar distribution of target values in train and validation. A common remedy is to bin the target and stratify on the bins.
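One standard trick for balanced folds with a continuous target (assuming pandas and scikit-learn) is to bin the target into quantile buckets and run StratifiedKFold on the bin labels, so every fold sees a similar spread of target values:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
y = rng.normal(size=500)                   # continuous target

# Discretize the target into 10 quantile bins, then stratify on the bins
bins = pd.qcut(y, q=10, labels=False)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(skf.split(np.zeros(len(y)), bins))
```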
Thanks for reading!
I hope this article is helpful and gives you the necessary info to continue your machine learning journey.
Hit a “clap” and follow Debmalya Mondal for such content.