Recognizing Handwritten Digits with Scikit-learn

Recognizing handwritten text is a problem that can be traced back to the first automatic machines that needed to recognize individual characters in handwritten documents.

To address this problem in Python, the scikit-learn library provides a good example for understanding the technique, the issues involved, and the possibility of making predictions.

The scikit-learn library provides numerous datasets that are useful for testing data analysis and prediction problems. One of them is a dataset of images called Digits, which consists of 1,797 images that are 8x8 pixels in size.

Our goal in this internship project is to predict a numeric value by reading and interpreting an image of a handwritten digit.

So let’s get started step by step.

1. Import all the necessary libraries and load the Digits dataset using scikit-learn's loader function.
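A minimal sketch of this step, assuming the loader in question is `datasets.load_digits()` (the article does not name the function):

```python
from sklearn import datasets, svm

# Assumption: load_digits() is the loader the step refers to.
digits = datasets.load_digits()

# Each image is an 8x8 grid; digits.data holds the same pixels flattened to 64 features.
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)
```

The flattened `digits.data` array is what a classifier consumes, while `digits.images` is convenient for plotting.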

2. Our dataset is stored in digits.

3. Let us train our SVM with the first 1790 images in our dataset. After that, we will use the remaining images as our test data and check the accuracy of our trained model.
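The split described in this step might look like the sketch below. The hyperparameters `gamma=0.001` and `C=100.0` are assumptions for illustration, since the article does not state the values it used.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Assumed hyperparameters; the article does not state them.
svc = svm.SVC(gamma=0.001, C=100.0)

# First 1790 images for training, everything after that for testing.
X_train, y_train = digits.data[:1790], digits.target[:1790]
X_test, y_test = digits.data[1790:], digits.target[1790:]

print(len(X_train), len(X_test))  # 1790 7
```

With this slicing, seven images are held out. The figure caption in the article mentions six, so the original code may have sliced the test set slightly differently.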

4. Let's now look at the digits.

5. Let's see how our digits look.

The six digits of the validation set
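One way to draw the held-out digits is sketched here with matplotlib. It assumes the test images are everything from index 1790 onward, which gives seven images rather than the six named in the caption.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Assumption: the held-out images start at index 1790.
held_out_images = digits.images[1790:]
held_out_labels = digits.target[1790:]

fig, axes = plt.subplots(1, len(held_out_images), figsize=(10, 2))
for ax, image, label in zip(axes, held_out_images, held_out_labels):
    # Inverted grayscale so that ink appears dark on a light background.
    ax.imshow(image, cmap="gray_r", interpolation="nearest")
    ax.set_title(str(label))
    ax.axis("off")
```

Calling `plt.show()` afterwards displays the figure in an interactive session.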

6. Each dataset in the scikit-learn library has a field containing all the information.
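As a sketch, and assuming the field meant here is the `DESCR` attribute that scikit-learn's bundled datasets carry, the description can be printed like this:

```python
from sklearn import datasets

digits = datasets.load_digits()

# DESCR is a plain-text description of the dataset (assumed to be the field the step means).
print(digits.DESCR)

# The other available fields can be listed as well.
print(list(digits.keys()))
```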

7. Fit the model.
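Fitting and checking the predictions could be sketched as follows. The hyperparameters and the exact test slice are assumptions, so the printed accuracy may not match the article's figure exactly.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score

digits = datasets.load_digits()

# Assumed hyperparameters; the article does not state them.
svc = svm.SVC(gamma=0.001, C=100.0)

# Train on the first 1790 images.
svc.fit(digits.data[:1790], digits.target[:1790])

# Predict the held-out images and compare with the true labels.
predicted = svc.predict(digits.data[1790:])
expected = digits.target[1790:]

print("predicted:", predicted)
print("expected: ", expected)
print("accuracy: ", accuracy_score(expected, predicted))
```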

As we can see, we have achieved 100% accuracy on the test images.

8. Let us now define a function that trains our SVM on a training set of a given size and returns its accuracy. We will start with 3 elements in our training data, work our way up to 1790, and store the accuracy of each model in a dictionary.
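A rough sketch of such a function is shown below. The name `holdout_accuracy`, the hyperparameters, and the step of 100 between training sizes are all assumptions made here; the step is only there to keep the run short, and a step of 1 would cover every size from 3 to 1790 as the text describes.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()


def holdout_accuracy(n_train, test_start=1790):
    """Train on the first n_train images and score on the images from test_start onward."""
    clf = svm.SVC(gamma=0.001, C=100.0)  # assumed hyperparameters
    clf.fit(digits.data[:n_train], digits.target[:n_train])
    return clf.score(digits.data[test_start:], digits.target[test_start:])


# Training sizes from 3 up to 1790 (coarse step chosen here for speed).
sizes = list(range(3, 1790, 100)) + [1790]
accuracies = {n: holdout_accuracy(n) for n in sizes}

for n, acc in accuracies.items():
    print(n, acc)
```

The first three images carry three different labels, so even the smallest training set contains more than one class, which the classifier needs in order to fit.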

9. Visualizing Dataset.

Accuracy vs size of training-set
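A plot like the one captioned above might be produced with the following sketch. It recomputes the accuracies on a coarse grid of training sizes, and both the grid and the hyperparameters are assumptions rather than the article's settings.

```python
import matplotlib.pyplot as plt
from sklearn import datasets, svm

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Coarse grid of training sizes (assumed); the test set is everything from index 1790.
sizes = list(range(3, 1790, 100)) + [1790]
accuracies = {
    n: svm.SVC(gamma=0.001, C=100.0).fit(X[:n], y[:n]).score(X[1790:], y[1790:])
    for n in sizes
}

fig, ax = plt.subplots()
ax.plot(list(accuracies.keys()), list(accuracies.values()), marker="o")
ax.set_xlabel("Size of training set")
ax.set_ylabel("Accuracy on held-out images")
ax.set_title("Accuracy vs size of training set")
```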

10. Let's plot a heatmap.
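The article does not say which quantity the heatmap shows. As one plausible illustration, the sketch below draws a confusion matrix of the held-out predictions. The choice of a confusion matrix, the hyperparameters, and the test slice are all assumptions.

```python
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix

digits = datasets.load_digits()

svc = svm.SVC(gamma=0.001, C=100.0)  # assumed hyperparameters
svc.fit(digits.data[:1790], digits.target[:1790])
predicted = svc.predict(digits.data[1790:])
expected = digits.target[1790:]

# Rows are true digits, columns are predicted digits (0 through 9).
cm = confusion_matrix(expected, predicted, labels=list(range(10)))

fig, ax = plt.subplots()
im = ax.imshow(cm, cmap="Blues")
fig.colorbar(im, ax=ax)
ax.set_xlabel("Predicted digit")
ax.set_ylabel("True digit")
ax.set_xticks(range(10))
ax.set_yticks(range(10))
```

With only a handful of held-out images the matrix is mostly empty. Any count off the diagonal would mark a misclassified digit.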

11. Plot the data.


From this article, we can see how easy it is to import a dataset, build a model using scikit-learn, train the model, make predictions with it, and find the accuracy of our predictions (which in our case is 95.11%).

More than 95% of our models achieved 100% accuracy. Hence we can conclude that our model works more than 95% of the time.

I am thankful to mentors at for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you
