Recognizing Handwritten Digits with Scikit-learn under Data Analytics using Python

Royce Dcunha
Jan 31, 2021

Data analytics is the science of analyzing raw data in order to draw conclusions about that information. Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data, and it also focuses on applying those patterns toward effective decision-making. It is especially valuable in areas rich with recorded information, where it relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance. Data analysis is not limited to numbers and strings: images and sounds can also be analyzed and classified.

Scikit-learn (formerly scikits.learn, also known as sklearn) is a free machine learning library for the Python programming language. It provides a range of supervised and unsupervised learning algorithms through a consistent interface, including classification, regression, and clustering algorithms such as support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and it is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Here we are going to analyze the digits dataset that ships with scikit-learn. We will train a Support Vector Machine (SVM) and then use it to predict the values of unknown handwritten digits.

Here we use a Jupyter notebook to perform the operations. Let's start by importing the required libraries.
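The original notebook cells are not shown, so the following is a minimal sketch of this first step. It assumes only scikit-learn is installed and loads the bundled digits dataset with datasets.load_digits(), which returns a Bunch object holding the images, flattened pixel rows, and labels.

```python
# Assumed imports for this walkthrough; the original notebook cells are not shown
from sklearn import datasets, svm

# load_digits() returns a Bunch with the 8x8 images, flattened data rows, and labels
digits = datasets.load_digits()

# List the fields available on the Bunch (images, data, target, DESCR, ...)
print(digits.keys())
```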

There are 1,797 images in total in the dataset.
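One way to confirm that count is to look at the length and shape of the images array. This is a small sketch; the variable name n_images is our own.

```python
from sklearn import datasets

digits = datasets.load_digits()

# images holds one 8x8 array per sample, so its length is the number of images
n_images = len(digits.images)
print(n_images)              # 1797
print(digits.images.shape)   # (1797, 8, 8)
```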

Our whole dataset is stored in digits. Below is an example of a digit from our dataset: it consists of 64 pixels (8x8). The dataset contains images of handwritten digits in 10 classes, where each class refers to a digit from 0 to 9. Each image is stored as an 8x8 matrix, as follows (for the digit 0):
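To print that matrix yourself, a short sketch like the one below should work. It shows the first sample, which is labelled 0, and checks that digits.data holds the same 64 pixel values flattened into one row. Pixel intensities in this dataset are integers from 0 to 16, stored as floats.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# The first sample in the dataset is a handwritten 0
first_image = digits.images[0]
print(digits.target[0])  # 0
print(first_image)       # 8x8 array of intensities between 0 and 16

# digits.data stores the same pixels flattened into a 64-element row
print(np.array_equal(digits.data[0].reshape(8, 8), first_image))  # True
```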

Matrix values for digit 0

Each of the 1,797 images is 8x8 pixels in size and shows a handwritten digit in grayscale, as shown in the figure below.

Pixel image in dataset
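A picture like this can be produced with Matplotlib's imshow. The sketch below assumes Matplotlib is installed and uses the reversed gray colormap, so a value of 0 is drawn white and 16 is drawn black. The colormap and title are our choices, not necessarily the original notebook's.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Draw the first 8x8 image; nearest interpolation keeps the pixels blocky
plt.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation="nearest")
plt.title(f"Label: {digits.target[0]}")
plt.show()
```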

Let us train our SVM with the first 1790 images in our dataset. After that, we will use the remaining images as our test data and check the accuracy of the trained model.

The test data consists of 6 images of 64 pixels (8x8) each, each showing a handwritten digit. The test images are displayed below:
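The exact slice the original notebook used for the test set is not shown. The sketch below assumes the six test images are the last six samples (indices 1791 to 1796) and draws them in a single row, each titled with its true label.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Assumption: the six test images are the last six samples (indices 1791-1796)
test_images = digits.images[1791:]
test_labels = digits.target[1791:]

fig, axes = plt.subplots(1, 6, figsize=(9, 2))
for ax, image, label in zip(axes, test_images, test_labels):
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(str(label))  # true label of each test image
    ax.axis("off")
plt.show()
```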

Let's fit our model using an SVM classifier. We use the first 1790 images to train the model and the remaining ones for validation.
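A minimal sketch of the fit-and-predict step is below. Its hyperparameters (gamma=0.001, C=100) are an assumption, since the original values are not shown; they are commonly used with this dataset. Training uses the first 1790 flattened rows. To keep the test set at exactly six images, we predict on indices 1791 onward and leave index 1790 unused.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Assumed hyperparameters; the original notebook's settings are not shown
clf = svm.SVC(gamma=0.001, C=100.0)

# Train on the first 1790 flattened images (64 features each) and their labels
clf.fit(digits.data[:1790], digits.target[:1790])

# Predict the last six images, which the model never saw during training
predicted = clf.predict(digits.data[1791:])
print("Predicted:", predicted)
print("Target:   ", digits.target[1791:])
```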

Prediction by the model

As we can see, the predicted and target values are the same for this data. Let's check the model's predictions on the test data more systematically.
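Rather than comparing the two arrays by eye, we can score them with sklearn.metrics.accuracy_score, which returns the fraction of labels that match. As before, the hyperparameters and the six-image test slice are assumptions.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.0)  # assumed hyperparameters
clf.fit(digits.data[:1790], digits.target[:1790])

# Score the predictions for the held-out images against their true labels
predicted = clf.predict(digits.data[1791:])
acc = accuracy_score(digits.target[1791:], predicted)
print(acc)

# Element-wise view of which predictions matched the targets
print(predicted == digits.target[1791:])
```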

As we can see, we have achieved 100% accuracy. Let us now define a function that computes the accuracy of our SVM, and train the model on training sets of varying size. We will start with 3 training images and work our way up to 1790, storing the accuracy of each model in a dictionary.

Accuracy function
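The original function is only shown as a screenshot, so here is a hedged sketch of the same idea. The function name svm_accuracy is hypothetical, the hyperparameters are assumed, and the model is scored on the same six held-out images as above. To keep the runtime short, the sketch steps the training size by 50 from 3 up to 1790 instead of trying every size. The step is our assumption.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score

digits = datasets.load_digits()
X_test, y_test = digits.data[1791:], digits.target[1791:]

def svm_accuracy(n_train):
    """Hypothetical helper: train on the first n_train images, score on the test images."""
    clf = svm.SVC(gamma=0.001, C=100.0)  # assumed hyperparameters
    clf.fit(digits.data[:n_train], digits.target[:n_train])
    return accuracy_score(y_test, clf.predict(X_test))

# Training sizes from 3 to 1790 (step of 50 is an assumption to save time)
sizes = list(range(3, 1790, 50)) + [1790]
values = {}
for n_train in sizes:
    values[n_train] = svm_accuracy(n_train)

print(values)
```

Starting at 3 works because the first three samples are the digits 0, 1 and 2, so even the smallest training set contains more than one class, which SVC requires.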

The values dictionary holds the accuracy of every model we trained.

Let us plot the dictionary to visualize the accuracy results.

Accuracy of the model
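The sketch below rebuilds the accuracy dictionary under the same assumptions as before (step of 50, assumed hyperparameters, last six images as the test set) and plots accuracy against training-set size. It uses SVC.score, which returns the mean accuracy on the given data. The axis labels are our own.

```python
import matplotlib.pyplot as plt
from sklearn import datasets, svm

digits = datasets.load_digits()
X_test, y_test = digits.data[1791:], digits.target[1791:]

# Rebuild the accuracy dictionary (assumed step of 50 between training sizes)
sizes = list(range(3, 1790, 50)) + [1790]
values = {}
for n in sizes:
    clf = svm.SVC(gamma=0.001, C=100.0).fit(digits.data[:n], digits.target[:n])
    values[n] = clf.score(X_test, y_test)  # mean accuracy on the test images

plt.plot(list(values.keys()), list(values.values()), marker="o")
plt.xlabel("Number of training images")
plt.ylabel("Accuracy on test images")
plt.title("Accuracy of the model")
plt.show()
```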

Conclusion: As we can clearly see, well above 95% of our models achieved 100% accuracy. Hence we can conclude that, on this dataset, our model classifies the test digits correctly for more than 95% of the training-set sizes we tried. Using the scikit-learn library in Python makes this kind of data analysis easy and effective, and it takes less time.
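The "well above 95%" figure can be checked by counting how many entries in the accuracy dictionary are exactly 1.0. The sketch below reuses the earlier assumptions: a step of 50, assumed hyperparameters, and the last six images as the test set. Because it samples fewer training sizes than the original sweep, the share it prints may not match the author's number exactly.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
X_test, y_test = digits.data[1791:], digits.target[1791:]

# Rebuild the accuracy dictionary (assumed step of 50 and assumed hyperparameters)
sizes = list(range(3, 1790, 50)) + [1790]
values = {
    n: svm.SVC(gamma=0.001, C=100.0)
    .fit(digits.data[:n], digits.target[:n])
    .score(X_test, y_test)
    for n in sizes
}

# Share of trained models that classified every test image correctly
perfect_share = sum(acc == 1.0 for acc in values.values()) / len(values)
print(f"{perfect_share:.1%} of models reached 100% accuracy")
```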
