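CNN-Based Recognition of Multiple Handwritten Digits

Preparation

Development Environment

Python+TensorFlow+OpenCV+Anaconda

Version compatibility between these packages is left to the reader; installing the latest release of each generally works.

Extracting the Digit Images

Given an image that contains many digits, each digit has to be extracted and recognized separately. I used the following steps to extract them.

Canny Edge Detection

To extract features from an image, the first step is to find its edges. A common approach is Canny edge detection, done here with OpenCV. Before that, the image is converted to grayscale to make the edges easier to extract.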

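import cv2

# Read the original image (OpenCV loads it in BGR channel order)
image = cv2.imread('./image/test3.png')
# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Run Canny edge detection with hysteresis thresholds 50 and 100
edges = cv2.Canny(gray_image, 50, 100)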
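The two numbers passed to cv2.Canny are the lower and upper hysteresis thresholds. As a rough illustration of what they do, here is a simplified one-dimensional sketch. It is not OpenCV's implementation; the function name hysteresis_1d and the sample gradient values are made up for this example. A pixel whose gradient magnitude reaches the upper threshold is kept, one below the lower threshold is dropped, and one in between is kept only if it connects to a kept pixel.

```python
def hysteresis_1d(magnitudes, low=50, high=100):
    """Simplified 1-D hysteresis thresholding (illustrative only)."""
    n = len(magnitudes)
    # Strong pixels are kept unconditionally
    keep = [m >= high for m in magnitudes]
    # Weak pixels are candidates that need a kept neighbour
    weak = [low <= m < high for m in magnitudes]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if weak[i] and not keep[i]:
                left = i > 0 and keep[i - 1]
                right = i < n - 1 and keep[i + 1]
                if left or right:
                    keep[i] = True
                    changed = True
    return keep


# Hypothetical gradient magnitudes along one row of an image
print(hysteresis_1d([10, 60, 120, 70, 20, 60, 30]))
# [False, True, True, True, False, False, False]
```

The weak value 60 next to the strong value 120 survives, while the isolated 60 further right is discarded. Lowering both thresholds keeps fainter strokes at the cost of more noise.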

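Contour Detection

Once the edges have been extracted with Canny, we can extract the outline of each digit. OpenCV provides a dedicated function for this, cv2.findContours. For a digit we only need its outermost contour, but this runs into a problem: a digit such as 8 has several contours and may in some cases be detected as multiple parts. To avoid this, while processing the contours I check whether a contour's bounding box lies inside another one, and skip it if so.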

# Find the contours
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Bounding boxes of all contours
recognized_rectangles = [(x, y, w, h) for x, y, w, h in map(cv2.boundingRect, contours)]
# Copy of the original image to draw the boxes on
contour_image = image.copy()

for contour in contours:
    # Bounding box of this contour
    x, y, w, h = cv2.boundingRect(contour)
    # True if this box lies strictly inside another contour's box
    overlapping = any(
        rx < x and rx + rw > x + w and ry < y and ry + rh > y + h
        for rx, ry, rw, rh in recognized_rectangles
    )
    if not overlapping:
        # Crop one digit with a small margin (start indices clamped at 0 so they never wrap around)
        square_image = image[max(y - 5, 0):y + h + 10, max(x - 5, 0):x + w + 10]
        # Draw the box on the copy
        cv2.rectangle(contour_image, (x - 5, y - 5), (x + w + 10, y + h + 10), (0, 255, 0), 2)
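The containment test in the loop above can be tried on its own without OpenCV. The sketch below uses hand-written (x, y, w, h) boxes; the helper names is_strictly_inside and drop_nested and the box values are made up for illustration. The first box stands for the outline of an 8, the next two for its inner loops, and the last for a separate digit.

```python
def is_strictly_inside(inner, outer):
    """True if box `inner` lies strictly inside box `outer` (boxes are x, y, w, h)."""
    x, y, w, h = inner
    rx, ry, rw, rh = outer
    return rx < x and rx + rw > x + w and ry < y and ry + rh > y + h


def drop_nested(boxes):
    """Keep only the boxes that are not contained in any other box."""
    return [b for b in boxes if not any(is_strictly_inside(b, o) for o in boxes)]


boxes = [
    (10, 10, 30, 50),  # outer outline of an "8"
    (18, 15, 12, 14),  # upper loop of the "8"
    (18, 35, 12, 14),  # lower loop of the "8"
    (60, 12, 20, 48),  # a separate digit
]
print(drop_nested(boxes))
# [(10, 10, 30, 50), (60, 12, 20, 48)]
```

Because every comparison is strict, a box is never considered to be inside itself, so the list of all boxes can be reused as the list of candidate outer boxes.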

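Image Preprocessing

After each digit image has been extracted, it still has to be preprocessed to match what the model expects. The model input has shape (28, 28, 1). Note also that MNIST images are white digits on a black background, so our images must be converted to the same form, and that although each MNIST image is 28*28, the digit itself occupies only a 20*20 region.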
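The 20*20 region inside a 28*28 image suggests a padding ratio of 28 / 20 = 1.4, which matches the factor that predict_one uses for its square canvas. The arithmetic can be sketched as follows; the helper name padded_canvas and the 50*30 crop size are assumptions for this example only.

```python
def padded_canvas(height, width, scale=1.4):
    """Side of the square canvas and the offsets that center the crop on it."""
    side = int(max(height, width) * scale)
    x_start = (side - width) // 2
    y_start = (side - height) // 2
    return side, x_start, y_start


# A hypothetical digit crop that is 50 px tall and 30 px wide
side, x_start, y_start = padded_canvas(50, 30)
print(side, x_start, y_start)  # 70 20 10

# After resizing the 70*70 canvas to 28*28, the digit's longer side becomes
print(50 * 28 / side)  # 20.0
```

So a crop padded this way and resized to 28*28 ends up with its longer side at about 20 pixels, the same proportion as the MNIST samples.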

Building the CNN

Here a simple CNN is built with the Keras model API that ships with TensorFlow, loosely following the classic LeNet structure.

from tensorflow.keras import layers, models


def build_cnn(activation):
    # Build the convolutional neural network
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation=activation, input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation=activation))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation=activation))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation=activation))
    # One output per digit class, as probabilities
    model.add(layers.Dense(10, activation='softmax'))
    return model
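To see how the 28*28 input shrinks through the network, the layer sizes and parameter counts can be worked out by hand. The sketch below does this in plain Python, assuming the Keras defaults used in build_cnn (stride 1 and no padding for Conv2D, a 2*2 window for MaxPooling2D); the helper names are mine.

```python
def conv_out(size, kernel=3):
    # Unpadded convolution with stride 1
    return size - kernel + 1


def pool_out(size, pool=2):
    # Non-overlapping max pooling, rounding down
    return size // pool


def conv_params(kernel, in_channels, out_channels):
    # One kernel*kernel*in_channels filter plus one bias per output channel
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(n_in, n_out):
    # One weight per input plus one bias per output unit
    return (n_in + 1) * n_out


size = 28
size = conv_out(size)   # 26
size = pool_out(size)   # 13
size = conv_out(size)   # 11
size = pool_out(size)   # 5
size = conv_out(size)   # 3
flat = size * size * 64  # 576 values after Flatten

total = (conv_params(3, 1, 32) + conv_params(3, 32, 64) + conv_params(3, 64, 64)
         + dense_params(flat, 64) + dense_params(64, 10))
print(size, flat, total)  # 3 576 93322
```

These numbers can be compared against the output of model.summary() on the real model.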

Training the Model

A First, Plain Training Run

def normal_train():
    # train_images, train_labels, test_images, test_labels and showimg
    # are defined in train.py (see the Source Code section)
    model = build_cnn('relu')
    # Compile the model
    # Note: newer Keras optimizers no longer accept `decay`; drop it if it raises an error
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0)
    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=['accuracy'])

    # Define the ModelCheckpoint callback
    checkpoint_path = './normal_trained_model.h5'
    checkpoint = ModelCheckpoint(checkpoint_path,
                                 monitor='val_accuracy',
                                 save_best_only=True,
                                 mode='max',
                                 verbose=1)

    # Train the model and keep the best-performing checkpoint
    res = model.fit(train_images, train_labels, batch_size=64,
                    epochs=10,
                    validation_data=(test_images, test_labels),
                    callbacks=[checkpoint])
    showimg(res.history)
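The CategoricalCrossentropy loss used here expects one-hot labels, which is why train.py passes the labels through to_categorical. What that loss measures can be sketched in plain Python. This is an illustration rather than Keras's implementation, and the probability vectors are invented.

```python
import math


def one_hot(label, num_classes=10):
    """Encode a digit label as a vector with a single 1.0."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(target, predicted, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


target = one_hot(3)

# Hypothetical softmax outputs: one confident and correct, one confident and wrong
good = [0.01] * 10
good[3] = 0.91
bad = [0.01] * 10
bad[8] = 0.91

loss_good = categorical_crossentropy(target, good)
loss_bad = categorical_crossentropy(target, bad)
print(round(loss_good, 4), round(loss_bad, 4))  # 0.0943 4.6052
```

A confident wrong answer is penalized far more heavily than a confident right answer is rewarded, which is what pushes the softmax output toward the correct class during training.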

Predicting an Image

After preprocessing, each image can be fed into the trained model for prediction.

def predict_one(image, model):
    # Invert the colors to get a white digit on a black background
    image = cv2.bitwise_not(image)
    # Get the height and width of the image
    height, width, _ = image.shape
    # Side of the square canvas: the larger dimension, enlarged by 1.4
    side_length = max(height, width)
    out_length = int(side_length * 1.4)
    # Create a black square canvas
    square_image = np.zeros((out_length, out_length, 3), dtype=np.uint8)
    # Compute where to paste the crop so that it is centered
    x_start = (out_length - width) // 2
    y_start = (out_length - height) // 2
    # Paste the crop onto the canvas
    square_image[y_start:y_start + height, x_start:x_start + width] = image
    # Convert to grayscale and resize to 28*28
    gray_image = cv2.cvtColor(square_image, cv2.COLOR_BGR2GRAY)
    image_data = cv2.resize(gray_image, (28, 28))
    # Convert the image data to the shape the model expects: (1, 28, 28, 1)
    image_data_for_prediction = np.array(image_data).astype('float32') / 255.0
    image_data_for_prediction = np.expand_dims(image_data_for_prediction, axis=-1)
    image_data_for_prediction = np.expand_dims(image_data_for_prediction, axis=0)
    # Run the prediction
    prediction = model.predict(image_data_for_prediction)
    prediction_label = np.argmax(prediction)
    prediction_confidence = np.max(prediction)
    # Return the predicted digit and its probability
    return prediction_label, prediction_confidence
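The last two steps of predict_one turn the ten softmax outputs into a label and a confidence value. Since the prediction has shape (1, 10), np.argmax without an axis argument returns the index within the flattened array, which is the digit itself. A plain-Python equivalent, with an invented probability vector and a helper name of my own:

```python
def label_and_confidence(probabilities):
    """Index of the largest probability, and that probability."""
    label = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return label, probabilities[label]


# Hypothetical softmax output for one digit image
probabilities = [0.01, 0.02, 0.05, 0.02, 0.10, 0.05, 0.05, 0.60, 0.05, 0.05]
label, confidence = label_and_confidence(probabilities)
print(f"{label} ({confidence * 100:.2f}%)")  # 7 (60.00%)
```

The confidence is simply the largest softmax value, so it says how strongly the model prefers one class over the others, not how likely the answer is to be right.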

# Start training
import train
train.normal_train()

The plots show that training went fairly well, so next we try to recognize our own image.

import predict
predict.predict_img('./image/test10.png', './normal_trained_model.h5')

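The result shows that some digits were misrecognized, sometimes even with high confidence. After some analysis, the possible causes are:

The digits in MNIST are written in the Western style, and some digits may be written differently here
MNIST has too few samples and their features are too distinctive, so the model is not robust
Something went wrong in the image preprocessing

After checking, the third cause was ruled out, so the conclusion is that the training data is too limited and the training needs to be reworked.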

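Optimizing the Training

Data Augmentation

Data augmentation is a very common technique: the original dataset is transformed in various ways, such as stretching, scaling, and flipping, which effectively increases the amount of data. Here TensorFlow's built-in image augmenter is used. Because these are digit images and a flipped digit is meaningless, no flipping is applied.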

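from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation
datagen = ImageDataGenerator(
    rotation_range=20,  # rotation range in degrees
    width_shift_range=0.1,  # horizontal shift, as a fraction of the width
    height_shift_range=0.1,  # vertical shift, as a fraction of the height
    zoom_range=0.1,  # zoom range
    horizontal_flip=False,  # no horizontal flip
    vertical_flip=False  # no vertical flip
)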
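To get a feel for how large these transformations are on a 28*28 image, the ranges can be converted into concrete numbers. The sketch below only samples parameters in the same ranges; it does not reproduce what ImageDataGenerator does internally, and the function name sample_augmentation is made up.

```python
import random


def sample_augmentation(size=28, rotation_range=20, shift_range=0.1,
                        zoom_range=0.1, rng=random):
    """Draw one set of augmentation parameters within the configured ranges."""
    return {
        # Rotation angle in degrees, in [-20, 20]
        "rotation_deg": rng.uniform(-rotation_range, rotation_range),
        # Shifts are fractions of the image size, so at most 2.8 px here
        "shift_x_px": rng.uniform(-shift_range, shift_range) * size,
        "shift_y_px": rng.uniform(-shift_range, shift_range) * size,
        # Zoom factor in [0.9, 1.1]
        "zoom": rng.uniform(1 - zoom_range, 1 + zoom_range),
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(3):
    print(sample_augmentation(rng=rng))
```

A shift of under 3 pixels and a rotation of at most 20 degrees keep the digit readable while still varying its position and slant.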

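Changing the Activation Function

The original activation function was ReLU. Its advantages are that it is cheap to compute and lets the network converge quickly; its drawback is that it can lead to dead neurons. Here we switch the activation function to swish.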
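For reference, swish is commonly defined as x * sigmoid(x). The plain-Python comparison below shows the difference that matters here: ReLU outputs exactly zero for every negative input, while swish still lets a small negative value through.

```python
import math


def relu(x):
    return max(0.0, x)


def swish(x):
    # x * sigmoid(x), written as x / (1 + e^(-x))
    return x / (1.0 + math.exp(-x))


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}")
```

Because swish is not flat for negative inputs, a unit that receives negative inputs still gets a gradient, which is the usual argument for it avoiding dead neurons.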

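Changing the Optimizer

The original optimizer was SGD, which has problems of its own, such as getting stuck in a local optimum. Here we switch to Adam, an optimizer with adaptive learning rates that suits most situations, though it has more hyperparameters to tune. After several attempts, the learning rate was set to 0.00008.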
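What "adaptive learning rate" means can be seen by comparing one update step of each optimizer on a single parameter. The sketch below is a minimal single-parameter version of the standard Adam update rule, not the Keras implementation; the class name Adam1D is mine, and the beta and epsilon values are the commonly used defaults, assumed here.

```python
import math


def sgd_step(param, grad, lr=0.01):
    """Plain SGD: the step is proportional to the gradient."""
    return param - lr * grad


class Adam1D:
    """Adam for a single scalar parameter (illustrative only)."""

    def __init__(self, lr=0.00008, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of the gradients
        self.v = 0.0  # running mean of the squared gradients
        self.t = 0    # step counter

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


for grad in (0.01, 100.0):
    sgd_move = 0.0 - sgd_step(0.0, grad)
    adam_move = 0.0 - Adam1D().step(0.0, grad)
    print(f"grad={grad}: sgd moves {sgd_move:.6f}, adam moves {adam_move:.8f}")
```

The SGD step grows with the gradient, while the first Adam step is close to the learning rate in both cases, because the gradient is divided by an estimate of its own magnitude. This is one reason a learning rate as small as 0.00008 can still make steady progress.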

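# Train again
import train
train.optimize_train()

After optimizing the training, we run recognition on the image again.

import predict
predict.predict_img('./image/test10.png', './optimize_trained_model.h5')

As the result shows, after this series of optimizations both the overall recognition accuracy and the confidence improved to some extent.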

Source Code

train.py

import tensorflow as tf
from matplotlib import pyplot as plt
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preprocess the data: add a channel axis, scale pixels to [0, 1], one-hot encode the labels
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)


def build_cnn(activation):
    # Build the convolutional neural network
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation=activation, input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation=activation))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation=activation))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation=activation))
    model.add(layers.Dense(10, activation='softmax'))
    return model


def normal_train():
    model = build_cnn('relu')
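    # Compile the model
    # Note: newer Keras optimizers no longer accept `decay`; drop it if it raises an error
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0)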
    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=['accuracy'])

    # Define the ModelCheckpoint callback
    checkpoint_path = './normal_trained_model.h5'
    checkpoint = ModelCheckpoint(checkpoint_path,
                                 monitor='val_accuracy',
                                 save_best_only=True,
                                 mode='max',
                                 verbose=1)

    # Train the model and keep the best-performing checkpoint
    res = model.fit(train_images, train_labels, batch_size=64,
                    epochs=10,
                    validation_data=(test_images, test_labels),
                    callbacks=[checkpoint])
    showimg(res.history)


def optimize_train():
    model = build_cnn('swish')
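    # Data augmentation
    datagen = ImageDataGenerator(
        rotation_range=20,  # rotation range in degrees
        width_shift_range=0.1,  # horizontal shift, as a fraction of the width
        height_shift_range=0.1,  # vertical shift, as a fraction of the height
        zoom_range=0.1,  # zoom range
        horizontal_flip=False,  # no horizontal flip
        vertical_flip=False  # no vertical flip
    )
    # Compile the model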
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.00008)
    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=['accuracy'])

    # Define the ModelCheckpoint callback
    checkpoint_path = './optimize_trained_model.h5'
    checkpoint = ModelCheckpoint(checkpoint_path,
                                 monitor='val_accuracy',
                                 save_best_only=True,
                                 mode='max',
                                 verbose=1)

    # Train on augmented batches and keep the best-performing checkpoint
    res = model.fit(datagen.flow(train_images, train_labels, batch_size=64),
                    epochs=10,
                    validation_data=(test_images, test_labels),
                    callbacks=[checkpoint])
    showimg(res.history)


def showimg(history):
    # Plot training and validation accuracy
    plt.figure(figsize=(12, 6), dpi=326)
    plt.subplot(1, 2, 1)
    plt.plot(history['accuracy'])
    plt.plot(history['val_accuracy'])
    plt.title('Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend(['Train', 'Test'], loc='upper left')

    # Plot training and validation loss
    plt.subplot(1, 2, 2)
    plt.plot(history['loss'])
    plt.plot(history['val_loss'])
    plt.title('Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(['Train', 'Test'], loc='upper left')

    # Show the figure
    plt.show()

predict.py

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import cv2


def load_model(path):
    # Load the trained handwritten-digit model from disk
    model = tf.keras.models.load_model(path)
    return model


def predict_one(image, model):
    # Invert the colors to get a white digit on a black background
    image = cv2.bitwise_not(image)
    # Get the height and width of the image
    height, width, _ = image.shape
    # Side of the square canvas: the larger dimension, enlarged by 1.4
    side_length = max(height, width)
    out_length = int(side_length * 1.4)
    # Create a black square canvas
    square_image = np.zeros((out_length, out_length, 3), dtype=np.uint8)
    # Compute where to paste the crop so that it is centered
    x_start = (out_length - width) // 2
    y_start = (out_length - height) // 2
    # Paste the crop onto the canvas
    square_image[y_start:y_start + height, x_start:x_start + width] = image
    # Convert to grayscale and resize to 28*28
    gray_image = cv2.cvtColor(square_image, cv2.COLOR_BGR2GRAY)
    image_data = cv2.resize(gray_image, (28, 28))
    # Convert the image data to the shape the model expects: (1, 28, 28, 1)
    image_data_for_prediction = np.array(image_data).astype('float32') / 255.0
    image_data_for_prediction = np.expand_dims(image_data_for_prediction, axis=-1)
    image_data_for_prediction = np.expand_dims(image_data_for_prediction, axis=0)
    # Run the prediction
    prediction = model.predict(image_data_for_prediction)
    prediction_label = np.argmax(prediction)
    prediction_confidence = np.max(prediction)
    # Return the predicted digit and its probability
    return prediction_label, prediction_confidence


def find_counter(image):
    # Convert the image to grayscale
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Run Canny edge detection
    edges = cv2.Canny(gray_image, 40, 80)
    # Find the outermost contours
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return contours


def predict_img(img_path, model_path):
    # Load the saved model
    model = load_model(model_path)
    # Read the original image
    image = cv2.imread(img_path)
    contours = find_counter(image)
    # Bounding boxes of all contours
    recognized_rectangles = [(x, y, w, h) for x, y, w, h in map(cv2.boundingRect, contours)]
    # Copy of the original image to draw the boxes and results on
    contour_image = image.copy()

    for contour in contours:
        # Bounding box of this contour
        x, y, w, h = cv2.boundingRect(contour)
        # True if this box lies strictly inside another contour's box
        overlapping = any(
            rx < x and rx + rw > x + w and ry < y and ry + rh > y + h
            for rx, ry, rw, rh in recognized_rectangles
        )
        if not overlapping:
            # Crop one digit with a small margin (start indices clamped at 0 so they never wrap around)
            square_image = image[max(y - 2, 0):y + h + 4, max(x - 2, 0):x + w + 4]
            # Run the crop through the model
            prediction_label, prediction_confidence = predict_one(square_image, model)
            # Draw the box and the predicted label on the copy
            cv2.rectangle(contour_image, (x - 2, y - 2), (x + w + 4, y + h + 4), (0, 255, 0), 2)
            text = f"{prediction_label} ({prediction_confidence * 100:.2f}%)"
            cv2.putText(contour_image, text, (x - 2, y - 6), cv2.FONT_HERSHEY_SIMPLEX,
                        1.0, (255, 0, 0), 2)

    # Show the result (OpenCV stores images as BGR, matplotlib expects RGB)
    plt.figure(dpi=326)
    plt.imshow(cv2.cvtColor(contour_image, cv2.COLOR_BGR2RGB))
    plt.show()
