Data Ingestion

Normalize using sklearn

# See nice summaries here
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing
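
As a rough sketch of what that module offers, the snippet below runs two of its scalers, MinMaxScaler and StandardScaler, on a small feature matrix. The array values are made up for illustration. The scalers are fit on the training data only and then reused on the test data, so the test set does not influence the scaling.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up feature matrix: 4 samples, 2 features (illustrative values only)
X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0],
                    [4.0, 800.0]])
X_test = np.array([[2.5, 500.0]])

# Scale each feature to the range 0-1 using the min and max of the training data
minmax = MinMaxScaler()
X_train_minmax = minmax.fit_transform(X_train)
X_test_minmax = minmax.transform(X_test)

# Standardize each feature to zero mean and unit variance
standard = StandardScaler()
X_train_std = standard.fit_transform(X_train)
X_test_std = standard.transform(X_test)
```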

Normalize 8-bit (256-level) pixel image data

# If values run from 0-255 with no outliers, we can rescale the numpy.ndarray linearly.
# Dividing each value by 255 normalizes the data to the range 0-1.
# The cast to float32 comes first because an integer array cannot hold fractional values.
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
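
To check the effect without downloading a dataset, here is a small sketch on random stand-in data. The array X and its shape are assumptions chosen to look like two 28x28 grayscale images.

```python
import numpy as np

# Stand-in for image data: 2 random 28x28 grayscale images stored as uint8 (0-255)
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)
X[0, 0, 0] = 0      # force both extremes to be present
X[0, 0, 1] = 255

# Cast first: dividing in place (X /= 255) raises an error on an integer array
X_scaled = X.astype('float32') / 255.
```

After this, X_scaled is a float32 array whose values all lie between 0 and 1.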

Load MNIST train and test data

from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

One Hot Encode Outputs

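# np_utils is not available in newer Keras releases; to_categorical can be imported directly
from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]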
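
To see what the encoding produces, here is a numpy-only sketch that builds the same kind of matrix by hand. It is an illustration of the idea, not how Keras implements it, and the labels are made up. When no class count is given, to_categorical infers it from the largest label, and the sketch does the same.

```python
import numpy as np

# Made-up integer labels for a 3-class problem
y = np.array([0, 2, 1, 2])

# Row i of the identity matrix is the one-hot vector for class i
num_classes = y.max() + 1
y_one_hot = np.eye(num_classes, dtype='float32')[y]
```

Each row holds a single 1 in the column of its class, so y_one_hot.argmax(axis=1) recovers the original labels.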

Flatten Data

from keras.layers import Flatten
# As the first layer of a model, Flatten requires an input shape defined by our data.
# If each sample is a 2D array, the flattened length is the length of the X dimension
# multiplied by the length of the Y dimension. Flatten handles this for us
# if we use it like so, where x_shape and y_shape are those two dimensions:

model.add(Flatten(input_shape=(x_shape, y_shape)))


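# Take the MNIST dataset as an example
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# X_train has shape (60000, 28, 28). Strictly, axis 1 is rows (height) and axis 2 is
# columns (width); MNIST images are square, so the two names are interchangeable here.
img_width = X_train.shape[1]
img_height = X_train.shape[2]
model.add(Flatten(input_shape=(img_width, img_height)))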
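
For comparison, the same flattening can be done on the array itself before it reaches the model. This is a numpy sketch on blank stand-in data shaped like MNIST, not a replacement for the Flatten layer.

```python
import numpy as np

# Stand-in for MNIST-shaped data: 5 blank 28x28 images
X = np.zeros((5, 28, 28), dtype='float32')

# Keep the sample axis and collapse everything else; -1 lets numpy compute 28 * 28
X_flat = X.reshape(X.shape[0], -1)
```

X_flat has shape (5, 784): one row of 784 values per image.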

Split data into train/test groups

from sklearn.model_selection import train_test_split
# X values are the feature set and y values are the label data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
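
A slightly fuller sketch on made-up data, using two optional arguments of train_test_split that are often worth setting. The values of X and y and the seed 42 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each, and balanced binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# random_state fixes the shuffle so the split is reproducible;
# stratify=y keeps the class proportions the same in both groups
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```

With test_size=0.2, 8 of the 10 samples end up in the training group and 2 in the test group.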

Reshape Data For CNN

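# There are 2 options that I currently know of:

# The easy way: let the model add the channel axis
from keras.layers import Reshape
model.add(Reshape((28, 28, 1), input_shape=(28, 28)))


# The less easy way: reshape the arrays before training
# Each image is 2D, so the data has a 3D shape (samples, width, height); we reshape it into a 4D shape.
# With the default channels_last format, Keras expects the 3rd dimension of each image
# (the 4th dimension in the shape) to be the color dimension, which is 1 for grayscale.
# config is assumed to be an object holding the image dimensions (28 and 28 for MNIST).
X_train = X_train.reshape(X_train.shape[0], config.img_width, config.img_height, 1)
X_test = X_test.reshape(X_test.shape[0], config.img_width, config.img_height, 1)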
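
A third possibility, sketched here with numpy on blank stand-in data, is to append the channel axis without spelling out the other dimensions. The array X and its shape are assumptions for illustration.

```python
import numpy as np

# Stand-in for grayscale image data: 5 blank 28x28 images
X = np.zeros((5, 28, 28), dtype='float32')

# Append a channel axis of length 1 at the end of the shape
X_cnn = np.expand_dims(X, axis=-1)
```

X_cnn has shape (5, 28, 28, 1), the same result as the reshape call but independent of the image size.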