I ported a Python tutorial to J.

J has libraries to read NumPy arrays from file; we first export the relevant data, viz.

import numpy as np
import tensorflow_datasets as tfds

train = tfds.load('mnist', split='train', as_supervised=True, batch_size=-1)
# as_supervised and batch_size are cargo-culted to make it export to numpy
train_image_data, train_labels = tfds.as_numpy(train)
train_images = train_image_data.astype('float64')/255

# normalize + export to float64 for J
np.save("data/train-images.npy", train_images)
np.save("data/train-labels.npy", train_labels)

test = tfds.load('mnist', split='test', as_supervised=True, batch_size=-1)
test_image_data, test_labels = tfds.as_numpy(test)
test_images = test_image_data.astype('float64')/255

np.save("data/test-images.npy", test_images) np.save("data/test-labels.npy", test_labels)

Then:

load 'convert/numpy'

NB. raze to a 60000x784 array: instead of 28x28x1 images they're vectors of 784
train_images =: ,"3 readnpy_pnumpy_ 'data/train-images.npy'
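
A quick shape check (assuming the files exported above) should give:

   $ train_images
60000 784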

NB. convert labels (1, 2, etc.) to one-hot vectors (0 1 0 0 ...)
vectorize =: 3 : '=&y (i.10)'
train_labels =: vectorize"0 readnpy_pnumpy_ 'data/train-labels.npy'
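
For example, the label 1 should become the second basis vector:

   vectorize 1
0 1 0 0 0 0 0 0 0 0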

test_images =: ,"3 readnpy_pnumpy_ 'data/test-images.npy'
test_labels =: vectorize"0 readnpy_pnumpy_ 'data/test-labels.npy'

X =: train_images
Y =: train_labels

NB. move (0,1) -> (-1,1)
scale =: (-&1)@:(*&2)
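
A couple of spot checks: the endpoints and midpoint should land where expected.

   scale 0 0.5 1
_1 0 1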

NB. initialize weights randomly
init_weights =: 3 : 'scale"0 y ?@$ 0'
w_hidden =: init_weights 784 128
w_output =: init_weights 128 10
weights_init =: w_hidden;w_output
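
The shapes should come out as passed in, with entries scaled into (_1,1):

   $ w_hidden
784 128
   $ w_output
128 10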

dot =: +/ . *
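
+/ . * is J's generalized inner product, which on conforming arguments is the familiar matrix product. A tiny example:

   (2 2 $ 0 1 1 0) dot 5 7
7 5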

NB. shift by the max, for numerical stability
mmax =: (]->./)

NB. softmax that won't blow up https://cs231n.github.io/linear-classify/#softmax
softmax =: ((^ % (+/@:^)) @: mmax)
d_softmax =: (([*(1&-)) @: softmax @: mmax)
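
The mmax shift is what keeps the exponentials from overflowing: even inputs far outside floating-point exp range should give finite probabilities that sum to 1.

   softmax 1000 1001 1002
0.0900306 0.244728 0.665241
   +/ softmax 1000 1001 1002
1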

sigmoid =: monad define
% 1 + ^ - y
)
sigmoid_ddx =: 3 : '(^-y) % ((1+^-y)^2)'
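
Spot checks at 0, where the sigmoid's value and slope are well known:

   sigmoid 0
0.5
   sigmoid_ddx 0
0.25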

NB. forward prop
forward =: dyad define
'l1 l2' =. x
X =. y
x_l1 =. X dot l1
x_sigmoid =. sigmoid x_l1
x_l2 =. x_sigmoid dot l2
prediction =. softmax"1 x_l2
(x_l1;x_l2;x_sigmoid;prediction)
)

train =: dyad define
'X Y' =. x
'l1 l2' =. y
'x_l1 x_l2 x_sigmoid prediction' =. y forward X
l2_err =. (2 * (Y - prediction) % {.$prediction) * (d_softmax"1 x_l2)
l1_err =. (|: l2 dot (|: l2_err)) * (sigmoid_ddx"1 x_l1)
l2_adj =. l2 + (|: x_sigmoid) dot l2_err
l1_adj =. l1 + (|: X) dot l1_err
(l1_adj;l2_adj)
)
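
In conventional notation this is one full-batch gradient-descent step on the mean squared error, with learning rate 1 (and treating the softmax derivative elementwise, as d_softmax does). Writing Z_1 = X W_1, A = \sigma(Z_1), Z_2 = A W_2, and N for the batch size:

\delta_2 = \tfrac{2}{N}\,(Y - \hat{Y}) \odot \mathrm{softmax}'(Z_2)
\delta_1 = (\delta_2 W_2^\top) \odot \sigma'(Z_1)
W_2 \leftarrow W_2 + A^\top \delta_2 \qquad W_1 \leftarrow W_1 + X^\top \delta_1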

NB. bind the training data, leaving a monad in the weights
train_mnist =: (X;Y) & train

NB. smooth out a guess into a canonical estimate
pickmax =: monad define
max =. >./ y
=&max y
)
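
For example:

   pickmax 0.1 0.7 0.2
0 1 0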

eq_arr1 =: */ @: =
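
That is, eq_arr1 returns 1 exactly when two vectors agree elementwise (the product of the equality mask):

   0 1 0 eq_arr1 0 1 0
1
   0 1 0 eq_arr1 1 0 0
0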

point_accuracy =: monad define
(+/ (pickmax"1 y) eq_arr1"1 test_labels) % {.$ test_labels
)

NB. to store weights
encode =: 3!:1

w_train =: (train_mnist ^: 10000) weights_init
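
Here ^: is the power conjunction: f ^: n applies f to its own result n times, so this runs 10000 full-batch training steps. In miniature:

   (+&1 ^: 5) 0
5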

NB. guess =: >3 { w_train forward test_images
NB. point_accuracy guess

(encode w_train) fwrite 'weights.jdat'
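
To reload the weights in a later session, 3!:2 undoes 3!:1. A minimal sketch (w_saved is a name of my choosing; fread is fwrite's standard-library counterpart):

decode =: 3!:2
w_saved =: decode fread 'weights.jdat'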

This happens to be 95% accurate on the test set! A good result, even if training a neural net on the CPU is not so satisfying.

It also happens to be substantially slower than the plain NumPy version, presumably because J only uses one core.