I adapted the xor example here from Python; the original used NumPy.

NB. input data
X =: 4 2 $ 0 0 0 1 1 0 1 1

NB. target data, ~: is 'not-eq' aka xor?
Y =: , (i.2) ~:/ (i.2)
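
For reference, here is what those two arrays look like in a session. The table adverb / builds the 2-by-2 xor table, and , ravels it into the four targets in the same order as the rows of X:

   X
0 0
0 1
1 0
1 1
   (i.2) ~:/ (i.2)
0 1
1 0
   Y
0 1 1 0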

scale =: (-&1)@:(*&2)
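
scale maps a number from the unit interval onto (_1,1): multiply by 2, then subtract 1. A quick check:

   scale 0 0.25 0.5 1
_1 _0.5 0 1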

NB. initialize weights b/w _1 and 1
NB. see https://code.jsoftware.com/wiki/Vocabulary/dollar#dyadic
init_weights =: 3 : 'scale"0 y ?@$ 0'
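
Here y ?@$ 0 rolls an array of the requested shape filled with uniform floats in [0,1), which scale"0 then spreads over (_1,1). The values are random, so I'll only show the shapes:

   $ init_weights 2 2
2 2
   $ init_weights 2
2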

w_hidden =: init_weights 2 2
w_output =: init_weights 2
b_hidden =: init_weights 2
b_output =: scale ? 0

dot =: +/ . *
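
dot is the usual matrix product; +/ . * is J's generalized inner product with sum and times. For example:

   (2 2 $ 1 2 3 4) dot (2 2 $ 5 6 7 8)
19 22
43 50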

sigmoid =: monad define
% 1 + ^ - y
)

sigmoid_ddx =: 3 : 'y * (1-y)'
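
sigmoid is the logistic function (the reciprocal of 1 + e^-y), and sigmoid_ddx is its derivative written in terms of the sigmoid's output rather than its input. Spot-checking a few values:

   sigmoid _10 0 10
4.53979e_5 0.5 0.999955
   sigmoid_ddx 0.5
0.25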

NB. forward prop
forward =: dyad define
'WH WO BH BO' =. x
hidden_layer_output =. sigmoid (BH +"1 X (dot "1 2) WH)
prediction =. sigmoid (BO + WO dot"1 hidden_layer_output)
(hidden_layer_output;prediction)
)
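
Note that forward takes the boxed parameters on the left and refers to the global X in its body rather than its right argument. The rank conjunction does the batching: dot"1 2 dots each row of X with the hidden weight matrix, and +"1 adds the hidden bias to each row. A quick shape check (h and p are just throwaway names):

   'h p' =: (w_hidden;w_output;b_hidden;b_output) forward X
   $ h
4 2
   $ p
4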

train =: dyad define
'X Y' =. x
'WH WO BH BO' =. y
'hidden_layer_output prediction' =. y forward X
l1_err =. Y - prediction
l1_delta =. l1_err * sigmoid_ddx prediction
hidden_err =. l1_delta */ WO
hidden_delta =. hidden_err * sigmoid_ddx hidden_layer_output
WH_adj =. WH + (|: X) dot hidden_delta
WO_adj =. WO + (|: hidden_layer_output) dot l1_delta
BH_adj =. +/ BH,hidden_delta
BO_adj =. +/ BO,l1_delta
(WH_adj;WO_adj;BH_adj;BO_adj)
)
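
Each call to train is one full-batch gradient step with an implicit learning rate of 1: the weight deltas are added straight onto the weights, and each bias adjustment is just the sum of the old bias with its deltas (that's what the +/ over the appended rows does). The updated parameters come back boxed in the same order they went in, which is what lets the power conjunction below feed them into the next step. A sanity check that one step preserves the shapes (step is a throwaway name):

   step =: (X;Y) train (w_hidden;w_output;b_hidden;b_output)
   $ > 0 { step
2 2
   $ > 1 { step
2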

w_trained =: (((X;Y) & train) ^: 10000) (w_hidden;w_output;b_hidden;b_output)
guess =: >1 { w_trained forward X
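
^: applies the bound training step 10000 times, threading the boxed parameters through, and guess pulls the final predictions out of the second box. Assuming the run converges (the random initialization means that isn't strictly guaranteed), thresholding at 0.5 recovers the xor targets:

   0.5 < guess
0 1 1 0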

Compare to this K implementation for style.

As it happens, this J code is substantially faster than the equivalent using NumPy (0.13s vs. 0.59s).

I'm quite curious as to why the J version is so much more performant. I recently read APL since 1978, and APL has quite a few differences as an array environment compared to conventional programming languages.