In this experiment I wanted to understand the compression ratio of Autoencoders: how much information (how many bits) does an Autoencoder encode in the encoding dimensions? Let's say an autoencoder is able to encode a 28x28 grayscale MNIST image (28x28x8 bits = 6272 bits) in a 32-dimensional encoding space with acceptable reconstruction loss.

What is the compression ratio? With a CUDA/GPU, those 32 dimensions are actually 32 float32's, so that's 32x32 = 1024 bits, which corresponds to 6.1x (lossy) compression.

But are all those 1024 bits really needed? Intuitively, the entire float32 space is probably not used. A related question: what is the "right" number of encoding dimensions to pick for an Autoencoder?

To answer these questions, I took a simple Autoencoder neural network with a Linear+ReLU encoder and a Linear+Sigmoid decoder layer. Since I will want to quantize the bits between the encoder and the decoder, I use the sigmoid() function to squash the encoder's output to between 0 and 1, and then its inverse, the logit() function, before feeding the result back to the decoder. The code is a straightforward Autoencoder neural network implemented in PyTorch, with some additional transformations in the forward() function to implement quantization. The arrows mark the departure from a vanilla Autoencoder:

[Figure: Autoencoder code listing, with arrows marking the quantization-related additions.]

The training and evaluation sweep over encoding_dims and quantize_bits looks like this:

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

distance = nn.MSELoss()

# Example sweep values; the plots below cover encoding dimensions up to 256
# and quantization levels up to the full 32 bits of a float32.
encoding_dims_list = [32, 64, 128, 256]
quantize_bits_list = [1, 2, 4, 8, 16, 32]

for encoding_dims in encoding_dims_list:
    # train
    autoencoder = Autoencoder(encoding_dims=encoding_dims).to(device)
    optimizer = torch.optim.Adam(autoencoder.parameters(), lr=0.001)  # optimizer type assumed
    num_epochs = 50
    for epoch in range(num_epochs):
        for imgs, _ in autoencoder_train_dataloader:
            imgs = Variable(imgs).to(device)
            output = autoencoder(imgs)
            loss = distance(output, imgs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # evaluate the trained model at each quantization level
    for quantize_bits in quantize_bits_list:
        loss = 0
        for imgs, _ in autoencoder_train_dataloader:
            imgs = Variable(imgs).to(device)
            with torch.no_grad():
                output = autoencoder(imgs, quantize_bits=quantize_bits)
                loss += distance(output, imgs)
```

Results

The results can be plotted to show the loss per encoding_dims, per quantize_bits:

[Figure: reconstruction loss vs. quantize_bits, one curve per encoding_dims.]

- Each float32 in the encoding stores around 8 bits of useful information (out of 32), since all of the curves flatten out after 8 bits.
- 128 dimensions is the maximum required, since the next jump to 256 yields no significant decrease in loss.

Alternatively, we can plot total_bits = encoding_dims * quantize_bits on the x-axis:

[Figure: reconstruction loss vs. total_bits = encoding_dims * quantize_bits.]

Overall, based on these curves, encoding_dims = 64 and quantize_bits = 8 appears to be a good trade-off (total_bits = 64*8 = 512 bits). Loss does not decrease significantly after 1024 bits; that appears to be the best the Autoencoder can accomplish. This re-affirms that 512 bits - which corresponds to 12x (lossy) compression - is a good trade-off, or 1024 bits for 10% less loss.

For reference, the entire MNIST training dataset, uncompressed, is 28*28*8 bits * 60,000 images / 8 = 47,040,000 bytes. After gzip compression, the file size is 9,912,422 bytes, for a lossless compression ratio of 4.7x.
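These ratios are easy to re-derive. The snippet below is just back-of-the-envelope arithmetic restating the numbers above; nothing here comes from the experiment's code:

```python
# One 28x28 grayscale MNIST image.
image_bits = 28 * 28 * 8                  # 6272 bits

# A 32-dimensional float32 encoding.
encoding_bits = 32 * 32                   # 1024 bits
print(image_bits / encoding_bits)         # ~6.1x lossy compression

# The 512-bit operating point (encoding_dims=64, quantize_bits=8).
print(image_bits / (64 * 8))              # ~12x lossy compression

# The full MNIST training set, raw vs. gzip-compressed.
uncompressed_bytes = 60_000 * 28 * 28     # 47,040,000 bytes
gzip_bytes = 9_912_422
print(uncompressed_bytes / gzip_bytes)    # ~4.7x lossless compression
```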
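Finally, a minimal sketch of how the quantizing forward() pass described above could be implemented. The class name, the encoding_dims and quantize_bits arguments, and the sigmoid()/logit() trick come from the text; the layer sizes, the rounding-based quantizer, and the use of torch.logit are assumptions of this sketch, not the original code. The arrowed comments mark the departures from a vanilla Autoencoder:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Sketch of a quantizing Autoencoder (layer sizes and quantizer are assumed)."""

    def __init__(self, encoding_dims, input_dims=28 * 28):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dims, encoding_dims), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(encoding_dims, input_dims), nn.Sigmoid())

    def forward(self, x, quantize_bits=None):
        shape = x.shape
        code = torch.sigmoid(self.encoder(x.view(x.size(0), -1)))  # <-- squash the code into (0, 1)
        if quantize_bits is not None:
            levels = 2 ** quantize_bits - 1
            code = torch.round(code * levels) / levels              # <-- keep quantize_bits bits per dimension
        code = torch.logit(code, eps=1e-6)                          # <-- invert the sigmoid before decoding
        return self.decoder(code).view(shape)
```

With quantize_bits=None, as during training, the logit() simply undoes the sigmoid(), so the model behaves like a vanilla Autoencoder; at evaluation time the rounding step limits each encoding dimension to 2^quantize_bits levels.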