Wednesday, January 18, 2023

TensorFlow 1.x to 2.x

I changed the only TensorFlow 1.x function of TempoGAN that I mentioned in this post, from

tf.contrib.layers.batch_norm()

to

tf.compat.v1.layers.batch_normalization()

and, of course, the pre-trained weights are useless now that I've changed the net structure. So I need to retrain it, but there is no script to do it. I could write one myself and be very careful when choosing the parameters, or I could use Google Colab to run the original net for me.
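
A quick sketch of what that swap looks like at a call site. This is not TempoGAN's actual code, and the argument values are illustrative assumptions; the point is that the names change (decay becomes momentum, is_training becomes training) and the defaults differ, which is part of why the old checkpoint no longer matches the graph.

import tensorflow as tf

def batch_norm_old(x, is_training):
    # TF 1.x contrib version (illustrative arguments, not TempoGAN's)
    return tf.contrib.layers.batch_norm(
        x, decay=0.999, center=True, scale=False,
        is_training=is_training, scope="bn")

def batch_norm_new(x, is_training):
    # compat.v1 replacement: same math, different argument names,
    # and a different variable layout under the hood
    return tf.compat.v1.layers.batch_normalization(
        x, momentum=0.999, center=True, scale=False,
        training=is_training, name="bn")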

You may have noticed that Google Colab no longer supports TensorFlow 1.x, right? No worries: I've found a way to work around this with Miniconda here.
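
For reference, this is roughly what the Miniconda route looks like as Colab cells. Treat it as a sketch: the installer URL is Miniconda's standard Linux one, and the Python 3.7 / tensorflow-gpu 1.15 versions and the script name are my assumptions, not the exact steps from the link.

!wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
!bash miniconda.sh -b -p /opt/conda
!/opt/conda/bin/conda create -y -n tf1 python=3.7
!/opt/conda/envs/tf1/bin/pip install tensorflow-gpu==1.15
# then run the original TempoGAN code inside that environment
!/opt/conda/envs/tf1/bin/python your_training_script.py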

Tuesday, January 10, 2023

I'm back

Hello. It's been a while. Not yet back to normal, but as normal as it can be.

I'm here to tell you about an error I got while trying to run TempoGAN on a brand-new server. The error happens when "session.run" is called, and it gives:

Internal: Blas SGEMM launch failed : m=216000, n=32, k=4

If you search the internet, everyone will tell you this is due to a lack of memory on the GPU. OK, this net is quite large, but... I have 48 GB available to me! This net can't be that greedy.

So, what did I find? Yes, it needs more memory on the graphics card... BECAUSE IT IS NOT ALLOCATING ANY! The code is from 2018, the GPU architecture is modern, and the cuBLAS from CUDA 10 does not know how to do its job.
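
If you want to see it for yourself, here is the kind of quick sanity check I mean: log device placement, let the session grow its GPU memory on demand, and run a matmul with the same m, n, k as the failing SGEMM. A minimal TF 1.x sketch, not TempoGAN's code.

import tensorflow as tf  # TF 1.x (or tf.compat.v1 under TF 2.x)

config = tf.ConfigProto(log_device_placement=True)  # print where each op runs
config.gpu_options.allow_growth = True              # allocate GPU memory on demand

with tf.Session(config=config) as sess:
    # (216000 x 4) * (4 x 32): the same shapes as the failing SGEMM call
    a = tf.random_normal([216000, 4])
    b = tf.random_normal([4, 32])
    print(sess.run(tf.matmul(a, b)).shape)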

I'm sorry, but you and I will have to upgrade the code to TensorFlow 2.x.

Good luck to us.