1. Duration
Monday, November 21st, 2022 - Saturday, November 26th, 2022
2. Learning Record
2.1 Fine-tuned Models
When a dataset has only two classes, i.e. the labels are "0" and "1", the `label_mode` of `tf.keras.utils.image_dataset_from_directory` should be `"binary"` and the loss function should be `BinaryCrossentropy` instead of `SparseCategoricalCrossentropy`. Otherwise, the accuracy jitters around 50%, which means the model learns nothing.
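As a minimal sketch of this setup (the directory path and the tiny model are placeholders, not the actual project code), the dataset is loaded with `label_mode="binary"` and trained with a single-logit head plus `BinaryCrossentropy`:

```python
import tensorflow as tf

# Hypothetical path; the directory is assumed to contain one sub-folder per class.
base_dir = "data/binary_dataset"

# label_mode="binary" yields float labels of shape (batch, 1) in {0.0, 1.0}.
train_ds = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    label_mode="binary",
    image_size=(224, 224),
    batch_size=32,
)

# A single-logit head paired with BinaryCrossentropy(from_logits=True).
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),  # one logit, not a two-class softmax
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)
model.fit(train_ds, epochs=1)
```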
2.2 Learned Swin Transformer
I read the paper [1] and studied the code.
It took me a few days to understand Window-based Self-Attention (W-MSA), Shifted Window-based Self-Attention (SW-MSA), and the Swin Transformer block.
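The following is a minimal sketch (my own toy code, not the official implementation) of the two operations I found hardest: partitioning a feature map into non-overlapping windows for W-MSA, and cyclically shifting the feature map by half a window before partitioning for SW-MSA.

```python
import tensorflow as tf

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows
    of shape (num_windows * B, window_size, window_size, C)."""
    b, h, w, c = x.shape
    x = tf.reshape(x, (b, h // window_size, window_size, w // window_size, window_size, c))
    x = tf.transpose(x, (0, 1, 3, 2, 4, 5))
    return tf.reshape(x, (-1, window_size, window_size, c))

# Toy feature map: batch 1, 8x8 tokens, 96 channels.
x = tf.random.normal((1, 8, 8, 96))
window_size = 4
shift = window_size // 2

# W-MSA: attend within each fixed window.
windows = window_partition(x, window_size)

# SW-MSA: cyclically roll the map by half a window first, so the new
# windows straddle the boundaries of the previous ones.
shifted = tf.roll(x, shift=(-shift, -shift), axis=(1, 2))
shifted_windows = window_partition(shifted, window_size)

print(windows.shape, shifted_windows.shape)  # both (4, 4, 4, 96)
```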
2.3 Refactored the Code
I moved all the dataset-loading functions into a separate .py file. Consequently, the code works on a new dataset just by changing the `base_dir` of the dataset.
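A rough sketch of what such a helper module could look like (the function name, defaults, and split ratio are my own assumptions, not the project's actual code):

```python
import tensorflow as tf

def load_datasets(base_dir, image_size=(224, 224), batch_size=32,
                  label_mode="binary", val_split=0.2, seed=42):
    """Load train/validation splits from a directory with one sub-folder per class."""
    train_ds = tf.keras.utils.image_dataset_from_directory(
        base_dir,
        validation_split=val_split,
        subset="training",
        seed=seed,
        label_mode=label_mode,
        image_size=image_size,
        batch_size=batch_size,
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        base_dir,
        validation_split=val_split,
        subset="validation",
        seed=seed,
        label_mode=label_mode,
        image_size=image_size,
        batch_size=batch_size,
    )
    # Prefetch so switching datasets only means passing a different base_dir.
    autotune = tf.data.AUTOTUNE
    return train_ds.prefetch(autotune), val_ds.prefetch(autotune)
```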
3. Feeling
3.1 Glad
I was glad that the models ran well.
3.2 Perplexed
The models produced reasonably good results on some small datasets, but they did not perform as well on the micro-expression dataset.
Besides ViT, SL-ViT, and the Swin Transformer, I also found that there are many other transformer variants. It seems impossible to learn all of them.