1. Duration
Monday, November 14th, 2022 - Saturday, November 19th, 2022
2. Learning Record
2.1 Fine-Tuned the ViT Model
I fine-tuned the vision transformer model on the cat_and_dog dataset and the flowers dataset. The model achieved approximately 86% accuracy on the flowers dataset.
I also refactored the micro-expression spotting code to fit the input shape of the vision transformer, but the results were very poor; I need more time to fine-tune the hyperparameters.
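For reference, here is a minimal sketch of this week's fine-tuning setup. The tf_flowers dataset from TensorFlow Datasets stands in for my flowers data, and build_vit is a hypothetical helper for the ViT classifier I already had; everything else is standard Keras fine-tuning code.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

IMG_SIZE, BATCH = 224, 32

def preprocess(image, label):
    # Resize to the ViT input resolution and scale pixels to [0, 1].
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    return image, label

# tf_flowers (5 classes, ~3.7k images) stands in for the flowers dataset.
train_ds = (tfds.load("tf_flowers", split="train[:80%]", as_supervised=True)
            .map(preprocess).shuffle(1024).batch(BATCH).prefetch(tf.data.AUTOTUNE))
val_ds = (tfds.load("tf_flowers", split="train[80%:]", as_supervised=True)
          .map(preprocess).batch(BATCH).prefetch(tf.data.AUTOTUNE))

model = build_vit(num_classes=5)  # hypothetical helper returning a Keras ViT classifier
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),  # small learning rate for fine-tuning
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=10)
```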
2.2 Learned Swin Vision Transformer
I found a vision transformer variant with a different structure called the Swin Transformer (Swin ViT). I watched an introductory video and planned to read the code the next week.
2.3 Learned SL-ViT
I read the paper [1] and refactored the code in TensorFlow so that its structure is similar to the vision transformer code shown in the d2l notebook.
The code worked well and gave similar results on the cat_and_dog and flowers datasets.
I read the code carefully and found that SL-ViT changes two parts of the vision transformer: the patch embedding (Shifted Patch Tokenization, SPT) and the self-attention module (Locality Self-Attention, LSA).
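To keep the two changes straight, I sketched both in TensorFlow. This is my own rough reading of the paper, not the authors' reference code: tf.roll stands in for the paper's zero-padded diagonal shift, and the learnable temperature is a single scalar rather than a per-head parameter.

```python
import tensorflow as tf

class ShiftedPatchTokenization(tf.keras.layers.Layer):
    """SPT: concatenate the image with four diagonally shifted copies, then patchify."""
    def __init__(self, patch_size, embed_dim):
        super().__init__()
        self.p = patch_size
        self.norm = tf.keras.layers.LayerNormalization()
        self.proj = tf.keras.layers.Dense(embed_dim)

    def call(self, images):
        s = self.p // 2
        # Four diagonal half-patch shifts (tf.roll wraps around; the paper zero-pads).
        shifted = [tf.roll(images, shift=[dy, dx], axis=[1, 2])
                   for dy in (-s, s) for dx in (-s, s)]
        x = tf.concat([images] + shifted, axis=-1)           # (B, H, W, 5*C)
        patches = tf.image.extract_patches(
            x, sizes=[1, self.p, self.p, 1], strides=[1, self.p, self.p, 1],
            rates=[1, 1, 1, 1], padding="VALID")             # (B, H/p, W/p, p*p*5*C)
        b = tf.shape(patches)[0]
        patches = tf.reshape(patches, (b, -1, patches.shape[-1]))
        return self.proj(self.norm(patches))                 # (B, N, embed_dim)

class LocalitySelfAttention(tf.keras.layers.Layer):
    """LSA: self-attention with a learnable temperature and the diagonal masked out."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.h, self.dk = num_heads, dim // num_heads
        self.qkv = tf.keras.layers.Dense(3 * dim)
        self.out = tf.keras.layers.Dense(dim)
        # Learnable temperature, initialised at the usual 1/sqrt(d_k) scale.
        self.tau = tf.Variable(self.dk ** -0.5, trainable=True, name="temperature")

    def call(self, x):
        b, n = tf.shape(x)[0], tf.shape(x)[1]
        qkv = tf.reshape(self.qkv(x), (b, n, 3, self.h, self.dk))
        q, k, v = tf.unstack(tf.transpose(qkv, [2, 0, 3, 1, 4]))  # each (B, h, N, dk)
        scores = tf.matmul(q, k, transpose_b=True) * self.tau
        # Mask each token's attention to itself so it must attend to its neighbours.
        scores += tf.eye(n, dtype=scores.dtype) * -1e9
        out = tf.matmul(tf.nn.softmax(scores, axis=-1), v)        # (B, h, N, dk)
        out = tf.reshape(tf.transpose(out, [0, 2, 1, 3]), (b, n, self.h * self.dk))
        return self.out(out)
```

Both pieces drop into the d2l-style ViT: SPT replaces the plain patch embedding, and LSA replaces the standard multi-head self-attention inside each encoder block.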
3. Feeling
I don't feel good even though the code ran well, because I have been locked down in my dorm for ten days.
4. Reference
[1] S. H. Lee, S. Lee, and B. C. Song, “Vision Transformer for Small-Size Datasets,” arXiv, 2021. doi: 10.48550/arXiv.2112.13492.