Article Preview
In this article we take a deep dive into an important contribution to the field of computer vision: the Vision Transformer (ViT). The focus is on the most recent implementations of ViT since its original release. How do you train a ViT from scratch?

[Figure: ViT architecture diagram]

1. The Attention Layer

[Figure: schematic of the attention layer]

Let's start with the core component of the Transformer encoder: the attention layer.

```python
from torch import nn

class Attention(nn.Module):
    def __init__(self, dim, heads=8, dim_head=64, dropout=0.):
        super().__init__()
        # Total inner dimension across all attention heads
        inner_dim = dim_head * heads
        # A final projection layer is only needed when the concatenated
        # multi-head output does not already match the embedding dimension
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads             # store the number of attention heads
        self.scale = dim_head ** -0.5  # scale attention scores by 1/sqrt(dim_head)
```
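The class definition above is cut off mid-way, so for orientation here is a minimal, self-contained sketch of what a complete attention layer in this style typically looks like, including the forward pass. The layer names (`to_qkv`, `attend`, `to_out`), the use of `einops.rearrange`, and the demo shapes are my assumptions, drawn from common ViT implementations rather than from the article's own continuation:

```python
import torch
from torch import nn
from einops import rearrange


class Attention(nn.Module):
    """Multi-head self-attention in the ViT style.

    Hypothetical sketch: layer names and shapes are assumptions,
    not necessarily the article's exact code.
    """

    def __init__(self, dim, heads=8, dim_head=64, dropout=0.):
        super().__init__()
        inner_dim = dim_head * heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head ** -0.5

        self.norm = nn.LayerNorm(dim)      # pre-norm on the input tokens
        self.attend = nn.Softmax(dim=-1)   # softmax over the key axis
        self.dropout = nn.Dropout(dropout)

        # One linear layer produces queries, keys and values in a single pass
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)

        # Project the concatenated heads back to the embedding dimension
        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout),
        ) if project_out else nn.Identity()

    def forward(self, x):
        # x: (batch, num_tokens, dim)
        x = self.norm(x)
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (rearrange(t, 'b n (h d) -> b h n d', h=self.heads) for t in qkv)

        # Scaled dot-product attention: softmax(QK^T / sqrt(d)) V
        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale
        attn = self.dropout(self.attend(dots))

        out = torch.matmul(attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')  # merge heads back together
        return self.to_out(out)


if __name__ == "__main__":
    layer = Attention(dim=192, heads=8, dim_head=64)
    tokens = torch.randn(2, 65, 192)   # e.g. 64 patches + 1 class token
    print(layer(tokens).shape)         # torch.Size([2, 65, 192])
```

Producing queries, keys and values with a single `nn.Linear` is a common efficiency trick in such implementations: one matrix multiplication replaces three separate projections.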
………………………………