Pytorch Functions (2)

nn.Module.register_buffer
- optimize 나 grad update 가 없고, tesnor를 저장해서 활용하는 용도

torch.tril

lower triangular matrix

self.register_buffer("bias", torch.tril(torch.ones(config.block_size, config.block_size))
              .view(1, 1, config.block_size, config.block_size))

tensor.split(split_size, dim)
- tensor를 split_size로 나눔. dim 은 어느 dimension 으로 나눌 것인지.
```
q, k ,v  = self.c_attn(x).split(self.n_embd, dim=2)
```
torch.view
- 기존의 tensor를 원하는 view size 에 맞춰서 resize하여 return 해줌.
transpose(-1,-2) 는 -1 dim 과 -2 dim을 transpose
- 따라서 transpose(1,2) == transpose(2,1) , transpose(-1,-2) == transpose(-2, -1)
```
B, T, C = x.size()
k = k.view(B, T, self.n_head, C // selg.n_head).transpose(1,2)
```

a @ b.transpose(-1, -2)

a matrix 와 b matrix 사이의 matmul

y = att @ v # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)

tensor.masked_fill(self.bias[:,:,:T,:T]==0, float(‘-inf’))
bias라는 tensor의 해당하는 부분이 0 이라면, float(‘-inf’) MASK를 씌운다.
F.softmax(att, dim=-1)
마지막 dimension 에 softmax 취함
nn.Dropout(p=0.5)
During training, randomly zeros some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.
tensor.contiguous()
transpose(), view(), expand(), narrow() 등의 tensor 조작 작업을 하게 되면, tensor들의 메모리 저장상태가 변경되는 경우들이 많다.
이 때 contiguous()를 통해 메모리 저장 순서를 axis=0을 기준으로 정렬(?)해주는 함수이다.

Reference

pytorch 공식 문서
https://jimmy-ai.tistory.com/122