Home | Jin’s Tech Blog

Controlling Randomness in PyTorch

Apr 03, 2023 About 1 min

Summary The key points boil down to the code below.
torch.manual_seed(random_seed)
torch.cuda.manual_seed(random_seed)
torch.cuda.manual_seed_all(random_seed)  # if using multi-GPU
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(random_seed)
random.seed(random_seed)
Reference Reproducible PyTorch를 위한 ra... Read More

#pytorch #reproductivity #randomness
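
As a sketch of the seeding recipe in the post above, the calls can be wrapped in one helper. The name `seed_everything` is a hypothetical choice, not something from the post, and the NumPy and PyTorch parts are skipped when those packages are not installed.

```python
import random


def seed_everything(random_seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(random_seed)
    try:
        import numpy as np
        np.random.seed(random_seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; nothing more to seed
    torch.manual_seed(random_seed)
    torch.cuda.manual_seed(random_seed)      # silently ignored without CUDA
    torch.cuda.manual_seed_all(random_seed)  # covers multi-GPU setups
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random draws.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Calling the helper once at the top of a training script is usually enough; full determinism can still depend on the operators and hardware in use.
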
Tokenizer Summary (in progress)

Apr 02, 2023 About 5 mins

Intro How the splitting rules evolved: whitespace splitting -> tokens still contain punctuation and other symbols -> punctuation splitting -> e.g. the text is split into [Don, ', t] -> a new rule is needed! A BERT-style tokenizer doesn't fit the GPT model input. Transformer-XL splits on whitespace/punctuation -> vocab size = 267,735 -> a large embedding matrix -> memory issues may arise (Adaptive Embedding ... Read More

#tokenizer #nlp
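
The two splitting rules described in the tokenizer post above can be illustrated with a minimal sketch. The function names and the sample sentence are assumptions made for illustration; real tokenizers use more elaborate rules.

```python
import re


def whitespace_split(text: str) -> list[str]:
    """Split on whitespace only; punctuation stays glued to the words."""
    return text.split()


def punctuation_split(text: str) -> list[str]:
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Don't stop, please."
print(whitespace_split(sentence))   # punctuation remains inside tokens
print(punctuation_split(sentence))  # "Don't" falls apart into three pieces
```

The second output shows why a further rule is needed: the contraction is broken into `Don`, `'`, and `t`.
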
Probability Basics

Mar 29, 2023 About 6 mins

Elements of probability: sample space (the set of outcomes); probability measure (a function that takes an event as input and returns the corresponding value); some properties. Random Variables. Probability Measure for Random Variables. CDF (cumulative). PMF (Mass Function), for the discrete case ... Read More

#probability #random-variable
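
To make the terms in the probability post above concrete, here is a small sketch for a fair six-sided die. The die is an assumed example, not taken from the post; exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

# Sample space: the set of outcomes of one roll of a fair die.
sample_space = [1, 2, 3, 4, 5, 6]

# PMF of the random variable X = "face shown": each outcome has mass 1/6.
pmf = {outcome: Fraction(1, 6) for outcome in sample_space}


def probability(event: set[int]) -> Fraction:
    """Probability measure: maps an event (a subset of outcomes) to a number."""
    return sum((pmf[outcome] for outcome in event), Fraction(0))


def cdf(x: int) -> Fraction:
    """CDF: P(X <= x), the accumulated mass up to and including x."""
    return sum((p for outcome, p in pmf.items() if outcome <= x), Fraction(0))


print(probability({2, 4, 6}))  # event "the roll is even"
print(cdf(4))                  # P(X <= 4)
```
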
DeBERTa, Decoding-enhanced BERT with Disentangled Attention

Mar 22, 2023 About 2 mins

Summary This paper uses Disentangled Attention and an Enhanced Mask Decoder to push performance beyond BERT and RoBERTa. Disentangled Attention performs its computation with context and position information explicitly separated. The Enhanced Mask Decoder uses absolute position information for each word during pre-training; this additional information is added before the softmax layer. Also... Read More

#nlp #bert #attention
Python Segment Tree

Mar 04, 2023 About 1 min

Segment Tree When to use it: operations over a given range (sum, minimum, maximum, etc.) can be done faster than with a linear scan. Difference from prefix sums: prefix sums handle sums only, and when a value changes the update costs O(N); a segment tree can update in O(log N). Reference https://yiyj1030.tistory.com/491#:~:text=%EC%84%B8%EA%B7%B8%EB%A8%BC%ED%8A%B8%2... Read More

#algorithm #segment-tree
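
A minimal sketch of a sum segment tree, matching the O(log N) update described in the post above. This is one common iterative layout, not necessarily the one used in the post; the class and method names are assumptions.

```python
class SegmentTree:
    """Iterative segment tree for range sums with point updates."""

    def __init__(self, values):
        self.n = len(values)
        # Leaves live at indices n..2n-1; internal node i covers 2i and 2i+1.
        self.tree = [0] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, index, value):
        """Set values[index] = value in O(log N)."""
        i = index + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def query(self, left, right):
        """Return sum(values[left:right]) in O(log N); right is exclusive."""
        result = 0
        lo, hi = left + self.n, right + self.n
        while lo < hi:
            if lo & 1:          # lo is a right child: take it and move on
                result += self.tree[lo]
                lo += 1
            if hi & 1:          # hi is exclusive: step back and take it
                hi -= 1
                result += self.tree[hi]
            lo //= 2
            hi //= 2
        return result


tree = SegmentTree([5, 3, 7, 1, 4])
before = tree.query(1, 4)   # 3 + 7 + 1
tree.update(2, 10)          # the 7 becomes 10
after = tree.query(1, 4)    # 3 + 10 + 1
print(before, after)
```

Swapping the `+` for `min` or `max` (with a matching identity value) gives the other range operations mentioned in the post.
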
Decoding Methods For Language Generation (sampling, top-K sampling, top-p sampling)

Feb 26, 2023 About 5 mins

Sampling Why sampling was introduced: beam search picks words in the direction of higher probability given the previous words. Human word choice, by contrast, is neither predictable nor boring. This led to methods that choose more creative, less boring words. What sampling is: given the probability p(w_t | w_{1:t-1}), sampling picks the next word at random according to it. As in the figure above, ("car") was sampled from P(w|"The"), and P(... Read More

#nlp #decoding-strategy
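
The three decoding variants named in the title above can be sketched on a toy distribution. The probabilities and function names are made up for illustration; top-K keeps the K most likely words, and top-p keeps the smallest set of most likely words whose mass reaches p, both renormalized before sampling.

```python
import random


def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable words and renormalize."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {word: p / total for word, p in top}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest prefix of words (by probability) with mass >= p."""
    kept, cumulative = [], 0.0
    for word, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


def sample(probs: dict[str, float], rng: random.Random) -> str:
    """Randomly pick one word according to its probability."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Hypothetical next-word distribution after the prefix "The".
next_word_probs = {"nice": 0.5, "dog": 0.25, "car": 0.125, "house": 0.125}
rng = random.Random(0)
print(sample(next_word_probs, rng))                     # plain sampling
print(sample(top_k_filter(next_word_probs, 2), rng))    # top-K, K = 2
print(sample(top_p_filter(next_word_probs, 0.8), rng))  # top-p, p = 0.8
```
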
Greedy Search & Beam Search

Feb 25, 2023 About 1 min

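Greedy Search Greedy decoding selects the most probable word at each time step. It is good in terms of time complexity, but final accuracy is not good. When the probabilities of the 1st and 2nd candidates differ only marginally, the 2nd should also be considered, yet only the 1st is. A single wrong prediction can cause a critical problem. Beam search with k=1 is greedy decoding. Beam Search Beam search proceeds by keeping the k most promising beams. The best approach would be to consider every possible case... Read More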

#nlp #decoding-strategy
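
A minimal beam search sketch for the post above, on a hypothetical toy model where each word's distribution depends only on the previous word. With k=1 it reduces to greedy decoding; the model and its numbers are invented to show a case where greedy misses the more probable sequence.

```python
import math

# Hypothetical toy model: next-word probabilities given the previous word.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"dog": 0.55, "cat": 0.45},
    "a": {"dog": 0.9, "cat": 0.1},
}


def beam_search(model, start, steps, k):
    """Return the best sequence found while keeping k beams per step."""
    beams = [(0.0, [start])]  # (cumulative log-probability, sequence)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for word, prob in model[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [word]))
        # Keep only the k highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:k]
    return beams[0][1]


greedy = beam_search(TOY_MODEL, "<s>", steps=2, k=1)
beam = beam_search(TOY_MODEL, "<s>", steps=2, k=2)
print(greedy)  # follows "the" because it is locally best (0.6 * 0.55 = 0.33)
print(beam)    # keeps "a" alive and finds 0.4 * 0.9 = 0.36
```
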
Sentence-BERT Paper Review

Feb 21, 2023 About 4 mins

Abstract BERT performs well on sentence-pair regression tasks (such as STS), but finding the most similar pair among 10,000 sentences takes about 65 hours. This paper introduces Sentence-BERT, a modification of the pretrained BERT model. It uses siamese and triplet network structures on top of BERT/RoBERTa. Comparable via cosine similarity: semantically meaningful sente... Read More

#nlp #paper #sbert #sts #sentence-bert
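
The cost mentioned in the post above comes from the number of sentence pairs, which can be checked directly. The sketch below also shows the cosine-similarity comparison on tiny made-up vectors; these are placeholders, not real sentence embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# A pairwise model has to score every unordered pair: n * (n - 1) / 2.
num_sentences = 10_000
pair_count = math.comb(num_sentences, 2)
print(pair_count)

# With fixed embeddings, each sentence is encoded once and pairs are
# compared with a cheap vector operation. Vectors here are hypothetical.
embeddings = {"s1": [1.0, 0.0], "s2": [0.9, 0.1], "s3": [0.0, 1.0]}
best_pair = max(
    combinations(embeddings, 2),
    key=lambda pair: cosine_similarity(embeddings[pair[0]], embeddings[pair[1]]),
)
print(best_pair)
```
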
FP16, FP32, BF16, Mixed Precision

Jan 27, 2023 About 2 mins

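Representing real numbers on a computer: real numbers span an infinite range, so representing them exactly in bits has limits. Floating point is used to represent them, with three fields: sign, exponent, and fraction. The sign takes 1 bit. The exponent sets the magnitude of the number; the more bits it has, the wider the range of representable numbers. The mantissa (fraction) holds the significant digits; the more bits it has, the more precisely a real number can be represented. Writing a real number in normalized form, N = (-... Read More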

#nlp #data-type #floating-point
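
The sign/exponent/fraction split described in the post above can be inspected with the standard library. This sketch covers FP32 only (1 sign bit, 8 exponent bits, 23 fraction bits) and rebuilds the value with the usual IEEE 754 formula for normalized numbers; the function names are assumptions.

```python
import struct


def fp32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x stored as FP32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def fp32_value(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normalized FP32 value: (-1)^s * (1 + f / 2^23) * 2^(e - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


sign, exponent, fraction = fp32_fields(-0.15625)
print(sign, exponent, hex(fraction))
print(fp32_value(sign, exponent, fraction))
```

Here -0.15625 is -1.25 x 2^-3, so the biased exponent is 127 - 3 = 124 and the fraction encodes the 0.25 after the implicit leading 1.
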
Pytorch Functions (3)

Jan 17, 2023 About 1 min

dataclass
from dataclasses import dataclass
@dataclass
class GPTConfig:  # lets you define a simple class concisely
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    dropout: float = 0.1
nn.ModuleDict(), nn.ModuleList()
nn.ModuleDict(dict(
    wte = nn.Embedding(co... Read More

#pytorch #framework
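
A runnable version of the `GPTConfig` dataclass from the post above, with a short usage example. Note the import is from the standard `dataclasses` module; the override of `n_layer` is only an illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class GPTConfig:
    # @dataclass generates __init__, __repr__, and __eq__ from these fields.
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    dropout: float = 0.1


# Override one default by keyword; the rest keep their declared values.
config = GPTConfig(n_layer=6)
print(config)
print(asdict(config)["n_embd"])
```
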
GPT-1 Paper Review

Jan 05, 2023 About 5 mins

GPT1 Abstract Unlabeled data is far more abundant than labeled data, and building a separate task-specific model for each task from a small amount of labeled data is difficult. The paper proposes training a model with generative pre-training, followed by discriminative fine-tuning. This approach outperformed on a wide range of benchmarks for natural language understanding. ... Read More

#language-model #nlp #paper #gpt
Pytorch Functions (2)

Jan 04, 2023 About 2 mins

nn.Module.register_buffer: stores a tensor for later use that is not optimized and receives no gradient updates.
torch.tril: lower triangular matrix
self.register_buffer("bias", torch.tril(torch.ones(config.block_size, config.block_size))
    .view(1, 1, config.block_size, config.block_size))
tensor.sp... Read More

#pytorch #framework
GPT Pytorch implementation - model.py

Jan 04, 2023 About 4 mins

Causal Attention
class CausalSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        # key, query, value projections for all heads, but in a batch
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
        # output projection
        self.c_pro... Read More

#pytorch #gpt
Pytorch Functions (1)

Dec 28, 2022 About 3 mins

When working with gradients frozen
with torch.no_grad():
device setting
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor.norm (on a torch tensor instance)
text_features /= text_features.norm(dim=-1, keepdim=True)
# dim=-1 means the norm is computed over the last dimension.
# i.e. for a 3x3 matrix -> one per row ... Read More

#pytorch #framework
BM25 Score

Dec 27, 2022 About 3 mins

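BM25 An algorithm in the TF-IDF family; a more advanced version that reached SOTA. (It is also used in Elasticsearch.) It measures how much more often the words in the query appear in a particular document and uses that to score each document's similarity. (The goal is the same as TF-IDF.) TF-IDF drawbacks: it cannot take document length into account. It only looks at term frequency, i.e. whether the term is present in the document. score(D, Q) = sum (IDF(qi) * advanced_term_frequency(qi, D)) tfNorm (... Read More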

#nlp #metric #bm25
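
The scoring formula in the BM25 post above can be sketched as follows. This uses the widely cited Okapi BM25 term-frequency normalization with parameters `k1` and `b`, and one common smoothed IDF; the post's exact variant is cut off, so treat these choices, the whitespace tokenization, and the sample documents as assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi-style BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is scaled by document length via b.
            length_norm = 1 - b + b * len(doc) / avg_len
            tf_norm = tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
            score += idf * tf_norm
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat",
    "the dog chased the cat around the big yard",
    "dogs bark",
]
scores = bm25_scores("cat", docs)
print(scores)
```

Both of the first two documents contain "cat" once, but the shorter one scores higher, which is the length sensitivity that plain TF-IDF lacks.
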
TF-IDF (Term Frequency-Inverse Document Frequency)

Dec 26, 2022 About 2 mins

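TF-IDF The value of Term Frequency * Inverse Document Frequency. It is mainly used 1) to compute the similarity between documents, 2) to rank the importance of search results in a search system, and 3) to measure the importance of a particular word within a document. For a given query Q = {q1, q2, … ,qn}, checking which document has the highest score tells you which one is most similar. The higher the score, the greater the similarity between the query and that document, or the greater its importance. ... Read More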

#information-retrieval #metric #tf-idf
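
A minimal sketch of the query-scoring use case from the TF-IDF post above. It uses one simple variant (length-normalized term frequency times log(N / df)); other weightings exist, and the documents and function name are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Sum TF * IDF over the query terms for every document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    scores = []
    for doc in tokenized:
        counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            df = sum(1 for other in tokenized if term in other)
            if df == 0:
                continue  # term appears nowhere; it contributes nothing
            tf = counts[term] / len(doc)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "segment tree range query",
    "beam search decoding",
    "segment tree update and range sum",
]
scores = tf_idf_scores(["beam", "search"], docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(scores, docs[best])
```
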
Natural Language Generation Metrics Summary

Dec 26, 2022 About 2 mins

Perplexity The word itself means bewilderment, difficulty, and so on. It indicates how confidently, or unconfidently, the model generated a sentence. If the model generated it with confidence, the probability is large, and because the metric takes the inverse of that value, lower means better performance. Formula BLEU The BLEU score consists of two main parts: Brevity Penalty * Precision. Brevity Penalty = min (1, output length / r... Read More

#nlp #metric
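
The perplexity description in the post above corresponds to the standard definition: the inverse of the sequence probability, normalized by length. A small sketch, with per-token probabilities invented for illustration:

```python
import math


def perplexity(token_probs):
    """exp of the average negative log-probability per token."""
    log_prob_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_prob_sum / len(token_probs))


confident = perplexity([0.9, 0.8, 0.9])   # model assigns high probabilities
uncertain = perplexity([0.2, 0.1, 0.25])  # model assigns low probabilities
uniform = perplexity([0.25] * 4)          # like guessing among 4 options
print(confident, uncertain, uniform)
```

A model that spreads its probability evenly over four choices at every step gets a perplexity of about 4, which is why the value is often read as an effective branching factor.
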
Studying Ranking Metrics (MRR, MAP, NDCG)

Dec 24, 2022 About 1 min

MRR (Mean Reciprocal Rank) The mean of the reciprocal ranks. The reciprocal rank reflects the position at which an item the model predicted as positive appears. For example, if it appears 3rd the value is 1/3, and if it appears 1st the value is 1/1. (The earlier it appears, the larger the value.) Pros: an effective metric when the focus is on where the first positive item appears. Cons: it does not consider how many positive items appear further down. ... Read More

#ranking #metric #information-retrieval
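
The MRR definition in the post above can be written out directly. In this sketch each query is a pair of a ranked list and a set of positive items; the data is made up, and a query with no positive item in its ranking is scored 0 by convention.

```python
def reciprocal_rank(ranked_items, positives):
    """1 / rank of the first positive item, or 0.0 if none appears."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in positives:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average the reciprocal rank over all (ranking, positives) pairs."""
    return sum(reciprocal_rank(r, p) for r, p in queries) / len(queries)


queries = [
    (["a", "b", "c"], {"c"}),  # first positive at rank 3 -> 1/3
    (["x", "y"], {"x"}),       # first positive at rank 1 -> 1
    (["p", "q"], {"z"}),       # no positive retrieved    -> 0
]
mrr = mean_reciprocal_rank(queries)
print(mrr)
```
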
C++ Syntax Notes for My Own Reference

Dec 24, 2022 About 2 mins

strcpy
char name[11];
void copy_str(char* str) { // even for char[11], the argument is a pointer to the first char
    strcpy_s(name, str); // name is the destination, str is the source
    // so that it is copied into the char[] array
} // the function is strcpy; its arguments are all char[] arrays
Defining an (int, int) pair type
#define pii pair<int, int>
Getting an iterator for a set & inserting into a set
auto bg = set_.begin... Read More

#c++ #coding-test
Coding Test Techniques for My Own Reference

Dec 24, 2022 About 1 min

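In C++, one second of run time corresponds to roughly 100 million operations. Look carefully at how many times functions run, array sizes, etc. → choose variable sizes accordingly. lower_bound, upper_bound: useful for finding data (dates, etc.) that falls between elements of a set. When a variable keeps growing but a fixed size must be used: two pointers (st, end pointers). A flag using positive/negative indices (for cases that run about 400 times?). When proper itemization is possible and each element has to be accessed and processed, just access it through an array... Read More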

#c++ #coding-test
Deep Spectral Methods

Jun 20, 2022 About 2 mins

Abstract Unsupervised localization and segmentation are long-standing computer vision challenges that involve decomposing an image into semantically meaningful segments without any labeled data. They are interesting in an unsupervised setting due to the difficulty and cost of obtaining dense image annotations, but... Read More

#self-supervised-learning #segmentation #unsupervised-learning #computer-vision #localization #eigendecomposition
ML/DL Knowledge

Apr 04, 2022 About 1 min

Sigmoid vs. Softmax Sigmoid: the probabilities produced by a sigmoid are independent (each variable is computed independently), so they are not constrained to sum to one. This is because the sigmoid looks at each raw output value separately. Used for binary classification in the logistic regression model ... Read More

#modeling
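
The contrast described in the post above is easy to see numerically. This is a small sketch with arbitrary example logits: sigmoid is applied to each value on its own, while softmax normalizes across all values.

```python
import math


def sigmoid(x: float) -> float:
    """Squash one raw score into (0, 1), independently of any other score."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: list[float]) -> list[float]:
    """Turn a list of raw scores into probabilities that sum to one."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)
print(sigmoid_probs, sum(sigmoid_probs))  # the sum is not constrained
print(softmax_probs, sum(softmax_probs))  # the sum is 1 up to rounding
```
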
HiP (Hierarchical Perceiver) review

Mar 22, 2022 About 4 mins

Summary The paper reviewed this time is Hierarchical Perceiver (HiP). The original Perceiver model could handle a variety of modalities through a Transformer-based architecture and cross attention. However, the paper points out that inputs with high resolution, such as images and video, were difficult to process. As a solution it proposes Hierarchical Perceiver (HiP below). HiP uses a flatten operation to divide the input into groups and performs its computation while preserving locality. Each... Read More

#multi-modal-learning
Perceiver IO review

Mar 09, 2022 About 6 mins

Summary The paper covered in this post is Perceiver IO: A General Architecture for Structured Inputs & Outputs. Perceiver IO addresses the limitation that the Perceiver's output is confined to simple tasks such as classification, with an improved architecture that supports inputs and outputs of various structures. (See here for the previous post.) Briefly, it builds a query vector for each task's input and, together with the K, V values obtained inside the Perceiver model and in advance ... Read More

#multi-modal-learning
The Transformer Family

Mar 02, 2022 About 1 min

Vanilla Transformer: self-attention is applied in each encoder and decoder. Cross-attention is applied between encoder and decoder. dot(Query vector, Key vector) = attention score, and then dot(attention score, Value vector) = attention value. No long term dependency Linformer Reference https://lilianweng.github.io/posts/202... Read More

#transformer
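
The two dot products in the Transformer post above can be sketched in plain Python. The scaling by sqrt(d_k) and the softmax between the two steps are not in the excerpt; they come from the standard scaled dot-product attention and are included here as an assumption. The input vectors are arbitrary.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to one."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query: score against every key, then mix the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Attention score: dot(query, key), scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Attention value: weighted sum of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Two identical keys get equal weight, so the output is the mean of the values.
result = attention(
    queries=[[1.0, 1.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(result)
```
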
Perceiver review

Mar 02, 2022 About 4 mins

Summary The paper read this time is Perceiver: General Perception with Iterative Attention. Perceiver proposes a new model architecture for handling a variety of data modalities. The paper gives the impression that a lot of thought went into reducing domain-specific assumptions. The most impressive part is that, for image input, it attends directly to about 50,000 pixels without preprocessing such as 2D conv. (Using conv introduces a locality inductive bias, ... Read More

#multi-modal-learning
GraphSAGE review

Feb 08, 2022 About 5 mins

Summary Background and problems: low-dimensional embeddings of nodes in large graphs are useful, but many methods have the drawback that every node must be present when the embeddings are learned. Such methods are transductive and do not generalize to unseen nodes. Proposal: this paper proposes GraphSAGE, a general inductive framework. Rather than learning an embedding for each node, GraphSAGE embedding... Read More

#graph #unsupervised-learning #large-graph
ViLBERT review

Feb 02, 2022 About 3 mins

Summary This paper proposes ViLBERT (Vision-and-Language BERT), a multimodal model architecture that can learn the image and text modalities together. In brief, the visual input and the textual input are fed in as separate streams and interact across modalities through co-attentional transformer layers during training. Through two kinds of self-supervised learning based on the Conceptual Captions dataset, ViLBERT pr... Read More

#multi-modal-learning #self-supervised-learning #pretrain
Attention Branch Network review

Nov 08, 2020 About 3 mins

This post reviews Attention Branch Network: Learning of Attention Mechanism for Visual Explanation, one of the papers I enjoyed reading during my internship over the past six months. During my internship at KIST-Europe I took a deep interest in attention techniques and did a lot of literature study on the attention methods used in natural language processing and in computer vision. This paper is one of those. Introduction When a model predicts a class, ... Read More

#attention #cnn #computer-vision
SMILES Convolution Fingerprint(SCFP) review

Apr 25, 2020 About 4 mins

Introduction This is a review of the paper Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. The paper uses a CNN to define a new fingerprint that performs better than existing fingerprints. According to the paper, it outperforms even ECFP (known as the Morgan fingerprint). The data used for training and evaluation is the TOX21 dataset, and the metric is the ROC-AUC score. For CNNs, DNA ... Read More

#smiles #tox21-challenge #chemical-compound #cnn