
BERT Implementation Study Resources

상솜공방 2024. 8. 5. 14:25

Building a Tokenizer

https://velog.io/@jieun9851/Tokenizer-%EC%A0%9C%EC%9E%91%ED%95%98%EA%B8%B0

 

WordPiece Tokenizer

https://kaya-dev.tistory.com/47

 

Token Embeddings in the Transformer

https://seungseop.tistory.com/37

 

Transformer Implementation

https://cpm0722.github.io/pytorch-implementation/transformer

https://julie-tech.tistory.com/130

 

Difference Between nn.Embedding and nn.Linear

https://velog.io/@wjdghcks6735/PyTorch-nn.Embedding-%EA%B3%BC-nn.Linear%EC%9D%98-%EC%B0%A8%EC%9D%B4

https://kc9302.tistory.com/15

 

Word2Vec Theory

https://wikidocs.net/22660

 

Attention Visualization Lecture

https://www.youtube.com/watch?v=3MkjgwVGbw4

 

BPE Theory Summary

https://velog.io/@gwkoo/BPEByte-Pair-Encoding

 

Building a WordPiece Vocab from Custom Data and Training a BertTokenizer

https://kyunghyunlim.github.io/nlp/ml_ai/2021/10/14/customtokenizer.html

https://monologg.kr/2020/04/27/wordpiece-vocab/

(UNK ratio comparison)

 

Korean Text Preprocessing Packages

https://velog.io/@ganta/%ED%95%9C%EA%B5%AD%EC%96%B4-%EC%A0%84%EC%B2%98%EB%A6%AC-%ED%8C%A8%ED%82%A4%EC%A7%80Text-Preprocessing-Tools-for-Korean-Text

 

BERT Implementation

https://velog.io/@lgd1820/pytorch%EB%A1%9C-BERT-%EA%B5%AC%ED%98%84%ED%95%98%EA%B8%B0-%EC%9D%B4%EB%A1%A0

https://paul-hyun.github.io/bert-01/

https://paul-hyun.github.io/bert-02/

 

BERT Theory

https://yeong-jin-data-blog.tistory.com/entry/Transfomer-BERT

https://coding-start.tistory.com/416

https://velog.io/@gmlwlswldbs/BERTBidirectional-Encoder-Representations-from-Transformers

https://choice-life.tistory.com/25

https://moondol-ai.tistory.com/463

https://wikidocs.net/115055

https://vhrehfdl.tistory.com/15

Positional embedding layer implementation: https://modulabs.co.kr/blog/bert-positional-embedding-layer/