Building a tokenizer
https://velog.io/@jieun9851/Tokenizer-%EC%A0%9C%EC%9E%91%ED%95%98%EA%B8%B0
WordPiece Tokenizer
https://kaya-dev.tistory.com/47
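The WordPiece posts above describe greedy longest-match-first tokenization: repeatedly take the longest vocab entry that prefixes the remaining characters, marking non-initial pieces with `##`. A minimal sketch, assuming a toy vocab (not taken from the posts):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenization.
    Non-initial subword pieces carry the '##' continuation prefix;
    a word with any unmatchable span maps entirely to [UNK]."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                piece = sub  # longest matching piece found
                break
            end -= 1
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))  # → ['un', '##aff', '##able']
```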
Token embeddings in the Transformer
https://seungseop.tistory.com/37
Transformer implementation
https://cpm0722.github.io/pytorch-implementation/transformer
https://julie-tech.tistory.com/130
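The Transformer implementation posts above build around scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. A dependency-free sketch on plain lists of row vectors (the real implementations in the links use PyTorch tensors):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # score each key against the query, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        # weighted sum of the value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# a zero query scores every key equally, so the output averages the values
print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```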
Difference between nn.Embedding and nn.Linear
https://velog.io/@wjdghcks6735/PyTorch-nn.Embedding-%EA%B3%BC-nn.Linear%EC%9D%98-%EC%B0%A8%EC%9D%B4
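The post above contrasts nn.Embedding (integer-index row lookup) with nn.Linear (dense matrix multiply). The key equivalence: an embedding lookup returns the same row that multiplying a one-hot vector by the weight matrix would, just without the multiply. A dependency-free sketch of that point:

```python
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]  # 3 "tokens", embedding dim 2

def embedding(idx, weight):
    # nn.Embedding: pure table lookup, no arithmetic
    return weight[idx]

def linear(x, weight):
    # nn.Linear (bias omitted): y = x @ W; with a one-hot input this
    # selects the same row the lookup returns, just more expensively
    return [sum(xi * row[j] for xi, row in zip(x, weight))
            for j in range(len(weight[0]))]

one_hot = [0.0, 1.0, 0.0]
print(embedding(1, W), linear(one_hot, W))  # → [0.3, 0.4] [0.3, 0.4]
```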
Word2Vec theory
Attention visualization lecture
https://www.youtube.com/watch?v=3MkjgwVGbw4
BPE theory summary
https://velog.io/@gwkoo/BPEByte-Pair-Encoding
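The core BPE loop the post above covers: count adjacent symbol pairs over a word-frequency corpus, then merge the most frequent pair into a single symbol, repeating until the target vocab size. One iteration, sketched in pure Python:

```python
from collections import Counter

def most_frequent_pair(corpus):
    """corpus: {('l','o','w'): freq, ...} — count adjacent symbol pairs."""
    pairs = Counter()
    for word, freq in corpus.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(corpus, pair):
    """Rewrite every word, fusing each occurrence of `pair` into one symbol."""
    merged = {}
    for word, freq in corpus.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "o", "w"): 3}
pair = most_frequent_pair(corpus)   # ('o', 'w') appears 10 times
corpus = merge_pair(corpus, pair)
print(corpus)  # → {('l', 'ow'): 5, ('l', 'ow', 'e', 'r'): 2, ('n', 'ow'): 3}
```

In a full trainer this loop runs num_merges times and the merge list becomes the tokenizer's merge table.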
Creating a WordPiece vocab from custom data and training a BertTokenizer
https://kyunghyunlim.github.io/nlp/ml_ai/2021/10/14/customtokenizer.html
https://monologg.kr/2020/04/27/wordpiece-vocab/
(UNK ratio comparison)
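The posts above compare candidate vocabularies by their UNK ratio: the fraction of tokens the tokenizer maps to [UNK], where lower means better corpus coverage. A minimal sketch of the metric itself (the [UNK] id value here is an assumption; it depends on the vocab):

```python
def unk_ratio(token_ids, unk_id=1):
    """Fraction of tokens mapped to [UNK]; lower = better vocab coverage.
    unk_id=1 is an assumed placeholder — look it up from the actual vocab."""
    if not token_ids:
        return 0.0
    return sum(1 for t in token_ids if t == unk_id) / len(token_ids)

print(unk_ratio([5, 1, 9, 1]))  # → 0.5
```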
Korean text preprocessing packages
BERT implementation
https://paul-hyun.github.io/bert-01/
https://paul-hyun.github.io/bert-02/
BERT theory
https://yeong-jin-data-blog.tistory.com/entry/Transfomer-BERT
https://coding-start.tistory.com/416
https://velog.io/@gmlwlswldbs/BERTBidirectional-Encoder-Representations-from-Transformers
https://choice-life.tistory.com/25
https://moondol-ai.tistory.com/463
https://vhrehfdl.tistory.com/15
Positional embedding layer implementation
https://modulabs.co.kr/blog/bert-positional-embedding-layer/
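As the positional embedding links above discuss, BERT's input embedding is the per-position sum of three learned lookup tables: token + position + segment. Unlike the original Transformer's sinusoidal encoding, the position table is a plain learned lookup (nn.Embedding(max_len, d)). A dependency-free sketch with randomly initialized tables (the sizes below are illustrative, not from the posts):

```python
import random

random.seed(0)

def make_table(n, d):
    """A stand-in for a learned embedding table of shape (n, d)."""
    return [[random.uniform(-0.02, 0.02) for _ in range(d)] for _ in range(n)]

d = 4
tok_emb = make_table(100, d)  # token embedding table (toy vocab of 100)
pos_emb = make_table(512, d)  # learned positional table, max_len = 512
seg_emb = make_table(2, d)    # segment (sentence A / B) table

def bert_embed(token_ids, segment_ids):
    """BERT input embedding: token + position + segment, summed per position."""
    return [
        [t + p + s for t, p, s in zip(tok_emb[tid], pos_emb[i], seg_emb[sid])]
        for i, (tid, sid) in enumerate(zip(token_ids, segment_ids))
    ]

out = bert_embed([5, 7, 2], [0, 0, 1])  # 3 positions, each a d-dim vector
```

In the real model a LayerNorm and dropout follow the sum; they are omitted here to keep the lookup-and-add structure visible.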