๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ (Large Language Model)

LLM ์‹ค์Šต์„ ์œ„ํ•œ ๊ธฐ๋ณธ ์ •๋ณด

LLM์˜ ๊ฐœ๋…๊ณผ ๊ตฌ์กฐ

์‹ค์Šต์„ ์œ„ํ•œ ์„ ์ˆ˜ ์ง€์‹

์‹ค์Šต ํ™˜๊ฒฝ ๋ฐ ๋„๊ตฌ

CPU/GPU ์ƒ˜ํ”Œ ์ฝ”๋“œ

1. CPU ๋ฒ„์ „ (๊ธฐ๋ณธ)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# ๋ชจ๋ธ๊ณผ ํ† ํฌ๋‚˜์ด์ € ๋กœ๋“œ (CPU)
model_name = "gpt2"  # ๊ฒฝ๋Ÿ‰ ๋ชจ๋ธ ์‚ฌ์šฉ
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ์ž…๋ ฅ ํ…์ŠคํŠธ
input_text = "์ธ๊ณต์ง€๋Šฅ์ด ์„ธ์ƒ์„ ๋ฐ”๊พธ๋Š” ๋ฐฉ์‹์€"

# ํ† ํฐํ™” ๋ฐ ๋ชจ๋ธ ์ž…๋ ฅ
inputs = tokenizer(input_text, return_tensors="pt")

# ํ…์ŠคํŠธ ์ƒ์„ฑ (CPU์—์„œ ์‹คํ–‰)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        temperature=0.7
    )

# ๊ฒฐ๊ณผ ๋””์ฝ”๋”ฉ
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

2. GPU ๋ฒ„์ „ (๊ฐ€์†ํ™”)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# GPU ์‚ฌ์šฉ ๊ฐ€๋Šฅ ์—ฌ๋ถ€ ํ™•์ธ
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# ๋ชจ๋ธ๊ณผ ํ† ํฌ๋‚˜์ด์ € ๋กœ๋“œ (GPU ์‚ฌ์šฉ)
model_name = "gpt2"  # ๊ฒฝ๋Ÿ‰ ๋ชจ๋ธ ์‚ฌ์šฉ
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ์ž…๋ ฅ ํ…์ŠคํŠธ
input_text = "์ธ๊ณต์ง€๋Šฅ์ด ์„ธ์ƒ์„ ๋ฐ”๊พธ๋Š” ๋ฐฉ์‹์€"

# ํ† ํฐํ™” ๋ฐ ๋ชจ๋ธ ์ž…๋ ฅ (GPU๋กœ ์ด๋™)
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# ํ…์ŠคํŠธ ์ƒ์„ฑ (GPU์—์„œ ์‹คํ–‰)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )

# ๊ฒฐ๊ณผ ๋””์ฝ”๋”ฉ (๋‹ค์‹œ CPU๋กœ ์ด๋™)
generated_text = tokenizer.decode(outputs[0].cpu(), skip_special_tokens=True)
print(generated_text)

3. ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์ ์ธ GPU ์‚ฌ์šฉ (ํฐ ๋ชจ๋ธ์šฉ)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import torch

# 4-bit quantization settings (saves GPU memory)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the model with 4-bit quantization
model_name = "EleutherAI/polyglot-ko-1.3b"  # a Korean-language model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"  # automatically place layers on GPU/CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use a text-generation pipeline. Do not pass a device argument here:
# a model loaded with device_map="auto" has already been placed by
# accelerate, and trying to move it again raises an error.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Generate text (the prompt means "The weather is nice today, so")
result = pipe(
    "오늘 날씨가 좋아서",
    max_length=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

print(result[0]['generated_text'])

์‹ค์Šต์˜ ์ฃผ์š” ๋‹จ๊ณ„

  1. ํ”„๋กฌํ”„ํŠธ ์—”์ง€๋‹ˆ์–ด๋ง

    • LLM์— ๋ช…ํ™•ํ•œ ์ง€์‹œ๋ฅผ ๋‚ด๋ฆฌ๋Š” ํ”„๋กฌํ”„ํŠธ ์ž‘์„ฑ๋ฒ• ์—ฐ์Šต
    • ๋‹ค์–‘ํ•œ ํ”„๋กฌํ”„ํŠธ ์œ ํ˜•, ๊ตฌ์„ฑ ์š”์†Œ, ๋งค๊ฐœ๋ณ€์ˆ˜ ์‹ค์Šต
  2. ๊ธฐ๋ณธ ๋ชจ๋ธ ํ™œ์šฉ

    • ์‚ฌ์ „ ํ•™์Šต๋œ LLM์„ ๋ถˆ๋Ÿฌ์™€ ํ…์ŠคํŠธ ์ƒ์„ฑ, ์š”์•ฝ, ๋ฒˆ์—ญ ๋“ฑ ๊ธฐ๋ณธ ํƒœ์Šคํฌ ์‹ค์Šต
    • API ๋˜๋Š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋กœ ๊ฐ„๋‹จํ•œ ์ฑ—๋ด‡ ๊ตฌ์ถ•
  3. RAG(๊ฒ€์ƒ‰ ์ฆ๊ฐ• ์ƒ์„ฑ) ์‹ค์Šต

    • ๋ฌธ์„œ ๊ฒ€์ƒ‰๊ณผ LLM์„ ๊ฒฐํ•ฉํ•ด ์งˆ๋ฌธ์— ๋‹ต๋ณ€ํ•˜๋Š” RAG ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌํ˜„
    • llamaIndex, LangChain ๋“ฑ์œผ๋กœ ๋ฒกํ„ฐDB ๊ตฌ์ถ•, ์ธ๋ฑ์‹ฑ, ๊ฒ€์ƒ‰, ํ”„๋กฌํ”„ํŠธ ๊ฒฐํ•ฉ ์‹ค์Šต
  4. ๋ชจ๋ธ ๋ฏธ์„ธ์กฐ์ •(Fine-tuning)

    • ์ž์‹ ๋งŒ์˜ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ LLM์„ ์ถ”๊ฐ€ ํ•™์Šต(ํŒŒ์ธํŠœ๋‹)
    • QLoRA, LoRA ๋“ฑ ๊ฒฝ๋Ÿ‰ํ™” ๋ฐ ํšจ์œจ์  ๋ฏธ์„ธ์กฐ์ • ๊ธฐ๋ฒ• ํ™œ์šฉ
  5. ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ฐœ๋ฐœ

    • LLM์„ ํ™œ์šฉํ•œ ์ฑ—๋ด‡, ๋ฌธ์„œ ์š”์•ฝ, ์งˆ์˜์‘๋‹ต ๋“ฑ ์‹ค์ œ ์„œ๋น„์Šค ๊ฐœ๋ฐœ ์‹ค์Šต

์‹ค์Šต ์˜ˆ์ œ ๋ฐ ์ฐธ๊ณ  ์ž๋ฃŒ

์‹ค์Šต ์‹œ ์œ ์˜์‚ฌํ•ญ

์ •๋ฆฌ
