[Coursera/IBM course #3] Advanced Keras Techniques


[IBM AI course #3] Deep Learning & Neural Networks with Keras

Advanced Keras Techniques

Custom Training Loops: make it possible to implement advanced training strategies.
Custom Layers: subclass tf.keras.layers.Layer, then define build and call.
Custom Callbacks: run user-defined logic during training.
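
As a minimal sketch of the last two items, the block below subclasses tf.keras.layers.Layer (creating weights in build and applying them in call) and tf.keras.callbacks.Callback. The names SimpleDense and LossHistory are made up for illustration and do not come from the course material.

```python
import tensorflow as tf


class SimpleDense(tf.keras.layers.Layer):
    """Hypothetical example layer: a plain dense layer without activation."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the input feature size is known.
        self.kernel = self.add_weight(
            name="kernel",
            shape=(input_shape[-1], self.units),
            initializer="glorot_uniform",
            trainable=True,
        )
        self.bias = self.add_weight(
            name="bias", shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        # Forward computation: inputs @ kernel + bias.
        return tf.matmul(inputs, self.kernel) + self.bias


class LossHistory(tf.keras.callbacks.Callback):
    """Hypothetical example callback: records the loss reported after each epoch."""

    def __init__(self):
        super().__init__()
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        self.losses.append((logs or {}).get("loss"))


layer = SimpleDense(4)
outputs = layer(tf.ones((2, 3)))  # the first call triggers build()

history = LossHistory()
history.on_epoch_end(0, {"loss": 0.5})  # what Keras would call during fit()
```

In real use the callback would be passed as model.fit(..., callbacks=[history]) rather than called by hand.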

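Custom Training Loops in Keras

Implement the training loop yourself instead of using the default fit() → more flexibility

Required components: Dataset, Model, Optimizer, Loss function

Steps

Iterate over the dataset
Run the model to get predictions
Compute the loss
Compute gradients with tf.GradientTape
Update the model weights through the optimizer

Advantages

Custom loss functions and metrics can be implemented

Advanced logging and monitoring can be added

Provides the flexibility that research work needs

Custom operations and layers are easy to integrate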

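import tensorflow as tf

# Assumes model, x_train, and y_train are already defined.
# The loss and optimizer below are example choices; from_logits=True assumes the model outputs raw logits.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

epochs = 2
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
for epoch in range(epochs):
    print(f'Start of epoch {epoch + 1}')

    # Iterate over the data epoch by epoch and batch by batch
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:  # Record operations for gradient computation
            logits = model(x_batch_train, training=True)  # Forward pass: model predictions
            loss_value = loss_fn(y_batch_train, logits)  # Compute loss: error between predictions and targets

        # Compute gradients and update weights
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))  # Update model weights

        # Log the loss every 200 steps
        if step % 200 == 0:
            print(f'Epoch {epoch + 1} Step {step}: Loss = {loss_value.numpy()}')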
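
To illustrate the custom-metric and logging advantages listed earlier, here is a self-contained sketch of the same loop with a Keras metric tracked by hand. The synthetic data, the tiny two-layer model, and the SGD optimizer are all assumptions chosen only so the block runs on its own.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (assumption): 256 samples, 20 features, 3 classes.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(256, 20)).astype("float32")
y_train = rng.integers(0, 3, size=(256,)).astype("int64")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3),  # raw logits, no softmax
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
train_acc = tf.keras.metrics.SparseCategoricalAccuracy()

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

epoch_losses = []
for epoch in range(2):
    train_acc.reset_state()  # start each epoch with a fresh metric
    batch_losses = []
    for x_batch, y_batch in train_dataset:
        with tf.GradientTape() as tape:
            logits = model(x_batch, training=True)
            loss_value = loss_fn(y_batch, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        train_acc.update_state(y_batch, logits)  # custom metric tracking
        batch_losses.append(float(loss_value))

    epoch_losses.append(sum(batch_losses) / len(batch_losses))
    accuracy = float(train_acc.result())
    print(f"Epoch {epoch + 1}: loss={epoch_losses[-1]:.4f}, accuracy={accuracy:.4f}")
```

Because the labels here are random, the accuracy itself is not meaningful; the point is where the metric update and the logging sit inside the loop.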

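Hyperparameter Tuning with Keras Tuner

Hyperparameters are values set before training (e.g., learning rate, number of layers, batch size) → variables chosen by the user rather than learned.

Keras Tuner is a library that automates this search.

Keras Tuner workflow

Define a model-building function (build_model)
Create a tuner object (RandomSearch, Hyperband, etc.)
Search with tuner.search(...)
Extract the best hyperparameters, then train the model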

from tensorflow import keras
from keras_tuner import RandomSearch

# Assumes x_train and y_train are already loaded (28x28 images with integer labels for 10 classes).

# 1. Define the model-building function
def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    model.add(keras.layers.Dense(
        units=hp.Int('units', min_value=32, max_value=128, step=32),
        activation='relu'
    ))
    model.add(keras.layers.Dense(10, activation='softmax'))

    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])
        ),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

# 2. Create the tuner object
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=1,
    directory='my_dir',
    project_name='mnist_tuning'
)

# 3. Run the search
tuner.search(x_train, y_train, epochs=5, validation_split=0.2)

# 4. Extract the best hyperparameters, then train the model
best_hp = tuner.get_best_hyperparameters(1)[0]
model = build_model(best_hp)
model.fit(x_train, y_train, epochs=10, validation_split=0.2)
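
Conceptually, a random search samples a few points from the search space, scores each one, and keeps the best. The sketch below shows that idea in plain Python over the same search space as build_model above. It is not Keras Tuner's implementation, and fake_val_accuracy is a made-up stand-in for actually training a model and reading val_accuracy.

```python
import random

# Search space mirroring build_model above.
UNITS = list(range(32, 129, 32))      # hp.Int('units', 32, 128, step=32) -> 32, 64, 96, 128
LEARNING_RATES = [1e-2, 1e-3, 1e-4]   # hp.Choice('learning_rate', ...)


def fake_val_accuracy(units, learning_rate):
    """Hypothetical scoring function standing in for a real training run."""
    return 0.9 + units / 10_000 - abs(learning_rate - 1e-3)


def random_search(max_trials, seed=0):
    """Sample max_trials configurations at random and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        hp = {
            "units": rng.choice(UNITS),
            "learning_rate": rng.choice(LEARNING_RATES),
        }
        trials.append((fake_val_accuracy(**hp), hp))
    return max(trials, key=lambda trial: trial[0])


best_score, best_hp = random_search(max_trials=5)
print(best_hp, round(best_score, 4))
```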

Model Optimization Overview

Optimization contributes to faster training, more efficient use of hardware resources, and better generalization.

Weight Initialization

Use Xavier (Glorot) or He initialization to stabilize training

kernel_initializer='he_normal'

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    Dense(64, activation='relu', kernel_initializer='he_normal', input_shape=(100,)),
    Dense(10, activation='softmax')
])

# he_normal: well suited to ReLU-family activations. Xavier initialization is glorot_uniform.
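
The two initializers differ only in how they scale the random weights. The NumPy sketch below computes the He normal standard deviation, sqrt(2 / fan_in), and the Glorot uniform limit, sqrt(6 / (fan_in + fan_out)), for a 100 → 64 layer. For simplicity it samples from a plain normal distribution, whereas Keras' he_normal draws from a truncated normal.

```python
import numpy as np


def he_normal_std(fan_in):
    # He initialization: variance 2 / fan_in, suited to ReLU-family activations.
    return np.sqrt(2.0 / fan_in)


def glorot_uniform_limit(fan_in, fan_out):
    # Glorot (Xavier) uniform: samples from [-limit, limit].
    return np.sqrt(6.0 / (fan_in + fan_out))


fan_in, fan_out = 100, 64
rng = np.random.default_rng(0)

w_he = rng.normal(0.0, he_normal_std(fan_in), size=(fan_in, fan_out))
limit = glorot_uniform_limit(fan_in, fan_out)
w_glorot = rng.uniform(-limit, limit, size=(fan_in, fan_out))

print(f"He normal std:        {he_normal_std(fan_in):.4f}")
print(f"Glorot uniform limit: {limit:.4f}")
```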

Learning Rate Scheduling

Gradually decrease the learning rate to encourage stable convergence

Uses the LearningRateScheduler callback.

import math

from tensorflow.keras.callbacks import LearningRateScheduler

def scheduler(epoch, lr):
    # Keep the learning rate fixed for the first 10 epochs, then decay it exponentially.
    if epoch < 10:
        return lr
    else:
        # Return a plain Python float; some Keras versions reject a tensor here.
        return lr * math.exp(-0.1)

callback = LearningRateScheduler(scheduler)

# Assumes model, x_train, and y_train are already defined.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train, epochs=20, callbacks=[callback])
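
To see what this schedule produces, the sketch below replays it in plain Python for 20 epochs, feeding each epoch's result back in as the next epoch's lr, the way the callback does. The starting value of 0.001 is an assumption that matches Adam's default learning rate in Keras.

```python
import math


def scheduler(epoch, lr):
    # Constant for the first 10 epochs, then multiplied by exp(-0.1) each epoch.
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)


lr = 0.001  # assumed starting learning rate
history = []
for epoch in range(20):
    lr = scheduler(epoch, lr)
    history.append(lr)

for epoch in (0, 9, 10, 19):
    print(f"epoch {epoch:2d}: lr = {history[epoch]:.6f}")
```

After the ten decay steps (epochs 10 through 19) the rate has shrunk by a factor of exp(-1), roughly 0.37 of its starting value.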

Batch Normalization

Normalizes activation outputs → faster, more stable training

from tensorflow.keras.layers import Dense, BatchNormalization
from tensorflow.keras.models import Sequential

model = Sequential([
    Dense(128, input_shape=(100,), activation='relu'),
    BatchNormalization(),
    Dense(10, activation='softmax')
])
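
What the layer computes in training mode can be sketched in NumPy: each feature is standardized with the batch mean and variance, then scaled by gamma and shifted by beta. This sketch leaves out the moving averages the real layer keeps for inference. The eps value of 1e-3 matches the Keras default for BatchNormalization.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, roughly unit variance
    return gamma * x_hat + beta              # learnable scale and shift


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # activations far from zero mean
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

print("before:", x.mean(axis=0).round(2), x.std(axis=0).round(2))
print("after: ", y.mean(axis=0).round(2), y.std(axis=0).round(2))
```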

Mixed Precision Training

Uses float16 together with float32 to improve GPU utilization and speed

tf.keras.mixed_precision.set_global_policy('mixed_float16')
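
Why both precisions are needed can be shown with a small NumPy sketch. float16 is fast but coarse: small weight updates vanish when added to a float16 value, and very small gradients round to zero. Keeping variables in float32 and scaling the loss are the usual remedies. The specific numbers below are illustrative choices.

```python
import numpy as np

# 1) A small update is lost in float16 but kept in float32.
weight16 = np.float16(1.0)
update_lost = (weight16 + np.float16(1e-4)) == weight16      # True: update vanished

weight32 = np.float32(1.0)
update_kept = (weight32 + np.float32(1e-4)) != weight32      # True: update applied

# 2) A tiny gradient underflows to zero in float16...
tiny_grad = 1e-8
underflows = np.float16(tiny_grad) == 0                      # True

# ...but survives if it is scaled up first (the idea behind loss scaling).
scaled_grad = np.float16(tiny_grad * 1024)
survives = scaled_grad != 0                                  # True

print(update_lost, update_kept, underflows, survives)
```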

Knowledge Distillation

A small student model learns to mimic the outputs of a large teacher model

Compute the teacher's logits, then train the student inside a tf.GradientTape.
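
The loss that drives this can be sketched in NumPy: both sets of logits are softened with a temperature, and the student is penalized by the KL divergence from the teacher's distribution. The example logits are made up for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature):
    """Mean KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft labels
    q = softmax(student_logits, temperature)  # soft predictions
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


teacher_logits = np.array([[8.0, 2.0, 0.5]])
student_logits = np.array([[3.0, 2.5, 1.0]])

print("teacher probs, T=1:", softmax(teacher_logits, 1.0).round(3))
print("teacher probs, T=3:", softmax(teacher_logits, 3.0).round(3))
print("distillation loss: ", round(distillation_loss(student_logits, teacher_logits, 3.0), 4))
```

A higher temperature flattens the teacher's distribution, so the student also sees how the teacher ranks the wrong classes, not just which class wins.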

Pruning

Removes low-importance neurons/connections → lighter model

Improves execution speed and optimizes the model for deployment
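
One common way to measure importance is weight magnitude. The NumPy sketch below zeroes out the smallest-magnitude half of a weight matrix. It is a one-shot illustration of the idea, not how a pruning library schedules sparsity during training.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(round(sparsity * weights.size))            # number of weights to remove
    order = np.argsort(np.abs(weights), axis=None)     # indices, smallest magnitude first
    mask = np.ones(weights.size, dtype=bool)
    mask[order[:k]] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
pruned, mask = magnitude_prune(w, sparsity=0.5)

print("zeroed weights:", int((~mask).sum()), "of", w.size)
```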

Quantization

Represents weights as int instead of float, which speeds up inference and reduces model size

Suitable for embedded/mobile devices while keeping the loss in accuracy minimal
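
The float → int mapping can be sketched in NumPy with symmetric int8 quantization: one scale factor maps the weight range onto [-127, 127], and multiplying back by the scale recovers an approximation. This is a simplified scheme assumed here for illustration; real toolchains may also use zero points and per-channel scales.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_error = float(np.abs(w - w_hat).max())

print(f"size: {w.nbytes} bytes -> {q.nbytes} bytes")
print(f"max round-trip error: {max_error:.5f} (scale = {float(scale):.5f})")
```

Storage drops by 4x, and the round-trip error of each weight stays within half a quantization step.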

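# Mixed Precision Training
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
# Layers created after this call compute in float16 while keeping their variables in float32,
# which can improve training speed and memory use.

# Pruning
import tensorflow_model_optimization as tfmot
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
pruned_model = prune_low_magnitude(model)
# Removes low-magnitude weights during training to make the model lighter.
# Training the wrapped model also requires the tfmot.sparsity.keras.UpdatePruningStep() callback.

# Quantization-aware Training
import tensorflow_model_optimization as tfmot
quant_aware_model = tfmot.quantization.keras.quantize_model(model)
# Lowers numeric precision to reduce model size and computation (e.g., float32 → int8).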

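import tensorflow as tf

# build_teacher_model, build_student_model, dataset, num_epochs, x_train, and y_train are
# placeholders that must be defined elsewhere. Both models are assumed to output raw logits.

# 1. Define and train the teacher model (skip the training if it is already pre-trained)
teacher = build_teacher_model()
teacher.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
teacher.fit(x_train, y_train)

# 2. Define the student model
student = build_student_model()
optimizer = tf.keras.optimizers.Adam()
kl_divergence = tf.keras.losses.KLDivergence()

# 3. Define the distillation loss (uses soft targets)
def distillation_loss(y_true, y_pred, teacher_pred, temperature):
    # y_true is unused here: this loss only matches the teacher's softened outputs.
    soft_labels = tf.nn.softmax(teacher_pred / temperature)
    soft_preds = tf.nn.softmax(y_pred / temperature)
    return kl_divergence(soft_labels, soft_preds)

# 4. Knowledge distillation training loop
for epoch in range(num_epochs):
    for x_batch, y_batch in dataset:
        # The teacher is not being trained, so its logits are computed outside the tape.
        teacher_logits = teacher(x_batch, training=False)

        with tf.GradientTape() as tape:
            student_logits = student(x_batch, training=True)
            # Train to minimize the gap between the teacher's soft labels and the student's soft predictions.
            loss = distillation_loss(y_batch, student_logits, teacher_logits, temperature=3.0)
        gradients = tape.gradient(loss, student.trainable_weights)
        optimizer.apply_gradients(zip(gradients, student.trainable_weights))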
