07_Boston Housing Price Prediction

chuuvelop 2025. 4. 24. 17:26

from tensorflow import keras
import pandas as pd

(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data()

x_train.shape, x_test.shape

((404, 13), (102, 13))

y_train.shape

(404,)

y_train[:5]

array([15.2, 42.3, 50. , 21.1, 17.7])

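Independent variables in the Boston housing data
- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (1: tract bounds the river, 0: otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built before 1940
- DIS: weighted distance to five Boston employment centers
- RAD: index of accessibility to radial highways
- TAX: full-value property tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
- LSTAT: percentage of the population classed as lower status
Dependent variable
- Median home price (unit: $1,000)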

type(x_train)

numpy.ndarray

train_df = pd.DataFrame(x_train)

train_df.head()

train_df.shape

(404, 13)

train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 404 entries, 0 to 403
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       404 non-null    float64
 1   1       404 non-null    float64
 2   2       404 non-null    float64
 3   3       404 non-null    float64
 4   4       404 non-null    float64
 5   5       404 non-null    float64
 6   6       404 non-null    float64
 7   7       404 non-null    float64
 8   8       404 non-null    float64
 9   9       404 non-null    float64
 10  10      404 non-null    float64
 11  11      404 non-null    float64
 12  12      404 non-null    float64
dtypes: float64(13)
memory usage: 41.2 KB
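The info() output labels the columns 0 to 12 because the DataFrame was built from a bare NumPy array. Below is a minimal sketch of attaching the feature names, assuming the columns follow the same order as the variable list above. The `feature_names` list and the zero-filled `x_demo` stand-in are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Feature names, assumed to match the column order of the dataset
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Stand-in for x_train so the snippet runs without downloading the data
x_demo = np.zeros((3, 13))

named_df = pd.DataFrame(x_demo, columns=feature_names)
print(named_df.columns.tolist())
```

In the notebook itself, the same `columns=` argument could be passed when building train_df from x_train.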

train_df.describe()
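describe() typically shows that the 13 features live on very different scales (tax rates in the hundreds, NOX below 1). This post trains on the raw values. A common alternative, sketched here as an optional step and not as what the notebook does, is to standardize each feature using statistics from the training set only. The `x_train_demo` and `x_test_demo` arrays are random stand-ins so the snippet runs on its own.

```python
import numpy as np

# Random stand-ins for x_train / x_test (same number of features: 13)
rng = np.random.default_rng(0)
x_train_demo = rng.uniform(0, 100, size=(8, 13))
x_test_demo = rng.uniform(0, 100, size=(4, 13))

# Compute per-feature statistics on the training data only
mean = x_train_demo.mean(axis=0)
std = x_train_demo.std(axis=0)

# Apply the same transformation to both splits
x_train_scaled = (x_train_demo - mean) / std
x_test_scaled = (x_test_demo - mean) / std

print(x_train_scaled.mean(axis=0).round(6))
print(x_train_scaled.std(axis=0).round(6))
```

Reusing the training mean and standard deviation for the test split keeps test information from leaking into preprocessing.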

Model design

model = keras.Sequential()
# Input layer (13 features)
model.add(keras.Input(shape = (13,)))
# Hidden layer 1
model.add(keras.layers.Dense(32, activation = "relu"))
# Hidden layer 2
model.add(keras.layers.Dense(8, activation = "relu"))
# Output layer (a single linear unit for regression)
model.add(keras.layers.Dense(1, activation = "linear"))

es_cb = keras.callbacks.EarlyStopping(patience = 8, restore_best_weights = True)
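EarlyStopping monitors val_loss by default, so with patience = 8 training stops once the validation loss has gone 8 consecutive epochs without improving, and restore_best_weights = True rolls the model back to the best epoch. Below is a simplified pure-Python sketch of that rule. The function `simulate_early_stopping` and the loss values are made up for illustration; this is not Keras's actual implementation and it ignores options such as min_delta.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of validation losses."""
    best = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ends at the last epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, with a small patience to keep it short
val_losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3]
stop_epoch, best_epoch = simulate_early_stopping(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

With restore_best_weights, the weights kept at the end would be those from `best_epoch`, not from `stop_epoch`.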

model.compile(optimizer = "adam", loss = "mean_squared_error",
              metrics = ["r2_score", "root_mean_squared_error", "mean_absolute_error"])

model.summary()
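model.summary() lists the trainable parameters per layer. For a Dense layer the count is inputs x units + units (weights plus biases), so the numbers can be checked by hand. Here is a small sketch of that arithmetic for the 13 -> 32 -> 8 -> 1 architecture above. The helper `dense_params` is a hypothetical name used only for this example.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units


layer_sizes = [13, 32, 8, 1]  # input features, hidden 1, hidden 2, output
counts = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [448, 264, 9]
print(sum(counts))  # 721
```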

model.fit(x_train, y_train, epochs = 200, validation_split = 0.2, callbacks = [es_cb],
          batch_size = 32)

...

Model evaluation

model.evaluate(x_test, y_test)

[48.103546142578125, 0.4221366047859192, 6.935671806335449, 5.088674545288086]
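evaluate() returns the loss followed by the metrics in the order they were passed to compile(), so the four numbers above should be MSE, R2, RMSE, and MAE. One quick sanity check, sketched here with the values copied from the output above, is that the RMSE is approximately the square root of the MSE.

```python
import math

# Values copied from the model.evaluate() output above
mse, r2, rmse, mae = (48.103546142578125, 0.4221366047859192,
                      6.935671806335449, 5.088674545288086)

# RMSE should match sqrt(MSE) up to float32 rounding
print(math.sqrt(mse), rmse)
print(math.isclose(math.sqrt(mse), rmse, rel_tol=1e-5))
```

Since prices are in units of $1,000, an MAE of about 5.09 means the predictions are off by roughly $5,090 on average on the test set.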

model.predict(x_test[[0]])

array([[8.605511]], dtype=float32)

y_test[0]

7.2

y_pred = model.predict(x_test).flatten()

for i in range(10):
    label = y_test[i]
    prediction = y_pred[i]
    print(f"Actual price: {label:.3f}, Predicted price: {prediction:.3f}")

Actual price: 7.200, Predicted price: 8.606
Actual price: 18.800, Predicted price: 19.714
Actual price: 19.000, Predicted price: 25.463
Actual price: 27.000, Predicted price: 27.813
Actual price: 22.200, Predicted price: 26.278
Actual price: 24.500, Predicted price: 24.071
Actual price: 31.200, Predicted price: 28.733
Actual price: 22.900, Predicted price: 27.383
Actual price: 20.500, Predicted price: 22.952
Actual price: 23.200, Predicted price: 21.117
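To see how the per-sample comparison relates to the aggregate metrics, the ten pairs printed above can be turned into an MAE and RMSE by hand. This is a minimal sketch using only those ten rounded values, so the result describes this small sample and will not match the full test-set metrics.

```python
# (actual, predicted) pairs copied from the printed output above
pairs = [(7.2, 8.606), (18.8, 19.714), (19.0, 25.463), (27.0, 27.813),
         (22.2, 26.278), (24.5, 24.071), (31.2, 28.733), (22.9, 27.383),
         (20.5, 22.952), (23.2, 21.117)]

# Absolute error of each prediction
abs_errors = [abs(actual - pred) for actual, pred in pairs]

# Mean absolute error and root mean squared error over the ten samples
mae_10 = sum(abs_errors) / len(abs_errors)
rmse_10 = (sum(e ** 2 for e in abs_errors) / len(abs_errors)) ** 0.5

print(f"MAE over 10 samples: {mae_10:.3f}")
print(f"RMSE over 10 samples: {rmse_10:.3f}")
```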