Data Analysis ML/DL Project (4)


shinyfood 2024. 1. 25. 00:26


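The labeling work that had been giving me a headache for the past few days turned out to be easy in the end: I found a dataset whose annotations were already stored as XML files and pulled the labels straight out of them.

As soon as I did that, the YOLO model's accuracy went up sharply, and I have now left a training run of about 20 hours going.

I tested the model from yesterday's roughly 8-hour run on drones and birds, and it detected them reasonably well.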
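For reference, here is a minimal sketch of how labels can be pulled out of XML annotations and turned into YOLO label lines. It assumes the XML follows the Pascal VOC layout (`size`, `object`, `bndbox`), which the post does not confirm, and the class list `CLASSES` and the function name `voc_to_yolo` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list; the real project would use its own label names.
CLASSES = ["drone", "bird"]

def voc_to_yolo(xml_text, classes=CLASSES):
    """Convert one Pascal VOC style annotation (assumed layout) into YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip labels we are not training on
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # YOLO label line: class x_center y_center width height, normalized by image size
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines

# Tiny hand-written annotation used only to exercise the function
sample = """<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>bird</name>
    <bndbox><xmin>160</xmin><ymin>120</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""

print(voc_to_yolo(sample))
```

In practice each returned list would be written to a `.txt` file with the same base name as its image.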

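After that, I preprocessed and visualized the nationwide tourist data together with the share of tourists who visited Seoul. The nationwide data is also available monthly from January 1996 through December 2022, so I tried time series forecasting on it as well.

Combining the two datasets during this process took far longer than I expected, and it gave me a lot of trouble.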

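import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("./data/korea_tourist.csv")
df1 = pd.read_csv("./data/seoul_tourist.csv")

# Keep only the row we need (the overall total); copy so later in-place edits act on an independent frame
df = df.iloc[1:2, :].copy()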

	대륙별(1)	대륙별(2)	2003	2004	2005	2006	2007	2008	2009	2010	...	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022
1	합계	소계	4752762	5818138	6022752	6155047	6448240	6890841	7817533	8797658	...	12175550	14201516	13231651	17241823	13335758	15346879	17502756	2519118	967003	3198017
1 rows × 22 columns

# Drop the columns we don't need
df.drop(["대륙별(1)","대륙별(2)"], inplace=True, axis=1)

# Convert the yearly counts to int
df = df.astype(int)

df1

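# Check basic info
df1.info()

# Keep only the row we need (the overall rate); copy so later in-place edits act on an independent frame
df1 = df1.iloc[1:2, :].copy()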

	국가별(1)	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022
1	전체	82.5	80.9	80.4	78.7	78.0	78.8	79.4	76.4	47.2	57.7	82.4

# Drop the column we don't need
df1.drop("국가별(1)",inplace=True, axis=1)

# Check the mean of the Seoul visit rate across years
df1 = df1.astype(float)
df1.transpose().mean()

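# Combine the two DataFrames (rows are stacked, columns are aligned by year)
df2 = pd.concat([df,df1], axis=0)
df2.info()

# Fill the years that have no Seoul visit rate with the mean computed above
df2 = df2.fillna(74.764)

# Reset the row index and discard the old one
df2 = df2.reset_index(drop=True)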

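# Transpose so that each year becomes a row
df2 = df2.transpose()
df2 = df2.round(3)
# Estimated Seoul visitors = nationwide tourists * (Seoul visit rate / 100)
df2["2"] = df2.iloc[:, 0] * (df2.iloc[:, 1] / 100)
new_columname = ["korea_tourist", "visited_seoul_percent", "visited_tourist"]

df2.columns = new_columname
# Move the year labels out of the index into a regular column
df2 = df2.reset_index()
df2.rename(columns={"index" : "year"}, inplace=True)
df2["year"] = df2["year"].astype(int)
df2.info()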


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   year                   20 non-null     object 
 1   korea_tourist          20 non-null     float64
 2   visited_seoul_percent  20 non-null     float64
 3   visited_tourist        20 non-null     float64
dtypes: float64(3), object(1)
memory usage: 768.0+ bytes
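Looking back, the combining step could probably have been shorter. Below is a minimal sketch, not the code actually used for this post, that treats each dataset as a pandas Series indexed by year and lets `pd.concat(..., axis=1)` align them. It uses only a handful of values copied from the tables above, so the mean it fills in differs from the 74.764 used earlier, which was computed over all 11 survey years.

```python
import pandas as pd

# A few values copied from the tables shown above; the real data would come from the CSV files.
korea = pd.Series({"2010": 8797658, "2019": 17502756, "2020": 2519118}, name="korea_tourist")
seoul_pct = pd.Series({"2019": 76.4, "2020": 47.2}, name="visited_seoul_percent")

# concat along axis=1 aligns the two Series on their index (the year labels)
merged = pd.concat([korea, seoul_pct], axis=1)

# Fill years without a survey value with the mean of the observed percentages
# instead of a hard-coded constant
merged["visited_seoul_percent"] = merged["visited_seoul_percent"].fillna(seoul_pct.mean())

# Estimated Seoul visitors = nationwide tourists * (Seoul visit rate / 100)
merged["visited_tourist"] = merged["korea_tourist"] * merged["visited_seoul_percent"] / 100

# Turn the year labels into a regular integer column
merged = merged.rename_axis("year").reset_index()
merged["year"] = merged["year"].astype(int)
print(merged)
```

Because the alignment happens on the index, there is no need to transpose, rename positional columns, or reset the index twice.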

plt.rc("font", family = "Malgun Gothic")
plt.rcParams["axes.unicode_minus"] = False


plt.figure(figsize=(15, 8))
plt.title("전체관광객 중 서울 관광객")  # "Seoul tourists among all tourists"
# plt.bar(df2["year"], df2["korea_tourist"], label="전국관광객")
# plt.bar(df2["year"], df2["visited_tourist"], label="서울관광객")

# Nationwide tourists, shifted left so the two bars sit side by side
plt.bar(df2["year"] - 0.2, df2["korea_tourist"], width=0.4, label="전국관광객")

# Seoul tourists, shifted right
plt.bar(df2["year"] + 0.2, df2["visited_tourist"], width=0.4, label="서울관광객")


plt.ylim(0, 2e7)
plt.xlabel("연도")  # "Year"
plt.ylabel("관광객 수(단위 천만)")  # "Number of tourists (unit: ten million)"
plt.xticks(df2["year"])
plt.legend()
plt.savefig("./data/img/전체관광객 중 서울 관광객1.png")
plt.show()

I did a bit more besides this, and wrapped things up around here.

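As for the time series forecast... the results were a bit disappointing. COVID is partly to blame, but this was my second attempt at time series forecasting and neither one turned out well.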

The code for it is also messy enough that I'm embarrassed to post it, so I'll skip that one... haha.
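In place of that code, here is a minimal sketch of one way to judge whether a forecast is actually "good": compare it against a seasonal naive baseline on a held-out final year. This is not the model used in this project. The series below is synthetic (a trend plus a yearly cycle) and only mimics the shape of monthly data from January 1996 to December 2022; none of the numbers are real tourist counts.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series for illustration only (not the real tourist counts):
# a linear trend plus a yearly seasonal pattern over 1996-01 .. 2022-12.
idx = pd.date_range("1996-01-01", "2022-12-01", freq="MS")
t = np.arange(len(idx))
series = pd.Series(1000 + 5 * t + 100 * np.sin(2 * np.pi * t / 12), index=idx)

# Hold out the final 12 months as a test set
train, test = series[:-12], series[-12:]

# Seasonal naive forecast: each month is predicted by the same month one year earlier
forecast = pd.Series(train[-12:].to_numpy(), index=test.index)

# Mean absolute error of the baseline on the held-out year
mae = (test - forecast).abs().mean()
print(f"seasonal naive MAE: {mae:.1f}")
```

A model that cannot beat this baseline on the same holdout is not adding much. With a shock like COVID inside the holdout period, even the baseline will look bad, so it can help to evaluate on a pre-2020 holdout as well.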
