
[Cohort 6] Programmers AI Dev Course, Day 102 TIL

1211

[Week 17 - Day 1] Recommendation system

Building a recommendation engine

# Hands-on code: TMDB popular-movie recommendation system

Dataset used in this exercise: https://www.kaggle.com/tmdb/tmdb-movie-metadata


Loading the input data

import pandas as pd
import numpy as np

movies = pd.read_csv("https://grepp-reco-test.s3.ap-northeast-2.amazonaws.com/tmdb_5000_movies.csv")

credits = pd.read_csv("https://grepp-reco-test.s3.ap-northeast-2.amazonaws.com/tmdb_5000_credits.csv")

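import json

# The genres column holds a JSON string such as [{"id": 28, "name": "Action"}, ...].
# Extract the genre names, sort them, and join them into one space-separated string.
def add_genre_name(j):
    genres = []
    ar = json.loads(j)
    for a in ar:
        genres.append(a.get("name"))
    return " ".join(sorted(genres))

movies['genres_name'] = movies.apply(lambda x: add_genre_name(x.genres), axis=1)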
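To see what `add_genre_name` produces without downloading the dataset, here is a minimal sketch on a few hypothetical rows shaped like the TMDB `genres` column. It also shows that `Series.apply` can pass each cell directly, which avoids the row-wise `apply(..., axis=1)` with a lambda.

```python
import json

import pandas as pd


def add_genre_name(j):
    """Parse a TMDB-style JSON genre list and return the sorted names joined by spaces."""
    ar = json.loads(j)
    return " ".join(sorted(a.get("name") for a in ar))


# Hypothetical sample rows in the same shape as the TMDB `genres` column
sample = pd.DataFrame({
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 878, "name": "Science Fiction"}, {"id": 18, "name": "Drama"}]',
        "[]",
    ]
})

# Series.apply hands each cell (a JSON string) to the function directly
sample["genres_name"] = sample["genres"].apply(add_genre_name)
print(sample["genres_name"].tolist())  # ['Action Adventure', 'Drama Science Fiction', '']
```

Note that a multi-word genre such as "Science Fiction" contains a space, so splitting `genres_name` back on spaces would not recover the original genre list; substring matching, as used later in the recommendation function, still works.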


Joining the movies and credits DataFrames

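# Inner join on movies.id == credits.movie_id; both tables have a `title` column,
# so pandas renames them to title_x (from movies) and title_y (from credits)
movie_credits = pd.merge(movies, credits, left_on='id', right_on='movie_id')

# Drop columns not needed for recommendation, including both duplicated title columns
movie_credits = movie_credits.drop(columns=['homepage', 'title_x', 'title_y', 'status','production_countries', 'production_companies'])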
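The `title_x` / `title_y` names in the drop list come from how `pd.merge` handles overlapping column names. The minimal sketch below uses two hypothetical miniature tables (not the real TMDB data) to show the default inner join and the `_x` / `_y` suffixes.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables; only the join keys and
# the shared `title` column matter here
movies = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"], "popularity": [10.0, 30.0, 20.0]})
credits = pd.DataFrame({"movie_id": [1, 2, 4], "title": ["A", "B", "D"], "cast": ["[]", "[]", "[]"]})

# The default how="inner" keeps only ids present in both tables (1 and 2)
merged = pd.merge(movies, credits, left_on="id", right_on="movie_id")

# Both inputs had a `title` column, so pandas suffixes them with _x and _y
print(merged.columns.tolist())

# Dropping the duplicated titles mirrors the drop step above
merged = merged.drop(columns=["title_x", "title_y"])
```

Movie 3 (no credits row) and credits row 4 (no movie) disappear because of the inner join; passing `how="left"` would keep every movie instead.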

movie_credits.describe()

popularity = movie_credits.sort_values('popularity',ascending=False)
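The post sorts the whole table by `popularity` in descending order and later takes `head(n)`. When only the top n rows are needed, `DataFrame.nlargest` is an alternative that returns the same rows without keeping a fully sorted copy. A minimal sketch on hypothetical scores:

```python
import pandas as pd

# Hypothetical popularity scores
df = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "popularity": [5.0, 42.0, 17.5, 8.1],
})

# Approach used in the post: full descending sort, then take the first rows
top_sorted = df.sort_values("popularity", ascending=False).head(2)

# Alternative: select the n largest values of one column directly
top_n = df.nlargest(2, "popularity")

print(top_n["original_title"].tolist())  # ['B', 'C']
```

The sorted frame is still handy here because `reco_top_scored_one` reuses it for genre-filtered queries; `nlargest` mainly helps for one-off top-n lookups. With tied scores the two approaches may order rows differently.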

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12,6))
ax=sns.barplot(
    x=popularity['popularity'].head(10),
    y=popularity['original_title'].head(10)
)

plt.title('Most Popular by Popularity', weight='bold')
plt.xlabel('Score of Popularity', weight='bold')
plt.ylabel('Movie Title', weight='bold')
plt.savefig('best_popular_movies.png')

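def reco_top_scored_one(n, genre=None):
    # Return the n most popular titles; if genre is given, keep only movies whose genres_name contains it
    if genre is None:
        return popularity["original_title"].head(n)
    else:
        return popularity[popularity['genres_name'].str.contains(genre)]["original_title"].head(n)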
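`reco_top_scored_one` reads the global `popularity` frame, which makes it hard to try in isolation. Below is a minimal sketch of the same popularity-based logic, with the DataFrame passed in as a parameter; the name `reco_top_scored` and the toy rows are hypothetical. It also passes `regex=False` and `case=False` to `str.contains`, because by default the pattern is treated as a regular expression and matched case-sensitively.

```python
import pandas as pd


def reco_top_scored(df, n, genre=None):
    """Hypothetical variant: top-n titles by popularity, optionally filtered by a genre substring.

    `df` is assumed to have `original_title`, `popularity`, and `genres_name` columns.
    """
    ranked = df.sort_values("popularity", ascending=False)
    if genre is not None:
        # regex=False treats the genre as a literal string; case=False ignores case
        mask = ranked["genres_name"].str.contains(genre, case=False, regex=False)
        ranked = ranked[mask]
    return ranked["original_title"].head(n).tolist()


# Hypothetical rows shaped like the joined TMDB data
toy = pd.DataFrame({
    "original_title": ["Alpha", "Beta", "Gamma", "Delta"],
    "popularity": [50.0, 90.0, 70.0, 10.0],
    "genres_name": ["Action Adventure", "Drama", "Action Science Fiction", "Science Fiction"],
})

print(reco_top_scored(toy, 2))                     # ['Beta', 'Gamma']
print(reco_top_scored(toy, 2, "action"))           # ['Gamma', 'Alpha']
print(reco_top_scored(toy, 5, "Science Fiction"))  # ['Gamma', 'Delta']
```

Returning a plain list keeps the output easy to compare; the original function returns a pandas Series, which works just as well inside a notebook.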