본문 바로가기

프로그래머스 데브 코스/TIL

[6기] 프로그래머스 인공지능 데브코스 71일차 TIL

1110

SQL_Analysis

SQL을 이용한 데이터 분석 숙제 3일차 #1

SELECT
    LEFT(B, 7) "month",
    A.channel,
    COUNT(DISTINCT userid) uniqueUsers,
    COUNT(DISTINCT CASE WHEN amount > 0 THEN userid END) paidUsers,
    ROUND(paidUsers*100.0/NULLIF(uniqueUsers, 0), 2) AS conversionRate 
    SUM(amount) grossRevenue
    SUM(CASE WHEN refunded is False THEN amount END) netRevenue
FROM raw_data.user_session_channel A
JOIN raw_data.session_timestamp B ON B.sessionid = A.sessionid
LEFT JOIN raw_data.session_transaction C ON C.sessionid = A.sessionid
  • 내가 쓴 코드인데 B가 없다는 오류가 계속 떴었음

 

AS
SELECT LEFT(ts, 7) "month",
    usc.channel,
    COUNT(DISTINCT usc.userid) uniqueUsers,
    COUNT(DISTINCT CASE WHEN amount > 0 THEN userid END) paidUsers,
    ROUND(paidUsers*100.0/NULLIF(uniqueUsers, 0), 2) AS conversionRate, 
    SUM(amount) grossRevenue,
    SUM(CASE WHEN refunded is False THEN amount ELSE 0 END) netRevenue
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid
LEFT JOIN raw_data.session_transaction str ON usc.sessionid = str.sessionid
GROUP BY 1, 2;
  • 사실 month를 쓰는 코드에 있는 ts를 첫 번째 JOIN한 st랑 헷갈려서 그 이름을 넣는 건 줄 알았다 멍청멍청

 

 

SQL을 이용한 데이터 분석 숙제 4일차 #1

SELECT userid, ts, channel,
  ROW_NUMBER() OVER (PARTITION BY userid ORDER BY ts) seq
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid
WHERE userid = 251
ORDER BY 1
  • 이렇게 해 봤는데 결국 시퀀스만 생기고 최대 최소를 확인하려면 직접 처음이랑 끝을 봐야 함
    ㄴ 최대 최소 행 동시에 볼 수 있는 방법을 알아야 함...

 

SELECT userid,
 MAX(CASE WHEN rn1 = 1 THEN channel END) first_touch,
 MAX(CASE WHEN rn2 = 1 THEN channel END) last_touch
FROM (
  SELECT userid,
    channel,
    (ROW_NUMBER() OVER (PARTITION BY usc.userid ORDER BY  st.ts asc)) AS rn1,
    (ROW_NUMBER() OVER (PARTITION BY usc.userid ORDER BY  st.ts desc)) AS rn2
  FROM raw_data.user_session_channel usc
  JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid
)
GROUP BY 1;



SELECT DISTINCT
    A.userid,
    FIRST_VALUE(A.channel) over(partition by A.userid order by B.ts
rows between unbounded preceding and unbounded following) AS First_Channel,
    LAST_VALUE(A.channel) over(partition by A.userid order by B.ts
rows between unbounded preceding and unbounded following) AS Last_Channel
FROM raw_data.user_session_channel A
LEFT JOIN raw_data.session_timestamp B ON A.sessionid = B.sessionid
ORDER BY 1;
  • 두 가지 버전 정답, SQL에서 최대 최소값을 동시에 부르는 방식 두 가지 확인

 

 

SQL을 이용한 데이터 분석 숙제 4일차 #2

SELECT uniqueUsers, grossrevenue
FROM (
  SELECT uniqueUsers, grossrevenue
  From dd
  ORDER BY grossRevenus DESC
)
WHERE ROWNUM <= 10
  • 3일차에 만든 테이블 사용하려고 했는데 계속 column "grossrevenus" does not exist in dd이런 오류가 뜸

 

SELECT DISTINCT
    usc.userid,
    SUM(amount) OVER(PARTITION BY usc.userid)
FROM raw_data.user_session_channel AS usc
JOIN raw_data.session_transaction AS revenue ON revenue.sessionid = usc.sessionid  
ORDER BY 2 DESC 
LIMIT 10;
  • amount를 이용한다는 사실이 놀라울 뿐...

 

 

SQL을 이용한 데이터 분석 숙제 4일차 #3

SELECT n.score,
    COUNT(DISTINCT n.id) numUsers,
    COUNT(DISTINCT CASE WHEN score >= 0 AND score <= 6 THEN id END) detradctor,
    COUNT(DISTINCT CASE WHEN score >= 7 AND score <= 8 THEN id END) passive,
    COUNT(DISTINCT CASE WHEN score >= 9 AND score <= 10 THEN id END) promoter,
    ROUND(promoter*100.0/NULLIF(numUsers, 0), 2)-ROUND(detractor*100.0/NULLIF(numUsers, 0), 2)
FROM raw_data.nps n
  • column "detractor" does not exist in n 오류가 뜬다 바로정답을맞히는경우가없다

 

SELECT LEFT(created, 7) AS month,
  ROUND(SUM(CASE
    WHEN score >= 9 THEN 1 
    WHEN score <= 6 THEN -1 END)::float*100/COUNT(1), 2)
FROM raw_data.nps
GROUP BY 1
ORDER BY 1;
  • 상상 이상으로 간단한 코드로도 해결 가능하다