[BeautifulSoup, selenium] Scraping a Dynamically Scrolling Page

꼰대 2021. 5. 14. 13:10

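This post opens the Google Play Movies page and collects the list of movies that currently have a discount applied.

The page is dynamic: the remaining movies are loaded only as you scroll down. So we open the page with selenium, scroll to the bottom, and then scrape the fully loaded HTML with BeautifulSoup.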
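One common way to handle the scrolling step is to keep scrolling until `document.body.scrollHeight` stops growing. The sketch below isolates that idea in a helper so it can be tried without a browser. It is only an illustration: `scroll_to_bottom`, `pause`, `max_rounds`, and `FakeDriver` are hypothetical names, and the `max_rounds` cap is an assumption added so that a page that never stops growing cannot loop forever.

```python
import time


def scroll_to_bottom(driver, pause=3.0, max_rounds=50):
    """Scroll until document.body.scrollHeight stops growing.

    Returns the number of scrolls performed. `max_rounds` is an assumed
    safety cap, not something the page or selenium requires.
    """
    prev_height = driver.execute_script('return document.body.scrollHeight')
    for rounds in range(1, max_rounds + 1):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        time.sleep(pause)  # give lazily loaded items time to appear
        current_height = driver.execute_script('return document.body.scrollHeight')
        if current_height == prev_height:
            return rounds
        prev_height = current_height
    return max_rounds


class FakeDriver:
    """Hypothetical stand-in for a selenium driver: the page grows twice, then stops."""

    def __init__(self):
        self.heights = [1000, 2000, 3000]
        self.index = 0

    def execute_script(self, script):
        if script.startswith('window.scrollTo'):
            # Scrolling "loads" more content until the last height is reached.
            self.index = min(self.index + 1, len(self.heights) - 1)
            return None
        return self.heights[self.index]


rounds = scroll_to_bottom(FakeDriver(), pause=0)
print(rounds)
```

With a real browser, the same helper would be called as `scroll_to_bottom(driver)` after `driver.get(URL)`.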

from selenium import webdriver
from bs4 import BeautifulSoup
import time

URL = 'https://play.google.com/store/movies/top'

# Open the page in Chrome
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(URL)
# Wait up to 10 seconds whenever an element lookup does not succeed immediately
driver.implicitly_wait(10)

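# Scroll all the way to the bottom
prev_height = driver.execute_script('return document.body.scrollHeight')

while True:
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    # implicitly_wait() only affects element lookups, so pause explicitly instead
    time.sleep(3)
    current_height = driver.execute_script('return document.body.scrollHeight')
    # Stop once scrolling no longer increases the page height
    if prev_height == current_height:
        break
    prev_height = current_height

# Give the last batch of items a moment to finish rendering
time.sleep(3)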

soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # the HTML has been captured, so the browser can be closed

movies = soup.find_all('div', attrs={'class': 'Vpfmgd'})

for movie in movies:
    title = movie.find('div', attrs={'class': 'WsMG1c nnK0zc'}).get_text()
    original_price = movie.find('span', attrs={'class': 'SUZt4c djCuy'})

    # Print only movies that have a pre-discount price
    if original_price:
        original_price = original_price.get_text()
        discount_price = movie.find('span', attrs={'class': 'VfPpfd ZdBevf i5DZme'}).get_text()
        link = movie.find('div', attrs={'class': 'b8cIId ReQCgd Q9MA7b'}).find('a')['href']

        print('Title : {0}'.format(title))
        print('Original price : {0}'.format(original_price))
        print('Discounted price : {0}'.format(discount_price))
        print('Link : {0}'.format('https://play.google.com' + link))
        print('=' * 100)
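The script builds each movie link by concatenating the site root with the `href` value, which works as long as every `href` is a root-relative path. As a small sketch of a more defensive variant, `urllib.parse.urljoin` from the standard library also handles an `href` that is already absolute. The helper name `absolute_link` and the sample `href` values are made up for illustration and are not taken from the page.

```python
from urllib.parse import urljoin

BASE_URL = 'https://play.google.com'


def absolute_link(href, base=BASE_URL):
    """Resolve an href against the site root; absolute hrefs come back unchanged."""
    return urljoin(base, href)


# Hypothetical href values, for illustration only.
relative = absolute_link('/store/movies/details?id=EXAMPLE')
already_absolute = absolute_link('https://play.google.com/store/movies/details?id=EXAMPLE')

print(relative)
print(already_absolute)
```

Both calls should yield the same full URL, so the print line in the loop could use `absolute_link(link)` instead of string concatenation.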
