Python

Image Web Scraping (Web Crawling)

노루룽 2020. 10. 2. 01:10

Installing the dload package

dload is a package that makes downloading images easy.

File > Settings > Python Interpreter

Click the + button on the right, search for dload, and install it.

import dload

dload.save("image URL")  # pass the address of the image to download

Search for any image on Google, choose 'Copy image address', and pass that link to dload.save; the image is then saved.
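Copied image addresses often end in a query string, so it can help to decide the local file name yourself before saving. Here is a minimal sketch using only the standard library. The helper name `filename_from_url`, the `default` parameter, and the example.com addresses are made up for illustration and are not part of dload.

```python
import os
from urllib.parse import urlparse


def filename_from_url(url, default="image.jpg"):
    """Return the last path segment of url, or default if the path has none."""
    # urlparse drops the query string, so "?size=large" does not end up in the name.
    name = os.path.basename(urlparse(url).path)
    return name or default


print(filename_from_url("https://example.com/photos/cat.png?size=large"))
print(filename_from_url("https://example.com/"))
```

The returned name could then be passed as the second argument of dload.save, in the same way as the dload.save(img, f'img/{i}.jpg') call later in this post.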

 

Installing Selenium

Selenium is a package that controls a browser automatically.

File > Settings > Python Interpreter

Search for selenium > Install Package

 

Installing the Selenium WebDriver

Chrome browser > Help > About Chrome > check your Chrome version

 

Go to the Selenium WebDriver (ChromeDriver) download page:

https://chromedriver.storage.googleapis.com/index.html?path=85.0.4183.87/

Install the driver that matches your own Chrome version!

(If you are not on version 85, open Parent Directory and pick the folder for your version.)

Unzip the download and move the resulting executable into your Python project folder.
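To double-check that the driver you downloaded fits your browser, you can compare the two version strings. This is a rough sketch that assumes matching the leading (major) number is enough. The function names and the sample version strings other than 85.0.4183.87 are made up for illustration.

```python
def major_version(version):
    """Return the leading number of a dotted version string such as '85.0.4183.87'."""
    return int(version.split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Illustrative values: type in the version shown under Help > About Chrome.
print(versions_match("85.0.4183.121", "85.0.4183.87"))
print(versions_match("86.0.4240.75", "85.0.4183.87"))
```

If the check fails, go back to Parent Directory on the download page and pick the folder whose number starts with your Chrome version.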

 

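from selenium import webdriver

# 'chromedriver' is the path to the driver executable placed in the project folder.
# Passing the path like this is the Selenium 3 style; Selenium 4 expects a Service object instead.
driver = webdriver.Chrome('chromedriver')

driver.get("http://www.naver.com")  # open Naver in the controlled browser

Running this code opens naver in a new Chrome browser window!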

 

Web Scraping (Web Crawling)

BeautifulSoup is a package that picks out just the parts you want from the page source loaded in the browser.

It is installed the same way as above.

File > Settings > Python Interpreter > bs4 > Install Package

 

Run an image search on Naver for the word you want.

Then right-click one of the images you want to crawl > Inspect > Copy > Copy selector

⚠ Make sure you copy the selector of the tag that actually carries the img src address!

 

from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Chrome('chromedriver')  # path to the WebDriver executable
driver.get("https://search.naver.com/search.naver?where=image&sm=tab_jum&query=%ED%83%9C%EB%AF%BC")  # image search URL
time.sleep(5)  # pause Python for 5 seconds while the page loads

req = driver.page_source
# Use the BeautifulSoup library to turn the HTML into a form that is easy to search.
# The soup variable now holds the "easy-to-parse HTML".
# From here, extract the parts you need with code.
soup = BeautifulSoup(req, 'html.parser')

# Paste the selector copied earlier.
thumbnail = soup.select_one('#_sau_imageTab > div.photowall._photoGridWrapper > div.photo_grid._box > div:nth-child(2) > a.thumb._thumb > img')
print(thumbnail['src'])

driver.quit()  # close the browser when done

This prints only the src address of the image.
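The query= part of the search URL above is just the search word percent-encoded as UTF-8, so the URL can be built from any keyword instead of being copied from the address bar. Here is a small sketch with the standard library. The function name `naver_image_search_url` is made up, and the where/sm parameters are simply taken from the URL used in this post.

```python
from urllib.parse import urlencode


def naver_image_search_url(keyword):
    """Build a Naver image search URL with the keyword percent-encoded as UTF-8."""
    params = {"where": "image", "sm": "tab_jum", "query": keyword}
    return "https://search.naver.com/search.naver?" + urlencode(params)


# Produces the same URL that is passed to driver.get in this post.
print(naver_image_search_url("태민"))
```

The result can be handed to driver.get in place of the hard-coded address.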

 

Now let's crawl and save every search result💨

If you look closely at the selectors of several images, you can see regular patterns such as div:nth-child.

Remove or simplify those parts and pass the result to soup.select.

soup.select_one selects only the first matching HTML tag,

while soup.select selects every matching HTML tag.
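The difference is easy to see on a tiny piece of HTML. The markup below is a made-up stand-in for a search result grid (the grid id and file names are not Naver's real ones), so it runs without a browser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates a grid of thumbnails.
html = """
<div id="grid">
  <div><a class="thumb"><img src="a.jpg"></a></div>
  <div><a class="thumb"><img src="b.jpg"></a></div>
  <div><a class="thumb"><img src="c.jpg"></a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A selector with nth-child(2) pins down one specific image.
first = soup.select_one("#grid > div:nth-child(2) > a.thumb > img")
first_src = first["src"]

# Dropping nth-child matches every image in the grid.
all_srcs = [img["src"] for img in soup.select("#grid > div > a.thumb > img")]

# select_one returns None when nothing matches, so check before indexing.
missing = soup.select_one("#grid > span > img")

print(first_src)
print(all_srcs)
print(missing)
```

This is why the copied selector has to be loosened before it goes into soup.select: with div:nth-child(2) left in, only that one image would match.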

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import dload

driver = webdriver.Chrome('chromedriver')  # path to the WebDriver executable
driver.get("https://search.naver.com/search.naver?where=image&sm=tab_jum&query=%ED%83%9C%EB%AF%BC")  # image search URL
time.sleep(5)  # pause Python for 5 seconds while the page loads

req = driver.page_source
# Use the BeautifulSoup library to turn the HTML into a form that is easy to search.
# The soup variable now holds the "easy-to-parse HTML".
# From here, extract the parts you need with code.
soup = BeautifulSoup(req, 'html.parser')

thumbnails = soup.select('#_sau_imageTab > div.photowall._photoGridWrapper > div > div > a.thumb._thumb > img')

# Save each thumbnail as img/1.jpg, img/2.jpg, ... (the img folder must already exist).
for i, thumbnail in enumerate(thumbnails, start=1):
    img = thumbnail['src']
    print(img)
    dload.save(img, f'img/{i}.jpg')

driver.quit()  # close the browser when done

For Naver, the selector is '#_sau_imageTab > div.photowall._photoGridWrapper > div > div > a.thumb._thumb > img'

For Daum, it is '#imgList > div > a > img'

Pass the selector to soup.select in this way.

⚠ Note that the img folder must be created beforehand‼ Otherwise the script fails with an error.
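If you would rather not create the folder by hand, the script can do it. Here is a minimal sketch with the standard library. The helper name `numbered_image_path` and its `ext` parameter are made up for illustration.

```python
import os


def numbered_image_path(folder, index, ext="jpg"):
    """Create folder if needed and return a path such as img/1.jpg inside it."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, f"{index}.{ext}")


if __name__ == "__main__":
    # Creates an img folder next to the script on first use.
    print(numbered_image_path("img", 1))
```

Inside the loop, the returned path could be passed to dload.save in place of f'img/{i}.jpg'.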

 

Ta-da! And with that, the crawl is complete💙