Python

Making a KakaoTalk Word Cloud

노루룽 2020. 10. 13. 20:19

Word Cloud Environment Setup

Install the wordcloud package.

File > Settings > Python Interpreter > wordcloud
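As a quick sanity check after installing, a small sketch like the one below can confirm that the packages used in this post are importable. The helper name `missing_packages` is made up for illustration, and note that Pillow is imported under the name `PIL`.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages this post relies on (Pillow is imported as PIL).
    missing = missing_packages(["wordcloud", "matplotlib", "numpy", "PIL"])
    print("missing:", missing if missing else "none")
```

If anything shows up as missing, install it in the same interpreter the project uses.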

import matplotlib.font_manager as fm

# Of the available fonts, pick only the 'Gothic' ones -> bold text looks prettier
for font in fm.fontManager.ttflist:
    if 'Gothic' in font.name:
        print(font.name, font.fname)

Running this code prints a list of font names and their file paths.

Copy the path of one you like.

(BUT many fonts fail to produce a proper word cloud, so

stick mostly to MalgunGothic (Windows) or AppleGothic (Mac)!)
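To avoid hardcoding one machine's font path, the choice could be sketched roughly as follows. The candidate paths are assumptions about typical default locations (only `malgunbd.ttf` appears in this post), and `pick_font_path` is a hypothetical helper, so check the paths against the font listing printed above.

```python
import os
import platform

# Assumed default locations; they can differ by OS version, so verify them
# against the output of the font listing.
CANDIDATE_FONTS = {
    "Windows": ["C:/Windows/Fonts/malgunbd.ttf", "C:/Windows/Fonts/malgun.ttf"],
    "Darwin": ["/System/Library/Fonts/Supplemental/AppleGothic.ttf",
               "/Library/Fonts/AppleGothic.ttf"],
}


def pick_font_path(system=None, exists=os.path.exists):
    """Return the first candidate font file that exists for `system`, or None."""
    system = system or platform.system()
    for path in CANDIDATE_FONTS.get(system, []):
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(pick_font_path())
```

The returned path can be passed as `font_path` to `WordCloud`. A result of `None` means none of the guessed locations exist, and the path has to be copied by hand.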

 

from wordcloud import WordCloud
# Read the whole exported chat into a single string.
with open("KakaoTalk.txt", "r", encoding="utf-8") as f:
    text = f.read()

print(text)

wc = WordCloud(font_path="C:/windows/Fonts/malgunbd.ttf", background_color="white", width=600, height=400)
wc.generate(text)
wc.to_file("result.png")

Feed in the txt file produced by KakaoTalk's chat export, and the word cloud is generated.

(I couldn't bring myself to share the actual contents 💦)

Anyway, looking at the word cloud, you can see it contains a lot of useless information, such as the other person's name and the 오후 (PM) time markers.
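One way to see why the name and 오후 dominate is to count tokens directly. The sketch below uses made-up sample lines in a "[name] [time] message" layout (an assumption about the export format) and a plain whitespace split, which only approximates how wordcloud tokenizes text.

```python
from collections import Counter

# Made-up sample lines in the "[name] [time] message" layout assumed here.
sample = (
    "[Alice] [오후 8:19] 저녁 먹었어?\n"
    "[Bob] [오후 8:20] 응 먹었어\n"
    "[Alice] [오후 8:21] 사진\n"
)


def top_tokens(text, n=5):
    """Count whitespace-separated tokens and return the n most common."""
    return Counter(text.split()).most_common(n)


if __name__ == "__main__":
    for token, count in top_tokens(sample):
        print(token, count)
```

Every message line repeats the sender and the time marker, so those tokens outnumber any actual word in the conversation.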

 

Data Cleansing

Data cleansing is the process of removing unnecessary data.

from wordcloud import WordCloud
text = ''
with open("KakaoTalk.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
    for line in lines[5:]:
        # Keep only chat lines, which look like "[name] [time] message".
        if '] [' in line:
            # maxsplit=2 keeps any '] ' inside the message itself intact.
            text += line.split('] ', 2)[2].replace('ㅋ','').replace('ㅠ','').replace('이모티콘\n','').replace('사진\n','').replace('삭제된 메시지입니다.', '')

print(text)

wc = WordCloud(font_path="C:/windows/Fonts/malgunbd.ttf", background_color="white", width=600, height=400)
wc.generate(text)
wc.to_file("result.png")

I removed the system messages, sender names, send times, and so on,

and also used replace to strip out emoticons and things like ㅋㅋㅋ and ㅠㅠㅠㅠㅠ.
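As an alternative to chaining replace calls, the same cleanup could be sketched with a regular expression. This is not the code used above, just a variant under the same assumption that chat lines look like "[name] [time] message". The names `LINE_RE`, `PLACEHOLDERS`, and `clean_line` are hypothetical.

```python
import re

# Assumed line layout: "[name] [time] message".
LINE_RE = re.compile(r"^\[(?P<name>[^\]]+)\] \[(?P<time>[^\]]+)\] (?P<msg>.*)$")
# Placeholder messages, taken from the replace() calls in the code above.
PLACEHOLDERS = {"이모티콘", "사진", "삭제된 메시지입니다."}


def clean_line(line):
    """Return the cleaned message text, or None if the line should be skipped."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None  # system message, date header, or continuation line
    msg = match.group("msg").strip()
    if msg in PLACEHOLDERS:
        return None  # the whole message is just a placeholder
    # Drop runs of ㅋ and ㅠ, then tidy the leftover whitespace.
    msg = re.sub(r"[ㅋㅠ]+", "", msg).strip()
    return msg or None
```

Unlike the replace chain, this only drops a placeholder when it is the entire message, so a real sentence that happens to end in 사진 survives. Joining the non-None results with spaces gives the text to pass to `wc.generate`.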

 

Shaping the Word Cloud

An image like this one, where the subject is clearly distinct from the background, produces a well-formed shape.

from wordcloud import WordCloud
from PIL import Image
import numpy as np

text = ''
with open("KakaoTalk.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
    for line in lines[5:]:
        # Keep only chat lines, which look like "[name] [time] message".
        if '] [' in line:
            # maxsplit=2 keeps any '] ' inside the message itself intact.
            text += line.split('] ', 2)[2].replace('ㅋ','').replace('ㅠ','').replace('이모티콘\n','')\
                .replace('사진\n','').replace('삭제된 메시지입니다.', '')


print(text)

mask = np.array(Image.open('cloud.png'))
wc = WordCloud(font_path="C:/windows/Fonts/malgunbd.ttf", background_color="white", mask=mask)
wc.generate(text)
wc.to_file("result_masked.png")

And that gives you a cloud-shaped word cloud like this!

(Once again, I couldn't bring myself to share the actual contents..)
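A note on why a clear background matters: wordcloud treats pure white pixels in the mask as "masked out" and draws words everywhere else. If the background of an image is only nearly white, a sketch like the following could binarize it first. The threshold of 200 and the helper names `to_binary_mask` and `load_mask` are assumptions for illustration.

```python
import numpy as np


def to_binary_mask(gray, threshold=200):
    """Map a 2-D grayscale array to a mask: bright pixels -> 255, others -> 0."""
    gray = np.asarray(gray)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)


def load_mask(path, threshold=200):
    """Load an image file as grayscale and binarize it (requires Pillow)."""
    from PIL import Image  # imported here so to_binary_mask works without Pillow
    gray = Image.open(path).convert("L")  # "L" = 8-bit grayscale
    return to_binary_mask(gray, threshold)
```

With this, `mask=load_mask("cloud.png")` could be passed to `WordCloud` in place of the raw array. Pixels set to 255 stay empty, and words fill the region set to 0.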