【Python】Scraping a News Site to Extract Headlines

Let's work out how to extract headlines from the online edition of Apple Daily (蘋果日報), a Taiwanese newspaper.
This is the site we will scrape:

https://tw.entertainment.appledaily.com/daily/

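First, import the libraries. requests downloads the page and BeautifulSoup (from the bs4 package) parses the HTML. The "lxml" parser passed to BeautifulSoup later is a separate package and must also be installed.

import requests
from bs4 import BeautifulSoup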

On the site above, press Ctrl+Shift+I to open the browser's developer tools and inspect the page source.

(Screenshot: the page source shown in the developer tools)

The article headlines appear to sit inside a list whose items are marked with
class="echn"
so let's pull out every element that has the echn class.
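Before requesting the live page, the class lookup can be tried offline. The sketch below runs the same `find_all(attrs={"class": "echn"})` call on a small hand-written HTML fragment. That fragment (the `<li>`/`<a>` layout and the headline text) is a hypothetical stand-in, not the real markup of the Apple Daily page. It uses Python's built-in "html.parser" so that lxml is not needed for this test.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment: the real page layout is assumed, not copied.
sample_html = """
<ul>
  <li class="echn"><a href="/a1">Headline one</a></li>
  <li class="echn new"><a href="/a2">Headline two</a></li>
  <li class="other"><a href="/a3">Not a headline</a></li>
</ul>
"""

# "html.parser" ships with Python, so this runs without lxml installed.
soup = BeautifulSoup(sample_html, "html.parser")

# BeautifulSoup treats "class" as a multi-valued attribute, so this matches
# any element whose class list contains "echn", including class="echn new".
items = soup.find_all(attrs={"class": "echn"})
print(len(items))  # → 2
```

`soup.find_all(class_="echn")` and `soup.select(".echn")` are equivalent ways to write the same lookup.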

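# URL of the Apple Daily page we want to scrape
url = "https://tw.entertainment.appledaily.com/daily/"

# Fetch the page (give up after 10 seconds) and stop with an error on a bad HTTP status
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML and store every element with class "echn" in a list
# (named echn_list rather than list, so Python's built-in list is not shadowed)
bs = BeautifulSoup(response.content, "lxml")
echn_list = bs.find_all(attrs={"class": "echn"})

Running this and looking inside echn_list gives something like this: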

(Screenshot: the contents of the extracted list)
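The matched elements are still BeautifulSoup tags with their HTML markup attached. To reduce them to plain headline strings, one option is to take each element's `<a>` tag and call `get_text()`. The sketch below assumes each `echn` element wraps a link holding the headline; that markup, the helper name `extract_headlines`, and the sample HTML are hypothetical and should be checked against the real page in the developer tools.

```python
from bs4 import BeautifulSoup


def extract_headlines(html: str) -> list[tuple[str, str]]:
    """Return (headline text, link) pairs from elements with class "echn".

    Hypothetical helper: assumes each matched element wraps an <a> tag
    holding the headline. This markup is a guess, not confirmed from the site.
    """
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for item in soup.find_all(attrs={"class": "echn"}):
        link = item.find("a")
        if link is None:
            continue  # skip matches that contain no link
        # Join nested text pieces with a space and trim surrounding whitespace
        text = link.get_text(" ", strip=True)
        if text:
            headlines.append((text, link.get("href", "")))
    return headlines


# Hypothetical sample markup for trying the helper offline
sample_html = """
<ul>
  <li class="echn"><a href="/a1">  Headline one </a></li>
  <li class="echn"><a href="/a2">Headline <b>two</b></a></li>
  <li class="echn">No link here</li>
</ul>
"""

for text, href in extract_headlines(sample_html):
    print(text, href)
```

Passing `" "` as the separator keeps words apart when a headline contains nested tags such as `<b>`; with `strip=True` alone, "Headline" and "two" would be joined with no space. Against the live page, the same helper could be called as `extract_headlines(response.text)`.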