Using find_all to get div elements whose id is content - filtering with attrs
from bs4 import BeautifulSoup
html_content = '''
<div id="content">Test 01</div>
<p id="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
for element in soup.find_all(name='div', attrs={'id': 'content'}):
    print('Element:', element)
Output:
Element: <div id="content">Test 01</div>
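The attrs dictionary is most useful when an attribute name cannot be written as a Python keyword argument, such as an HTML5 data-* attribute containing a hyphen. The sketch below assumes a hypothetical data-role attribute and sample markup that are not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: data-role is an HTML5 data-* attribute
html_content = '''
<div data-role="content">Test 01</div>
<div data-role="sidebar">Test 02</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# data-role contains a hyphen, so find_all(data-role='content') would be a
# syntax error; passing the name through the attrs dictionary works instead
matches = soup.find_all('div', attrs={'data-role': 'content'})
for element in matches:
    print('Element:', element)
```

Only the first div matches, because its data-role value is content.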
Using find_all to get div elements whose id is content - filtering with the id keyword argument
from bs4 import BeautifulSoup
html_content = '''
<div id="content">Test 01</div>
<p id="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
for element in soup.find_all(name='div', id='content'):
    print('Element:', element)
Output:
Element: <div id="content">Test 01</div>
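When only one element is needed, find() accepts the same filters as find_all() but returns the first match, or None if nothing matches. Passing id=True matches any element that has an id attribute at all. A minimal sketch reusing the sample markup above:

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">Test 01</div>
<p id="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# find() takes the same filters as find_all() but returns only the first match
first = soup.find('div', id='content')
print('First:', first)

# find() returns None when nothing matches, so check before using the result
missing = soup.find('span', id='content')
print('Missing:', missing)

# id=True matches any element that has an id attribute, whatever its value
with_id = soup.find_all(id=True)
print('With id:', [el.name for el in with_id])
```

Checking the result of find() for None avoids an AttributeError when the page structure differs from what the scraper expects.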
Using find_all to get all elements whose class is content, not just div
from bs4 import BeautifulSoup
html_content = '''
<div class="content">Test 01</div>
<p class="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
for element in soup.find_all(class_='content'):
    print('Element:', element)
Output:
Element: <div class="content">Test 01</div>
Element: <p class="content">Test 02</p>
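The keyword is class_ because class is a reserved word in Python. BeautifulSoup treats class as a multi-valued attribute, so class_='content' also matches an element that carries other classes alongside content, while a string containing a space is compared against the full attribute value. The markup below is a hypothetical variation on the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup where some elements carry more than one class
html_content = '''
<div class="content intro">Test 01</div>
<p class="content">Test 02</p>
<div class="intro">Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# class_='content' matches any element whose class list contains 'content'
by_single = soup.find_all(class_='content')
print([el.get_text() for el in by_single])  # ['Test 01', 'Test 02']

# A string with a space must equal the whole class value, in the same order
by_exact = soup.find_all(class_='content intro')
print([el.get_text() for el in by_exact])  # ['Test 01']

# The parsed class attribute is a list of class names, not a single string
print(by_single[0]['class'])
```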
Using select to get div elements whose id is content
from bs4 import BeautifulSoup
html_content = '''
<div id="content">Test 01</div>
<p id="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
for element in soup.select("div[id='content']"):
    print('Element:', element)
Output:
Element: <div id="content">Test 01</div>
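In CSS, div#content is the usual shorthand for the attribute selector div[id='content'], and select_one() returns the first match (or None) instead of a list. A short sketch on the same sample markup:

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">Test 01</div>
<p id="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# div#content and div[id='content'] select the same <div> elements
same = soup.select("div#content") == soup.select("div[id='content']")
print(same)  # True

# select_one() returns the first matching element rather than a list
first = soup.select_one("div#content")
print('First:', first)
```

select_one() plays the same role for CSS selectors that find() plays for find_all().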
Using select to get all elements whose id is content, not just div
from bs4 import BeautifulSoup
html_content = '''
<div id="content">Test 01</div>
<p id="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
for element in soup.select("#content"):
    print('Element:', element)
Output:
Element: <div id="content">Test 01</div>
Element: <p id="content">Test 02</p>
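The class-based searches from the find_all examples have direct select counterparts: .content is the CSS class selector, a tag name in front of it narrows the match, and a comma combines several selectors. The sketch assumes class-based markup like the find_all class_ example rather than the id-based markup above.

```python
from bs4 import BeautifulSoup

html_content = '''
<div class="content">Test 01</div>
<p class="content">Test 02</p>
<div>Test 03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# .content is the select() counterpart of find_all(class_='content')
by_class = soup.select(".content")
print([el.name for el in by_class])  # ['div', 'p']

# div.content keeps only <div> elements that have the content class
div_only = soup.select("div.content")
print([el.name for el in div_only])  # ['div']

# A comma-separated selector list matches either selector, in document order
either = soup.select("div.content, p.content")
print([el.get_text() for el in either])  # ['Test 01', 'Test 02']
```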