Python BeautifulSoup: Getting the Text Content of HTML Tags with get_text


#Python BeautifulSoup Tutorial


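Introduction

get_text returns the text of a tag together with the text of all its descendants. Note that the returned text never contains HTML tags, even when the tag has nested children.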

Example 1: No nesting

Code:

from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="hello">test01</div>
<div>test03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# Select the element whose id is "content"
content_div = soup.select_one("#content")
# Only the text inside the tag is printed, not the tag itself
print('text:', content_div.get_text())

Output:

text: test01
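As a small complement to Example 1, the sketch below assumes the same kind of HTML and shows two related points: attribute values (such as data) are not part of the text and are read separately, and the .text property gives the same result as get_text(). The variable name text is only illustrative.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="hello">test01</div>
<div>test03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

# get_text() returns only the text inside the element
text = content_div.get_text()
print('text:', text)

# Attribute values are not included in the text; read them by key
print('data:', content_div['data'])

# .text is a shorthand property for get_text()
print('same:', content_div.text == text)
```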

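Example 2: With nesting

Code:

from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="hello">
    <p>test01</p>
    <span>test02</span>
</div>
<div>test03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')

# Select the outer element; it contains nested <p> and <span> tags
content_div = soup.select_one("#content")
# The text of all nested tags is returned, without the tags themselves
print('text:', content_div.get_text())

Output:

text: 
    test01
    test02

The line breaks and indentation between the nested tags are part of the element's text, so they appear in the result as well.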
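When the surrounding whitespace in nested output is not wanted, one possible approach is to pass the separator and strip arguments to get_text. The sketch below assumes HTML shaped like Example 2; the names clean_text and parts are illustrative.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content" data="hello">
    <p>test01</p>
    <span>test02</span>
</div>
<div>test03</div>
'''
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

# strip=True trims each text fragment and drops the empty ones;
# separator is placed between the fragments that remain.
clean_text = content_div.get_text(separator=' ', strip=True)
print('text:', clean_text)

# stripped_strings yields the same trimmed fragments one at a time
parts = list(content_div.stripped_strings)
print('parts:', parts)
```

Whether to join the fragments with a space, a newline, or something else depends on how the text will be used afterwards.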

