Python BeautifulSoup: Getting and Setting HTML Tag Content with string


#Python BeautifulSoup Tutorial


Getting content

The comment on string in the bs4 source code reads:

If this element has a single string child, return
value is that string. If this element has one child tag,
return value is the 'string' attribute of the child tag,
recursively. If this element is itself a string, has no
children, or has more than one child, return value is None.
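
The rules in that comment can be restated as a small function. The following is only a sketch under stated assumptions: single_string is a hypothetical helper written for illustration, not the library's own code, and it relies only on the public contents attribute and the NavigableString and Tag classes.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def single_string(tag):
    """Hypothetical helper that mirrors the documented rules for Tag.string."""
    children = tag.contents
    # No children, or more than one child: there is no single string.
    if len(children) != 1:
        return None
    child = children[0]
    # Exactly one string child: that string is the result.
    if isinstance(child, NavigableString):
        return child
    # Exactly one child tag: apply the same rule to it, recursively.
    if isinstance(child, Tag):
        return single_string(child)
    return None


soup = BeautifulSoup(
    '<div><p><b>a</b></p></div><ul><li>x</li><li>y</li></ul>', 'html.parser'
)
print(single_string(soup.div))  # a
print(single_string(soup.ul))   # None
```

On these inputs the helper should agree with soup.div.string and soup.ul.string.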

Example: returns a non-None value (single string child)

from bs4 import BeautifulSoup

html_content = '''
<div id="content">
    测试01
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')

content_div = soup.select_one("#content")
print(content_div.string)

Output:


    测试01

Example: returns a non-None value (single child tag)

from bs4 import BeautifulSoup

html_content = '''
<div id="content"><p>测试01</p></div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')

content_div = soup.select_one("#content")
print(content_div.string)

Output:

测试01

Example: returns None (whitespace around the child tag)

from bs4 import BeautifulSoup

html_content = '''
<div id="content">
    <p>测试01</p>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')

content_div = soup.select_one("#content")
print(content_div.string)

Output:

None

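None is returned here because <div id="content"> contains more than <p>测试01</p>: the newline and indentation before and after the <p> tag are parsed as whitespace-only strings, so the div has three children rather than one.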
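
To see those extra children, one option is to inspect contents directly. The sketch below reuses the HTML from the example above (without the second div) and then reads the text through stripped_strings, which skips whitespace-only strings; the variable names children and texts are chosen here for illustration.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">
    <p>测试01</p>
</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

# The div has three children: a whitespace string, the <p> tag,
# and another whitespace string.
children = content_div.contents
print(len(children))
print([repr(child) for child in children])

# stripped_strings strips each string and skips the whitespace-only ones.
texts = list(content_div.stripped_strings)
print(texts)
```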

Example: returns None (multiple children)

from bs4 import BeautifulSoup

html_content = '''
<div id="content">x
    <p>测试01</p>
    <span>测试02</span>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')

content_div = soup.select_one("#content")
print(content_div.string)

Output:

None
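
When a tag has several children, as in this example, one possible alternative to string is get_text(), which concatenates every descendant string. The following is a sketch of that approach rather than a recommendation from the original article; the separator value and the variable name joined are arbitrary choices.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">x
    <p>测试01</p>
    <span>测试02</span>
</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

# get_text() joins all descendant strings. strip=True trims each piece and
# drops whitespace-only ones; the separator is placed between the pieces.
joined = content_div.get_text(separator=' ', strip=True)
print(joined)

# A child tag with a single string child still has a usable .string.
print(content_div.p.string)
```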

Setting content

Example

from bs4 import BeautifulSoup

html_content = '''
<div id="content">
    <p>测试01</p>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')

content_div = soup.select_one("#content")
content_div.string = 'xx'

print(soup)

Output:

<div id="content">xx</div>
<div>测试03</div>
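
As the output shows, assigning to string replaces all existing children of the tag with one new string. The sketch below illustrates a related point that the example does not cover: the assigned value is treated as text, so markup characters in it are escaped on output instead of being parsed into tags. The one-line HTML input is made up for this illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div id="content"><p>测试01</p><span>测试02</span></div>', 'html.parser'
)
content_div = soup.select_one("#content")

# Assigning to .string removes every existing child and inserts one new string.
content_div.string = '<b>xx</b>'

print(len(content_div.contents))
print(content_div.string)
# The markup characters are escaped on output, so no <b> tag is created.
print(soup)
```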


(End of article)