Getting the content
Here is the docstring of Tag.string from the bs4 source:
If this element has a single string child, return
value is that string. If this element has one child tag,
return value is the 'string' attribute of the child tag,
recursively. If this element is itself a string, has no
children, or has more than one child, return value is None.
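To make the three rules concrete, here is a minimal pure-Python model of them. Node and string_of are hypothetical names invented for illustration. They are not part of bs4, and the sketch only mirrors the docstring's wording rather than reproducing the library's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    """Hypothetical stand-in for a bs4 Tag (not the real class)."""
    name: str
    children: List[Union["Node", str]] = field(default_factory=list)


def string_of(node: Union[Node, str]) -> Optional[str]:
    """Hypothetical helper that mirrors the docstring rules of Tag.string."""
    if isinstance(node, str):
        # The element is itself a string: None, per the docstring.
        return None
    if len(node.children) != 1:
        # No children, or more than one child: None.
        return None
    child = node.children[0]
    if isinstance(child, str):
        # A single string child: return that string.
        return child
    # A single child tag: use that tag's result, recursively.
    return string_of(child)


# Same shapes as the examples below.
print(string_of(Node("div", ["测试01"])))                           # 测试01
print(string_of(Node("div", [Node("p", ["测试01"])])))              # 测试01
print(string_of(Node("div", ["\n", Node("p", ["测试01"]), "\n"])))  # None
```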
Example: returns a non-None value (single string child)
from bs4 import BeautifulSoup
html_content = '''
<div id="content">
测试01
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
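Output (the returned string keeps the newlines around 测试01, so a blank line is also printed above and below it):
测试01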
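Since print() hides surrounding whitespace, printing the repr() shows exactly what .string returned in the example above. This sketch assumes bs4 is installed and reuses the same markup.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">
测试01
</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

text = content_div.string
# The single text child is returned verbatim, newlines included.
print(repr(str(text)))  # '\n测试01\n'
# Strip it when only the visible text is wanted.
print(text.strip())     # 测试01
```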
Example: returns a non-None value (single child tag)
from bs4 import BeautifulSoup
html_content = '''
<div id="content"><p>测试01</p></div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
Output:
测试01
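The "recursively" in the docstring means the lookup keeps descending as long as every level has exactly one child. A small sketch with one extra level of nesting (the <b> wrapper is made up for illustration; bs4 is assumed to be installed):

```python
from bs4 import BeautifulSoup

# Each level has exactly one child: div -> p -> b -> text.
html_content = '<div id="content"><p><b>测试01</b></p></div>'
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

print(content_div.string)              # 测试01
# The returned object is the text node that lives inside <b>.
print(content_div.string.parent.name)  # b
```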
Example: returns None (whitespace around the child tag)
from bs4 import BeautifulSoup
html_content = '''
<div id="content">
<p>测试01</p>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
Output:
None
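This returns None because, besides <p>测试01</p>, <div id="content"> also contains a newline text node before it and another one after it, so the div has three children instead of one.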
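One way to confirm this and still get the text is sketched below, assuming bs4 is installed. Either read .string one level down, where <p> really has a single child, or use get_text(strip=True), which trims each text node and drops the whitespace-only ones.

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">
<p>测试01</p>
</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

# Three children: '\n', the <p> tag, and '\n'.
print(len(content_div.contents))          # 3
print(content_div.string)                 # None

# Option 1: go down to the element that has a single child.
print(content_div.p.string)               # 测试01
# Option 2: collect all descendant text, trimming whitespace.
print(content_div.get_text(strip=True))   # 测试01
```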
Example: returns None (more than one child)
from bs4 import BeautifulSoup
html_content = '''
<div id="content">x
<p>测试01</p>
<span>测试02</span>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
print(content_div.string)
Output:
None
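When an element has several children, .string gives up, but bs4's stripped_strings generator and get_text() still collect every piece of text. A sketch on the same markup, assuming bs4 is installed:

```python
from bs4 import BeautifulSoup

html_content = '''
<div id="content">x
<p>测试01</p>
<span>测试02</span>
</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")

print(content_div.string)                     # None
# Every descendant text node, stripped; whitespace-only nodes are skipped.
pieces = list(content_div.stripped_strings)
print(pieces)                                 # ['x', '测试01', '测试02']
# get_text(separator, strip=...) joins the same pieces.
print(content_div.get_text(" ", strip=True))  # x 测试01 测试02
```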
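Setting the content
Example: assigning to .string replaces all of the tag's existing children, including the <p> tag, with the new string.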
from bs4 import BeautifulSoup
html_content = '''
<div id="content">
<p>测试01</p>
</div>
<div>测试03</div>
'''.strip()
soup = BeautifulSoup(html_content, 'html.parser')
content_div = soup.select_one("#content")
content_div.string = 'xx'
print(soup)
Output:
<div id="content">xx</div>
<div>测试03</div>
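If the goal is to change only the text while keeping the <p> wrapper, one option is to assign .string on the inner tag instead. A sketch comparing both assignments, assuming bs4 is installed (the variable names are illustrative):

```python
from bs4 import BeautifulSoup

html_content = '<div id="content"><p>测试01</p></div>'

# Assigning on the outer div discards the <p> child entirely.
soup_outer = BeautifulSoup(html_content, 'html.parser')
soup_outer.select_one("#content").string = 'xx'
print(soup_outer)              # <div id="content">xx</div>
print(soup_outer.find('p'))    # None

# Assigning on the inner <p> keeps the markup structure intact.
soup_inner = BeautifulSoup(html_content, 'html.parser')
soup_inner.select_one("#content p").string = 'xx'
print(soup_inner)              # <div id="content"><p>xx</p></div>
```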