convert_html_to_markdown
Parameters
html_content (str): The HTML content that you want to convert into Markdown.
Functionality
The convert_html_to_markdown function performs the following tasks:
Create HTML2Text Object: Utilizes the html2text library to convert HTML into Markdown.
Ignore Links: By default, links are not ignored in the conversion. This can be adjusted by setting the ignore_links property of the html2text object.
Remove Image Data:
  Removes standard HTML image tags (<img>) from the content.
  Removes any base64-encoded image data (PNG format) embedded in the HTML content.
Convert HTML to Markdown: Converts the cleaned HTML content into Markdown format using the html2text object.
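The image-removal step described above could be sketched with two regular expressions. This is a minimal illustration, not the function's actual source: the helper name strip_image_data and both patterns are assumptions, and the tag pattern will misfire on an attribute value that itself contains a ">" character.

```python
import re

# Hypothetical pattern: any <img ...> tag, self-closing or not.
# [^>]* also matches newlines, so attributes may span several lines.
IMG_TAG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Hypothetical pattern: a base64-encoded PNG data URI wherever it appears
# (for example inside a CSS url(...) value).
BASE64_PNG_RE = re.compile(r"data:image/png;base64,[A-Za-z0-9+/=]+", re.IGNORECASE)


def strip_image_data(html_content: str) -> str:
    """Remove <img> tags first, then any remaining base64 PNG payloads."""
    without_tags = IMG_TAG_RE.sub("", html_content)
    return BASE64_PNG_RE.sub("", without_tags)


sample = (
    '<p>Hi</p><img src="a.png" alt="x"/>'
    '<div style="background:url(data:image/png;base64,iVBORw0KGgo=)">t</div>'
)
cleaned = strip_image_data(sample)
print(cleaned)
```

Stripping the tags before the data URIs means a base64 payload inside an <img> tag is removed together with the tag, and the second pattern only has to deal with payloads that live elsewhere in the markup.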
Example
Output:
In this example, the function converts an HTML snippet into Markdown, stripping out image data and preserving text and links.