HTML to text
html2text is a Python package that converts a page of HTML into clean, easy-to-read plain ASCII text. The ASCII output also happens to be valid Markdown (a text-to-HTML format).
Installation and Setup
pip install html2text
Document Transformer
See a usage example.
from langchain_community.document_transformers import Html2TextTransformer