Wikipedia
Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki.
Wikipedia is the largest and most-read reference work in history.
This notebook shows how to load wiki pages from wikipedia.org into the Document format that we use downstream.
Installation
First, you need to install the langchain_community and wikipedia packages.
%pip install -qU langchain_community wikipedia
Parameters
WikipediaLoader has the following arguments:
- query: the free text used to find documents in Wikipedia.
- lang (optional): default="en". Use it to search a specific language edition of Wikipedia.
- load_max_docs (optional): default=100. Use it to limit the number of downloaded documents. Downloading all 100 documents takes time, so use a small number for experiments. There is currently a hard limit of 300.
- load_all_available_meta (optional): default=False. By default, only the most important fields are downloaded: title and summary. If True, all available fields are downloaded.
- doc_content_chars_max (optional): default=4000. The maximum number of characters for the document content.
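As a rough illustration of how these arguments combine, the sketch below bundles them in one place and checks load_max_docs against the 300-document limit before the loader is constructed. The helper names build_loader_kwargs and load_wikipedia are hypothetical and not part of langchain_community; only WikipediaLoader and its argument names come from this page. The small load_max_docs default is a choice made here for experiments, not the library default.

```python
HARD_LIMIT = 300  # hard limit on load_max_docs stated in the parameter list


def build_loader_kwargs(
    query,
    lang="en",
    load_max_docs=2,  # deliberately small for experiments (assumption of this sketch)
    load_all_available_meta=False,
    doc_content_chars_max=4000,
):
    """Hypothetical helper: validate and bundle WikipediaLoader arguments."""
    if not query:
        raise ValueError("query must be a non-empty string")
    if not 1 <= load_max_docs <= HARD_LIMIT:
        raise ValueError(f"load_max_docs must be between 1 and {HARD_LIMIT}")
    return {
        "query": query,
        "lang": lang,
        "load_max_docs": load_max_docs,
        "load_all_available_meta": load_all_available_meta,
        "doc_content_chars_max": doc_content_chars_max,
    }


def load_wikipedia(**kwargs):
    """Hypothetical wrapper: build the loader and fetch documents (needs network)."""
    # Imported lazily so build_loader_kwargs works without the package installed.
    from langchain_community.document_loaders import WikipediaLoader

    return WikipediaLoader(**build_loader_kwargs(**kwargs)).load()
```

With this in place, a call such as load_wikipedia(query="HUNTER X HUNTER", load_max_docs=2, load_all_available_meta=True) would request at most two English pages with every available metadata field, while an out-of-range load_max_docs fails before any network request is made.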
Example
from langchain_community.document_loaders import WikipediaLoader
docs = WikipediaLoader(query="HUNTER X HUNTER", load_max_docs=2).load()
len(docs)
2
docs[0].metadata # metadata of the first document
{'title': 'Hunter × Hunter',
'summary': 'Hunter × Hunter (pronounced "hunter hunter") is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in Shueisha\'s shōnen manga magazine Weekly Shōnen Jump since March 1998, although the manga has frequently gone on extended hiatuses since 2006. Its chapters have been collected in 37 tankōbon volumes as of November 2022. The story focuses on a young boy named Gon Freecss who discovers that his father, who left him at a young age, is actually a world-renowned Hunter, a licensed professional who specializes in fantastical pursuits such as locating rare or unidentified animal species, treasure hunting, surveying unexplored enclaves, or hunting down lawless individuals. Gon departs on a journey to become a Hunter and eventually find his father. Along the way, Gon meets various other Hunters and encounters the paranormal.\nHunter × Hunter was adapted into a 62-episode anime television series by Nippon Animation and directed by Kazuhiro Furuhashi, which ran on Fuji Television from October 1999 to March 2001. Three separate original video animations (OVAs) totaling 30 episodes were subsequently produced by Nippon Animation and released in Japan from 2002 to 2004. A second anime television series by Madhouse aired on Nippon Television from October 2011 to September 2014, totaling 148 episodes, with two animated theatrical films released in 2013. There are also numerous audio albums, video games, musicals, and other media based on Hunter × Hunter.\nThe manga has been licensed for English release in North America by Viz Media since April 2005. 
Both television series have been also licensed by Viz Media, with the first series having aired on the Funimation Channel in 2009 and the second series broadcast on Adult Swim\'s Toonami programming block from April 2016 to June 2019.\nHunter × Hunter has been a huge critical and financial success and has become one of the best-selling manga series of all time, having over 84 million copies in circulation by July 2022.',
'source': 'https://en.wikipedia.org/wiki/Hunter_%C3%97_Hunter'}
docs[0].page_content[:400] # a part of the page content
'Hunter × Hunter (pronounced "hunter hunter") is a Japanese manga series written and illustrated by Yoshihiro Togashi. It has been serialized in Shueisha\'s shōnen manga magazine Weekly Shōnen Jump since March 1998, although the manga has frequently gone on extended hiatuses since 2006. Its chapters have been collected in 37 tankōbon volumes as of November 2022. The story focuses on a young boy name'
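When more than one page is loaded, it can help to look at all documents at once instead of indexing them one by one. The following is a minimal sketch, assuming only that each document exposes the metadata dictionary and page_content string shown above; preview_docs is a hypothetical helper name, not a LangChain function.

```python
def preview_docs(docs, n_chars=80):
    """Hypothetical helper: build one summary row per loaded document."""
    rows = []
    for doc in docs:
        meta = doc.metadata
        rows.append(
            {
                # .get() is used because the available fields depend on
                # load_all_available_meta.
                "title": meta.get("title", ""),
                "source": meta.get("source", ""),
                # Total length of the loaded content, in characters.
                "chars": len(doc.page_content),
                # Leading slice of the content, as in docs[0].page_content[:400].
                "preview": doc.page_content[:n_chars],
            }
        )
    return rows
```

Calling preview_docs(docs) on the list returned by load() gives one dictionary per page, which makes it easy to check titles, source URLs, and whether the content was cut at doc_content_chars_max.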
Related
- Document loader conceptual guide
- Document loader how-to guides