Python Web Scraping Tutorial: BeautifulSoup and Requests

Web scraping is the automated process of extracting data from websites. In the Python ecosystem, Requests and BeautifulSoup are two of the most widely used libraries for collecting data from static HTML pages: Requests handles the HTTP side, and BeautifulSoup parses the returned HTML.

This guide walks you through sending HTTP requests, parsing response content, and filtering target elements cleanly.

1. Installation and Requirements

Before writing scraping scripts, install the beautifulsoup4 parsing library and the requests HTTP client with pip:

BASH
pip install beautifulsoup4 requests
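
To confirm that both packages installed correctly, a quick check like the one below can be run in a Python session. It is only a sanity check; the version numbers it prints depend on your environment.

```python
# Sanity check: both imports should succeed after the pip install above.
import bs4
import requests

# Both packages expose a __version__ string.
print("beautifulsoup4:", bs4.__version__)
print("requests:", requests.__version__)
```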

2. Fetching and Parsing HTML Data

Use the requests library to send a GET request to a target webpage, then load the content into BeautifulSoup:

Python
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"
# A timeout keeps the script from hanging if the server never responds
response = requests.get(url, timeout=10)

if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    # Extract the title tag of the page
    print("Page Title:", soup.title.text)
else:
    print("Failed to retrieve target webpage. Status Code:", response.status_code)
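
The fetch-and-parse step can also be wrapped in small reusable functions. The following is a minimal sketch, not the only way to structure it: fetch_html and parse_title are hypothetical helper names introduced here for illustration, and the 10-second default timeout is an arbitrary assumption. It relies on raise_for_status() to treat 4xx and 5xx responses as errors, and it guards against pages that have no title tag.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10.0):
    """Return the page body as text, or None if the request fails.

    fetch_html is a hypothetical helper name, not part of requests.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # raise_for_status() raises HTTPError for 4xx and 5xx responses
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs, and HTTP errors
        print("Request failed:", exc)
        return None
    return response.text


def parse_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)
```

A typical call would pass the result of fetch_html("https://quotes.toscrape.com/") to parse_title(), after checking that the fetch did not return None.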

3. Finding Specific Elements

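BeautifulSoup provides the find() and find_all() methods for searching the parsed document by tag name and attributes. find() returns the first matching element (or None if nothing matches), while find_all() returns a list of every match: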

Python
# Find all quote blocks on the page
quotes = soup.find_all("div", class_="quote")

for quote in quotes:
    text = quote.find("span", class_="text").text
    author = quote.find("small", class_="author").text
    print(f'"{text}" - by {author}')
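
Because find() returns None when an element is missing, calling .text on its result can raise an AttributeError on pages with irregular markup. The sketch below shows one way to guard against that and collect the results into a list of dictionaries. The SAMPLE_HTML string is made-up markup that imitates the structure the loop above expects, and extract_quotes is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up sample markup that imitates the structure the loop above expects.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First quote</span>
  <small class="author">Author One</small>
</div>
<div class="quote">
  <span class="text">Quote with no author tag</span>
</div>
"""


def extract_quotes(html):
    """Return a list of {"text", "author"} dicts, skipping incomplete blocks."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all("div", class_="quote"):
        text_tag = block.find("span", class_="text")
        author_tag = block.find("small", class_="author")
        # find() returns None when nothing matches, so check before reading text
        if text_tag is None or author_tag is None:
            continue
        results.append({
            "text": text_tag.get_text(strip=True),
            "author": author_tag.get_text(strip=True),
        })
    return results


quotes = extract_quotes(SAMPLE_HTML)
print(quotes)
```

Skipping incomplete blocks is a design choice made for this sketch; depending on the task, recording a placeholder value for the missing field may be more appropriate.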

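BeautifulSoup also accepts CSS selectors through its select() and select_one() methods. Knowing how to write these selectors is essential for real-world scraping tasks, where the target elements are often nested several levels deep.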
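
As a minimal sketch of selector-based lookups, the example below uses BeautifulSoup's select() and select_one() methods, which take CSS selector strings. The SAMPLE_HTML string is made-up markup used only for illustration, not content from a real page.

```python
from bs4 import BeautifulSoup

# Made-up sample markup used only for illustration.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First quote</span>
  <small class="author">Author One</small>
</div>
<div class="quote">
  <span class="text">Second quote</span>
  <small class="author">Author Two</small>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns every element matching the CSS selector
authors = [tag.get_text(strip=True) for tag in soup.select("div.quote small.author")]

# select_one() returns the first match, or None if nothing matches
first_text = soup.select_one("div.quote > span.text").get_text(strip=True)

print(authors)
print(first_text)
```

The selector "div.quote small.author" matches any small tag with class author nested anywhere inside a div with class quote, while the ">" in the second selector restricts the match to direct children.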