
Scraping Links from Wikipedia Using Beautiful Soup


Here we are going to scrape the list of constituents of the FTSE 100 index. To do this we are going to scrape the company ticker and the Wikipedia page link stored in the table with the 'constituents' ID on the FTSE 100 Wikipedia page (https://en.wikipedia.org/wiki/FTSE_100_Index).

Each data point we require follows the path #constituents > td > a.
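As a quick illustration of that path, the whole thing can also be expressed as a single CSS selector passed to Beautiful Soup's `select`. The sketch below runs against a tiny stand-in HTML snippet (hypothetical rows, not the real table) so the selector can be demonstrated without a network request:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the Wikipedia page: same 'constituents' id,
# same <td><a> layout (the rows here are illustrative, not real data)
html = """
<table id="constituents">
  <tr><td><a href="/wiki/Admiral_Group">ADM</a></td><td>Admiral Group</td></tr>
  <tr><td><a href="/wiki/AstraZeneca">AZN</a></td><td>AstraZeneca</td></tr>
</table>
"""
parser = BeautifulSoup(html, 'html.parser')

# One selector walks the full path: the #constituents table,
# then every <a> nested inside a <td>
links = parser.select('#constituents td a')
for a in links:
    print(a.text, 'https://en.wikipedia.org' + a['href'])
```

Note that `select` silently skips `<td>` tags with no `<a>` inside, which is why the loop-based version below needs a try/except instead.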

import requests
import pandas as pd
from bs4 import BeautifulSoup

response = requests.get('https://en.wikipedia.org/wiki/FTSE_100_Index')
parser = BeautifulSoup(response.content, 'html.parser')

# Get the table with the 'constituents' id
constituents_table = parser.select('#constituents')

# Find all <td> tags in the constituents table
td = constituents_table[0].find_all('td')

# Create empty dataframe to store scraped stock and url data
output = pd.DataFrame(columns=['stock', 'url'])

# Loop through all <td> tags, find the <a> tag and extract its text and href
# Note: we use try/except to skip over <td> tags that don't contain an <a> tag
for tag in td:
    try:
        a = tag.find_all('a')
        stock = a[0].text
        url = 'https://en.wikipedia.org' + a[0]['href']
        print(stock, url)
        # Add the current stock to the output dataframe
        current_stock = pd.DataFrame([{'stock': stock, 'url': url}])
        output = pd.concat([output, current_stock], ignore_index=True)
    except (IndexError, KeyError):
        continue

output.to_csv('ftse_100_stocks.csv', index=False)
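Once the script has run, the export can be sanity-checked by reading the CSV back with pandas. A minimal round-trip sketch, using a couple of stand-in rows in place of the real scraped output:

```python
import pandas as pd

# Stand-in for the scraper's output: same 'stock'/'url' columns
output = pd.DataFrame([
    {'stock': 'AZN', 'url': 'https://en.wikipedia.org/wiki/AstraZeneca'},
    {'stock': 'BP.', 'url': 'https://en.wikipedia.org/wiki/BP'},
])
output.to_csv('ftse_100_stocks.csv', index=False)

# Round-trip: read the file back and confirm the shape survived
stocks = pd.read_csv('ftse_100_stocks.csv')
print(stocks.head())
print(len(stocks), 'constituents in file')
```

With the real page the frame should hold one row per FTSE 100 constituent.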

By detro - Last Updated July 15, 2022, 4:31 p.m.
