Extract Major Stock Indices with API: A Guide to Fetching and Visualizing Data

In the world of finance, being able to track major stock indices in real-time is crucial. Whether you’re doing an analysis or just curious about the market, libraries like yfinance – which pulls data from Yahoo Finance – make it easy to fetch historical data and plot trends. In this post, we’ll walk through a Python script that fetches stock index data, normalizes it for comparison, and plots it for easy visualization.

Step 1: Install and import relevant libraries

This step brings in the libraries used throughout the project: yfinance for data extraction, pandas for data handling, matplotlib for plotting, and datetime for working with dates.

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

Step 2: Define end date

Depending on your project, you may want to define start and end dates for data extraction.

For the purpose of this project – which will be completed in future posts and uploaded to my GitHub page – I will use the day Donald Trump was elected President of the United States as the start date.

start_date = '2024-11-05'
end_date = datetime.today().strftime('%Y-%m-%d')

Step 3: Define the data fetching function

Create a modular function to:

  • Download stock index data for a given symbol between the start_date and end_date.
  • Return only the closing prices (a common metric for market movement).
  • Handle errors or missing data gracefully.
def get_index_data(symbol, start_date=start_date, end_date=end_date):
    try:
        # Download daily OHLCV data for the symbol over the chosen window
        index_data = yf.download(symbol, start=start_date, end=end_date)

        if index_data.empty:
            print(f"No data found for {symbol}.")
            return None

        # Recent versions of yfinance return a DataFrame with MultiIndex
        # columns even for a single ticker; squeeze() collapses the single
        # 'Close' column into a Series so it can be named and concatenated.
        return index_data['Close'].squeeze()
    except Exception as e:
        print(f"Error fetching data for {symbol}: {e}")
        return None

Step 4: Define the ticker(s)

Map readable names to Yahoo Finance symbols so you get clean legends in the final plot.

For this project, I selected three tickers that represent the US market to compare against the major indices of the European market.

tickers = {
    'S&P 500': '^GSPC',
    'Dow Jones': '^DJI',
    'Nasdaq': '^IXIC',
    'Euro Stoxx 50': '^STOXX50E',
    'FTSE 100': '^FTSE',
    'DAX 40': '^GDAXI',
    'CAC 40': '^FCHI',
    'IBEX 35': '^IBEX',
    'AEX': '^AEX',
    'SMI': '^SSMI'
    #'PSI 20': '^PSI20.LS' 
}

Step 5: Fetch and Store the Data

Loop through each index symbol:

  • Fetch its data.
  • Store it if available.
  • Track the index name for labeling later.
index_data_list = []
index_names = []

for name, ticker in tickers.items():
    data = get_index_data(ticker)
    if data is not None:
        data.name = name  # Use the full name of the index
        index_data_list.append(data)
        index_names.append(name)

Step 6: Combine and Clean Data

Technically, combining and cleaning data belongs in a dedicated Data Treatment and Processing section, as it involves transforming raw inputs into a structured, analysis-ready format. This includes merging datasets, handling missing values, and standardizing formats – key pre-processing steps. For the purpose of this post and to keep things simple, I’ll leave it here. However, in a future version of this project (that will be publicly available soon), this step will be properly placed under a dedicated data processing section.

pd.concat(...) merges all index dataframes into one big table, aligned by date.

df.ffill() fills in any missing data using the most recent available value (forward-fill).

Why ffill()?

Stock markets operate in different time zones and observe different holidays, so some days are missing for specific indices. Forward-filling ensures all indices have aligned values for visualization.

Drawback:

I am assuming the price stayed the same on missing days, which may not reflect reality. It’s fine for visualization, but not ideal for financial modelling or calculations that depend on exact price changes.
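The alignment-then-fill behaviour, and the drawback just described, can be seen on toy data (hypothetical prices, not real quotes): when one market is closed on a day another trades, `pd.concat` leaves a gap, `ffill()` repeats the previous close, and the filled day shows a daily return of exactly 0%.

```python
import pandas as pd

# Hypothetical prices: the EU market is closed on 2024-11-06,
# so concat leaves a gap there that ffill() then fills.
us = pd.Series([100.0, 102.0, 104.0],
               index=pd.to_datetime(['2024-11-05', '2024-11-06', '2024-11-07']),
               name='US index')
eu = pd.Series([200.0, 208.0],
               index=pd.to_datetime(['2024-11-05', '2024-11-07']),
               name='EU index')

# Align by date (union of both indices), then forward-fill the gap
df = pd.concat([us, eu], axis=1).ffill()
print(df)

# The filled day repeats the previous close, so its daily return is 0%,
# which understates the EU index's true day-to-day movement.
print(df['EU index'].pct_change().round(3).tolist())
```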

if index_data_list:
    # Merge all index series into one table, aligned by date
    df = pd.concat(index_data_list, axis=1)

    # Forward-fill gaps caused by differing market holidays
    df = df.ffill()

    print(df.head())

Step 7: Normalize Data

This converts all index values to a common starting point (100), making it easy to compare relative changes over time regardless of absolute index levels.

Why normalize?

S&P 500 and DAX trade at very different price levels. By normalizing, you can directly compare percentage growth since the start date.

Drawback:

While great for comparing trends, it hides absolute performance and volatility differences.
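To see what rebasing does, here is a small example on hypothetical prices (not real quotes): two indices trading at very different levels both start at 100 after normalization, so the final row reads directly as cumulative percentage growth since the first date.

```python
import pandas as pd

# Hypothetical closing prices for two indices at very different levels
df = pd.DataFrame({
    'S&P 500': [5700.0, 5757.0, 5871.0],
    'DAX':     [19000.0, 19095.0, 19285.0],
})

# Rebase every series to 100 at the first row, as in Step 7
df_normalized = df / df.iloc[0] * 100

# Both columns now start at 100; the last row shows +3.0% for the
# S&P 500 and +1.5% for the DAX since the start date.
print(df_normalized.round(2))
```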

# Inside the if block from Step 6.
    df_normalized = df / df.iloc[0] * 100

Step 8: Plotting the Data

Loop through the normalized data and plot each index’s performance on the same chart.

# Inside the if block from Step 6.
    plt.figure(figsize=(14, 8))
    for i, column in enumerate(df_normalized.columns):
        plt.plot(df_normalized.index, df_normalized[column], label=index_names[i])

    plt.title('Normalized Major Stock Indices Daily Closing Prices', fontsize=16)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Normalized Price', fontsize=12)
    plt.legend(loc='best')

    plt.grid(False)

Step 9: Annotate a Key Event and display the plot

Place a visual marker on the chart for the 2024 U.S. election day (the start date defined in Step 2) to help analyze how markets reacted around that time.

# Inside the if block from Step 6.
    election_day = pd.Timestamp(start_date)  # the start date defined in Step 2

    if election_day in df_normalized.index:
        plt.annotate("Trump's election day",
                     xy=(election_day, df_normalized.loc[election_day, df_normalized.columns[0]]),
                     xytext=(election_day, df_normalized.loc[election_day, df_normalized.columns[0]] + 7),
                     arrowprops=dict(facecolor='red', shrink=0.02),
                     fontsize=10, color='black')

    plt.xticks(rotation=45)
    plt.tight_layout()

    plt.show()

else:
    print("No valid index data available.")

This project has been a great opportunity to work with stock index data, focusing on extraction, cleaning, and visualization. Normalizing the data allowed for easy trend comparison, and forward-filling helped handle missing values, despite its limitations.

The insights gained here will play a key role in a new project I’m developing, where this data will be integrated to enhance predictive modelling and analysis. Stay tuned for more!

Want to Apply This in Your Work?

If you’re working with data and want to go beyond basic charts and scripts, I offer tailored support to help you build real impact with your insights:

πŸŽ“ Personalized Workshops: Hands-on training sessions tailored to your team, tools, and goals

🧠 Consulting Services: Strategic guidance on turning data into decisions – whether it’s modelling, metrics, or storytelling

πŸ•’ Fractional Data Scientist: Embedded support for companies who need senior data expertise without a full-time hire

Whether you’re scaling a product team or building out your first data processes, I can help you use data more effectively, with clarity and confidence.

πŸ‘‰ Get in touch to start a conversation.
