Introduction to Portfolio Analysis & Optimization with Python.

Dimitris Georgiou
Nov 10, 2020

Disclosure: Nothing in this post should be considered investment advice. This is purely introductory knowledge. We focus mainly on historical data, and past performance cannot guarantee future returns and risk. The post provides general examples of:

  • Portfolio Management Process
  • Calculation of Descriptive Statistics, such as return-based metrics (daily/yearly returns) and risk-based metrics (daily/yearly standard deviation)
  • Calculation of Additional Statistics through the Capital Asset Pricing Model (CAPM), to which Markowitz's mean-variance analysis leads directly: Jensen's alpha, beta, covariance matrix, correlation matrix, Sharpe Ratio, R squared
  • Portfolio Construction & Optimization
  • Calculation of Capital Allocation Line (CAL)
  • Final Evaluation

Resources:

  • I have created a repo for this post, including the Python notebook here and the Excel file here.

Basics for Portfolio Theories

All portfolio theories guide investors to select securities (instruments) that maximize returns and minimize risk:

portfolio = portfolio(max{returns}, min{risk})

I. Traditional Approaches

  1. Dow Theory: Charles Dow (editor of The Wall Street Journal) hypothesized that the stock market does not move randomly but is influenced by 3 distinct cyclical trends. These can be carefully examined to make predictions about the future behavior of stock prices:
    * Primary Movements: Long-term (1–3 years), swaying the market up or down
    * Secondary Reactions: Opposite in direction to the primary movements, playing a “correction” role
    * Minor Movements: Day-to-day fluctuations in the market
  2. Random Walk Theory (Efficient Market Hypothesis): In contrast, price behavior is almost unpredictable, and there is no relation between present and future stock prices. Price changes in the stock market reflect changes in the company, its entire industry, and the economy. According to the Random Walk Theory:
    * All investors have full knowledge of the changes that occurred
    * Instant adjustment in the stock prices with this news
    * Markets are efficient
    As a result, current prices reflect all information in the market, so past information will not help with future predictions.
  3. Formula Plans: Mechanical revision techniques that let investors benefit from price fluctuations by buying stocks when prices are low and selling them when prices are high. They focus on loss minimization rather than return maximization:
    * Construct 2 portfolios, one aggressive (stocks) and one defensive (bonds), which are periodically monitored and adjusted accordingly
    portfolio = portfolio_agg + portfolio_def

II. Modern Portfolio Theory:

  • In 1952, Harry Markowitz used mathematical programming and statistical analysis (variance, correlation) to find the optimal allocation of assets within a portfolio. According to the theory, “it is possible to construct an efficient frontier of optimal portfolios offering the maximum possible expected return for a given level of risk” (Investopedia).
  • Efficient portfolios yield the highest return for the level of risk accepted. Consider a one-period market with n instruments that all have the same expected return E[R_i] and the same variance Var(R_i). We denote by w_i the fraction of wealth invested in the i-th instrument.
  • CASE: Consider now two portfolios:
    * Portfolio A: 100% invested in instrument #1
    * Portfolio B: An equal-weighted portfolio (w_i = 1/n)
  • CONCLUSION
    Both portfolios have the same expected return but very different variances. If the instruments' returns are uncorrelated, portfolio A has variance Var(R_i), while portfolio B has only Var(R_i)/n. A risk-averse investor will clearly prefer portfolio B, which is diversified across n different instruments (diversification will be analyzed later on) without giving up any return. This was Markowitz's central point: investors seek to minimize variance for a given level of expected return. The assumptions of MPT are:
    * Investors estimate risk on the basis of the variability of expected returns
    * Investors base their decisions solely on the expected returns and the variance of returns.
    * For a given risk level, investors prefer high returns to lower returns. Similarly, for a given level of expected return, investors prefer less risk to more risk.
    * Instrument returns are random variables that follow a normal distribution
    * Markets are efficient.
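Here is a small NumPy sketch that puts numbers on the CASE above. Purely for illustration, it assumes n = 10 instruments with uncorrelated returns, each with an expected return of 8% and a volatility of 20%. These are hypothetical values, not estimates from data. With a diagonal covariance matrix, the equal-weighted portfolio B keeps portfolio A's expected return, while its variance shrinks by a factor of n:

```python
import numpy as np

n = 10                        # number of instruments (hypothetical)
mu = np.full(n, 0.08)         # identical expected returns E[R_i] (hypothetical)
sigma = 0.20                  # identical volatility (hypothetical)
cov = np.eye(n) * sigma**2    # uncorrelated returns -> diagonal covariance matrix

def portfolio_stats(w, mu, cov):
    """Return (expected return, variance) of a portfolio with weights w."""
    return float(w @ mu), float(w @ cov @ w)

w_a = np.zeros(n)
w_a[0] = 1.0                  # Portfolio A: 100% in instrument #1
w_b = np.full(n, 1.0 / n)     # Portfolio B: equal weights w_i = 1/n

ret_a, var_a = portfolio_stats(w_a, mu, cov)
ret_b, var_b = portfolio_stats(w_b, mu, cov)
print(f"A: E[R]={ret_a:.3f}, Var={var_a:.4f}")
print(f"B: E[R]={ret_b:.3f}, Var={var_b:.4f}")
```

With correlated instruments the reduction is smaller, because the off-diagonal terms of the covariance matrix no longer vanish.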

Portfolio Management Process

I. Assess the Current Situation

Define your values, beliefs, and priorities. Be aware of your current assets, liabilities, and cash flows. Define growth goals properly, so that gaps between them and the current investment strategy can be identified.

II. Investment Policy Statement (IPS)

Identify the investor's risk-return profile through questionnaires and trusted sources (e.g. the Schwab questionnaire). In other words, find your (1) objectives and (2) constraints. For our analysis:
* Objectives: The portfolio's objective is financing future professional needs. The investment goals are long-term growth and preservation of capital (return goals), pursued with above-average risk (risk goals). Based on the questionnaire, the risk-aversion coefficient is A = 3.
* Constraints: Commit to a portfolio management program (1) with a short-term horizon of 2 years, (2) with low liquidity requirements, and (3) with medium sensitivity to tax savings, while (4) unique circumstances like the COVID-19 pandemic were of utmost importance.

Note:
[1] The closer an investor gets to retirement, the more the allocation may shift to reflect a lower tolerance for risk. This post, instead, implements an above-average risk tolerance.

[2] An above-average risk tolerance can be linked to chasing a long-term rate of return of 8% or more.

III. Determine Asset & Instrument Allocation

Taking the IPS into consideration, investors first solve an asset selection problem:
1. Risky asset classes: Stocks, Bonds, Commodities, etc.
2. Risk-free asset classes: T-bills, etc.
3. Index or Market Benchmark: S&P 500, etc.
Secondly, they solve an instrument selection problem by choosing instruments within these asset classes:
* Stocks:
Apple (AAPL), Amazon (AMZN), Google (GOOG)
* Bonds:
Corporate Bonds, Government Bonds
* Commodities:
Gold, Silver, Crude Oil, etc.
* Options
Thirdly, assigning percentages of capital (finding the weights w_i for assets and instruments, respectively) is the crucial optimization problem we will focus on: the asset & instrument allocation problem.

The risk-return relationship for different asset classes

Note: Our risk tolerance can lead to moderately aggressive portfolios (or balanced portfolios) with equal allocation to equities (stocks) and fixed-income securities (bonds). This is our initial assumption, so we are willing to take on a higher level of risk to gain more over a longer time horizon (> 5 years).

IV. Active or Passive Investing?

  • Actively managed portfolios: We allocate our capital to individual stocks, bonds, and commodities to achieve optimal diversification → (followed here)
  • Passively managed portfolios: We allocate our capital to index funds (ready-made portfolios) selected from various asset classes and economic sectors → (NOT followed here)

PART 1 : Asset & Instrument Selection

Disclosure: The introductory notes were just a warm-up. We will now delve into the joy of coding (a.k.a. no life) and explain the terms and formulas further.

I. What Assets?

So, we are investing in both stocks and bonds. But is it enough? The common refrain from many financial advisors is that we should allocate 5–10% of our portfolio to commodities.

  • Stocks: Large-cap stocks from well-established companies, for which the risk of failure is minimal in terms of growth prospects (moderate capital-gains distributions). We mainly focus on the technology sector.
  • Bonds: Some bonds to balance the risk of the portfolio. We mainly focus on the healthcare sector.
  • Commodities: A diversified portfolio of stocks already exposes us to energy and mining companies, etc. However, commodities seem to protect against inflation and may enhance the diversification of the portfolio.

Furthermore, when investing, nothing is 100% guaranteed: all instruments carry risk. However, risk-free assets like T-bills (US Treasury bills) carry the smallest risk, because they are backed by the “full faith and credit” of the US government. As a result, their return is small and close to the current interest rate. Every portfolio should hold a portion of these assets.

  • Risk-Free Asset: T-bills (3-month)

II. What Instruments per Asset Class?

Every instrument comes with its ticker, an abbreviation used to uniquely identify publicly traded shares, e.g. Apple → AAPL.

III. Which Macroeconomic & Microeconomic Events Affect Prices?

In general, studies using statistical analysis have found that various macroeconomic and microeconomic events significantly affect instrument prices.

MACROECONOMIC EVENTS

  1. Crude Oil War (March 2020–April 2020): Russia refused to back the production cuts proposed by OPEC. Saudi Arabia responded by slashing its oil prices by the most in two decades, provoking a price war between these two major oil powers.
    * Stock, Bond, T-bill Market: The war worsened things by injecting volatility into many stock instruments. More specifically, the S&P 500 fell by 7.6%. Because of the coronavirus, 3 billion people were in lockdown, so global oil demand could drop by 20%.
    * Commodities Market: The war also roiled the commodities markets, which fanned the flames of a volatile stock market and led to a deeper economic downturn.
  2. Coronavirus (Q1 2020 onward): The virus acts as a macroeconomic factor whose impact could be significant in the equity and commodities markets.
    * Stock, Bond, T-bill Market: The impact of the virus on stock prices is clear, especially for blue-chip stocks like Apple and Amazon: AAPL fell 12% and AMZN 15%. Some of them were expected to recover soon, especially delivery-driven companies like Amazon, since delivery demand increased.
    * Commodities Market: Prices of major commodities remained largely intact, while futures markets and gold prices were rising. Speculators tried to make quick money by driving up the prices of various essential commodities, like oilseeds and pulses, on thin volumes.

MICROECONOMIC EVENTS

IV. PYTHON

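  • Importing libraries
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt  # used by all the plots below
# note: Yahoo has changed its API since 2020, so the 'yahoo' source of pandas_datareader may no longer work
from pandas_datareader import data as web
from functools import reduce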
  • Getting data from Yahoo:
    Instrument data can be obtained from Yahoo! Finance, Google Finance, Quandl, etc. We will use Yahoo Finance. We end up with a dict of 11 DataFrames, each with 6 columns.
## 1 - Define `tickers` & `company names` for every instrument
stocks = {'AAPL':'Apple', 'MSFT':'Microsoft', 'AMZN' : 'Amazon', 'GOOG': 'Google', 'FB':'Facebook','NFLX':'Netflix' , 'NVDA' : 'NVIDIA'}
bonds = {'HCA' : 'HCA', 'ALGT' : 'ALGT'}
commodities = {'BTC-USD' : 'Bitcoin', 'PA=F' : 'Palladium'}
instruments = {**stocks, **bonds, **commodities}
tickers = list(instruments.keys())
## 2 - We will look at prices over the past years, starting on January 1, 2015
start = datetime.datetime(2015,1,1)
end = datetime.datetime(2020,4,16)
## 3 - Let's get the instruments' data based on the tickers
instruments_data = {}
for ticker, instrument in instruments.items():
    instruments_data[ticker] = web.DataReader(ticker, data_source = 'yahoo', start = start, end = end)
  • Let's explore one of the data series (AAPL).

Note: The columns are:
(1) Open: the price of the instrument at the beginning of the trading day.
(2) High: the highest price of the instrument on that day.
(3) Low: the lowest price on that day.
(4) Volume: the number of shares of a security traded between the daily open and close.
(5) Close: the price at the end of the trading day.
(6) Adjusted Close: the closing price adjusted for corporate actions, namely (a) dividends and (b) splits. When a company issues a dividend, the share price is reduced by the size of the dividend per share, since the company is distributing a portion of its earnings.
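The dividend part of that adjustment can be illustrated with a minimal sketch of the common multiplicative back-adjustment. On a dividend's ex-date, every earlier close is multiplied by 1 − D / C_prev, where D is the dividend and C_prev is the last close before the ex-date. The prices, the dividend, and the helper `adjust_for_dividend` are all hypothetical. Data vendors such as Yahoo may differ in the details (and also adjust for splits), so treat this as an illustration rather than their exact methodology:

```python
import pandas as pd

def adjust_for_dividend(close: pd.Series, ex_date, dividend: float) -> pd.Series:
    """Back-adjust closes for one cash dividend (hypothetical helper).

    Prices strictly before ex_date are scaled by 1 - dividend / C_prev,
    where C_prev is the last close before the ex-date.
    """
    before = close.index < pd.Timestamp(ex_date)
    c_prev = close[before].iloc[-1]
    factor = 1.0 - dividend / c_prev
    adjusted = close.copy()
    adjusted[before] = close[before] * factor
    return adjusted

dates = pd.date_range("2020-01-06", periods=4, freq="B")
close = pd.Series([100.0, 102.0, 99.0, 100.0], index=dates)
# A hypothetical $2 dividend going ex on the third day
adj = adjust_for_dividend(close, dates[2], dividend=2.0)
print(adj.round(4).tolist())  # [98.0392, 100.0, 99.0, 100.0]
```

The raw closes show a 2.9% drop on the ex-date. The adjusted series shows only a 1% drop, because the dividend paid out is not a loss for the shareholder.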

  • Keep only Adjusted Close prices
    We end up with a dict of 11 Series, one per instrument, holding only the adjusted closes.
for ticker, instrument in instruments.items():
    instruments_data[ticker] = instruments_data[ticker]["Adj Close"]
  • Explore the lengths of dataframes (aka. total trading days)
tr_days = []
for ticker, instrument in instruments.items():
    tr_days.append(instruments_data[ticker].shape[0])
tr_days = pd.DataFrame(tr_days, index = tickers, columns = ["Trading Days"])
tr_days_stocks_bonds = instruments_data['AAPL'].groupby([instruments_data['AAPL'].index.year]).agg('count')
tr_days_bitcoin      = instruments_data['BTC-USD'].groupby([instruments_data['BTC-USD'].index.year]).agg('count')
tr_days_palladium    = instruments_data['PA=F'].groupby([instruments_data['PA=F'].index.year]).agg('count')
tr_days_per_year = pd.DataFrame([tr_days_stocks_bonds, tr_days_bitcoin, tr_days_palladium], index = ["Stocks", "Bitcoin", "Palladium"])

Note: The NYSE and NASDAQ average about 253 trading days a year. This comes from 365.25 (days per year on average) × 5/7 (share of weekdays) − 6 (weekday holidays) − 3 × 5/7 (fixed-date holidays) = 252.75 ≈ 253. We retrieve data from 01-01-2015 to 16-04-2020, which gives:
* Stocks, Bonds = 1331 DAYS
* Bitcoin = 1934 DAYS (there are no days off for the cryptocurrency)
* Palladium = 1381 DAYS (trading days changed after 2019 from 5/7 to 6/7)
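The back-of-the-envelope estimate in the note above can be checked with a few lines of Python. This is a plain re-computation of the same formula, nothing more:

```python
days_per_year = 365.25
weekday_share = 5 / 7            # share of days that are weekdays
weekday_holidays = 6             # holidays that always fall on a weekday
fixed_date_holidays = 3 * 5 / 7  # fixed-date holidays, counted only when they land on a weekday

trading_days = days_per_year * weekday_share - weekday_holidays - fixed_date_holidays
print(round(trading_days, 2), round(trading_days))  # 252.75 253
```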

  • Merge DataFrames
    We will merge the instruments' data into a single DataFrame. Rows with any missing value are dropped after the merge, so the result only contains the trading days common to all instruments. Therefore we will lose some information on 'Bitcoin' and 'Palladium'.
# name each Series after its ticker so that the merged columns do not collide
data = [series.rename(ticker) for ticker, series in instruments_data.items()]
data_df = reduce(lambda x, y: pd.merge(x, y, left_index=True, right_index=True, how='outer'), data).dropna()
data_df.columns = tickers

The number of trading days common to all instruments is now T = 1271, distributed over the years (2015, …, 2020) as follows:

tr_days_per_year = data_df['AAPL'].groupby([data_df['AAPL'].index.year]).agg('count')
tr_days_per_year = pd.DataFrame([tr_days_per_year], index = ["All instruments (merged)"])

Note: [1] As we can see, only T = 1271 < 1331 days are common to all 11 instruments. Without 'Palladium', the merge would give T = 1331, as expected.
[2] We have taken the individual DataFrames of all 11 instruments and built a 'master' DataFrame, which will be used for further analysis.

  • Visualize Adj. Close for all instruments
fig, ax = plt.subplots(figsize=(12,8))
data_df.plot(ax = plt.gca(),grid = True)
ax.set_title('Adjusted Close for all instruments')
ax.set_facecolor((0.95, 0.95, 0.99))
ax.grid(c = (0.75, 0.75, 0.99))

Note: Bitcoin went through a bubble (2017–2018) in which its value dropped several-fold. What's wrong with this chart? The instruments trade at very different price levels, so Bitcoin's scale dwarfs the rest. Absolute price matters when investing, but we are more concerned with an instrument's relative change than with its price (and with its volatility, i.e., risk).
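One common remedy is to rebase every series to 100 on the first day, so that the chart shows relative rather than absolute changes. The sketch below assumes `prices` is a DataFrame of adjusted closes like `data_df`; here it is replaced by a tiny made-up example:

```python
import pandas as pd

# Hypothetical adjusted closes for two instruments at very different price levels
prices = pd.DataFrame(
    {"CHEAP": [10.0, 11.0, 12.0], "PRICEY": [5000.0, 5100.0, 4900.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# Rebase each column to 100 on the first date: value_t = 100 * P_t / P_0
rebased = 100 * prices / prices.iloc[0]
print(rebased.round(2))
# rebased.plot(grid=True) would now show both series on a comparable scale
```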

PART 2: Descriptive Statistics (Risk & Return Analysis)

In this section, we show different methods to calculate, for every instrument, the two main pillars of portfolio management: risk and return.

I. RETURN : General

[1] The return on an instrument combines dividends (for stocks) and changes in price (capital gain or loss).
[2] A 50-year comparison of annualized returns for stocks versus bonds shows that:
* Years: 1959–2008
* Annualized Returns: Stocks (9.18%) > Bonds (6.48%)

I. RETURN : How to calculate?

  • In portfolio management, we are interested in the annual return in order to compare different instruments or portfolios.
  • Since these return calculations are estimates, they constitute expected returns E[R].
  • Our data is daily, so we first compute daily returns and then the respective annual returns for the years 2015, 2016, 2017, 2018, 2019, and 2020. We will also look at the total annual return over all years, compounded.
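APR vs APY: the nominal annual rate R_nominal (APR) is compounded over the N trading days of the year to give the effective annual rate R_effective (APY):

$$R_{effective} = \left(1 + \frac{R_{nominal}}{N}\right)^{N} - 1$$

  • So we need to calculate R_nominal for every year, where N = {244, 241, 237, 237, 242, 71} is the number of trading days in each of these years. There are various ways to transform the prices P_0, …, P_N into returns:
Simple Return vs Log Return:

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \qquad r_t = \ln\left(\frac{P_t}{P_{t-1}}\right)$$

  • We will use log returns. As the formula shows, log returns are additive over time: the sum of the daily log returns over a period equals the log of the total price change over that period, ln(P_N / P_0). Simple returns do not have this property, so summing them misstates the compounded growth.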
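The additivity property can be checked numerically. In this sketch with made-up prices, the sum of the daily log returns equals the log of the overall price ratio. Summing the simple returns, however, does not reproduce the compounded simple return:

```python
import numpy as np

prices = np.array([100.0, 150.0, 75.0, 90.0])  # hypothetical daily closes

simple = prices[1:] / prices[:-1] - 1          # R_t = P_t / P_{t-1} - 1 -> [0.5, -0.5, 0.2]
log_ret = np.diff(np.log(prices))              # r_t = ln(P_t / P_{t-1})

total_simple = prices[-1] / prices[0] - 1      # actual change over the period: -10%
print("sum of simple returns:", simple.sum())  # about 0.2, nowhere near -0.10
print("sum of log returns:", log_ret.sum(), "=", np.log(prices[-1] / prices[0]))
print("back to a simple return:", np.expm1(log_ret.sum()))  # about -0.10
```

For a DataFrame of prices such as `data_df`, the same daily log returns are `np.log(data_df).diff()`.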

I. RETURN : python

  • Log returns (daily) calculation & visualization
log_returns = np.log(data_df).diff()  # daily log returns ln(P_t / P_{t-1}); data_df.pct_change() would give simple returns
log daily returns
log_returns.plot(grid = True, figsize = (15,10)).axhline(y = 0, color = "black", lw = 2)

Note: [1] According to the previous visualization, the returns of most of these instruments are quite volatile and can move ±10% on any given day.
[2] Which transformation do you prefer? Daily changes (log differences) are used by more advanced methods that model the behavior of instruments.

  • Annual Percentage Rate (APR) or R_nominal
APR = log_returns.groupby([log_returns.index.year]).agg('sum')
APR_avg = APR.mean()

Note: To find a representative return over all 6 years, we did not simply calculate the log difference between an instrument's price today and its price 6 years earlier. Instead, we take the average of all 6 APRs. The same procedure was followed for APY.

  • Annual Percentage Yield (APY) or R_effective
N = np.array(tr_days_per_year.T)
N_total = np.sum(N)
APY = (1 + APR / N )**N - 1
APY_avg = (1 + APR_avg /N_total )**N_total - 1

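Note: As we can see, APY is greater than APR, as expected. R_effective accounts for compounding within each year and is a more accurate measure of the return of the underlying instruments. As the number of compounding periods grows (continuous compounding), APY does not become equal to APR; it converges to e^(R_nominal) − 1. The two are approximately equal only because R_nominal is small, as a first-order Taylor expansion shows:

$$R_{effective} = e^{R_{nominal}} - 1 = R_{nominal} + \frac{R_{nominal}^{2}}{2} + \dots \approx R_{nominal} \quad \text{for } R_{nominal} \ll 1$$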
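Here is a small sketch of the conversion for a hypothetical nominal rate of 10%. As N grows, (1 + R_nominal/N)^N − 1 rises toward the continuous limit e^(R_nominal) − 1. It stays close to R_nominal only because the rate is small:

```python
import math

def apy_from_apr(apr: float, n_periods: int) -> float:
    """Effective annual yield from a nominal rate compounded n_periods times."""
    return (1 + apr / n_periods) ** n_periods - 1

apr = 0.10  # hypothetical nominal annual rate
for n in (1, 12, 252, 100_000):
    print(n, round(apy_from_apr(apr, n), 6))
print("continuous", round(math.expm1(apr), 6))  # e^0.10 - 1 ≈ 0.105171
```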

II. RISK : General

Any type of financial instrument may be exposed to the general risks mentioned above. However, certain types of instruments are also subject to specific types of risk:
[1] Money-market instruments & Bonds

  • Credit risk: Deterioration in the credit quality of a corporate issuer → the investor's risk increases, since the probability of default rises.
  • Interest-Rate risk: The higher an instrument's duration, the more its yield will be affected by a change in interest rates.
  • High-yield securities risk: Fixed-income instruments (e.g. bonds) and money-market instruments are rated by credit rating agencies, so those with a low rating or no rating can be considered speculative → risky

[2] Stocks

  • Volatility risk: Instability of the share prices, or the variability of their returns. The more price varies the higher the volatility
  • Small-cap & Mid-cap risks: The trading volume of such shares is small → their prices may fall more sharply and faster than those of large-caps

II. RISK: How to calculate?

We will calculate the volatility risk, since we only have the Adjusted Close prices of the instruments. This measure refers to stock shares, but we use it to measure the risk of bonds and commodities as well. Again, we calculate risk annually for each instrument separately.

[1] Volatility Risk: Variance (Daily & Annual)
Variance is a measure of dispersion. In finance, variance is often synonymous with risk: the higher the variance of an instrument's returns, the more risk the instrument bears.

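Volatility Risk, Variance (Daily), where r_t is the daily log return on day t and \bar{r} its mean over the N trading days of the year:

$$\sigma^{2}_{daily} = \frac{1}{N-1}\sum_{t=1}^{N}\left(r_t - \bar{r}\right)^{2}$$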

To express the variance in the annual format we are interested in, we can simply annualize it by multiplying it by N, the total number of trading days in the corresponding year:

Volatility Risk, Variance (Annualized):

$$\sigma^{2}_{annual} = N\,\sigma^{2}_{daily}$$

[2] Volatility Risk: Standard Deviation
The most commonly used measure of dispersion in finance is the standard deviation. Its relation to the variance is:

Volatility Risk, Standard Deviation (Daily):

$$\sigma_{daily} = \sqrt{\sigma^{2}_{daily}}$$

while the annualized standard deviation will be

Volatility Risk, Standard Deviation (Annualized):

$$\sigma_{annual} = \sqrt{N}\,\sigma_{daily}$$
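Here is a minimal sketch of the annualization step. It uses simulated rather than real returns and assumes i.i.d. daily returns, which is the assumption behind the square-root-of-time rule:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N = 252                                              # trading days per year (assumed)
daily = rng.normal(loc=0.0005, scale=0.02, size=N)   # simulated daily log returns

var_daily = daily.var(ddof=1)             # sample variance (pandas .std() also uses ddof=1)
std_daily = np.sqrt(var_daily)

var_annual = N * var_daily                # variance scales with N
std_annual = np.sqrt(N) * std_daily       # standard deviation scales with sqrt(N)
print(f"daily σ = {std_daily:.4f}, annual σ = {std_annual:.4f}")
```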

II. RISK: python

  • Standard Deviation (Annualized)
STD = log_returns.groupby([log_returns.index.year]).agg('std') * np.sqrt(252)  # 252 ≈ N trading days per year
STD_avg = STD.mean()
Annualized Standard Deviation
Average Annualized Standard Deviation for all 6 years
  • Visualize Standard Deviation(Annualized)
# configuration
fig, ax = plt.subplots(figsize = (16,12))
ax.set_title(r"Standard Deviation ($\sigma$) of all instruments for all years")
ax.set_facecolor((0.95, 0.95, 0.99))
ax.grid(c = (0.75, 0.75, 0.99))
ax.set_ylabel(r"Standard Deviation $\sigma$")
ax.set_xlabel(r"Years")
STD.plot(ax = plt.gca(), grid = True)
# annotate every yearly value with a small label box
for instr in STD:
    stds = STD[instr]
    years = list(STD.index)
    for year, std in zip(years, stds):
        label = "%.3f" % std
        plt.annotate(label, xy = (year, std), xytext = (-50, 40), textcoords = 'offset points', ha = 'right', va = 'bottom', bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5), arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))
Visualize Annualized Standard Deviation

Note: The volatility of bonds and stocks stays at roughly the same levels within each asset class. Even so, unexpectedly high volatility values occurred due to COVID-19. Almost all instruments are currently prone to price swings, so our portfolio analysis should be reconsidered once the market stabilizes.

  • Variance (Annualized)
VAR = STD **2
VAR_avg = VAR.mean()
Annualized Variance
Average Annualized Variance for all 6 years

III. RETURN vs RISK : python

# configuration - generate different colors & sizes
c = [y + x for y, x in zip(APY_avg, STD_avg)]
c = list(map(lambda x : x /max(c), c))
s = list(map(lambda x : x * 600, c))
# plot
fig, ax = plt.subplots(figsize = (16,12))
ax.set_title(r"Risk ($\sigma$) vs Return ($APY$) of all instruments")
ax.set_facecolor((0.95, 0.95, 0.99))
ax.grid(c = (0.75, 0.75, 0.99))
ax.set_xlabel(r"Standard Deviation $\sigma$")
ax.set_ylabel(r"Annual Percentage Yield $APY$ or $R_{effective}$")
ax.scatter(STD_avg, APY_avg, s = s , c = c , cmap = "Blues", alpha = 0.4, edgecolors="grey", linewidth=2)
# dashed reference lines at zero return and zero risk
ax.axhline(y = 0.0, c = "blue", linewidth = 1.5, zorder = 0, linestyle = 'dashed')
ax.axvline(x = 0.0, c = "blue", linewidth = 1.5, zorder = 0, linestyle = 'dashed')
for instr in STD.columns:
    ax.annotate(instr, (STD_avg[instr] + 0.01, APY_avg[instr]))

Note: Bitcoin (BTC-USD) has the greatest return but also the highest volatility, as expected. We selected the bonds HCA and VRTX to lower the portfolio's risk, yet their performance this year is very poor. COVID-19 has battered world economies.

PART 3 : Additional Statistics (Extra Risk & Return Analysis) — Capital Asset Pricing Model (CAPM)

Let’s find Alpha α, Beta β, Correlation ρ, Covariance Matrix Cov, R squared

I. CAPM (1-factor model)

The central insight of the Capital Asset-Pricing Model is that in equilibrium the riskiness of an asset is not measured by the standard deviation σ of its return but by its beta β.

For example, if Google stock has a beta of 1.5, its returns tend to move 1.5 times as much as the market's, i.e., Google is 50% more volatile than the market (in terms of systematic risk).

In particular, the CAPM (a linear model) assumes a linear relationship between the expected return E[R] of any instrument (or portfolio) and the expected return of the market portfolio E[R_m]. For our analysis, the S&P 500 (SPX), an index of stocks mostly domiciled in the U.S., serves as the market portfolio.

CAPM model (1-factor linear regression model):

$$E[R] = R_f + \beta\,\left(E[R_m] - R_f\right)$$
  • Factor (E[R_m] -R_f): The CAPM is an example of a so-called 1-factor model with the market premium (E[R_m] -R_f) playing the role of the single factor
  • Beta β: The sensitivity of the returns to each factor is represented by the factor-specific beta coefficient.
  • Risk-Free rate R_f: We take the last observed value over the study period (so R_f is treated as a constant) of a very low-risk instrument, e.g. 3-month US Treasury Bills, which can be viewed as virtually risk-free
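Plugging numbers into the CAPM equation takes one line. The values below (β = 1.5, R_f = 1%, E[R_m] = 8%) are hypothetical and only illustrate the arithmetic:

```python
def capm_expected_return(beta: float, r_f: float, r_m: float) -> float:
    """CAPM: E[R] = R_f + beta * (E[R_m] - R_f)."""
    return r_f + beta * (r_m - r_f)

er = capm_expected_return(beta=1.5, r_f=0.01, r_m=0.08)
print(f"{er:.3f}")  # 0.01 + 1.5 * 0.07 = 0.115
```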

Problems with CAPM

  1. Beta: Using beta assumes that risk can be measured through price volatility. However, price movements in the two directions are not equally risky. Moreover, using beta or standard deviation assumes that returns are normally distributed, which they are not.
  2. Risk-Free Rate: CAPM assumes R_f remains constant over the whole discounting period, which is not the case (it changes daily/weekly/monthly) → this could lead to overvalued instrument prices.
  3. Market Portfolio: E[R_m] - R_f is a theoretical value. Using the S&P 500 as a substitute for the market is an imperfect proxy.

II. Expanded CAPM (1-factor model)

Since the CAPM does not include a time dimension, implementing it requires adding assumptions about the return-generating process and estimating the model over time.

In general, the return R of an instrument consists of two parts: an expected part and an unexpected part (systematic + unsystematic risk):

  • Systematic risk: The impact of unanticipated macro events. Different firms can be affected differently by macro events. So if the macro factor is F, we denote by beta β the sensitivity of the firm to that factor, and m = βF. Here the only factor that matters is the market return, so F = R_m - E[R_m] and finally m = β(R_m - E[R_m]). Because it affects all instruments, systematic risk cannot be eliminated through diversification.
  • Unsystematic risk: The impact of unanticipated firm-specific events, ε. It is unpredictable, but it can be diversified away: in a well-diversified portfolio, ε → 0.
Single Index Model:

$$R = E[R] + \beta\,\left(R_m - E[R_m]\right) + \varepsilon$$

Moreover, to test the empirical performance of the CAPM, we have to obtain the test equation with ex-post data. The final model combines the CAPM with the single-index model. It implicitly assumes that (1) the assumptions of both the CAPM and the single-index model hold simultaneously in every period and (2) beta is stable over time. So, the empirical performance is tested using excess returns (risk premiums) rather than expected returns.

CAPM expanded model (using excess returns):

$$R_t - R_f = \alpha + \beta\,\left(R_{m,t} - R_f\right) + \varepsilon_t$$
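This sketch shows how α and β can be estimated from the excess-return regression. It simulates daily returns with a known α and β (all numbers hypothetical, with R_f treated as a constant daily rate). It then recovers them with ordinary least squares via `np.polyfit`. It also checks that the OLS slope equals Cov(R, R_m) / Var(R_m) = ρ σ / σ_m, the form used for β later in this post:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
T = 1000
r_f = 0.0001                                  # hypothetical constant daily risk-free rate
true_alpha, true_beta = 0.0002, 1.3           # hypothetical "true" parameters

r_m = rng.normal(0.0004, 0.01, size=T)        # simulated market returns
eps = rng.normal(0.0, 0.005, size=T)          # firm-specific noise (unsystematic)
r = r_f + true_alpha + true_beta * (r_m - r_f) + eps

x, y = r_m - r_f, r - r_f                     # excess returns
beta_hat, alpha_hat = np.polyfit(x, y, deg=1) # polyfit returns the slope first, then the intercept

# The OLS slope coincides with Cov/Var and with rho * sigma / sigma_m
beta_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
rho = np.corrcoef(x, y)[0, 1]
beta_rho = rho * np.std(y, ddof=1) / np.std(x, ddof=1)
print(alpha_hat, beta_hat, beta_cov, beta_rho)
```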

III. How to calculate? How to interpret?

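Calculate α, β:

$$\beta = \frac{\operatorname{Cov}(R, R_m)}{\operatorname{Var}(R_m)} = \rho\,\frac{\sigma}{\sigma_m}, \qquad \alpha = \left(\bar{R} - R_f\right) - \beta\left(\bar{R}_m - R_f\right)$$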
α, β interpretation

However, how much do we trust these results? We will calculate R squared (R²). It measures the degree to which the instrument's performance can be attributed to the performance of the selected benchmark index (S&P 500), so R² ∈ [0%, 100%]. Since this is a linear regression with a single regressor, R² is simply the squared correlation ρ² between the instrument's returns and the benchmark's.

R² interpretation
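Here is a quick numerical check of the R² = ρ² statement on simulated data (hypothetical numbers). The R² computed from the regression residuals matches the squared correlation:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.normal(0.0, 0.01, size=500)              # simulated benchmark returns
y = 0.8 * x + rng.normal(0.0, 0.01, size=500)    # simulated instrument returns

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)
r_squared = 1 - residuals.var() / y.var()        # 1 - SS_res / SS_tot
rho = np.corrcoef(x, y)[0, 1]
print(round(r_squared, 6), round(rho**2, 6))     # identical for a one-regressor OLS fit
```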

IV. Python

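  • Risk-Free Instrument R_f (13-week T-bill (^IRX))
risk_free = web.DataReader('^IRX', data_source = 'yahoo', start = start, end = end)['Adj Close']
# ^IRX is quoted in percent (e.g. 0.12 means 0.12%), so keep the last value and convert it to a decimal rate
risk_free = float(risk_free.iloc[-1]) / 100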
  • Market Instrument R_m (Assume S&P500 (^GSPC))

It is essentially a portfolio of 500 instruments from the largest companies in the United States.

##########################################
## Market Instrument R_m data importing ##
##########################################
market = web.DataReader('^GSPC', data_source = 'yahoo', start = start, end = end)['Adj Close']
market = market.rename("^GSPC")
market_log_returns = np.log(market).diff()  # daily log returns ln(P_t / P_{t-1})
log_returns_total = pd.concat([log_returns, market_log_returns], axis = 1).dropna()
############################
## Descriptive Statistics ##
############################
# RETURN
APR_total = log_returns_total.groupby([log_returns_total.index.year]).agg('sum')
APR_avg_total = APR_total.mean()
APR_avg_market = APR_avg_total['^GSPC']
# RISK
STD_total = log_returns_total.groupby([log_returns_total.index.year]).agg('std') * np.sqrt(N)
STD_avg_total = STD_total.mean()
STD_avg_market = STD_avg_total['^GSPC']
Return E[R] and risk σ for all 11 instruments (+ market)

We merged the data for the market (S&P 500) with the other 11 instruments for comparison. As we can see, the S&P 500 has the lowest return and risk, in contrast to the other, more 'famous' instruments.

  • Correlation & R squared (find ρ and R²)
corr = log_returns.corrwith(market_log_returns)
r_squared = corr ** 2
R squared R²
  • Apply expanded CAPM (find α and β)
def CAPM():
    # 1 - Average risk premium of every instrument and of the market
    #     E[R] - R_f    (averaged over the years)
    #     E[R_m] - R_f  (averaged over the years)
    APR_premium = APR_avg - risk_free
    APR_market_premium = APR_avg_market - risk_free
    # 2 - Calculate β = ρ σ / σ_m and α = (E[R] - R_f) - β (E[R_m] - R_f)
    beta = corr * STD_avg / STD_avg_market
    alpha = APR_premium - beta * APR_market_premium
    return alpha, beta

alpha, beta = CAPM()
# visualize_statistic is a plotting helper that is not shown in this post
visualize_statistic(alpha.values, "Alpha α")
visualize_statistic(beta.values, "Beta β", limit = 1)
Alpha α

Note: Positive alpha (α > 0): all instruments have positive alpha, generating excess return over the S&P 500. BTC-USD has α = 0.682, followed by NVDA with α = 0.515. As the market matures and price discovery happens across instruments, it becomes difficult to generate alpha in large-caps. Since alpha generation is compressed, active management is required. Still, having α > 0, as here, is the most desirable scenario for an investor.

Beta β

Note: A positive beta (β > 0) implies a positive correlation with the returns of the S&P 500.

β > 1: AAPL, MSFT, GOOG, FB, NFLX, NVDA, and VRTX are theoretically more volatile than the S&P 500. Almost all large-cap stocks have a beta close to 1, since they are the primary constituents of the country's major benchmark indices.

β < 1: AMZN is not as risky as the rest of the instruments.

β ~ 0: BTC-USD and PA=F have betas close to 0 and would be considered less risky than the market. This is not a useful insight, though: they are almost uncorrelated with the market, so we cannot trust the measurement to generalize about their volatility. For BTC-USD in particular, we saw before that its risk is quite high (σ ≈ 0.6).

  • Visualize CAPM
visualize_model(alpha/100, beta, data = log_returns_total.copy(), model = 'CAPM')  # plotting helper not shown in this post
The linear CAPM model y = α + βx fitted (with the estimated α, β) for each instrument

PART 4: Portfolio Construction & Portfolio Optimization

We are not interested in the expected return and risk of a collection of individual instruments. Rather, we want insights and information about the portfolio of instruments as a whole, which better captures the benefits of diversification.

[1] How much of each instrument will we need?

Our portfolio consists of the following :

  • risk-free instrument (1 instrument)
  • risky portfolio (11 instruments)

So, the first inputs are the weights w_i of these 12 instruments: how much of each instrument we hold, as a percentage of the entire portfolio. Each w_i ∈ [0%, 100%], and the weights must sum to 100%.

[2] How do we start to calculate the expected return and risk of that portfolio?

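If w denotes the share of wealth invested in the risky portfolio, then 1 - w denotes the share of wealth invested in the risk-free asset. So, the return and the risk of the entire (complete) portfolio will be:

$$E[R_c] = w\,E[R_p] + (1 - w)\,R_f, \qquad \sigma_c = w\,\sigma_p$$

The risky portfolio's expected return and risk follow from the instrument weights w_i (collected in the vector **w**) and the covariance matrix Σ:

$$E[R_p] = \sum_i w_i\,E[R_i], \qquad \sigma_p = \sqrt{\mathbf{w}^{\top}\Sigma\,\mathbf{w}}$$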

Note: From now on we will use, for each instrument:
1) Expected Return of an instrument, E[r]: It should be annual. We have annual returns (APR), but these are 6 values for 6 years, which is why we will use APR_avg.
2) Covariance Matrix Σ (for the risk): It will also be calculated on annual values, so we will use APR. How the covariance matrix is estimated can have important implications for the practice of modern finance. Our approach here, however, is straightforward, and so is the calculation of the covariance matrix.
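Here is a minimal sketch of the portfolio formulas, with three hypothetical instruments. The risky portfolio's expected return is the weighted sum wᵀμ, and its volatility is sqrt(wᵀΣw). Mixing it with the risk-free asset in proportions w and 1 − w scales the volatility by w. The mean returns, covariance matrix, and risk-free rate are made-up inputs. The 90/10 split mirrors the one used for Portfolio #1:

```python
import numpy as np

mu = np.array([0.12, 0.05, 0.08])          # hypothetical annual expected returns E[R_i]
cov = np.array([[0.040, 0.006, 0.010],     # hypothetical annual covariance matrix Σ
                [0.006, 0.010, 0.002],
                [0.010, 0.002, 0.030]])
w_i = np.array([0.5, 0.3, 0.2])            # weights inside the risky portfolio (sum to 1)
r_f = 0.01                                 # hypothetical risk-free rate

er_p = w_i @ mu                            # E[R_p] = Σ w_i E[R_i]
sigma_p = np.sqrt(w_i @ cov @ w_i)         # σ_p = sqrt(wᵀ Σ w)

w = 0.9                                    # share of wealth in the risky portfolio
er_c = w * er_p + (1 - w) * r_f            # complete portfolio return
sigma_c = w * sigma_p                      # the risk-free asset adds no variance
print(round(er_p, 4), round(sigma_p, 4), round(er_c, 4), round(sigma_c, 4))
```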

[3] How will we measure the efficiency of the portfolio?

By measuring the Sharpe Ratio, the ratio of the portfolio's expected excess return to its volatility. Together with the Treynor ratio and Jensen's alpha, it is often used to rank the performance of portfolios, so we want to maximize it.

Sharpe Ratio:

$$SR = \frac{E[R_p] - R_f}{\sigma_p}$$
SR interpretation
portfolios = {
    "#1 dummy (risky)" : {"Return E[R]" : 0, "Risk σ" : 0, "Sharpe Ratio SR" : 0},
    "#1 dummy (total)" : {"Return E[R]" : 0, "Risk σ" : 0, "Sharpe Ratio SR" : 0},
    "#2 optimized max sr (risky)" : {"Return E[R]" : 0, "Risk σ" : 0, "Sharpe Ratio SR" : 0},
    "#2 optimized max sr (total)" : {"Return E[R]" : 0, "Risk σ" : 0, "Sharpe Ratio SR" : 0},
    "#2 optimized min σ (risky)" : {"Return E[R]" : 0, "Risk σ" : 0, "Sharpe Ratio SR" : 0},
    "#2 optimized min σ (total)" : {"Return E[R]" : 0, "Risk σ" : 0, "Sharpe Ratio SR" : 0},
}

I. Portfolio #1 (the dummy portfolio)

OK, let's start by assigning the obvious weights to the underlying instruments: those derived through the risk analysis in [PART 1]. We know the weights of the 4 asset classes. For example, w_stocks = 45%, and since we hold 7 stocks, the weight of each stock is w_i = 45% / 7 ≈ 6.4%.
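Spelled out, the weights in the portfolio #1 code split each asset-class weight equally among that class's instruments. Note that the three risky class weights (45% + 35% + 10%) add up to 90%, not 100%; the remaining 10% is the risk-free share. A small sketch of the arithmetic, where `class_weights` is just a hypothetical restatement of the numbers used in the code:

```python
import numpy as np

# (class weight, number of instruments) for the three risky classes,
# as used in the portfolio #1 code: 7 stocks share 45%, then 2 + 2 instruments
class_weights = [(0.45, 7), (0.35, 2), (0.10, 2)]

# Split each class weight equally among its instruments
weights = np.concatenate([np.full(n, w / n) for w, n in class_weights])

print(len(weights))             # number of risky instruments
print(round(weights[0], 4))     # weight of each stock, 45% / 7
print(round(weights.sum(), 4))  # total risky share; the rest is risk-free

# Rescaling gives weights that describe the risky portfolio on its own
risky_only = weights / weights.sum()
print(round(risky_only.sum(), 4))
```

The simulated portfolios of Portfolio #2 normalize their weights to sum to 1, so the rescaled `risky_only` vector is the like-for-like counterpart when comparing the dummy risky portfolio with them.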

portfolio #1
```python
import numpy as np

# WEIGHTS, RETURN, RISK
cov = APR.cov()  # covariance matrix of the annual returns
# Equal split inside each asset class: 7 instruments share 45%, 2 share 35%, 2 share 10%.
# Note: these weights sum to 0.9; the remaining 10% is the risk-free share.
weights = np.array([0.45 / 7] * 7 + [0.35 / 2] * 2 + [0.1 / 2] * 2)
expected_return = np.sum(APR_avg * weights)                      # w^T μ
expected_risk = np.sqrt(np.dot(weights.T, np.dot(cov, weights)))  # sqrt(w^T Σ w)

# RISKY PORTFOLIO
portfolios["#1 dummy (risky)"]["Return E[R]"] = expected_return
portfolios["#1 dummy (risky)"]["Risk σ"] = expected_risk
portfolios["#1 dummy (risky)"]["Sharpe Ratio SR"] = (expected_return - risk_free) / expected_risk

# TOTAL PORTFOLIO: 90% risky portfolio + 10% risk-free instrument
total_expected_return = 0.9 * expected_return + 0.1 * risk_free
total_expected_risk = 0.9 * expected_risk  # the risk-free asset adds no variance
portfolios["#1 dummy (total)"]["Return E[R]"] = total_expected_return
portfolios["#1 dummy (total)"]["Risk σ"] = total_expected_risk
portfolios["#1 dummy (total)"]["Sharpe Ratio SR"] = (total_expected_return - risk_free) / total_expected_risk
```
E[r], σ, SR for portfolio #1

Note :
[1] The Portfolio #1 (dummy risky) has a higher return than the Portfolio #1 (dummy total), but also a higher risk. The result of this simple approach is disappointing, despite translating our risk-aversion policy into weights. On the plus side, the total dummy portfolio, after allocating 10% to the risk-free instrument, achieves a low risk of 16.65%. However, apart from the Sharpe ratio's weakness of assuming normally distributed returns, the value achieved here is less than 1, which is considered unacceptable.

[2] This was a simple approach based on our investment horizon and our risk aversion. Next, we expand the analysis to find the optimal weights that maximize (or minimize) the objective functions.

II. Portfolio #2 (the optimized portfolio)

Great, with this simple approach we found out how to calculate the expected return, risk and Sharpe ratio of our current portfolio. But, as we said, we are not happy with the result. Can we reduce the risk? Can we achieve a higher return if we are willing to take on more risk? The E[r] and σ of the underlying instruments will not change, so we have to rearrange the weights instead. But how?

We could start by randomly or manually altering the weights array, executing the code again, and repeating. However, each w_i is a real number rather than an integer, so there are infinitely many weight combinations to examine, and the manual approach is not practical. Instead, we apply a Monte Carlo simulation: we generate 10000 different random weight vectors for the underlying instruments (and therefore 10000 different portfolios) and calculate the expected return, risk and Sharpe ratio of each, as we did before.

The methodology is very simple.
1. Calculate the random weights
2. Calculate E[r], σ, SR of the generated portfolio & save the results
3. Repeat 10000 times

In particular, we are interested in 2 special portfolios:

[1] Portfolio #2 (the optimized)-MAXIMUM Sharpe Ratio SR
[2] Portfolio #2 (the optimized)-MINIMUM Risk σ

optimization problem to find portfolio #2
```python
import numpy as np
import pandas as pd

num_portfolios = 10000
generated_portfolios = []  # store the results

for _ in range(num_portfolios):
    # 1 - select random weights for the 11 risky holdings &
    #     rebalance the weights so they sum to 1
    weights = np.random.random(11)
    weights /= np.sum(weights)
    # 2 - calculate return, risk, sharpe ratio
    expected_return = np.sum(APR_avg * weights)
    expected_risk = np.sqrt(np.dot(weights.T, np.dot(cov, weights)))
    sharpe_ratio = (expected_return - risk_free) / expected_risk
    # 3 - store the result
    generated_portfolios.append([expected_return, expected_risk, sharpe_ratio, weights])

# The two 'special' portfolios: highest Sharpe ratio and lowest risk
maximum_sr_portfolio = max(generated_portfolios, key=lambda x: x[2])
minimum_risk_portfolio = min(generated_portfolios, key=lambda x: x[1])
max_sr = maximum_sr_portfolio[2]

max_sr_weights = pd.DataFrame(maximum_sr_portfolio[3], index=log_returns.columns,
                              columns=["Optimal Weights #2 optimized max sr "]).T
min_risk_weights = pd.DataFrame(minimum_risk_portfolio[3], index=log_returns.columns,
                                columns=["Optimal Weights #2 optimized min σ "]).T
```
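The loop above evaluates one portfolio per iteration. As an optional variation (not the article's code), the same three steps can be vectorized with NumPy: draw all weight vectors at once, then compute every return, risk and Sharpe ratio with a few array operations. The sketch assumes made-up inputs for three instruments and uses NumPy's `default_rng` generator with a fixed seed for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the run is reproducible

# Hypothetical inputs for 3 instruments (illustrative numbers only)
mu = np.array([0.08, 0.04, 0.15])            # expected annual returns E[r_i]
cov = np.array([[0.0400, 0.0020, 0.0300],
                [0.0020, 0.0025, 0.0010],
                [0.0300, 0.0010, 0.0900]])  # covariance matrix Σ
risk_free = 0.02
num_portfolios = 10000

# Step 1: one row of random weights per portfolio, each row rescaled to sum to 1
weights = rng.random((num_portfolios, len(mu)))
weights /= weights.sum(axis=1, keepdims=True)

# Step 2: return, risk and Sharpe ratio of every portfolio at once
returns = weights @ mu
risks = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov, weights))  # row-wise w^T Σ w
sharpe = (returns - risk_free) / risks

# The two 'special' portfolios
max_sr_idx = np.argmax(sharpe)
min_risk_idx = np.argmin(risks)
print("max SR weights:  ", weights[max_sr_idx].round(3))
print("min risk weights:", weights[min_risk_idx].round(3))
```

The result is the same kind of sample as the loop produces; vectorizing only changes how fast the 10000 portfolios are evaluated.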

Let’s see the overall results

```python
# RISKY PORTFOLIOS
portfolios["#2 optimized max sr (risky)"]["Return E[R]"] = maximum_sr_portfolio[0]
portfolios["#2 optimized max sr (risky)"]["Risk σ"] = maximum_sr_portfolio[1]
portfolios["#2 optimized max sr (risky)"]["Sharpe Ratio SR"] = (maximum_sr_portfolio[0] - risk_free) / maximum_sr_portfolio[1]
portfolios["#2 optimized min σ (risky)"]["Return E[R]"] = minimum_risk_portfolio[0]
portfolios["#2 optimized min σ (risky)"]["Risk σ"] = minimum_risk_portfolio[1]
portfolios["#2 optimized min σ (risky)"]["Sharpe Ratio SR"] = (minimum_risk_portfolio[0] - risk_free) / minimum_risk_portfolio[1]

# TOTAL PORTFOLIOS
total_expected_return = 0.9 * maximum_sr_portfolio[0] + 0.1 * risk_free
total_expected_risk = 0.9 * maximum_sr_portfolio[1]
portfolios["#2 optimized max sr (total)"]["Return E[R]"] = total_expected_return
portfolios["#2 optimized max sr (total)"]["Risk σ"] = total_expected_risk
portfolios["#2 optimized max sr (total)"]["Sharpe Ratio SR"] = (total_expected_return - risk_free) / total_expected_risk

total_expected_return = 0.9 * minimum_risk_portfolio[0] + 0.1 * risk_free
total_expected_risk = 0.9 * minimum_risk_portfolio[1]
portfolios["#2 optimized min σ (total)"]["Return E[R]"] = total_expected_return
portfolios["#2 optimized min σ (total)"]["Risk σ"] = total_expected_risk
portfolios["#2 optimized min σ (total)"]["Sharpe Ratio SR"] = (total_expected_return - risk_free) / total_expected_risk
```
Optimal Weights for portfolios #2
E[r], σ, SR for portfolios #2

We have located the 2 'special' portfolios. It is then simple (and helpful) to plot these combinations of return and risk. We colour the data points (portfolios) by their Sharpe ratio using the viridis colormap, in which higher values appear brighter (yellow) and lower values darker (purple).

```python
import matplotlib.pyplot as plt


def plot_simulation(CAL=None, INSTRUMENTS=None):
    fig, ax = plt.subplots(figsize=(18, 12))
    ax.set_facecolor((0.95, 0.95, 0.99))
    ax.grid(c=(0.75, 0.75, 0.99))

    # portfolio #1
    ret = portfolios["#1 dummy (risky)"]["Return E[R]"]
    risk = portfolios["#1 dummy (risky)"]["Risk σ"]
    sr = (ret - risk_free) / risk
    ax.scatter(risk, ret, marker=(5, 1, 0), color='y', s=700, label='portfolio #1 (dummy)')
    ax.annotate(round(sr, 2), (risk - 0.006, ret + 0.013), fontsize=20, color='black')

    # portfolio #2: every simulated portfolio, coloured by its Sharpe ratio
    ret = [x[0] for x in generated_portfolios]
    risk = [x[1] for x in generated_portfolios]
    sr = [x[2] for x in generated_portfolios]
    ax.scatter(risk, ret, c=sr, cmap='viridis', marker='o', s=10, alpha=0.5)
    ax.scatter(maximum_sr_portfolio[1], maximum_sr_portfolio[0], marker=(5, 1, 0),
               color='r', s=700, label='portfolio #2 (max sr)')
    ax.annotate(round(maximum_sr_portfolio[2], 2),
                (maximum_sr_portfolio[1] - 0.006, maximum_sr_portfolio[0] + 0.013),
                fontsize=20, color='black')
    ax.scatter(minimum_risk_portfolio[1], minimum_risk_portfolio[0], marker=(5, 1, 0),
               color='g', s=700, label='portfolio #2 (min risk)')
    ax.annotate(round(minimum_risk_portfolio[2], 2),
                (minimum_risk_portfolio[1] - 0.006, minimum_risk_portfolio[0] + 0.013),
                fontsize=20, color='black')

    # CAL?
    if CAL is not None:
        ax.plot(CAL[0], CAL[1], linestyle='-', color='red', label='CAL')
    # individual instruments, each annotated with its own Sharpe ratio
    if INSTRUMENTS:
        # s (marker sizes) and c (colours) are assumed to be defined earlier in the notebook
        ax.scatter(STD_avg, APR_avg, s=s, c=c, cmap="Blues", alpha=0.4,
                   edgecolors="grey", linewidth=2)
        for instr, std_i, apr_i in zip(STD.columns, STD_avg, APR_avg):
            sr_i = round((apr_i - risk_free) / std_i, 2)
            ax.annotate(instr, (std_i + 0.01, apr_i))
            ax.annotate(sr_i, (std_i - 0.005, apr_i + 0.015))

    ax.set_title('10000 SIMULATED PORTFOLIOS')
    ax.set_xlabel('Annualized Risk (σ)')
    ax.set_ylabel('Annualized Returns (APR_avg)')
    ax.legend(labelspacing=1.2)
```

```python
plot_simulation()
```
Monte Carlo simulation for the construction of 10000 random portfolios

Note :
[1] In Portfolio #2 (minimum risk), the weighting of bonds increases by ~15.7 percentage points (1% → 16.7%), while the stock and commodity weightings stay at about the same levels. When larger percentages are allocated to these, risk minimization is required.
[2] Both portfolios #2 achieved SR ≥ 1.77, which is far better than the SR = 0.71 of portfolio #1. The result is considered 'relatively good'.

[3] The figure above shows that changing the weight of each instrument in the portfolio can dramatically affect the expected return and the level of risk the investor is exposed to. For example, an investor who wants to achieve E[r] = 30% can do so with portfolios whose risk varies widely, from about 10% to ~38%.

One last thing

  • Efficient Frontier
    We proceeded with the instrument-based approach. The 10000 simulated data points (x, y) = (σ, E[r]) sample the feasible region, which is convex to the left. The efficient frontier includes only some of these points: the portfolios with the lowest risk for each level of return, i.e. the upper part of the left convex boundary of the region. [Further analysed in a future article]
  • Capital Allocation Line
    We have 11 instruments in the (risky) portfolio. We now need to include the risk-free rate (12 instruments in the total portfolio).
    The capital allocation line (CAL) is the line, in the (σ, E[r]) plane, of all possible combinations of the risk-free instrument and the risky portfolio. The portfolio is formed using the calculated optimal weights for the risky portfolio, taking into account the inclusion of the risk-free instrument (T-bills). The average yearly risk-free rate is roughly 10.99%. Moreover, the Capital Allocation Line has a slope of 1.7707 (the Sharpe ratio of the maximum SR portfolio), irrespective of the weighting between the risk-free instrument and the risky portfolio.
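The "upper part of the left convex boundary" can be pulled out of the simulated cloud with a simple scan. The sketch below is a hypothetical helper, `efficient_points`, not part of the article's notebook: it sorts the portfolios by risk and keeps each one whose return beats every lower-risk portfolio, leaving only the non-dominated points, which approximate the efficient frontier of the sample. The five points are made up for illustration:

```python
import numpy as np


def efficient_points(risks, returns):
    """Indices of simulated portfolios that no other simulated portfolio beats
    with lower-or-equal risk AND higher return (the efficient set of the sample)."""
    # Sort by risk ascending; among equal risks, put the highest return first
    order = np.lexsort((-returns, risks))
    kept, best_return = [], -np.inf
    for idx in order:
        # Keep a point only if it improves on every lower-risk point kept so far
        if returns[idx] > best_return:
            kept.append(idx)
            best_return = returns[idx]
    return np.array(kept, dtype=int)


# Tiny hypothetical cloud of (σ, E[r]) points
risks = np.array([0.10, 0.12, 0.15, 0.11, 0.20])
returns = np.array([0.05, 0.09, 0.08, 0.04, 0.12])
print(efficient_points(risks, returns))
```

With the notebook's results, one could pass `np.array([x[1] for x in generated_portfolios])` as the risks and `np.array([x[0] for x in generated_portfolios])` as the returns. Because the points are only a random sample, the extracted frontier is a jagged approximation of the true one.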
Capital Allocation Line
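```python
import numpy as np

# Capital Allocation Line (CAL): E[r] = r_f + SR_max * σ
# Points with σ beyond the max-SR portfolio's risk correspond to borrowing at the risk-free rate
cal_x = np.linspace(0.0, 0.3, 50)
cal_y = risk_free + cal_x * max_sr
```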
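Why the slope does not depend on the mix: with a share w in the risky portfolio P, E[r_C] - r_f = w·(E[r_P] - r_f) and σ_C = w·σ_P, so w cancels in the ratio. The sketch below checks this numerically for a few values of w, using made-up portfolio numbers rather than the article's results:

```python
# Illustrative numbers (not the article's results)
risk_free = 0.03
ret_p, risk_p = 0.12, 0.20            # risky portfolio P: E[r_P], σ_P
sr_p = (ret_p - risk_free) / risk_p   # Sharpe ratio of P = slope of its CAL

# w = share of wealth in P; w > 1 means borrowing at the risk-free rate
for w in [0.25, 0.5, 0.9, 1.0, 1.5]:
    ret_c = w * ret_p + (1 - w) * risk_free  # E[r_C]
    risk_c = w * risk_p                      # σ_C (risk-free asset adds no variance)
    sr_c = (ret_c - risk_free) / risk_c      # equals sr_p for every w
    cal_value = risk_free + risk_c * sr_p    # CAL evaluated at σ_C, equals ret_c
    print(f"w={w:.2f}  E[r]={ret_c:.4f}  sigma={risk_c:.4f}  SR={sr_c:.4f}  CAL={cal_value:.4f}")
```

The 90% risky / 10% risk-free split used for the "(total)" portfolios is the w = 0.9 case, which is why each "(total)" portfolio has the same Sharpe ratio as its "(risky)" counterpart.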

Let’s plot the final result

```python
plot_simulation(CAL=[cal_x, cal_y], INSTRUMENTS='yes')
```
10000 generated portfolios + portfolio #2 + portfolio #1 + all 11 instruments

Note :
[1] Here we can see how the 3 portfolios (portfolio #1 and the two portfolios #2) are distributed in the mean-variance space, along with the performance of the individual instruments. Investing in any portfolio combination (even a random one) is far more favorable than investing solely in individual instruments, especially when the portfolio is optimized rather than random. Historical data should be used wisely; this is just a demonstration to gain insights for our choices and to be rational investors.
[2] WARNING: The CAL is the efficient frontier once the risk-free rate is taken into account. For example: a) if we invest (w_R_f, w_risky) → (100%, 0%), we are at the beginning of the line, with (E[r], σ) = (0.11, 0); b) if we invest (w_R_f, w_risky) → (0%, 100%), we are at portfolio #2 max sr, so (E[r], σ) = (0.316, 0.116).
[3] Of course, every portfolio has its own CAL. We could have plotted the other 2 CALs as well; here, we focused our analysis on portfolio #2 max sr.

PART 5: A simple Utility Function

Using the Certainty Equivalence Test to construct the utility function (the math is skipped), we calculate the utility U = E[r] - ½·A·σ² for different risk aversions A ∈ [1, 10], to check whether the utility always exceeds the risk-free instrument. The factor 0.5 scales the marginal utility (1st derivative) and here reflects the use of fractional (decimal) returns.

Simple Utility Function
```python
import numpy as np

# Risk-aversion coefficients A = 1, 2, ..., 10
A = np.linspace(1, 10, 10)
# U = E[r] - 1/2 * A * σ^2, evaluated on each total portfolio (90% risky + 10% risk-free)
utility_dummy = portfolios["#1 dummy (total)"]["Return E[R]"] - 1/2 * A * portfolios["#1 dummy (total)"]["Risk σ"] ** 2
utility_max_sr = portfolios["#2 optimized max sr (total)"]["Return E[R]"] - 1/2 * A * portfolios["#2 optimized max sr (total)"]["Risk σ"] ** 2
utility_min_risk = portfolios["#2 optimized min σ (total)"]["Return E[R]"] - 1/2 * A * portfolios["#2 optimized min σ (total)"]["Risk σ"] ** 2
```

Having calculated the utilities for the 3 interesting portfolios, let's plot the results.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(18, 12))
ax.set_facecolor((0.95, 0.95, 0.99))
ax.grid(c=(0.75, 0.75, 0.99))
# Risk Free: a constant line, independent of A
ax.plot(A, [risk_free] * len(A), color='y', label='risk free', linewidth=4)
# Portfolio #1
ax.scatter(A, utility_dummy, color='r', s=50)
ax.plot(A, utility_dummy, color='r', label='portfolio #1 (dummy)')
# Portfolio #2 (max sr)
ax.scatter(A, utility_max_sr, color='b', s=50)
ax.plot(A, utility_max_sr, color='b', label='portfolio #2 (max sr)')
# Portfolio #2 (min risk)
ax.scatter(A, utility_min_risk, color='black', s=50)
ax.plot(A, utility_min_risk, color='black', label='portfolio #2 (min risk)')
ax.set_title(r'Utility Function $U = E[r] - \frac{1}{2} A \sigma^{2}$')
ax.set_xlabel('Risk Aversion (A)')
ax.set_ylabel('Utility (U)')
ax.set_ylim([0, 0.5])
ax.legend(labelspacing=1.2)
```
Utility (U) for the 3 interesting portfolios

Note :
[1] Even with the highest risk aversion of A = 10, the utility of the 2 optimized portfolios (25.67% and 22.45% respectively) comfortably exceeds the risk-free rate of 10.99%.
[2] The guaranteed return from investing only in T-bills is lower than the expected utility of the given risky portfolio, i.e. the return an investor would accept with absolute certainty in its place. If each instrument is considered individually, then yes, it would be favorable to invest in T-bills. However, investing in this Total Portfolio yields more than investing solely in T-bills.
[3] For high levels of risk aversion, the dummy portfolio #1 offers even less utility than the risk-free instrument. This is one reason why optimization is a useful technique for every investor.
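The same utility function can also pick a point on the CAL. With a share y in the risky portfolio P and the rest in the risk-free instrument, U(y) = r_f + y·(E[r_P] - r_f) - ½·A·y²·σ_P²; setting dU/dy = 0 gives the standard mean-variance optimum y* = (E[r_P] - r_f) / (A·σ_P²). The sketch below (illustrative numbers, not the article's results; `utility` is a hypothetical helper) computes y* and confirms it with a brute-force grid search:

```python
import numpy as np


def utility(y, ret_p, risk_p, risk_free, A):
    """Mean-variance utility U = E[r] - 1/2 * A * σ^2 of a position with
    share y in the risky portfolio and 1 - y in the risk-free instrument."""
    ret_c = risk_free + y * (ret_p - risk_free)
    risk_c = y * risk_p
    return ret_c - 0.5 * A * risk_c ** 2


# Illustrative inputs (not the article's numbers)
ret_p, risk_p, risk_free, A = 0.12, 0.20, 0.03, 3

# Closed-form optimum from dU/dy = 0
y_star = (ret_p - risk_free) / (A * risk_p ** 2)

# Brute-force check on a fine grid of allocations between 0% and 200%
grid = np.linspace(0, 2, 20001)
y_grid = grid[np.argmax(utility(grid, ret_p, risk_p, risk_free, A))]

print(round(y_star, 4), round(y_grid, 4))
```

A larger A shrinks y* toward the risk-free instrument, while y* above 1 would mean borrowing at the risk-free rate to hold more than 100% in the risky portfolio.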

PART 6: Final Critical Evaluation & Future Work

I. Final Critical Evaluation

```python
import pandas as pd

portfolio = portfolios["#2 optimized max sr (total)"]
ret = portfolio['Return E[R]']
risk = portfolio['Risk σ']
sr = portfolio['Sharpe Ratio SR']
utility = ret - 1/2 * 3 * risk ** 2  # U = E[r] - 1/2 * A * σ^2 with A = 3
portfolio = pd.DataFrame(
    [str(round(ret * 100, 2)) + "%", str(round(risk * 100, 2)) + "%", sr,
     str(round(utility * 100, 2)) + "%"],
    index=['Return E[R]', 'Risk σ', 'Sharpe Ratio SR', 'Utility U'],
    columns=["Portfolio #2 optimized max sr "],
).T
portfolio  # display the results table in the notebook
```
Final Results for portfolio # 2 (max sr)
  1. DIVERSIFICATION: Investing in a diversified portfolio is a more favorable and more "useful" option than investing solely in the risk-free rate or in individual instruments. For our above-average risk-aversion strategy (A = 3), the mean-variance performance of the optimized portfolio #2 (max sr) looks good. A Sharpe ratio of SR ≈ 1.77 is considered 'relatively good'.
  2. RISK-FREE & CAL: This optimized and diversified portfolio offers the best 'reward-to-risk' ratio among the simulated portfolios; combined with the risk-free instrument along the CAL, it gives the lowest risk for any given level of return. A risk-free instrument is necessary for a portfolio of N instruments in order to construct the Capital Allocation Line (CAL), on which investors choose an ideal position according to their degree of risk aversion.
  3. STRATEGIC CAPITAL ALLOCATION: We extracted insights about instruments and portfolios (a variety of metrics) that may prove helpful in the future (return, risk).
  4. INVESTMENT HORIZON: A short investment horizon, for various reasons, may become a constraint in the investment process and affect the IPS (Investment Policy Statement).
  5. MACRO & MICRO ANALYSIS: Portfolio construction and management should track policy and company events, as well as macro- and microeconomic effects on the international economy, such as the coronavirus outbreak.

II. Future Work

GENERAL

  • Expand the time period back to the 2000s (capturing macroeconomic events such as the 2002 downturn and the 2008 financial crisis)
  • Permit short sales, which would expand the feasible set of solutions to the right side, containing more portfolios that have the same low levels of return but low volatility

MEAN-VARIANCE FRAMEWORK

  • Consider rolling returns, which may provide a more accurate view of our investments
  • Build a feasible set of efficient frontiers using metrics like APY
  • Use the Bayesian Black-Litterman model to better estimate risk and return
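As a pointer for the rolling-returns idea: instead of one return per calendar year (as APR does), a rolling window produces a return for every date, which shows how sensitive the figures are to the choice of start date. A minimal pandas sketch on synthetic prices, assuming 252 trading days per year (the random-walk prices are purely illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Synthetic daily closing prices for one instrument over 3 trading years
dates = pd.bdate_range("2018-01-01", periods=3 * 252)
log_steps = rng.normal(loc=0.0003, scale=0.01, size=len(dates))
prices = pd.Series(100 * np.exp(np.cumsum(log_steps)), index=dates)

# Rolling 1-year (252 trading days) return ending on each date;
# the first 252 dates have no full year of history and stay NaN
rolling_1y = prices / prices.shift(252) - 1

print(rolling_1y.dropna().describe())
```

The spread reported by `describe()` across all one-year windows is information that 6 calendar-year APR values cannot show.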

MACRO & MICRO ANALYSIS

  • Applying Principal Component Analysis (PCA) could help confirm the survey's findings and keep only the few most important factors, which could then be used as extra features since they capture important price information.

Abbreviations & Explanation

  • Security = Instrument
  • Risk = Volatility
  • Commodities: Basic goods used in commerce that are interchangeable with other goods of the same type
  • Bonds (in Fixed-Income): A fixed income instrument that represents a loan made by an investor to a borrower (typically corporate or governmental)
  • Stocks (in Equities): Form of security that indicates the holder has proportionate ownership in the issuing corporation.
  • Market Portfolio: A portfolio consisting of a weighted sum of every asset in the market, with weights in the proportions that they exist in the market, with the necessary assumption that these assets are infinitely divisible.
  • Return : (1) R is return (2) E[R] is expected return (3) E[R]-R_f is risk premium

Dimitris Georgiou

Senior Electrical and Computer Engineering Student @ NTUA & Greece Local Operations @Uber | Topics of Interest: Finance, Machine Learning