VIX single-day spikes & their historical returns

Taking a look at VIX single-day spikes, since there have been two rather significant and rare single-day spikes of 39% and 49% in 2016. Note that I'm not talking about swings within the day, but single-day spikes from close to close. The intention here is to gauge how VIX has historically behaved after significant single-day spikes.

First, importing modules, VIX data etc. I'm running this on Quantopian, so importing data is straightforward.


from quantopian.interactive.data.quandl import yahoo_index_vix
from odo import odo
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import scipy as sp
import scipy.stats  # makes sp.stats available; a bare "import scipy" may not load it

Setting up and cleaning the VIX dataframe that came from Quandl, and adding the single-day % change values.

data = odo(yahoo_index_vix, pd.DataFrame)
data = data.drop(["open_", "high", "low", "adjusted_close", "volume", "timestamp"], axis = 1)
data.loc[18, "close"] = 14.04  # Fixing an erroneous data point
data = data.set_index(["asof_date"])

data["pct_change"] = data["close"].pct_change()
data = data.dropna()

This year's second-biggest spike was 39%. There have been only 12 single-day spikes of more than 39% since 1990 (again, from close to close).

len(data[data["pct_change"] > 0.39])

12

When they occurred and what their actual spike percentages were

data[data["pct_change"] > 0.39].sort_index()


First, looking at all VIX daily percent changes as an empirical cumulative distribution, to get a better idea of how they are distributed. In this post we are interested in the positive daily percent changes. As one can observe, spikes above 20% are rather rare.

ecd = np.arange(1, len(data)+1, dtype=float) / len(data)
ticks = np.arange(-0.3, 1.1, 0.1)

plt.plot(np.sort(data["pct_change"]), ecd)
plt.xlabel("Vix daily percent changes (from close to close)")
plt.ylabel("Fraction of daily pct changes that are smaller than corresponding x")
plt.axvline(linestyle="--", color="#333333", linewidth=1)
plt.xticks(ticks)
plt.yticks(ticks)
plt.grid(alpha=0.21)
plt.margins(0.05)
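The same distribution can also be read off numerically instead of from the chart. A minimal sketch, assuming synthetic daily changes: the `changes` array and the helpers `ecdf` and `fraction_above` are made up for illustration and are not part of the Quantopian dataset.

```python
import numpy as np

# Synthetic stand-in for data["pct_change"]; NOT real VIX data.
rng = np.random.RandomState(0)
changes = rng.normal(loc=0.0, scale=0.06, size=5000)

def ecdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1, dtype=float) / len(x)
    return x, y

def fraction_above(values, threshold):
    """Share of observations strictly greater than threshold."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(values > threshold))

x, y = ecdf(changes)
share_above_20 = fraction_above(changes, 0.20)
```

With the real series, `fraction_above(data["pct_change"], 0.2)` would give the share of days with a close-to-close spike above 20%.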


We can now plot the returns N days after a spike. The iterations here are 5-day forward returns after spikes of more than 10% and more than 20%, chosen in order to have a large enough sample set. I wanted to see whether or not the size of the spike was correlated with the forward returns, and it seems to be the case: the larger the single-day spike, the more likely a negative VIX return down the road. It must be noted, though, that the sample set for spikes larger than 20% is small.

def pct_ret(close, amount):
    rets = (close.shift(-amount) / close) - 1
    return rets

ret10 = pct_ret(data["close"], 5).where(data["pct_change"] > 0.1).dropna()
pct10 = data["pct_change"].loc[ret10.index]  # same rows as ret10, so both have equal length

ret20 = pct_ret(data["close"], 5).where(data["pct_change"] > 0.2).dropna()
pct20 = data["pct_change"].loc[ret20.index]

slope, intercept, r_val, p_val, std_err = sp.stats.linregress(pct10, ret10)
ret10_predict = intercept + slope * pct10

plt.plot(pct10, ret10_predict, "-", label="Linreg")
plt.scatter(pct10, ret10,
            color="#333333", alpha=0.55, label="Vix spike >= 10%, return 5 days later")
plt.scatter(pct20, ret20,
            color="crimson", s=34, label="Vix spike >= 20%, return 5 days later")

plt.ylabel("Vix % return 5 days later")
plt.xlabel("Vix single day spike % (from close to close)")
plt.axhline(linestyle="--", linewidth=1, color="#333333")
plt.legend(loc="upper right")
plt.ylim(-0.5, 1)
plt.grid(alpha=0.21)
plt.title("R2={}".format(r_val**2))
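The forward-return lookup and the spike filter have to refer to the same rows for the regression to make sense. A small sketch of that alignment on toy prices: the `close` values and the helper name `forward_return` are assumptions for illustration, not the VIX data used above.

```python
import numpy as np
import pandas as pd

# Toy closing prices; NOT real VIX data.
close = pd.Series([10.0, 12.0, 11.0, 9.0, 10.5, 10.0])

def forward_return(close, days):
    """Percent return from each row to the row `days` later (NaN at the tail)."""
    return close.shift(-days) / close - 1

daily_change = close.pct_change()
fwd2 = forward_return(close, 2)

# Keep only rows that are spikes AND have a defined forward return,
# so both series share the same index before any regression.
mask = (daily_change > 0.1) & fwd2.notnull()
spikes = daily_change[mask]
after = fwd2[mask]
```

Row 4 is also a spike of more than 10%, but it sits too close to the end of the series to have a 2-day forward return, so the mask drops it from both series at once.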

 


For example, among the cases where VIX was lower 5 days after a single-day spike of more than 20%, the mean 5-day return is about -16%.

np.mean(ret20[ret20 < 0])

-0.16629674607687678
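Note that this figure averages only the cases with a negative forward return, which is not the same as the average over all spikes. A tiny sketch of the difference, using hypothetical numbers rather than the real VIX results:

```python
import numpy as np

# Hypothetical 5-day forward returns after spikes; NOT real VIX data.
fwd = np.array([-0.20, -0.10, 0.30, -0.15, 0.05])

mean_all = fwd.mean()                # average over every spike
mean_negative = fwd[fwd < 0].mean()  # average over the negative cases only
share_negative = np.mean(fwd < 0)    # how often the return was negative
```

Reporting `share_negative` next to `mean_negative` shows both how often VIX fell after a spike and by how much when it did.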

For a clearer picture of how VIX actually looks after significant spikes, we can also plot the N-day returns of all instances where a significant spike occurred (in the chart below, it's 64 trading days). There are notable rebound tendencies around the 10th, 20th-23rd and 40th trading days after a spike. The means for the higher spikes are more pronounced since the sample size is smaller in those instances.

data2 = data.copy().reset_index()

def rets(df, days, pct, pct_to):
    ret_df = pd.DataFrame()
    for index, row in df.iterrows():
        if row["pct_change"] > pct and row["pct_change"] < pct_to and df["pct_change"].iloc[index-1] < pct:
            ret = df["close"].iloc[index:index+days]
            ret = np.log(ret).diff().fillna(0)
            ret = pd.Series(ret).reset_index(drop=True)
            ret_df[index] = ret
    return ret_df

twenty = rets(data2, 65, 0.2, 0.3).mean(axis=1).cumsum()
thirty = rets(data2, 65, 0.3, 0.4).mean(axis=1).cumsum()
forty = rets(data2, 65, 0.4, 1).mean(axis=1).cumsum()

plt.plot(twenty, color="royalblue", label="Vix mean return after a spike > 20% and < 30%")
plt.plot(thirty, color="crimson", label="Vix mean return after a spike > 30% and < 40%")
plt.plot(forty, color="cadetblue", label="Vix mean return after a spike of > 40%")

plt.xlabel("# Of days from spike")
plt.ylabel("Vix % return")
plt.grid(alpha=0.21)
plt.ylim(-0.3, 0.1)
plt.xlim(0, 64)
plt.legend(loc="upper right")
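The averaging above is essentially an event study: each spike becomes day 0, and the paths that follow are averaged day by day. A compact sketch of the same idea on toy prices: the function `event_paths`, the prices and the event rows are all hypothetical. It works with cumulative log returns directly, which gives the same mean path as averaging the daily log returns and then taking the cumulative sum.

```python
import numpy as np
import pandas as pd

def event_paths(close, event_positions, days):
    """Cumulative log return for `days` rows starting at each event row.

    Each column is one event; row 0 is the event day itself (return 0).
    Events too close to the end of the series give shorter, NaN-padded paths.
    """
    log_close = np.log(close.reset_index(drop=True))
    paths = {}
    for pos in event_positions:
        window = log_close.iloc[pos:pos + days]
        paths[pos] = (window - window.iloc[0]).reset_index(drop=True)
    return pd.DataFrame(paths)

# Toy prices; NOT real VIX data. Events at rows 1 and 4 are hypothetical.
close = pd.Series([10.0, 20.0, 10.0, 10.0, 40.0, 20.0, 40.0])
paths = event_paths(close, [1, 4], 3)
mean_path = paths.mean(axis=1)
```

Keeping the individual columns in `paths` around makes it easy to plot every instance next to the mean, which matters when a handful of events dominate the average.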


The mean returns chart is deceptive, since there are of course plenty of instances where VIX just keeps going up, so one can get a better picture by looking at all the instances. In the case below, I plotted all spike instances of more than 20%, 64 days forward.

plt.plot(rets(data2, 65, 0.2, 1).cumsum(axis=0), linewidth=1, alpha=0.21, color="#333333")
plt.plot(forty, color="crimson", label="Mean after spikes > 40%")

plt.title("All instances of Vix single day spikes > 20%, returns 64 days forward")
plt.xlabel("# Of days from spike")
plt.ylabel("Vix % return")
plt.grid(alpha=0.21)
plt.axhline(0, linewidth=1, linestyle="--", color="#333333")
plt.xlim(0, 64)
plt.legend(loc="upper right")


One additional way of looking at the dataset is to make a heatmap of all single-day moves, meaning we plot all VIX single-day percent changes and their mean forward returns, regardless of the size or the direction of the move. I first converted the percent changes to integers and then grouped all the data by those. From that, a heatmap of mean returns for every VIX single-day % change can be produced.

Nothing meaningful happens in the middle of the % change range, but the edges are more pronounced. Again, however, it's worth noting that the sample size on the edges is also smaller.

df3 = data.copy()

for i in range(1, 35):
    df3[str(i)] = pct_ret(df3["close"], i)

df3["pct_change"] = df3["pct_change"].apply(lambda x: int(round(x*100)))
df3.reset_index(inplace=True)
df3.drop(["asof_date","close"], axis=1, inplace=True)

grouped = df3.groupby("pct_change").mean()  # index: spike %, columns: days forward "1".."34"

plt.figure(figsize=(16, 13))
sns.heatmap(grouped, annot=False, cmap="RdBu")
plt.ylabel("Vix single day spike %")
plt.xlabel("Number of days after spike")
plt.title("Vix single day % spikes vs. mean returns N days later")
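Since the edge buckets are thin, it can help to look at the number of observations behind each mean. A hedged sketch on synthetic data: the frame, the column name `fwd_5` and the cutoff of 10 observations are assumptions for illustration, not values from the analysis above.

```python
import numpy as np
import pandas as pd

# Synthetic daily changes and forward returns; NOT real VIX data.
rng = np.random.RandomState(1)
frame = pd.DataFrame({
    "pct_change": rng.normal(0.0, 0.06, size=3000),
    "fwd_5": rng.normal(0.0, 0.10, size=3000),
})

# Same bucketing idea as above: round the daily change to a whole percent.
frame["bucket"] = (frame["pct_change"] * 100).round().astype(int)

# Mean forward return and number of observations per bucket.
summary = frame.groupby("bucket")["fwd_5"].agg(["mean", "count"])

# Hypothetical cutoff: ignore buckets with fewer than 10 observations.
reliable = summary[summary["count"] >= 10]
```

Plotting only `reliable`, or annotating the heatmap with the counts, makes it easier to tell a real edge effect from a bucket that holds one or two days.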


100 Years of Dow Jones returns

A quick look at annual returns over the 100+ years of daily percent change (close to close) data that we have on the Dow Jones.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import datetime

dj = local_csv("DjiaHist.csv", date_column = "Date", use_date_column_as_index = True)
dia = get_pricing("DIA", start_date = "2016-01-01", end_date = datetime.date.today(), frequency = "daily")

First cleaning up the data, especially the dates. Also adding the day of the year to the dataframe in order to sort all returns by day of the year and plot them all at once later.

dj.sort_index(ascending=True, inplace=True)
dj.index = pd.to_datetime(dj.index)
dj.rename(columns={"Value" : "value"}, inplace=True)
dj["pct"] = np.log(dj["value"]).diff()
dj["year"] = dj.index.year
dj["day"] = dj.index.dayofyear

dia["day"] = dia.index.dayofyear
dia["pct"] = np.log(dia["price"]).diff()
dia = dia.drop(["open_price", "high", "low", "volume", "close_price"], axis=1)
dia.set_index(dia["day"], inplace = True)
dia.fillna(0, inplace=True)

Pivoting the returns table, so that we get returns for all years and all days of the year in separate columns.

daily_rets = pd.pivot_table(dj, index=["day"], columns=["year"], values=["pct"])
daily_rets = daily_rets.apply(pd.to_numeric, errors="coerce")  # force a numeric dtype on every column
daily_rets.fillna(0, inplace = True)
daily_rets.columns = daily_rets.columns.droplevel()
daily_rets.drop(2016, axis =1, inplace = True)
daily_rets.rename(columns = lambda x: str(x), inplace=True)

daily_rets.head(8)
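To make the shape of the pivoted table concrete, here is a minimal sketch on a four-row toy frame. The dates and returns are invented for illustration and are not Dow Jones data.

```python
import numpy as np
import pandas as pd

# Toy daily log returns over two calendar years; NOT real Dow Jones data.
dates = pd.to_datetime(["2014-01-02", "2014-01-03", "2015-01-02", "2015-01-05"])
toy = pd.DataFrame({"pct": [0.01, -0.02, 0.03, 0.01]}, index=dates)
toy["year"] = toy.index.year
toy["day"] = toy.index.dayofyear

# One row per day of the year, one column per year.
table = toy.pivot_table(index="day", columns="year", values="pct")

# Days with no trading in a given year come out as NaN; filling with 0
# treats them as flat days so cumulative sums line up across years.
table = table.fillna(0)
cumulative = table.cumsum()
```

Because the values are log returns, the cumulative sum down each column is that year's running log return by day of the year.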


Here's how it looks with all years plotted along with 2016

f, ax = plt.subplots(figsize=(18, 12))
ax.plot(daily_rets.cumsum(), color="#333333", linewidth=1, alpha=0.1, label=None)
ax.plot(dia["pct"].cumsum(), linewidth=2, color="crimson", label="2016 returns")
plt.grid(False)
plt.ylabel("Annual return")
plt.xlabel("Day of the year")
plt.ylim(-0.7, 0.7)
plt.xlim(0, 365)
plt.axhline(0, linewidth= 1, color="#333333", linestyle="--")
plt.legend(loc="upper left")


Adding the mean returns of all years so one can compare with 2016.
Also added a chart of mean daily returns, so the historical day-to-day fluctuations are clearer and the positive and negative periods stand out.

daily_rets["mean"] = daily_rets.mean(axis=1)
daily_rets["2016"] = dia["pct"]

plt.figure(figsize=(18, 12))

ax1 = plt.subplot2grid((4,1), (0,0), rowspan=3)
ax1.plot(daily_rets.index, daily_rets.cumsum(), color="#333333", linewidth=1, alpha=0.06, label=None)
ax1.plot(daily_rets["mean"].cumsum(), color="#333333", linewidth=2, alpha=0.8, label="Mean returns since 1896")
ax1.plot(daily_rets["2016"].dropna().cumsum(), linewidth=2, color="crimson", label ="2016 returns")
plt.title("Cumulative 2016 Returns Vs Mean Historical Returns Since 1896")
plt.axhline(0, linewidth= 1, color="#333333", linestyle="--")
plt.ylim(-0.15, 0.15)
plt.grid(False)
plt.legend(loc="upper left")

ax2 = plt.subplot2grid((4,1), (3,0), rowspan=1, sharex=ax1)
ax2.fill_between(daily_rets.index, 0, daily_rets["mean"], where= daily_rets["mean"]<0, color="crimson")
ax2.fill_between(daily_rets.index, daily_rets["mean"], 0, where= daily_rets["mean"]>0, color="forestgreen")
plt.title("Mean Daily Returns")
ax2.grid(False)
plt.xlim(1, 365)


Now that the show is over, it's time to look at returns around elections. Election day has been in early November all along, but from the 1936 cycle onward the inauguration has been in January rather than March, so I used returns from 1936 onward in the calculation. I also plotted the mean return of post-election years.

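# Select year columns by name so the "mean" and "2016" helper columns stay out of the average
el_years = [str(y) for y in range(1936, 2016, 4)]    # 1936, 1940, ..., 2012
post_years = [str(y) for y in range(1937, 2016, 4)]  # 1937, 1941, ..., 2013
daily_rets["el_year"] = daily_rets[el_years].mean(axis=1)
daily_rets["post_el"] = daily_rets[post_years].mean(axis=1)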

f, ax = plt.subplots(figsize=(18, 12))

ax.plot(daily_rets.index, daily_rets.cumsum(), color="#333333", linewidth=1, alpha=0.06, label=None)
ax.plot(daily_rets["mean"].cumsum(), color="#333333", linewidth=2, alpha=0.8, label="Mean returns since 1896")
ax.plot(daily_rets["el_year"].cumsum(), color="darkseagreen", linewidth=2, alpha=0.8, label="Election year mean returns since 1936")
ax.plot(daily_rets["post_el"].cumsum(), color="steelblue", linewidth=2, alpha=0.8, label="Post election year mean returns since 1936")
ax.plot(dia["pct"].dropna().cumsum(), linewidth=2, color="crimson", label ="2016 returns")
plt.grid(False)
plt.ylabel("Annual return")
plt.xlabel("Day of the year")
plt.ylim(-0.15, 0.15)
plt.xlim(1, 365)
plt.axhline(0, linewidth= 1, color="#333333", linestyle="--")
plt.legend(loc="upper left")


We can also pull up decade returns. The 80’s and 90’s were good times indeed. Applied a 21-day mean to the returns so the trends would be more clear.

def decadeMean(start, end):
    return daily_rets.loc[:, start : end].cumsum().mean(axis=1)

decade_rets = pd.DataFrame({#"1900’s" : decadeMean("1900", "1909"),
#"1910’s" : decadeMean("1910", "1919"),
#"1920’s" : decadeMean("1920", "1929"),
#"1930’s" : decadeMean("1930", "1939"),
#"1940’s" : decadeMean("1940", "1949"),
#"1950’s" : decadeMean("1950", "1959"),
#"1960’s" : decadeMean("1960", "1969"),
"1970’s" : decadeMean("1970", "1979"),
"1980’s" : decadeMean("1980", "1989"),
"1990’s" : decadeMean("1990", "1999"),
"2000’s" : decadeMean("2000", "2009"),
"2010’s" : decadeMean("2010", "2015")
}, index= daily_rets.index)

mean_rets = decade_rets.rolling(21).mean()

ax = mean_rets.plot(linewidth=1, figsize=(18, 12))  # DataFrame.plot opens its own figure
dia["pct"].dropna().cumsum().rolling(21).mean().plot(ax=ax, color="crimson", linewidth=2, label="2016")
plt.legend(loc="upper left")
plt.ylabel("Annual return")
plt.xlabel("Day of the year")
plt.grid(False)


The most revealing thing about this, to me, is that the day-to-day fluctuations haven't really changed over 100+ years: the market still behaves the same.

For example, if we randomly reshuffle the order of the daily returns of 1910 and compare them to the reshuffled daily returns of 2015, it's impossible to say which one is which. The nature and behaviour of day-to-day fluctuations is still the same.
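The reshuffling itself takes only a few lines. A minimal sketch, assuming synthetic daily log returns in place of the 1910 and 2015 columns: the arrays `year_a` and `year_b` and the helper `shuffled_path` are made up for illustration.

```python
import numpy as np

def shuffled_path(returns, seed):
    """Cumulative sum of the same daily returns in a random order."""
    rng = np.random.RandomState(seed)
    return np.cumsum(rng.permutation(returns))

# Synthetic daily log returns standing in for two different years;
# NOT real Dow Jones data.
rng = np.random.RandomState(42)
year_a = rng.normal(0.0003, 0.01, size=250)
year_b = rng.normal(0.0003, 0.01, size=250)

path_a = shuffled_path(year_a, seed=1)
path_b = shuffled_path(year_b, seed=2)
```

Shuffling keeps each year's total return and the set of daily moves intact and only destroys their ordering, so any visual difference between the two shuffled paths has to come from the distribution of daily returns rather than from their sequence.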


Regarding election

What I think is happening is that the Brexit vote is fresh on speculators' minds: the polls were dead even before the vote, but few actually believed the exit side would win. Perhaps we are seeing something similar with the US elections right now, meaning (almost) all predictions and polls point to Clinton (Stay), but a surprise like Trump (Exit) weighs on people's minds.

Here is some of the good research and analysis done:
FiveThirtyEight Elections
NY Times 2016 Election