
I recently began learning Python by jumping into a complex project that I had already started in Excel. The code so far is adapted from various guides and tweaked to my needs.

I am using 'yfinance' to gather data from Yahoo! Finance for multiple cryptocurrencies over a specific time period, and 'statsmodels' to obtain alpha, beta and R-squared from a DataFrame that holds the returns of all the cryptocurrencies plus an additional column with the market return (the x variable).
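For a single regressor, the OLS fit that statsmodels performs has a closed form: beta is the covariance of the crypto's return with the market return divided by the variance of the market return, alpha is the mean crypto return minus beta times the mean market return, and R-squared is the squared correlation between the two. A minimal pure-Python sketch (the ols_stats name is hypothetical, not part of statsmodels) that also makes explicit why both inputs must have the same length:

```python
# Closed-form simple OLS for y = alpha + beta * x (hypothetical helper, not a statsmodels API)
def ols_stats(x, y):
    n = len(x)
    if n != len(y):
        # Same idea as statsmodels' "endog and exog matrices are different sizes" check
        raise ValueError("x and y must have the same length")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)                    # n * variance of x
    syy = sum((b - my) ** 2 for b in y)                    # n * variance of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # n * covariance of x and y
    beta = sxy / sxx
    alpha = my - beta * mx
    rsquared = sxy ** 2 / (sxx * syy)                      # squared correlation
    return alpha, beta, rsquared


print(ols_stats([1, 2, 3, 4], [3, 5, 7, 9]))  # y = 1 + 2x exactly
```

With x as the market return and y as one crypto's return, this should agree with params[0], params[1] and rsquared from sm.OLS(y, sm.add_constant(x)).fit().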

I am getting the following error: ValueError: endog and exog matrices are different sizes. I found another question about this error, but it did not seem to relate to my issue.

The error takes place in line 87 [model = sm.OLS(Y2,X_)] of the following code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime

from pandas_datareader import data as pdr
import yfinance as yf

yf.pdr_override()

df1 = pdr.get_data_yahoo("BTC-USD", start="2015-01-01", end="2020-01-01")
df2 = pdr.get_data_yahoo("ETH-USD", start="2015-01-01", end="2020-01-01")
df3 = pdr.get_data_yahoo("XRP-USD", start="2015-01-01", end="2020-01-01")
df4 = pdr.get_data_yahoo("BCH-USD", start="2015-01-01", end="2020-01-01")
df5 = pdr.get_data_yahoo("USDT-USD", start="2015-01-01", end="2020-01-01")
df6 = pdr.get_data_yahoo("BSV-USD", start="2015-01-01", end="2020-01-01")
df7 = pdr.get_data_yahoo("LTC-USD", start="2015-01-01", end="2020-01-01")
df8 = pdr.get_data_yahoo("BNB-USD", start="2015-01-01", end="2020-01-01")
df9 = pdr.get_data_yahoo("EOS-USD", start="2015-01-01", end="2020-01-01")
df10 = pdr.get_data_yahoo("LINK-USD", start="2015-01-01", end="2020-01-01")
df11 = pdr.get_data_yahoo("XMR-USD", start="2015-01-01", end="2020-01-01")
df12 = pdr.get_data_yahoo("BTG-USD", start="2015-01-01", end="2020-01-01")

return_btc = df1.Close.pct_change()[1:]
return_eth = df2.Close.pct_change()[1:]
return_xrp = df3.Close.pct_change()[1:]
return_bch = df4.Close.pct_change()[1:]
return_usdt = df5.Close.pct_change()[1:]
return_bsv = df6.Close.pct_change()[1:]
return_ltc = df7.Close.pct_change()[1:]
return_bnb = df8.Close.pct_change()[1:]
return_eos = df9.Close.pct_change()[1:]
return_link = df10.Close.pct_change()[1:]
return_xmr = df11.Close.pct_change()[1:]
return_btg = df12.Close.pct_change()[1:]

d = {"BTC Return":return_btc, "ETH Return":return_eth, "XRP Return":return_xrp, "BCH Return":return_bch, 
"USDT Return":return_usdt, "BSV Return":return_bsv, "LTC Return":return_ltc, "BNB Return":return_bnb, 
"EOS Return":return_eos, "LINK Return":return_link, "XMR Return":return_xmr, "BTG Return":return_btg}

df = pd.DataFrame(d) # new data frame with all returns data

df = pd.DataFrame(d, columns=["Date", "BTC Return", "ETH Return", "XRP Return", "BCH Return", "USDT Return", "BSV Return", 
"LTC Return", "BNB Return", "EOS Return", "LINK Return", "XMR Return", "BTG Return"])

avg_row = df.mean(axis=1)
return_mkt = avg_row

d1 = {"BTC Return":return_btc, "ETH Return":return_eth, "XRP Return":return_xrp, "BCH Return":return_bch, 
"USDT Return":return_usdt, "BSV Return":return_bsv, "LTC Return":return_ltc, "BNB Return":return_bnb, 
"EOS Return":return_eos, "LINK Return":return_link, "XMR Return":return_xmr, "BTG Return":return_btg, "MKT Return":return_mkt}
df = pd.DataFrame(d1)
print(df)

import statsmodels.api as sm
from statsmodels import regression

X = return_mkt.values
Y1 = return_btc
Y2 = return_eth
#Y3 = return_xrp

def linreg(x,y):
    x = sm.add_constant(x)
    model = regression.linear_model.OLS(y,x).fit()

    # we are removing the constant
    x = x[:, 1]
    return model.params[0], model.params[1]

X_ = sm.add_constant(X) # artificially add intercept to x, as advised in the docs
model = sm.OLS(Y1,X_)
results = model.fit()
rsquared = results.rsquared

alpha, beta = linreg(X,Y1)

def linreg(x,y):
    x = sm.add_constant(x)
    model = regression.linear_model.OLS(y,x).fit()

    # we are removing the constant
    x = x[:, 1]
    return model.params[0], model.params[1]

X_ = sm.add_constant(X) # artificially add intercept to x, as advised in the docs
model = sm.OLS(Y2,X_)
results = model.fit()
rsquared = results.rsquared

alpha, beta = linreg(X,Y2)

The error occurs in the second def, where I try to compute the same statistics for the next cryptocurrency. The 1st def is for BTC (Y1), the 2nd def is for ETH (Y2), and so on (Y3, ...).

The code ran fine when it ended with the function for BTC only; the error appeared when I added copies of the same function for the other cryptocurrencies.


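Fundamentally, the problem is that Ethereum (and all the other cryptos) started trading later than Bitcoin, so their return series are shorter. return_mkt is the row-wise mean over the union of all dates, so X has one row per date in that combined index, while Y2 = return_eth only has rows for the days Ethereum existed: hence the size mismatch. That is also why the BTC regression worked: return_btc covers every date, so it has the same length as X. Once everything is in one DataFrame, the missing early days show up as null values for the price every day for the first few years, which can't be handled. So you have to take just the rows where that crypto's values are not null.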
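The mismatch can be reproduced without downloading anything. A minimal sketch, assuming two hypothetical return series where "ETH" starts two days after "BTC" (the numbers are made up); the final fit uses numpy.linalg.lstsq on an intercept column plus x as a stand-in for sm.OLS with sm.add_constant:

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns: "BTC" has 6 days of history, "ETH" only the last 4
dates = pd.date_range("2020-01-01", periods=6, freq="D")
btc = pd.Series([0.01, -0.02, 0.03, 0.00, 0.02, -0.01], index=dates)
eth = pd.Series([0.02, 0.01, 0.04, -0.02], index=dates[2:])

df = pd.DataFrame({"BTC": btc, "ETH": eth})  # aligned on the union of dates; ETH is NaN for the first 2 days
df["avg_return"] = df.mean(axis=1)            # mean skips NaN, so avg_return has all 6 rows

X = df["avg_return"].values
print(len(X), len(eth))  # 6 4 -> the endog/exog size mismatch

# Fix: keep only the rows where ETH actually has a return
sub = df[["ETH", "avg_return"]].dropna()
x, y = sub["avg_return"].to_numpy(), sub["ETH"].to_numpy()
A = np.column_stack([np.ones_like(x), x])  # intercept column + market return, like sm.add_constant
(alpha, beta), *_ = np.linalg.lstsq(A, y, rcond=None)
print(len(sub), alpha, beta)
```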

However, there are many things in your code which you could factor out so that you don't repeat yourself unnecessarily. You made an attempt at that with the linreg function, but then you re-defined it for the second crypto, which shouldn't be necessary.

Here is a quick rewrite which addresses both the fundamental problem and, hopefully, illustrates what I mean above. The output is a DataFrame with the statistics you're looking for, one row per cryptocurrency. The goal is to write as much of the code as possible generically, and then just provide a list of the cryptos you are interested in.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas_datareader import data as pdr
import datetime
import yfinance as yf
import statsmodels.api as sm
from statsmodels import regression

yf.pdr_override()

cryptos = ["BTC", "ETH", "XRP"]  # Here you can specify the cryptos you want. I just used 3 for demonstration
                                 # The rest of the code is not specific to any one crypto

def get_and_process_data(c):
    raw_data = pdr.get_data_yahoo(c + '-USD', start="2015-01-01", end="2020-01-01")
    return raw_data.Close.pct_change()[1:]

df = pd.DataFrame({c: get_and_process_data(c) for c in cryptos})


df['avg_return'] = df.mean(axis=1) # avg market return
print(df)

def model(x, y):
    # Fit y = alpha + beta * x by OLS once; the same fit gives r-squared, alpha and beta
    X = sm.add_constant(x) # artificially add intercept to x, as advised in the docs
    fit = sm.OLS(y, X).fit()
    # params is indexed by name ('const', 'avg_return'), so access it by position with .iloc
    alpha = fit.params.iloc[0]
    beta = fit.params.iloc[1]

    return fit.rsquared, alpha, beta


# For each crypto, keep only the days on which it has a return (drops the NaNs before it started trading)
results = pd.DataFrame({c: model(df.loc[df[c].notnull(), 'avg_return'], df.loc[df[c].notnull(), c]) for c in cryptos}).transpose()
results.columns = ['rsquared', 'alpha', 'beta']
print(results)
  • With the re-write code I get an error for an if/elif: TypeError: unsupported format string passed to Series.__format__
    crypto = input("Cryptocurrency: ")
    if crypto == "Bitcoin" or "BTC":
        print("Alpha/Performance: " + str(results['alpha']) + " (" + "{:.2%}".format(results['alpha']) + ")")
        print("BTC Beta: " + str(results['beta']) + " (" + str(round(results['beta'],2)) + ")")
        print("R squared: " + str(results['rsquared']) + " (" + "{:.0%}".format(results['rsquared']) + ")")
    elif crypto == "Ethereum" or "Ether" or "ETH":
    – David A. yesterday
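The TypeError comes from results['alpha'] being a whole column (a Series with one value per crypto), which does not accept a format spec such as "{:.2%}"; selecting a single row with results.loc[ticker] gives scalar values that do. Separately, crypto == "Bitcoin" or "BTC" is always true, because the non-empty string "BTC" is truthy on its own, so membership in a set of names is the usual fix. A minimal sketch, assuming the results table from the answer (replaced here by a stand-in with made-up numbers) and a hypothetical ALIASES mapping and describe helper:

```python
import pandas as pd

# Stand-in for the `results` table from the answer; the numbers are made up for illustration
results = pd.DataFrame(
    {"rsquared": [0.71, 0.64], "alpha": [0.0012, -0.0004], "beta": [0.95, 1.10]},
    index=["BTC", "ETH"],
)

# Hypothetical mapping from user-typed names to tickers (replaces the always-true `or` chain)
ALIASES = {"bitcoin": "BTC", "btc": "BTC", "ethereum": "ETH", "ether": "ETH", "eth": "ETH"}


def describe(name):
    ticker = ALIASES.get(name.strip().lower())
    if ticker is None:
        return f"Unknown cryptocurrency: {name}"
    row = results.loc[ticker]  # one row -> scalar values, so format specs work
    return (
        f"Alpha/Performance: {row['alpha']} ({row['alpha']:.2%})\n"
        f"{ticker} Beta: {row['beta']} ({row['beta']:.2f})\n"
        f"R squared: {row['rsquared']} ({row['rsquared']:.0%})"
    )


print(describe("Ether"))
```

In the interactive version you would call print(describe(input("Cryptocurrency: "))) instead of the if/elif chain.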
