import urllib.request

def stockCorrelate(ticker1, ticker2):
    # Download the CSV quote history for each ticker.
    url1 = urllib.request.urlopen('http://ichart.finance.yahoo.com/table.csv?s=%s'%ticker1)
    url2 = urllib.request.urlopen('http://ichart.finance.yahoo.com/table.csv?s=%s'%ticker2)
    t1Data = url1.readlines()
    t2Data = url2.readlines()
    # Decode each line from bytes to str and split it into fields, skipping the header line.
    t1Data = [line.decode('ascii').split(',') for line in t1Data[1:] ]
    t2Data = [line.decode('ascii').split(',') for line in t2Data[1:] ]
    t1Close = []
    t2Close = []
    # Keep closing prices only where both series show the same date at the same index.
    for i in range(min(len(t1Data), len(t2Data))):
        if t1Data[i][0] == t2Data[i][0]:
            t1Close.append(float(t1Data[i][4]))
            t2Close.append(float(t2Data[i][4]))
    print(len(t1Close), len(t2Close))
    # correlation() is not part of this listing; it must be defined elsewhere.
    return correlation(t1Close, t2Close)
Wednesday, March 18, 2009
Chapter 5 -- Listing 5.6
Now that we understand how to decode bytes into strings, we only need to make one small change to Listing 5.6.
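As a quick illustration of that decoding step, here is a minimal sketch on a single line. The row is made up for this example and only follows the layout the listing relies on (date in field 0, close in field 4); the other columns are an assumption.

```python
# A made-up line in the shape the listing expects; the values are
# illustrative only and are not real quotes.
raw_line = b"2009-03-18,1.0,2.0,0.5,1.5,100,1.5\n"

# readlines() on a urlopen() response yields bytes in Python 3, so each
# line has to be decoded to str before it can be split on commas.
fields = raw_line.decode('ascii').strip().split(',')

date, close = fields[0], float(fields[4])
print(date, close)
```

The strip() call is optional here, since float() tolerates the trailing newline, but it keeps the last field clean if you use it as a string.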
I'm joining this conversation in the middle, but I have a few questions:
(1) Why is this referred to as python 3.0?
Errors:
- print statement
- format statement
(2) Why is this code not formatted/color-coded?
Look at prettify.css/prettify.js in code.google.com for a quick way to format this code. Special bonus: code window scrolls to show code clipped on the right.
(3) Why is repeated code not put into a separate function?
The code to download quotes from yahoo and to split the data for correlation could be put into a single routine.
(4) Why the t1Data[1:]?
When I run this in python 3.0, I get *2* lines that should be omitted: some sort of unicode plus the usual header of the columns.
Here is my python 3.0 code with my code changes (plus addition of a correlation function):
import urllib, urllib.request

def correlation(s1, s2) :
    n = len(s1)
    sums1, sums2 = sum(s1), sum(s2)
    sumsq1, sumsq2 = sum(a*a for a in s1), sum(a*a for a in s2)
    num = (n * sum(x*y for x,y in zip(s1, s2)) - (sums1 * sums2))
    denom = (n * sumsq1 - sums1*sums1) * (n * sumsq2 - sums2*sums2)
    return num / (denom ** 0.5)

def downloadStockData(ticker) :
    urlPath = 'http://ichart.finance.yahoo.com/table.csv?s={0}'.format(ticker)
    url = urllib.request.urlopen(urlPath)
    return [line.decode('ascii').split(',') for line in url.readlines()[2:] ]

def stockCorrelate(ticker1, ticker2):
    t1Data, t2Data = downloadStockData(ticker1), downloadStockData(ticker2)
    minLen = min(len(t1Data), len(t2Data))
    matching = [ i for i in range(minLen) if t1Data[i][0] == t2Data[i][0] ]
    t1Close, t2Close = [ float(t1Data[i][4]) for i in matching ], [ float(t2Data[i][4]) for i in matching ]
    print (len(t1Close), len(t2Close))
    return correlation(t1Close, t2Close)

if __name__ == "__main__" :
    x = stockCorrelate("IBM", "MSFT")
    print(x)
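One possible variation on the code above, offered as a sketch rather than as the book's method: instead of dropping a fixed number of leading lines and comparing rows by index, skip any line whose close field is not a number and pair the two series by date. The helper names parse_closes and align_by_date are hypothetical, the header text and sample rows are made up, and sorting assumes the dates are in YYYY-MM-DD form.

```python
import csv

def parse_closes(lines):
    """Return {date: close} from decoded CSV lines, skipping non-data lines."""
    closes = {}
    for row in csv.reader(lines):
        try:
            closes[row[0]] = float(row[4])
        except (IndexError, ValueError):
            # Header, blank, or otherwise malformed line: ignore it.
            continue
    return closes

def align_by_date(lines1, lines2):
    """Return two equal-length close lists covering only the shared dates."""
    closes1, closes2 = parse_closes(lines1), parse_closes(lines2)
    shared = sorted(closes1.keys() & closes2.keys())
    return [closes1[d] for d in shared], [closes2[d] for d in shared]

# Made-up rows for illustration only; these are not real quotes, and the
# header text is an assumption about the file layout.
header = "Date,Open,High,Low,Close,Volume,Adj Close"
sample1 = [header, "2009-03-18,1,2,0.5,1.5,100,1.5", "2009-03-17,1,2,0.5,1.25,100,1.25"]
sample2 = [header, "2009-03-18,3,4,2.5,3.0,100,3.0", "2009-03-16,3,4,2.5,3.5,100,3.5"]

c1, c2 = align_by_date(sample1, sample2)
print(c1, c2)
```

The two lists returned by align_by_date can be passed straight to correlation(). Pairing by date means that a day missing from one ticker no longer shifts every later row out of step, which is what happens when rows are compared by index.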
Here is a blogger webpage that explains how to add code-coloring to blogger:
http://sunday-lab.blogspot.com/2007/10/source-code-high-light-in-blogger.html
Do this:
(1) Add to the head of your template
<link href='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css' rel='stylesheet' type='text/css'/>
<script src='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js' type='text/javascript'/>
(2) Modify the body of your template
<body onload="prettify()">
(3) Write your code in a block like this:
<pre class="prettyprint" style="overflow:auto">
..,
</pre>
Note the style attribute. It adds a horizontal scrollbar when the code is wider than the block.
Sorry. For (2), I meant to say:
(2) Modify the body of your template
<body onload='prettyPrint()'>