Wednesday, March 18, 2009

Chapter 5 -- Listing 5.6

Now that we understand how to decode bytes into strings, we only need to make one small change to Listing 5.6: decode each line before splitting it.


import urllib.request

# Assumes correlation() is already defined.
def stockCorrelate(ticker1, ticker2):
    url1 = urllib.request.urlopen('http://ichart.finance.yahoo.com/table.csv?s=%s'%ticker1)
    url2 = urllib.request.urlopen('http://ichart.finance.yahoo.com/table.csv?s=%s'%ticker2)
    t1Data = url1.readlines()
    t2Data = url2.readlines()
    # readlines() returns bytes, so decode each line to a string before splitting;
    # the [1:] slice skips the header row
    t1Data = [line.decode('ascii').split(',') for line in t1Data[1:] ]
    t2Data = [line.decode('ascii').split(',') for line in t2Data[1:] ]
    t1Close = []
    t2Close = []
    # Keep the closing price (column 4) only where the dates (column 0) match
    for i in range(min(len(t1Data), len(t2Data))):
        if t1Data[i][0] == t2Data[i][0]:
            t1Close.append(float(t1Data[i][4]))
            t2Close.append(float(t2Data[i][4]))

    print(len(t1Close), len(t2Close))
    return correlation(t1Close, t2Close)
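
The decoding step can be tried on its own, without a network connection. The following is only a minimal sketch: the sample rows are invented, and the names sampleLines and parseRows are hypothetical and not part of Listing 5.6. It assumes the data arrives as a list of bytes objects with a single header line, as the listing above does.

```python
# Hypothetical sample data standing in for what url.readlines() returns:
# a list of bytes objects, header first. The values are invented.
sampleLines = [
    b'Date,Open,High,Low,Close,Volume,Adj Close\n',
    b'2009-03-17,90.00,92.50,89.75,92.00,1000,92.00\n',
    b'2009-03-16,91.00,91.50,89.00,90.00,1200,90.00\n',
]

def parseRows(lines):
    # Skip the header, decode each bytes line to str, drop the trailing
    # newline, then split on commas.
    return [line.decode('ascii').strip().split(',') for line in lines[1:]]

rows = parseRows(sampleLines)

# Column 0 is the date and column 4 is the closing price, as in the listing.
closes = [float(row[4]) for row in rows]
print(rows[0][0], closes)
```

Calling split(',') directly on a bytes object with a str separator raises a TypeError in Python 3, which is why the decode has to come first.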

3 comments:

  1. I'm joining this conversation in the middle, but I have a few questions:
    (1) Why is this referred to as Python 3.0?
    Errors:
    - print statement
    - format statement
    (2) Why is this code not formatted/color-coded?
    Look at prettify.css/prettify.js in code.google.com for a quick way to format this code. Special bonus: code window scrolls to show code clipped on the right.
    (3) Why is repeated code not put into a separate function?
    The code to download quotes from Yahoo and to split the data for correlation could be put into a single routine.
    (4) Why the t1Data[1:]?
    When I run this in Python 3.0, I get *2* lines that should be omitted: some sort of Unicode line plus the usual column header.

    Here is my Python 3.0 code with my changes (plus the addition of a correlation function):

    import urllib, urllib.request

    def correlation(s1, s2) :
        n = len(s1)
        sums1, sums2 = sum(s1), sum(s2)
        sumsq1, sumsq2 = sum(a*a for a in s1), sum(a*a for a in s2)
        num = (n * sum(x*y for x,y in zip(s1, s2)) - (sums1 * sums2))
        denom = (n * sumsq1 - sums1*sums1) * (n * sumsq2 - sums2*sums2)
        return num / (denom ** 0.5)

    def downloadStockData(ticker) :
        urlPath = 'http://ichart.finance.yahoo.com/table.csv?s={0}'.format(ticker)
        url = urllib.request.urlopen(urlPath)
        return [line.decode('ascii').split(',') for line in url.readlines()[2:] ]

    def stockCorrelate(ticker1, ticker2):
        t1Data, t2Data = downloadStockData(ticker1), downloadStockData(ticker2)
        minLen = min(len(t1Data), len(t2Data))
        matching = [ i for i in range(minLen) if t1Data[i][0] == t2Data[i][0] ]
        t1Close, t2Close = [ float(t1Data[i][4]) for i in matching ], [ float(t2Data[i][4]) for i in matching ]

        print (len(t1Close), len(t2Close))
        return correlation(t1Close, t2Close)

    if __name__ == "__main__" :
        x = stockCorrelate("IBM", "MSFT")
        print(x)

  2. Here is a blogger webpage that explains how to add code-coloring to blogger:

    http://sunday-lab.blogspot.com/2007/10/source-code-high-light-in-blogger.html

    Do this:
    (1) Add to the head of your template
    <link href='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css' rel='stylesheet' type='text/css'/>
    <script src='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js' type='text/javascript'/>

    (2) Modify the body of your template
    <body onload="prettify()">

    (3) Write your code in a block like this:
    <pre class="prettyprint" style="overflow:auto">
    ...
    </pre>
    Note the style attribute. It adds a horizontal scrollbar when necessary.

  3. Sorry. For (2), I meant to say:

    (2) Modify the body of your template
    <body onload='prettyPrint()'>
