Wednesday, March 18, 2009

Chapter 5 -- Session 5.7

One big change in Python 3.0 that we did not notice until after the book had gone to press affects the module for reading from the Internet: the urllib module was changed substantially.

As you can see from the session below, the urlopen function is no longer a part of urllib; it is now a part of urllib.request, which must be imported explicitly.


>>> import urllib
>>> page = urllib.request.urlopen('http://www.cs.luther.edu/python/test.html')
Traceback (most recent call last):
File "", line 1, in
page = urllib.request.urlopen('http://www.cs.luther.edu/python/test.html')
AttributeError: 'module' object has no attribute 'request'
>>> import urllib.request
>>> page = urllib.request.urlopen('http://www.cs.luther.edu/python/test.html')
>>> pageText = page.read()
>>> pageText
b'\n\n\n\t\n\tTest Page\n\t\n\t\n\t\n\n\nHello Python Programmer!\nThis is a test page for the urllib2 module program\n\n\n'
>>> type(pageText)
<class 'bytes'>



If moving the urlopen function to urllib.request had been the only change, that would not have been too bad. The more difficult change is the very subtle addition of the b before the quotes in the pageText string. In fact, you can see that the variable pageText refers to an object of a type called bytes rather than to a string.
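As a minimal sketch of the difference, the example below reads from a URL and converts the resulting bytes into an ordinary string with decode. Two things here are assumptions and not part of the original session: a data: URL stands in for the test page so the code runs without a network connection (urlopen accepts data: URLs in Python 3.4 and later), and the content is assumed to be UTF-8 encoded. The names page_bytes and page_text are just illustrative.

```python
import urllib.request

# Assumption: a data: URL stands in for a real web page, so no network is needed.
url = 'data:text/plain;charset=utf-8,Hello%20Python%20Programmer!'

with urllib.request.urlopen(url) as page:
    page_bytes = page.read()              # read() returns a bytes object

# Assumption: the content is UTF-8; a real page may use another encoding.
page_text = page_bytes.decode('utf-8')    # decode() turns bytes into a str

print(type(page_bytes).__name__)
print(type(page_text).__name__)
print(page_text)
```

For a real page, the right encoding depends on what the server sends, so 'utf-8' is only a reasonable first guess.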

The good news is that bytes objects act very similarly to strings. The bad news is that you cannot simply mix and match strings with bytes.

The session below illustrates the difficulty:


>>> 'foo' + b'bar'
Traceback (most recent call last):
File "", line 1, in
'foo' + b'bar'
TypeError: Can't convert 'bytes' object to str implicitly
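One way around this error, shown here as a sketch rather than the only approach, is to convert one operand explicitly so that both sides have the same type. The choice of the 'ascii' encoding is an assumption that works for these simple values; other data may need a different encoding.

```python
text = 'foo'
data = b'bar'

# Convert the bytes to a str, then concatenate two strings.
as_text = text + data.decode('ascii')

# Or convert the str to bytes, then concatenate two bytes objects.
as_bytes = text.encode('ascii') + data

print(as_text)
print(as_bytes)
```

Which direction to convert depends on what the rest of the program expects: decode when you want to work with text, encode when you need raw bytes.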


We will work through these differences in subsequent posts about the rest of chapter 5.
