Sunday, January 22, 2012

Python's urllib2 and podomatic

I'm writing a small tool for myself to download some podcasts. To do so, I use Python and the urllib2 library. Everything went well with a number of sites until yesterday.

I discovered this great podcast, so I wanted to include it in the tool. The problem is that when I try to download the mp3 file I get an "HTTP Error 403: Forbidden". This puzzled me because the web browser can access the file with no problem.

I started Wireshark to compare the requests sent by the browser and by my tool. At first I couldn't see any significant difference. After a few tries I discovered the issue was the User-Agent field of the request header. The library was sending something like:

User-agent: Python-urllib/2.6\r\n
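The default value can also be checked without a packet capture. The following is a minimal sketch, assuming Python 3, where urllib2 was merged into urllib.request; the helper name default_user_agent is mine, not part of the library. It reads the default headers that a freshly built opener attaches to every request.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent string a default opener sends with each request."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples applied to every request.
    return dict(opener.addheaders).get('User-agent')


if __name__ == '__main__':
    # Prints a value of the form Python-urllib/<version>.
    print(default_user_agent())
```

The exact version suffix depends on the interpreter in use, so the output will differ from the 2.6 shown above.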

So I decided to change it. This is the code that does the trick:
 
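import urllib2

# url is the address of the mp3 file to download.
# Pretend to be a regular browser instead of sending the default Python-urllib agent.
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-agent': user_agent}
try:
    req = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    # HTTPError (such as the 403) is a subclass of URLError, so it is caught here too.
    print("Error: %(e)s with url: %(u)s" % {'e': e, 'u': url})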
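For anyone on Python 3, the same trick might look roughly like the sketch below. It is not the code from my tool: the function names build_request and download, and the idea of writing the response straight to a file, are assumptions added for illustration. The only real change from the snippet above is that urllib2 is now split into urllib.request and urllib.error.

```python
import shutil
import urllib.error
import urllib.request

# Same browser-like string as above; any value the server accepts would do.
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download(url, dest_path):
    """Save url to dest_path. Return True on success, False on a URL/HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url)) as response, \
                open(dest_path, 'wb') as out:
            # Stream the body to disk instead of loading the whole mp3 in memory.
            shutil.copyfileobj(response, out)
    except urllib.error.URLError as e:
        # HTTPError (e.g. 403 Forbidden) is a subclass of URLError.
        print("Error: %s with url: %s" % (e, url))
        return False
    return True
```

Keeping the request construction in its own function makes it easy to check the header that will be sent, via build_request(url).get_header('User-agent'), without touching the network.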

Now the question is: why do they configure the server like that?
