Rodolfo technical: Python's urllib2 and podomatic

I'm writing a small tool for myself to download some podcasts. It uses Python and the urllib2 library. Everything worked well with a number of sites until yesterday.

I discovered this great podcast and wanted to include it in the tool. The problem is that when I try to download the mp3 file I get an "HTTP Error 403: Forbidden". This puzzled me because the web browser can access the file with no problem.

I started Wireshark to look into the requests. At first I couldn't see any significant difference. After a few tries I discovered that the issue was the User-Agent field of the request header. The library was sending something like:


User-agent: Python-urllib/2.6\r\n
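The default value can also be checked from Python itself, without a packet capture. The sketch below assumes Python 3, where urllib2 was merged into urllib.request; the opener object keeps its default headers in the addheaders list. The exact version number printed depends on the interpreter.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds
# the headers that are added to every request by default.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like "Python-urllib/3.x", depending on the interpreter.
print(default_headers["User-agent"])
```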

So I decided to change it. This is the code that does the trick:

        import urllib2

        # A browser-like User-Agent; the default Python-urllib one gets a 403.
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = {'User-agent': user_agent}
        try:
            # url holds the address of the mp3 file
            req = urllib2.Request(url, headers=headers)
            response = urllib2.urlopen(req)
        except urllib2.URLError as e:
            print("Error: %(e)s with url: %(u)s" % {'e': e, 'u': url})
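For anyone on Python 3, where urllib2 no longer exists, a rough equivalent might look like the sketch below. The helper names build_request and download, and the dest_path parameter, are my own illustrative choices and not part of any library. The User-Agent string is the same one used above.

```python
import shutil
import urllib.error
import urllib.request

# Same browser-like User-Agent string as in the urllib2 snippet.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def build_request(url):
    """Return a Request that carries the custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-agent": USER_AGENT})


def download(url, dest_path):
    """Save the body of url to dest_path; return True on success."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            with open(dest_path, "wb") as out:
                # Stream the body to disk instead of loading it all in memory.
                shutil.copyfileobj(response, out)
    except urllib.error.URLError as e:
        # HTTPError (e.g. 403 Forbidden) is a subclass of URLError.
        print("Error: %s with url: %s" % (e, url))
        return False
    return True
```

Calling download(url, "episode.mp3") with the address of the mp3 file would then save it locally, or print the error and return False if the server still refuses the request.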

Now the question is: why do they configure the server like that?

Sunday, 22 January 2012