I discovered this great podcast, so I wanted to include it in the tool. The problem is that when I try to download the mp3 file I get an "HTTPError 403: Forbidden". This puzzles me, because the web browser can access the file with no problem.
I started Wireshark to look into the requests, but I couldn't see any significant difference. After a few tries I discovered that the issue was the User-Agent field of the request header. The library was sending something like:
User-agent: Python-urllib/2.6\r\n
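To see where that header comes from, the sketch below uses Python 3's urllib.request (the successor of urllib2) rather than the 2.6 library from the post. A fresh opener stores the default header it adds to every request, so you can print it without making any network call.

```python
import urllib.request

# A fresh opener carries the default headers urllib adds to every request
# that does not set them itself.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like "Python-urllib/3.11", depending on the interpreter.
print(default_headers['User-agent'])
```

This is the value a server sees unless the request overrides it, which is why a server that filters on User-Agent can tell the script apart from a browser.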
So I decided to change it. This is the code that does the trick:
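import urllib2

# Pretend to be a browser: the server answers the default
# "Python-urllib/2.6" User-agent with a 403.
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-agent': user_agent}
try:
    # url holds the mp3 address (defined elsewhere in the tool)
    req = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    # HTTPError (e.g. the 403) is a subclass of URLError, so it lands here too
    print("Error: %(e)s with url: %(u)s" % {'e': e, 'u': url})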
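For anyone on Python 3, where urllib2 no longer exists, a minimal sketch of the same trick with urllib.request might look like this. The helper names build_request and download are hypothetical, and the browser string is simply the one from the post; the assumption is that the server only rejects the default Python-urllib agent.

```python
import urllib.error
import urllib.request

# Browser-like string from the post; assumed to be enough for the server,
# as long as it only blocks the default "Python-urllib/x.y" agent.
BROWSER_UA = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request whose User-Agent replaces urllib's default one."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download(url, path, user_agent=BROWSER_UA):
    """Save the body found at url into path; return True on success."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent)) as response:
            with open(path, 'wb') as out:
                out.write(response.read())
        return True
    except urllib.error.HTTPError as e:
        # Checked first because HTTPError subclasses URLError.
        # A 403 here means the server still rejects this User-Agent.
        print('HTTP error %d with url: %s' % (e.code, url))
    except urllib.error.URLError as e:
        # Network-level failures: DNS, refused connection, and so on.
        print('Error: %s with url: %s' % (e.reason, url))
    return False
```

A call such as download('http://example.com/episode.mp3', 'episode.mp3') (hypothetical URL) sends the browser string instead of Python-urllib. Because the header set on the Request already exists, the opener does not add its default User-agent on top of it.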
Now the question is: why would they configure the server like that?