Python – urllib2.HTTPError: HTTP Error 404: Not Found for valid url


I'm using a Python opengraph library to parse a website's Open Graph tags:

import opengraph
url = ''
og = opengraph.OpenGraph(url=url)
print og.to_json()

When I run this script, I get the following error:

Traceback (most recent call last):
  File "", line 16, in <module>
    raw = urllib2.urlopen(url)
  File "/usr/lib/python2.7/", line 127, in urlopen
    return, data, timeout)
  File "/usr/lib/python2.7/", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

urllib2 is used under the hood to fetch the HTML before it is parsed.
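Since urllib2 does the fetching, it helps to see what it sends by default. urllib2 became urllib.request in Python 3, and both add a default User-Agent header that identifies the client as Python-urllib rather than a browser. A Python 3 sketch that inspects that default without making any request:

```python
import urllib.request

# build_opener() returns the same kind of OpenerDirector that urlopen() uses by
# default; its addheaders list holds headers added to every request that does
# not already set them.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The value has the form "Python-urllib/<major>.<minor>" (Python-urllib/2.7
# under Python 2's urllib2), which some sites refuse to serve.
print(default_headers["User-agent"])
```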

Why am I receiving this 404 error? I can access this URL from my browser, and I can also retrieve its Open Graph tags with this PHP library.

The Python library retrieves the Open Graph tags for all other URLs, but this URL seems to be an anomaly.

Best Solution


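You get a 404 response because of the User-Agent: urllib2 identifies itself with its default User-Agent (Python-urllib/2.7) instead of a browser's, and this server apparently answers such requests with a 404. I installed opengraph in a virtualenv to test it, and it works after adding a browser-like User-Agent header: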

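import urllib2
import opengraph
url = ''
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})  # replaces urllib2's default Python-urllib/2.7 agent
og = opengraph.OpenGraph()
og.fetch(req)  # fetch() hands req to urllib2.urlopen(), which accepts Request objects
og.to_json()  # in an interactive session this returns the string below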

'{"site_name": "Fox News", "description": "Registered gun owners in the United Kingdom are now subject to unannounced visits to their homes under new guidance that allows police to inspect firearms storage without a warrant.", "title": "UK gun owners now subject to warrantless home searches", "url": "", "image": "", "scrape": false, "_url": null, "type": "article"}'
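On Python 3, urllib2 was folded into urllib.request, so the same fix carries over with a renamed module. This is a sketch, not the opengraph library's API: build_request, fetch_html and parse_og_tags are hypothetical helper names, and the tag extraction uses the standard library's html.parser as a stand-in for the library's own parsing.

```python
import urllib.request
from html.parser import HTMLParser

BROWSER_UA = "Mozilla/5.0"  # browser-like value, as used in the answer


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request whose User-Agent replaces urllib's Python-urllib default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=BROWSER_UA, timeout=10):
    """Download a page with the given User-Agent and return the decoded body."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <meta />
        # tags also arrive here via the default handle_startendtag().
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Drop the "og:" prefix so keys read like "title" or "site_name"
            self.tags[prop[3:]] = content


def parse_og_tags(html):
    """Return the Open Graph tags found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Usage would be `parse_og_tags(fetch_html(url))`. urllib only adds its default User-Agent when the request doesn't already carry one, so setting the header on the Request keeps the override local to this call instead of changing a global opener.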
