APIs are an extremely useful tool, and since I have not been programming for very long I wanted to learn how to use one. While I was an intern at Ashland Inc., I became friends with an intern in another department who told me one day about the Reddit API. Reddit’s API is extremely accessible, and he explained how he used Python to access the JSON and download wallpapers for his computer. The project sounded fun and useful, so let’s look at my Python script to download the top wallpapers from Reddit:

import json
from urllib import request

# Fetch the JSON listing of today's top posts in r/wallpapers
Link = request.urlopen('http://www.reddit.com/r/wallpapers/top/.json?sort=top&t=day')
save = Link.read()
linkTranslation = json.loads(save.decode('utf-8'))
for i in linkTranslation["data"]["children"]:
    pic = i["data"]["url"]
    # Only save direct links to images, not web pages
    if pic.endswith(".png") or pic.endswith(".jpg"):
        if i["data"]["thumbnail"] != 'default':
            # Reuse the thumbnail's file name for the saved wallpaper
            name = i["data"]["thumbnail"].split(".com/")[1]
            wallPaper = request.urlretrieve(pic, "E:/Wallpapers/" + name)

The first important thing to notice is the two imports: Python has library support for both JSON and opening URLs. urllib lets us fetch the contents of a URL, and the URL we open returns JSON data that adheres to REST API conventions. It is important to note that Reddit rate-limits its API, so too many calls in a short period are not allowed; once that limit is hit the Python calls start failing, which is why the script only manages to save so many pictures. With that in mind, let’s look at the following two lines of code:
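As an aside on the rate limiting: Reddit’s API guidelines ask clients to identify themselves with a descriptive User-Agent header, and anonymous default agents tend to get throttled sooner. My script doesn’t do this, but urllib supports it through a Request object; here is a minimal sketch, where the agent string is just a made-up example:

```python
from urllib import request

# Reddit asks API clients to send a descriptive User-Agent;
# the agent string below is a placeholder, not a real registered client.
url = 'http://www.reddit.com/r/wallpapers/top/.json?sort=top&t=day'
req = request.Request(url, headers={'User-Agent': 'wallpaper-downloader/0.1'})
# Link = request.urlopen(req)  # same call as before, now with the header attached
```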

Link = request.urlopen('http://www.reddit.com/r/wallpapers/top/.json?sort=top&t=day')
save = Link.read()

Link holds the opened connection, so we have the object, but what we want is its contents; in save we call the read function to dump those contents into a variable. At this point what we have is a lot of data formatted to JSON standards, so we have to translate it. Note: the URL comes from a Reddit API convention: if you add .json at the end of a page, before any query string, it will give you the JSON for that page. So we find the page we want and manipulate its URL; in this case we went to r/wallpapers because we are downloading wallpapers. Let’s visit the URL and take a look at what that data looks like:

{
  "kind": "Listing",
  "data": {
    "modhash": "hzi7s7igg028883209a155c5e5c1bed2779e4e6ffa03ea03fe",
    "children": [
      {
        "kind": "t3",
        "data": {
          "domain": "i.imgur.com",
          "banned_by": null,
          "media_embed": {
            
          },
          "subreddit": "wallpapers",
          "selftext_html": null,
          "selftext": "",
          "likes": null,
          "user_reports": [
            
          ],
          "secure_media": null,
          "link_flair_text": null,
          "id": "2kaggv",
          "gilded": 0,
          "secure_media_embed": {
            
          },
          "clicked": false,
          "report_reasons": null,
          "author": "B-owl",
          "media": null,
          "score": 1114,
          "approved_by": null,
          "over_18": false,
          "hidden": false,
          "thumbnail": "http://b.thumbs.redditmedia.com/yMsfyx-AouZdpGMiu-6avA2Hxrd2XaN1ggVuxPbi5HU.jpg",
          "subreddit_id": "t5_2qhw4",
          "edited": false,
          "link_flair_css_class": null,
          "author_flair_css_class": null,
          "downs": 0,
          "mod_reports": [
            
          ],
          "saved": false,
          "is_self": false,
          "name": "t3_2kaggv",
          "permalink": "/r/wallpapers/comments/2kaggv/pretty_redhead_who_is_she/",
          "stickied": false,
          "created": 1414278602.0,
          "url": "http://i.imgur.com/p4xCDKM.jpg",
          "author_flair_text": null,
          "title": "Pretty Redhead. Who is she?",
          "created_utc": 1414249802.0,
          "ups": 1114,
          "num_comments": 117,
          "visited": false,
          "num_reports": null,
          "distinguished": null
        }
      },

The first thing to notice is the indented structure. From it we can infer the shape of the data: the top level is an object whose "kind" is "Listing" and whose "data" holds a modhash and a "children" array. Upon further inspection, each child has its own "data" object, and inside it we notice "thumbnail" and "url"; these are what we need to find the wallpapers and save them.
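To make the nesting concrete, here is a trimmed, hand-written stand-in for that payload, keeping only the fields the script reads (real responses carry many more keys), and how json.loads lets us walk it:

```python
import json

# A trimmed stand-in for Reddit's response, with only the fields we use
payload = '''
{"kind": "Listing",
 "data": {"children": [
   {"kind": "t3",
    "data": {"thumbnail": "http://b.thumbs.redditmedia.com/yMsf.jpg",
             "url": "http://i.imgur.com/p4xCDKM.jpg"}}]}}
'''
parsed = json.loads(payload)
first = parsed["data"]["children"][0]["data"]
print(first["url"])        # the full-size image link
print(first["thumbnail"])  # the thumbnail link we derive the file name from
```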

linkTranslation = json.loads(save.decode('utf-8'))
for i in linkTranslation["data"]["children"]:
    pic = i["data"]["url"]
    if pic.endswith(".png") or pic.endswith(".jpg"):
        if i["data"]["thumbnail"] != 'default':
            name = i["data"]["thumbnail"].split(".com/")[1]
            wallPaper = request.urlretrieve(pic, "E:/Wallpapers/" + name)

Stepping through this: json.loads takes the contents of save, decoded from UTF-8, and converts the text into the nested dictionaries and lists discussed previously. We then iterate through the "children" list with a for loop and pull out each picture, located at ["data"]["url"]. A simple if statement makes sure it is a format we want; this avoids saving a web page instead of an actual picture with a .png or .jpg extension. The final part is saving the picture with an appropriate name. For our purposes we went with the simplest route: we want a name without spaces, which is why we can’t simply use the post title.
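One small stylistic aside, not what the script above does: str.endswith also accepts a tuple of suffixes, so the two extension checks could be condensed into one.

```python
pic = "http://i.imgur.com/p4xCDKM.jpg"

# endswith with a tuple tests every suffix in a single call
is_image = pic.endswith((".png", ".jpg"))
print(is_image)  # True
```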

if i["data"]["thumbnail"] != 'default':
    name = i["data"]["thumbnail"].split(".com/")[1]
    wallPaper = request.urlretrieve(pic, "E:/Wallpapers/" + name)

For the name I use the thumbnail, which can hold either a URL or the value 'default' (this is from my experience of looking at the API; I could be incorrect, but let us assume for this case we are right). We call the split function on that value with ".com/" as the separator, which returns an array of two strings. It is best to see what this looks like: for the value http://b.thumbs.redditmedia.com/yMsfyx-AouZdpGMiu-6avA2Hxrd2XaN1ggVuxPbi5HU.jpg, split returns http://b.thumbs.redditmedia and yMsfyx-AouZdpGMiu-6avA2Hxrd2XaN1ggVuxPbi5HU.jpg. Because it returns an array of strings we can reference these with [0] or [1]; we use [1] for the second element and save that as the name. The last statement retrieves the URL, which should be either a .png or a .jpg, and saves it in the directory E:/Wallpapers/ under the name we just built. This is a rather small amount of code to accomplish a big task. Further work could check resolutions and make sure we haven’t downloaded the same picture before; both are projects I hope will grow out of this code in the future.
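The split behavior can be checked in isolation with the thumbnail value from the dump above:

```python
thumb = "http://b.thumbs.redditmedia.com/yMsfyx-AouZdpGMiu-6avA2Hxrd2XaN1ggVuxPbi5HU.jpg"

# Splitting on ".com/" yields the host half and the file-name half
parts = thumb.split(".com/")
name = parts[1]
print(name)  # yMsfyx-AouZdpGMiu-6avA2Hxrd2XaN1ggVuxPbi5HU.jpg
```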




Published

18 October 2014
