Tag: script

  • Amazon Product Advertising API From Python

    Amazon has a very comprehensive associate program that allows you to promote just about anything imaginable for any niche and earn a commission on anything you refer. The size of the catalog is what makes Amazon such a great program. People make some good money promoting Amazon products.

    There is a great Python library called boto for accessing the other Amazon web services, such as S3 and EC2. However, it doesn’t support the Product Advertising API.

    With the Product Advertising API you have access to everything that you can read on the Amazon site about each product. This includes the product description, images, editor reviews, customer reviews and ratings. This is a lot of great information that you could easily find a good use for with your websites.

    So how do you get at this information from within a Python program? Well the complicated part is dealing with the authentication that Amazon has put in place. To make that a bit easier I used the connection component from boto.
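
    For a rough idea of what that authentication involves, here is a minimal sketch of an AWS Signature Version 2 style signing step using only the standard library. It is not boto's code: the function name sign_query and the sample values are made up for illustration, and it is written for Python 3, unlike the Python 2 script in this post.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_query(params, secret_key, host, path, verb="GET"):
    """Return (canonical_query, signature) for a Signature Version 2 style request.

    Hypothetical helper for illustration; not part of boto.
    """
    # Sort parameters by name and percent-encode names and values (RFC 3986).
    canonical_query = "&".join(
        "%s=%s" % (quote(str(k), safe="-_.~"), quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items())
    )
    # The string to sign is: verb, lowercase host, path, canonical query string.
    string_to_sign = "\n".join([verb, host.lower(), path, canonical_query])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return canonical_query, base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Placeholder values only; real requests need your own keys and a Timestamp.
    qs, sig = sign_query(
        {"Service": "AWSECommerceService", "Operation": "ItemSearch"},
        "YOUR SECRET KEY",
        "ecs.amazonaws.com",
        "/onca/xml",
    )
    print(qs)
    print(sig)
```

    The signature still has to be URL-encoded before it is appended to the query string as the Signature parameter, which the script in this post does with urllib.quote.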

    Here’s a demonstration snippet of code that will print out the top 10 best selling books on Amazon right now.

    Example Usage:

    $ python AmazonSample.py
    Glenn Becks Common Sense: The Case Against an Out-of-Control Government, Inspired by Thomas Paine by Glenn Beck
    Culture of Corruption: Obama and His Team of Tax Cheats, Crooks, and Cronies by Michelle Malkin
    The Angel Experiment (Maximum Ride, Book 1) by James Patterson
    The Time Travelers Wife by Audrey Niffenegger
    The Help by Kathryn Stockett
    South of Broad by Pat Conroy
    Paranoia by Joseph Finder
    The Girl Who Played with Fire by Stieg Larsson
    The Shack [With Headphones] (Playaway Adult Nonfiction) by William P. Young
    The Girl with the Dragon Tattoo by Stieg Larsson
    

    To use this code you’ll need an Amazon associate account, and you’ll need to fill in the keys and tag used for authentication.

    Product Advertising API Python code:

    #!/usr/bin/env python
    # encoding: utf-8
    """
    AmazonExample.py
    
    Created by Matt Warren on 2009-08-17.
    Copyright (c) 2009 HalOtis.com. All rights reserved.
    """
    
    import time
    import urllib
    try:
        from xml.etree import ElementTree as ET
    except ImportError:
        from elementtree import ElementTree as ET
        
    from boto.connection import AWSQueryConnection
    
    AWS_ACCESS_KEY_ID = 'YOUR ACCESS KEY'
    AWS_ASSOCIATE_TAG = 'YOUR TAG'
    AWS_SECRET_ACCESS_KEY = 'YOUR SECRET KEY'
    
    def amazon_top_for_category(browseNodeId):
        aws_conn = AWSQueryConnection(
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY, is_secure=False,
            host='ecs.amazonaws.com')
        aws_conn.SignatureVersion = '2'
        params = dict(
            Service='AWSECommerceService',
            Version='2009-07-01',
            SignatureVersion=aws_conn.SignatureVersion,
            AWSAccessKeyId=AWS_ACCESS_KEY_ID,
            AssociateTag=AWS_ASSOCIATE_TAG,
            Operation='ItemSearch',
            BrowseNode=browseNodeId,
            SearchIndex='Books',
            ResponseGroup='ItemAttributes,EditorialReview',
            Order='salesrank',
            Timestamp=time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()))
        verb = 'GET'
        path = '/onca/xml'
        qs, signature = aws_conn.get_signature(params, verb, path)
        qs = path + '?' + qs + '&Signature=' + urllib.quote(signature)
        response = aws_conn._mexe(verb, qs, None, headers={})
        tree = ET.fromstring(response.read())
        
        NS = tree.tag.split('}')[0][1:]
    
        for item in tree.find('{%s}Items'%NS).findall('{%s}Item'%NS):
            title = item.find('{%s}ItemAttributes'%NS).find('{%s}Title'%NS).text
            author = item.find('{%s}ItemAttributes'%NS).find('{%s}Author'%NS).text
            print title, 'by', author
    
    if __name__ == '__main__':
        amazon_top_for_category(1000) #Amazon category number for US Books
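
    One thing to watch for in the parsing loop above: if an item has no Author element, find() returns None and reading .text raises an AttributeError. Here is a small sketch of the same namespace handling that tolerates missing fields. The sample XML and its namespace URI are invented for illustration (a real response is much larger), and the code is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment; the namespace URI is a placeholder.
SAMPLE = """<ItemSearchResponse xmlns="http://example.com/ns">
  <Items>
    <Item><ItemAttributes><Title>First Book</Title><Author>A. Writer</Author></ItemAttributes></Item>
    <Item><ItemAttributes><Title>Second Book</Title></ItemAttributes></Item>
  </Items>
</ItemSearchResponse>"""


def titles_and_authors(xml_text):
    """Return a list of (title, author) tuples from a namespaced response."""
    tree = ET.fromstring(xml_text)
    # The root tag looks like "{namespace-uri}ItemSearchResponse".
    ns = tree.tag.split('}')[0][1:]
    results = []
    for item in tree.iter('{%s}Item' % ns):
        attrs = item.find('{%s}ItemAttributes' % ns)
        if attrs is None:
            continue  # nothing to report for this item
        # findtext returns the default instead of failing on a missing element.
        title = attrs.findtext('{%s}Title' % ns, default='(no title)')
        author = attrs.findtext('{%s}Author' % ns, default='(unknown author)')
        results.append((title, author))
    return results


if __name__ == '__main__':
    for title, author in titles_and_authors(SAMPLE):
        print(title, 'by', author)
```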
    
  • Scrape Google Search Results Page

    Here’s a short script that will scrape the first 100 listings in the Google organic results.

    You might want to use this to find the position of your sites and track their position for certain target keyword phrases over time. That could be a very good way to determine, for example, if your SEO efforts are working. Or you could use the list of URLs as a starting point for some other web crawling activity.

    As the script is written it will just dump the list of URLs to a txt file.

    It uses the BeautifulSoup library to help with parsing the HTML page.

    Example Usage:

    $ python GoogleScrape.py
    $ cat links.txt
    http://www.halotis.com/
    http://www.halotis.com/2009/07/01/rss-twitter-bot-in-python/
    http://www.blogcatalog.com/blogs/halotis.html
    http://www.blogcatalog.com/topic/sqlite/
    http://ieeexplore.ieee.org/iel5/10358/32956/01543043.pdf?arnumber=1543043
    http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1543043
    http://doi.ieeecomputersociety.org/10.1109/DATE.2001.915065
    http://rapidlibrary.com/index.php?q=hal+otis
    http://www.tagza.com/Software/Video_tutorial_-_URL_re-directing_software-___HalOtis/
    http://portal.acm.org/citation.cfm?id=367328
    http://ag.arizona.edu/herbarium/db/get_taxon.php?id=20605&show_desc=1
    http://www.plantsystematics.org/taxpage/0/genus/Halotis.html
    http://www.mattwarren.name/
    http://www.mattwarren.name/2009/07/31/net-worth-update-3-5/
    http://newweightlossdiet.com/privacy.php
    http://www.ingentaconnect.com/content/nisc/sajms/1988/00000006/00000001/art00002?crawler=true
    http://www.ingentaconnect.com/content/nisc/sajms/2000/00000022/00000001/art00013?crawler=true
    

    ...... $

    Here’s the script:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # (C) 2009 HalOtis Marketing
    # written by Matt Warren
    # http://halotis.com/
    
    import urllib,urllib2
    
    from BeautifulSoup import BeautifulSoup
    
    def google_grab(query):
    
        address = "http://www.google.com/search?q=%s&num=100&hl=en&start=0" % (urllib.quote_plus(query))
        request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
        urlfile = urllib2.urlopen(request)
        page = urlfile.read(200000)
        urlfile.close()
        
        soup = BeautifulSoup(page)
        links =   [x['href'] for x in soup.findAll('a', attrs={'class':'l'})]
        
        return links
    
    if __name__=='__main__':
        # Example: Search written to file
        links = google_grab('halotis')
        open("links.txt","w+b").write("\n".join(links))
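
    To turn that list of links into a ranking check, you could look for the first result that belongs to your site. This is only a sketch: position_of is a name I made up, the sample links are a few taken from the output above, and it is written for Python 3 (in Python 2 the import would come from the urlparse module instead).

```python
from urllib.parse import urlparse

# A few of the links from the example output, in result order.
SAMPLE_LINKS = [
    "http://www.halotis.com/",
    "http://www.blogcatalog.com/blogs/halotis.html",
    "http://www.mattwarren.name/",
]


def position_of(domain, links):
    """Return the 1-based position of the first link on `domain`, or None."""
    domain = domain.lower()
    for position, link in enumerate(links, start=1):
        host = (urlparse(link).hostname or "").lower()
        # Match the bare domain as well as subdomains such as "www.".
        if host == domain or host.endswith("." + domain):
            return position
    return None


if __name__ == "__main__":
    print(position_of("mattwarren.name", SAMPLE_LINKS))
```

    Matching on the hostname rather than the whole URL avoids false hits such as the blogcatalog.com link, which only has "halotis" in its path.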
    
  • Google Page Rank Python Script

    This isn’t my script but I thought it would appeal to the readers of this blog. It’s a script that will look up the Google Page Rank for any website, and it uses the same interface as the Google Toolbar to do it. I’d like to thank Fred Cirera for writing it and you can check out his blog about this script here.

    I’m not exactly sure what I would use this for but it might have applications for anyone who wants to do some really advanced SEO work and find a real way to accomplish Page Rank sculpting. Perhaps finding the best websites to put links on.

    The reason it is such an involved bit of math is that it needs to compute a checksum in order to work. It should be pretty reliable since it doesn’t involve any scraping.

    Example usage:

    $ python pagerank.py http://www.google.com/
    PageRank: 10	URL: http://www.google.com/
    
    $ python pagerank.py http://www.mozilla.org/
    PageRank: 9	URL: http://www.mozilla.org/
    
    $ python pagerank.py http://halotis.com
    PageRank: 3   URL: http://www.halotis.com/
    

    And the script:

    #!/usr/bin/env python
    #
    #  Script for getting Google Page Rank of page
    #  Google Toolbar 3.0.x/4.0.x Pagerank Checksum Algorithm
    #
    #  original from http://pagerank.gamesaga.net/
    #  this version was adapted from http://www.djangosnippets.org/snippets/221/
    #  by Corey Goldberg - 2010
    #
    #  Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
    
    
    
    import sys
    import urllib
    
    
    def get_pagerank(url):
        hsh = check_hash(hash_url(url))
        gurl = 'http://www.google.com/search?client=navclient-auto&features=Rank:&q=info:%s&ch=%s' % (urllib.quote(url), hsh)
        try:
            f = urllib.urlopen(gurl)
            rank = f.read().strip()[9:]
        except Exception:
            rank = 'N/A'
        if rank == '':
            rank = '0'
        return rank
        
        
    def int_str(string, integer, factor):
        for i in range(len(string)) :
            integer *= factor
            integer &= 0xFFFFFFFF
            integer += ord(string[i])
        return integer
    
    
    def hash_url(string):
        c1 = int_str(string, 0x1505, 0x21)
        c2 = int_str(string, 0, 0x1003F)
    
        c1 >>= 2
        c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
        c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
        c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)
    
        t1 = (c1 & 0x3C0) << 4
        t1 |= c1 & 0x3C
        t1 = (t1 << 2) | (c2 & 0xF0F)
    
        t2 = (c1 & 0xFFFFC000) << 4
        t2 |= c1 & 0x3C00
        t2 = (t2 << 0xA) | (c2 & 0xF0F0000)
    
        return (t1 | t2)
    
    
    def check_hash(hash_int):
        hash_str = '%u' % (hash_int)
        flag = 0
        check_byte = 0
    
        i = len(hash_str) - 1
        while i >= 0:
            byte = int(hash_str[i])
            if 1 == (flag % 2):
                byte *= 2
                byte = byte // 10 + byte % 10
            check_byte += byte
            flag += 1
            i -= 1
    
        check_byte %= 10
        if 0 != check_byte:
            check_byte = 10 - check_byte
            if 1 == flag % 2:
                if 1 == check_byte % 2:
                    check_byte += 9
                check_byte >>= 1
    
        return '7' + str(check_byte) + hash_str
    
    
    
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            url = 'http://www.google.com/'
        else:
            url = sys.argv[1]
    
        print get_pagerank(url)
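
    To make the hashing step a little less mysterious, here is the int_str helper on its own, ported to Python 3 as a sketch. With the seed 0x1505 (5381) and factor 0x21 (33) that hash_url passes in, it follows the same multiply-and-add recurrence as the well-known djb2 string hash, apart from where the 32-bit mask is applied. That comparison is my own observation, not something the script's author states.

```python
def int_str(text, seed, factor):
    """Python 3 port of the script's int_str helper (same arithmetic)."""
    value = seed
    for ch in text:
        # Multiply, keep the product within 32 bits, then add the character code.
        value = (value * factor) & 0xFFFFFFFF
        value += ord(ch)
    return value


if __name__ == "__main__":
    # The two calls hash_url makes for a URL string.
    url = "http://www.google.com/"
    print(int_str(url, 0x1505, 0x21))
    print(int_str(url, 0, 0x1003F))
```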
    
  • Targeting Twitter Trends Script

    I noticed that several accounts are spamming the Twitter trends. Go to twitter.com and select one of the trends in the right column. You’ll undoubtedly see some tweets that are blatantly inserting words from the trending topics list into unrelated ads.

    I was curious just how easy it would be to get the trending topics to target them with tweets. Turns out it is amazingly simple and shows off some of the beauty of Python.

    This script doesn’t actually do anything with the trend information. It simply downloads and prints out the list. But combine this code with the sample code from RSS Twitter Bot in Python and you’ll have a recipe for some seriously powerful promotion.

    import simplejson  # http://undefined.org/python/#simplejson
    import urllib
    
    result = simplejson.load(urllib.urlopen('http://search.twitter.com/trends.json'))
    
    print [trend['name'] for trend in result['trends']]
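
    If you want to try the parsing step without hitting the network, here is a small sketch that works on a canned payload. The sample JSON is made up and only mirrors the shape the script above reads (a "trends" list of objects with a "name" key), and trend_names is a hypothetical helper. It uses the json module from the standard library (available since Python 2.6) in place of simplejson and is written for Python 3.

```python
import json

# Hypothetical payload; real responses carry more fields.
SAMPLE = (
    '{"trends": ['
    '{"name": "#python", "url": "http://example.com/1"}, '
    '{"name": "Amazon", "url": "http://example.com/2"}'
    ']}'
)


def trend_names(payload):
    """Return the trend names from a JSON string, or [] if none are present."""
    result = json.loads(payload)
    return [trend["name"] for trend in result.get("trends", [])]


if __name__ == "__main__":
    print(trend_names(SAMPLE))
```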