Category: Software Engineering

Coding, Python, web development, architecture, and deployment

  • Get Your ClickBank Transactions Into Sqlite With Python

    ClickBank is an amazing service that lets anyone either create and sell information products as a publisher, or sell other people's products for a commission as an affiliate. ClickBank handles the credit card transactions and refunds, while affiliates can earn as much as 90% of a product's price as commission. It's a pretty easy system to use, and I have used it both as a publisher and as an affiliate to make significant amounts of money online.

    The script I have today is a Python program that uses ClickBank's REST API to download the latest transactions for your affiliate IDs and stuffs the data into an SQLite database.

    The reason for doing this is that it keeps the data under your control and lets you see all of the transactions for all your accounts in one place, without constantly having to go to clickbank.com and log in to each account. I'm going to be including this data in my Business Intelligence Dashboard application.

    One of the new things I did while writing this script was make use of SQLAlchemy to abstract the database layer. This means it should be trivial to convert it over to MySQL – just change the connection string.
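
    For example, pointing the script at MySQL instead of SQLite should just be a different connection string. A quick sketch – the database name and credentials here are placeholders, and it assumes the MySQLdb driver is installed:

    CONNSTRING = 'mysql://username:password@localhost/clickbank'  # hypothetical MySQL database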

    Also, note that to use this script you'll need the “Clerk API Key” and the “Developer API Key” from your ClickBank account. To generate those keys, go to the Account Settings tab on the account dashboard. If you have more than one affiliate ID, you'll need one Clerk API Key per affiliate ID.

    This is the biggest script I have shared on this site yet. I hope someone finds it useful.

    Here’s the code:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # (C) 2009 HalOtis Marketing
    # written by Matt Warren
    # https://www.mattwarren.co/
    
    import csv
    import httplib
    import logging
    from datetime import datetime
    
    from sqlalchemy import Table, Column, Integer, String, MetaData, Date, DateTime, Float
    from sqlalchemy.schema import UniqueConstraint
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    
    LOG_FILENAME = 'ClickbankLoader.log'
    logging.basicConfig(filename=LOG_FILENAME,level=logging.DEBUG,filemode='w')
    
    #generate these keys in the Account Settings area of ClickBank when you log in.
    ACCOUNTS = [{'account':'YOUR_AFFILIATE_ID',  'API_key': 'YOUR_API_KEY' },]
    DEV_API_KEY = 'YOUR_DEV_KEY'
    
    CONNSTRING='sqlite:///clickbank_stats.sqlite'
    
    Base = declarative_base()
    class ClickBankList(Base):
        __tablename__ = 'clickbanklist'
        __table_args__ = (UniqueConstraint('date','receipt','item'),{})
    
        id                 = Column(Integer, primary_key=True)
        account            = Column(String)
        processedPayments  = Column(Integer)
        status             = Column(String)
        futurePayments     = Column(Integer)
        firstName          = Column(String)
        state              = Column(String)
        promo              = Column(String)
        country            = Column(String)
        receipt            = Column(String)
        pmtType            = Column(String)
        site               = Column(String)
        currency           = Column(String)
        item               = Column(String)
        amount             = Column(Float)
        txnType            = Column(String)
        affi               = Column(String)
        lastName           = Column(String)
        date               = Column(DateTime)
        rebillAmount       = Column(Float)
        nextPaymentDate    = Column(DateTime)
        email              = Column(String)
        
        format = '%Y-%m-%dT%H:%M:%S'
        
        def __init__(self, account, processedPayments, status, futurePayments, firstName, state, promo, country, receipt, pmtType, site, currency, item, amount , txnType, affi, lastName, date, rebillAmount, nextPaymentDate, email):
            self.account            = account
            if processedPayments != '':
                self.processedPayments  = processedPayments
            self.status             = status
            if futurePayments != '':
                self.futurePayments     = futurePayments
            self.firstName          = firstName
            self.state              = state
            self.promo              = promo
            self.country            = country
            self.receipt            = receipt
            self.pmtType            = pmtType
            self.site               = site
            self.currency           = currency
            self.item               = item
            if amount != '':
                self.amount             = amount
            self.txnType            = txnType
            self.affi               = affi
            self.lastName           = lastName
            self.date               = datetime.strptime(date[:19], self.format)
            if rebillAmount != '':
                self.rebillAmount       = rebillAmount
            if nextPaymentDate != '':
                self.nextPaymentDate    = datetime.strptime(nextPaymentDate[:19], self.format)
            self.email              = email
    
        def __repr__(self):
            return "" % (self.account, self.date, self.receipt, self.item)
    
    def get_clickbank_list(API_key, DEV_key):
        """Fetch the latest transactions for one account as CSV from ClickBank's REST API."""
        conn = httplib.HTTPSConnection('api.clickbank.com')
        conn.putrequest('GET', '/rest/1.0/orders/list')
        conn.putheader("Accept", 'text/csv')
        conn.putheader("Authorization", DEV_key+':'+API_key)
        conn.endheaders()
        response = conn.getresponse()
        
        if response.status != 200:
            logging.error('HTTP error %s %s' % (response.status, response.reason))
            raise Exception(response)
        
        csv_data = response.read()
        
        return csv_data
    
    def load_clickbanklist(csv_data, account, dbconnection=CONNSTRING, echo=False):
        """Parse the CSV data and insert any transactions not already in the database."""
        engine = create_engine(dbconnection, echo=echo)
    
        metadata = Base.metadata
        metadata.create_all(engine) 
    
        Session = sessionmaker(bind=engine)
        session = Session()
    
        data = csv.DictReader(iter(csv_data.split('\n')))
    
        for d in data:
            item = ClickBankList(account, **d)
            #check for duplicates before inserting
            checkitem = session.query(ClickBankList).filter_by(date=item.date, receipt=item.receipt, item=item.item).all()
        
            if not checkitem:
                logging.info('inserting new transaction %s' % item)
                session.add(item)
    
        session.commit()
        
    if  __name__=='__main__':
        try:
            for account in ACCOUNTS:
                csv_data = get_clickbank_list(account['API_key'], DEV_API_KEY)
                load_clickbanklist(csv_data, account['account'])
        except:
            logging.exception('Crashed')
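
    Once the data is in SQLite you can query it back out with the same SQLAlchemy model. Here's a minimal sketch, assuming it's appended to the bottom of the script above (so ClickBankList and CONNSTRING are already defined), that prints the ten most recent transactions across all accounts:

    engine = create_engine(CONNSTRING)
    Session = sessionmaker(bind=engine)
    session = Session()

    # ten most recent transactions, newest first
    for txn in session.query(ClickBankList).order_by(ClickBankList.date.desc()).limit(10):
        print txn.account, txn.date, txn.item, txn.amount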
    
  • Scrape Google Search Results Page

    Here’s a short script that will scrape the first 100 listings in the Google Organic results.

    You might want to use this to find the position of your sites and track their rankings for certain target keyword phrases over time. That could be a very good way to determine, for example, whether your SEO efforts are working. Or you could use the list of URLs as a starting point for some other web crawling activity.

    As the script is written it will just dump the list of URLs to a txt file.

    It uses the BeautifulSoup library to help with parsing the HTML page.

    Example Usage:

    $ python GoogleScrape.py
    $ cat links.txt
    
    https://www.mattwarren.co/2009/07/01/rss-twitter-bot-in-python/
    http://www.blogcatalog.com/blogs/halotis.html
    http://www.blogcatalog.com/topic/sqlite/
    http://ieeexplore.ieee.org/iel5/10358/32956/01543043.pdf?arnumber=1543043
    http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1543043
    http://doi.ieeecomputersociety.org/10.1109/DATE.2001.915065
    http://rapidlibrary.com/index.php?q=hal+otis
    http://www.tagza.com/Software/Video_tutorial_-_URL_re-directing_software-___HalOtis/
    http://portal.acm.org/citation.cfm?id=367328
    http://ag.arizona.edu/herbarium/db/get_taxon.php?id=20605&show_desc=1
    http://www.plantsystematics.org/taxpage/0/genus/Halotis.html
    http://www.mattwarren.name/
    http://www.mattwarren.name/2009/07/31/net-worth-update-3-5/
    http://newweightlossdiet.com/privacy.php
    http://www.ingentaconnect.com/content/nisc/sajms/1988/00000006/00000001/art00002?crawler=true
    http://www.ingentaconnect.com/content/nisc/sajms/2000/00000022/00000001/art00013?crawler=true
    ......
    $

    Here’s the script:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # (C) 2009 HalOtis Marketing
    # written by Matt Warren
    # https://www.mattwarren.co/
    
    import urllib,urllib2
    
    from BeautifulSoup import BeautifulSoup
    
    def google_grab(query):
    
        # request the first 100 organic results for the query in a single page
        address = "http://www.google.com/search?q=%s&num=100&hl=en&start=0" % (urllib.quote_plus(query))
        request = urllib2.Request(address, None, {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'} )
        urlfile = urllib2.urlopen(request)
        page = urlfile.read(200000)
        urlfile.close()

        soup = BeautifulSoup(page)
        # the organic result links are the anchors Google marks with class 'l'
        links = [x['href'] for x in soup.findAll('a', attrs={'class':'l'})]
        
        return links
    
    if __name__=='__main__':
        # Example: Search written to file
        links = google_grab('halotis')
        open("links.txt","w+b").write("\n".join(links))
    
  • Google Page Rank Python Script

    This isn’t my script but I thought it would appeal to the readers of this blog. It’s a script that will look up the Google PageRank for any website, using the same interface as the Google Toolbar to do it. I’d like to thank Fred Cirera for writing it, and you can check out his blog post about this script here.

    I’m not exactly sure what I would use this for, but it might have applications for anyone who wants to do some really advanced SEO work and find a real way to accomplish PageRank sculpting – perhaps finding the best websites to put links on.

    The reason it is such an involved bit of math is that it needs to compute a checksum in order to work. It should be pretty reliable since it doesn’t involve any scraping.

    Example usage:

    $ python pagerank.py http://www.google.com/
    PageRank: 10	URL: http://www.google.com/
    
    $ python pagerank.py http://www.mozilla.org/
    PageRank: 9	URL: http://www.mozilla.org/
    
    $ python pagerank.py https://www.mattwarren.co/
    PageRank: 3	URL: https://www.mattwarren.co/
    

    And the script:

    #!/usr/bin/env python
    #
    #  Script for getting Google Page Rank of page
    #  Google Toolbar 3.0.x/4.0.x Pagerank Checksum Algorithm
    #
    #  original from http://pagerank.gamesaga.net/
    #  this version was adapted from http://www.djangosnippets.org/snippets/221/
    #  by Corey Goldberg - 2010
    #
    #  Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
    
    
    
    import sys
    import urllib
    
    
    def get_pagerank(url):
        hsh = check_hash(hash_url(url))
        gurl = 'http://www.google.com/search?client=navclient-auto&features=Rank:&q=info:%s&ch=%s' % (urllib.quote(url), hsh)
        try:
            f = urllib.urlopen(gurl)
            rank = f.read().strip()[9:]
        except Exception:
            rank = 'N/A'
        if rank == '':
            rank = '0'
        return rank
        
        
    def  int_str(string, integer, factor):
        for i in range(len(string)) :
            integer *= factor
            integer &= 0xFFFFFFFF
            integer += ord(string[i])
        return integer
    
    
    def hash_url(string):
        c1 = int_str(string, 0x1505, 0x21)
        c2 = int_str(string, 0, 0x1003F)
    
        c1 >>= 2
        c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
        c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
        c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)
    
        t1 = (c1 & 0x3C0) < < 4
        t1 |= c1 & 0x3C
        t1 = (t1 << 2) | (c2 & 0xF0F)
    
        t2 = (c1 & 0xFFFFC000) << 4
        t2 |= c1 & 0x3C00
        t2 = (t2 << 0xA) | (c2 & 0xF0F0000)
    
        return (t1 | t2)
    
    
    def check_hash(hash_int):
        hash_str = '%u' % (hash_int)
        flag = 0
        check_byte = 0
    
        i = len(hash_str) - 1
        while i >= 0:
            byte = int(hash_str[i])
            if 1 == (flag % 2):
                byte *= 2
                byte = byte / 10 + byte % 10
            check_byte += byte
            flag += 1
            i -= 1
    
        check_byte %= 10
        if 0 != check_byte:
            check_byte = 10 - check_byte
            if 1 == flag % 2:
                if 1 == check_byte % 2:
                    check_byte += 9
                check_byte >>= 1
    
        return '7' + str(check_byte) + hash_str
    
    
    
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            url = 'http://www.google.com/'
        else:
            url = sys.argv[1]
    
        print get_pagerank(url)
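
    If you wanted to check a handful of sites in one go, a quick sketch like this would do it – the list of URLs is just a placeholder, and it assumes the script above has been saved as pagerank.py:

    from pagerank import get_pagerank  # the script above, saved as pagerank.py

    # placeholder list of sites to check
    urls = [
        'http://www.google.com/',
        'http://www.mozilla.org/',
        'https://www.mattwarren.co/',
    ]

    for url in urls:
        print 'PageRank: %s\tURL: %s' % (get_pagerank(url), url)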
    
  • Targeting Twitter Trends Script

    I noticed that several accounts are spamming the twitter trends. Go to twitter.com and select one of the trends in the right column. You’ll undoubtedly see some tweets that are blatantly inserting words from the trending topics list into unrelated ads.

    I was curious just how easy it would be to get the trending topics to target them with tweets. Turns out it is amazingly simple and shows off some of the beauty of Python.

    This script doesn’t actually do anything with the trend information; it simply downloads and prints out the list. But combine this code with the sample code from RSS Twitter Bot in Python and you’ll have a recipe for some seriously powerful promotion (there’s a rough sketch of that idea after the code below).

    import simplejson  # http://undefined.org/python/#simplejson
    import urllib
    
    # fetch the current trending topics from Twitter's search API
    result = simplejson.load(urllib.urlopen('http://search.twitter.com/trends.json'))
    
    print [trend['name'] for trend in result['trends']]
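
    To give an idea of how this could fit together with the RSS Twitter Bot code, here's a rough sketch that just composes a status update around the current top trend. The promotional link is a placeholder, and the final posting step is left as a print since the actual posting code lives in the other post:

    import simplejson
    import urllib

    # grab the current trending topics
    result = simplejson.load(urllib.urlopen('http://search.twitter.com/trends.json'))
    trends = [trend['name'] for trend in result['trends']]

    if trends:
        # work the top trend into an otherwise ordinary promotional tweet
        status = '%s - check out http://example.com/' % trends[0]
        print status[:140]  # keep it under Twitter's 140 character limit; posting would go through the bot code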