Author: Matt

  • Bot Trading Final Thoughts

    My software project for February was a stock trading bot that made trades based on Twitter feeds. It was an interesting project that gave me a chance to learn a few new things.

    I had the chance to apply a couple of machine learning techniques, in particular using natural language processing to perform sentiment analysis and named entity recognition. There is still a lot to improve on with the tools available in these areas, but I was left with the impression that we are in the midst of a big shift in how AI algorithms will be applied to real-world applications.

    To give a more concrete example: I started working on the named entity recognition by trying to apply the basic tooling that comes with the NLTK library in Python, but this proved far too complex to get working without first taking a computational linguistics course. My next attempt was to use the Stanford NER model, commonly cited as the state of the art; however, out of the box it lacks the training to be useful, was still overly complex to work with, and gave bad results in some simple test cases. The final approach I took was to use the Google natural language APIs, which were brilliant by comparison. Google is able to tie its entity matching to its knowledge graph, and as a result a phrase like ‘apple had a great 3rd quarter’ can identify ‘apple’ as the company, which links to its Wikipedia page, the names of its executives, and every bit of information Google has about the company.

    Admittedly, working with the knowledge graph is complicated, but the ability to pull things together in that way is stunning.
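
    For illustration, here is roughly what that entity and sentiment lookup looks like with the google-cloud-language Python client. The client library has changed since this project was written, so treat this as a sketch rather than the code I used:

    # A minimal sketch using the Google Cloud Natural Language client
    # (pip install google-cloud-language).
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()

    text = "apple had a great 3rd quarter"
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )

    # Entity analysis ties mentions to the knowledge graph; for known
    # companies the metadata often includes a wikipedia_url and a mid.
    for entity in client.analyze_entities(document=document).entities:
        print(entity.name, entity.type_.name, dict(entity.metadata))

    # Sentiment score ranges from -1.0 (negative) to 1.0 (positive).
    sentiment = client.analyze_sentiment(document=document).document_sentiment
    print(sentiment.score, sentiment.magnitude)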

    I think that as more of these AI algorithms become exposed as trained Software-as-a-Service APIs, you’ll start to see more regular developers embedding them into the applications that we all use every day.

    The leverage provided by software and its distribution model allows for disruptions to happen very quickly. The current bottleneck, in my opinion, is a lack of people with the skills required to build these things from scratch. Regular software shops don’t have the vast training data needed to make smart machine-learning-based algorithms, nor do they have the in-house expertise to apply the latest developments in deep learning. With Google, Microsoft, Amazon and Baidu all announcing machine learning APIs usable by junior developers, we will start to see a lot more intelligence in the software we use every day. And as more people become aware of what is and is not possible, the application of these techniques could explode.

    The future will be interesting.

  • Revisiting AdSense

    I have been writing for my blogs for over 15 years now.  A lot has changed with the internet in that time, and over the years I have investigated many ways to let my websites pay for themselves.

    When AdSense was first announced I jumped on it, only to find that it paid almost nothing to publishers like myself. At just pennies at a time, it would have taken years, given the traffic on my sites, for Google to cut me a cheque. In all the time I ran ads I think I only ever received a payment once.

    The dismal returns eventually made me realize that the ads were more detrimental to my readers than they were worth.

    However, recently I have been on a mission to cut my server budget, and part of that effort is to get my websites to pay their own way or else risk being shut down. I started by experimenting with adding AdSense back to one of my blogs and found, to my surprise, that it earns enough to pay for its hosting costs.

    I have extended ads onto this website so that I pay less out of my own pocket to keep it running. Hopefully keeping these sites financially sustainable will be beneficial to me and the readers alike.

  • Programming for Impact

    One of my ambitious goals for the year is to take on 12 programming projects. The projects I’ll undertake for this challenge have a few requirements:

    1. It should be something that I can complete in one month of calendar time.
    2. It should take less than 40 hours of effort.
    3. It should ideally challenge me to learn something new.
    4. It should maximize impact for someone other than myself.

    Trying to think of project ideas that can have a big impact with less than 40 hours of effort is not easy. Luckily, Kenneth Reitz serves as a model for how to accomplish this. He is best known for authoring the awesome requests library, which is ubiquitous, and lately he’s been on a tear:

    • typy.io – a service for sharing text snippets
    • pipenv – a wrapper that combines pip and virtual envs
    • maya – datetimes for humans
    • saythanks.io – a way to send thank-you notes to open-source developers

    These are potentially high-impact projects compared to the effort required to create them. It’s an approach to open-source development that I wish more people took, and something I will try to replicate this year with my own projects. Flooding the community with high-impact contributions enriches us all.

    Impact can be a trade-off. A large impact can come from a small improvement for many people or a large benefit for a small number of people. And of course these are all relative to what leverage you have to help: the audience Google can have an impact on is vastly different from the number of people I can reach. So I’m trying to be realistic about what kind of impact a 40-hour project can have.

    Optimizing for impact seems to be a great goal. It re-frames the importance of a project: if I could do something in 40 hours that would double someone else’s business, it might be a worthy project to consider. If I could contribute a wrapper for an API that is used by thousands of developers, it could have an even wider impact.

    Scratching your own itch is the common motivation for open-source development, but in a sense it is inward-focused and self-serving. In a world where we increasingly don’t talk to our neighbors or contribute to our communities, doing things to help others sounds radical.

    Be radical, make an impact.

  • Writing a Twitter Stock Trading Bot

    My project for the month is a stock trading bot that will ingest tweets from accounts I consider to be market influencers, then do some parsing and sentiment analysis to create and execute a trade through my broker.

    For years I’ve wanted to build something to do automated trading, and this seems simple enough to accomplish in a month. That makes it a worthy experiment.

    There are several steps to this process:

    1. connect to the Twitter streaming API and listen to specific user accounts
    2. for each tweet that comes in, parse it for a company name or CEO name
    3. if there is a company or CEO mentioned, find the ticker symbol
    4. run a sentiment analysis on the tweet
    5. look up the current price of the stock
    6. decide on a trade direction (long/short), size, limits and a stop loss
    7. execute the trade through the broker

    This project will be open-source for those of you interested in watching the progress or curious to see how it works: Twitter Trading Project.
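
    To make those steps concrete, here is a minimal sketch of the pipeline skeleton, assuming the tweepy 3.x streaming API; the helper functions are hypothetical stubs standing in for the entity lookup, sentiment scoring, and broker integration:

    import tweepy

    # Credentials and account IDs below are placeholders.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

    INFLUENCER_IDS = ["25073877"]  # Twitter user IDs to follow (example value)

    def find_ticker(text):
        # Steps 2-3: entity recognition and ticker-symbol lookup go here.
        return "AAPL" if "apple" in text.lower() else None

    def score_sentiment(text):
        # Step 4: a real implementation would call an NLP service.
        return 0.8

    class TradeListener(tweepy.StreamListener):
        def on_status(self, status):
            ticker = find_ticker(status.text)
            if ticker is None:
                return
            sentiment = score_sentiment(status.text)
            side = "long" if sentiment > 0 else "short"
            # Steps 5-7: price lookup, sizing and broker execution go here.
            print(ticker, side, sentiment)

    stream = tweepy.Stream(auth=auth, listener=TradeListener())
    stream.filter(follow=INFLUENCER_IDS)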

  • Programming Deliberate Practice

    Among software developers there is an unhealthy prevailing belief that being a great programmer is some innate skill that others have – that brilliance at writing code is difficult to train for because it requires either a gift you don’t have or years of on-the-job experience. The result is a large amount of impostor syndrome within the community, which is neither healthy nor productive.

    Of course, people who are top developers know that it takes a lot of hard work to understand core concepts. It helps to have a mentor, a solid education and access to training.

    There is a tactic for getting better that more programmers should be using. Deliberate practice is the most critical aspect of improving any craft, and programming is no exception. Like playing piano, painting or ceramics, it involves creativity and technical skill, both of which can be improved with deliberate practice.

    If you want to get better at your craft it is not good enough to simply work on job tasks. At work you typically do something once and then it’s done; there are few opportunities for repetition and critical evaluation. If you were learning piano, the equivalent of the developer workflow would be to play through a song once, stopping to go back and fix your mistakes, and then put the song away and move on to a new one. Practising piano actually requires playing the same song hundreds of times. You start by focusing on not making mistakes; when that is accomplished, you practise making the song sound good with appropriate pedal usage, tempo and dynamics; and when that is good enough, you can keep practising the same song and add your own touches – arpeggios, slurs, delays and so on.

    How many times have you implemented a deck of cards? Can you write one top to bottom without looking up examples on Stack Overflow, querying the documentation or searching through code-completion lists? Could you write a deck of cards in procedural, functional and object-oriented styles? Could you metaprogram a deck of cards? Could you make a deck that is thread-safe? Distributed? Web scale? Obfuscated?
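
    As one possible baseline for the exercise, here is a small object-oriented sketch in Python – the kind of thing worth being able to write from memory:

    import random

    SUITS = ["clubs", "diamonds", "hearts", "spades"]
    RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

    class Deck:
        def __init__(self):
            # Build all 52 cards as (rank, suit) tuples.
            self.cards = [(rank, suit) for suit in SUITS for rank in RANKS]

        def shuffle(self):
            random.shuffle(self.cards)

        def deal(self, n=1):
            # Remove and return n cards from the top of the deck.
            dealt, self.cards = self.cards[:n], self.cards[n:]
            return dealt

    deck = Deck()
    deck.shuffle()
    print(deck.deal(5))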

    Practice. Do it often, and do it deliberately.

  • Programming Momentum

    Writing code everyday has been an interesting challenge.

    In 2015 I started working towards a long streak on GitHub, which eventually capped out at 250 days. The questions I wanted to answer were:

    • Can I apply ‘deliberate practice’ to programming and get better?
    • Can ‘free coding’ (like free writing) be an effective way to push through writer’s block?
    • How important is memorising to your coding performance?
    • If syntax and API unknowns don’t bottleneck my flow, how fast can I translate an idea into code – can it be limited by typing speed?

    I started a repository for my daily coding. It had a simple script to generate a blank file every day for me to code in, and I would try to code something. Sometimes it would be to explore a new Python module, fiddle with syntax, challenge myself with a Rosetta Code example or replicate a previous day’s code from memory. I wrote dozens of Flask apps, memorised the methods of lots of APIs, and gained a level of confidence writing Python that I don’t have with any other language.
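
    The generator doesn’t need to be more than a couple of lines; a minimal sketch (the dated-filename convention here is my reconstruction, not the original script):

    from datetime import date
    from pathlib import Path

    # Create an empty dated file like 2017-02-14.py if it doesn't exist yet.
    Path(f"{date.today().isoformat()}.py").touch()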

    At the end of the streak I had a repository with hundreds of small scripts. Only a handful of them were multi-day efforts or had any real value. Still, the variety of the collection proved useful on its own – several times I have referred back to these examples to help with my actual work and to copy/paste snippets from. Some of them started me down a path of exploration, like calculating the return on investment for solar panels.

    Part of what enabled me to maintain this streak as long as I did was a simple script I wrote to check GitHub for daily activity and email me if I hadn’t yet committed any code. That small nudge was enough of a reminder to keep me focused even when I was otherwise distracted.
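
    In spirit it looked something like this sketch, which polls GitHub’s public events feed and emails a reminder; the username, addresses and SMTP relay are placeholders:

    import smtplib
    from datetime import date
    from email.message import EmailMessage

    import requests

    USER = "your-github-username"  # placeholder

    # GitHub's public events feed; each event has a created_at timestamp.
    events = requests.get(f"https://api.github.com/users/{USER}/events/public").json()
    today = date.today().isoformat()
    pushed_today = any(e["created_at"].startswith(today) for e in events)

    if not pushed_today:
        msg = EmailMessage()
        msg["Subject"] = "No commits yet today!"
        msg["From"] = msg["To"] = "you@example.com"  # placeholder
        msg.set_content("Your GitHub streak is at risk - go write some code.")
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
            smtp.send_message(msg)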

    This past week I turned that script into a web service anyone can use. CodeStreak.io will watch your public GitHub activity and email or SMS you if you haven’t yet pushed any code for the day. It is the first of my 2017 projects, built to expand on my previous streak.

    In 2017 I want to build 12 projects. Each should be roughly 10–20 hours of effort and result in something that provides value for other people. CodeStreak.io is an example of the kind of project I want to undertake this coming year, but it is also a tool to help ensure that the momentum is sustained for 12 months. Blocking out four-hour chunks of time is a helpful way to really focus and be productive, but four hours once per week has been (for me) too sparse to maintain interest in something long enough to finish it. A little bit every day keeps a project on your mind, and attempting to maintain a streak will be a tool to power through the bits that are otherwise uninteresting or difficult. CodeStreak.io is a foundational tool for accomplishing my 2017 goal.

    The questions I want to explore with this new goal are:

    1. Without a concern for generating revenue, can I just write cool things and get them out there?
    2. Can I get deeper into something new and create something useful out of it with less than 20 hours of effort?
    3. Can I get good at seeing a project through from start to finish – what skills or traits will improve the odds?

    Hopefully, I’ll have some answers at the end of 2017.

  • Deploying a Python Flask App on Amazon Lightsail with Dokku

    One of the welcome additions to Amazon’s AWS offerings is Lightsail, a simplified server provisioning service that competes directly with DigitalOcean. Lightsail provides a nicer web UI for launching instances, many quick-launch options for common apps like WordPress or GitLab, and simplified billing (yay!). With Lightsail you don’t need to pre-pay for Reserved Instances to get a good price on an EC2 server.

    Dokku is a mini Heroku you can run on your own servers. It uses the same buildpacks Heroku does to enable git push deployments, and because it is built on top of Docker, a collection of available Dokku plugins makes it easy to stand up databases, caching and other services. In this tutorial I add PostgreSQL and get an SSL cert using Let’s Encrypt.

    Together, Lightsail and Dokku create an easy way to manage your application deployment on an inexpensive server.

    Get started on Lightsail by launching a new virtual server and selecting an Ubuntu image.

    In the launch form there’s a spot for ‘Add launch script’ where you can drop in these commands to automatically install Dokku on first boot:

    wget https://raw.githubusercontent.com/dokku/dokku/v0.7.2/bootstrap.sh
    sudo DOKKU_TAG=v0.7.2 bash bootstrap.sh

    Give it a name and press Create to start booting the server. You should be able to SSH to the new server very quickly, though you can connect before Dokku and the package updates have been applied (it’ll take a couple of minutes for the dokku command to become available).

    After a couple of minutes have passed and everything is installed and running, visit your server’s public IP in a web browser to reach the Dokku setup page.

    For the public key you’ll want to grab the key from your computer. If you’re on Linux or macOS you can grab the contents of ~/.ssh/id_rsa.pub. If you need to generate a key, there’s a good how-to on GitHub about generating them.

    Set the hostname you’ll use for the server, if you have one, and press Finish Setup.

    The next step is to SSH to the server and fiddle with it there, using the private key you can download from Lightsail:

    ssh -i LightsailDefaultPrivateKey.pem ubuntu@<YOUR PUBLIC IP ADDRESS>

    And create the app you will be deploying:

    dokku apps:create your-app

    Add a Postgres database (there are other great plugins available for Dokku too):

    sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git
    dokku postgres:create database-name
    dokku postgres:link database-name your-app
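
    The postgres:link step exposes the connection string to the app as a DATABASE_URL environment variable. For the push itself, your repo needs an app Dokku can build; here is a minimal Flask sketch, assuming a requirements.txt with flask and gunicorn and a Procfile containing web: gunicorn app:app:

    import os

    from flask import Flask

    app = Flask(__name__)

    # dokku postgres:link exposes the connection string as DATABASE_URL.
    DATABASE_URL = os.environ.get("DATABASE_URL", "")

    @app.route("/")
    def index():
        return "Hello from Dokku on Lightsail!"

    if __name__ == "__main__":
        app.run()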

    Now, back in your local app, add a git remote for the new server and deploy it:

    git remote add dokku dokku@<PUBLIC IP OR HOSTNAME>:your-app
    git push dokku master

    If that is successful, the project should be visible online. Yay!

    Then a few next steps help round out the app. Set any environment variables you need:

    dokku config:set your-app ENV=prod

    You can install an SSL cert using Let’s Encrypt very easily:

    sudo dokku plugin:install https://github.com/dokku/dokku-letsencrypt.git
    dokku letsencrypt your-app
    

    You can configure pre- and post-deploy hooks in an app.json file in your project repository to run checks or execute database migrations.
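
    For example, a hook that runs database migrations before each deploy might look like this in app.json (the commands are placeholders for whatever your app needs):

    {
      "scripts": {
        "dokku": {
          "predeploy": "python manage.py migrate",
          "postdeploy": "python scripts/smoke_test.py"
        }
      }
    }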

    That’s about it! From here, git push deploys update your project whenever you want.

  • Programmatic Models

    Recently I’ve been interested in finding a business investment – something like a B&B that would let me put some of my retirement savings into a business whose success I have some control over. The normal process for something like this would be to write a business plan, or at least do some back-of-the-envelope estimations of how much revenue to expect from the property.

    The usual tool of choice is a spreadsheet, and spreadsheets are an excellent way to work through the numbers and see things visually. However, their flexibility is somewhat limited for more advanced analysis.

    I wanted to take things to a different level.

    What information could I get from looking at the market and scraping web pages that I could feed into a bigger model of how other owners of similar businesses are doing? By pulling in 1,000+ comparables and running them all through a similar model to estimate each one’s profitability, it becomes possible to identify the traits of a successful business.

    Applying this sort of ‘big data’ analysis is proving interesting.  There is an amazing amount of information freely available on the internet, but much of it exists in different silos.

    In the example of running a B&B, there are lots of them listed on booking.com and similar travel booking sites. These provide a partial picture of how popular a place is (from its availability) and of its revenue (from the cost to stay there). Another big piece of the picture is costs, which you can estimate by checking real-estate listings. By putting all this information together you can see many interesting things.
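
    A toy version of that per-property model might look like the sketch below; the columns and cost ratios are hypothetical stand-ins for scraped data:

    import pandas as pd

    # Hypothetical scraped data: one row per comparable B&B.
    comparables = pd.DataFrame({
        "rooms": [4, 6, 10],
        "nightly_rate": [120.0, 95.0, 150.0],
        "occupancy": [0.55, 0.70, 0.45],            # estimated from availability
        "asking_price": [450000, 600000, 1200000],  # from real-estate listings
    })

    # Crude annual model: revenue from occupied room-nights minus rough
    # operating costs and cost of capital (assumed ratios, for illustration).
    revenue = comparables.rooms * comparables.nightly_rate * comparables.occupancy * 365
    operating_costs = 0.45 * revenue            # assumed 45% operating cost ratio
    interest = 0.04 * comparables.asking_price  # assumed 4% cost of capital

    comparables["est_profit"] = revenue - operating_costs - interest
    print(comparables.sort_values("est_profit", ascending=False))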

    If your model is accurate, then you can get answers to questions like these:

    • What percentage of B&Bs turn a profit each year?
    • Is there an optimal size or number of rooms?
    • Which attributes of a property correlate most with its profitability?

    You can take a deeper dive into the best-performing properties to see if they do something unique: do they have nicer websites and photos? Do they advertise aggressively? Are they active on social media? Answers to these questions can help you find the strategies that are working best in the market – and, perhaps, the things that are a waste of time.

    This type of analysis is something I think more people should be doing. It provides a competitive advantage in terms of the information you bring with you into a potentially big investment, and it reduces the risk that you inadvertently buy a lemon.


  • Small Projects

    There’s nothing quite like the feeling of starting a new project and seeing it all the way through to finished and published. It’s a feather in your cap that you can look back on and say “I built that”. Regardless of whether it’s a big hit or not, it will make you stand out – very few people get something all the way to done on their own.

    Ambition can work against you here. The larger the project, the more opportunities there are to hit roadblocks that derail it. The size of a project is a risk that should be minimized.

    That’s why I believe it’s important to create momentum with smaller projects.  A small win still gives you a great amount of confidence.

    This applies to home projects, code projects and hobbies alike.

    Small is a relative term. You may be able to handle a small 40-hour project while someone else cannot yet tackle something that big. Small may be as simple as fixing a wall hook or creating a pull request to fix a typo in the documentation of an open-source project.

    By putting a lot of these small projects together you create something bigger than their sum. Fixing all the small things around your house can turn it into a relaxing home; contributing to open-source projects could gain you some notoriety and help you land a dream job.

    Derek Sivers said “the best option is the one with the most options”, and doing many small projects gives you more options than one big one does.

    37signals (now Basecamp) started out with 6–10 individual products. They didn’t know at the outset which would be a success, so creating many smaller ones diversified their risk and helped them succeed.

    Small projects are going to be a core part of my strategy for 2017: launching micro-sites, simple tools, or open-source libraries that can be finished in 8–10 hours of effort.

    Think small, get out there, and finish it.  It’s a step to something bigger.

  • Applying Machine Learning Lessons to Humans

    The more I learn about deep learning and other machine learning concepts, the more intrigued I am by the idea that we could apply some of what we learn about how these ML models behave back onto human psychology. This is not something I have heard discussed yet. These models were, after all, roughly inspired by how our own neurons work and could be considered a crude model of how we work.

    What are some of the behaviours and lessons we’ve learned from training AIs that could be applicable to how we learn, for example? AIs are obviously dramatic simplifications of our own minds, but they learn in similar ways.

    Machine learning algorithms can be divided into supervised and unsupervised learning models. They are not equivalent, and the things you can do with one are not possible with the other. Would it be helpful to identify topics in school that map to each approach, so that we can optimise how we teach them?
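
    As a minimal illustration of the distinction, here the same dataset is handed to a classifier with labels (supervised) and to a clustering algorithm without them (unsupervised), using scikit-learn:

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Supervised: learns a mapping from features to the known labels.
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print("supervised accuracy:", clf.score(X, y))

    # Unsupervised: sees only the features and finds structure on its own.
    km = KMeans(n_clusters=3, n_init=10).fit(X)
    print("cluster assignments:", km.labels_[:10])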

    A concrete example of this is how we learn a new language. A common suggestion for language learning is to immerse yourself in it; to that end, people will listen to radio and music in their target language. Is that an effective way to improve your understanding? It is mostly unsupervised, since we have no answers for what a particular sound we hear might mean (unless we can guess from the context of other words we already know). If we fed 10,000 hours of voice recordings into an unsupervised machine learning algorithm, what would it be able to learn? It might pick up some common words or phrases, and it might find words that are often used close together. It would get a feel for the ‘sound’ of the language. But that is likely as deep an understanding as it could build.

    Given this insight, we could hypothesise that immersing yourself in recorded audio alone is not particularly effective for learning what words mean.

    If we want to teach a computer to hear a word and turn it into text, we need both the sounds and the matching text. This is a supervised approach and can be quite effective. However, we know it is far more effective with lots and lots of training data. For a particular word it helps to have it spoken by many different people, quickly and slowly, in varying pitches and accents. The more examples we train on, the better the accuracy. You’ve probably experienced listening to a song and hearing a word you can’t quite make out. You listen over and over but still can’t get it. Then someone tells you what the lyric is, or you hear a different recording of the song, and suddenly it becomes crystal clear. Now you can hear it.

    Given this, perhaps we should ensure that computer-based language training programs don’t just replay the same recorded words over and over again, but instead offer lots of variation. It would be an interesting experiment to have a one-page story in your target language recorded by 10–20 different people. Would listening and reading along to all the recordings help your listening comprehension? How much better would you learn from 20 different recordings than from one recording played 20 times?

    Several studies have looked at the effect of same-language closed captioning on reading and listening comprehension and found that it can help – a similar application of supervised learning to people.

    Another area that generates much concern in machine learning is how to identify and prevent over-training. Over-training happens when the algorithm essentially memorises the answers and then has difficulty with new input it hasn’t seen. There are testing techniques used to diagnose it; one such approach is to separate the training data from the testing data. Determining whether students have memorised the answers or really understand a concept is just as critical to their ability to move forward and build on those lessons. Could we apply our machine approaches to humans to help identify memorisation versus understanding?
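
    That diagnostic is easy to sketch with scikit-learn: a model that scores far better on the data it trained on than on held-out data has likely memorised rather than generalised.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Hold out a test set the model never sees during training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # An unconstrained tree can memorise the training data perfectly.
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # A large gap between these two scores is the signature of over-training.
    print("train accuracy:", model.score(X_train, y_train))
    print("test accuracy:", model.score(X_test, y_test))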

    I’m sure there are more fascinating ways we could take what we have learned from teaching machines and apply it to how we teach people.