Codemash Friday 9:45a – F# and Machine Learning – A Winning Combination


F# and Machine Learning – A Winning Combination
#Mathias Brandewinder

A fellow accidental developer with an operations research background!

Data Scientist – person who is better at statistics than any software engineer and better at software engineering than any statistician. (from a tweet)

.Net underrepresented in data science/machine learning because it is perceived (possibly correctly) as not being a very good fit. But F# can be, so we are going to talk about it.

Machine Learning
* Writing a program to perform a task
* More data, better performance
* Not explicitly programmed for this [no code change]

i.e. the program gets better all by itself

Classification and Regression
Classification – using data to classify items (spam v. ham for example)
Regression – predicting a number (price given some attributes for example)
Both are part of the concept Supervised Learning – you know what question you are trying to answer and you use data to fit a predictive model

Support Vector Machine example – classic algorithm – we need to look it up
Using the Accord.Net library

F# has a REPL, which makes it easier to iteratively tweek algorithm type problems. Load up the data set and then keep running different variations of the application against it without reloading. When working with lots of data this can be a big time saver.

Goal is to build a model that will predict new data. Need to build it with “training data”. Take your data set and split it in half. Use half to train the algorithm. Don’t expect your model to be perfect.

Math.Net allows you to do algebra in .Net
let A = matrix [ [ 1.; 2.; 3.; ];
[4,; 5.; 6.; ]
[7.; 8.; 9.;]]
Typical in Python, not possible in C#. F# makes the matrix obvious.
Linear Regression problems can be solved in this problem

F# has a timer built in to the REPL, so you can find out how long your functions take to run -> #time;;

Gamers care about algebra – graphics rendering uses vectors
GPUs are optimized for algebra processing
You can use a different LinearAlgebraProvider that uses the GPU for your work, which runs MUCH faster.
Esentially compiles your F# code to native GPU code. F# is good at cross compiling – there is of course an F# to Javascript compiler as well

Interactive experience with a REPL is a huge advantage, and .Net does actually have some decent machine learning libraries available

Unsupervised Machine Learning
You usually have to write your own libraries because a suitable one probably doesn’t exist. As you learn your domain you may need a custom model

Most ML algos
* Read Data
* Transform into Features
* Learn a Model from the Features
* Evaluate Model Quality

Maps to functional
* Read -> Read
* Transform -> Map
* Learn -> Recursion
* Evaluate -> Fold

Unsupervised example – Tell me somthing about my data
Example – Clustering – find groups of “similar” entities
Create centroids for the number of expected groups
Move them closer to group averages
Keep going until there is no more change

Implemented the K Means algo in about 20 lines of F# code (Code is on his github repo)

Type Providers
“No data, no learning” – you need to get data into your system or the algorithm is of no use (80-90% of the job is getting and cleaning data)
“Machine learning is a data janitorial job”
Dynamic languages are nice but you don’t find out about data issues until runtime
Static prevents runtime errors but requires “heavy artillery” (ORMs, etc.)
Type Providers are a compromise between the two
Type providers for csv, JSON, SQL, etc. plus customs for special data sets
There is a WorldBankData provider that has various country and economic statistics, which could be useful. It essentially wraps a public API and makes calls over the Internet to obtain the data.
Type Providers can support queries (like SQL/LINQ) as well
There are even type providers to langauges (R, for example)
Allows you to use functions from R in F# (even gives you a bit of intellisense-like functionality to discover the fuctionality)
You can use F# for what it is good at (type providers) and R for what it is good at


Codemash Friday 8:30a – Gitting More Out of Git


Gitting More Out of Git
#Jordan Kasper

Disclaimers – not for noobs, all examples will be from the command line

Git is decentralized. We all know that, but do we know what it really means?

The entire repo is on your system – not just the branch you are working on
Including all of the history
So, if the “central” goes down, you can still work. In fact, other people can get it from you as well to start working. We don’t typically do that, though.

A remote is repository outside of your current environment. Technically even in another location on you system.
When you clone, you get all branches. It also creates a remote named “origin” that points to the location you cloned from.
Your local branch is “tracking” the remote branch. This is how git knows where to send your changes to when you do a push.
git remote -v will tell you where all of your remotes point
git push -u origin new-feature will set the “upstream” remote branch to track your local branch to
Forking is a github term, not a git thing
A fork is actually just a case of setting a remote
git fetch <remotename> will pull down changes from a specific remote (default is usually origin)

fetch gets all of the changes
pull gets them AND merges into your code

git branch –no-merged will show you all of the branches that have not yet been merged
git diff master stuff is essentially how a pull request works
git diff master..stuff shows the differences between the branches from when the branch split from master

Git and Data Integrity
Git uses snapshots (not file diffs)
File diffs means differences are tracked by file
Snapshot is a picture of the entire repo when taken (all files) – marked with a hash of the entire repo at that time – even changing a single whitespace character will be noted (because you get a different hash)
Hash is actually 40 characters of hexidecimal. You usually only see 7 because that is enough to be distinct in most cases.

When things go Wrong
git commit –amend -m “corrected message” will allow you to correct a commit message, but note that it changes the hash, since you changed the metadata
Two changes with the same things changed but different hashes will cause problems, that is why you should not change a commit after you have shared (pushed) it
reflog is local to you – it is never shared
git reflog shows your changes over time
You can even add a file to a commit: git commit –amend (after staging the file) (Again, don’t do this after you have pushed)
git checkout <filename> will throw away your unstaged changes. It will not remove a newly added file, however.

Three stages of a git repo
HEAD – commited code
Staging – ready to commit
Working Directory – your current work

Committed Changes (oops)
git reset –soft HEAD^ (move head back one commit (more ^’s means more commits))
git reset –mixed HEAD^ (moves head AND staging back one commit)
git reset –hard HEAD^ (wipes out the change completely from all three stages)
you can still get your changes back if you find out you didn’t mean to use “hard”
use git reflog to find the hash for the deleted commit:
git reset HEAD@{1} will bring the change back

Once you have any changes pushed beyond your local repo you should consider it carved in stone. You should not make these kinds of changes to pushed commits.

Head Notation
HEAD^ (back one from current HEAD)
HEAD^^ (two places)
HEAD~n (back n places)
HEAD@{i} (back to reflog index ‘i’)

git stash
You made some changes, but you aren’t ready to commit. Now you need to do work in a different branch. Changing branches will bring along unstaged changes. stashing puts your changes out of your way
You actually have to stage first (most people don’t know that)
git stash will “commit” all staged (and maybe unstaged) changes to a temporary local commit and restore your working environment to the last commit
you can stash multiple times
git stash list shows your currently stashes
BUT – it doesn’t have any comment
git save lets you comment
git stash apply brings it back, but doesn’t get rid of it
git stash drop will remove them
git stash pop brings out your stash and removes it from your list of stashes

Log Options
git log –oneline shows you one line
git log –online –graph shows you a command line version of the pretty branch chart most GUI tools have
git log has lots of options to only show you want you want

git blame filename will show you the last change for each line (hash, author, line)

Playing Nice with Others
git checkout master
git merge feature
Fast forward if no divergent changes
Fast forward does not put the fact that there was a merge in the history
–no-ff forces a merge commit so you can see a merge happened
If there are divergent changes, you get a merge commit (which has 2 parents – only kind of commit with multiple parents)

Merge conflicts – changes git can’t sort out (both commits change the same line for example)
Fix the problem file
Stage the file
Commit (which will be the merge commit)

Rewrites history
Essentially changes the commit you branched from. Doing this one already pushed changes will cause problems, same as above.
You can still get comflicts in a rebase, handled the same way, except you tell rebase you want to continue (problems pause a rebase)
git rebase –abort allows you to stop a rebase if there is an issue

Cherry Picking
Allows you to bring in changes from a different branch that has some work you want to save. After you cherry pick, you should delete the branch you cherry picked from.

Codemash Thursday 3:30p – F# in Social Coding


F# in Social Gaming
Yui Cui

F# tends to be good for math (financial programming, etc.)
BUT it is good for general purpose as well. Starting to get used more this way.

They have implemented a slots game in F#.

Mathematicians create a math model for every game to determine how players win.

Discriminated Unions are like Enums, except that they are real types – you cannot instantiate an invalid value like you can with Enums.

Use type aliases to make your code clearer:
Type Count = int
Type LineNum = int

The point is to make invalid states unrepresentable

You’re never too smart to make mistakes

Units of Measure help
type Pence
so, 42<Pence>

10<Meter> / 2<Second> = 5 <Meter/Second>

Helps you to not use the terms incorrectly and avoid calculation errors

Records are containers for data – they are immutable

You can add a custom indexer to a record to make it easier to access a particular element of the contained array, for example

In F# you have to be explicit about what functions you want to be recursive (rec keyword)

match allows you to use the power of the discriminated union to execute different code in each case in a much safer and clearer fashion than an old-school switch statement. match does to the untrained eye look like a switch

To hold game state, they are using the actor model
Everything is an actor
An actor has a mailbox
It can receive messages via the mailbox

Gatekeeper manages the list of workers
Gatekeeper is the bottleneck, so it directs traffic and then gets out of the way – workers respond back around the gatekeeper

Actors communicate by sending messages
If they need a response they have to send a reply channel

Actors can do their communication asychronously
Does not block I/O

Actors for state management increased their performance on game servers by 5X
Caveats – game servers need to have affinity to players (player gets the same server throught their session). Need to load balance, need to avoid hotspots (tweak the logic to spawn new sessions intelligently)

Timing out players after 3 minutes helped with the hotspot problem.

Need to gracefully scale down – move player off to other servers and refuse new players so that you can turn off the server that is no longer needed

Agent summary
* no locks
* async message passing

Agents are low level and will lose any unsaved state changes if they die due to failure or exception. There is not much you can do to improve this as they come out of the box. Use, MS Orleans, or Cricket frameworks to help mitigate this. Or move out of .net and use a language that runs on the Erlang VM (erlang, elixir, etc.)

Replacing a large class hierarchy
Large scale multiplayer games with experience, levels, quests, etc. has a lot of state data

Their solution – use the message-broker pattern
Caught a gnome – an fact occurred (and event)
Each fact can trigger actions in many different areas of the game
There are lots of facts
Instead of discrete classes (100’s) build a series of discriminated unions that build on each other to express the facts
The problem is that this affects C# interop – it makes code that uses the DUs very ugly
You can use marker interfaces in C# to make it easier
Introduces the possibility of invalid states, though, so now you need to handle the exceptions that occur when there is an invalid state.
Still reduces the class hierarchy

Codemash Thursday 1:00p – Engineering innovation


Engineering Innovation

Dustin Updyke

Played Steven Johnson’s talk from RSA Animate.

We don’t typically get into the squishy bits. Humans don’t have binary inputs and outputs.

We don’t much to make sure ideas are captured and possibly acted upon.

“Without Ideas, mankind has nothing”- Charlotte Lang

Ideas are the currency for the future.

Product Owners are important because they lay the tracks for where we are going to go.

Get the ideas out on the table, vet them quickly, and let them go somewhere

What are the ideas we should chase? Ideas that don’t make sense at central headquarters might make a lot of sense out in the field.

Without new ideas, companies die (example – Kodak)

You can’t have an “A” team that gets to do all of the prototyping and proof of concepts – that doesn’t scale and discourages others from staying engaged in the company.

Core – what we do today
Adjacent – expanding what we do (new markets, incremental features, etc.)
Transformational – creating product spaces or entire markets

Incremental —–>—–>—– Innovation
It is a continuum. Thinking of it as a line makes it easier to see how much you are shifting. Too much change will be resisted, but sometimes you need to have aspirational goals to move you along the path.

“Innovation is about perspective shift. If it were obvious, we’d already be doing it” – Astro Teller, Google

Shaving costs has limits – you will lose in the end of that path.

Invention has fallen in to disuse – we use Innovation to mean invention.

Someone has to have the job to track and monitor innovation. Facilitate ideas via events, sessions, etc.
You can invite in vendors to work with you and see what could be possible.
Every X (quarterly) review – report out the progress

Build a case – Why? Stakeholders? Benefit? Use?
Stakeholders is often skipped, but it is actually one of the more important things to consider.

Tell the story of why, how it works, impact, market, costs, etc.

Apple started asking “why” you would want an iPod in a market full of mp3 players. Asking why allowed them to start with the benefit and how they were going to make a better mp3 player.

Idea -> ??? -> Profit: doesn’t work. You need to know why.

Impact and value often need to be researched. It can’t just be guessed at. Sometimes this might require enlisting a third party.

4 Quadrants:
Why/Value | Risks/Steps/Approach
How it works | Market Opp/Biz Value

Build a one page slide that shows the answers to these questions

Where do ideas come from
People need structure to get started – limits are beneficial because they help focus
Doesn’t have to be heavy

Guide – bring ideas out and get them moving (top down AND bottom up)
Top down innovation is difficult to maintain a flow, and only let’s some people innovate
You should always have ideas in the hopper so that you can start moving on the next idea.
Negative feedback is useful because it helps you find out the problems with the idea that need to be mitigated, and gives you the opportunity to make a convert when you figure out how to work around the problem
Need to have a constant flow of ideas. What do we do once we ship the idea? How do we handle feedback?

Doing is important and where we focus, but the story is important so that people can get behind the idea. It helps connect them to the idea.

We need a habit of creating ideas – habits become self-perpetuating.

Sometimes throwing out a dumb idea can get people thinking, if you can handle looking a little bit dumb.

Sometimes good, unrelated ideas come out of crazy impractical ideas

Don’t kill ideas – just let them sink down the list. They may just not be ready yet – wrong time, supporting tech or biz isn’t ready, etc.

Tracking enables – it lets you connect the dots and see where you could go

The unknown know – things we walk past ever day but don’t think about. Opportunities to make things we do all the time better.

Need a place for good ideas to go: need to do further research, prototyping, etc. and allow budget for doing so

Taguchi Score is how they rank ideas.

Build to think – allow the person who came up with the idea to lead the work on it. Encourages them to have more ideas (and others as well).

There is no bad idea – it just may not be that idea’s time yet

Coach others

Last year they had 2000 ideas in their tracking system, but only worked on 2. That’s not a bad thing. You can only focus on so much. “Conveyor belt that is a roulette wheel” – bets constantly coming past and you have to decide which ones to take.

Codemash Thursday 10:30a – An Introduction to to Artificial Intelligence


An Introduction to to Artificial Intelligence
#Seth Juarez

A high level overview of AI concepts.

Path finding.
What is the best way to get to work?
The moving squares game (put them in order) is also a path finding problem.
We are going to look at how you could solve that with a computer.

A well defined pathing problem has 5 components:
* states
* initial state
* successor function
* goal test
* path cost

Puzzles problem:
States – 8 number tiles and a blank
Initial State – some arrangement
Successor function – swap the blank with a number
Maximum of 4 possible options: left, right, up, down
Goal test – are they in order
Path cost – 1

uninformed search options
breadth first search
uniform cost search
depth first search
depth limited search

Looking for an algorithm that is complete and optimum

Wrote the puzzle as a game

Write solvers for each of the search options to try them out

Breadth First Search
Explore each option for every move. Check that move, then check the next one. Use those as the starting point for the next level. Explore every option at each level.
It is complete – it will find a solution
It is not necessarily optimal – it might not find the shortest solution in cases where the path cost is something other than 1 (a non-decreasing cost function).

Depth First Search
Go down the entire path from a move until you exhaust the possibilities. May essentially overflow the stack and never find a solution.
Possibly not complete, probably not optimal.

Depth Limited Search
Limit how far down you go while doing a depth first search.
Complete only if you set your limit far enough. Compensate by increasing the limit if a solution is not found. Knowledge of the possible limit improves this.
Optimal – it can be if you start with the correct limit. If you set the limit to 3 when it should be 4, then you either don’t get the answer orhave to run again with a higher limit, which is waste.
If you set the limit to 5 when it should be 4, you will search more nodes than you need to.

Informed Search
Define a heuristic function (A*, for example)
Take in to account how long it has taken you to get there already to inform your depth.
Completeness depends upon your heuristic function
So for a real-world map problem, straightline distance would be a good choice
If you have an “admissible heuristic function” A* is complete and optimal

Adversarial Search
Chess solver, for example
Tic Tac Toe example – a computer solver cannot be beat (will always result in a tie unless you make a mistake)

minimax algorithm – Minimize the maximum of the other player (take the move that helps your opponet the least)

alpha-beta pruning – only expand the branches that are better than your current state
Makes the assumption that the opponent plays optimally, which may cause a problem

I should point out he has some pretty awesome visualizations of these that are available on his github account. []