Codemash Thursday 3:30p – F# in Social Gaming


F# in Social Gaming
Yan Cui

F# tends to be good for math (financial programming, etc.)
BUT it is good for general purpose as well. Starting to get used more this way.

They have implemented a slots game in F#.

Mathematicians create a math model for every game to determine how players win.

Discriminated Unions are like Enums, except that they are real types – you cannot instantiate an invalid value like you can with Enums.

Use type aliases to make your code clearer:
type Count = int
type LineNum = int

The point is to make invalid states unrepresentable

You’re never too smart to make mistakes

Units of Measure help
[<Measure>] type Pence
so, 42<Pence>

10<Meter> / 2<Second> = 5<Meter/Second>

Helps you to not use the terms incorrectly and avoid calculation errors

Records are containers for data – they are immutable

You can add a custom indexer to a record to make it easier to access a particular element of the contained array, for example

In F# you have to be explicit about what functions you want to be recursive (rec keyword)

match lets you use the power of the discriminated union to execute different code for each case in a much safer and clearer fashion than an old-school switch statement. To the untrained eye, match does look like a switch.

To hold game state, they are using the actor model
Everything is an actor
An actor has a mailbox
It can receive messages via the mailbox

Gatekeeper manages the list of workers
Gatekeeper is the bottleneck, so it directs traffic and then gets out of the way – workers respond back around the gatekeeper

Actors communicate by sending messages
If they need a response they have to send a reply channel

Actors can do their communication asynchronously
Does not block I/O
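
To make the mailbox and reply channel idea concrete, here is a minimal sketch in Python using asyncio. It is not the code from the talk (which used F# agents). The CounterActor class, its message format, and the "get" and "stop" messages are all made up for illustration.

```python
import asyncio


class CounterActor:
    """Hypothetical actor that owns a running total; only its own loop touches it."""

    def __init__(self):
        self.mailbox = asyncio.Queue()  # the actor's mailbox
        self.total = 0                  # private state, never shared directly

    async def run(self):
        # Messages are handled one at a time, so no locks are needed.
        while True:
            message, reply_channel = await self.mailbox.get()
            if message == "stop":
                break
            if message == "get":
                reply_channel.set_result(self.total)  # answer via the reply channel
            else:
                self.total += message                 # fire-and-forget update


async def demo():
    actor = CounterActor()
    worker = asyncio.create_task(actor.run())
    # Fire-and-forget messages carry no reply channel.
    for amount in (5, 10, 20):
        await actor.mailbox.put((amount, None))
    # A sender that needs an answer passes a future as its reply channel.
    reply = asyncio.get_running_loop().create_future()
    await actor.mailbox.put(("get", reply))
    total = await reply
    await actor.mailbox.put(("stop", None))
    await worker
    return total


result = asyncio.run(demo())
print(result)
```

The sender never reads the actor's state directly. It only gets an answer because it handed over a reply channel with the request.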

Actors for state management increased their performance on game servers by 5X
Caveats – game servers need to have affinity to players (a player gets the same server throughout their session). Need to load balance, need to avoid hotspots (tweak the logic to spawn new sessions intelligently)

Timing out players after 3 minutes helped with the hotspot problem.

Need to gracefully scale down – move player off to other servers and refuse new players so that you can turn off the server that is no longer needed

Agent summary
* no locks
* async message passing

Agents are low level and will lose any unsaved state changes if they die due to failure or exception. There is not much you can do to improve this as they come out of the box. Use frameworks such as MS Orleans or Cricket to help mitigate this. Or move out of .NET and use a language that runs on the Erlang VM (Erlang, Elixir, etc.)

Replacing a large class hierarchy
Large scale multiplayer games with experience, levels, quests, etc. have a lot of state data

Their solution – use the message-broker pattern
Caught a gnome – a fact occurred (an event)
Each fact can trigger actions in many different areas of the game
There are lots of facts
Instead of discrete classes (100’s) build a series of discriminated unions that build on each other to express the facts
The problem is that this affects C# interop – it makes code that uses the DUs very ugly
You can use marker interfaces in C# to make it easier
Introduces the possibility of invalid states, though, so now you need to handle the exceptions that occur when there is an invalid state.
Still reduces the class hierarchy


Codemash Thursday 1:00p – Engineering innovation


Engineering Innovation

Dustin Updyke

Played Steven Johnson’s talk from RSA Animate.

We don’t typically get into the squishy bits. Humans don’t have binary inputs and outputs.

We don’t do much to make sure ideas are captured and possibly acted upon.

“Without Ideas, mankind has nothing” – Charlotte Lang

Ideas are the currency for the future.

Product Owners are important because they lay the tracks for where we are going to go.

Get the ideas out on the table, vet them quickly, and let them go somewhere

What are the ideas we should chase? Ideas that don’t make sense at central headquarters might make a lot of sense out in the field.

Without new ideas, companies die (example – Kodak)

You can’t have an “A” team that gets to do all of the prototyping and proof of concepts – that doesn’t scale and discourages others from staying engaged in the company.

Core – what we do today
Adjacent – expanding what we do (new markets, incremental features, etc.)
Transformational – creating product spaces or entire markets

Incremental -----> Innovation
It is a continuum. Thinking of it as a line makes it easier to see how much you are shifting. Too much change will be resisted, but sometimes you need to have aspirational goals to move you along the path.

“Innovation is about perspective shift. If it were obvious, we’d already be doing it” – Astro Teller, Google

Shaving costs has limits – you will lose in the end of that path.

Invention has fallen into disuse – we use Innovation to mean invention.

Someone has to have the job to track and monitor innovation. Facilitate ideas via events, sessions, etc.
You can invite in vendors to work with you and see what could be possible.
Every X (quarterly) review – report out the progress

Build a case – Why? Stakeholders? Benefit? Use?
Stakeholders is often skipped, but it is actually one of the more important things to consider.

Tell the story of why, how it works, impact, market, costs, etc.

Apple started asking “why” you would want an iPod in a market full of mp3 players. Asking why allowed them to start with the benefit and how they were going to make a better mp3 player.

Idea -> ??? -> Profit: doesn’t work. You need to know why.

Impact and value often need to be researched. It can’t just be guessed at. Sometimes this might require enlisting a third party.

4 Quadrants:
Why/Value | Risks/Steps/Approach
How it works | Market Opp/Biz Value

Build a one page slide that shows the answers to these questions

Where do ideas come from
People need structure to get started – limits are beneficial because they help focus
Doesn’t have to be heavy

Guide – bring ideas out and get them moving (top down AND bottom up)
With top-down innovation it is difficult to maintain a flow, and it only lets some people innovate
You should always have ideas in the hopper so that you can start moving on the next idea.
Negative feedback is useful because it helps you find out the problems with the idea that need to be mitigated, and gives you the opportunity to make a convert when you figure out how to work around the problem
Need to have a constant flow of ideas. What do we do once we ship the idea? How do we handle feedback?

Doing is important and where we focus, but the story is important so that people can get behind the idea. It helps connect them to the idea.

We need a habit of creating ideas – habits become self-perpetuating.

Sometimes throwing out a dumb idea can get people thinking, if you can handle looking a little bit dumb.

Sometimes good, unrelated ideas come out of crazy impractical ideas

Don’t kill ideas – just let them sink down the list. They may just not be ready yet – wrong time, supporting tech or biz isn’t ready, etc.

Tracking enables – it lets you connect the dots and see where you could go

The unknown known – things we walk past every day but don’t think about. Opportunities to make things we do all the time better.

Need a place for good ideas to go: need to do further research, prototyping, etc. and allow budget for doing so

Taguchi Score is how they rank ideas.

Build to think – allow the person who came up with the idea to lead the work on it. Encourages them to have more ideas (and others as well).

There is no bad idea – it just may not be that idea’s time yet

Coach others

Last year they had 2000 ideas in their tracking system, but only worked on 2. That’s not a bad thing. You can only focus on so much. “Conveyor belt that is a roulette wheel” – bets constantly coming past and you have to decide which ones to take.

Codemash Thursday 10:30a – An Introduction to Artificial Intelligence


An Introduction to Artificial Intelligence
#Seth Juarez

A high level overview of AI concepts.

Path finding.
What is the best way to get to work?
The moving squares game (put them in order) is also a path finding problem.
We are going to look at how you could solve that with a computer.

A well defined pathing problem has 5 components:
* states
* initial state
* successor function
* goal test
* path cost

Puzzles problem:
States – 8 number tiles and a blank
Initial State – some arrangement
Successor function – swap the blank with a number
Maximum of 4 possible options: left, right, up, down
Goal test – are they in order
Path cost – 1

uninformed search options
breadth first search
uniform cost search
depth first search
depth limited search

Looking for an algorithm that is complete and optimal

Wrote the puzzle as a game

Write solvers for each of the search options to try them out

Breadth First Search
Explore each option for every move. Check that move, then check the next one. Use those as the starting point for the next level. Explore every option at each level.
It is complete – it will find a solution
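It is not necessarily optimal – it finds the solution with the fewest moves, which is the cheapest one only when the path cost is a non-decreasing function of depth (every move costs 1, for example). With other cost functions it might not find the cheapest solution.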
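
A minimal sketch of breadth first search on the tile puzzle, in Python. This is not the speaker's solver. The state encoding (a tuple of nine numbers with 0 as the blank) and the function names are assumptions made for illustration.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 stands for the blank


def successors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < 3 and 0 <= new_col < 3:
            target = new_row * 3 + new_col
            tiles = list(state)
            tiles[blank], tiles[target] = tiles[target], tiles[blank]
            yield tuple(tiles)


def breadth_first_search(initial):
    """Return the list of states from initial to GOAL, or None if unreachable."""
    frontier = deque([initial])   # states waiting to be explored, level by level
    parent = {initial: None}      # also serves as the visited set
    while frontier:
        state = frontier.popleft()
        if state == GOAL:         # goal test
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


start = (1, 2, 3, 4, 0, 6, 7, 5, 8)
solution = breadth_first_search(start)
print(len(solution) - 1)  # number of moves in the solution
```

Because every move costs 1 here, the first solution found is also the shortest one.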

Depth First Search
Go down the entire path from a move until you exhaust the possibilities. May essentially overflow the stack and never find a solution.
Possibly not complete, probably not optimal.

Depth Limited Search
Limit how far down you go while doing a depth first search.
Complete only if you set your limit far enough. Compensate by increasing the limit if a solution is not found. Knowledge of the possible limit improves this.
Optimal – it can be if you start with the correct limit. If you set the limit to 3 when it should be 4, then you either don’t get the answer or have to run again with a higher limit, which is wasted work.
If you set the limit to 5 when it should be 4, you will search more nodes than you need to.
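
A sketch of depth limited search on the same tile puzzle, plus the "raise the limit and try again" loop (commonly called iterative deepening). The state encoding, the function names, and the max_limit default are assumptions for illustration, not the speaker's code.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 stands for the blank


def successors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < 3 and 0 <= new_col < 3:
            target = new_row * 3 + new_col
            tiles = list(state)
            tiles[blank], tiles[target] = tiles[target], tiles[blank]
            yield tuple(tiles)


def depth_limited_search(state, limit, path=()):
    """Return a tuple of states reaching GOAL within `limit` moves, else None."""
    path = path + (state,)
    if state == GOAL:
        return path
    if limit == 0:
        return None               # hit the depth limit, give up on this branch
    for nxt in successors(state):
        if nxt not in path:       # do not walk in circles along this path
            found = depth_limited_search(nxt, limit - 1, path)
            if found is not None:
                return found
    return None


def iterative_deepening(initial, max_limit=20):
    """Retry with limits 0, 1, 2, ... and return (limit, path) for the first hit."""
    for limit in range(max_limit + 1):
        found = depth_limited_search(initial, limit)
        if found is not None:
            return limit, found
    return None


start = (1, 2, 3, 4, 0, 6, 7, 5, 8)
too_shallow = depth_limited_search(start, 1)
limit_used, path = iterative_deepening(start)
print(too_shallow, limit_used)
```

The failed runs at limits 0 and 1 are the repeated work the notes mention. Knowing a good starting limit avoids them.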

Informed Search
Define a heuristic function that estimates the remaining cost to the goal and use it to guide the search (A*, for example)
A* also takes into account the cost already spent getting to a state, not just the estimate of what remains.
Completeness depends upon your heuristic function
So for a real-world map problem, straight-line distance would be a good choice
If you have an “admissible heuristic function” A* is complete and optimal
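
A small A* sketch in Python for the map example, using straight-line distance as the heuristic. The four-node map (POSITIONS and ROADS) is invented for illustration. Every road is at least as long as the straight line between its endpoints, so the heuristic never overestimates, which is what makes it admissible.

```python
import heapq
import math

# Hypothetical map: node -> (x, y) position, and road lengths between nodes.
POSITIONS = {"A": (0, 0), "B": (3, 0), "C": (0, 4), "D": (3, 4)}
ROADS = {
    "A": {"B": 3, "C": 4},
    "B": {"A": 3, "D": 6},
    "C": {"A": 4, "D": 3},
    "D": {"B": 6, "C": 3},
}


def straight_line(a, b):
    """Heuristic: straight-line distance between two nodes."""
    return math.dist(POSITIONS[a], POSITIONS[b])


def a_star(start, goal):
    """Return (cost, path) for the cheapest route, or None if there is none."""
    # Each entry: (cost so far + estimate to goal, cost so far, node, path)
    frontier = [(straight_line(start, goal), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for neighbor, length in ROADS[node].items():
            new_cost = cost + length
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                estimate = new_cost + straight_line(neighbor, goal)
                heapq.heappush(
                    frontier, (estimate, new_cost, neighbor, path + [neighbor])
                )
    return None


route = a_star("A", "D")
print(route)
```

The priority is the cost so far plus the estimate of what remains, so the search expands the most promising state first.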

Adversarial Search
Chess solver, for example
Tic Tac Toe example – a computer solver cannot be beaten (will always result in a tie unless you make a mistake)

minimax algorithm – Minimize the maximum of the other player (take the move that helps your opponent the least)

alpha-beta pruning – skip the branches that cannot change the result because a better option has already been found
Makes the assumption that the opponent plays optimally, which may cause a problem
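
A sketch of minimax with alpha-beta pruning for Tic Tac Toe in Python. The board encoding (a 9-character string with spaces for empty cells), the scoring (+1 X wins, -1 O wins, 0 tie), and the function names are assumptions for illustration, not the speaker's implementation.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player, alpha=-2, beta=2):
    """Score the position for X, assuming both sides play optimally."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0  # full board, nobody won: a tie
    other = "O" if player == "X" else "X"
    for i, cell in enumerate(board):
        if cell != " ":
            continue
        child = board[:i] + player + board[i + 1:]
        score = minimax(child, other, alpha, beta)
        if player == "X":
            alpha = max(alpha, score)  # X picks the highest score
        else:
            beta = min(beta, score)    # O picks the lowest score
        if alpha >= beta:
            break  # prune: the opponent would never allow this branch
    return alpha if player == "X" else beta


def best_move(board, player):
    """Return the index of the best empty cell for `player`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]

    def score(i):
        return minimax(board[:i] + player + board[i + 1:], other)

    return max(moves, key=score) if player == "X" else min(moves, key=score)


empty_board_value = minimax(" " * 9, "X")
winning_move = best_move("XX OO" + " " * 4, "X")
print(empty_board_value, winning_move)
```

The value of the empty board comes out as a tie, which matches the note that a perfect solver cannot be beaten.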

I should point out he has some pretty awesome visualizations of these that are available on his GitHub account.

Codemash Thursday 9:15a – A guided tour of the BigData technologies zoo


A guided tour of the BigData technologies zoo
#Itamar Syn-Hershko

Big Data is a buzzword. “Big Data is anything which will crash Excel” (@devopsborat)

Even if you don’t have Big Data, these tools and technologies can still be useful.

Agenda – Data at Rest, Streams, Moving data around

There are a LOT of tools and technologies around big data.

Where are we today:
* Database Schemas
* Unreliable at scale
* Expensive at scale
* Relational mindset
* Data is being moved from storage to compute

Schemas assume structured data, can be hard to set up, and are hard to adapt (lack agility)
Scaling strategy is bigger machines (scaling up) which is more expensive than scaling out (multiple simple machines)

Quote from Grace Hopper (heavily paraphrased):
You can’t grow larger oxen, so you need to get another ox to move a bigger load.

Hadoop – based on Google File System and MapReduce
Commodity hardware
Created by Doug Cutting and Mike Cafarella. Open sourced under Apache.
The original product was called Nutch in 2002. Became Hadoop in 2006.
HDFS – Hadoop Distributed File System
Basically takes a big file and stores it on a lot of servers. It divides the file into partitions and stores each partition on different servers. Essentially sharding. The partitions are each stored on more than one machine for protection.
There is a NameNode that manages how the data is partitioned and how to reassemble it. Losing NameNode is a problem, so it needs to have redundancy.
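
A toy sketch in Python of the idea: split a file into blocks, store each block on several machines, and keep a NameNode-style table so the file can be put back together. The round-robin placement, node names, and function names are assumptions for illustration. Real HDFS placement is more involved than this.

```python
def place_blocks(data, block_size, nodes, replicas=3):
    """Return a table: block index -> (block bytes, list of node names holding it)."""
    table = {}
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        # Simplified placement: consecutive nodes, wrapping around the list.
        holders = [nodes[(index + r) % len(nodes)] for r in range(replicas)]
        table[index] = (block, holders)
    return table


def reassemble(table):
    """Rebuild the original file by reading the blocks in order."""
    return b"".join(table[index][0] for index in sorted(table))


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical machine names
data = b"abcdefghij"
table = place_blocks(data, block_size=4, nodes=nodes)
for index, (block, holders) in table.items():
    print(index, block, holders)
```

Losing a single node still leaves two copies of every block, but losing the table means nobody knows how to reassemble the file. That is why the NameNode needs redundancy.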

More DFS: S3, CephFS, GlusterFS, Lustre

Dedicated File Formats: SequenceFile, RCFile, Avro

MapReduce – parallel computations on data, based on functional programming concepts
Map processes documents in some way (take sentences and break them into words, for example), producing tuples.
Reduce takes the tuples and combines them (so take each word and add up the counts)
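
The word count example can be sketched in plain Python. This runs in one process and only illustrates the shape of the map and reduce steps. It is not Hadoop code, and the function names and sample documents are made up.

```python
from collections import defaultdict


def map_phase(document):
    """Map: break a document into words and emit a (word, 1) tuple for each."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group the emitted values by key so each word's counts end up together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: combine all the counts for one word."""
    return (word, sum(counts))


documents = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for document in documents for pair in map_phase(document)]
word_counts = dict(
    reduce_phase(word, counts) for word, counts in shuffle(pairs).items()
)
print(word_counts)
```

Each document can be mapped independently and each word can be reduced independently, which is what lets a cluster spread the work out.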

Hadoop does this in Java – you write a Mapper and a Reducer (they implement interfaces). Then you put the .jar files on Hadoop and it runs the job in place using TaskTrackers. These TaskTrackers are controlled by a JobTracker, which runs the job and spins up the TaskTrackers to do the work. There will be a TaskTracker for each partition of data. This is how parallelism is achieved.

Hadoop now has a bunch of distributions. Apache, Cloudera, Hortonworks are the key ones. Each beyond Apache adds other technologies on top (Impala; HCatalog, Tez). Various features are added by cloud vendors to make managing Hadoop in the cloud easier, too.

Apache Hive – Runs SQL over HDFS using HiveQL
HiveQL is not exactly SQL, but very similar
Compiles down to MapReduce, later versions compile down to DAG (Tez)
Think of MapReduce as assembly language, there are various abstractions you can use above it which have their own advantages (HiveQL is one of them obviously)
Apache Pig uses a procedural language (called Pig Latin) that expresses processes on data and compiles down to MapReduce. You can write user defined functions in your own language (JavaScript, for example) and use them in Pig

Apache HCatalog – Hortonworks distro only
Defines another way to look at your data files and figure out what files you want
HBase is another one

Workflow schedulers – Apache Oozie, LinkedIn Azkaban, Spotify Luigi are examples

The bad and the ugly:
* Data is not always local
* Still too much I/O
* Slow to compute
* Hard to make JobTracker High Availability (HA)
* Poor resource utilization (you can be either a mapper or a reducer)
* NameNodes are a single point of failure

YARN and MapReduce 2.0
YARN does cluster resource management. People call it an operating system for data processing. Improves on the issues with Hadoop above.

Apache Spark
Resilient Distributed Datasets (RDD) – represented as DAG (Directed Acyclic Graph)
Combine data and actions
Takes data, transforms it, performs actions on it
The RDD is split so that as much of the work as possible is done in parallel.
Transformation: map, filter, union, distinct, join, etc.
Actions: take, count, first, reduce, foreach, etc.
Works continuously instead of in batches
Out of the box – Scala and Python
Has integration with Spark R, Spark SQL, Spark GraphX, Spark Streaming
Spark runs in clusters – can self-manage, or you can run in YARN or Apache Mesos
Driver program sends work to the cluster manager. Worker nodes do the work. Worker starts after the last processed data, so somewhat crash tolerant.
Spark has a large ecosystem of its own, similar to Hadoop

Stream Processing
Iterative batch processing (Deterministic batch operations)

Apache Storm
Handles streams
Takes from sources (spouts)
Processes in Bolts
Define a topology of Spouts and Bolts connected together
Runs continuously, not batch
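
The spout and bolt idea can be imitated with Python generators. This is only an analogy to show how a topology chains sources and processing steps. It is not the Storm API, and every name in it is made up.

```python
def sentence_spout():
    """Spout: the source of the stream (a fixed list here, endless in practice)."""
    for sentence in ["big data", "big streams"]:
        yield sentence


def split_bolt(stream):
    """Bolt: turn each sentence into individual words."""
    for sentence in stream:
        yield from sentence.split()


def count_bolt(stream):
    """Bolt: keep a running count and emit it as each word arrives."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


# The topology: spout -> split bolt -> count bolt
topology = count_bolt(split_bolt(sentence_spout()))
emitted = list(topology)
print(emitted)
```

Each item flows through the whole chain as soon as it is produced, instead of waiting for a batch to fill up.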

Apache Samza
Similar to Storm
Handle each message as it arrives
Guarantees ordering

Data Pipes – how do we stream into Hadoop?
RabbitMQ, Cassandra, Redis, Kafka, etc.
Apache Flume

Configuration Management, Synchronization

Since we are talking about distributed systems: read Aphyr’s “Call Me Maybe” blog series

ELK – Elasticsearch, Logstash, Kibana to work with log streams

Apache Mahout – Machine learning framework

Codemash Thursday 8:00am – Building Highly Scalable Apps on the Azure Platform: Real World Guidance


I am going to be posting my notes from each session I attend. These are raw and unedited, so I apologize for any grammar or spelling mistakes.

Building Highly Scalable Apps on the Azure Platform: Real World Guidance
#Kevin Grossnicklaus

You should sign up for an Azure account.
You get an Azure portal where you can access and create any kind of Azure service.
You can use bits and pieces – he sometimes uses Azure Service Bus with his AWS applications.
He puts an Azure deployment project in every Visual Studio solution.
He does not use the Azure emulator – he sets up a console app to run the service directly for development. It is just a wrapper around his Azure service. This is because it is faster to work with since it does not have to do the deployment, even to the local emulator.
He uses the Azure services that he needs from the console application (ASB, for example). This does require a constant internet connection, but it gives you full access to the services you need with all of the features. The emulator is not at parity with the real thing.
He believes very heavily in message-driven architectures. He uses MongoDB.
He has a logging service that takes in the log events and writes to a queue, and has a log writer service that actually writes to a DB.
All of the parts of Azure have libraries to support them available via NuGet if you are using .Net. There are SDKs available for non-.Net applications.
In order to work with other people, he appends things like the machine name or the environment to the front of the queue name so that messages aren’t stolen by the wrong computer or application.
Serialize small objects to the queue using a text format like JSON.
When you create Azure assets, you select what datacenter they will go in. Typically you want all of your stuff in the same datacenter to minimize latency (but what about redundancy – guess that is a different consideration and out of scope for this talk).
Send work via an async call to a queue to be done by a “backend server” if that work doesn’t need to be immediately visible to the user. For example, you might be able to do a simple form validation then queue up the form submission for later processing.
You can have multiple instances of a service running that listen to the same queue, and only the first one to pick it up will process it. If you want to have multiple consumers of the message, you would send it to a topic and all of the services that are interested in that topic would receive the message.
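
The difference between a queue and a topic can be shown with a small in-memory sketch in Python. This does not use the Azure Service Bus SDK. The WorkQueue and Topic classes and the subscription names are made up purely to show the two delivery behaviours.

```python
from collections import deque


class WorkQueue:
    """Competing consumers: each message is handed to exactly one receiver."""

    def __init__(self):
        self.messages = deque()

    def send(self, message):
        self.messages.append(message)

    def receive(self):
        return self.messages.popleft() if self.messages else None


class Topic:
    """Publish/subscribe: every subscription gets its own copy of each message."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()

    def send(self, message):
        for backlog in self.subscriptions.values():
            backlog.append(message)

    def receive(self, name):
        backlog = self.subscriptions[name]
        return backlog.popleft() if backlog else None


queue = WorkQueue()
queue.send("order-1")
first_worker = queue.receive()    # the first instance to ask gets the message
second_worker = queue.receive()   # nothing is left for the second instance

topic = Topic()
topic.subscribe("billing")
topic.subscribe("shipping")
topic.send("order-1")
billing_copy = topic.receive("billing")
shipping_copy = topic.receive("shipping")
print(first_worker, second_worker, billing_copy, shipping_copy)
```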
Now he is talking about precalculating things like a Facebook timeline. He didn’t use the terms “read model” or “eventual consistency” but that is what he is talking about here. He would use SignalR to send the updated read model to the client when it is ready.
Integrated Caching – use the memory in the servers you are already running instead of a dedicated cache. You can define the allocation. He allocates 1/3 of the memory for each of his web servers. This can save you money if you have the memory to spare. You can spin up dedicated cache – your application does not see any difference – it is just a different connection string. Redis is also available, but he doesn’t use it. It would require some code changes.
Azure Blob Store is like an unlimited disk. You put documents in containers, which can be public or private. You can use a public container as a CDN. It is probably better to wrap your blobs in an API instead of allowing direct access. This allows you to do things like reformatting (image size, for example).
He definitely spends a lot of time trying to figure out the best ways to use Azure cheaply, which makes some sense.
He hosts MongoDB through mongolab, which actually runs on Azure so his databases end up in the same datacenter. You can even put the database in the same Azure network as the rest of your stuff. mongolab actually runs mongo on whichever cloud provider you want – it works with AWS as well.