CodeMash 2013 Day 3 Session 2


# Objective-C is not Java
Chris Adamson – @invalidname

The language is effectively owned by Apple now and tied to Xcode releases. gcc is rapidly falling behind Xcode and should not really be used anymore.
Single inheritance model.
Method calls [object method];
[object method: param1 paramName2: param2];
Method calls are really message dispatches
Slides (from which he is pretty much reading) are available on his SlideShare account, so I think it’s two feet time.

I did at least learn enough about Objective-C to see that no sane developer would ever willingly use it. All those square braces are crazy. I do like the message-passing concept, though.

CodeMash 2013 Day 3, Session 1


# Puppet for Developers
Eric Hankinson

## Automating your infrastructure
If your infrastructure is code, you can make it testable and version control it. Automation ensures that it truly is repeatable.
Puppet isn’t just for *nix-based OSes – it works on Windows, too.
Removing manual processes removes the chances for errors.

## Puppet Model
There is a configuration language.
You can run a Puppet master with nodes, or run standalone.
Resources are the parts of your system.
Facter is another piece that lets you define facts (OS version, IP addresses, etc.).
Nodes are called agents.

## Puppet Language Basics
You can include other scripts and require dependencies
There is support for variables and conditionals
The language appears to be a DSL with some aspects stolen from Java, Ruby, and JavaScript. _I’ll need to research that further._
Lots of slides of code but the spoken words are more vague. The slides of code are mostly just wallpaper.
There are templating capabilities that you can use to pump out configuration files for your services.
You can use inheritance to define a general machine configuration and then specify specifics for each machine of that type (so a web server config with overrides for the IP addresses).
You can build custom parsers to do things like pulling your passwords and secrets from some other source (you wouldn’t want to check in a secret).

## Custom Modules
There is a Puppet Forge that has lots of already defined custom modules for a lot of existing tools. Not a lot of Windows content, though.
To create a module: puppet module generate leandog-fatcat
The term before the dash is the company name; the term after is the module name. The company name is required, so you have to have a dash.
There is an expected folder structure generated that you can use to put the appropriate items in
There is a module file you need to create that has the metadata about the module.
rspec-puppet allows you to use RSpec to test your Puppet modules.
puppet module build will build your module so that you can deploy and use it.
http://forge.puppetlabs.com is a repository of Puppet modules – it does not contain the module code itself but provides a listing. You are supposed to put in a link to the GitHub repo where the module lives.

## Vagrant
Vagrant manages virtual machines
Can manage VMware or VirtualBox VMs
Puppet and Vagrant work well together

CodeMash 2013 Precompiler Day 2, Session Two


This afternoon’s session is Real-world JavaScript Testing with Justin Searls. This is supposed to be fairly hands on so I apologize if my notes aren’t very good since I will be focusing on doing. 🙂

We will be using a tool that TestDouble (the presenter’s company) built called lineman. It runs your tests whenever a file is saved. The test framework we are using is Jasmine, which has its roots in RSpec. QUnit is also a reasonable alternative.

Jasmine has spies, which are test doubles. Invoke one by calling spyOn(<object>, <methodName>) – so, spyOn(loading, 'show').
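
To make that concrete, here is a minimal sketch of a spec using a spy; the loading object and its show() method are just placeholders for whatever your code under test calls:

```javascript
describe("search", function() {
  it("shows the loading indicator", function() {
    var loading = { show: function() {} };   // stand-in for the real collaborator
    spyOn(loading, "show");                  // replaces show() with a spy

    loading.show();                          // the code under test would trigger this

    expect(loading.show).toHaveBeenCalled(); // verify the message was sent
  });
});
```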

Inject HTML into your tests with jasmine-fixture so your tests don’t become too reliant on the application’s real DOM.
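
A rough sketch of what that looks like – the selector and markup here are made up, and affix() builds (and later tears down) just the elements the spec asks for:

```javascript
describe("greeting panel", function() {
  it("renders the user's name", function() {
    affix("#panel .name");                          // injects <div id="panel"><div class="name"></div></div>

    $("#panel .name").text("Dave");                 // the code under test would do this

    expect($("#panel .name").text()).toBe("Dave");
  });
});
```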

You can use JSLint or JSHint to do static analysis of your JavaScript. The JavaScript that CoffeeScript compiles to is designed to pass JSLint.

jasmine-given gives you a given-when-then syntax for your tests which many people consider more readable and intuitive.
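
A small sketch of that style, assuming jasmine-given’s global Given/When/Then helpers; the calculator object is hypothetical:

```javascript
describe("adding numbers", function() {
  Given(function() { this.calc = { total: 0, add: function(n) { this.total += n; } }; });
  When(function() { this.calc.add(2); this.calc.add(3); });
  Then(function() { return this.calc.total === 5; });  // the returned expression is asserted truthy
});
```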

And then we started pairing and writing some code and tests. Good stuff!

CodeMash 2013 Precompiler Day 2, Session One


Today’s first session is Making Test Automation with Web Applications Work with Jeremy Miller. He’s the guy behind StoryTeller, which is the automated testing tool we use at work.

What makes a good test? Any test that isn’t repeatable is useless.

Test with the finest grained mechanism that tells you something important. Don’t test business logic through the screen. It does make sense to test your UI “magic” (conventions, routing, presentation logic, etc.).

White box testing. The goal of testing is not to make sure your application is perfect. It is about reducing risk to an acceptable level. So, white box testing is very valid, since it tends to be cheaper.

Collapse your application into a single process/appdomain. Even if you have multiple services – bring them all together into one process. You solve your functionality problems and ignore the communication issues. You then can test the communication separately.

Test data sets suck. Define your test data in your test in a declarative fashion.

Diving straight into code and Storyteller. I kind of feel sorry for anyone who hasn’t seen Storyteller before, because there are no explanations about what they are doing with it. They are using RavenDB, a document database that can run in-process. This helps them easily create new databases for each test without a lot of the relational overhead. They just put in the documents they need for the test.

There are lots of issues with test data. If you aren’t careful it will lead to brittle tests. This is why you should strive to tie it to the tests themselves instead of defining the traditional large sets of data. If you set up your data as part of your test, it becomes clearer to those trying to understand what the test is testing.

There exists an embedded SMTP server you can use to test email. They also use a tool called PortFinder that finds an open port you can use to listen on, which could be very useful for testing things that use TCP/IP.

If your tests aren’t running all of the time (in continuous integration) then they are worse than worthless because they hold you back.

They use WebDriver to run their UI layer and have defined what they call "Gestures" in Storyteller to perform standard UI operations.

Provide element handlers for each type of web element to facilitate making less brittle tests.

Separate your test expressions from your screen drivers. Break out the screen-driving logic so that your actual tests are testing logic. It reduces brittleness.

Thread.Sleep is the easiest way to make your tests suck: slow, long-running, and unreliable. Build reasonable waiting tools that are more graceful.

It can be valuable to have a stub date/time class so that you can have better control over time. You may even want to consider taking over the system clock to give you friendly timestamp-sensitive tests.

At this point I had to bail. Too much magic, not enough meat.

CodeMash 2013 Precompiler Day One, Session Two


I am going to “live blog” my notes from CodeMash as it goes. I’ll write a post during each session. My commentary will be in italics.

Sridhar Nanjundeswaran is the dev lead for MongoDB on Azure, so that makes him a useful person to know, I think. 🙂

MongoDB is a scalable, high-performance, open source, document-oriented database. Data is stored in BSON, which is optimized for fast querying. MongoDB trades off some of the speed of a key-value store for a lot of the functionality of a relational store. The big missing features are transactions and joins.

Data is kept in memory-mapped files with b-tree indexes. It is implemented in C++ and runs on pretty much every environment. There are drivers for most development environments/languages. There is a mongo shell that you can use to access it directly. There are drivers from third parties as well for other environments. The drivers translate the BSON to native types, which allows developers to use the idioms of their language.

| RDBMS | MongoDB |
| --- | --- |
| Table | Collection |
| Row(s) | Document |
| Index | Index |
| Partition | Shard |
| Join | Embedding/Linking |
| Fixed Schema | Flexible/Implied Schema |

There is a transactional guarantee around single-document saves, and what would have been spread across multiple tables in a relational transaction is generally all in a single document anyway. So, for the use cases where you would want to use MongoDB, you have sufficient transactional integrity.

MongoDB has extensions to JSON to facilitate some additional data types; the BSON spec defines these. In general, MongoDB will take more space than a relational DB for the same data because the schema is embedded in each document.
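
For example, in the mongo shell the extra types show up as constructors like these (the document itself is made up):

```javascript
db.events.insert({
  _id: ObjectId(),                            // 12-byte ObjectId rather than a string
  when: ISODate("2013-01-11T09:00:00Z"),      // real date type, not a string
  views: NumberLong(42),                      // 64-bit integer
  payload: BinData(0, "aGVsbG8=")             // binary data
});
```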

MongoDB is very light in its use of CPU, but demands lots of memory and disk (relative to a similar relational DB).

_id is the "primary key" for an element in a collection. ObjectId is a 12-byte value that is like a Guid but smaller (Guids are 16 bytes). ObjectIds are guaranteed unique across a cluster.

The MongoDB Shell is sort of like SQL Server Query Analyzer except it uses JavaScript instead of SQL.
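
A few illustrative shell commands (collection and field names are invented) to show the SQL-ish feel:

```javascript
db.people.insert({ name: "Ada", city: "Columbus" });    // _id is generated if you omit it
db.people.find({ city: "Columbus" });                    // SELECT * FROM people WHERE city = 'Columbus'
db.people.find({}, { name: 1 }).sort({ name: 1 });       // project just name, order by name
```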

Counts in MongoDB can be slow – the b-tree index used is not optimized for counting, but the next version of MongoDB is addressing this.

The JavaScript looks very much like twisted SQL, but it allows you to use Map-Reduce as well. MongoDB supports Hadoop, too.
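
As a sketch, a map-reduce job in the shell looks roughly like this (the orders collection is hypothetical):

```javascript
db.orders.mapReduce(
  function() { emit(this.customerId, this.total); },     // map: key each order by customer
  function(key, values) { return Array.sum(values); },    // reduce: add up the totals per customer
  { out: "totals_by_customer" }                           // write results to another collection
);
```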

A document in MongoDB can be up to 16 MB. GridFS is a mapping system that lets you get around this if you need to.

You can change a document’s data or schema at any time, but you cannot do a push to an existing field that is a scalar. You would need to rewrite the entire entity.
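
Roughly what that limitation looks like in the shell (field names invented):

```javascript
db.people.insert({ _id: 1, nickname: "Ace" });                           // nickname starts life as a scalar
db.people.update({ _id: 1 }, { $push: { nickname: "Flash" } });          // error: cannot $push to a non-array
db.people.update({ _id: 1 }, { $set: { nickname: ["Ace", "Flash"] } });  // rewrite the field as an array instead
```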

“Normalization” gets really tricky. Do you store the child entities in the document of their parent, or so you store some sort of link/reference and store the child as it’s own collection. The trade offs are how hard is it too query, storage, performance, and maintainability. So, how do you update the entity if it changes and is used in a lot of documents? Sometimes that is acceptable and sometimes not. You have to determine what makes sense for your case.

There is a DBRef, but it is really just syntactic sugar that generally does not add any value. It is NOT a form of foreign key constraint, as people often try to use it.

Indexes are the single biggest tunable performance factor in MongoDB (just like in relational databases). Too many indexes have the same problems as in relational – writes become slow, resource usage gets heavy (memory in this case is the bigger deal). Indexes get built during the write operation, same as in relational. Generally, the same rules and practices apply to MongoDB indexes as to relational ones.

You can actually create unique constraints via indexes – the only database constraint you can enforce in Mongo. You can recreate the index as well, but that is a blocking operation.
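
A sketch of that using the shell’s ensureIndex of the day (the users collection and email field are made up):

```javascript
db.users.ensureIndex({ email: 1 }, { unique: true });   // the one constraint Mongo will enforce
db.users.insert({ email: "ada@example.com" });
db.users.insert({ email: "ada@example.com" });          // rejected with a duplicate key error
```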

You can create an index that defines a time to live which will delete the document when the TTL expires. There is a process that does this work every minute.
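
Something like this, assuming a sessions collection with a lastSeen date field:

```javascript
db.sessions.ensureIndex({ lastSeen: 1 }, { expireAfterSeconds: 3600 });
db.sessions.insert({ user: "ada", lastSeen: new Date() });   // swept away roughly an hour after lastSeen
```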

A collection cannot have more than 64 indexes (which is way more than practical). The size of the key values in the index cannot exceed 1024 bytes and the index name cannot exceed 127 bytes. You can only use one index per query. You cannot sort more than 32 MB of data without an index on it. MongoDB indexes are case sensitive.

Append .explain() to any JavaScript query to see its query plan. There is also a profiler you can turn on that will profile everything or any query that exceeds a threshold. Several query plans are tried; the winning one is cached. It remains cached for some period of time, but not forever.
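
For example (the query is invented; the output fields are from the shell of that era):

```javascript
db.people.find({ city: "Columbus" }).explain();
// "cursor" : "BasicCursor"        means a full collection scan
// "cursor" : "BtreeCursor city_1" means the city index was used
```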

Replicate your database to multiple other nodes. One node is the primary. If it dies, the remaining nodes vote on who should become the new primary. In the event of a tie, there won’t be a primary and we’ll have a read-only situation. To avoid that, add an arbiter node that ensures someone wins the election for primary. You can also set priorities for the nodes to help define failover order. You can define a node as "hidden," which will prevent it from ever becoming the primary. You might do this for a reporting node.

You can set replication delays. Replication is asynchronous and pulls data from the primary. You might set a node to delay replication (slaveDelay) so that you get a bit of a rolling backup that still contains data that a drop or other destructive event would have removed. You would want to mark this node as hidden as well.
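
Pulling the last few points together, a replica set configuration along these lines is a plausible sketch (host names are made up):

```javascript
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1:27017", priority: 2 },   // preferred primary
    { _id: 1, host: "db2:27017", priority: 1 },
    { _id: 2, host: "db3:27017", priority: 0,     // can never be elected primary
      hidden: true,                               // invisible to clients – good for reporting/backup
      slaveDelay: 3600 }                          // applies the oplog an hour behind the primary
  ]
});
```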

Replication is eventually consistent, but if you require things to always be consistent you can always query the primary. There are times when you need that and times when you don’t. There are some very complex procedures around how to configure replication and primary voting – you can probably cover yourself for most scenarios you would need. You can replicate to a node that is physically not in the same data center, which allows for disaster recovery.

You can define how you read data – Primary Only, Primary preferred, Secondary only, Secondary preferred, or nearest. If you need consistent reads, use primary only or preferred.
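
If I recall the shell syntax correctly, a single query can be steered with a read preference like this (the collection is invented):

```javascript
db.people.find({ city: "Columbus" }).readPref("secondaryPreferred");
```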

In order to get disaster recovery, you need to have multiple data centers (more than 2) with more than one node per data center.

Heartbeat goes out every 2 seconds. It times out after 10 seconds. A missed heartbeat means a node is down and triggers the election if it is the primary.

There is an oplog that records what has happened in replication. It is replayable. There are no multi-operation transactions, but every single operation is transactional – it succeeds or fails all on its own.

Bad things happen when the clocks of a replica set get out of sync. Keep them in sync.

The slides for these presentations are generally here, I think: https://speakerdeck.com/sridharnanjundeswaran