Reusability is a Deal with the Devil

Uncategorized

OK, it’s not, but it has some similarities… :)

Reusability sounds like a great thing. Build something and reuse it often. What’s not to like? Well, a few things.

If you build something with the intent of being able to reuse it, by definition you are building more than you need right now. It takes extra effort to build in the extensibility points. You need to test for the hypothetical future cases.

There are cases where this extra effort is worthwhile. If you know you are going to be building a lot of similar applications, the work you are saving yourself should offset the extra work. When you are building an interface to third-party systems it makes sense to build in extension points. If your business is to build libraries for other developers, of course they need to be reusable. But these are rare cases and usually only need limited reusability.

If you have existing code and want to make it reusable, that effort has a cost. If the current codebase is not good, you will pay a lot to make it reusable, and not just immediately. The code will always be buggy because it started that way. To further tax the debt metaphor, you will pay interest on that code forever.

There are times when it might be OK to take current code and reuse it. In these cases you will want to copy and paste it into a new project. It should not keep any ties to its past life. It is serving a different purpose. The chance that it will solve the same problems is low. Trying to keep it in sync will just lead to more bugs.

If you really need to reuse code, you have to have automated tests around it. You need that safety net. Without it, as the code evolves away from its original purpose, bugs will creep in. A change to business logic to solve the current problem may have hidden impacts on other parts of the codebase. Automated tests will save you from a 4 hour hunt for that method you didn’t know existed because of a dependency injection trick.

If you need to get a feature done fast, reuse is your solid fair weather friend. Just be careful when the weather changes.

Un-Meet-Ups

Uncategorized

Often the best part of a conference is talking to other people in the hallway. The best part of many user group meetings, um, I mean, meet ups, is going for a beer afterwards. Talking to the other people in your “tribe” is a lot of fun. It helps to hear about other people having similar challenges and how they overcome them. Sometimes it is nice to tell a story and see other heads nodding along.
What if you could have that experience without having to have a meetup or conference? Sure, there are happy hours with your co-workers, but it’s not the same. You work with these people daily so the conversation degrades to complaining about the trivial things at work.
I am not sure exactly how we started. I think it started as a Software Craftsmanship group (anybody still have one of those?). It started before I got involved, but a friend invited me along one time. We met in a bar and did lightning talks about things. Over time the prepared talks went away. Now we just hang out and talk about life. Much of the conversation is about technology, but we talk about all kinds of things.
The membership has changed over time. The friend that brought me in no longer attends. Most of us have never worked with each other, but we all know some people in common. We’ve mixed up the activities, too. We started a year or two ago going for long walks along the Montour Trail. We started that out with lightning talks, too – one every two miles. But the talks fell away and now it’s just random.
I don’t have any great advice for how to get started. Don’t overthink it. Maybe just try posting something to Twitter and see who responds. The important part is that you keep doing it. Over time these people have become good friends of mine. I look forward to hanging out with them every month or so.

I would have written a shorter letter but didn’t have time.

Improvement, Techniques

I received an email from a coworker. The email was terse and wordy at the same time. I’m not sure what the author wants from me. I think he wants me to take some action. Perhaps he wants me to help advocate for the use of the tool he is talking about. Perhaps he wants me to tell him what his next steps should be. It is a rambling email that never gets to the point.

I read my fair share of emails like this. I have written them as well. And not just email. Much of what I read could get to the point much faster. I am trying to get better at writing by writing more. I am using an excellent tool, Hemingway Editor, that helps a lot with keeping my writing succinct.

The biggest problem I have in my writing is long rambling sentences. I often write sentences that should have been a paragraph. These are never as clear and understandable as they sounded in my head. That alone is reason enough to use Hemingway but it has other features that help reduce complexity. If you spend any amount of time writing, you should give it a try.

Responsibility

Uncategorized
From the random stack of unfinished blog posts…

There is a problem with processes and documentation. We tend to use them as shields to prevent us from having to think or make decisions.

Software development as a profession is a funny idea. We want to be treated as professionals, but most of the time we don’t seem to want to act as professionals. We want to hide behind “processes”. We love our processes. It only makes sense, I suppose, since we spend most of our time trying to figure out how to get computers to keep the users stuck in processes. Processes to buy goods and services, or to fill orders, or to handle insurance claims. Things that are repetitive.

Automating processes like that is a good thing – that’s why we do it. It saves time and reduces errors caused by repeating similar actions. It takes the drudgery out of these things and lets the user get back to their other activities. These other activities are more creative or fulfilling. Or it enables unskilled labor to deal with tasks in a reliable fashion, freeing up skilled labor to do more profitable things.

That is the problem: we are trying to automate processes that make it possible for unskilled labor to develop software. But we’re not unskilled labor – it’s just not possible for unskilled labor to do these things. Developing software is a creative activity. Creativity doesn’t happen as well when it’s done under a long list of constraints, which is what process does.

Is all process bad for software development? Just like empowerment doesn’t mean self-management, you can’t completely avoid process. Process constrains us to be able to produce software in reliable, semi-predictable ways – to a point. The problem is that we tend to go past that point. We want to be constrained into tiny boxes, so we don’t have to think a lot, so we can blame the process when we fail. We are risk averse and afraid of failure.

Agile is attractive because we know we have to do something to enable us to produce good software in a reasonable amount of time. We know that the old ways just weren’t working. The problem is, we have years of experience (and beatings) that failure is bad. We cling to any process we can so that we can blame the process when things go wrong. And because we don’t truly embrace agility, things do go wrong. So we add more process. And things get worse. And so on.

The next time a problem occurs, try something else. Instead of figuring out how to add more process to prevent the problem, think about how taking responsibility could have stopped it sooner. Sometimes you just have to stand up and do what’s right. Admit that you were wrong about some aspect that is now causing a problem. We have “fear of failure” so ingrained that we have to do anything to prevent being seen as failures. And that is just wrong.

Codemash Friday 2:45p – Getting Started with Machine Learning on Azure

Uncategorized

Getting Started with Machine Learning on Azure
#Seth Juarez

ML on Azure is easy…
…if you understand a few things about machine learning first

Agenda
1. Data Science
2. Prediction
3. Process
4. nuML
5. AzureML
6. Models (if time permits)

Data Science
key word – science – try something, it might work, repeat with a different trial, etc.
Science is inexact
Guess, test, repeat

Machine Learning
finding (and exploiting) patterns in data
replacing “human writing code” with “human supplying data”
the trick of ML is generalization

Supervised v. Unsupervised Learning
Supervised – prediction – we know what we are looking for
Unsupervised – lots of data, try and figure out things from the data (clusters, trends, etc.)

Kinect finding your body is Supervised (knows what it is looking for)
Netflix figuring out recommendations is Unsupervised

What kinds of decision?
binary – yes/no, male/female
multi-class – grades, class
regression – number between 0 and 100, real value

multi-class can be done using binary (A v. something else, B v. something else, etc. – then take the best-scoring one as the value)
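
A minimal sketch of that "A v. something else" idea in Python, not anything shown in the talk. The toy scorer (distance to class averages), the function names, and the grade data are all assumptions made up for illustration; a real binary classifier would take the scorer's place.

```python
from statistics import mean

def train_binary(values, labels, positive):
    """Toy 'positive v. something else' scorer over a single numeric feature."""
    pos = mean(x for x, y in zip(values, labels) if y == positive)
    rest = mean(x for x, y in zip(values, labels) if y != positive)
    # Higher score means the value looks more like the positive class.
    return lambda x: abs(x - rest) - abs(x - pos)

def train_one_vs_rest(values, labels):
    """Train one binary scorer per class."""
    return {c: train_binary(values, labels, c) for c in set(labels)}

def predict(scorers, x):
    """Ask every binary scorer and keep the best-scoring class."""
    return max(scorers, key=lambda c: scorers[c](x))

# Hypothetical data: test scores and the grade each one received.
test_scores = [95, 91, 84, 80, 72, 68]
grades = ["A", "A", "B", "B", "C", "C"]

scorers = train_one_vs_rest(test_scores, grades)
print(predict(scorers, 93), predict(scorers, 82), predict(scorers, 71))
```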

Process
1. Data
2. Clean and Transform the Data
3. Create a Model
4. Use the Model to predict things

“features” is the term used to describe the attributes that could influence your decision
Data cleaning and transformation is just shaving yaks (it takes a lot of time)

nuML [http://www.numl.net/]
A .Net ML library
Comes with a REPL for C#
Attributes to mark things [Feature], [Label], etc.
gets data into matrix
(Mahout does this stuff in Java)
turns it into a model
has visualizers for the model (graphical and text based)
Can use the model to create a prediction given a set of inputs (features)

How do you know it worked?
Train and Test – use some portion of your data to train the model, and then the rest to test (Seth is suggesting 80-20)
nuML has facilities to do this
It will take your data, create a model, score the model, and repeat 100 times
Then it returns the best model
You have to be careful of overfitting the problem – if you create too fine-grained a model you might overfit the model to your data and get bad predictions.
Training v. Testing means – if you have 10 rows of data, you will train on 8 rows and then test with 2
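
The train, score, repeat loop could look roughly like this in Python. This is a sketch of the idea only, not nuML's API: the threshold "model", the function names, and the 10 rows of data are assumptions for illustration.

```python
import random

def split(rows, train_fraction, rng):
    """Shuffle a copy of the rows and cut it into train and test portions."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(train_rows):
    """Toy model: a threshold halfway between the two class averages.
    Assumes both classes show up in the training rows (true for the data below)."""
    pos = [x for x, y in train_rows if y]
    neg = [x for x, y in train_rows if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def score(threshold, test_rows):
    """Fraction of held-out rows the threshold classifies correctly."""
    hits = sum((x > threshold) == y for x, y in test_rows)
    return hits / len(test_rows)

def best_of(rows, rounds=100, train_fraction=0.8, seed=0):
    """Split, train, score, repeat; keep the best-scoring model."""
    rng = random.Random(seed)
    best_model, best_score = None, -1.0
    for _ in range(rounds):
        train_rows, test_rows = split(rows, train_fraction, rng)
        model = train(train_rows)
        s = score(model, test_rows)
        if s > best_score:
            best_model, best_score = model, s
    return best_model, best_score

# Hypothetical 10 rows: (feature, label). 80-20 means train on 8, test on 2.
rows = [(x, x > 5) for x in range(1, 11)]
best_model, best_score = best_of(rows)
print(best_model, best_score)
```

Scoring on only 2 held-out rows is very coarse, which is one reason to repeat the split many times rather than trust a single run.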

Limitation – limited amount of resources on your machine (CPU, RAM, Disk)

AzureML
Drag and droppy action to set up the path through the steps above
Will give you some charts of your data so you can start to get some insight into it
Has a bunch of transformation options, also draggy
If you don’t know the 4 step process, the “wizard” can be tricky, but if you know it, it’s fairly straightforward.
You just drag and drop your flow through the steps – you can have multiple transformations (shaving the yak)
You can define the training/testing ratio by specifying the percentage to use for training
Define scoring for the trained model
Evaluate the scored models
You need to define what you are learning so that it can train the model
precision – true positive over true positive plus false positive
accuracy – (tp + tn) / the whole set
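
Those two definitions written out as a small Python sketch. The confusion-matrix counts in the example are made up for illustration and are not from the talk.

```python
def precision(tp, fp):
    """True positives over everything the model called positive."""
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    """Correct predictions (positive and negative) over the whole set."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts from scoring 100 test rows.
tp, fp, tn, fn = 40, 10, 45, 5
print(precision(tp, fp))         # 40 / 50
print(accuracy(tp, tn, fp, fn))  # 85 / 100
```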

If you are getting 95% accuracy, you probably have something wrong – that is usually too accurate (overfitting). Ideal is in the 80-90% range.

You can run multiple scoring methods – you will get a chart comparing them

You can publish as a web service to use your model for predictions

Linear Classifiers – need to define a boundary between sets
Perceptron algorithm – draws a line between two sets
Kernel Perceptron – takes data that can’t have a line drawn between it, 3D-ifies it, draws a plane between the sets
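
A rough Python sketch of the basic perceptron update rule, to make "draws a line between two sets" concrete. The points, labels, and function names are assumptions for illustration, and the formulas on the slides may be presented differently.

```python
def classify(w, b, point):
    """Which side of the line w . x + b = 0 the point falls on (+1 or -1)."""
    x1, x2 = point
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

def train_perceptron(points, labels, epochs=20):
    """Nudge the line toward every misclassified point until they all fit."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if classify(w, b, (x1, x2)) != y:
                w[0] += y * x1
                w[1] += y * x2
                b += y
    return w, b

# Hypothetical, linearly separable data: one cluster near (1, 1), one near (4, 4).
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_perceptron(points, labels)
print([classify(w, b, p) for p in points])
```

This only settles on a line when one exists. For data that cannot be split by a line, the loop just runs out of epochs, which is where the kernel version comes in.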

Lots of math formulas – review the slides

“Kernel Trick” – will allow you to determine a separator between two sets no matter what space by going into multiple dimensions with the data
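
A small Python sketch of why this works, assuming a degree-2 polynomial kernel (the talk may have used a different one). The `lift` function sends a 2D point into 3D, and the kernel returns the same dot product without ever building the 3D point. All names and sample points here are made up for illustration.

```python
import math

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def lift(p):
    """Explicit 2D -> 3D map: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(a, b):
    """Degree-2 polynomial kernel, computed entirely in the original 2D space."""
    return dot(a, b) ** 2

a, b = (1.0, 2.0), (3.0, -1.0)
explicit = dot(lift(a), lift(b))  # dot product after going to 3D
shortcut = poly_kernel(a, b)      # same value without leaving 2D
print(math.isclose(explicit, shortcut))

# In the lifted space, first plus third coordinate is x1^2 + x2^2, so the plane
# "first + third = 1" separates points inside the unit circle from points
# outside it, which no straight line in 2D can do.
def side_of_plane(p):
    z = lift(p)
    return 1 if z[0] + z[2] > 1 else -1

print(side_of_plane((0.5, 0.0)), side_of_plane((2.0, 0.0)))
```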

If you cannot get a reasonable answer – neural networks
A network of perceptrons
Now we’re into calculus