Matthew McCullough

matthewmccull
Biography

Matthew is an energetic 15-year veteran of enterprise software development and a world-traveling open source educator. Matthew guides the Training efforts at GitHub.com and is author of the Git Master Class series for O’Reilly, co-author of O’Reilly’s Version Control with Git book, co-author of the Presentation Patterns book, a speaker on the No Fluff Just Stuff tour, an author of three of the top 10 DZone RefCards, including the Git RefCard, and President of the Denver Open Source Users Group.

Advanced Git Tricks

Git is a powerful content tracker that has gained acceptance among many forward-leaning consultants and teams over the past several years. Those developers know that it offers the usual commit, branch, merge and tag in a distributed environment, yet only a few have explored the more powerful functions of Git. These range from searching months of history for the commit that introduced a unit-test bug, to undoing literally any mistake, to splitting in-progress work within a single file into multiple commits.

Many developers like yourself have been introduced to Git’s fundamentals. But do you really know Git and all it has to offer? There is so much developer productivity power lurking beneath the second and third layers of Git commands. Reading the documentation for Git is one thing, but seeing these power features in action is mind-blowing. We’ll unwind any destructive mistake in history with the RefLog. We’ll look at best practices in maintaining linear history with Rebase. We’ll make merging the same work onto multiple branches a trivial effort by tracking merge resolution decisions with rerere. We’ll conclude by identifying the nuances of lightweight and heavyweight (annotated) tags, passing compressed object sets around on USB sticks, and seamlessly integrating visual merge and diff tools.

Simple MapReduce with Cascading on Hadoop

Hadoop is a MapReduce framework that has sprung into the vernacular of “big data” developers everywhere. But coding to the raw Hadoop APIs can be a real chore. Data analysts can express what they want in more English-like vocabularies, but the Hadoop APIs seem to require us to translate those requests into a less comprehensible functional and data-centric DSL.

The Cascading framework gives developers a convenient higher-level abstraction for querying and scheduling complex jobs on a Hadoop cluster. Programmers can think more holistically about the questions being asked of the data and the flow that such data will take, without concern for the minutiae.

We’ll explore how to set up, code to, and leverage the Cascading API on top of a sample or production Hadoop cluster for a more effective way to code MapReduce applications, all while thinking in a more natural (less than fully MapReduce) way.

During this presentation, we’ll also explore Cascading’s Clojure-based derivative, Cascalog, and how functional programming paradigms and language syntax are emerging as the next important step in big-data thinking and processing.

Git Foundations

Many development shops have made the leap from RCS, Perforce, ClearCase, PVCS, CVS, BitKeeper or SourceSafe to the modern Subversion (SVN) version control system. But why not take the next massive stride in productivity and get on board with Git, a distributed version control system? Jump ahead of the masses staying on Subversion, and increase your team’s productivity, debugging effectiveness, flexibility in cutting releases, and repository and connectivity redundancy (at $0 cost). Understand how distributed version control systems (DVCSes) are game-changers and pick up the lingo that will become standard in the next few years.

In this talk, we discuss the team changes that liberate you from the central server but still conform to the corporate expectation that there’s a central master repository. You’ll get a cheat sheet for Git, and a trail map from someone who’s actually experienced the Subversion to Git transition.

Lastly, we’ll even show how you can leverage 75% of Git’s features against a Subversion repository without ever telling your bosses you are using it (though they’ll start to wonder why you are so much more effective in your check-ins than other members of your team).