Python Fundamentals for the Pipeline

Taught by Michael Morehouse

Course Number:
PYT201
Software Version:
Python 2.7.2 
Original Run Date:
October 2011 
Duration:
12 hours 7 minutes 
Taught by returning prof Michael Morehouse, PYT201 will explore the use (and, only occasionally, abuse) of Python in solving the fundamental problems of a VFX pipeline. Rather than focus on the various APIs of the dozens of proprietary and commercial packages you might encounter in your career moving from facility to facility, this course will emphasize the core fundamentals of building robust, efficient, well-documented and easily maintained modules and command-line tools that do the job well and do it often, and yet remain customizable enough to empower future development as you build a library of useful tools. We will focus on keeping your code and skills as portable as possible, leveraging the versatility of the core Python package and a few basic open source packages such as PyYAML. You will learn to document code using the Sphinx document generation system and reStructuredText, and you will learn to check your good coding habits using Pylint.

Throughout the course the emphasis will remain on thinking through a problem and attacking it with a library or command line tool, then testing and optimizing your code, while documenting it all the way through. While some of the more glamorous and exotic solutions will have to wait for later courses, this course will give you the fundamental skills that will help get you and keep you employed in the pipeline.

For this course you should come prepared with a reasonable familiarity with the basic Python language. Be comfortable with the idea of creating several modules which import from each other. Be familiar with how to define functions and classes, and have at least a passing understanding of object-oriented inheritance. Be prepared to work along in the command line and be reasonably familiar with terminals and their operation. Some basic shell scripting and administrator knowledge is desirable, and ideally you are comfortable enough to download and compile some simple code from source. The lessons assume you are either working in a Unix-style environment, or have your Windows configured sufficiently to engage in Unix-like command line operations. Additionally it is presumed that you have a working copy of Python 2.X installed, preferably at least Python 2.6, as well as the text editor of your choice that supports Python syntax highlighting.

Having come from a life where he rose through the ranks and changed careers more often than some people changed their pants, it's pretty surprising to realize that Morehouse has been a 2D TD at Digital Domain for almost three years now. At Digital Domain he has been responsible for large projects involving cross-facility asset and software syncing, outsource data ingestion, and the overall Nuke pipeline while also supporting films such as Tron: Legacy and Thor. Prior to that he worked in motion tracking while teaching himself Python, and before that he did everything from building and prepping cameras to inventory management, tax accounting, and Sarbanes-Oxley compliance in the production rentals end of the industry. In short, Michael is far more flexible than a rubber sheet and just as hard to pin down.
 

Class Listing

Class 1

Setting the Stage, Defining the Problem.
A quick introduction to virtualenv and package management using pip: we'll set up a clean Python environment to work in and install some of the base packages we will be using throughout the course, including PyYAML (and optional C extensions), Sphinx, pylint and others. Next we'll take a high-level look at some basics of the fundamental problem we'll be attacking throughout the semester: exactly how do you find image sequences on disk and move, rename, and renumber them while maintaining some fundamental knowledge about these files?
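To make the core problem concrete, here is a minimal sketch of finding image sequences in a list of file names. It is not the course's own code: the function name, the regular expression, and the "head.frame.ext" naming convention are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Assumed naming convention: <head>.<frame digits>.<ext>, e.g. "shot.0001.exr".
FRAME_PATTERN = re.compile(r"^(?P<head>.+)\.(?P<frame>\d+)\.(?P<ext>\w+)$")


def group_sequences(filenames):
    """Group file names into sequences keyed by (head, padding, ext).

    Hypothetical helper: returns a dict mapping each key to a sorted
    list of the frame numbers found for it.
    """
    sequences = defaultdict(list)
    for name in filenames:
        match = FRAME_PATTERN.match(name)
        if match is None:
            continue  # not a numbered frame; ignored in this sketch
        digits = match.group("frame")
        key = (match.group("head"), len(digits), match.group("ext"))
        sequences[key].append(int(digits))
    return dict((key, sorted(frames)) for key, frames in sequences.items())
```

In practice the names would come from something like os.listdir(path); keeping the grouping separate from the directory read makes it easy to test without touching disk.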

Class 2

This Problem isn't so Simple: Thinking in Building Blocks.
Defining a pure Python class for thinking about Frame Ranges: a Sequence is a set of files with one property in common: they all belong to some range with a first frame and a last frame. But what if the entire range isn't here yet? Tricking out a Class for handling this information.
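One way such a class might look is sketched below. The class name FrameRange appears later in the listing, but every method and attribute here is an assumption for illustration, not the design taught in the course.

```python
class FrameRange(object):
    """A first/last frame pair plus the frames actually present.

    Illustrative sketch only; attribute and method names are hypothetical.
    """

    def __init__(self, first, last, present=None):
        if last < first:
            raise ValueError("last frame precedes first frame")
        self.first = first
        self.last = last
        # Assume the range is complete unless told which frames exist.
        if present is None:
            self.present = set(range(first, last + 1))
        else:
            self.present = set(present)

    def __len__(self):
        # Length of the intended range, not the count of frames on disk.
        return self.last - self.first + 1

    def __contains__(self, frame):
        return self.first <= frame <= self.last

    def missing(self):
        """Return the frames inside the range that have not arrived yet."""
        return [frame for frame in range(self.first, self.last + 1)
                if frame not in self.present]

    def is_complete(self):
        return not self.missing()
```

Separating the intended range from the set of frames present is what lets the class answer "what is still missing?" for a sequence that is only partly delivered.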

Class 3

Command Line 1: Listing a Sequence.
Our first command line tool will simply list the sequential files of any given directory path(s) we give it. We'll explore the basics of option parsing using the optparse module. We'll add some (very basic) logging and also look at the creation of a configuration file using YAML.
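A minimal sketch of optparse combined with basic logging follows. The tool name "lsseq", the -v flag, and the function name are assumptions, not the course's actual interface. optparse was the standard choice for Python 2.6; argparse arrived in the standard library with Python 2.7.

```python
import logging
from optparse import OptionParser

log = logging.getLogger("lsseq")  # hypothetical tool name


def parse_args(argv):
    """Parse command-line arguments (without the program name)."""
    parser = OptionParser(usage="usage: %prog [options] PATH [PATH ...]")
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose",
                      default=False, help="log debug messages")
    options, paths = parser.parse_args(argv)
    if not paths:
        paths = ["."]  # assumed default: list the current directory
    level = logging.DEBUG if options.verbose else logging.WARNING
    logging.basicConfig(level=level)
    log.debug("listing paths: %s", paths)
    return options, paths
```

A script would call parse_args(sys.argv[1:]). Defaults for such options could also be read from a YAML configuration file, for example with PyYAML's yaml.safe_load, which is left out here to keep the sketch dependency-free.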

Class 4

Command line 2: Making it Sing.
Make our Sequence lister walk directory trees, and dramatically expand its option parsing to allow filtering based on extension, regular expressions, glob-style Unix syntax, and other means. Display options together in functionally related groups, and generally trick out a command line tool interface.
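Grouping related options in the help output can be done with optparse's OptionGroup. The sketch below assumes hypothetical flag names (-r, -e, -g, -x); only the OptionGroup mechanism itself comes from the standard library.

```python
from optparse import OptionGroup, OptionParser


def build_parser():
    """Build a parser whose filtering options are shown as one help group."""
    parser = OptionParser(usage="usage: %prog [options] PATH [PATH ...]")
    parser.add_option("-r", "--recursive", action="store_true", default=False,
                      help="walk directory trees below each PATH")

    filters = OptionGroup(parser, "Filtering options",
                          "Restrict which sequences are listed.")
    # No default is given for the "append" option, so it stays None
    # when unused and each parse gets its own fresh list.
    filters.add_option("-e", "--ext", action="append", dest="extensions",
                       metavar="EXT", help="keep only this extension (repeatable)")
    filters.add_option("-g", "--glob", dest="glob", metavar="PATTERN",
                       help="keep names matching this glob-style pattern")
    filters.add_option("-x", "--regex", dest="regex", metavar="PATTERN",
                       help="keep names matching this regular expression")
    parser.add_option_group(filters)
    return parser
```

Running the tool with --help then prints the filtering flags under their own "Filtering options" heading rather than in one long undifferentiated list.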

Class 5

Command line 3: Wrapping up sequence listing.
Explain and expand on how itertools.ifilter, fnmatch, and regular expressions are used to progressively filter our discovered sequences, and show how all the new command line options do their thing. We also dive into using nose and the nosetests command to discover and run automated unit tests on our package to ensure that changes in our code don't lead to regressions or bugs in its behavior.
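Progressive filtering can be sketched as a chain of lazy stages, each narrowing the previous one. The function below is a hypothetical illustration: it uses generator expressions, which behave the same in Python 2 and 3, where the Python 2 course material would reach for itertools.ifilter.

```python
import fnmatch
import re


def filter_names(names, extensions=None, glob=None, regex=None):
    """Lazily narrow an iterable of file names, one filter stage at a time.

    Hypothetical helper; each argument that is given adds one more stage.
    """
    if extensions:
        wanted = tuple("." + ext.lstrip(".") for ext in extensions)
        names = (name for name in names if name.endswith(wanted))
    if glob:
        names = (name for name in names if fnmatch.fnmatchcase(name, glob))
    if regex:
        compiled = re.compile(regex)
        names = (name for name in names if compiled.search(name))
    return names
```

Because every stage is a generator, nothing is evaluated until the caller iterates, so a large directory walk is never held in memory as several intermediate lists.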

Class 6

Renaming and Renumbering Should be Easy.
Expand our toolset to allow powerful renaming using Regular Expressions, and renumbering using our earlier FrameRange class and our newfound skills.
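As a rough illustration of regex renaming plus renumbering, the sketch below only plans the new names and touches nothing on disk. The function name, the frame pattern, and the simple offset-based renumbering are assumptions; the course builds on its own FrameRange class instead.

```python
import re

# Assumed naming convention: the frame number sits between the last two dots.
FRAME_RE = re.compile(r"\.(\d+)\.(\w+)$")


def plan_rename(names, pattern, replacement, offset=0):
    """Return (old, new) name pairs without renaming anything."""

    def renumber(match):
        digits, ext = match.group(1), match.group(2)
        # Keep the original zero padding while shifting the frame number.
        return ".%0*d.%s" % (len(digits), int(digits) + offset, ext)

    plan = []
    for old in names:
        new = re.sub(pattern, replacement, old)
        new = FRAME_RE.sub(renumber, new)
        plan.append((old, new))
    return plan
```

Planning first and renaming second leaves room for a dry-run mode and for checking collisions, which matter when the old and new frame ranges overlap.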

Class 7

Threads, Queues, and Hashing.
Explore simple parallelization using Python's threading.Thread class to spawn a pool of workers to perform parallel tasks. Discover a simple pattern for using the Queue.Queue class to synchronize and control Thread behavior. Modify the rnames tool from last lesson to allow Threading, and add a new cksums tool to generate md5 checksums of a sequence.
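A common worker-pool pattern with threading.Thread and a queue is sketched below, applied to md5 checksums. This is not the course's cksums tool: the function names and the choice to record None for unreadable files are assumptions. The import shim covers the module rename from Queue (Python 2) to queue (Python 3).

```python
import hashlib
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2, as used in the course


def md5_of_file(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_files(paths, num_workers=4):
    """Checksum files in parallel; returns {path: hexdigest or None}."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            path = tasks.get()
            try:
                if path is None:  # sentinel: no more work for this worker
                    return
                try:
                    value = md5_of_file(path)
                except (IOError, OSError):
                    value = None  # assumed policy: record unreadable files
                with lock:
                    results[path] = value
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for path in paths:
        tasks.put(path)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    return results
```

The queue does the synchronization: workers block on get() until work arrives, and one sentinel per worker shuts the pool down cleanly once the real tasks are consumed.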

Class 8

Wrapping up the command line: more hashing, and threads.
Enhance the cksums tool from last lesson with all of the standard checksum/hash algorithms from zlib and hashlib. Create smvs, a tool for securely moving sequences using a chained Queue structure to manage threads. Ensure that the new copy of the file exactly matches the old using checksums.
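The idea of verifying a move with checksums can be sketched in a single thread, as below. This deliberately omits the chained Queue structure the class describes; the function names and the policy of deleting a mismatched copy are assumptions. The zlib functions crc32 and adler32 and hashlib.new are standard library calls.

```python
import hashlib
import os
import shutil
import zlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Checksum a file with zlib's crc32/adler32 or any hashlib algorithm."""
    with open(path, "rb") as handle:
        chunks = iter(lambda: handle.read(chunk_size), b"")
        if algorithm in ("crc32", "adler32"):
            update = getattr(zlib, algorithm)
            value = update(b"")  # starting value for a running checksum
            for chunk in chunks:
                value = update(chunk, value)
            # Mask so the result is unsigned on Python 2 as well.
            return "%08x" % (value & 0xffffffff)
        digest = hashlib.new(algorithm)
        for chunk in chunks:
            digest.update(chunk)
        return digest.hexdigest()


def verified_move(src, dst, algorithm="md5"):
    """Copy src to dst, confirm the checksums match, then remove src."""
    before = file_checksum(src, algorithm)
    shutil.copy2(src, dst)
    if file_checksum(dst, algorithm) != before:
        os.remove(dst)  # assumed policy: discard the bad copy, keep the source
        raise IOError("checksum mismatch copying %s" % src)
    os.remove(src)
    return before
```

The source is removed only after the copy has been verified, so a failed or corrupted transfer never costs the original file.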

Class 9

Automated testing expanded; using pylint to verify coding standards.
We dramatically extend our automated unit tests into class-based tests with setup and teardown of fixtures. Create test suites for all of our Model and Controller modules. We then use pylint to verify that all of the code in Pysequences is up to our facility coding standards. Good habits in testing and coding in a working world.
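A class-based test with fixture setup and teardown might look like the sketch below. It uses the standard library's unittest.TestCase, which nose can also discover and run; the test class, file names, and assertions are hypothetical and not taken from the course's test suite.

```python
import os
import shutil
import tempfile
import unittest


class SequenceFixtureTest(unittest.TestCase):
    """Each test gets a fresh temporary directory holding three fake frames."""

    def setUp(self):
        # Fixture: build a throwaway sequence on disk before every test.
        self.root = tempfile.mkdtemp()
        for frame in (1, 2, 3):
            path = os.path.join(self.root, "shot.%04d.exr" % frame)
            open(path, "wb").close()

    def tearDown(self):
        # Remove the fixture so tests cannot affect one another.
        shutil.rmtree(self.root)

    def test_fixture_contains_three_frames(self):
        expected = ["shot.0001.exr", "shot.0002.exr", "shot.0003.exr"]
        self.assertEqual(sorted(os.listdir(self.root)), expected)

    def test_removing_a_frame_is_isolated(self):
        os.remove(os.path.join(self.root, "shot.0002.exr"))
        self.assertEqual(len(os.listdir(self.root)), 2)
```

Because setUp and tearDown run around every test method, the second test can delete a frame without breaking the first, whatever order the runner picks.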

Class 10

Wrapping up; documentation and distribution.
We use Sphinx with Autodoc to slurp up all of those well crafted and verbose help strings we have been judiciously and religiously adding to our code in the hope that those who follow us will not have to deal with what we had to deal with. We wrap up PYT201 with a brief discussion of distribution systems and an overview of how the skills and ideas presented will help you in your quest to master the pipeline before it masters you.
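For a sense of what Sphinx's autodoc extension picks up, here is a small function documented with reStructuredText field lists. The function itself is a hypothetical example rather than part of the course code; the :param:, :type:, :returns:, :rtype:, and :raises: fields are standard Sphinx markup.

```python
def pad_frame(frame, padding=4):
    """Return a frame number as a zero-padded string.

    :param frame: Frame number to format; must not be negative.
    :type frame: int
    :param padding: Minimum number of digits in the result.
    :type padding: int
    :returns: The padded frame number, e.g. ``"0007"``.
    :rtype: str
    :raises ValueError: If ``frame`` is negative.
    """
    if frame < 0:
        raise ValueError("frame must not be negative")
    return "%0*d" % (padding, frame)
```

With sphinx.ext.autodoc enabled in conf.py, an automodule directive with the :members: option (or an autofunction directive) renders this docstring into the generated documentation, so the text lives next to the code it describes.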