Orlando Math and Physics Tutoring

November 10, 2013

The Derivative as an Operator and the Chain Rule

Filed under: Calculus. Posted by Michael Bray @ 2:14 pm

Introduction

In math, an “operator” is a symbol that applies a well-known function or process to one or more inputs to generate one or more outputs, or that conveys some relationship, meaning, or constraint on its inputs.  Some operators are ubiquitous: + means addition, − means subtraction.  Both of these operators require two inputs and generate one output.  Other operators are more specialized or more complicated: for example, the delta operator Δ represents the change in some variable, as in Δx.

All operators in math require at least one input.  An operator by itself means nothing; it has to “operate” on something.  For example, it means nothing to write “f(x) = +” because the + operator requires two inputs, or “arguments”.  Many operators are easy to recognize, but some are very easy to confuse with functions, and indeed the distinction is subtle.  One example is what we often call the “sine function”.  The sine really should be considered an operator because it requires an input and it denotes a fixed, well-known operation: no one would ever redefine sine as “sin(x) = x^2”.  As before, there is no sensible meaning to “f(x) = sin” because the sine operator requires an input.  To make this a complete statement, you have to write sine “of something”.  In this example, trivially, this would be “f(x) = sin x” or “f(x) = sin(x)”, where ‘x’ is the argument to the sine operator.  The argument could of course be something more complicated, such as x^2 or any other function of x, but the point is that the argument is mandatory.

The derivative as an operator

In calculus, the derivative is also an operator: it applies a well-known process to an input.  As with other operators, it requires an input; one must take the derivative “of something”, because “the derivative” means nothing by itself.  In the case of the derivative operator, the input is a function of one or more variables, and the output is a new function of those same variables.  However, the derivative is rarely presented this way, and I believe that doing so would make the Chain Rule much easier for students to understand.  Part of the reason may be that there are at least four different ways to write derivatives, although only two are in common use in typical calculus courses (the others are more commonly found in engineering or specialized math classes).  One of the two common forms is Lagrange’s notation, which uses prime marks to indicate derivatives, such as y’, y’’, etc.  That notation is perfectly valid, but it does not make the operator visible, so it is less useful for this discussion.  Instead, I will use Leibniz’s notation for the derivative:

$$\frac{dy}{dx}$$

One subtlety about this operator is that it can be written in two equivalent forms.  The form above is a “shorthand” version, typically used when the argument is a single-character variable.  The more general form exposes the fact that the derivative is actually an operator:

$$\frac{d}{dx}(y)$$

I shall refer to this as the “expanded” form of the derivative operator.  It is the form that will be of particular use, especially when you think of d/dx as an operator, just as sin is an operator as discussed above.  The shorthand version would probably be read as “the derivative of y with respect to x”, but the expanded form should probably be read as “the derivative with respect to x of y”.  The change in word order isn’t particularly significant, since both mean the same thing, but putting the “with respect to” first is more in the spirit of treating the derivative as an operator, so I recommend saying it this way, if only to reinforce that the derivative is an operator.  (Strictly speaking, ‘d’ on its own denotes a differential, but for the purposes of this article I’ll treat d/dx as the operator.)
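The operator view can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the name `d_dx` is hypothetical, and it approximates the derivative numerically with a central difference rather than computing it symbolically. The point is the shape of the thing: the operator takes a function as input and returns a new function as output.

```python
import math

def d_dx(f, h=1e-6):
    """Hypothetical derivative operator: takes a function f of x and
    returns a new function that approximates the derivative of f."""
    def derivative(x):
        # Central-difference approximation: (f(x + h) - f(x - h)) / (2h)
        return (f(x + h) - f(x - h)) / (2 * h)
    return derivative

# "The derivative with respect to x of sin": the operator applied to its argument.
d_sin = d_dx(math.sin)

approx = d_sin(0.5)      # numerical value of the derivative at x = 0.5
exact = math.cos(0.5)    # known rule: the derivative of sin is cos
print(abs(approx - exact) < 1e-6)
```

Just as with the expanded Leibniz form, `d_dx` by itself is not a result; it only becomes one once it has been given something to operate on.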

Note that unlike the shorthand form, the argument to the expanded form (the contents of the parentheses) can and will often be more complicated than a single-character variable; indeed it will often be an entire function.  For example, this is how one would write “the derivative with respect to x of sin(x)”:

$$\frac{d}{dx}(\sin(x))$$

The Chain Rule

Some textbooks, when presenting the Chain Rule, use some strange mixture of Leibniz’s notation and Lagrange’s notation, with the result of the derivative using the latter, I suppose because it’s typographically easier.  For example, they might present the derivative of the sine function like this:

$$\frac{d}{dx}\sin u = \cos u \cdot u'$$

This form, however, is inconsistent with itself.  The left-hand side specifies exactly which variable the derivative is taken with respect to, while the resulting u’ requires the student to take the derivative with respect to an assumed variable.  Even though the meaning here is obvious, it’s not explicit, and this tends to confuse students when they first learn the Chain Rule.  In addition, they occasionally have trouble keeping track of what they have assigned to ‘u’, and even when they do keep track, the meaning of u’ sometimes seems lost.  I have found that using Leibniz’s notation, and having the student write out the full rule for the derivative at hand, helps them understand exactly what the Chain Rule is telling them to do.  So instead of the above derivative, a more precise form is valuable:

$$\frac{d}{dx}\sin u = \cos u \cdot \frac{du}{dx}$$

Taking this one step further, as we did in the previous section, I recommend that this be written using the expanded form:

$$\frac{d}{dx}(\sin u) = \cos(u)\cdot\frac{d}{dx}(u)$$

By doing this, the student can now make direct substitutions and then simply expand the derivative operators as they have been taught.  For example, to take the derivative of the following function:

$$y = \sin(2x)$$

The derivative rule that applies is:

$$\frac{d}{dx}(\sin u) = \cos(u)\cdot\frac{d}{dx}(u)$$

So by using the expanded form of Leibniz’s notation, the student could do a direct substitution for u=2x, and then continually apply the known derivative rules as needed:

$$\frac{d}{dx}(\sin(2x)) = \cos(2x)\cdot\frac{d}{dx}(2x) = \cos(2x)\cdot 2 = 2\cos(2x)$$

This technique really shines when repeated use of the Chain Rule and/or other derivative rules is required.  This leads to a cascading “assign / evaluate / substitute” procedure.  In the ‘assign’ step, ‘u’ is chosen so that the resulting expression matches one of the known derivative rules.  Next, in the ‘evaluate’ step, the appropriate derivative rule is expanded (keeping the Leibniz notation).  Finally, in the ‘substitute’ step, the chosen ‘u’ is substituted back into the equation (again, keeping the Leibniz notation in place).  For example:
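
One possible illustration, with a function chosen here purely as an example, is y = sin(cos(x^2)).  It assumes the standard rules for sine, cosine, and powers.  Each pass assigns a ‘u’, expands the matching rule, and substitutes back, leaving one smaller derivative still to be evaluated:

$$
\begin{aligned}
&\text{Assign } u = \cos(x^2): \\
&\qquad \frac{d}{dx}(\sin u) = \cos(u)\cdot\frac{d}{dx}(u) \\
&\qquad \frac{d}{dx}\left(\sin(\cos(x^2))\right) = \cos(\cos(x^2))\cdot\frac{d}{dx}\left(\cos(x^2)\right) \\
&\text{Assign } u = x^2: \\
&\qquad \frac{d}{dx}(\cos u) = -\sin(u)\cdot\frac{d}{dx}(u) \\
&\qquad \frac{d}{dx}\left(\cos(x^2)\right) = -\sin(x^2)\cdot\frac{d}{dx}(x^2) = -\sin(x^2)\cdot 2x \\
&\text{Result:} \\
&\qquad \frac{d}{dx}\left(\sin(\cos(x^2))\right) = -2x\,\sin(x^2)\,\cos(\cos(x^2))
\end{aligned}
$$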

The assign / evaluate / substitute process can be repeated as many times as needed until there are no derivatives left to be evaluated.  In particular, at each step, the equations state precisely which derivatives still need to be evaluated in order to obtain the final derivative.  The process can easily be combined with other known derivative rules (such as the product or quotient rule) to yield the final result.  This technique is useful not only for those just learning the Chain Rule but also for more experienced students, if only to keep track of what their current ‘u’ is and which rule they are evaluating.
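
The assign / evaluate / substitute loop is essentially a recursion, which a short Python sketch can make explicit.  Everything in it is illustrative: the tuple-based expression format, the `RULES` table, and the names `evaluate` and `d_dx` are assumptions made for this example, and the derivative is computed at a single point rather than symbolically.

```python
import math

# Each known rule: outer function name -> (the function, the derivative of that function)
RULES = {
    "sin": (math.sin, math.cos),
    "cos": (math.cos, lambda u: -math.sin(u)),
    "exp": (math.exp, math.exp),
}

def evaluate(expr, x):
    """Evaluate a nested expression such as ("sin", ("scale", 2.0, "x")) at x."""
    if expr == "x":
        return x
    if expr[0] == "scale":               # ("scale", c, u) means c * u
        return expr[1] * evaluate(expr[2], x)
    if expr[0] == "pow":                 # ("pow", u, n) means u ** n
        return evaluate(expr[1], x) ** expr[2]
    func, _ = RULES[expr[0]]
    return func(evaluate(expr[1], x))

def d_dx(expr, x):
    """Derivative with respect to x of expr, evaluated at the point x."""
    if expr == "x":
        return 1.0                       # d/dx (x) = 1
    if expr[0] == "scale":
        return expr[1] * d_dx(expr[2], x)
    if expr[0] == "pow":
        u, n = expr[1], expr[2]
        return n * evaluate(u, x) ** (n - 1) * d_dx(u, x)
    u = expr[1]                          # assign: u is the inner expression
    _, outer_derivative = RULES[expr[0]]
    # evaluate the outer rule at u, then substitute: d/dx (u) is the
    # derivative that still remains, so recurse on it
    return outer_derivative(evaluate(u, x)) * d_dx(u, x)

# y = sin(cos(x^2)), checked against the hand-derived -2x sin(x^2) cos(cos(x^2))
y = ("sin", ("cos", ("pow", "x", 2)))
x0 = 0.7
result = d_dx(y, x0)
expected = -2 * x0 * math.sin(x0**2) * math.cos(math.cos(x0**2))
print(abs(result - expected) < 1e-12)
```

Each recursive call to `d_dx(u, x)` plays the role of the d/dx (u) term that the expanded Leibniz form leaves on the page, so nothing about which derivative remains is ever implicit.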
