Using the Derivative class
In this tutorial, we will review how to use the Derivative class
to define derivatives methods in Optimizable objects. For some optimization problems, finite difference
derivatives are sufficient. However, for many problems, especially those with a large number of degrees of freedom,
finite difference derivatives are not practical, and analytic derivatives or automatic differentiation should be
used. The Derivative class smooths the process of implementing derivatives for
Optimizable objects with a large dependency graph by coordinating the computation of
partial derivatives with respect to the fixed or free degrees of freedom. The
Derivative class also facilities summing and scaling
partial derivatives objects, and provides a convenient way to build the full gradient of an objective
function with respect to all (free) degrees of freedom.
In this tutorial, we will
create a custom
Optimizableobject with aJ()method,implement the partial derivatives using the
Derivativeclass,implement the gradient, the
dJ()method, using thederivative_decdecorator,and check the derivatives using finite difference methods.
First, lets import our some modules.
import numpy as np
from simsopt._core import Optimizable
from simsopt.field import Current, coils_via_symmetries, BiotSavart
from simsopt.geo import create_equally_spaced_curves, SurfaceRZFourier
from scipy.optimize import approx_fprime
from simsopt._core.derivative import Derivative, derivative_dec
Lets start by defining some degrees of freedom. Similar to a Stage-II or single stage optimization,
our objective function will depend on a BiotSavart object and a Surface object.
# define surface
nfp = 2
surf = SurfaceRZFourier(nfp=nfp, stellsym=True, mpol=1, ntor=1)
surf.unfix_all()
surf.get('rc(0,0)')
surf.set('rc(0,0)', 1.0)
surf.set('rc(1,0)', 0.2)
surf.set('zs(1,0)', 0.2)
# coils
ncoils = 4
base_curves = create_equally_spaced_curves(ncoils, nfp, stellsym=True, R0=1.0,
R1=0.5, order=3)
base_currents = [Current(1.0) * 1e5 for i in range(ncoils)]
coils = coils_via_symmetries(base_curves, base_currents, nfp, stellsym=True)
bs = BiotSavart(coils)
Notice how the BiotSavart object does not have any DOFs of its own, it simply holds curves and currents. Instead, all of the DOFs of a BiotSavart object are held by the Curve and Current object themselves. A Surface object, on the other hand, has its own DOFs.
Now lets define an objective function and walk through the process
of computing derivatives. In SIMSOPT, the objective function is typically defined
as the J() method. Our objective function is
\(J = \frac{1}{2 n_\theta n_\phi} \sum_{ij} B_{ij}^2\)
where the sum is taken over quadrature nodes on the surface and B is the
field strenth of the BiotSavart field. While it is not strictly required that an
objective function is defined as the .J() method, it is a good convention to follow,
since summing and scaling Optimizables will automatically sum and scale the .J() method.
The partial_derivatives() method is where we will define the
partial derivatives of the objective with respect to the surface and BiotSavart objects. The
dJ function will use the derivative_dec to convert the partial derivatives
from partial_derivatives() into a full gradient of the objective with respect to all free degrees of
freedom (DOFS). It is convention to call the gradient method dJ() in SIMSOPT.
Read the documentation of the functions in the object to learn the specifics of constructing a derivative.
class MyCustomObjective(Optimizable):
def __init__(self, bs, surf):
self.bs = bs
self.surf = surf
Optimizable.__init__(self, depends_on=[bs, surf])
def J(self):
"""
The objective function.
"""
xyz = self.surf.gamma()
nphi, ntheta, _ = np.shape(xyz)
xyz = xyz.reshape((-1,3)) # (nphi * ntheta, 3)
self.bs.set_points(xyz)
B = self.bs.B()
J = 0.5 * np.sum(B**2) / (nphi * ntheta)
return J
def partial_derivatives(self):
"""
Compute the partial derivatives of the J function.
Calling a function with no derivative_dec decorator, like partial_derivatives(), will return
a Derivative object. We can then get the partial derivatives by doing, for instance,
`opt = MyCustomObjective(bs, surf)`
`partials = opt.partial_derivatives()`
`dJ_by_dsurf = partials(surf)`
`dJ_by_dbs = partials(bs)`
This function will return a Derivative object created from a dictionary.
The keys for the dictionary are each Optimizable object that is used in the
J function which 'owns' a DOF. For example, the surf object owns the Fourier
coefficients which describe the surface. So we will add
`derivs = {}`
`derivs[surf] = array of surface derivatives`
On the other hand, the BiotSavart object does not 'own' any DOFs, so, in general, we would not add
anything to the dictionary for the BiotSavart object. All of the DOFs in in the BiotSavart
object are owned by the curves and currents which make up the coils. So for each
coil in the BiotSavart object we do,
`derivs[coil.current] = array of current derivatives`
`derivs[coil.curve] = array of curve derivatives`
The values of the dictionary should be the actual derivative arrays associated to that
Optimizable object.
Conveniently, for a BiotSavart object ALL derivatives can be accessed through the
B_vjp(...)
method. The B_vjp() method returns a Derivative object containing the vector jacobian
product of the derivatives with another vector. So for our case, we will directly use B_vjp
to compute the derivatives we need.
Finally, we return a Derivative object, created from the dictionary.
`Derivative(derivs)`
"""
xyz = self.surf.gamma()
nphi, ntheta, _ = np.shape(xyz)
xyz = xyz.reshape((-1,3)) # (nphi * ntheta, 3)
self.bs.set_points(xyz)
B = self.bs.B() # (ntheta * nphi, 3)
# make a dictionary of derivatives
derivs = {}
"""
derivative with respect to surface dofs
dJ/dcoeff = 1/(ntheta * nphi) sum_i (dX/dcoeff)^T(dB/dX)^T B
"""
dB_by_dX = self.bs.dB_by_dX() # (ntheta * nphi, 3, 3)
dgamma_by_dsurf = self.surf.dgamma_by_dcoeff() # (ntheta, nphi, 3, n_surf_dof)
dgamma_by_dsurf = dgamma_by_dsurf.reshape((nphi * ntheta, 3, -1))
n_surf_dofs = np.shape((dgamma_by_dsurf))[-1]
dJ_by_dsurf = np.zeros(n_surf_dofs)
for ii in range(len(B)):
dJ_by_dsurf += dgamma_by_dsurf[ii].T @ (dB_by_dX[ii].T @ B[ii]) / (ntheta * nphi)
derivs[self.surf] = dJ_by_dsurf
dJ_by_dsurf = Derivative(derivs)
""" derivative with respect to ALL curve/current dofs """
dJ_by_dbs = self.bs.B_vjp(B / (nphi * ntheta)) # Derivative object
""" Derivative objects are summable. In the case that an Optimizable object, such
as a surface, exists in both Derivative objects, then the derivatives of the Optimizable
will be summed. If it only exists in one Derivative object, then the derivative of the sum
will be just the sole derivative value. For example,
# if deriv1 and deriv2 are Derivative objects with an Optimizable surface then,
(deriv1 + deriv2)[surface] = deriv1[surface] + deriv2[surface]
# if the surface only exists in deriv1 and not deriv2
(deriv1 + deriv2)[surface] = deriv1[surface]
Derivatives can also be multiplied. However, the multiplication rules differ slightly from
the addition rules.
"""
dJ_by_all = dJ_by_dsurf + dJ_by_dbs
return dJ_by_all
"""
We did not use the derivative decorator when constructing the dJ function.
The derivative decorator is optional, but has a key impact on functionality:
a function wrapped with the `derivative_dec` will return a gradient array
with respect to all dofs, as opposed to a Derivative object. This is useful for numerical
optimization purposes where the full gradient is used, rather than partial derivatives.
ex:
gradient = self.dJ()
"""
@derivative_dec
def dJ(self):
return self.partial_derivatives()
Evaluating the objective and gradient is easy. As described above, we have two methods for
compute derivatives: dJ and partial_derivatives. dJ computes
the gradient with respect to all free DOFS, while partial_derivatives computes partial derivatives
with respect to parent Optimizable objects.
obj = MyCustomObjective(bs=bs, surf=surf)
# objective value
print(obj.J())
# gradient w.r.t all free dofs
print(obj.dJ())
# partials
partials = obj.partial_derivatives()
dJ_by_dsurf = partials(surf)
dJ_by_dbs = partials(bs)
print(dJ_by_dsurf) # partial
print(dJ_by_dbs) # partial
We can check that the derivatives are correct using finite differences.
# check derivative w.r.t. surface dofs w/ finite difference
obj.unfix_all()
bs.fix_all()
x = obj.x
def fun(x):
surf.x = x
return obj.J()
dJ_by_dsurf_fd = approx_fprime(x, fun, epsilon=1e-7)
print('surf dof finite difference error', np.max(np.abs(dJ_by_dsurf_fd - dJ_by_dsurf)))
# check derivative w.r.t. coil dofs w/ finite difference
obj.unfix_all()
surf.fix_all()
x = obj.x
def fun(x):
bs.x = x
return obj.J()
dJ_by_dbs_fd = approx_fprime(x, fun, epsilon=1e-6)
print('coil dof finite difference error', np.max(np.abs(dJ_by_dbs_fd - dJ_by_dbs)))
This tutorial covered the basic functionality of the Derivative class, but don’t forget that there
are additional features that could be useful.
For example, if two optimizable objects are added together, the dJ() methods will automatically be summed.
e.g. obj1 + 5*obj2 will return a new Optimizable object with the dJ() method equal to obj1.dJ() + 5*obj2.dJ().
Furthermore, another Optimizable object dependent on this one could use this dJ() method as part of its own gradient computation,
similar to how we used the B_vjp() method from the BiotSavart object.
Derivatives are implemented all over SIMSOPT. For more examples, see the SquaredFlux class, or the
BoozerSurface class.