P-Fea Course. Lesson 5- Miscelleneous Implementation Issues


Answer to Homework problem For Lesson 4: “using equation (2) above, if the normalized error in an element that is currently at p2 is 0.5, and the error tolerance is 5 percent (0.05), what is the estimate of the required p order in the next solution pass. Comment on how aggressive the use of this estimate is compared to simply increasing the p order by 1 on each solution pass.

In equation 2, p1 is 2 (the current p order). The current error, ε1 is 0.5, and the target error ε2 is 0.05. So the required p order is p2 = 2(10) 1/2= 6.3, which is rounded down to 6 because the fraction is less than 1/2. I would say this is pretty aggressive, especially considering some adaptive algorithms simply increase the p order by 1 each iteration, while this algorithm is jumping from p 2 straight to p 6.


There are a few ways in which p-adaptivity is a little trickier to implement than conventional finite elements. Also, there are some issues about reading meshes that were specifically created for conventional elements and using them for higher order elements. And there are some issues about element quality. These are the subjects of this lesson.

Point vs distributed Loads and Constraints

It is common for finite element preprocessors to convert distributed loads and constraints into equivalent nodal ones. But they are only equivalent for conventional elements. I’ll describe that in detail below for loads. The other important issue is that point loads and constraints are actually singular. If they are not processed correctly, there would be localized stress concentrations at them., the code that detects singularities would be creating sacrificial elements. So the first step is to detect the intent. If all the nodes of a face are loaded or constrained the same way, that is assumed to be a face boundary condition, not nodal. If a series of faces have the same boundary condition, and there is not a significant slope discontinuity at the shared edge of adjacent faces, that is assumed to be a boundary condition on the underlyings surface. The distributed face constraints can be handled readily in Cartesian coordinates. In curvilinear coordinates, the easiest way to handle them is with a penalty method. I’ll discuss that next. Handling conversion of point loads to equivalent distributed tractions is below.

Constraints in Curvilinear Coordinates

In a conventional code a curvilinear constraint such as shown on the left can be converted into nodal constraints as on the right. So for a node a 45 degrees, The constraint in the R direction means no translation in the “x-prime” direction at 45 degrees. This won’t work for higher order functions, unless they are constrained to 0, but that would make the elements adjacent to the constraint unable to adapt. The mathematical values of the constraints on the higher order functions can be calculated, and in general for curvilinear coordinates end up being equations involving multiple functions. These can be processed directly, but that can slow down solvers considerably. That was our experience with Mechanica. An easy to implement approach is a penalty method, which essentially uses stiff springs in constrained directions. As long as the springs are stiff enough compared to the adjacent elements this is accurate.

This approach is used in stressRefine. There is the further advantage that the penalty can be added at the element level, which is done in SRelement::AddPenaltyToStiffnessandEnfDisp. The penalty is calibrated by assuming there is a small layer if thickness equal to (element size/Penalty), of the same material as the element, attached at the boundary. So, for example, if Penalty = 10000, the layer is 0.0001 times the element size. The size is estimated from the distance from the middle of the constrained face to the element centroid. The higher the penalty, the more accurately is the constraint enforced, but too high of a penalty degrades the conditioning of the element stiffness matrix. 10000 is a good compromise. Experience with stressRefine has shown that this approach is accurate and does not affect the speed of the equation solution.

Distributed Tractions vs. Point Loads

Many finite elements accept distributed tractions as inputs, for example the PLOAD4 card in Nastran. But preprocessors often don’t use them, instead converting the tractions to equivalent nodal loads. They are “energy consistent”, in that the point loads do the same virtual work on the boundary as the original tractions, as is discussed in any finite element textbook. But they are consistent only for the p order for which this calculation was done, such as 2 for a conventional quadratic element. After it has been determined, as discussed above, that a series of nodal loads on a surface actually originated from distributed tractions, the traction values are recovered by doing the virtual work equivalency in reverse. In stressRefine this is called “energy smoothing”, and is done in SRinput::EnergySmoothNodalForces.

Element Quality

A major advantage of higher order elements, properly formulated, is that they are relatively distortion-insensitive. The quality of elements from typical automeshers is often plenty good enough, as long as a mesh setting “curvature-based” or equivalent is turned on. There is on exception that was discussed here: automeshers sometimes produce poor quality or even invalid elements when they first produce linear elements, then project the midnodes to curved surfaces. Many finite element codes will simply fail when presented with these elements, giving a cryptic error code, and users are typically advised to refine the mesh. This can be frustrating for large models, when you have trouble getting them to mesh in the first place. It is possible to recover from this situation by moving the midnodes away from the surface. The problem is you are no longer solving the right physical model. For example if you started out with the geometry on the left and ended up solving the one on the right because a midnode got moved because of an invalid element, you’d end up introducing a singulariy that did not exist in the original model.

One thing that can be done is to note that is not necessary to flatten a curved face entirely to make the element valid. That is depicted below in 2D, There is a vector from the original midnode position and the position of the middle of the straight chord. The midnode can be moved in intermediate steps along that vector

This is done in stressRefine. When an invalid element is encountered (the determinant of the Jacobian of the mapping is <= 0), curved boundary faces are flattened in successive steps, to try to find the smallest amount of flattening necessary to fix the element. This is carried out before the elements stiffness routines are called, in SRmodel::checkElementMapping. This calls SRelement::testMapping for each element, which does the flattening if necessary. Elements whose curved faces have been significantly flattened are made “sacrificial” (they are kept at p2 and excluded from calculation of the maximum stress in the model).

Often this has little effect on the accuracy of the solution. The important factor is whether any element that has been significantly flattened is near the maximum stress in the model. This is checked after the solution in SRmodel::checkFlattenedElementHighStress. A warning is issued if this situation is encountered, and it is recommended that the user refine the mesh. This procedure has been able to “rescue” a significant amount of models for which Nastran and other FEA codes would “choke” on the mesh. And in practice it is not necessary to issue the warning very often.


Homework Problem: Consider the first figure with the constrained hole above. Suppose instead that this was a distributed traction in R. Now suppose the preprocessor replaced the traction with nodal loads. We are using higher order elements, so they are not energy-equivalent to the original traction. Explain physically why these nodal loads introduce singularities.

P-Fea Course. Lesson 4- Adaptivity and Error Estimation

Answer to homework problem for Lesson3-

No-, the element stiffness routines in the simplified version in SRlibSimple are not thread-safe. To be thread-safe, the same memory cannot be accessed from adjacent threads. That is true for the element routines in the SRelement class with one exception- they use scratch space “eldata” provided by Srmodel. There is only one copy of this space. In the “full” version of StressRefine, the elements can be run multi-threaded because copies of the scratch space are provided for each thread. The elements are called from Srmodel in routine CalculateElementStiffnessesOmp, which calls the element routines inside “#pragma omp parallel”. There is a call to omp_get_thread_num() which is used to determine which thread each element is in, that is passed in to each element’s stiffness routine. Using this the element chooses the correct copy of the scratch space. Omp, short for OpenMp, is an easy to use parallel processing approach used in stressRefine.

By the end of lesson 3, we have seen how to calculate basis functions for elements whose edges have arbitrary polynomial orders, and how to use those functions in element stiffness routines and stress calculations. To apply this to adaptive stress analysis, we need to more ingredients

  1. A means for estimating errors in the solution
  2. A way to use the errors to estimate required polynomial orders for the next solution in the iterative solution process.
  3. A way to determine if the process has converged

First let me review that there are to main approaches to p-refinement, uniform and adaptive, which were described in detail here. In addition, the refinement process can be controlled differently in various regions of the model, known as local adaptivity. We can, for example, refine more in a region surrounding a critical stress concentration, or in a single part of an assembly.

Uniform refinement is a subset of adaptive refinement, in which all elements are set to the same updated p-order. In the simplest implementation, the refinement can be done in a loop, increasing the p-order uniformly by one in each iteration. This discussion will be for the more general case of adaptive refinement. Errors are estimated for each element, for which the polynomial orders are increased separately. At shared faces the polynomial order is set to the higher value from adjacent elements. For example, if an element that remains at p2 shares a face with an element that will require p4 in the next solution, the edges of the share face are set to p4. As discussed previously, convergence is not mathematically guaranteed for p-adaptive refinement but it works well in practice and is considerably less computationally intensive than uniform refinement.

Error Estimation

There are two types of error estimates available: multi-pass and single-pass. In multi-pass, you can compare the values from the current solution to a previous solution. For example, how much did the stress change at a point in an element between this and the previous solution? Changes in other measures like strain energy or displacement can also be used. One several passes have been made, there are techniques to extrapolate the measures using the results from all passes [1].

In single-pass adaptivity, information from a single solution is used to estimate the error. For example, we know that in the exact solution, Stress (more precisely traction) should be continuous. The stress from adjacent elements can be used to calculate the traction at share faces. This should be the same, so the discrepancy between the traction from the adjacent elements is a good error estimator. Also, at free boundaries the traction should be zero, while at boundaries with distributed loading, the traction should equal the applied value. The discrepancy between the computed traction and the applied boundary value is another error estimate. Traction discrepancies is one of the two main error estimators used in stressRefine.

Another famous error estimator is the Zienkewicz-Zhu norm [2]. As I discussed here, smoothed stress results are compared to directly computed (“raw”) stress results. The discrepancy gives an error estimate. This is the second estimator used by stressRefine. The worst-case of the traction error estimate and the Zienkewicz-Zhu estimate is used to determine the required p-order for the next solution pass. A problem with the Zienkewicz-Zhu estimate is that is has been found to not always be reliable at p2 [2]. StressRefine starts ar p2, because that is the order equivalent to conventional quadratic finite elements used for solids. So a reliable estimator at p2 is important. Using the traction estimator in additon to the Zienkewicz-Zhu estimator has been found in practice to work well to alleviate this problem.

Traction estimator: Calculate the components of the stress tensor at a point on an element face. The traction is the dot product between the stress and the normal to the face at that point. This is repeated for the adjacent element sharing the face. The maximum “jump” (discrepancy between adjacent elements) in any of the 3 traction components at that point is used as the error estimate. At an unconstrained boundary, the maximum discrepancy for any component between the calculated traction and the applied load (or 0 for a free boundary), is used as the estimate. This estimate is calculated at multiple sampling points on a face, and the worst value of the error from all sample points is used.

Zienkewicz-Zhu estimator: Although the original Zienkewicz-Zhu estimator was for stress, smoothed vs. raw strains are used in stressRefine, because strain is continuous in models with multiple materials, while stress is not. A various sample points within an element, the strain is directly computed. This is compared with the smoothed strain at the same point, and the discrepancy is the error measure. The maximum error for any strain component, at any of the sample points, is used for the error estimate.

Normalizing the Error Estimate:

We want to use a percentage, or equivalently a normalized error for convergence and estimating required p-order. For traction, it seems logical to use the maximum stress in the element for which the error has been calculated. This can be too conservative, especially for elements with low stress. There can be slight noise in the solution which would be magnified if divided by the local low stress. A better value to use is the maximum stress in the model. The problem there is if the maximum stress is at a singularity, where it is physically meaningless. This is avoided in stressRefine by identifying singularities automatically and excluding them when calculating the max stress in the model. For the Zienkewicz-Zhu estimate, where the discrepancy is in strain, the maximum strain in the model is used (again ignoring singularities). For models with multiple materials, the maximum for all elements that have the same material as the current element is used. von Mises stress and strain are used for normalization.

Estimating require p order from current errors

If we define ε1 as the error estimated from the previous solution, and ε2 as the desired error in the next solution, the the ratio ε12 can be used to estimate the required p-order, p2, for the next solution. For this purpose stressRefine uses the conservative estimate from [3]:

ε21 ≅ (p1/p2)p1 (1)

This can be manipulated to yield

p2 ≅ p112) 1/p1 (2)

p1 and ε1 are known, and ε2 is set to the desired error tolerance. This is carried out in FindRequiredPOrder in the SRerrorCheck class in stressRefine (SRerrorCheck.cpp).

In many models, a significant percentage of the elements do not have the polynomial orders of any of their edges raised because their errors are low. So on subsequent solution passes, their stiffness matrices do not need to be computed. For elements that have had changes in their p-order, the element stiffness matrix from the previous solution is a subset of the stiffness matrix for the next solution, because of the use of hierarchical basis functions. So for these elements only the stiffness components associated with the new functions introduced by increasing the p-order need to be computed.

Sacrificial Elements

As mentioned above, it is important to exclude elements adjacent to singularities when calculating the maximum stress in the model. The stress at singularities is not physically meaningful because it is theoretically infinite, so these elements are also exclude from p-adaptivity. Various types of singularities, such as point loads and constraints and reentrant corners, are pictured and descrubed here. These are detected automatically in class SRerrorCheck in routine AutoSacrificialElements, and the elements that touch the singularities are flagged as sacrificial. They remain at p2 and are skipped when calculating max stress in the model.

Convergence: Single- and multi-pass

The adaptive algorithm can be terminated when the error estimates at all elements (which are the measures from a single analysis pass described above) are within a specified tolerance. In practice this can be overly conservative. An additional measure is how measures such as the max stress in the model change between the previous and current solution pass. This is often less conservative, so the minimum of multi-pass estimates and single-pass is used to estimate when the adaptivity algorithm is complete. It has also been found in practice that a total of 3 solutions, the initial one at p2 and two subsequent solutions with adapted p-orders, is enough to achieve accurate stresses for the vast majority of models, so this is the default in stressRefine. This default was used for all the cass in the validation manual.

Putting it all together, the adaptivity algorithm looks like this:

This is carried out in class SRmodel in routine “run” in the full version of stressRefine. In that version SRmodel also includes analysis features. When I split stressRefine into library and executable branches, model entities like nodes,edges, faces, and elements were kept in SRmodel in the library, while analysis responsibilities were moved to a new class SRanalysis in the executable. So in that version the adaptive analysis routine “run” is in class SRanalysis.

Lessons 1 through 4 have provided all the ingredients for understanding how p-adaptivity is implemented. In future lessons I’ll go into more detail on some nuances of applying p-adaptivity using conventional finite element meshes. I’ll also cover how breakout regions can automatically be identified.

Homework problem: using equation (2) above, if the normalized error in an element that is currently at p2 is 0.5, and the error tolerance is 5 percent (0.05), what is the estimate of the required p order in the next solution pass. Comment on how aggressive the use of this estimate is compared to simply increasing the p order by 1 on each solution pass.


  1. Szabo, B, and Babusca, I, Finite Element Analysis, Wiley, NY, 1991
  2. Zienkiewicz, O, and Taylor, R, The Finite Element Method, Butterworth Heinemann, Oxford, 2000.
  3. Zienkiewicz, O, Zhu, J, and Gong, N, “Effective and Practical h-p-Version Adaptive Analysis Procedures for The Finite Element Method, Int J Num Meth Engr, 1989

P-Fea Course. Lesson 3- Converting Element Stiffness Routine to p-Adaptive

The stiffness matrix for a conventional isoparametric finite element takes the form:

Kel = ∫BTCBdV

where C is the element stiffness or constitutive matrix, and B expresses the strain-displacement relations. If the strain is stored as a 1D vector e = [ex,ey,ezxy,yxzyz]. and displacements as u = [ux,uy,uz], then

e = [B]u.

The strain-displacement matrix B has 3×6 submatrices for each basis function hj:

Evaluating B is the only difference between a conventional element routine and an element with p-adaptive functions. In a conventional routine, the shape functions are used for the mapping, and the same functions are used for the basis functions. In a p-adaptive element, different functions are used for mapping and displacement. In stressRefine, quadratic serendipity functions are used for the mapping, and higher order polynomials for the displacements.

where r,s,t are the natural coordinates in an element and J-1 is the inverse of the mapping from r,s,t at each integration point. The integral over the volume is also converted into an integral in natural coordinates using the determinant of the mapping |J|. J is computed from the shape functions. It is invertible to J-1 unless the element is invalid (e.g. too highly distorted).

Examining the code.

A simple implementation of a p-adaptive stiffness matrix is in the routine CalculateStiffnessMatrix in SRelement.cpp in the stressRefine library SRlibSimple.

The line

int nfun = globalFunctionNumbers.GetNum();

looks up the number of displacement basis functions in the element, which is calculated from the polynomial order of each edge of the element.

The integration points for the element are determined with the call to model.math.FillGaussPoints. The stressRefine library uses degenerated brick Gauss quadrature for tets and wedges. This could be made more efficient by using the triangular points developed by Cowper [1] for the r,s quadrature in wedges and the tetrahedral points developed by JinYun [2]. These do not go up to the higher polynomial orders needed by stressRefine, but could be used when the polynomial order is low enough, switching to the degenerated Gauss quadrature when needed.

There is a quadrature loop in the element matrices

for (gp = 0; gp < nint; gp++)

Inside that loop, the natural coordinates and the quadrature weight are determined in model.math.GetGP3d, and J-1 and |J| are calculated in FillMapping. An error is raised if |J| is too small. However, before the element routines are calculated, stressRefine tests the mapping of each element and attempts to recover by partially flattening curved elements as discussed previously.
The derivatives of the basis functions ∂hj/∂r, etc are calculated at each integration point with the call FillBasisFuncs.

After this the x,y and z derivatives of the basis functions can be calculated.

BTC is calculated in fillBTC, which accounts for the zeros in B and C.

The multiplication (BTC) times B is then calculated in FillKel33GenAnisoRowWise.

This returns a 3×3 submatrix kel33 of the element matrix which is stored in the appropriate location in the symmetcially-stored element stiffness matrix.

Converting an existing conventional element stiffness routine

Any stiffness routine will have a loop over the integration points, and loops for the rows and columns over the number of basis functions in the element.

A few modifications need to be made. This discussion assumes the setup functions for stressRefine have been called, and for each conventional element for which p-adaptivity will be used, a corresponding SRelement has been created, This was discussed in lesson 2. Then CountElementFunctions can be called in the library function in SRbasis to determine the number of element functions. FillGaussPoints in model.math will calculate and store the integration points and return the number. The numerical integration in the element must be modified to use the increased number appropriate to the number of basis functions corresponding to the polynomial order of the element.

The derivatives of the higher-order polynomial basis functions ∂hj/∂r, etc must be used instead of the shape functions. These can be calculated directly using ElementBasisFuncs in SRbasis. Everything else in the element routine is unchanged.

Stress Recovery

The stresses in an element can be directly computed from

σ = [D]ε = = [D][B]u once the displacements u are known.

This is computed the same way as for a conventional element, except, as for the case with the element stiffness matrix, the displacement gradients in [B] are calculated using the higher-order polynomial basis functions, not the conventional shape functions.

Adapting the polynomial order

Between this and the previous lesson, we can calculate the stiffness matrix for an element given the polynomial orders of each of its edges. In the next lesson we’ll cover adaptivity: after determining the solution for the current polynomial order, errors are calculated, and used to estimate the required polynomial order to achieve the desired accuracy.

Homework problem: Is the element routine CalculateStiffnessMatrix “thread-safe” if we want to calculate the elements in parallel?


  1. Cowper, G, “Gaussian Quadrature Formulae For Triangles”, Int J Num Meth Engr, 1973
  2. Jinyun, Yu, “Symmetric Gaussian Quadrature Formulae For Tetrahedral Regions”, Computer Meth. Appl. Mech. Engr, 1984.

P-Fea Course. Lesson 2- Basis Function Continuity

This course will continuously refer to [1]. That book is highly recommended as a reference. Detailed handwritten notes of important derivations for stressRefine are also available here.

For conventional finite elements the basis functions used to represent displacement are continuous at interfaces between elements as long as the adjacent elements have the same nodes at the interface. These functions are often called “shape functions” because they’re the same functions used to represent the element mapping from natural coordinates to physical coordinates, which determines the element’s physical shape. Here will make a distinction. “Basis functions” is used to describe functions representing displacement, while “mapping functions” (interchangeable with “shape functions”) are used to describe functions representing the element mapping.

Before I get to how continuity of basis functions, let me describe a slight tweak I made in stressRefine to the classic heirarchical basis functions developed by Szabo and Babusca [1].

This image has an empty alt attribute; its file name is image-11.png

Variation used in stressRefine:

In the classic functions the only nodes are at the element corners and the edge functions start at polynomial order 2. In the stressRefine functions, the nodal functions are the same as the conventional finite element quadratic functions, which requires the introduction of mid-edge nodes. The edge functions start at polynomial order 3. The advantage of this change is that if the elements is a polynomial order 2, its is identical to a conventional quadratic isoparametric element. A possible disadvantage is if this would degrade the element stiffness conditioning. (The element stiffness matrices are assembled into a linear system Ku = f. If K is not well-conditioned, slight errors in f can cause large errors in u). But I was able to prove numerically that the element stiffness matrices are as well conditioned with the modified functions as with the classic ones, at least as high as polynomial order 8.

So for the stressRefine functions, continuity across element interfaces is assured if the adjacent elements have the same nodes, up until polynomial order 2. Additional work is needed to assure continuity at shared edges and faces for polynomial orders higher than 2.

To show why this is so for the case of edges, consider the odd-numbered polynomial functions for polynomial orders 3,5,7, … In the figure shown (for a 2D mesh), the upper element has the shared edge defined between corner 1 and 2 based on the local element numbering, so it, while the lower elements has the shared edge defined between corners 2 and 3, so the edge runs backwards from the other element. This means that the local definition of the p3 (short for polynomial order 3) function is different, and for the lower element that function has to be multiplied by -1 to assure continuity.

The situation is more complicated for faces. Consider an exploded view of two adjacent elements that share a face with global nodes 10, 11, 12:

But for the left element these correspond to local nodes 1,2,3 while on the right element they correspond to local nodes 3,1,2. So for the two elements the natural coordinates on the face rf, sf are defined differently, and the basis functions are misoriented on the right face with respect to the left.

This brings up an interesting historical aside about Mechanica. My colleague and cofounder Christos Katsis handled everything related to coding up the basis functions. He introduced the concept that there is a global face, and local faces can be rotated (or even reflected) compared to it. There is then a coordinate transformation to make sure the local face is compatible with the global face. It is further complicated by the fact that we need to relate elements natural coordinates r,s,t to face natural coordinates rf, sf. So there was a 3×2 transformation relating r,s,t to the local face, and a second 2×2 transformation relating the local face to the global face. None of this is computationally intensive, but it was tricky to code. Christos took care of all of this while we were in full-blown early start-up mode, with tons of distractions. A major feat of concentration!

When I started to develop stressRefine I was not looking forward to the equivalent of all of that, but fortunately found there’s an easier way to handle it. There was a hint in Szabo and Babusca’s textbook [1], when describing implementation of the hierarchical basis functions. When discussing the 2D basis functions for a triangle, instead of referring to local coordinates r and s they used l2 – l1 and  2l3-1 where l1,l2, and l3 are the area coordinates of the face. From the definition of the area coordinates, r = l2 – l1 and s = 2l3-1. I realized that they had done this because it guarantees continuity of basis functions for elements sharing the same face as long as the nodes of the face are numbered consistently. So we just need to introduce the concept of how the local node numbering of the face relates to the node numbering of the global face.

The global face is defined by the first element that owns that faces. So in the example above, the left hand element is encountered first, and the global node numbering of the face is local node 1 = global node 10, local node 2 = global node 11, and local node 3 = global node 12. Now when we encounter the face again in the right hand element, the local nodes of the face are 3,1,2. When we work with the basis functions of the face, compatibility is assure if we use rf =l1– l3 and sf = 2l3-1 when referring to the face for the second element. This is for 2D, It turns out in generalizes to 3D if we use the same idea with volume coordinates instead of area coordinate. This also takes care of relating element r,s,t to face rf, sf.

I’ll show how this is coded up below. In short, we introduce a variable gno for “global node order”, so for the example face on the right hand element gno(1) = 3, gno(2) = 1, and gno(3) = 2.

All of this works for quadrilaterals in 2D, and bricks and wedges in 3D as well, But for bricks, for example, we define rf, sf using the linear mapping functions of the brick N1 through N8 instead of volume coordinates.

Details of Basis functions for a tet

Nodal functions: These have the value 1 at one node (corner or midedge) and 0 at all others, so they are the same as the quadratic shape functions for a 10 noded tet [2]. for example N1 = (2L1 -1)L1 is a typical function for a corner node, N5 = 4L1L2 is a typical function for a midedge node.

The edge, face, and volume functions are the same as described in [1] except the edge functions start at p3.

Edge Functions: These are nonzero on one element edge and zero on all others. They start at p3 because the quadratic edge functions are the same as nodal midedge functions. They are the integrals of Legendre polynomials shown here blended with volume coordinates so they are zeroes on all other edges:

bn = 4LILJφp(re)/(1-re2) where φp is the integral of Legendre polynomial over the edge for polynomial number p, re = LJ – LI is the natural coordinate on the edge, and n is the basis function number of the element. The term LILJ with the volume functions assures this reduces to the nonzero 1D function φp on the edge with corners I, J, and is 0 on all other edges, and zero on all faces except those with corner I,J.

Face Functions: These are nonzero interior to one face, zero on all edges, and blend to 0 on all other faces.

bn = 4LILJLkPpr(rf)Pps(sf) where Ppr is the Legendre polynomial in the rf direction and pr is the polynomial number in that direction, Pps is the Legendre polynomial in the sf direction and ps is the polynomial number in that direction, rf = LJ – LI and Sf = 2Lk-1 are the natural coordinates on the face, and n is the basis function number of the element The term LILJLk with the volume functions assures this is nonzero interior to the face with corners I, J,K, and is 0 on all edges,and all other faces,

Volume functions: These are 0 on all edges and faces, and nonzero in the interior to the element.

bn = LILJLkLlPpr(r)Pps(s)Ppt(t) where Ppr is the Legendre polynomial in the r direction and pr is the polynomial number in that direction, and similarly for s,t. The volume functions assure that these are 0 on all edges and faces.

Wedges and Bricks

Similar techniques are used to blend the 1D functions wedges, and bricks, For calculating element stiffnesses, we need the derivatives of the basis functions with respect to the element natural coordinates r,s,t. This is a little messy but not difficult to code once you have the formulae derived. The basis functions and their derivatives for tets, bricks, and wedges are described in my notes here. Note that we do not have to actually evaluate the integrals of Legendre polynomials, there are regression formulae for computing them described in [1].

Use of “global node order”: I described assuring continuity of the basis functions above. For edges we just need to correct if the local edge runs backwards from the global edge. A direction variable, which is +/- 1 is assigned to each edge. The basis function is multiplied by “direction” to assure continuity (the same thing could be achieved by switching I and J in the edge basis function if the edge runs backwards, but the direction flag is easier. For the face functions of a triangular face we need to use the “global node order” described. So in the face functions shown above, I,J,K would normally be the element local corner number associated with local corner numbers 1,2,3 of the face. Instead we use I =gno(1), J = gno(2), and K = gno(3). In the example above, for the face on the right hand element shared by the two elements in the exploded view, gno(1) = 3, gno(2) = 1, and gno(3) = 2. Note that the discussion here is “1-based” but the code is in c++ so in the code is is “0-based”, e.g. gno(1) is really gno[0] in the code.

This may all seem messy but it is only necessary if you dive deep into the bowels of the basis function routines, such as TetBasisFuncs is file basis.cpp in SRLibSimple. To actually use the library, two setup actions need to be performed:

Create Global Edges For all elements in your code, create corresponding SRelements. This is done by first allocating space for them:


where nel is the number of elements. Then loop over your elements, and, for each, call

model.CreateElem(id, userid, nnodes, nodes[], mat)

where nnodes is the number of corner and midnodes in the element, e,g, 10 for a quadratic tet, nodes is the vector of node numbers for the element (corners first followed by midedges), and “mat” contains the element material properties. This are all be described in a programmers guide I uploaded here. The global edges automatically get created on the fly, with the “direction” flag assigned.

Create Global Faces:

After nodes and elements have been defined, just call


This creates global faces for all element faces in the model, and assigns the global node order variable. It also determines which elements own each global face, and which faces are boundary faces (only have one owner).

Subsequently the basis function routines can be called as needs and continuity will be assured.

Homework problem: The volume coordinates for a tetrahedron have the property that they are 1 at a single node, and 0 at the opposite face. For example L1 is 1 at node 1, and 0 on face 2,3,4. Given that property, explain why the blending works properly for the edge, face, and volume functions: For edge functions, the functions are nonzero on the edge, and zero on all other edges and all faces that don’t touch the edge. For face functions, the functions are nonzero on the face, 0 on all other faces. For volume functions, the functions are only nonzero in the interior of the volume.


  1. Szabo, B, and Babuska, I, Finite Element Analysis, Wiley, NY, 1991

P-Fea Course: Introduction

Take this course to learn what’s “under the hood”

This course is intended to teach students and developers who want to learn how to write a p-adaptive code using the stressRefine library. It will start with the theory behind p-adaptivity, then take you through the practical aspects of creating p-adaptive elements using the library, and writing analysis routines that do p-adaptive analysis. It will continue on to show how to do analysis of local regions. This will all be presented first with simplified elements and solution assembly routines, then I’ll cover some code optimization and parallelization.

Having taken at least one finite element course will be assumed as a prerequisite. Familiarity with basic numerical methods concepts will be assumed, but concepts like efficient equation solution will be covered.

Suggested Reading:

Szabo, B, and Babuska, I, Finite Element Analysis, Wiley, NY, 1991. Superb text for understanding the implementation of heirarchical finite elements

Zienkiewicz, O, and Taylor, R, Finite Element Method. Volume 1- The Basis, Butterworth, 2000. Excellent introduction to Fea in general, plus error estimation and adaptivity concepts.

Lesson 1- FEA Displacement Approximations And Convergence

Approximating Functions

Finite Element Analysis uses an integral expression over the entire model being solved, rather than solving the governing equations (such as the equilibrium equations in elasticity) point by point throughout a model.  A series of functions is used to approximate the unknowns, such as the displacements u,v,w in the x,y,z directions in structural analysis:

u = ∑i = 1 to N aifi(x,y,z)   v = ∑i = 1 to N bigi(x,y,z) w = ∑i = 1 to N cihi(x,y,z)

These functions have to satisfy some conditions such as continuity, and displacement boundary conditions (for example if you have an immovable support, the displacements must be zero there).

The most famous use of this approach is the Ritz method, also known as Rayleigh-Ritz. A series of unknown functions like those in the equation above is inserted in an integral expression for the potential energy (strain energy plus work done on the boundary). If the energy is E, then the minimum potential energy is found by setting the derivatives of E with respect to the unknown coefficients to zero, for example

∂E/∂ai = 0 for I = 1 to N,

for the displacements in the x direction (u), and similar expressions for v and w. This results in a linear system of equations that is solved for the unknown coefficients.

Consider a one-D example, solving for the deflection of a beam of length 100. Now u represents the lateral deflection of the beam, and we approximate it with some functions:

u = ∑i = 1 to N aifi(x)

A popular choice is fi(x) = sin(i*π*x/L) where x is the length of the beam. Since L = 100, here’s what the first few functions look like:

These functions are not able to produce a nonzero value for displacement on the boundary. So if there’s an enforced displacement, we could introduce a linear function that’s nonzero on the boundary or a local function that’s nonzero on the boundary and decays rapidly into the interior, like the blue or orange functions shown:

These boundary functions can be used in addition to the sin functions.

This technique works well with functions that are smooth and linearly independent, and often gives accurate results with a few functions. It also helps if the functions are orthogonal to each other or close to it. We want ∫fi(x)fj(x)dx over the body, in this case the length of the beam, to be close to 0 if i≠j and close to 1 if i = j, and that works out well for the sin functions.

To illustrate the concept of completeness, suppose we omitted f1 in the figure above. The series would be incomplete, and even if we used an infinite number of functions, it might not converge to an accurate answer. The typical functions used in conventional finite elements codes are complete, and will converge to an accurate answer as long as enough elements are used (fine enough mesh). In a p-type finite element code, the mesh can be kept fixed, and an accurate answer will results as long as the polynomial order is high enough.

Now suppose we wanted to solve a 1D problem where the exact solution for displacement looks like this:

We could use a lot of the sin functions shown above, plus add in the boundary functions. But that’s a lot easier to do in 1D, and in 2D for simpler shapes like rectangles. It gets a lot harder for complex 3D shapes to come up with functions that can satisfy the boundary conditions and be continuous. One way to get around this is to use functions that are piecewise continuous. If we subdivide the line in the figure above into multiple segments, we can represent the displacement in each segment with more simple functions like polynomials. For example we can use linear functions or quadratic lagrange polynomials:

These functions have the property that they nonzero at a node (such as at the ends of the segment) and zero at all other nodes. Continuity is assured if the displacement is the same at nodes shared by adjacent segments. The segments are the “elements” in this 1D example. Now to follow the very wiggly displacement above well, we’d need a lot of the linear elements, Fewer of the quadratic elements would be needed. We can take this further by introducing higher order elements:

These are the 1D versions of the basis functions proposed by professors Szabo and Babusca [1]. In this example we could get by with only one of these segments if we went to high enough polynomial order. But there is a limit to how high the polynomial order can go in practice, because using very high polynomial order is computationally intensive. This means in more complicated problems we’d still need multiple elements. So there are two basis approaches to adaptivity: If we refer to the length of a segment as “h”, the basic idea with simpler elements, like the linear and quadratic ones above, is to achieve accuracy by keeping the type of element fixed (for example it remains quadratic), and to make “h” smaller by using more elements, This is h-adaptivity, which is achieved by adaptive mesh refinement. An alternative is to use higher order elements where the polynomial order can be raised. Keeping the number of elements fixed, we successively increase the p-order to achieve accuracy. This is p-adaptivity. It has the advantage that it can be achieved entirely inside the finite element analysis code, while h-adaptivity requires collaboration between the analysis code and the automesher. The disadvantage of the p-adaptive approach is that the elements are more complicated. This is especially true if you have a library of existing elements that would have to be converted to p-elements. The purpose of the stressRefine library is to provide tools to make such conversion easier, and in this course we’ll go into detail about showing how to use the library for that purpose.

The piecewise continuity approach is extended to 3D by using 3D elements. These are typically simple shapes like bricks (hexahedra), wedges (pentahedra), or “tets” (tetrahedra). The conventional h version of quadratic elements looks like this:

These a called “serendipity elements” which are based on a variation of the 1d Lagrange polynomials shown above. This is simplest to explain for the brick (c). Lagrange polynomials can be used in x,y, and z, and multiplied together. But this would require there to be additional nodes on the middle of the faces of the element, as well as the interior of the volume. It turns out the functions can be modified so that such nodes are not needed, which simplify the work of meshing, and good accuracy is still achieve. I’ll omit the details of that because my focus is on p-adaptivity, but it’s explained well in [2], and there’s a discussion online here. Until now I haven’t discussed mapping of elements. The basic shape shown above must be capable of being distorted into physical space, shown here in 2D:

The “natural coordinates” of the original elements are r,s,t. These need to be mapped to the physical coordinates x,y,z. To do this, we assume the position anywhere in the element is described by approximating functions called “shape functions”:

x = ∑i = 1 to N aifi(r,s,t)   y = ∑i = 1 to N bigi(r,s,t) z = ∑i = 1 to N cihi(r,s,t)

You can use more or less functions to represent the shape than the displacements. This means that more or less coefficients, or “parameter”, are used. If less are used for mapping than displacement, the element is “subparametric”. If the same number are used, the element is “isoparametric”, and if more are used for mapping the element is “superparametric”. Superparametric elements are not the best idea because they can have trouble representing rigid body motion. Isoparametric elements are the most commonly used conventional finite elements. The elements in stressRefine start out isoparametric at polynomial order 2, then become subparametric. I’ll go into more detail about that in a future lesson.

An obvious choice for the shape functions is to use the same functions used for the displacements, so not only as the same number of functions used, but the same functions. This is typically what is meant by a conventional isoparametric element.

Higher order polynomial elements are created by blending the 1D polynomials shown above in 2 and 3 dimensions. Here is what that looks like for the version created by Szabo and Babusca [1]:

Figure 1

I’ve discussed these functions in detail in a previous post. I discussed there how the number of each type of function must be chosen for completeness. Continuity requirements for these elements are trickier than for conventional elements. For conventional elements, continuity is assured as long as the element have the same nodes on shared faces and elements. An obvious violation of that would occur if a linear element were placed next to a quadratic one. If the displacement cases the quadratic element’s edge to become curved, the edge of the linear element cannot follow, so a crack opens up that is not present in the physical model:

The same issue occurs for an incomptibale mesh, which can occur, for example, at a part boundary in an assembly. There are methods to resolve this such as “glued contact” constraints that will be covered in a future lesson.

For p-elements, it is not enough to assure the nodes, which are only at the corners of the element, are compatible. If the adjacent elements are at different polynomial orders, discontinuity occurs at a shared edge. But they can be discontinuous at a shared edge or face even at the same p-order if you’re not careful in the definition of edges and faces, as discussed previously. Assuring continuity when using higher order polynomial functions will be covered in detail in the next lesson, where for the first time we’ll take a peak at some code.

Comments on the Ritz Method

Minimizing potential energy, as is done in the Ritz method, is only valid for linear problems and a subset of nonlinear problems for which a potential energy function exists. But there are more general alternatives, such as the Galerkin method that work in general nonlinear problems. The idea is the same, approximating functions are introduced into a global equation integrated over the body, and you come up with a set of equations to determine the unknown coefficients. However, if the governing equations are nonlinear and/or time dependent, then so will the equations for the coefficients. This requires “time marching” numerical techniques and solution methods like modfied Newton-Raphson. The Galerkin method is also readily generalized to other physical systems.

Up into the 1950s, before the advent of finite elements, a great deal of effort was put into coming up with functions that would work for more complex shapes. In the US, the National Advisory Committee for Aeronautics (NACA), predecessor to NASA, published a lot of solutions obtained this way for shapes useful in aeronautics. One of my professors at Stanford, Jean Mayers, was one of the experts who had contributed in that era. This seems to be all obsolete now with the advent of finite elements. But there is the sticky point that automeshers do not always succeed in creating the desired mesh, so it’s intriguing to revisit a meshless approach more similar to the Ritz method. That is exactly what was done with the external method in SimSolid. Implementing this requires sophisticated techniques to assure continuity at part boundaries, which is similar to the issue for incompatible meshes, which we’ll touch upon in a future lesson.


We’ve covered approaches to come up with approximation functions for displacements to be able to represent arbitrary motion in complex domains. We’re not done with that because the major issue of assuring continuity for higher order elements remains. That will be the subject of lesson 2. Once we have approximating functions, how to we choose how many are needed to achieve desired accuracy? We need a way to estimate errors in the current solution. Then, from the error-estimate, a way to figure out how many functions are needed. For h-adaptivity, we use that to estimate required element size, while in the p-method we use is to estimate require polynomial order. That is the subject of lesson 3.

Homework problem: The volume functions for hierarchical elements are 0 at all element nodes, edges, and faces. Do they play any role in continuity? If not, why are they needed?


  1. Szabo, B, and Babuska, I, Finite Element Analysis, Wiley, NY, 1991.
  2. Zienkiewicz, O, and Taylor, R, Finite Element Method. Volume 1- The Basis, Butterworth, 2000.