I am a Mechanical Engineer born in 1953, Ph. D, Stanford, 1980. I have been active in the mechanical CAE field for decades. I also have a lifelong interest in outdoor activities and fitness. I have had both hips replaced and a heart valve replacement due to a genetic condition. This blog chronicles my adventures in staying active despite these bumps in the road.
Global “wrapper” functions have been added so the StressRefine Library can now be called from C or Fortran as well as C++. A Guide for using the library has been added on github: “Using the StressRefine Library Template.pdf”. The changes are in the SRlibSimpleProj repository: https://github.com/StressRefine.
Writing the guide made it clear that the steps to use the stressRefine library to add p-adaptivity to an existing code are more involved than I would have liked. The library does the “heavy lifting”:
creating element edges and faces and ensuring basis function continuity
calculating basis functions and their derivatives for arbitrary polynomial orders
calculating solution errors and estimated required polynomial orders
This functionality can be integrated with your own element and stress routines, or the built-in element stiffness and stress routines for tets, bricks, and wedges in the library can be used.
There is non-trivial effort to hook up to all of this, but I tried to make it as straightforward as possible.
Zooming in closely to any of these nodal loads, they look like a point load applied to a boundary. This has finite load applied at a point, which has zero area, so the applied pressure is infinite. This is like the well-known Boussinesq problem in elasticity. Higher order elements will attempt to solve this singular problem.
In many stress analysis problems the maximum stress in the model is concentrated in a localized region, such as near a fillet, hole, or notch. It is much more economical to “break out” a local region for adaptive analysis rather than reanalyze the entire model. I’ve reviewed the theory behind this previously.
The first step for breakout analysis is determining the origin of the region of interest. This is usually the point of maximum stress in the model but can also be a user-specified point. Starting from that point, the next step is to identify how many elements are needed to retain in the breakout model to get an accurate answer from the adaptive stress analysis. The last step is to apply the displacements from the original solution as constraints at the interface between the break out model and the full model.
The intermediate step of determining how many elements to retain is the trickiest. Ideally, at the interface between the breakout model and the full model, the stress should not be varying too rapidly, and should be much lower than the maximum stress. The elements in the model are sorted in ascending order of distance from their centroids to the origin point. This sorting is very fast if done with the quicksort (“qsort”) algorithm. A minimum number of elements is retained. Originally, additional elements were retained until the stress was low enough compared to the maximum stress or the stress gradient was low enough. Experience showed that both of these tests can often give “false positives” and cause an insufficient number of elements to be retained. Testing also showed that an accurate results is obtained if several thousand elements are retained in the breakout model. So now a very simple approach is used: after the sorting, the first several thousand elements are retained in the breakout model. The default is 5000. A model of this size can be adaptively solved very quickly, typically in a minute or less. This simple algorithm is carried out in postprocess.autoBreakoutSphere.
I refer to this approach of extracting a breakout model from a single large model as “ragged breakout” because the interface between the breakout model and the full model is not smooth. An example of this is shown in the figure below. This turns out not to be problematic, as long as all elements that touch the interface are marked as sacrificial.
Breakout by part from Assemblies
A different approach is used when analyzing assemblies. Here the breakout model is usually a single part. Working with codes that have application-programming interfaces, it is straightforward to determine which part the point of maximum stress is in. Alternatively, the user can select a part to use for the local region. The api can then provide the list of elements that belong to that part. This was done for the interface between SOLIDWORKS Simulation and stressRefine
This information is not available if working with an input file such as the Nastran bulk data file. However, the elements that belong to the part in which the point of maximum stress resides can be determined another way. In Nastran models, the interface conditions between adjacent parts in an assembly are often specified by “glued contacts”. These are “bsurfs” in Nastran, but other codes have a similar interface condition. The boundary of a part consists of element faces that only belong to a single element. These faces are either free (unloaded and unconstrained), loaded, constrained, or at the interface with another part, which means they have “bsurfs” on them. So for the purposes of identifying the part boundary, we just have to look for the faces owned by only one elements, or the faces that have bsurfs. In Nastran models, the faces with bsurfs should only have one element owner also, but checking for the bsurf is still necessary to determine the interface constraints to apply to the model, described below.
The mesh for the part is found by starting at an element in which the point of max stress is found, then searching the mesh topologically: All faces of the starting element are searched. If any are shared with adjacent elements, those elements then have their faces searched. This proceeds until no more faces shared by more than one elements are found. It is easiest to program this recursively, but that might lead to a large number of elements on the stack, so it is done in a loop, with candidate elements added to a list. This is carried out in stressRefine in postprocess.topoFilterSaveBreakoutElems. Here is an example:
Applying interface Conditions
Element faces that are at the interface between the breakout model and the full model have to be identified. The displacements from the FEA solution to the full model are applied as constraints on the nodes of these faces. For a “ragged” breakout model, these are faces that are only owned by one element in the breakout model but are owned by two in the full model. This is carried out in model.FindElemsAdjacentToBreakout. For breakout models that are parts from an assembly, the faces that need the interface condition are those with bsurfs.
Breakout Extraction Architecture
Currently extraction of breakout models is performed by the full stressRefine executable as a special type of run. This executable currently only works on Windows, as discussed previously. It requires the Intel Mkl pardiso solver which is not linking properly to stressRefine on Linux.
For that reason I am going to make a special executable that only does the breakout extraction and does not require an equation solver. This will work on both Linux and Windows. This will have an additional use. Breakout models can be extracted with this executable and then adaptively analyzed with a different program such as Sparselizard by simply providing a translator.
This concludes this PFEA short course. I hope it has been helpful in illustrating some of the issues involved in performing p-adaptive stress analysis, and how they have been handled in stressRefine.
I ran into an issue trying to hook the stressRefine to another finite element code, which is written in c. Global “wrapper” functions which and be called from c are required. I am busy creating these, and after testing them they should be available in a few days and I’ll add them to the source. I’ll also provide the same functions callable from Fortran (using standard Fortran-to-c conventions).
It also became apparent that while the PFEA course has tried to explain conceptually what is needed to hook an external FEA code to the stressRefine library, detailed instructions are needed. I’ll provide them at the same time I provide the wrapper functions.
Another issue is that some FEA codes make use of iterative solvers. My experience in the past has shown that preconditioners for iterative solvers that work well with conventional elements may not lead to good convergence with higher-order p elements, because they result in smaller (fewer equations) but denser stiffness matrices. It is ok to try them with p elements but the iterative solver may not perform as well. Direct sparse solvers work fine with p elements.
I have modified the stressRefine source so it compiles on either windows or Linux. makefiles are included along with instructions in the readme.md file in the github repository (https://github.com/StressRefine).
Note: there is no linux version of SRui because it is written C#, which is windows-specifice. It would be relatively simple to recreate it in a portable language like Java, to which C# is quite similar. To run stressRefine under linux without a UI, there are instructions in readme.md in the SRUIProj repository.
The translator bdfTranslate and the stressRefine library SRlibSimple compile and run fine on Linux. I modified SRlibSimple so it now contains the same full functionality as the full stressRefine project.
The full stressRefine project “fullengine”, and the executable project “SRwithMklProj”, both compile and link fine on linux with the Intel mkl solver suppressed. However, I was unsuccessful at getting them to link with the Intel Mkl library. I followed the Linux instructions from Intel but the link failed. I gave a lot of details about this in the readme.md files of those two projects. If anyone figures this out please comment here or in the repository.
Answer to Homework problem For Lesson 4: “using equation (2) above, if the normalized error in an element that is currently at p2 is 0.5, and the error tolerance is 5 percent (0.05), what is the estimate of the required p order in the next solution pass. Comment on how aggressive the use of this estimate is compared to simply increasing the p order by 1 on each solution pass.“
In equation 2, p1 is 2 (the current p order). The current error, ε1 is 0.5, and the target error ε2 is 0.05. So the required p order isp2 = 2(10) 1/2= 6.3, which is rounded down to 6 because the fraction is less than 1/2. I would say this is pretty aggressive, especially considering some adaptive algorithms simply increase the p order by 1 each iteration, while this algorithm is jumping from p 2 straight to p 6.
There are a few ways in which p-adaptivity is a little trickier to implement than conventional finite elements. Also, there are some issues about reading meshes that were specifically created for conventional elements and using them for higher order elements. And there are some issues about element quality. These are the subjects of this lesson.
Point vs distributed Loads and Constraints
It is common for finite element preprocessors to convert distributed loads and constraints into equivalent nodal ones. But they are only equivalent for conventional elements. I’ll describe that in detail below for loads. The other important issue is that point loads and constraints are actually singular. If they are not processed correctly, there would be localized stress concentrations at them., the code that detects singularities would be creating sacrificial elements. So the first step is to detect the intent. If all the nodes of a face are loaded or constrained the same way, that is assumed to be a face boundary condition, not nodal. If a series of faces have the same boundary condition, and there is not a significant slope discontinuity at the shared edge of adjacent faces, that is assumed to be a boundary condition on the underlyings surface. The distributed face constraints can be handled readily in Cartesian coordinates. In curvilinear coordinates, the easiest way to handle them is with a penalty method. I’ll discuss that next. Handling conversion of point loads to equivalent distributed tractions is below.
Constraints in Curvilinear Coordinates
In a conventional code a curvilinear constraint such as shown on the left can be converted into nodal constraints as on the right. So for a node a 45 degrees, The constraint in the R direction means no translation in the “x-prime” direction at 45 degrees. This won’t work for higher order functions, unless they are constrained to 0, but that would make the elements adjacent to the constraint unable to adapt. The mathematical values of the constraints on the higher order functions can be calculated, and in general for curvilinear coordinates end up being equations involving multiple functions. These can be processed directly, but that can slow down solvers considerably. That was our experience with Mechanica. An easy to implement approach is a penalty method, which essentially uses stiff springs in constrained directions. As long as the springs are stiff enough compared to the adjacent elements this is accurate.
This approach is used in stressRefine. There is the further advantage that the penalty can be added at the element level, which is done in SRelement::AddPenaltyToStiffnessandEnfDisp. The penalty is calibrated by assuming there is a small layer if thickness equal to (element size/Penalty), of the same material as the element, attached at the boundary. So, for example, if Penalty = 10000, the layer is 0.0001 times the element size. The size is estimated from the distance from the middle of the constrained face to the element centroid. The higher the penalty, the more accurately is the constraint enforced, but too high of a penalty degrades the conditioning of the element stiffness matrix. 10000 is a good compromise. Experience with stressRefine has shown that this approach is accurate and does not affect the speed of the equation solution.
Distributed Tractions vs. Point Loads
Many finite elements accept distributed tractions as inputs, for example the PLOAD4 card in Nastran. But preprocessors often don’t use them, instead converting the tractions to equivalent nodal loads. They are “energy consistent”, in that the point loads do the same virtual work on the boundary as the original tractions, as is discussed in any finite element textbook. But they are consistent only for the p order for which this calculation was done, such as 2 for a conventional quadratic element. After it has been determined, as discussed above, that a series of nodal loads on a surface actually originated from distributed tractions, the traction values are recovered by doing the virtual work equivalency in reverse. In stressRefine this is called “energy smoothing”, and is done in SRinput::EnergySmoothNodalForces.
A major advantage of higher order elements, properly formulated, is that they are relatively distortion-insensitive. The quality of elements from typical automeshers is often plenty good enough, as long as a mesh setting “curvature-based” or equivalent is turned on. There is on exception that was discussed here: automeshers sometimes produce poor quality or even invalid elements when they first produce linear elements, then project the midnodes to curved surfaces. Many finite element codes will simply fail when presented with these elements, giving a cryptic error code, and users are typically advised to refine the mesh. This can be frustrating for large models, when you have trouble getting them to mesh in the first place. It is possible to recover from this situation by moving the midnodes away from the surface. The problem is you are no longer solving the right physical model. For example if you started out with the geometry on the left and ended up solving the one on the right because a midnode got moved because of an invalid element, you’d end up introducing a singulariy that did not exist in the original model.
One thing that can be done is to note that is not necessary to flatten a curved face entirely to make the element valid. That is depicted below in 2D, There is a vector from the original midnode position and the position of the middle of the straight chord. The midnode can be moved in intermediate steps along that vector
This is done in stressRefine. When an invalid element is encountered (the determinant of the Jacobian of the mapping is <= 0), curved boundary faces are flattened in successive steps, to try to find the smallest amount of flattening necessary to fix the element. This is carried out before the elements stiffness routines are called, in SRmodel::checkElementMapping. This calls SRelement::testMapping for each element, which does the flattening if necessary. Elements whose curved faces have been significantly flattened are made “sacrificial” (they are kept at p2 and excluded from calculation of the maximum stress in the model).
Often this has little effect on the accuracy of the solution. The important factor is whether any element that has been significantly flattened is near the maximum stress in the model. This is checked after the solution in SRmodel::checkFlattenedElementHighStress. A warning is issued if this situation is encountered, and it is recommended that the user refine the mesh. This procedure has been able to “rescue” a significant amount of models for which Nastran and other FEA codes would “choke” on the mesh. And in practice it is not necessary to issue the warning very often.
Homework Problem: Consider the first figure with the constrained hole above. Suppose instead that this was a distributed traction in R. Now suppose the preprocessor replaced the traction with nodal loads. We are using higher order elements, so they are not energy-equivalent to the original traction. Explain physically why these nodal loads introduce singularities.
I just added lesson 4 of the P-Fea course, on implementing error estimates and using them for p-adaptivity.
I am busily creating the Linux version of the source for stressRefine. I haven’t used Linux much, although I used Unix a lot in the past, so the old dog is dusting off some old skills as well as learning new tricks. I was able to recreate the stressRefine project on Linux using the Eclipse development environment, and I verified that the makefiles created by Eclipse work fine to “make all” the project from scratch at the command line, which is the goal. I got a lot of compiler warnings, however, because of code not compliant with Iso C++. These are mostly coming from old c-based string handling functions in SRstring.cpp. I’ve rewritten these, based on the C++ string library, to modernize them and make them more portable, which solves that problem. I had hoped to have the Linux version ready by today but I need to run all my regression tests on the rewritten version on both Windows and Linux, which should be done in a few days. Sorry for the delay, Linux users, I’m almost there.
No-, the element stiffness routines in the simplified version in SRlibSimple are not thread-safe. To be thread-safe, the same memory cannot be accessed from adjacent threads. That is true for the element routines in the SRelement class with one exception- they use scratch space “eldata” provided by Srmodel. There is only one copy of this space. In the “full” version of StressRefine, the elements can be run multi-threaded because copies of the scratch space are provided for each thread. The elements are called from Srmodel in routine CalculateElementStiffnessesOmp, which calls the element routines inside “#pragma omp parallel”. There is a call to omp_get_thread_num() which is used to determine which thread each element is in, that is passed in to each element’s stiffness routine. Using this the element chooses the correct copy of the scratch space. Omp, short for OpenMp, is an easy to use parallel processing approach used in stressRefine.
By the end of lesson 3, we have seen how to calculate basis functions for elements whose edges have arbitrary polynomial orders, and how to use those functions in element stiffness routines and stress calculations. To apply this to adaptive stress analysis, we need to more ingredients
A means for estimating errors in the solution
A way to use the errors to estimate required polynomial orders for the next solution in the iterative solution process.
A way to determine if the process has converged
First let me review that there are to main approaches to p-refinement, uniform and adaptive, which were described in detail here. In addition, the refinement process can be controlled differently in various regions of the model, known as local adaptivity. We can, for example, refine more in a region surrounding a critical stress concentration, or in a single part of an assembly.
Uniform refinement is a subset of adaptive refinement, in which all elements are set to the same updated p-order. In the simplest implementation, the refinement can be done in a loop, increasing the p-order uniformly by one in each iteration. This discussion will be for the more general case of adaptive refinement. Errors are estimated for each element, for which the polynomial orders are increased separately. At shared faces the polynomial order is set to the higher value from adjacent elements. For example, if an element that remains at p2 shares a face with an element that will require p4 in the next solution, the edges of the share face are set to p4. As discussed previously, convergence is not mathematically guaranteed for p-adaptive refinement but it works well in practice and is considerably less computationally intensive than uniform refinement.
There are two types of error estimates available: multi-pass and single-pass. In multi-pass, you can compare the values from the current solution to a previous solution. For example, how much did the stress change at a point in an element between this and the previous solution? Changes in other measures like strain energy or displacement can also be used. One several passes have been made, there are techniques to extrapolate the measures using the results from all passes .
In single-pass adaptivity, information from a single solution is used to estimate the error. For example, we know that in the exact solution, Stress (more precisely traction) should be continuous. The stress from adjacent elements can be used to calculate the traction at share faces. This should be the same, so the discrepancy between the traction from the adjacent elements is a good error estimator. Also, at free boundaries the traction should be zero, while at boundaries with distributed loading, the traction should equal the applied value. The discrepancy between the computed traction and the applied boundary value is another error estimate. Traction discrepancies is one of the two main error estimators used in stressRefine.
Another famous error estimator is the Zienkewicz-Zhu norm . As I discussed here, smoothed stress results are compared to directly computed (“raw”) stress results. The discrepancy gives an error estimate. This is the second estimator used by stressRefine. The worst-case of the traction error estimate and the Zienkewicz-Zhu estimate is used to determine the required p-order for the next solution pass. A problem with the Zienkewicz-Zhu estimate is that is has been found to not always be reliable at p2 . StressRefine starts ar p2, because that is the order equivalent to conventional quadratic finite elements used for solids. So a reliable estimator at p2 is important. Using the traction estimator in additon to the Zienkewicz-Zhu estimator has been found in practice to work well to alleviate this problem.
Traction estimator: Calculate the components of the stress tensor at a point on an element face. The traction is the dot product between the stress and the normal to the face at that point. This is repeated for the adjacent element sharing the face. The maximum “jump” (discrepancy between adjacent elements) in any of the 3 traction components at that point is used as the error estimate. At an unconstrained boundary, the maximum discrepancy for any component between the calculated traction and the applied load (or 0 for a free boundary), is used as the estimate. This estimate is calculated at multiple sampling points on a face, and the worst value of the error from all sample points is used.
Zienkewicz-Zhu estimator: Although the original Zienkewicz-Zhu estimator was for stress, smoothed vs. raw strains are used in stressRefine, because strain is continuous in models with multiple materials, while stress is not. A various sample points within an element, the strain is directly computed. This is compared with the smoothed strain at the same point, and the discrepancy is the error measure. The maximum error for any strain component, at any of the sample points, is used for the error estimate.
Normalizing the Error Estimate:
We want to use a percentage, or equivalently a normalized error for convergence and estimating required p-order. For traction, it seems logical to use the maximum stress in the element for which the error has been calculated. This can be too conservative, especially for elements with low stress. There can be slight noise in the solution which would be magnified if divided by the local low stress. A better value to use is the maximum stress in the model. The problem there is if the maximum stress is at a singularity, where it is physically meaningless. This is avoided in stressRefine by identifying singularities automatically and excluding them when calculating the max stress in the model. For the Zienkewicz-Zhu estimate, where the discrepancy is in strain, the maximum strain in the model is used (again ignoring singularities). For models with multiple materials, the maximum for all elements that have the same material as the current element is used. von Mises stress and strain are used for normalization.
Estimating require p order from current errors
If we define ε1 as the error estimated from the previous solution, and ε2 as the desired error in the next solution, the the ratio ε1/ε2 can be used to estimate the required p-order, p2, for the next solution. For this purpose stressRefine uses the conservative estimate from :
ε2/ε1 ≅ (p1/p2)p1 (1)
This can be manipulated to yield
p2 ≅ p1(ε1/ε2) 1/p1 (2)
p1 and ε1 are known, and ε2 is set to the desired error tolerance. This is carried out in FindRequiredPOrder in the SRerrorCheck class in stressRefine (SRerrorCheck.cpp).
In many models, a significant percentage of the elements do not have the polynomial orders of any of their edges raised because their errors are low. So on subsequent solution passes, their stiffness matrices do not need to be computed. For elements that have had changes in their p-order, the element stiffness matrix from the previous solution is a subset of the stiffness matrix for the next solution, because of the use of hierarchical basis functions. So for these elements only the stiffness components associated with the new functions introduced by increasing the p-order need to be computed.
As mentioned above, it is important to exclude elements adjacent to singularities when calculating the maximum stress in the model. The stress at singularities is not physically meaningful because it is theoretically infinite, so these elements are also exclude from p-adaptivity. Various types of singularities, such as point loads and constraints and reentrant corners, are pictured and descrubed here. These are detected automatically in class SRerrorCheck in routine AutoSacrificialElements, and the elements that touch the singularities are flagged as sacrificial. They remain at p2 and are skipped when calculating max stress in the model.
Convergence: Single- and multi-pass
The adaptive algorithm can be terminated when the error estimates at all elements (which are the measures from a single analysis pass described above) are within a specified tolerance. In practice this can be overly conservative. An additional measure is how measures such as the max stress in the model change between the previous and current solution pass. This is often less conservative, so the minimum of multi-pass estimates and single-pass is used to estimate when the adaptivity algorithm is complete. It has also been found in practice that a total of 3 solutions, the initial one at p2 and two subsequent solutions with adapted p-orders, is enough to achieve accurate stresses for the vast majority of models, so this is the default in stressRefine. This default was used for all the cass in the validation manual.
Putting it all together, the adaptivity algorithm looks like this:
This is carried out in class SRmodel in routine “run” in the full version of stressRefine. In that version SRmodel also includes analysis features. When I split stressRefine into library and executable branches, model entities like nodes,edges, faces, and elements were kept in SRmodel in the library, while analysis responsibilities were moved to a new class SRanalysis in the executable. So in that version the adaptive analysis routine “run” is in class SRanalysis.
Lessons 1 through 4 have provided all the ingredients for understanding how p-adaptivity is implemented. In future lessons I’ll go into more detail on some nuances of applying p-adaptivity using conventional finite element meshes. I’ll also cover how breakout regions can automatically be identified.
Homework problem: using equation (2) above, if the normalized error in an element that is currently at p2 is 0.5, and the error tolerance is 5 percent (0.05), what is the estimate of the required p order in the next solution pass. Comment on how aggressive the use of this estimate is compared to simply increasing the p order by 1 on each solution pass.
Szabo, B, and Babusca, I, Finite Element Analysis, Wiley, NY, 1991
Zienkiewicz, O, and Taylor, R, The Finite Element Method, Butterworth Heinemann, Oxford, 2000.
Zienkiewicz, O, Zhu, J, and Gong, N, “Effective and Practical h-p-Version Adaptive Analysis Procedures for The Finite Element Method, Int J Num Meth Engr, 1989