Abstract
Since its introduction in the 1980s, the IEEE-754 standard for floating-point arithmetic has ably served a wide range of scientists and engineers. Even today, the vast majority of numerical computations employ either IEEE single or IEEE double precision, typically one or the other exclusively in a single application. However, recent developments have revealed the need for a broader range of precision levels, and for a varying level of precision within a single application. There are clear performance advantages to a variable precision framework: faster processing, better cache utilization, lower memory usage, and reduced long-term data storage requirements. But effective usage of variable precision requires a more sophisticated mathematical framework, together with corresponding software tools and diagnostic facilities.
At the low end, the explosive rise of graphics, artificial intelligence, and machine learning has underscored the utility of reduced precision levels. Accordingly, an IEEE 16-bit "half" precision standard has been specified, with five exponent bits and ten mantissa bits. Many in the machine learning community are using the "bfloat16" format, which has eight exponent bits and seven mantissa bits. Hardware such as NVIDIA's tensor core units can take advantage of these formats to significantly increase processing rates.
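To make the two 16-bit layouts concrete, the following minimal Python sketch splits a value into its sign, exponent, and mantissa fields under each format. The helper names `half_fields` and `bfloat16_fields` are hypothetical, chosen only for this illustration. The sketch also assumes that bfloat16 is obtained by keeping the top 16 bits of the IEEE 32-bit single representation (simple truncation); actual hardware and libraries may round instead.

```python
import struct

def half_fields(x):
    """Split x, rounded to IEEE 16-bit half, into (sign, exponent, mantissa).

    Layout: 1 sign bit, 5 exponent bits, 10 mantissa bits.
    """
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

def bfloat16_fields(x):
    """Split x, truncated to bfloat16, into (sign, exponent, mantissa).

    Layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    Assumption: bfloat16 is taken as the top 16 bits of IEEE single,
    i.e. truncation rather than round-to-nearest.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    bits = bits32 >> 16
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F

if __name__ == "__main__":
    for value in (1.0, -2.5):
        print(value, half_fields(value), bfloat16_fields(value))
```

The exponent fields are stored with a bias (15 for half, 127 for bfloat16), so bfloat16 covers the same exponent range as IEEE single while carrying fewer mantissa bits than half.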
At the same time, researchers in the high-performance computing (HPC) field, in a drive to achieve exascale computing, are considering mixed-precision arithmetic, such as in iterative refinement calculations where initial iterations are performed using half or single precision. Along this line, recognizing that for many simulations much of the data stored in an IEEE 64-bit double precision variable has low information content, researchers are exploring the use of lossy floating-point compression, not only for I/O, but also for storing solution state variables.
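As a rough illustration of the iterative refinement idea, the sketch below solves a small linear system with a low-precision solver and then corrects the answer using residuals computed in double precision. It is a simplified sketch under stated assumptions, not a production method: single precision is simulated by rounding every intermediate result through `struct`, the names `to_single`, `solve_single`, and `refine` are hypothetical, and each correction step repeats the full elimination, whereas a practical code would factor the matrix once and reuse the factors.

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_single(a, b):
    """Gaussian elimination with partial pivoting, simulated in single precision."""
    n = len(b)
    # Augmented matrix [A | b], with every entry rounded to single.
    m = [[to_single(v) for v in row] + [to_single(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = to_single(m[i][k] / m[k][k])
            for j in range(k, n + 1):
                m[i][j] = to_single(m[i][j] - to_single(f * m[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n]
        for j in range(i + 1, n):
            s = to_single(s - to_single(m[i][j] * x[j]))
        x[i] = to_single(s / m[i][i])
    return x

def refine(a, b, iterations=5):
    """Low-precision solve, then corrections driven by double-precision residuals."""
    n = len(b)
    x = solve_single(a, b)
    for _ in range(iterations):
        # Residual r = b - A x, evaluated in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = solve_single(a, r)          # correction from the cheap solver
        x = [x[i] + d[i] for i in range(n)]
    return x

if __name__ == "__main__":
    a = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(solve_single(a, b))  # accurate only to roughly single precision
    print(refine(a, b))        # close to the exact solution (1/11, 7/11)
```

For a well-conditioned system such as this one, the bulk of the arithmetic stays in the cheaper format while the final answer approaches double-precision accuracy; the number of correction steps needed in general depends on the conditioning of the matrix.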
Exascale computing has also exposed the need for even greater precision than IEEE 64-bit double in some cases, because greatly magnified numerical sensitivities often mean that one can no longer be certain that results are numerically reliable. One remedy is to use IEEE 128-bit quad precision in selected portions of the computation; quad precision is now available via software in some compilers, notably the gfortran compiler. As a single example, researchers at Stanford have had remarkable success in using quad precision in multiscale linear programming applications in biology.
There has also been a rise in the usage of very high precision (hundreds or even thousands of digits). For example, numerous new results have been discovered by computing mathematical expressions to very high precision, and then using integer relation algorithms such as the "PSLQ" algorithm to recognize these numerical values in terms of simple mathematical formulas. Among the results that have been discovered in this fashion are new formulas connecting mathematical constants and the elucidation of polynomials connected to the Poisson potential function of mathematical physics (the latter requiring up to 64,000-digit precision). Such computations are most efficiently performed using a dynamically varying level of precision, doing as much computation as possible with standard precision and only invoking very high precision when necessary.
In summary, although the IEEE-754 floating-point standard has served the mathematical, scientific and engineering world very well for over 30 years, we are now seeing rapidly growing demand for reduced precision (machine learning, neural nets, graphics, etc.), a growing need for mixed 32- and 64-bit precision, and also a need for greater than 64-bit precision, all typically varying within a given application. To the extent that IEEE-754 fails to adequately meet new demands such as these, researchers are considering completely different alternatives, for which a flexible precision level is a fundamental feature of the design, and are exploring new mathematical and software frameworks to better understand and utilize such facilities.
This workshop is fully funded by a Simons Foundation Targeted Grant to Institutes.
