[Note: This paper contains information and proposals which may be outdated. It is retained for its discussion of the problems pertaining to Forth-94 REPRESENT. For the latest information please refer to the current proposal.]
2005-10-09

Background
----------

Forth-94 has few floating point display functions [none in the
Floating-Point wordset, three in the Extension wordset]. To allow users
to create their own output functions the Standard provides the
float-to-string primitive REPRESENT.

Though intended to be portable, REPRESENT's loose definition has given
rise to implementations that differ in critical respects. It also lacks
the functionality needed to implement fixed-point notation display.
These deficiencies create significant obstacles for programmers
attempting to write portable applications based on REPRESENT. While
"work-arounds" may be used to minimize the problems, they tend to be
complex and cumbersome (ref. 2).

It is perhaps inevitable that REPRESENT will need revision in the
future. To that end, I wish to propose the following redefinition. It
addresses all the key issues while remaining compatible with the
Forth-94 specification. As the changes largely reflect common practice,
compatibility with existing applications is unlikely to be affected.

Redefinition
------------

12.6.1.xxxx REPRESENT                                     FLOATING

    ( c-addr u -- n flag1 flag2 ) (F: r -- ) or
    ( r c-addr u -- n flag1 flag2 )

    At c-addr, place the character-string external representation of
    the significand of the floating-point number r. Return the
    decimal-base exponent as n, the sign as flag1 and valid result as
    flag2.

    If u is greater than zero the character string shall consist of
    the u most significant digits of the significand represented as a
    decimal fraction with the implied decimal point to the left of the
    first digit, and the first digit zero only if all digits are zero.
    The significand is rounded to u digits.

    If u is zero the string shall consist of one digit representing
    the fractional significand r rounded to a whole number, either one
    or zero, with an implied decimal point to the left of the digit.
    Rounding follows the round to nearest rule; n is adjusted, if
    necessary, to correspond to the rounded magnitude of the
    significand. If r is zero or evaluates to zero after rounding,
    then n is 1 and the sign is implementation-defined.

    If flag2 is true then r was in the implementation-defined range of
    floating-point numbers. If flag1 is true then r is negative.

    An ambiguous condition exists if the value of BASE is not decimal
    ten.

    When flag2 is false, n and flag1 are implementation-defined, as
    are the contents of c-addr.

    The string at c-addr shall consist of graphic characters
    left-justified with any unused positions to the right filled with
    space characters. The number of characters is one if u is zero, or
    u characters otherwise.

Rationale
---------

1. Floating point zero issues

a) Exponent value

Forth-94 does not explicitly indicate the exponent value (n) that
should be returned in the case of floating-point zero. Mathematically
any value would do since zero multiplied by any power of ten remains
zero. This has led to portability issues, with various implementations
returning n = 0, 1 or some other value.

From a number display perspective however, the exponent value is
important since it determines where the decimal point shall be shown.
The new specification requires REPRESENT to return n = 1. This
corresponds to the common convention that zero has an implied decimal
point to the right of the digit i.e. "0."

[C library function ecvt similarly returns 1 for the exponent value -
presumably for the same reasons.]

b) Negative zero

Certain floating point representations such as IEEE-754 allow
"negative zero". While Forth-94 is silent regarding "negative zero"
there appears to be nothing that would prevent its use. See
Implementation.

Note: Forths that choose to implement negative zero should do so
consistently. Not only must REPRESENT return the appropriate sign, so
also should >FLOAT FROUND F. FS. FE.

2. Out of range numbers

Forth-94 states that when flag2 = false, the number is not within the
defined range and a user-defined string shall be returned in the
buffer.

a) String contents

"the string at c-addr shall consist of graphic characters"

Nothing is stated regarding the string's length, alignment or padding.
Most Forths adopt the following practice:

- The length of the returned string is the same as when flag2 = true
  i.e. the length is determined by u.

- Characters are positioned left-justified within the buffer with any
  unused positions to the right filled with space characters.

A user may subsequently trim off the padding spaces with -TRAILING
e.g.

    ( r ) PAD u REPRESENT ... PAD u -TRAILING ( c-addr u2 )

b) Value of n and flag1

"n and flag1 are implementation defined"

Some Forths have REPRESENT return a basic string ('NAN' 'INF' etc) and
then use flag1 to later indicate the sign. While such practices are
permitted under Forth-94, they are not portable. Portable applications
can only make use of the returned string. If a sign needs to be passed
to an application then it must be included in the string e.g. '+NAN'
'-INF'

c) String length

A difficulty with the present system is that the returned string
becomes increasingly truncated as u decreases. At one or two
characters most strings are unintelligible.

The situation could be alleviated somewhat by requiring REPRESENT to
return a minimum string of [say] five characters. This would give a
programmer greater scope in handling "not a number" situations since a
usable string would always be available e.g.

    ( r ) PAD u REPRESENT 0= IF PAD u 5 MAX ... THEN

Such a requirement could, however, break existing code and therefore
has NOT been included in the new specification. It is mentioned here
only for purposes of opening up discussion on the topic.

3. Rounding

Applications often require floating point numbers to be displayed
rounded to a lesser precision than the internal maximum allows.
In Forth-94 this rounding is performed through REPRESENT with parameter
'u' controlling the amount of rounding e.g.

    ( r ) PAD u REPRESENT 2DROP PAD u TYPE SPACE .

    r        u    string    n (exponent)
    --       --   ------    --
    0.6489   4    '6489'    0
    0.6489   3    '649'     0
    0.6489   2    '65'      0
    0.6489   1    '6'       0

But what if we need to display 0.6489 rounded to 0 decimal places?
This would require fraction .6489 be rounded to the nearest whole
number i.e. 1.

Such situations arise when displaying fixed-point notation to a given
number of decimal places. Failure to round the entire significand when
appropriate leads to incorrect results. Here are some examples.

Display 0.009 to 2 decimal places in a field width of 5 characters -

    0.009E 2 5 F.R      0.00 ok   ( FPOUT 1.6 and prior )
    0.009E 5 2 0 F.RDP  0.00 ok   ( Gforth 0.6.2 )

    ( The result should have been 0.01 )

REPRESENT currently has no provision for such rounding and it is a
serious omission. Luckily we can add the missing functionality by
simply allowing 'u' to take the value zero e.g.

    ( r ) PAD u REPRESENT 2DROP PAD u 1 MAX TYPE SPACE .

    r        u    string    n (exponent)
    --       --   ------    --
    0.6489   4    '6489'    0
    0.6489   3    '649'     0
    0.6489   2    '65'      0
    0.6489   1    '6'       0
    0.6489   0    '1'       1

As the above table shows we are merely extending REPRESENT in a
logical direction. This makes it easy to implement and use.

Usage:

Existing applications using REPRESENT may be left alone and they will
continue to function as before.

Applications that will benefit from REPRESENT's new rounding facility
are those in which parameter 'u' takes the value zero. Previously such
applications needed 1 MAX inserted before REPRESENT to mask the fact
that it could not handle u = 0 e.g.

    c-addr u ... 1 MAX REPRESENT ... c-addr u ...

Making these applications work correctly will require the following
code re-arrangement in addition to the replacement REPRESENT.

    c-addr u ... REPRESENT ... c-addr u 1 MAX ...

By way of demonstration, let's apply these principles to our earlier
Gforth example.

Step 1. Load a version of REPRESENT that is compliant with the new
        specification (the sample one given below will do).

Step 2. Edit the source for F.RDP (located in Gforth 0.6.2
        distribution file "stuff.fs"). Change line 211 from
        "1 max ur min" to "0 max ur min" and then re-load.

Step 3. Trying our previous example:

        0.009E 5 2 0 F.RDP  0.01 ok

It now functions correctly.

Implementation
--------------

Implementing the new specification should not be difficult. A sample
REPRESENT which includes all the critical features is given below.

Tips:

- Forth implementations that have "negative zero", and wish to display
  it, should ensure REPRESENT returns the appropriate sign even when
  rounding produces a result of zero e.g.

      -0.4E PAD 0 REPRESENT PAD 1 TYPE DROP SPACE .  0 -1 ok
       0.4E PAD 0 REPRESENT PAD 1 TYPE DROP SPACE .  0 0 ok

- "Forths written in C" may be able to implement REPRESENT using
  library functions fcvt and ecvt.

A sample REPRESENT

    \ Assumes flag2 is always true and MPREC digits can be held
    \ as a double number. "Negative zero" is not implemented.

    7 VALUE MPREC     \ your maximum precision

    2VARIABLE EXP     \ exponent & sign

    : REPRESENT  ( c-addr u -- n flag1 flag2 ) ( F: r -- )
      2DUP [CHAR] 0 FILL
      MPREC MIN  2>R
      FDUP F0<  0 EXP 2!
      FABS  FDUP F0= 0=
      BEGIN  WHILE
        FDUP 1.0E F< 0= IF
          10.0E F/
          1
        ELSE
          FDUP 0.1E F< IF
            10.0E F*
            -1
          ELSE
            0
          THEN
        THEN
        DUP EXP +!
      REPEAT
      1.0E  R@ 0 ?DO 10.0E F* LOOP  F*
      FROUND F>D
      2DUP <# #S #>  DUP R@ - EXP +!
      2R> ROT MIN 1 MAX CMOVE
      D0=  EXP 2@ SWAP  ROT IF 2DROP 1 0 THEN  \ 0.0E fix-up
      TRUE ;

Summary
-------

A portable and functional REPRESENT reduces the burden on the
programmer. It eliminates the need for work-arounds, simplifies
application code and hides system-specific detail such as negative
zero and rounding method.

As author of the FPOUT package, it is perhaps fitting to finish off
with a demonstration of what the code would have been had the
REPRESENT proposed here been available.
In the following, (a) shows the original subroutine (F1) with the
work-arounds needed to make it portable, while (b) shows (F1) as it
would be using the proposed REPRESENT.

a) Original subroutine (F1) with portability "work-arounds"

    0 VALUE NZ#  ( initialized to true if REPRESENT responds to "negative zero"; or false otherwise )

    \ float to ascii
    : (F1)  ( F: r -- ) ( places -- c-addr u flag )
      TO PL#  PRECISION TO BS#
      FDUP FBUF BS# REPRESENT  SWAP  ( r exp flag2 sgn )
      \ save sign for negative zero systems
      [ NZ# ] [IF]  TO NZ#  [ELSE]  DROP  [THEN]
      FBUF C@ [CHAR] 0 = IF  ( r=0 )
        >R DROP FDROP 1 NZ# R>  ( exp sgn flag2 )
      ELSE
        AND ( exp & flag2 )
        PL# 0< IF
          DROP PRECISION
        ELSE
          EF# 0> IF  1- (F0) DROP 1+  THEN  PL# +
        THEN
        DUP ( size ) 0= >R
        1 MAX PRECISION MIN TO BS#
        FBUF  R@ IF PRECISION ELSE BS# THEN  REPRESENT
        DUP R> AND IF  ( flag2 & size=0 )
          >R
          FBUF C@ DUP [CHAR] 5 =
          FBUF PRECISION 1 /STRING (T0) NIP 0= AND
          SWAP [CHAR] 5 < OR IF
            2DROP 1 NZ# [CHAR] 0
          ELSE
            SWAP 1+ SWAP [CHAR] 1
          THEN
          FBUF C!
          R>
        THEN
      THEN
      >R  TO SN#  1- TO EX#
      FBUF BS# -TRAILING  R>  <# ;

b) Replacement subroutine (F1) using the new REPRESENT

    \ float to ascii
    : (F1)  ( F: r -- ) ( places -- c-addr u flag )
      TO PL#
      FDUP FBUF PRECISION REPRESENT  NIP AND
      PL# 0< IF
        DROP PRECISION
      ELSE
        EF# 0> IF  1- (F0) DROP 1+  THEN  PL# +
      THEN
      0 MAX PRECISION MIN TO BS#
      FBUF BS# REPRESENT
      >R  TO SN#  1- TO EX#
      FBUF BS# 1 MAX -TRAILING  R>  <# ;

References
----------

1. ANS Forth-94 Standard

2. FPOUT - a floating point output package
   ftp.taygeta.com/pub/Forth/Applications/ANS/fpout18.f

Addendum
--------

1. Common questions answered

Q. If only one float-to-string primitive is needed then why do certain
C language libraries provide several?

A. A good question and one that should be asked of the library
designers! As applications such as FPOUT demonstrate, it is not only
easy to reproduce the functionality of ecvt fcvt gcvt etc with but a
single primitive, it is more efficient to do so.

Q. Rather than change REPRESENT isn't it better to leave it alone and
introduce another function like fcvt which has the necessary rounding
required for fixed-point notation?

A. This is a restatement of the previous question in a different form.
Reflection will reveal that implementing fcvt would require, at
minimum, all the functionality of the proposed REPRESENT. In other
words, fcvt is not a primitive at all but rather an application built
upon an underlying primitive.

Q. "c-addr u" is a buffer address and length. The proposal has u = 0
write a character beyond the end of the buffer.

A. The premise that "c-addr u" represents a buffer and length into
which REPRESENT writes is incorrect. ANS defines c-addr as the buffer
address where the character string is placed, and u as the number of
"most significant digits" that shall be represented by the string. It
was always the programmer's responsibility to ensure that adequate
space was allocated to the buffer beforehand.

Q. If u represents the number of significant digits then surely u = 0
means that no characters or a null string should result?

A. This assumes that u is a length - which it is not. The usual result
of rounding a number to n significant digits is another number which
can be subsequently represented as an ascii string. Before one can
assert a null string should result, one would need to demonstrate
there exists a number which it represents and that this number is the
valid result of rounding to zero significant digits.

Q. Most Forths currently have REPRESENT return a null string when
u = 0. Even though such behaviour may be technically in doubt, won't
changing it affect existing applications?

A. No. It should be remembered that it is the application and not
REPRESENT that determines how many characters will be extracted from
the buffer. So, if an application [mistakenly] chooses to extract u
characters when u = 0 then the result will be the same irrespective of
which REPRESENT was used.
Having said that, I am unaware of any applications which actually use
the null string.

Q. If the ANS requirements at u = 0 are undefined then what makes you
so certain the rounding behaviour you're advocating should be the
adopted one?

A. Put simply, it produces the right outcome in applications on every
occasion without contrivance or work-arounds. The proposed rounding
turns REPRESENT into a true universal primitive - one that can be used
to build any float-to-string function.

History
-------

2005-09-02  First release
2005-10-09  Clarify number of characters at c-addr when flag2 is
            false. REPRESENT code cleaned-up. Addendum added.
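
Usage sketch
------------

The Background section describes REPRESENT as the primitive from which
users build their own output functions. The following is a minimal
sketch of such a function, not part of the proposal or of FPOUT. It
prints r as a signed decimal fraction followed by the exponent n. The
name .SIG is hypothetical, and the sketch assumes the Floating-Point
wordset, a separate floating-point stack, BASE set to decimal, and a u
small enough to fit in PAD. It clamps u to at least 1, so it relies only
on Forth-94 behaviour and does not exercise the proposed u = 0 rounding.

```forth
\ Sketch only: .SIG is a hypothetical helper, not a standard word.
\ Assumes a separate floating-point stack, decimal BASE, and that
\ u characters fit in PAD.

: .SIG  ( u -- ) ( F: r -- )
  1 MAX >R                        \ u >= 1, so Forth-94 rules apply
  PAD R@ REPRESENT                ( n flag1 flag2 )
  0= IF                           \ flag2 false: r was out of range
    2DROP                         \ n and flag1 are not portable here
    PAD R> -TRAILING TYPE  EXIT   \ show the returned string only
  THEN
  IF [CHAR] - EMIT THEN           \ flag1 true means r is negative
  ." 0."  PAD R> TYPE             \ implied point is left of 1st digit
  ." E" . ;                       \ n is the decimal exponent
```

For the 0.6489 row with u = 3 in the Rationale table (string '649',
n = 0), the phrase 0.6489E 3 .SIG should display 0.649E0 on a system
whose REPRESENT matches that table.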
Page updated: 14 Feb 2011