REPRESENT - A Report and ANS FORTH Proposal

[Note: This paper contains information and proposals which may be outdated. It is retained for its
discussion of the problems pertaining to Forth-94 REPRESENT. For the latest information please
refer to the current proposal.]

2005-10-09

Background
----------
Forth-94 has few floating point display functions [none in the
Floating-Point wordset, three in the Extension wordset].  To
allow users to create their own output functions the Standard
provides the float-to-string primitive  REPRESENT .

Though intended to be portable,  REPRESENT 's loose definition
has given rise to implementations that differ in critical
respects.  It also lacks adequate functionality to implement
fixed-point notation display.

These deficiencies create significant obstacles for programmers
attempting to write portable applications based on  REPRESENT .
While "work-arounds" may be used to minimize the problems, they
tend to be complex and cumbersome (ref. 2).

It is perhaps inevitable that  REPRESENT  will need revision
in the future.  To that end, I wish to propose the following
redefinition.  It addresses all the key issues while remaining
compatible with the Forth-94 specification.  As the changes
largely reflect common practice, compatibility with existing
applications is unlikely to be affected.

Redefinition
------------

   12.6.1.xxxx  REPRESENT
   FLOATING

      ( c-addr u -- n flag1 flag2 )  (F: r -- )
      or ( r c-addr u -- n flag1 flag2 )

   At c-addr, place the character-string external representation
   of the significand of the floating-point number r.  Return the
   decimal-base exponent as n, the sign as flag1 and valid result
   as flag2.

   If u is greater than zero the character string shall consist
   of the u most significant digits of the significand represented
   as a decimal fraction with the implied decimal point to the
   left of the first digit, and the first digit zero only if all
   digits are zero.  The significand is rounded to u digits.

   If u is zero the string shall consist of one digit representing
   the fractional significand r rounded to a whole number, either
   one or zero, with an implied decimal point to the left of the
   digit.

   Rounding follows the round to nearest rule; n is adjusted, if
   necessary, to correspond to the rounded magnitude of the
   significand.  If r is zero or evaluates to zero after rounding,
   then n is 1 and the sign is implementation-defined.

   If flag2 is true then r was in the implementation-defined range
   of floating-point numbers.  If flag1 is true then r is negative.

   An ambiguous condition exists if the value of BASE is not
   decimal ten.

   When flag2 is false, n and flag1 are implementation-defined,
   as are the contents of c-addr.  The string at c-addr shall
   consist of graphic characters left-justified with any unused
   positions to the right filled with space characters.  The
   number of characters is one if u is zero, or u characters
   otherwise.

Rationale
---------

1. Floating point zero issues

a) Exponent value

   Forth-94 does not explicitly indicate the exponent value (n)
   that should be returned in the case of floating-point zero.

   Mathematically any value would do since zero multiplied by
   any power of ten remains zero.  This has led to portability
   issues, with various implementations returning n = 0, 1 or
   some other value.

   From a number display perspective however, the exponent value
   is important since it determines where the decimal point shall
   be shown.

   The new specification requires  REPRESENT  to return n = 1.
   This corresponds to the common convention that zero has an
   implied decimal point to the right of the digit i.e. "0."

   [C library function  ecvt  similarly returns 1 for the
   exponent value - presumably for the same reasons.]
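
   The effect of n on the displayed result can be sketched with
   a small helper that types a digit string with the decimal
   point placed after the first n digits.  The name  .POINT  is
   hypothetical (it is not part of FPOUT or any standard); the
   sketch assumes 0 <= n <= u, uses  /STRING  from the String
   wordset, and assumes  S"  works in interpretation state.

```forth
\ Hypothetical helper: type the u digits at c-addr with the
\ decimal point placed after the first n digits.
\ Assumes 0 <= n <= u.
: .POINT  ( c-addr u n -- )
  >R  OVER R@ TYPE        \ digits to the left of the point
  [CHAR] . EMIT
  R> /STRING TYPE ;       \ digits to the right of the point

S" 0" 1 .POINT   \ n = 1 displays  0.
S" 0" 0 .POINT   \ n = 0 displays  .0
```

   With n = 1 the single zero digit lands to the left of the
   point, which is the conventional form.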

b) Negative zero

   Certain floating point representations such as IEEE-754
   allow "negative zero".  While Forth-94 is silent regarding
   "negative zero" there appears to be nothing that would
   prevent its use.  See the Implementation section below.

   Note: Forths that choose to implement negative zero should
   do so consistently.  Not only must  REPRESENT  return the
   appropriate sign, so also should  >FLOAT FROUND F. FS. FE.

2. Out of range numbers

   Forth-94 states that when flag2 = false, the number is not
   within the defined range and an implementation-defined string
   shall be returned in the buffer.

a) String contents

   "the string at c-addr shall consist of graphic characters"

   Nothing is stated regarding the string's length, alignment
   or padding.  Most forths adopt the following practice -

   - The length of the returned string is the same as when
     flag2 = true  i.e. the length is determined by u.

   - Characters are positioned left-justified within the buffer
     with any unused positions to the right filled with space
     characters.  Users may subsequently trim off the padding
     spaces with  -TRAILING  e.g.

     ( r ) PAD u REPRESENT ... PAD u -TRAILING ( c-addr u2 )

b) Value of n and flag1

   "n and flag1 are implementation defined"

   Some forths have  REPRESENT  return a basic string ('NAN'
   'INF' etc) and then use flag1 to later indicate the sign.
   While such practices are permitted under Forth-94, they are
   not portable.

   Portable applications can only make use of the returned
   string.  If a sign needs to be passed to an application
   then it must be included in the string e.g. '+NAN' '-INF'
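
   The points in (a) and (b) can be sketched as a raw display
   word.  This is only an illustration:  F.RAW  is a hypothetical
   name, a separate floating-point stack is assumed, and PAD is
   assumed large enough for u characters.  When flag2 is true the
   sign and exponent are used; when it is false only the
   returned string is used, trimmed with  -TRAILING .

```forth
\ A sketch only; F.RAW is a hypothetical name.  Assumes a separate
\ floating-point stack and that PAD can hold u characters.
: F.RAW  ( u -- ) ( F: r -- )
  >R  PAD R@ REPRESENT              ( n flag1 flag2 )
  IF                                \ r in range
    IF  [CHAR] - EMIT  THEN         \ flag1 is reliable here
    [CHAR] . EMIT  PAD R@ 1 MAX TYPE
    [CHAR] E EMIT  .                \ exponent n
  ELSE                              \ r out of range
    2DROP                           \ n and flag1 are not portable
    PAD R@ 1 MAX -TRAILING TYPE     \ only the string is
  THEN
  R> DROP ;
```

   The  1 MAX  reflects the proposed rule that one character is
   present at c-addr when u is zero.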

c) String length

   A difficulty with the present system is that the returned
   string becomes increasingly truncated as u decreases.  At
   one or two characters most strings are unintelligible.

   The situation could be alleviated somewhat by requiring
   REPRESENT  to return a minimum string of [say] five
   characters.  This would give a programmer greater scope
   in handling "not a number" situations since a usable
   string would always be available e.g.

     ( r )  PAD u REPRESENT  0= IF  PAD u 5 MAX ...  THEN

   Such a requirement could, however, break existing code and
   therefore has NOT been included in the new specification.
   It is mentioned here only for purposes of opening up
   discussion on the topic.

3. Rounding

   Applications often require floating point numbers to be
   displayed rounded to a lesser precision than the internal
   maximum allows.

   In Forth-94 this rounding is performed through  REPRESENT
   with parameter 'u' controlling the amount of rounding e.g.

     ( r )  PAD u REPRESENT 2DROP  PAD u TYPE  SPACE .

     r         u      string    n (exponent)
     --        --     ------    --
     0.6489    4      '6489'    0
     0.6489    3      '649'     0
     0.6489    2      '65'      0
     0.6489    1      '6'       0

   But what if we need to display 0.6489 rounded to 0 decimal
   places?  This would require fraction .6489 be rounded to
   the nearest whole number i.e. 1.

   Such situations arise when displaying fixed-point notation
   to a given number of decimal places.  Failure to round the
   entire significand when appropriate leads to incorrect
   results.  Here are some examples.

   Display 0.009 to 2 decimal places in a field width of 5
   characters -

     0.009E  2 5 F.R       0.00 ok  ( FPOUT 1.6 and prior )
     0.009E  5 2 0 F.RDP   0.00 ok  ( Gforth 0.6.2 )

   ( The result should have been 0.01 )

   REPRESENT  currently has no provision for such rounding
   and it is a serious omission.

   Luckily we can add the missing functionality by simply
   allowing 'u' to take the value zero e.g.

     ( r )  PAD u REPRESENT 2DROP  PAD u 1 MAX TYPE  SPACE .

     r         u      string    n (exponent)
     --        --     ------    --
     0.6489    4      '6489'    0
     0.6489    3      '649'     0
     0.6489    2      '65'      0
     0.6489    1      '6'       0
     0.6489    0      '1'       1

   As the above table shows we are merely extending  REPRESENT
   in a logical direction.  This makes it easy to implement
   and use.
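
   As a rough check, the table can be regenerated with a short
   loop.  ROUNDINGS  is a hypothetical name; the sketch assumes
   a separate floating-point stack and a  REPRESENT  that accepts
   u = 0 as proposed (the sample in the Implementation section
   should do).

```forth
\ A sketch only; ROUNDINGS is a hypothetical name.
\ For u = 4 down to 0, display u, the string and the exponent n.
: ROUNDINGS  ( F: r -- )
  0 4 DO
    FDUP PAD I REPRESENT 2DROP     ( n )  \ discard both flags
    CR  I .  PAD I 1 MAX TYPE  SPACE  .
  -1 +LOOP
  FDROP ;

0.6489E ROUNDINGS
```

   Each line of output corresponds to one row of the table.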

   Usage:

   Existing applications using  REPRESENT  may be left alone
   and they will continue to function as before.

   Applications that will benefit from  REPRESENT 's new
   rounding facility are those in which parameter 'u' takes
   the value zero.

   Previously such applications needed  1 MAX  inserted before
   REPRESENT  to mask the fact that it could not handle u = 0
   e.g.

     c-addr u ... 1 MAX REPRESENT ... c-addr u ...

   Making these applications work correctly will require the
   following code re-arrangement in addition to the replacement
   REPRESENT .

     c-addr u ... REPRESENT ... c-addr u  1 MAX ...

   By way of demonstration, let's apply these principles to our
   earlier Gforth example.

   Step 1.  Load a version of  REPRESENT  that is compliant with
            the new specification (the sample one given below
            will do).

   Step 2.  Edit the source for  F.RDP  (located in Gforth 0.6.2
            distribution file "stuff.fs").  Change line 211 from
            "1 max ur min" to "0 max ur min" and then re-load.

   Step 3.  Trying our previous example:

            0.009E  5 2 0 F.RDP   0.01 ok

            It now functions correctly.
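
   The same re-arrangement can be sketched as a small stand-alone
   fixed-point display word.  This is an illustration and not the
   FPOUT or Gforth code:  FIX-LEN ,  FIX-DIGIT  and  F.FIX  are
   hypothetical names.  It assumes a  REPRESENT  that accepts
   u = 0 as proposed, a separate floating-point stack, and that
   flag2 is true.  The number of digits requested is the exponent
   plus the number of decimal places, which for 0.009 to 2 places
   is -2 + 2 = 0.

```forth
\ A sketch only.  FIX-LEN, FIX-DIGIT and F.FIX are hypothetical
\ names.  Assumes a REPRESENT that accepts u = 0 as proposed,
\ a separate floating-point stack, and flag2 = true.

VARIABLE FIX-LEN                 \ number of digits held at PAD

\ i-th digit of the string at PAD (0-based); positions outside
\ the string read as '0'
: FIX-DIGIT  ( i -- char )
  DUP 0 FIX-LEN @ WITHIN IF  PAD + C@  ELSE  DROP [CHAR] 0  THEN ;

\ display r in fixed-point notation with 'places' decimals
: F.FIX  ( places -- ) ( F: r -- )
  >R
  FDUP PAD PRECISION REPRESENT 2DROP   ( n )  \ exponent of r
  R@ +  0 MAX  PRECISION MIN           ( u )  \ digits to keep
  DUP 1 MAX FIX-LEN !
  PAD SWAP REPRESENT DROP              ( n flag1 )
  IF  [CHAR] - EMIT  THEN              ( n )
  DUP 0> IF
    DUP 0 DO  I FIX-DIGIT EMIT  LOOP   \ integer part
  ELSE
    [CHAR] 0 EMIT                      \ leading zero
  THEN
  [CHAR] . EMIT
  R> 0 ?DO  DUP I + FIX-DIGIT EMIT  LOOP  DROP ;

0.009E 2 F.FIX   \ u = 0 here; should display 0.01
```

   Note that  1 MAX  is applied only to the string length after
   REPRESENT  has been called, never to u beforehand.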

Implementation
--------------
Implementing the new specification should not be difficult.  A
sample  REPRESENT  which includes all the critical features is
given below.

Tips:

 - Forth implementations that have "negative zero", and wish to
   display it, should ensure  REPRESENT  returns the appropriate
   sign even when rounding produces a result of zero e.g.

   -0.4E PAD 0 REPRESENT PAD 1 TYPE  DROP SPACE .   0 -1 ok
    0.4E PAD 0 REPRESENT PAD 1 TYPE  DROP SPACE .   0  0 ok

 - "Forths written in C" may be able to implement  REPRESENT
   using library functions fcvt and ecvt.

A sample REPRESENT

  \ Assumes flag2 is always true and MPREC digits can be held
  \ as a double number. "Negative zero" is not implemented.

   7 VALUE MPREC  \ your maximum precision
   2VARIABLE EXP  \ exponent & sign

   : REPRESENT  ( c-addr u -- n flag1 flag2 ) ( F: r -- )
     2DUP [CHAR] 0 FILL
     MPREC MIN  2>R
     FDUP F0<  0 EXP 2!
     FABS  FDUP F0= 0=
     BEGIN  WHILE
       FDUP 1.0E F< 0= IF
         10.0E F/
         1
       ELSE
         FDUP 0.1E F< IF
           10.0E F*
           -1
         ELSE
           0
         THEN
       THEN
       DUP EXP +!
     REPEAT
     1.0E  R@ 0 ?DO 10.0E F* LOOP  F*
     FROUND F>D
     2DUP <# #S #>  DUP R@ - EXP +!
     2R>  ROT MIN 1 MAX CMOVE
     D0=  EXP 2@ SWAP  ROT IF 2DROP 1 0 THEN  \ 0.0E fix-up
     TRUE ;

Summary
-------
A portable and functional  REPRESENT  reduces the burden on
the programmer.  It eliminates the need for work-arounds,
simplifies application code and hides system-specific detail
such as negative zero and rounding method.

As author of the FPOUT package, it is perhaps fitting to
finish off with a demonstration of what the code would have
been had the  REPRESENT  proposed here been available.
In the following, (a) shows the original subroutine (F1) with
work-arounds to make it portable, while (b) shows (F1) as
it would be using the proposed  REPRESENT .

a) Original subroutine (F1) with portability "work-arounds"

   0 VALUE NZ#  ( initialized to true if REPRESENT responds
                  to "negative zero"; or false otherwise )

   \ float to ascii
   : (F1)  ( F: r -- ) ( places -- c-addr u flag )
     TO PL#  PRECISION TO BS#
     FDUP FBUF BS# REPRESENT SWAP ( r exp flag2 sgn )
     \ save sign for negative zero systems
     [ NZ# ] [IF]  TO NZ#  [ELSE]  DROP  [THEN]
     FBUF C@ [CHAR] 0 = IF ( r=0 )
       >R  DROP FDROP  1 NZ#  R> ( exp sgn flag2 )
     ELSE
       AND ( exp & flag2 )  PL# 0< IF
         DROP PRECISION
       ELSE
         EF# 0> IF  1- (F0) DROP 1+  THEN  PL# +
       THEN
       DUP ( size ) 0= >R  1 MAX  PRECISION MIN  TO BS#
       FBUF R@ IF  PRECISION  ELSE  BS#  THEN  REPRESENT
       DUP  R> AND IF ( flag2 & size=0 )
         >R  FBUF C@  DUP [CHAR] 5 =
         FBUF PRECISION  1 /STRING (T0) NIP 0=  AND
         SWAP [CHAR] 5 <  OR
         IF    2DROP  1 NZ#  [CHAR] 0
         ELSE  SWAP 1+ SWAP  [CHAR] 1
         THEN  FBUF C!  R>
       THEN
     THEN
     >R  TO SN#  1- TO EX#  FBUF BS#  -TRAILING  R> <# ;

b) Replacement subroutine (F1) using the new REPRESENT

   \ float to ascii
   : (F1)  ( F: r -- ) ( places -- c-addr u flag )
     TO PL#  FDUP FBUF PRECISION REPRESENT NIP AND
     PL# 0< IF
       DROP PRECISION
     ELSE
       EF# 0> IF  1- (F0) DROP 1+  THEN  PL# +
     THEN  0 MAX  PRECISION MIN  TO BS#
     FBUF BS# REPRESENT >R  TO SN#  1- TO EX#
     FBUF BS#  1 MAX  -TRAILING  R> <# ;

References
----------
1. ANS Forth-94 Standard

2. FPOUT - a floating point output package
   ftp.taygeta.com/pub/Forth/Applications/ANS/fpout18.f

Addendum
--------
1. Common questions answered

Q. If only one float-to-string primitive is needed then why do
   certain C language libraries provide several?

A. A good question and one that should be asked of the library
   designers!  As applications such as FPOUT demonstrate it is
   not only easy to reproduce the functionality of ecvt fcvt gcvt
   etc with but a single primitive, it is more efficient to do so.

Q. Rather than change  REPRESENT  isn't it better to leave it
   alone and introduce another function like fcvt which has the
   necessary rounding required for fixed-point notation?

A. This is a restatement of the previous question in a different
   form.  Reflection will reveal that implementing fcvt would
   require, at minimum, all the functionality of the proposed
   REPRESENT .  In other words, fcvt is not a primitive at all
   but rather an application built upon an underlying primitive.

Q. "c-addr u" is a buffer address and length.  The proposal has
   u = 0 write a character beyond the end of the buffer.

A. The premise that "c-addr u" represents a buffer and length
   into which  REPRESENT  writes is incorrect.  ANS defines
   c-addr as the buffer address where the character string is
   placed, and u as the number of "most significant digits"
   that shall be represented by the string.  It was always the
   programmer's responsibility to ensure that adequate space
   was allocated to the buffer beforehand.

Q. If u represents the number of significant digits then surely
   u = 0 means that no characters or a null string should result?

A. This assumes that u is a length - which it is not.  The usual
   result of rounding a number to n significant digits is another
   number which can be subsequently represented as an ascii string.
   Before one can assert a null string should result, one would
   need to demonstrate there exists a number which it represents
   and that this number is the valid result of rounding to zero
   significant digits.

Q. Most forths currently have  REPRESENT  return a null string
   when u = 0.  Even though such behaviour may be technically in
   doubt, won't changing it affect existing applications?

A. No.  It should be remembered that it is the application and
   not  REPRESENT  that determines how many characters will be
   extracted from the buffer.  So, if an application [mistakenly]
   chooses to extract u characters when u = 0 then the result
   will be the same irrespective of which  REPRESENT  was used.
   Having said that, I am unaware of any applications which
   actually use the null string.

Q. If the ANS requirements at u = 0 are undefined then what makes
   you so certain the rounding behaviour you're advocating should
   be the adopted one?

A. Put simply, it produces the right outcome in applications on
   every occasion without contrivance or work-arounds.  The
   proposed rounding turns  REPRESENT  into a true universal
   primitive - one that can be used to build any float-to-string
   function.

History
-------
2005-09-02  First release
2005-10-09  Clarify number of characters at c-addr when flag2 is
            false.  REPRESENT code cleaned-up.  Addendum added.


Page updated: 14 Feb 2011