Custom Radix Columns (+Special Characters)

How about using the basic base 10 to any base conversion, modified for custom digits:

func numberToCustomRadix(_ number: Int, alphabet: String) -> String {
    let base = alphabet.count
    var number = number
    var result = ""
    repeat {
        let idx = alphabet.index(alphabet.startIndex, offsetBy: number % base)
        result = [alphabet[idx]] + result
        number /= base
    } while number > 0
    return result
}

numberToCustomRadix(3, alphabet: "012") // 10
numberToCustomRadix(4, alphabet: "abc") // bb
numberToCustomRadix(5, alphabet: "%#9") // #9

Note that the problem with a custom alphabet is that it is hard to guarantee at compile time that the alphabet contains distinct characters. E.g. an "aaabbbccc" alphabet will generate all kinds of conversion problems.
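Since the compiler cannot enforce distinctness, one option is to check the alphabet at run time before converting. Below is a minimal sketch in JavaScript, not a port of the Swift function: the names hasDistinctCharacters and numberToCustomRadix are chosen for this illustration, and the alphabet is split by code point, which need not match what Swift's alphabet.count reports for every string.

```javascript
// Returns true when every character of the alphabet occurs exactly once.
function hasDistinctCharacters(alphabet) {
    const chars = Array.from(alphabet);           // split by code point
    return new Set(chars).size === chars.length;
}

// Same repeated divide-by-base loop, with the alphabet validated first.
function numberToCustomRadix(number, alphabet) {
    const digits = Array.from(alphabet);
    if (digits.length < 2 || !hasDistinctCharacters(alphabet)) {
        throw new RangeError("alphabet needs at least 2 distinct characters");
    }
    if (!Number.isSafeInteger(number) || number < 0) {
        throw new RangeError("number must be a non-negative safe integer");
    }
    const base = digits.length;
    let result = "";
    do {
        result = digits[number % base] + result;  // prepend the next digit
        number = Math.floor(number / base);
    } while (number > 0);
    return result;
}
```

With this guard in place, an "aaabbbccc" alphabet is rejected up front instead of producing ambiguous output.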

Inputting multi-radix multi-digit signed numbers with DOS

DOS has several input functions but all deal with characters exclusively.

If the number involved is small, like say 1 or 2 digits, many (new) programmers use the DOS.GetCharacter function 01h resulting in code like this:

    ; 1-digit number
    mov  ah, 01h        ; DOS.GetCharacter
    int  21h            ; -> AL=["0","9"]
    sub  al, "0"        ; -> AL=[0,9]

    ; 2-digit number
    mov  ah, 01h        ; DOS.GetCharacter
    int  21h            ; -> AL=["0","9"] (tens)
    mov  bl, al
    mov  ah, 01h        ; DOS.GetCharacter
    int  21h            ; -> AL=["0","9"] (ones)
    mov  ah, bl
    sub  ax, "00"       ; SIMD -> AH=[0,9] (tens), AL=[0,9] (ones)
    aad                 ; AL = AH * 10 + AL -> AL=[0,99]

This is the most basic way of inputting small numbers, but it is lacking in many ways. As an example, consider what would happen to your program if the user made a mistake and accidentally pressed a key for which DOS returns an extended ASCII character (a zero followed by a scancode).

Then think about the mess you would get if the above method were used to input numbers that have 3, 4, or 5 digits! Inputting a multi-digit number is best done using the DOS.BufferedInput function 0Ah. This function already gives your program a better chance at surviving since it allows keyboard users to correct their mistakes. To allow for an input of at most 5 decimal digits, the buffer that you submit to DOS could be defined with buf db 6, 0, 6 dup 0. The Q&A "How buffered input works" has the details. Once the string of characters that represent the number has been entered, the text must be converted into a numeric value. The following code shows this:

snippet 1a

    mov  dx, buf
    mov  ah, 0Ah        ; DOS.BufferedInput
    int  21h
    xor  ax, ax         ; Result = 0
    mov  si, buf+1
    xor  cx, cx
    mov  cl, [si]       ; -> CX is number of characters entered
    jcxz .z             ; Return zero for an 'empty' input
    ; Decimal
.a: inc  si             ; Next character
    mov  dx, 10
    mul  dx             ; Result = Result * 10
    mov  dl, [si]       ; -> DX = ["0","9"] (NewDigit)
    sub  dl, 48         ; Convert NewDigit from ["0","9"] to [0,9]
    add  ax, dx         ; Result = Result + NewDigit
    loop .a
.z:

Sometimes you will want to input numbers in the hexadecimal, octal, or binary formats, in which case you could use the following calculation loops:

snippet 2a

    ; Hexadecimal
.a: inc  si             ; Next character
    shl  ax, 1          ; Result = Result * 16
    shl  ax, 1
    shl  ax, 1
    shl  ax, 1
    mov  dl, [si]       ; -> DL = {["0","9"],["A","F"]} (NewDigit)
    cmp  dl, "9"
    jbe  .b
    sub  dl, 7
.b: sub  dl, 48
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Octal
.a: inc  si             ; Next character
    shl  ax, 1          ; Result = Result * 8
    shl  ax, 1
    shl  ax, 1
    mov  dl, [si]       ; -> DL = ["0","7"] (NewDigit)
    sub  dl, 48
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Binary
.a: inc  si             ; Next character
    cmp  byte [si], "1" ; -> CF=1 for "0", CF=0 for "1"
    cmc                 ; -> CF=0 for "0", CF=1 for "1"
    rcl  ax, 1          ; Result = Result * 2 + NewDigit
    loop .a

Even with the editing facilities that the DOS.BufferedInput function 0Ah offers, it is not OK to just trust the user at the keyboard to supply your program with the correct data. It is you that has to validate the input, and if you find that something is amiss, there are a number of ways to deal with it. You could exit the program with (or without) an error message, you could have the user redo the input, you could choose to deliver some special value like the '8000h integer indefinite' that the FPU uses, or you could return a saturated result. The important thing is that you deal with the situation.

Building a better number input routine

To improve on the code that we have so far, we could

write the code such that the user can freely choose the number base that they want to use. All it will take is allowing the input to contain an additional numeric affix. I have always preferred the one character suffixes that Intel uses, so 'h' for hexadecimal, 'o' for octal, 'b' for binary, and 'd' or none for decimal.
add a further suffix in order to shorten long numbers that are multiples of 1000 ('K' for Kilo) or 1024 ('KB' for KiloByte), e.g. 60K is 60000 and 6KB is 6144.
allow the user to use the so-called 'thousands separator', and make long runs of digits easier to read/write. A nice thing about it is that it need not separate at the thousands at all! We can apply this to any of the number bases. FASM uses the apostrophe ' for this.
allow the user to use any case for the suffixes and the hexadecimal digits A through F, making the text case-insensitive.
allow the user to have leading whitespace in their input. Sounds silly? Well, not so much if you have your inputs stored in a history of some kind, and later recall that list. You would appreciate the nice right alignment that you could get.
allow the user to have trailing whitespace in their input. Ask yourself whether you'd hate the program to disapprove of an input like "123 " (with a trailing space) or even "20 years".
allow the user to prefix the number with a minus sign -, so that they can start working with negative numbers in their code.
extend the range of numbers that we can process. Instead of storing the result in the 16-bit AX register, we will store it in the 32-bit EAX register. If the code is to run on the 8086 cpu, then we would store in the 32-bit DX:AX register pair!

but we must

verify that the input is composed of valid characters so as to not spend effort processing garbage
detect numeric overflow so as to not deliver bogus results to the program

Applying validation and overflow detection turns snippet 1a into

snippet 1b

    mov  dx, buf
    mov  ah, 0Ah        ; DOS.BufferedInput
    int  21h
    xor  ax, ax         ; Result = 0
    mov  si, buf+1
    xor  cx, cx
    mov  cl, [si]       ; -> CX is number of characters entered
    jcxz .z             ; Return zero for an 'empty' input
    ; Decimal
.a: inc  si             ; Next character
    xor  bx, bx
    mov  bl, [si]       ; -> BX = ["0","9"] (NewDigit) ?
    sub  bl, 48         ; Convert NewDigit from ["0","9"] to [0,9]
    cmp  bl, 9
    ja   .z             ; Stop if not a digit
    mov  dx, 10
    mul  dx             ; Result = Result * 10
    jc   .o
    add  ax, bx         ; Result = Result + NewDigit
    jc   .o
    loop .a
    jmp  .z
.o: mov  ax, 65535      ; Saturated result is MAXUINT
.z:

For the hexadecimal, octal, or binary formats, substitute the following loops:

snippet 2b

    ; Hexadecimal
.a: inc  si             ; Next character
    mov  dl, [si]       ; -> DL = {["0","9"],["A","F"]} (NewDigit) ?
    cmp  dl, "9"
    jbe  .b
    cmp  dl, "A"
    jb   .z             ; Stop if in [":","@"], else ":" would pass as 3
    sub  dl, 7
.b: sub  dl, 48
    cmp  dl, 15
    ja   .z             ; Stop if not a digit
    rol  ax, 1          ; Result = Result * 16
    rol  ax, 1
    rol  ax, 1
    rol  ax, 1
    test al, 15
    jnz  .o
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Octal
.a: inc  si             ; Next character
    mov  dl, [si]       ; -> DL = ["0","7"] (NewDigit) ?
    sub  dl, 48
    cmp  dl, 7
    ja   .z             ; Stop if not a digit
    rol  ax, 1          ; Result = Result * 8
    rol  ax, 1
    rol  ax, 1
    test al, 7
    jnz  .o
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Binary
.a: inc  si             ; Next character
    mov  dl, [si]       ; -> DL = ["0","1"] (NewDigit) ?
    sub  dl, 48
    cmp  dl, 1
    ja   .z             ; Stop if not a digit
    shl  ax, 1          ; Result = Result * 2
    jc   .o
    or   al, dl         ; Result = Result + NewDigit
    loop .a
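
If you want to check the logic of snippets 1b and 2b without an assembler at hand, the same "stop at the first non-digit, saturate on overflow" behaviour can be modelled in a few lines of JavaScript. This is only an illustration under assumptions: the function name parseUnsigned16 is made up for this sketch, the radix is assumed to be between 2 and 16, and only uppercase hexadecimal digits are accepted, as in the snippets.

```javascript
// Models the 16-bit accumulator of snippets 1b and 2b:
// stops at the first invalid character and returns 65535 (MAXUINT)
// as the saturated result when the value no longer fits in 16 bits.
function parseUnsigned16(text, radix) {
    const DIGITS = "0123456789ABCDEF";
    let result = 0;
    for (const ch of text) {
        const digit = DIGITS.indexOf(ch);
        if (digit < 0 || digit >= radix) break;   // stop if not a digit
        result = result * radix + digit;          // Result = Result * radix + NewDigit
        if (result > 0xFFFF) return 0xFFFF;       // saturate on overflow
    }
    return result;
}
```

For example, parseUnsigned16("65536", 10) saturates to 65535, while parseUnsigned16("12x9", 10) stops at the "x" and returns 12.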

The de luxe version of inputting a number applies everything that was mentioned above. It is important to note that the next program will not run on an 8086 CPU because it uses 32-bit registers and instructions introduced with later processors. (For sure a nice exercise to rewrite for 8086!) The program runs in a DOS window, and of course also in the true real address mode of an x86 CPU.

The InputEAX routine sets the carry flag if the input turns out to be syntactically wrong (EAX=0), or the input leads to a value that exceeds the 32-bit range [-(4GB-1),+(4GB-1)] (EAX=80000000h).

This inputting code does not pretend to be gospel! If you don't need a certain feature, then just remove it. And if for your particular use case some feature is missing, then just add it. Leave a comment if this happens...

        ORG     256

again:  mov     dx, msg1
        mov     ah, 09h                 ; DOS.PrintString
        int     21h
        call    InputEAX                ; -> EAX CF
        ; ignoring the CF for the purpose of the demo
        push    ax                      ; (1)
        mov     dx, msg2
        mov     ah, 09h                 ; DOS.PrintString
        int     21h
        pop     ax                      ; (1)
        call    PrintEAX
        cmp     eax, 27                 ; Arbitrarily chosen, but 27 == ESC
        jne     again

exit:   mov     ax, 4C00h               ; DOS.TerminateWithReturnCode
        int     21h
; --------------------------------------
msg1    db      13, 10, 'Input a number : $'
msg2    db      10, 'The number is $'
; --------------------------------------
; IN (eax) OUT ()
PrintEAX:
        pushad
        test    eax, eax
        jns     .a
        push    ax                      ; (1)
        mov     dl, "-"
        mov     ah, 02h                 ; DOS.PrintCharacter
        int     21h
        pop     ax                      ; (1)
        neg     eax
.a:     mov     ebx, 10
        push    bx                      ; (2a) Sentinel
.b:     xor     edx, edx
        div     ebx
        push    dx                      ; (2b) Remainder
        test    eax, eax
        jnz     .b
        pop     dx                      ; (2)
.c:     add     dl, "0"
        mov     ah, 02h                 ; DOS.PrintCharacter
        int     21h
        pop     dx
        cmp     dx, bx
        jb      .c
        popad
        ret
; --------------------------------------
; IN () OUT (eax,CF)
InputEAX:
        xor     eax, eax                ; In case of CF=1 on exit
        pushad
        sub     sp, 44+44               ; 2 local buffers
        mov     bp, sp
        push    44                      ; Buffer header 44, 0
        mov     dx, sp
        mov     ah, 0Ah                 ; DOS.BufferedInput
        int     21h
        mov     si, bp                  ; Where the string of characters begins

; Leading whitespace
.a:     lodsb
        call    IsWhitespace            ; -> ZF
        je      .a
        dec     si

; Unary
        mov     al, [si]
        push    ax                      ; Possible UNARY at [bp-4]
        cmp     al, "+"
        je      .b
        cmp     al, "-"
        jne     .c
.b:     inc     si

; Digits followed by base-suffix, in turn for Hex, Oct, Bin, and Dec
.c:     mov     cx, 16+256*'H'
        call    GetDigits               ; -> SI DI CF (AX)
        jnc     .d
        mov     cx, 8+256*'O'
        call    GetDigits               ; -> SI DI CF (AX)
        jnc     .d
        mov     cx, 2+256*'B'
        call    GetDigits               ; -> SI DI CF (AX)
        jnc     .d
        mov     cx, 10+256*'D'
        call    GetDigits               ; -> SI DI CF (AX)
        jc      .NOK
.d:     call    LodsUCasedChar          ; -> AL SI

; [option] K, M, G, KB, MB, GB order-suffixes
        mov     ebx, 1                  ; Multiplier
        mov     ch, 3                   ; ORDER
        cmp     al, "G"                 ; Giga
        je      .e
        mov     ch, 2                   ; ORDER
        cmp     al, "M"                 ; Mega
        je      .e
        mov     ch, 1                   ; ORDER
        cmp     al, "K"                 ; Kilo
        jne     .f
.e:     mov     bx, 1000                ; Multiplier
        call    LodsUCasedChar          ; -> AL SI
        cmp     al, "B"
        jne     .f
        mov     bx, 1024                ; Multiplier
        lodsb

; Trailing whitespace or end-of-input
.f:     call    IsWhitespace            ; -> ZF
        je      .OK
        cmp     al, 13                  ; Terminating carriage return
        je      .OK

; Failed to extract any series of digits, or excess characters in string
.NOK:   stc
        jmp     .END

; Building the integer in EAX
.OK:    mov     byte [bp+44+44+31], 80h ; pushad.EAX = 80000000h (Integer
        xor     si, si                  ;       indefinite in case of overflow)
        xor     eax, eax                ; Result
.g:     movzx   edx, cl                 ; CL is RADIX {16,8,2,10}
        mul     edx
        jc      .END
        movzx   edx, byte [bp+44+si]    ; NewDigit [0,15]
        add     eax, edx
        jc      .END
        inc     si
        cmp     si, di                  ; DI is NumberOfDigits
        jb      .g

; [option] Applying the multipliers repeatedly
.h:     mul     ebx                     ; EBX={1,1000,1024}
        jc      .END
        dec     ch                      ; CH is ORDER [1,3]
        jnz     .h

; Negating as required
        cmp     byte [bp-4], "-"        ; UNARY
        jne     .CLC
        neg     eax                     ; Valid range [-(4GB-1),+(4GB-1)]
.CLC:   clc

; Returning the result
        mov     [bp+44+44+28], eax      ; pushad.EAX
.END:   lea     sp, [bp+44+44]
        popad
        ret
; --------------------------------------
; IN (al) OUT (ZF)
IsWhitespace:
        cmp     al, " "
        je      .a
        cmp     al, 9                   ; Tab
.a:     ret
; --------------------------------------
; IN (si) OUT (al,si)
LodsUCasedChar:
        lodsb
        cmp     al, "a"
        jb      .a
        cmp     al, "z"
        ja      .a
        and     al, 1101'1111b          ; UCase
.a:     ret
; --------------------------------------
; IN (cx,si) OUT (si,di,CF) MOD (ax)
GetDigits:
        push    si                      ; (1)
        xor     di, di                  ; NumberOfDigits
.a:     call    LodsUCasedChar          ; -> AL SI
        cmp     al, "'"                 ; 'Thousands' separator (apostrophe)
        je      .a
        mov     ah, al
        cmp     al, "0"
        jb      .c
        cmp     al, "9"
        jbe     .b
        cmp     al, "A"
        jb      .c
        cmp     al, "F"
        ja      .c
        sub     al, 7
.b:     sub     al, 48                  ; -> AL=[0,15]
        cmp     al, cl                  ; CL is RADIX {16,8,2,10}
        jnb     .c
        mov     [bp+44+di], al
        inc     di
        jmp     .a

.c:     test    di, di                  ; Any digits found ?
        jz      .NOK
        cmp     ah, ch                  ; CH is BASE-SUFFIX {HOBD}
        je      .OK
        cmp     ch, "D"                 ; Decimals need not be suffixed
        jne     .NOK
        dec     si
.OK:    ;;clc
        pop     ax                      ; (1a) This throws away `push si`
        ret                             ; CF=0
.NOK:   stc
        pop     si                      ; (1b)
        ret                             ; CF=1
; --------------------------------------
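
As a cross-check of the grammar that InputEAX accepts, here is a small reference model in JavaScript. It is a sketch and not a translation of the assembly: the names parseNumber and getDigits (as a JavaScript function) are chosen for this illustration, end-of-string stands in for the terminating carriage return, and the result is an ordinary signed number or an error tag rather than EAX plus the carry flag.

```javascript
// Radixes are tried in the same order as in InputEAX: Hex, Oct, Bin, Dec.
const RADIXES = [[16, "H"], [8, "O"], [2, "B"], [10, "D"]];

function isWhitespace(ch) { return ch === " " || ch === "\t"; }

// Collects digits valid in `radix`, skipping apostrophes. Returns
// { digits, next } when the base suffix fits, otherwise null.
function getDigits(text, pos, radix, suffix) {
    const digits = [];
    let ch;
    for (;;) {
        ch = (text[pos++] || "").toUpperCase();
        if (ch === "'") continue;                 // 'thousands' separator
        const d = "0123456789ABCDEF".indexOf(ch);
        if (ch === "" || d < 0 || d >= radix) break;
        digits.push(d);
    }
    if (digits.length === 0) return null;         // no digits found
    if (ch === suffix) return { digits, next: pos };
    if (suffix === "D") return { digits, next: pos - 1 };  // "D" is optional
    return null;
}

function parseNumber(text) {
    let pos = 0;
    while (isWhitespace(text[pos])) pos++;        // leading whitespace
    const sign = text[pos];
    if (sign === "+" || sign === "-") pos++;      // unary
    let found = null, radix = 0;
    for (const [r, suffix] of RADIXES) {
        found = getDigits(text, pos, r, suffix);
        if (found) { radix = r; break; }
    }
    if (!found) return { error: "syntax" };
    pos = found.next;
    let ch = (text[pos++] || "").toUpperCase();
    // Order suffix: K = 1, M = 2, G = 3 applications of the multiplier
    const order = ch === "" ? 0 : "KMG".indexOf(ch) + 1;
    let multiplier = 1;
    if (order > 0) {
        multiplier = 1000;
        ch = (text[pos++] || "").toUpperCase();
        if (ch === "B") { multiplier = 1024; ch = text[pos++] || ""; }
    }
    // Only whitespace or end-of-input may follow
    if (ch !== "" && !isWhitespace(ch)) return { error: "syntax" };
    let value = 0;
    for (const d of found.digits) {
        value = value * radix + d;
        if (value > 0xFFFFFFFF) return { error: "overflow" };
    }
    for (let i = 0; i < order; i++) {
        value *= multiplier;
        if (value > 0xFFFFFFFF) return { error: "overflow" };
    }
    return { value: sign === "-" ? -value : value };
}
```

For example, parseNumber("  -1Ah") yields { value: -26 }, parseNumber("6KB") yields { value: 6144 }, and parseNumber("5G") reports an overflow because 5000000000 does not fit in 32 bits.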

A word on segment registers

The ORG 256 directive on top tells you that this program is a .COM program for DOS where the segment registers are all set equal to each other. If you were to use the InputEAX routine in an .EXE program that you write, you would have to temporarily set the DS segment register equal to SS because the local buffers have been placed on the stack and normally SS will be different from DS.

; IN () OUT (eax,CF)
InputEAX:
        push    ds
        push    ss                      ; DS = SS
        pop     ds
        xor     eax, eax                ; In case of CF=1 on exit
        pushad

        ...

        popad
        pop     ds
        ret

What is the significance of radix in Character.forDigit() in Java?

This is tricky because the significance isn't as obvious as it first appears. When converting a string to an integer, of course the radix matters a lot. If you are converting "101" to an integer, you will get different answers depending on whether the radix (base) is binary (2), decimal (10), octal (8), hex (16), or any other base. Similarly, when converting an integer to a string, the results (when the source is >= MAX_RADIX) are all different for the different radices.

For forDigit, the answer isn't as clear. When you're converting a number to a single character representing a digit, the answer is always the same as long as the digit is valid for the radix. Thus, Character.forDigit(11,radix) always returns 'b' for all radices 12 and up. So the only significance is in how it handles the case when the digit is not valid for the radix. That is, for binary (radix=2), forDigit only works if the digit is 0 or 1; so what should it do if you say Character.forDigit(2,2), since 2 is not a valid binary digit?

There are a few things the language designers could have done: (1) get rid of the radix parameter and put the onus on the programmer to make sure the digit is in range (which in many cases will be a given anyway); (2) throw an exception; (3) return some special value. They chose (3): if you give it a digit that isn't valid for the radix, it returns '\0', the null character. This doesn't seem to be the best choice--you're unlikely to really want to use the null character for anything, which means you have to make your own check, which means they probably should have had the method throw an exception. But there it is.

But anyway, that's the significance of radix for this method: it performs a check to make sure the argument is in range, based on the radix.
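
The range check is easy to see in a small model. The following JavaScript function is not Java's implementation, just a sketch that mirrors the documented behaviour of Character.forDigit; the constant names are borrowed from Java's Character.MIN_RADIX and Character.MAX_RADIX.

```javascript
const MIN_RADIX = 2;
const MAX_RADIX = 36;

// Returns the digit character, or '\0' when the radix or the digit
// is out of range (the "special value" option described above).
function forDigit(digit, radix) {
    if (radix < MIN_RADIX || radix > MAX_RADIX) return "\0";
    if (digit < 0 || digit >= radix) return "\0";
    return "0123456789abcdefghijklmnopqrstuvwxyz"[digit];
}
```

In this model forDigit(11, 12), forDigit(11, 16) and forDigit(11, 36) all return "b", while forDigit(2, 2) returns the null character.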

How to change an integer into a string with specified character set in JavaScript

This is a variant of a "Base64" conversion question that can be answered by "base n" libraries. However, these libraries may be "overkill" for this question, so below is modified code based on a simple & elegant solution by @Reb.Cabin. Credit also to editors @callum, @Philip Kaplan, @Oka on this code.

In this response, vowels and various "problem letters" commonly used to create curse words are removed, so a random integer hash will not create an offensive short URL.

// Based on Base64 code by @Reb.Cabin, edits by @callum, @philip Kaplan, @Oka available at https://stackoverflow.com/a/6573119/3232832
BaseN = {
    _Rixits :
//   0       8       16      24      32      40      48      56     63
//   v       v       v       v       v       v       v       v      v
    "0123456789BDGHJKLMNPQRTVWXYZbdghjklmnpqrtvwxyz-_",
//  original base64
//  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_",
    // You have the freedom, here, to choose the glyphs you want for
    // representing your base-64 numbers.
    // This cannot handle negative numbers and only works on the
    //     integer part, discarding the fractional part.
    fromNumber : function(number) {
        if (isNaN(Number(number)) || number === null ||
            number === Number.POSITIVE_INFINITY)
            throw "The input is not valid";
        if (number < 0)
            throw "Can't represent negative numbers now";

        var rixit; // like 'digit', only in some non-decimal radix
        var residual = Math.floor(number);
        var result = '';
        var rixitLen = this._Rixits.length;
        while (true) {
            rixit = residual % rixitLen;
            result = this._Rixits.charAt(rixit) + result;
            residual = Math.floor(residual / rixitLen);

            if (residual === 0)
                break;
        }
        return result;
    },

    toNumber : function(rixits) {
        var result = 0;
        for (var e = 0; e < rixits.length; e++) {
            result = (result * this._Rixits.length) + this._Rixits.indexOf(rixits[e]);
        }
        return result;
    }
};

var i = 1234567890;
var encoded = BaseN.fromNumber(1234567890);
var decoded = BaseN.toNumber(encoded);
document.writeln('Given character set "' + BaseN._Rixits + '", the number ' + i + ' is encoded to ' + encoded + ' then back again to ' + decoded + '.');
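
One thing to watch in toNumber: indexOf returns -1 for a character that is not in the character set, so a mistyped code silently decodes to a wrong number. A stricter decoder could look like the sketch below. The name decodeStrict is chosen for this illustration and is not part of the code above, the character set is passed in as a parameter, and an ASCII-only character set is assumed.

```javascript
// Decodes `text` using `rixits` as the digit set. Returns null for
// invalid input instead of a silently wrong number.
function decodeStrict(text, rixits) {
    if (text.length === 0) return null;           // nothing to decode
    let result = 0;
    for (const ch of text) {
        const value = rixits.indexOf(ch);
        if (value < 0) return null;               // character not in the set
        result = result * rixits.length + value;
        if (!Number.isSafeInteger(result)) return null;  // beyond 2^53 - 1
    }
    return result;
}
```

With the 48-character set used above, decodeStrict("10", set) gives 48, while a code containing a removed letter such as "A" gives null.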

How to convert AnyBase to Base10?

Based on the original algorithm, you need to iterate through each character of the encoded string, find the location of that character within the alphabet, and calculate the new result.

Here are both methods and some test code:

func stringToCustomBase(encode: Int, alphabet: String) -> String {
    let base = alphabet.count
    var value = encode, result = ""
    repeat {
        // Pick the digit for the current remainder and prepend it
        let index = alphabet.index(alphabet.startIndex, offsetBy: value % base)
        result = [alphabet[index]] + result
        value /= base
    } while value > 0
    return result
}

func customBaseToInt(encoded: String, alphabet: String) -> Int? {
    let base = alphabet.count
    var result = 0
    for ch in encoded {
        // Reject characters that are not part of the alphabet
        guard let index = alphabet.firstIndex(of: ch) else { return nil }
        let digit = alphabet.distance(from: alphabet.startIndex, to: index)
        // Return nil on overflow instead of trapping
        let mult = result.multipliedReportingOverflow(by: base)
        if mult.overflow { return nil }
        let add = mult.partialValue.addingReportingOverflow(digit)
        if add.overflow { return nil }
        result = add.partialValue
    }

    return result
}

let startNum = 234567
let alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
let codedNum = stringToCustomBase(encode: startNum, alphabet: alphabet) // "FA91"
let origNum = customBaseToInt(encoded: codedNum, alphabet: alphabet)    // Optional(234567)

I made the customBaseToInt method return an optional result in case there are characters in the encoded value that are not in the provided alphabet.

Radix sorting strings

Yes, strings can be sorted with radix sort. In fact, radix sort is extremely effective on strings!

Given a collection of strings, you can radix sort them by first sorting the strings by their first letter (use any sorting algorithm you'd like, like counting sort or even insertion sort), breaking the strings into groups by their first letter, then recursively sorting all of the strings in each group. (This would be a most-significant-digit radix sort). You could also do a least-significant-digit radix sort. Imagine that all strings are padded up to the length of the maximum string with some special character ❤ that lexicographically precedes all the other characters. Then just do a regular LSD radix sort. When you're done, everything will be in sorted order!
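
Here is a minimal sketch of the most-significant-digit variant in JavaScript. It buckets by UTF-16 code unit and treats "the string has ended" as a bucket that sorts before every character, which plays the role of the padding character. The function name msdRadixSort is chosen for this illustration, and the bucket keys are ordered with the built-in sort for brevity.

```javascript
// Sorts strings by the character at `position`, then recursively
// sorts each group on the next position.
function msdRadixSort(strings, position = 0) {
    if (strings.length <= 1) return strings.slice();
    const result = [];                    // strings that end here come first
    const buckets = new Map();            // code unit -> strings with that unit
    for (const s of strings) {
        if (s.length <= position) {
            result.push(s);               // exhausted: sorts before longer ones
        } else {
            const key = s.charCodeAt(position);
            if (!buckets.has(key)) buckets.set(key, []);
            buckets.get(key).push(s);
        }
    }
    const keys = Array.from(buckets.keys()).sort((a, b) => a - b);
    for (const key of keys) {
        for (const s of msdRadixSort(buckets.get(key), position + 1)) {
            result.push(s);
        }
    }
    return result;
}
```

The recursion depth is bounded by the length of the longest string, so very long identical strings would call for an iterative version.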