Parse currency values from CSV, convert numerical suffixes for Million and Billion
We could use gsubfn to replace the 'B', 'M' with 'e+9', 'e+6' and convert to numeric (as.numeric).
is.na(v1) <- v1=='N/A'
options(scipen=999)
library(gsubfn)
as.numeric(gsubfn('([A-Z]|\\$)', list(B='e+9', M='e+6',"$"=""),v1))
#[1] 1200000 3100000000 NA
EDIT: Modified based on @nicola's suggestion
data
v1 <- c('$1.2M', '$3.1B', 'N/A')
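For comparison, here is a minimal Python sketch of the same idea: rewrite the suffix as an exponent string and let the float parser do the conversion. The function name parse_currency, the EXPONENTS table and the accepted input pattern are assumptions made for illustration; they are not part of the R answer.

```python
import math
import re

# Assumed suffix table: each letter becomes an exponent string,
# mirroring the 'e+6' / 'e+9' replacements used in the R answer.
EXPONENTS = {"K": "e3", "M": "e6", "B": "e9"}

def parse_currency(text):
    """Return the value of strings like '$1.2M'; NaN for anything unparseable."""
    match = re.fullmatch(r"\$?(\d+(?:\.\d+)?)([KMB]?)", text.strip())
    if match is None:
        return math.nan  # e.g. 'N/A'
    number, suffix = match.groups()
    # '1.2' + 'e6' -> '1.2e6', which float() parses directly
    return float(number + EXPONENTS.get(suffix, ""))

values = [parse_currency(v) for v in ["$1.2M", "$3.1B", "N/A"]]
print(values)
```

Anything that does not match the pattern, such as 'N/A', comes back as NaN rather than raising an error, which plays the same role as the NA in the R output.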
How to convert strings with billion or million abbreviation into integers in a list
You can use list comprehension with a dict mapping:
l = ["150M", "360M", "2.6B", "3.7B"]
m = {'K': 3, 'M': 6, 'B': 9, 'T': 12}
print([int(float(i[:-1]) * 10 ** m[i[-1]] / 1000) for i in l])
This outputs the values expressed in thousands (because of the division by 1000):
[150000, 360000, 2600000, 3700000]
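One caveat with the float-based one-liner is that int() truncates, and float multiplication can land just below a whole number (for example, 4.35 * 100 evaluates to 434.99999999999994 in Python). The following is a sketch of a variant that uses decimal.Decimal to avoid that; the helper name to_thousands is hypothetical, and the mapping is the same one used above.

```python
from decimal import Decimal

# Powers of ten per suffix, same mapping as in the answer above.
POWERS = {"K": 3, "M": 6, "B": 9, "T": 12}

def to_thousands(text):
    """Convert e.g. '2.6B' to an int expressed in thousands."""
    number, suffix = text[:-1], text[-1].upper()
    # Exact decimal arithmetic; the '- 3' expresses the result in thousands.
    scaled = Decimal(number) * 10 ** (POWERS[suffix] - 3)
    return int(scaled)  # still truncates any remaining fraction

amounts = ["150M", "360M", "2.6B", "3.7B"]
print([to_thousands(a) for a in amounts])
# prints [150000, 360000, 2600000, 3700000]
```

Like the original, this assumes every string ends in a known suffix; an entry without one would raise a KeyError.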
Excel custom formatting positive/negative numbers with Thousand/Million/Billion (K/M/B) suffixes
Since a one-stop solution does not appear to be possible (doing this in more than one step is a little messy, but I understand why Excel can't magically handle every conceivable custom format variation), I am opting for a two-step approach:
I will have 3 custom formats. One for the positive numbers with suffixes, another for the negative numbers with suffixes, and a third that is just the "standard" positive/negative number format (displayed in the question). I will then use a series of two or three conditional formatting rules to determine which of these custom formats will be displayed.
Personally, I am going to use the +/- format as the cell's format, then apply two conditional rules that change it to the two suffix variations, but I could see the argument for using conditional formats for all three.
Thanks for the feedback and the reminder that conditional formatting exists to aid with this very kind of issue.
Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe
Assuming you have the following DataFrame:
In [30]: df
Out[30]:
         Date      Val
0  2016-09-23      100
1  2016-09-22    9.60M
2  2016-09-21   54.20K
3  2016-09-20  115.30K
4  2016-09-19   18.90K
5  2016-09-16  176.10K
6  2016-09-15   31.60K
7  2016-09-14   10.00K
8  2016-09-13    3.20M
you can do it this way:
In [31]: df.Val = (df.Val.replace(r'[KM]+$', '', regex=True).astype(float) * \
....: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False)
....: .fillna(1)
....: .replace(['K','M'], [10**3, 10**6]).astype(int))
In [32]: df
Out[32]:
         Date        Val
0  2016-09-23      100.0
1  2016-09-22  9600000.0
2  2016-09-21    54200.0
3  2016-09-20   115300.0
4  2016-09-19    18900.0
5  2016-09-16   176100.0
6  2016-09-15    31600.0
7  2016-09-14    10000.0
8  2016-09-13  3200000.0
Explanation:
In [36]: df.Val.replace(r'[KM]+$', '', regex=True).astype(float)
Out[36]:
0 100.0
1 9.6
2 54.2
3 115.3
4 18.9
5 176.1
6 31.6
7 10.0
8 3.2
Name: Val, dtype: float64
In [37]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False)
Out[37]:
0 NaN
1 M
2 K
3 K
4 K
5 K
6 K
7 K
8 M
Name: Val, dtype: object
In [38]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False).fillna(1)
Out[38]:
0 1
1 M
2 K
3 K
4 K
5 K
6 K
7 K
8 M
Name: Val, dtype: object
In [39]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False).fillna(1).replace(['K','M'], [10**3, 10**6]).astype(int)
Out[39]:
0 1
1 1000000
2 1000
3 1000
4 1000
5 1000
6 1000
7 1000
8 1000000
Name: Val, dtype: int32
JavaScript numbers to Words
JavaScript is parsing each group of three digits as an octal number when the group has a leading zero and no radix is given. When the group of three digits is all zeros, the result is the same whether the base is octal or decimal.
But when you give JavaScript '009' (or '008'), that's an invalid octal number, so you get zero back.
If you had gone through the whole set of numbers from 190,000,001 to 190,000,010, you'd have seen JavaScript skip '...,008' and '...,009' but emit 'eight' for '...,010'. That's the 'Eureka!' moment.
Change:
for (j = 0; j < finlOutPut.length; j++) {
finlOutPut[j] = triConvert(parseInt(finlOutPut[j]));
}
to
for (j = 0; j < finlOutPut.length; j++) {
finlOutPut[j] = triConvert(parseInt(finlOutPut[j],10));
}
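The effect of the radix can be reproduced outside JavaScript. Below is a small Python sketch in which int(text, base) stands in for parseInt(text, radix); the helper parse_group is hypothetical. One difference to keep in mind: Python raises ValueError for an invalid digit, where the answer above reports JavaScript returning zero, so the sketch returns None in that case.

```python
def parse_group(text, base):
    """Parse a three-digit group in the given base; None if a digit is invalid."""
    try:
        return int(text, base)
    except ValueError:
        return None

for text in ["000", "008", "009", "010"]:
    # Base 10 ignores leading zeros; base 8 rejects the digits 8 and 9.
    print(text, parse_group(text, 10), parse_group(text, 8))
```

In base 8, '010' parses to 8 while '008' and '009' are rejected, which matches the pattern described above; in base 10 all four groups parse to the expected values.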
The code also kept adding a comma after every non-zero group, so I experimented with it and found the right place to add the comma.
Old:
for (b = finlOutPut.length - 1; b >= 0; b--) {
if (finlOutPut[b] != "dontAddBigSufix") {
finlOutPut[b] = finlOutPut[b] + bigNumArry[bigScalCntr] + ' , ';
bigScalCntr++;
}
else {
//replace the string at finlOP[b] from "dontAddBigSufix" to empty String.
finlOutPut[b] = ' ';
bigScalCntr++; //advance the counter
}
}
//convert The output Arry to , more printable string
for(n = 0; n<finlOutPut.length; n++){
output +=finlOutPut[n];
}
New:
for (b = finlOutPut.length - 1; b >= 0; b--) {
if (finlOutPut[b] != "dontAddBigSufix") {
finlOutPut[b] = finlOutPut[b] + bigNumArry[bigScalCntr]; // <<<
bigScalCntr++;
}
else {
//replace the string at finlOP[b] from "dontAddBigSufix" to empty String.
finlOutPut[b] = ' ';
bigScalCntr++; //advance the counter
}
}
//convert The output Arry to , more printable string
var nonzero = false; // <<<
for(n = 0; n<finlOutPut.length; n++){
if (finlOutPut[n] != ' ') { // <<<
if (nonzero) output += ' , '; // <<<
nonzero = true; // <<<
} // <<<
output +=finlOutPut[n];
}
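The separator logic in the new loop (emit ' , ' only between non-blank groups, never after the last one) is the classic "join" pattern. A minimal Python sketch of that pattern follows; the function name join_groups and the sample strings are made up for illustration, and unlike the JavaScript loop it drops the blank placeholder strings instead of appending them.

```python
def join_groups(groups, separator=" , "):
    """Join the non-blank group strings, placing the separator only between them."""
    kept = [g for g in groups if g.strip()]
    return separator.join(kept)

# Hypothetical group strings; a blank string stands for an all-zero group.
print(join_groups(["one hundred ninety million", " ", "nine"]))
# prints: one hundred ninety million , nine
```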