The problem
Often I need to suffix large numbers with k, M, G and so on, e.g. when printing the number of reads or bases in sequencing runs. While many languages offer options for this, I needed a general solution that can be used in all contexts: shell, Python, R, Shiny apps, Nextflow etc.
The solution
Use the Korn shell!
It turns out that this shell (more precisely ksh93) has a variant of printf with a built-in ability to do exactly this, using %#d as the format specification. Interestingly, the other shells I tested (bash and zsh) do not have this printf flavour. An example:
```sh
# switch to ksh
$ ksh
$ printf "%#d\n" 1234567
1.2M
# and even this
$ printf "%#d\n" 1234 1234567 1.23e9
1.2k
1.2M
1.2G
```
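The multi-number call works because printf re-applies its format string until all arguments are consumed. This is standard POSIX printf behaviour rather than a ksh extension, and it is also why the wrapper script further down can format a whole R vector in a single call. The sketch below uses plain %d so it runs in any POSIX shell; the helper name report_counts is hypothetical. Under ksh93, swapping %d for %#d should give suffixed values instead.

```shell
#!/bin/sh
# printf reuses its format string until every argument is used up (POSIX),
# so a format with two conversions consumes two arguments per output line.
report_counts() {
  # hypothetical helper; arguments: label1 value1 label2 value2 ...
  printf '%s: %d\n' "$@"
}

report_counts reads 1234567 bases 12345678
# reads: 1234567
# bases: 12345678
```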
It might sound a bit strange at first, but it is a general solution that can easily be used in any context. For example, here is how I use it in R:
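- write a small ksh script, make it executable (chmod +x) and put it in your path; I've called mine si_format.sh (in the R calls below it sits in a local bin/ folder):
```sh
#!/usr/bin/env ksh
# print each argument on its own line with an SI suffix (ksh93 printf)
printf "%#d\n" "$@"
```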
- in R, I use it like this:
```r
library(dplyr)  # provides %>% and mutate()

# system call to the ksh script
system2("bin/si_format.sh", c(12345, 12345678), stdout = TRUE)
[1] "12k" "12M"

# pack it in a function
si_fmt <- function(x) {
  system2("bin/si_format.sh", x, stdout = TRUE)
}
runif(10, 1e3, 1e7) %>% si_fmt()
[1] "7.3M" "7.4M" "8.0M" "6.6M" "1.4M" "7.0M" "874k" "4.2M" "9.3M" "9.5M"

# use in tidy workflows
df <- data.frame(
  a = runif(5, 1e3, 1e6),
  b = runif(5, 1e6, 1e8)
)
df
         a        b
1 222739.1  1691506
2 492221.2 41578973
3 232525.8 66668424
4 871410.3 21227623
5 142999.3 91327876

df %>%
  mutate(a = si_fmt(a), b = si_fmt(b))
     a    b
1 223k 1.7M
2 492k  42M
3 233k  67M
4 871k  21M
5 143k  91M
```
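The wrapper only looks at its arguments, so it cannot sit at the end of a shell pipe, which is handy in plain shell or Nextflow processes. A possible variant, sketched below under the assumption that a ksh93 binary called ksh is on the PATH, behaves like si_format.sh when given arguments and otherwise reads one number per line from standard input. The function name si_format is hypothetical.

```shell
#!/bin/sh
# Hypothetical variant of si_format.sh. Assumes `ksh` on the PATH is ksh93.
si_format() {
  if [ "$#" -gt 0 ]; then
    # same as the original script: one suffixed line per argument
    ksh -c 'printf "%#d\n" "$@"' si_format "$@"
  else
    # pipeline mode: format each number read from standard input
    ksh -c 'while read -r n; do printf "%#d\n" "$n"; done'
  fi
}

# usage (requires ksh93):
#   si_format 12345 12345678
#   printf '%s\n' 1234 1234567 | si_format
```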
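Also, ksh is already installed by default on many Unix systems (macOS, for example, ships ksh93 as /bin/ksh) and is packaged for most Linux distributions, so there is no need to pull in extra libraries for each language. Note that %#d is a ksh93 feature; other Korn shell variants such as mksh may not support it.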
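Since not every machine has a ksh93-compatible ksh, a small guard can check for the %#d behaviour before relying on it. The sketch below is one way to do that; the function names ksh_has_si_printf and si_fmt_portable are hypothetical. When no suitable ksh is found it falls back to awk, using a rule inferred from the outputs in this post (divide by 1000 until below 1000, one decimal below 10, otherwise round to an integer) rather than taken from the ksh93 source, so values near boundaries such as 999,700 may be formatted differently from ksh.

```shell
#!/bin/sh
# Succeeds only if `ksh` exists and formats 1234 as "1.2k" (ksh93 behaviour).
ksh_has_si_printf() {
  command -v ksh >/dev/null 2>&1 || return 1
  [ "$(ksh -c 'printf "%#d" 1234' 2>/dev/null)" = "1.2k" ]
}

# Hypothetical helper: use ksh93's printf when possible, else approximate it.
si_fmt_portable() {
  [ "$#" -gt 0 ] || return 0
  if ksh_has_si_printf; then
    ksh -c 'printf "%#d\n" "$@"' si_fmt_portable "$@"
  else
    # awk approximation (assumed rule, positive numbers only)
    printf '%s\n' "$@" | awk '
      BEGIN { n = split("k M G T P E", sfx, " ") }
      {
        x = $1 + 0; i = 0
        while (x >= 1000 && i < n) { x /= 1000; i++ }
        if (i == 0)      printf "%d\n", x
        else if (x < 10) printf "%.1f%s\n", x, sfx[i]
        else             printf "%.0f%s\n", x, sfx[i]
      }'
  fi
}

si_fmt_portable 1234 1234567 1.23e9
```

The guard costs one extra ksh start per call; for formatting a handful of summary numbers that overhead should not matter.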
Enjoy!