Simple variable interpolation


There are several ways to construct strings in R such as paste(). However, when the string which is going to be constructed is too complex, using paste() can be a pain. For example, we want to put some parameters as title in a plot.

region = c(1, 2)
value = 4
name = "name"
str = paste("region = (", region[1], ", ", region[2], "), value = ", value, 
            ", name = '", name, "'", sep = "")
cat(str)
## region = (1, 2), value = 4, name = 'name'

As you can see, it is hard to read and very easy to make mistakes. (Syntax highlighting may be helpful to match brackets, but it is still quite annoying to see so many commas and quotes.)

In Perl, we always use variable interpolation to construct complex strings in which variables are started with special marks (sigil), and variables will be replaced with their real values. In this package, we aim to implement variable interpolation in R. The idea is rather simple: use special marks to identify variables and then replace with their values. The function here is qq() which is named from the subroutine with the same name in Perl (It stands for double quote). Using variable interpolation, above example can be written as:

library(GetoptLong)

str = qq("region = (@{region[1]}, @{region[2]}), value = @{value}, name = '@{name}'")
cat(str)
## region = (1, 2), value = 4, name = 'name'

Or use the shortcut function qqcat():

qqcat("region = (@{region[1]}, @{region[2]}), value = @{value}, name = '@{name}'")
## region = (1, 2), value = 4, name = 'name'

One feature of qqcat() is you can set a global prefix to the messages by qq.options("cat_prefix"), either a string or a function. If it is set as a function, the value will be generated at real time by executing the function.

qq.options("cat_prefix" = "[INFO] ")
qqcat("This is a message")
## [INFO] This is a message
qq.options("cat_prefix" = function() format(Sys.time(), "[%Y-%m-%d %H:%M:%S] "))
qqcat("This is a message")
## [2024-10-24 02:57:56] This is a message
Sys.sleep(2)
qqcat("This is a message after 2 seconds")
## [2024-10-24 02:57:58] This is a message after 2 seconds
qq.options("cat_prefix" = "")
qqcat("This is a message")
## This is a message

You can shut down all messages produced by qqcat() by qq.options("cat_verbose" = FALSE).

qq.options("cat_prefix" = "[INFO] ", "cat_verbose" = FALSE)
qqcat("This is a message")

Also you can set a prefix which has local effect.

qq.options(RESET = TRUE)
qq.options("cat_prefix" = "[DEBUG] ")
qqcat("This is a message", cat_prefix = "[INFO] ")
## [INFO] This is a message
qqcat("This is a message")
## [DEBUG] This is a message

From version 1.1.2, qq.options() can work in a local mode in which the copy of the options only work in a local chunk.

qq.options("cat_prefix" = "[DEBUG] ")
qq.options(LOCAL = TRUE)
qq.options("cat_prefix" = "[INFO] ")
qqcat("This is the first message")
## [INFO] This is the first message
qqcat("This is the second message")
## [INFO] This is the second message
qq.options(LOCAL = FALSE)
qqcat("This is the third message")
## [DEBUG] This is the third message

Reset the options so that it does not affect example code in following part of the vignette.

qq.options(RESET = TRUE)

Not only simple scalars but also pieces of codes can be interpolated:

n = 1
qqcat("There @{ifelse(n == 1, 'is', 'are')} @{n} dog@{ifelse(n == 1, '', 's')}.\n")
## There is 1 dog.
n = 2
qqcat("There @{ifelse(n == 1, 'is', 'are')} @{n} dog@{ifelse(n == 1, '', 's')}.\n")
## There are 2 dogs.

If the text is too long, it can be wrapped into lines.

qq.options("cat_strwrap" = TRUE)
qqcat("one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty.")
## one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen,
## fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty.

There can be multiple templates:

a = 1; b = 2; c = 3
txt = qq("command -a @{a}",
         "        -b @{b}",
         "        -c @{c}\n", sep = " \\\n")
cat(txt)
## command -a 1 \
##         -b 2 \
##         -c 3

NOTE: Since qq as the function name is very easy to be used by other packages (E.g., in lattice, there is a qq() function as well) and if so, you can enforce qq() in your working environment as the function in GetoptLong by:

qq = GetoptLong::qq

Code patterns

In above exmaple, @{} is used to mark variables. Later, variable names will be extracted from these marks and replaced with their real values.

The marking code pattern can be any type. But you should make sure it is easy to tell the difference from other part in the string. You can set your code pattern as an argument in qq(). The default pattern is @\\{CODE\\} because we only permit CODE to return simple vectors and @ is a sigil representing array in Perl.

In following example, the code pattern is #{}.

x = 1
qqcat("x = #{x}", code.pattern = "#\\{CODE\\}")
## x = 1

Or set in qq.options() as a global setting:

qq.options("code.pattern" = "#\\{CODE\\}")

As you can guess, in @\\{CODE\\}, CODE will be replaced with .*? to construct a regular expression and to match variable names in the string. So if your code.pattern contains special characters, make sure to escape them. Some candidate code.pattern are:

code.pattern = "@\\{CODE\\}"    # default style
code.pattern = "@\\[CODE\\]"
code.pattern = "@\\(CODE\\)"
code.pattern = "%\\{CODE\\}"
code.pattern = "%\\[CODE\\]"
code.pattern = "%\\(CODE\\)"
code.pattern = "\\$\\{CODE\\}"
code.pattern = "\\$\\[CODE\\]"
code.pattern = "\\$\\(CODE\\)"
code.pattern = "#\\{CODE\\}"
code.pattern = "#\\[CODE\\]"
code.pattern = "#\\(CODE\\)"
code.pattern = "\\[%CODE%\\]"  # Template Toolkit (Perl module) style :)

Since we just replace CODE to .*?, the function will only match to the first right parentheses/brackets. (In Perl, I always use recursive regular expression to extract such pairing parentheses. But in R, it seems difficult.) So, for example, if you are using @\\[CODE\\] and your string is "@[a[1]]", it will fail to extract the correct variable name while only extracts a[1, finally it generates an error when executing a[1. In such condition, you should use other pattern styles that do not contain [].

Finally, I suggest a more safe code pattern style that you do not need to worry about parentheses stuff:

code.pattern = "`CODE`"

Where to look for variables

It will first look up in the envoking environment, then through searching path. Users can also pass values of variables as a list like:

x = 1
y = 2
qqcat("x = @{x}, y = @{y}", envir = list(x = "a", y = "b"))
## x = a, y = b

If variables are passed through list, qq() only looks up in the specified list.

Variables should only return vectors

qq() only allows variables to return vectors. The whole string will be interpolated repeatedly according to longest vectors, and finally concatenated into a single long string.

x = 1:6
qqcat("@{x} is an @{ifelse(x %% 2, 'odd', 'even')} number.\n")
## 1 is an odd number. 2 is an even number. 3 is an odd number. 4 is an even number. 5 is an
## odd number. 6 is an even number.
y = c("a", "b")
z = c("A", "B", "C", "D", "E")
qqcat("@{x}, @{y}, @{z}\n")
## 1, a, A 2, b, B 3, a, C 4, b, D 5, a, E 6, b, A

This feature is especially useful if you want to generate a report such as formatted in a HTML table:

name = letters[1:4]
value = 1:4
qqcat("<tr><td>@{name}</td><td>@{value}</td><tr>\n")
## <tr><td>a</td><td>1</td><tr> <tr><td>b</td><td>2</td><tr> <tr><td>c</td><td>3</td><tr>
## <tr><td>d</td><td>4</td><tr>

The returned value can also be a vector while not collapsed into one string:

str = qq("@{x}, @{y}, @{z}", collapse = FALSE)
length(str)
## [1] 6
str
## [1] "1, a, A" "2, b, B" "3, a, C" "4, b, D" "5, a, E" "6, b, A"

Session info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] markdown_1.13    GetoptLong_1.1.0 knitr_1.48      
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37       R6_2.5.1            GlobalOptions_0.1.2 fastmap_1.2.0      
##  [5] xfun_0.48           rjson_0.2.23        maketools_1.3.1     cachem_1.1.0       
##  [9] htmltools_0.5.8.1   rmarkdown_2.28      buildtools_1.0.0    lifecycle_1.0.4    
## [13] cli_3.6.3           sass_0.4.9          jquerylib_0.1.4     compiler_4.4.1     
## [17] highr_0.11          sys_3.4.3           tools_4.4.1         evaluate_1.0.1     
## [21] bslib_0.8.0         yaml_2.3.10         crayon_1.5.3        jsonlite_1.8.9     
## [25] rlang_1.1.4