Output in R, Avoid Writing "[1]"

Why is message() a better choice than print() in R for writing a package?

TL;DR

You should use cat() when writing print.*() methods for S3 objects. For everything else, use message() unless the state of the program is problematic: a recoverable problem calls for warning(), while a show-stopping error calls for stop().

Goal

The objective of this post is to walk through the output options available to a package developer and how output should be structured, whether it describes a new object or is built from strings.

R Output Overview

The traditional output functions are:

  1. print()
  2. cat()
  3. message()
  4. warning()
  5. stop()

The first two functions (print() and cat()) send their output to stdout, the standard output stream. The last three (message(), warning(), and stop()) send their output to stderr, the standard error stream. That is, the results of a command like lm() can be sent to one file while any diagnostic or error output is sent to a completely separate file. This matters for the user experience: diagnostics do not clutter the results in log files, and errors can be searched for quickly.
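One way to see this split from inside R is capture.output() (from utils), whose type argument chooses which stream to capture. A minimal sketch; the strings are arbitrary placeholders:

```r
# cat() writes to stdout, so the default type = "output" captures it
out_text <- capture.output(cat("result\n"))

# message() writes to stderr, so it is only captured with type = "message"
err_text <- capture.output(message("diagnostic"), type = "message")

# Capturing stdout alone misses the message (it still appears in the console)
missed <- capture.output(message("not captured"))

out_text        # "result"
err_text        # "diagnostic"
length(missed)  # 0
```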

Designing for Users and External Packages

The above is framed in an I/O mindset rather than a user-facing one, so let's motivate it in the context of an everyday R user. By using functions 3 to 5 (the stderr functions), their output can be suppressed without tinkering with the console text via sink() or capture.output(). Suppression normally comes in the form of suppressWarnings(), suppressMessages(), suppressPackageStartupMessages(), and so on, while errors raised by stop() are handled with try() or tryCatch(). Thus, users see only the output that reports results. This is particularly important if you want to give users the flexibility of turning off text-based output when creating dynamic documents with knitr, rmarkdown, or Sweave.
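A minimal sketch of what this buys the user. noisy_mean() is a hypothetical function written only for illustration; it reports progress with message() and a recoverable problem with warning():

```r
# Hypothetical function that reports progress and a recoverable problem
noisy_mean <- function(x) {
  message("Averaging ", length(x), " values")
  if (any(x < 0)) warning("Negative values found; using absolute values")
  mean(abs(x))
}

# Messages and warnings are dropped; the returned value is unaffected
res <- suppressWarnings(suppressMessages(noisy_mean(c(-1, 2, 3))))
res  # 2

# By contrast, print() writes to stdout, so suppressMessages() has no effect on it
shown <- capture.output(suppressMessages(print("still printed")))
shown  # "[1] \"still printed\""
```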

In particular, knitr offers the chunk options message = FALSE and warning = FALSE, which drop messages and warnings from the rendered document while keeping the results. This avoids the need for the results = "hide" option, which would hide all of the chunk's printed results as well. (The error option works differently: error = FALSE makes knitr stop at an error rather than hide it.)

Specifics of Output

print()

Up first, we have an oldie but a goodie, print(). This function has some severe limitations. The first is that it cannot concatenate terms by itself. The second, and probably more severe, is that each output is preceded by [x] and the content is wrapped in quotation marks. The x here is the index of the first element shown on that line. This is helpful for debugging, but outside of that it doesn't serve any purpose.

e.g.

print("Hello!")

[1] "Hello!"
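The quotation marks can be turned off with print()'s quote argument (noquote() does the same), but the [1] index stays, which is why print() is still a poor fit for user-facing text:

```r
print("Hello!", quote = FALSE)
# [1] Hello!
noquote("Hello!")
# [1] Hello!
```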

For concatenation, we have to combine print() with the paste() function:

print(paste("Hello","World!"))

[1] "Hello World!"

Alternatively, one can use paste0(...) in place of paste(...) to avoid the space that paste() inserts between elements by default through its sep = " " parameter (i.e., concatenation without spaces). Setting sep = "" in paste() does the same thing.

e.g.

print(paste0("Hello","World!"))

[1] "HelloWorld!"

print(paste("Hello","World!", sep = ""))

[1] "HelloWorld!"

cat()

On the flip side, cat() addresses all of these critiques: it prints no [1] index and no quotation marks. Most notably, cat() has its own sep = " " parameter, mirroring paste(), so there is no need to wrap its arguments in paste(). The main downside is that cat() does not end its output with a newline; you have to append \n yourself or set fill = TRUE, which breaks lines at the default print width (getOption("width")).

e.g.

cat("Hello!\n")
Hello!

cat("Hello","World!\n")
Hello World!

cat("Hello","World!\n", sep = "")
HelloWorld!
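As a small sketch of the fill option mentioned above, these two calls both print the same single line; the second lets cat() supply the line break:

```r
# Explicit newline at the end
cat("Hello", "World!\n")
# Let cat() add the line break via fill = TRUE
cat("Hello", "World!", fill = TRUE)
```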

This is exactly why you should use cat() when designing a print.*() S3 method.
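A minimal sketch of such a method. The class name "greeting" and the constructor new_greeting() are hypothetical, invented only to have an S3 object to print; returning the object invisibly is the usual convention for print methods:

```r
# Hypothetical constructor for a toy S3 class called "greeting"
new_greeting <- function(name) {
  structure(list(name = name), class = "greeting")
}

# print method: cat() writes plain text, so no [1] index and no quotes
print.greeting <- function(x, ...) {
  cat("Greeting for: ", x$name, "\n", sep = "")
  invisible(x)  # print methods conventionally return their input invisibly
}

g <- new_greeting("World")
print(g)  # auto-printing `g` at the console dispatches to the same method
# Greeting for: World
```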

message()

The message() function is one step better than even cat()! The reason is that its output is kept distinct from ordinary text because it is directed to stderr instead of stdout. For example, the RStudio console displays it in red rather than the standard black to catch the user's eye.


Furthermore, message() has paste0() functionality built in: its arguments are concatenated with no separator.

message("Hello ","World!") # Note the space after Hello
Hello World!

Moreover, message() signals a condition of class "message" that can be caught with tryCatch():

e.g.

 tryCatch(message("hello\n"), message=function(e){cat("goodbye\n")})
goodbye
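Note that tryCatch() abandons the expression once its handler runs. If the goal is to record messages and keep going, withCallingHandlers() together with the "muffleMessage" restart is the usual tool. A sketch, where step_through() is a hypothetical function used only for illustration:

```r
# Hypothetical function that emits two messages before returning
step_through <- function() {
  message("step 1")
  message("step 2")
  "done"
}

collected <- character(0)
res <- withCallingHandlers(
  step_through(),
  message = function(m) {
    # conditionMessage() includes the trailing newline that message() adds
    collected <<- c(collected, conditionMessage(m))
    invokeRestart("muffleMessage")  # silence it and resume step_through()
  }
)

res        # "done"
collected  # "step 1\n" "step 2\n"

# tryCatch(), by contrast, exits at the first message
first <- tryCatch(step_through(), message = function(m) conditionMessage(m))
first      # "step 1\n"
```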

warning()

The warning() function is not something to use casually. It differs from message() primarily in that its output is prefixed with "Warning message:" and it signals that the state of the program is considered problematic.


Misc: Casual use of warning() in a function may inadvertently cause heartbreak when submitting the package to CRAN, because the examples are run during the checks and warnings are normally treated as problems that must be fixed.
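When a warning is expected, the caller can record and silence it without stopping the computation, using the "muffleWarning" restart. In this sketch, safe_log() is a hypothetical helper that treats non-positive input as a recoverable problem:

```r
# Hypothetical helper: recoverable problem -> warning(), then carry on
safe_log <- function(x) {
  if (any(x <= 0)) {
    warning("non-positive values replaced with NA", call. = FALSE)
    x[x <= 0] <- NA
  }
  log(x)
}

caught <- NULL
res <- withCallingHandlers(
  safe_log(c(1, -1)),
  warning = function(w) {
    caught <<- conditionMessage(w)   # record the warning text
    invokeRestart("muffleWarning")   # suppress it and continue
  }
)
res     # 0 NA
caught  # "non-positive values replaced with NA"
```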

stop()

Last but not least, we have stop(). This takes warnings to the next level by completely halting the task at hand and returning control to the user. It also carries the most serious prefix, "Error:" (or "Error in f(...):" when the call is included in the message).
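A minimal sketch of stop() used for input validation. check_numeric() is a hypothetical function; call. = FALSE drops the call from the message so the user sees only "Error:" and the text:

```r
# Hypothetical input check: stop() aborts the function immediately
check_numeric <- function(x) {
  if (!is.numeric(x)) stop("`x` must be numeric", call. = FALSE)
  sum(x)
}

check_numeric(1:3)  # 6
# check_numeric("a") would print: Error: `x` must be numeric

# tryCatch() can turn the error into a value instead of halting
err <- tryCatch(check_numeric("a"), error = function(e) conditionMessage(e))
err  # "`x` must be numeric"
```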


list output truncated - How to expand listed variables with str() in R

You can use the argument list.len:

str(df, list.len=ncol(df))

If you also want to print more observations per variable, set the vec.len argument; see ?str for documentation of all arguments.
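A small sketch to see the truncation and the fix. The toy data frame is an assumption: 120 columns, which is more than str()'s default list.len of 99 (see strOptions()):

```r
# Toy data frame with 120 columns (V1, ..., V120)
df <- as.data.frame(matrix(1:240, nrow = 2))

default_view <- capture.output(str(df))
full_view <- capture.output(str(df, list.len = ncol(df)))

any(grepl("list output truncated", default_view))  # TRUE
any(grepl("list output truncated", full_view))     # FALSE

# vec.len controls how many values are shown per column
str(df[1:3], vec.len = 1)
```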

Why am I getting X. in my column names when reading a data frame?

read.csv() is a wrapper around the more general read.table() function. That latter function has argument check.names which is documented as:

check.names: logical.  If ‘TRUE’ then the names of the variables in the
data frame are checked to ensure that they are syntactically
valid variable names. If necessary they are adjusted (by
‘make.names’) so that they are, and also to ensure that there
are no duplicates.

If your header contains labels that are not syntactically valid, make.names() will replace them with valid names based on the invalid ones, translating invalid characters to "." and prepending X where necessary:

R> make.names("$Foo")
[1] "X.Foo"

This is documented in ?make.names:

Details:

A syntactically valid name consists of letters, numbers and the
dot or underline characters and starts with a letter or the dot
not followed by a number. Names such as ‘".2way"’ are not valid,
and neither are the reserved words.

The definition of a _letter_ depends on the current locale, but
only ASCII digits are considered to be digits.

The character ‘"X"’ is prepended if necessary. All invalid
characters are translated to ‘"."’. A missing value is translated
to ‘"NA"’. Names which match R keywords have a dot appended to
them. Duplicated values are altered by ‘make.unique’.

The behaviour you are seeing is entirely consistent with the documented way read.table() loads in your data, which suggests that the header row of your CSV file contains syntactically invalid labels. Note the point above from ?make.names that what counts as a letter depends on the locale of your system: the CSV file might, for example, include a character that your text editor displays as valid but that is not valid in the locale R is running in.

I would look at the CSV file and identify any non-ASCII characters in the header line; there may also be non-visible characters (or escape sequences such as \t) in the header row. A lot may be going on between reading in the file with the invalid names and displaying it in the console, which might mask the invalid characters, so don't take the fact that nothing looks wrong when reading with check.names = FALSE as a sign that the file is OK.
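A small sketch reproducing the renaming with an inline CSV string (the header "$Foo,bar baz" is a made-up example) and turning it off with the check.names argument quoted above. tools::showNonASCII() is one way to spot non-ASCII characters in a header line:

```r
# Made-up header with a '$' and a space, both invalid in R names
csv_text <- "$Foo,bar baz\n1,2\n"

checked <- read.csv(text = csv_text)
names(checked)    # "X.Foo"   "bar.baz"

# check.names = FALSE keeps the header exactly as written
unchecked <- read.csv(text = csv_text, check.names = FALSE)
names(unchecked)  # "$Foo"    "bar baz"

# Report any elements containing non-ASCII characters (here: the first one)
tools::showNonASCII(c("caf\u00e9", "price"))
```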

Posting the output of sessionInfo() would also be useful.

R - write to file without column 0 (write.csv or write.table)

Just add row.names = FALSE to the write.table() (or write.csv()) call; this drops the unnamed first column of row names.
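A quick sketch comparing the two, using a toy data frame and a temporary file so nothing is written to your working directory:

```r
df <- data.frame(a = 1:2, b = c("x", "y"))
path <- tempfile(fileext = ".csv")

write.csv(df, path)                     # default: row names become an unnamed first column
readLines(path)[1]                      # "\"\",\"a\",\"b\""

write.csv(df, path, row.names = FALSE)  # no row-name column
readLines(path)                         # "\"a\",\"b\"" "1,\"x\"" "2,\"y\""
```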

Avoid scientific notation in R output files

  print( format(283187433.2, nsmall=1), quote=FALSE)
#[1] 283187433.2

This is the output to the console, but it is also what you would see in a text file:

write.table(file="", 283187433.2, quote=FALSE, row.names=FALSE, col.names=FALSE)
283187433.2
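For values that R would otherwise print in scientific notation, a sketch of two options: format(..., scientific = FALSE) for a single conversion, or the scipen option, which penalises scientific notation globally (the value 1e10 is just an example):

```r
x <- 1e10
format(x)                      # "1e+10"
format(x, scientific = FALSE)  # "10000000000"

# Global alternative: raise the penalty for scientific notation
old <- options(scipen = 999)
format(x)                      # "10000000000"
options(old)                   # restore the previous setting
```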

R writing a function to avoid for loop

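You can use the apply() function to visit each individual observation (row) of your data frame.

For instance, you can multiply the Sales and Value columns (for no particular reason) with the following. Because apply() first converts the data frame to a matrix, mixed column types all become character, hence the as.numeric() calls: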

apply(df,1, function(x){ as.numeric(x["Sales"])*as.numeric(x["Value"])})
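For a simple element-wise product like this, no loop is needed at all: R's arithmetic is vectorised over columns. A sketch with a toy data frame whose column names follow the example above:

```r
# Toy data; column names follow the example above
df <- data.frame(Company = c("A", "A", "B"),
                 Sales = c(1, 0, 1),
                 Value = c(10, 20, 30))

by_row <- apply(df, 1, function(x) as.numeric(x["Sales"]) * as.numeric(x["Value"]))
vectorised <- df$Sales * df$Value   # no loop and no character round trip

all.equal(as.numeric(by_row), vectorised)  # TRUE
```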

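Edit:

Now you just need the dplyr package, which provides the %>% pipe:

library(dplyr)  # provides the %>% pipe

zz <- apply(df, 1, function(x) {
  # rows for the same company (first column), excluding CityID 1 and 2
  data_company <- df[df$Company == x[1] & df$CityID != 1 & df$CityID != 2, ]
  # store the maximum Value among those rows where Sales == 1
  x[5] <- max(data_company[data_company$Sales == 1, ]$Value)
  x
}) %>% as.data.frame %>% t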
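If the row-by-row apply() becomes slow, one base-R alternative (a swapped-in technique, not the answer's method) is to compute the per-company maximum once with aggregate() and join it back with merge(). This sketch assumes columns named Company, CityID, Sales, and Value, as the code above implies, and adds a new max_value column rather than overwriting column 5. One behavioural difference: a company with no qualifying rows gets NA here, whereas max() on an empty vector returns -Inf with a warning.

```r
# Toy data; column names assumed from the code above
df <- data.frame(Company = c("A", "A", "A", "B", "B"),
                 CityID  = c(1, 3, 4, 2, 5),
                 Sales   = c(1, 1, 1, 1, 0),
                 Value   = c(100, 5, 7, 50, 9))

# Keep rows outside CityID 1 and 2 with Sales == 1, then take the max Value per company
eligible <- subset(df, !CityID %in% c(1, 2) & Sales == 1)
max_by_company <- aggregate(Value ~ Company, data = eligible, FUN = max)
names(max_by_company)[2] <- "max_value"

# Attach the per-company maximum to every row (NA where no row qualified)
result <- merge(df, max_by_company, by = "Company", all.x = TRUE)
result
```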

Correct way to write line to file?

This should be as simple as:

with open('somefile.txt', 'a') as the_file:
    the_file.write('Hello\n')

From The Documentation:

Do not use os.linesep as a line terminator when writing files opened in text mode (the default); use a single '\n' instead, on all platforms.

Some useful reading:

  • The with statement
  • open()
    • 'a' is for append, or use
    • 'w' to write with truncation
  • os (particularly os.linesep)

Avoid writing large number of column names in a model formula with bs() terms

We can use some string manipulation with sprintf, together with reformulate:

predictors <- c("a", "b", "d", "e")
bspl.terms <- sprintf("bs(%s, df = 2)", predictors)
other.terms <- "factor(f)"
form <- reformulate(c(bspl.terms, other.terms), response = "output")
#output ~ bs(a, df = 2) + bs(b, df = 2) + bs(d, df = 2) + bs(e,
# df = 2) + factor(f)

If you want to use a different df and degree for each spline, it is also straightforward (note that df cannot be smaller than degree).

predictors <- c("a", "b", "d", "e")
dof <- c(3, 4, 3, 6)
degree <- c(2, 2, 2, 3)
bspl.terms <- sprintf("bs(%s, df = %d, degree = %d)", predictors, dof, degree)
other.terms <- "factor(f)"
form <- reformulate(c(bspl.terms, other.terms), response = "output")
#output ~ bs(a, df = 3, degree = 2) + bs(b, df = 4, degree = 2) +
# bs(d, df = 3, degree = 2) + bs(e, df = 6, degree = 3) + factor(f)

Prof. Ben Bolker: I was going to suggest something a little bit fancier, something like predictors <- setdiff(names(df)[sapply(df, is.numeric)], "output").

Yes. This is good for safety. It is also an automatic way to include all numeric variables other than "output" as predictors, if the OP wants that.
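Putting the sprintf()/reformulate() approach together with the setdiff() idea from the comment, a sketch that builds the formula automatically and fits it. The simulated data frame dat and the choice df = 3 are assumptions for illustration (df = 3 matches bs()'s default cubic degree, so no "df too small" adjustment is needed):

```r
library(splines)  # provides bs()

set.seed(1)
dat <- data.frame(a = rnorm(50), b = rnorm(50), d = rnorm(50), e = rnorm(50),
                  f = sample(c("x", "y"), 50, replace = TRUE))
dat$output <- with(dat, sin(a) + b^2 + rnorm(50))  # simulated response

# All numeric columns except the response (f is character, so it is excluded)
predictors <- setdiff(names(dat)[sapply(dat, is.numeric)], "output")
form <- reformulate(c(sprintf("bs(%s, df = 3)", predictors), "factor(f)"),
                    response = "output")
fit <- lm(form, data = dat)
length(coef(fit))  # 14: intercept + 4 x 3 spline columns + 1 for factor(f)
```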


