Why and where are \n newline characters getting introduced to c()?
I doubt this is a bug. Instead, it looks like you're running into a known limitation of the console. As it says in Section 1.8 - R commands, case sensitivity, etc. of An Introduction to R:
Command lines entered at the console are limited[3] to about 4095 bytes (not characters).
[3] some of the consoles will not allow you to enter more, and amongst those which do some will silently discard the excess and some will use it as the start of the next line.
Either put the command in a file and source it, or break the code into multiple lines by inserting your own newlines at appropriate points (between commas). For example:
column_names <-
c("County Code/DFG/Aggregation Code", "District Code", "School Code",
"County Name", "District Name", "School Name", "DFG", "Special Needs",
"TOTAL POPULATION TOTAL POPULATION Number Enrolled LAL", ...)
What is the newline character in the C language: \r or \n?
It's \n. When you're reading or writing text mode files, or stdin/stdout etc., you must use \n, and C will handle the translation to the platform's line ending for you. When you're dealing with binary files, by definition you are on your own.
What does gets() save when it reads just a newline
This part in the description of gets might be confusing:
It takes all the characters up to (but not including) the newline
It might be better to say that it takes all the characters including the newline but stores all characters not including the newline.
So if the user enters "some string", the gets function will read "some string" and the newline character from the user's terminal, but store only "some string" in the buffer - the newline character is lost. This is good, because no one wants the newline character anyway - it's a control character, not a part of the data that the user wanted to enter.
Therefore, if you only press Enter, gets interprets it as an empty string.
Now, as noted by some people, your code has multiple bugs.
printf("This is the input as a string: %s\n", input);
No problem here, though you might want to delimit your string by some artificial characters for better debugging:
printf("This is the input as a string: '%s'\n", input);
printf("Is it the string end character? %d\n", input == '\0');
Not good: you want to check 1 byte here, not the whole buffer. In this expression input decays to a pointer to the buffer, and the compiler treats '\0' (an integer constant with value 0) as a null pointer, so the comparison asks "does the buffer exist at all?". The answer is always false.
The right way is:
printf("Does the first byte contain the string end character? %d\n", input[0] == '\0');
This compares just 1 byte to \0.
printf("Is it a newline string? %d\n", input == "\n");
Not good: this compares the address of the buffer with the address of "\n" - the answer is always false. The right way to compare strings in C is strcmp:
printf("Is it a newline string? %d\n", strcmp(input, "\n") == 0);
Note the peculiar usage: strcmp returns 0 when the strings are equal.
printf("Is it the empty string? %d\n", input == "");
The same bug here. Use strcmp here too:
printf("Is it the empty string? %d\n", strcmp(input, "") == 0);
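The corrected checks can be collected into two small helpers. This is only a sketch: the function names is_empty_string and is_newline_string are made up for illustration and do not appear in the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when the buffer holds the empty string, i.e. its first byte is the terminator. */
bool is_empty_string(const char *s)
{
    assert(s != NULL);
    return s[0] == '\0';
}

/* True when the buffer holds exactly one newline and nothing else.
   strcmp() compares contents, not addresses, and returns 0 on equality. */
bool is_newline_string(const char *s)
{
    assert(s != NULL);
    return strcmp(s, "\n") == 0;
}
```

With gets, pressing only Enter would make is_empty_string(input) true; with fgets the same keypress would make is_newline_string(input) true instead.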
BTW as people always say, gets cannot be used in a secure way, because it doesn't support protection from buffer overflow. So you should use fgets instead, even though it's less convenient:
char input[100];
while (fgets(input, sizeof input, stdin))
{
...
}
This leads to possible confusion: fgets doesn't delete the newline byte from the input it reads. So if you replace gets in your code by fgets, you will get different results. Fortunately, your code will illustrate the difference in a clear way.
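To make the difference concrete, here is a rough sketch that counts blank lines read with fgets. The helper name count_blank_lines is hypothetical, and the code assumes every line is shorter than the 100-byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counts lines that contain nothing but the line ending.
   Assumes each line fits in the buffer; longer lines arrive in pieces. */
int count_blank_lines(FILE *in)
{
    char line[100];
    int blank = 0;

    assert(in != NULL);
    while (fgets(line, sizeof line, in)) {
        /* fgets keeps the newline, so a blank line is "\n";
           gets would have stored "" for the same input. */
        if (strcmp(line, "\n") == 0)
            blank++;
    }
    return blank;
}
```

A test for the empty string, which was correct for gets, would never match here.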
Can't figure out why getchar() is picking up newline for first occurrence in C
On the first prompt, you type something like a followed by Enter, so your input stream contains the characters 'a', '\n'. The first getchar call reads the a and leaves the newline in the input stream.
In response to the second prompt, you type bc followed by Enter, so your input stream now contains '\n', 'b', 'c', '\n'.
You can probably figure out what happens from here - the next getchar call reads that newline character from the input stream.
There are a couple of ways to deal with this. One is to test your input, and try again if it's a newline:
do
{
a = getchar();
} while ( a == '\n' ); // or while( isspace( a )), if you want to reject
// any whitespace character.
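The same retry loop can be wrapped in a helper that works on any stream, which also makes it easy to try out on a file. This is a sketch; the name next_non_newline is made up for illustration, and it uses fgetc so the stream can be passed in (getchar() is equivalent to fgetc(stdin)).

```c
#include <assert.h>
#include <stdio.h>

/* Returns the next character that is not '\n', or EOF at end of input. */
int next_non_newline(FILE *in)
{
    int ch;

    assert(in != NULL);
    do {
        ch = fgetc(in);
    } while (ch == '\n');   /* skip newlines left behind by earlier reads */
    return ch;
}
```

For the input described above ('a', '\n', 'b', 'c', '\n'), successive calls would return 'a', 'b', 'c' and then EOF.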
Another is to not use getchar; instead, use scanf with the %c conversion specifier and a blank space in the format string:
scanf( " %c", &c ); // you will need to change the types of your
... // variables from int to char for this.
scanf( " %c", &a );
scanf( " %c", &b );
scanf( " %c", &c );
The leading space in the format string tells scanf to ignore any leading whitespace, so you won't pick up the newline character.
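The effect of the leading blank can be checked without a terminal by using sscanf, which follows the same format rules as scanf. A minimal sketch, with first_char as a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Reads one character from text. When skip_ws is nonzero the format " %c"
   is used, so leading whitespace (including '\n') is skipped first.
   Returns the character read, or EOF if there was nothing to read. */
int first_char(const char *text, int skip_ws)
{
    char c;
    int matched;

    assert(text != NULL);
    if (skip_ws)
        matched = sscanf(text, " %c", &c);
    else
        matched = sscanf(text, "%c", &c);

    return matched == 1 ? (unsigned char)c : EOF;
}
```

Given the leftover input "\nb", the plain "%c" form yields the newline, while the " %c" form yields 'b'.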
Does gets() stop reading when it reaches '\r' or '\n' or '\r\n'?
From the C Standard (5.2.2 Character display semantics)
\n (new line) Moves the active position to the initial position of the
next line.
And (7.21.2 Streams)
2 A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to
conform to differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one correspondence
between the characters in a stream and those in the external
representation. Data read in from a text stream will necessarily
compare equal to the data that were earlier written out to that stream
only if: the data consist only of printing characters and the control
characters horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last character is a
new-line character. Whether space characters that are written out
immediately before a new-line character appear when read in is
implementation-defined.
Thus the new line character is the character '\n'.
Take into account that the function gets is unsafe and is no longer supported by the C Standard.
scanf() leaves the newline character in the buffer
The scanf() function skips leading whitespace automatically before trying to parse conversions other than characters. The character formats (primarily %c; also scan sets %[…] and %n) are the exception; they don't skip whitespace.
Use " %c" with a leading blank to skip optional white space. Do not use a trailing blank in a scanf() format string.
Note that this still doesn't consume any trailing whitespace left in the input stream, not even to the end of a line, so beware of that if also using getchar() or fgets() on the same input stream. We're just getting scanf to skip over whitespace before conversions, like it does for %d and other non-character conversions.
Note that non-whitespace "directives" (to use POSIX scanf terminology) other than conversions, like the literal text in scanf("order = %d", &order);, don't skip whitespace either. The literal text order has to match the next characters to be read.
So you probably want " order = %d" there if you want to skip a newline from the previous line but still require a literal match on a fixed string, as in this question.
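The two format strings can be compared side by side with sscanf, which applies the same matching rules to a string. This is a sketch only; parse_order and parse_order_strict are hypothetical names introduced for the comparison.

```c
#include <assert.h>
#include <stdio.h>

/* Leading blank in the format: whitespace such as a leftover newline is
   skipped before the literal "order" has to match. Returns 1 on success. */
int parse_order(const char *text, int *order)
{
    assert(text != NULL && order != NULL);
    return sscanf(text, " order = %d", order) == 1;
}

/* No leading blank: the literal 'o' must match the very first character,
   so a leftover newline makes the whole call fail. Returns 1 on success. */
int parse_order_strict(const char *text, int *order)
{
    assert(text != NULL && order != NULL);
    return sscanf(text, "order = %d", order) == 1;
}
```

The blanks around = in the format match zero or more whitespace characters, so both "order = 7" and "order=7" are accepted by either function.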
fgets() goes newline when storing string but gets() no issue about newline
"...but gets() no issue about newline"
Note that although you are right that gets() is more convenient than fgets() here for handling the newline, the unfavorable behaviors that come with gets() make it dangerous to use, with the result that "it was officially removed by the 2011 standard." Even without the \n mitigations mentioned below, fgets() is highly preferred over gets().
"fgets() goes newline when storing string..." and "...why does fgets() automatically enter a newline for each input of string"
fgets() does not enter the newline upon reading the line; rather, if one exists, the newline is picked up as part of the line when fgets() is called. For example in this case, when using stdin as the input method, the user presses <return> to finish inputting text. Upon hitting the <return> key, a \n is entered just like any other character, and becomes the last character entered. When the line is read using fgets(), if the \n is seen before any of its other stop-reading criteria, fgets() stops reading, stores all characters including the \n into the buffer, and terminates the line with \0. (If sizeof(buffer) - 1 characters are read or EOF is seen first, fgets() will never see the newline.)
To easily eliminate the \n (or other typical unwanted line endings), use the following single-line statement after each of your calls to fgets():
fgets(user.id, ID_SIZE, stdin);
user.id[strcspn(user.id, "\n")] = 0;
//fflush(stdin);//UB, should not be called
...
fgets(user.fname, MAX_FNAME_SIZE, stdin);
user.fname[strcspn(user.fname, "\n")] = 0;
...
fgets(user.lname, MAX_LNAME_SIZE, stdin);
user.lname[strcspn(user.lname, "\n")] = 0;
...
This technique works for truncating any string by searching for the unwanted char, whether it be "\n", "\n\r", "\r\n", etc. When using more than one search character, e.g. "\r\n", it searches until it reaches either the \r or the \n and terminates at that position.
"This [method] handles the rare buffer that begins with '\0', something that causes grief for the buffer[strlen(buffer) - 1] = '\0'; [method]." (@Chux - comment section of link below.)
Credit here
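As a sketch, the strcspn technique can be wrapped in a helper so it is written once rather than after every fgets() call. The name chomp is hypothetical (borrowed from other languages), not a standard C function.

```c
#include <assert.h>
#include <string.h>

/* Cuts the string at the first '\r' or '\n' and returns the new length.
   If neither character is present, strcspn returns the full length and
   the assignment just rewrites the existing terminator. */
size_t chomp(char *s)
{
    size_t len;

    assert(s != NULL);
    len = strcspn(s, "\r\n");
    s[len] = '\0';
    return len;
}
```

Usage would look like fgets(user.id, ID_SIZE, stdin); chomp(user.id);. An empty buffer is handled safely because strcspn returns 0 for it, so index 0 is written rather than index -1.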
Behavior of fgets(), newline character and how it gets stored in the memory
fgets(buf, n, stream) reads input and saves it into buf until 1 of 4 things happens.
1. Buffer is nearly full. Once n-1 characters are read (and saved), '\0' is appended to buf. Function returns buf. There are likely remaining characters in stream to read. [1]
2. '\n' is read from stream. '\n' is appended to buf. '\0' is appended to buf. Function returns buf. The line has been completely read.
3. End-of-file occurs. Had some characters been read before, '\0' is appended to buf. Function returns buf. Else NULL is returned.
4. Input error (rare). NULL is returned. State of buf is indeterminate.
The only thing different about reading '\n' versus other characters is that it informs fgets() to stop reading.
[1] Should a full buffer get read without '\n', to read and toss the rest of the line:
int ch;
while ((ch = fgetc(stream)) != '\n' && ch != EOF) {
;
}
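Putting the four outcomes and the discard loop together, a line reader might look roughly like the sketch below. The enum line_status and the function read_line are hypothetical names, and the code makes one simplifying assumption: end-of-file and input error are both reported as LINE_EOF.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TRUNCATED, LINE_EOF };

/* Reads one line into buf (at most n-1 characters), strips the newline,
   and discards whatever part of an over-long line did not fit. */
enum line_status read_line(FILE *stream, char *buf, int n)
{
    size_t len;
    int ch;

    assert(stream != NULL && buf != NULL && n > 1);
    if (fgets(buf, n, stream) == NULL)
        return LINE_EOF;              /* nothing read: end-of-file or error */

    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {           /* newline was read: line is complete */
        buf[len] = '\0';
        return LINE_OK;
    }

    /* No newline stored: the buffer filled up, or the input ended first. */
    ch = fgetc(stream);
    if (ch == '\n' || ch == EOF)
        return LINE_OK;               /* only the line ending was left over */

    while ((ch = fgetc(stream)) != '\n' && ch != EOF) {
        ;                             /* toss the rest of the line */
    }
    return LINE_TRUNCATED;
}
```

After a LINE_TRUNCATED result the next call starts cleanly at the following line, instead of picking up the tail of the previous one.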
Removing trailing newline character from fgets() input
The elegant way:
Name[strcspn(Name, "\n")] = 0;
The slightly ugly way:
char *pos;
if ((pos=strchr(Name, '\n')) != NULL)
*pos = '\0';
else
/* input too long for buffer, flag error */
The slightly strange way:
strtok(Name, "\n");
Note that the strtok function doesn't work as expected if the user enters an empty string (i.e. presses only Enter). It leaves the \n character intact.
There are others as well, of course.
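The strtok caveat is easy to reproduce. The sketch below wraps two of the approaches in hypothetical helpers (strip_newline_strchr and strip_newline_strtok are names made up for this example) so they can be run on the same input.

```c
#include <assert.h>
#include <string.h>

/* strchr approach: returns 1 if a newline was found and removed,
   0 if there was none (for example, the input was too long for the buffer). */
int strip_newline_strchr(char *s)
{
    char *pos;

    assert(s != NULL);
    pos = strchr(s, '\n');
    if (pos == NULL)
        return 0;
    *pos = '\0';
    return 1;
}

/* strtok approach: terminates the first token at the newline. When the
   string is only "\n" there is no token, so nothing is modified. */
void strip_newline_strtok(char *s)
{
    assert(s != NULL);
    strtok(s, "\n");
}
```

On "Bob\n" both helpers leave "Bob". On a lone "\n", the strchr version leaves an empty string while the strtok version leaves the newline in place.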