How to efficiently read the first character from each line of a text file?
01/04/2015 Edited to bring the better solution to the top.
Update 2: Changing the scan() method to run on an open connection, instead of opening and closing the file on every iteration, lets a single scan() call read the file line by line and eliminates the looping. The timing improved quite a bit.
## scan() on open connection
conn <- file("bigtest.txt", "rt")
substr(scan(conn, what = "", sep = "\n", quiet = TRUE), 1, 1)
close(conn)
I also discovered the stri_read_lines() function in the stringi package. Its help file says it is experimental at the moment, but it is very fast.
## stringi::stri_read_lines()
library(stringi)
stri_sub(stri_read_lines("bigtest.txt"), 1, 1)
Here are the timings for these two methods.
## timings
library(microbenchmark)
microbenchmark(
scan = {
conn <- file("bigtest.txt", "rt")
substr(scan(conn, what = "", sep = "\n", quiet = TRUE), 1, 1)
close(conn)
},
stringi = {
stri_sub(stri_read_lines("bigtest.txt"), 1, 1)
}
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# scan 50.00170 50.10403 50.55055 50.18245 50.56112 54.64646 100
# stringi 13.67069 13.74270 14.20861 13.77733 13.86348 18.31421 100
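Update 2 is faster because the file is opened once and read in a single pass, rather than reopened for every line. As a comparison outside R (not part of the original answer), here is a minimal C sketch of the same one-pass idea. The function name first_chars_fgets and the 256-byte line buffer are illustrative assumptions. Unlike scan(), whose blank.lines.skip argument defaults to TRUE, this sketch keeps blank lines and reports '\n' as their first character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads every line of `in` in one pass and stores the
 * first character of each line in out[0..cap-1]. Returns the number of
 * lines seen, which may exceed cap (only the first cap are stored).
 * Blank lines are kept and contribute '\n'. */
size_t first_chars_fgets(FILE *in, char *out, size_t cap)
{
    char buf[256];
    size_t n = 0;
    int at_line_start = 1;   /* 1 when the next fgets() piece starts a line */

    assert(in != NULL && (out != NULL || cap == 0));
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (at_line_start) {
            if (n < cap)
                out[n] = buf[0];
            n++;
        }
        /* A line longer than buf arrives in several pieces; only the piece
         * containing '\n' ends it, so the next piece starts a new line. */
        at_line_start = strchr(buf, '\n') != NULL;
    }
    return n;
}
```

Pass it a FILE * obtained from fopen("bigtest.txt", "r") and a char array sized for the expected number of lines; a return value larger than the array size means some first characters were counted but not stored.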
Original (slower) answer:
You could try read.fwf() (fixed width file), setting the width to 1 to capture the first character on each line.
read.fwf("test.txt", 1, stringsAsFactors = FALSE)[[1L]]
# [1] "A" "B" "C" "D" "E"
Not fully tested, of course, but it works for the test file, and read.fwf() is a handy function for pulling fixed-width substrings out of every line.
Update 1: read.fwf() is not very efficient, since it calls scan() and read.table() internally. We can skip the middlemen and try scan() directly.
lines <- count.fields("test.txt") ## length is num of lines in file
skip <- seq_along(lines) - 1 ## set up the 'skip' arg for scan()
read <- function(n) {
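## sep = "\n" keeps the whole line as one string; the default splits it on whitespace
ch <- scan("test.txt", what = "", sep = "\n", nlines = 1L, skip = n, quiet = TRUE)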
substr(ch, 1, 1)
}
vapply(skip, read, character(1L))
# [1] "A" "B" "C" "D" "E"
version$platform
# [1] "x86_64-pc-linux-gnu"
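Both approaches above only need the first character of each line, yet the scan() loop in Update 1 reopens the file and skips n lines on every call, so the total reading grows roughly quadratically with the number of lines. For comparison, here is a minimal C sketch of the "width 1" idea done as a single character stream: keep the first character, then discard the rest of the line without buffering it. The name first_chars_getc is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores the first character of each line of `in` in
 * out[0..cap-1] and skips the rest of the line character by character, so
 * no line is ever held in memory. Returns the number of lines seen. */
size_t first_chars_getc(FILE *in, char *out, size_t cap)
{
    size_t n = 0;
    int c;

    assert(in != NULL && (out != NULL || cap == 0));
    while ((c = getc(in)) != EOF) {
        if (n < cap)
            out[n] = (char)c;      /* first character of this line */
        n++;
        /* skip to the end of the line; if the first character was '\n',
         * the line was empty and there is nothing to skip */
        while (c != '\n' && (c = getc(in)) != EOF)
            ;
    }
    return n;
}
```

A final line without a trailing newline is still counted, and an empty line contributes '\n'.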
Read first character on each line in a file
Using a Scanner would make the code considerably cleaner:
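import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// inside your class; 'figures' is your existing int[] field
private static void openandprint() throws IOException {
    int i = 0;
    // pass a File: new Scanner("final.txt") would scan the literal text "final.txt"
    try (Scanner s = new Scanner(new File("final.txt"))) {
        while (s.hasNextLine()) {
            int change2Int = s.nextInt(); // the number at the start of the line
            s.nextLine(); // ignore the rest of the line
            figures[i] = change2Int;
            i++;
        }
    }
}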
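The Scanner loop reads the leading integer of each line and throws away the rest. For comparison, a minimal C sketch of the same pattern, assuming the input is an already-open FILE * and the destination is a caller-supplied int array; read_leading_ints is a hypothetical name. Like nextInt(), fscanf's %d skips leading whitespace, including blank lines. Where nextInt() would throw InputMismatchException on a line that does not start with a number, this sketch simply stops, and it checks the array capacity before every store.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring Scanner.nextInt() + nextLine(): stores the
 * leading integer of each line in figures[], then discards the rest of the
 * line. Stops at EOF, at the first line not starting with an integer, or
 * once cap values are stored. Returns how many values were stored. */
size_t read_leading_ints(FILE *in, int *figures, size_t cap)
{
    size_t i = 0;
    int c;

    assert(in != NULL && (figures != NULL || cap == 0));
    while (i < cap && fscanf(in, "%d", &figures[i]) == 1) {
        i++;
        /* ignore the rest of the line, like s.nextLine() */
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
    }
    return i;
}
```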
C read lines of file to array prints only first character from line
You want this:
#include <stdio.h>
#include <stdlib.h>   // malloc
#include <string.h>

#if 0 // change to #if 1 if strdup is not available (it is POSIX; ISO C only since C23)
// Duplicate a string (google strdup for more information)
char *strdup(const char *source)   // returns char *, not char **
{
    char *newstring = malloc(strlen(source) + 1);
    if (newstring)
        strcpy(newstring, source);
    return newstring;
}
#endif

char **loadNames(FILE *file) {
    char **list = malloc(sizeof(char*) * 42); // you want 42 pointers to char, not 42 chars
    const int BUFFER_SIZE = 256;   // just 256, not {256}
    char buffer[BUFFER_SIZE];      // no need to allocate dynamically here
    int count = 0;                 // you forgot to initialize to 0 (and use int, not char, for a counter)
    while (fgets(buffer, BUFFER_SIZE, file)) {
        list[count] = strdup(buffer); // you need to duplicate the string,
                                      // not just assign a char
        count++;
    }
    return list;
}
There are still some problems with this code:
- if there are more than 42 lines in the file, you're in trouble.
- loadNames returns a pointer to an array of pointers, each of which points to a line of the file. The problem is that once you've called loadNames, you don't know how many lines have actually been read.
- there is no error checking whatsoever, for brevity.
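The first two problems can be addressed by growing the array as needed and reporting how many lines were read. This is a minimal sketch, assuming C99 and realloc-based doubling; loadNamesCounted and free_lines are hypothetical names, not part of the answer above. It copies each line with malloc and memcpy, so it does not depend on strdup being available.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees the first n lines and the array itself. */
static void free_lines(char **list, size_t n)
{
    while (n > 0)
        free(list[--n]);
    free(list);
}

/* Hypothetical variant of loadNames(): no 42-line limit, the number of
 * lines read is returned through *count, and NULL signals an allocation
 * failure. As with fgets, each stored line keeps its trailing '\n', and a
 * line longer than the buffer is stored in several pieces. */
char **loadNamesCounted(FILE *file, size_t *count)
{
    char buffer[256];
    size_t n = 0, cap = 16;
    char **list = malloc(cap * sizeof *list);

    assert(file != NULL && count != NULL);
    *count = 0;
    if (list == NULL)
        return NULL;
    while (fgets(buffer, sizeof buffer, file)) {
        if (n == cap) {                    /* array full: double it */
            char **bigger = realloc(list, 2 * cap * sizeof *list);
            if (bigger == NULL) {
                free_lines(list, n);
                return NULL;
            }
            list = bigger;
            cap *= 2;
        }
        size_t len = strlen(buffer) + 1;   /* include the terminating '\0' */
        list[n] = malloc(len);             /* strdup-free copy of the line */
        if (list[n] == NULL) {
            free_lines(list, n);
            return NULL;
        }
        memcpy(list[n], buffer, len);
        n++;
    }
    *count = n;
    return list;
}
```

When you are done with the lines, release them with free_lines(names, count).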
Fortran: How do I read the first character from each line of a text file?
Although the suggestions were on the right track, several things were still missing: the range of the REAL kind, and some formatting problems.
Anyway, here's one patched-up solution, compiled and working; see if it works for you. I've taken the liberty of choosing my own method for calculating the Fibonacci numbers (Binet's formula).
program SO1658805
implicit none
integer, parameter :: iwp = selected_real_kind(15,310)
real(iwp) :: fi, fib
integer :: i
character(60) :: line
character(1) :: digit
integer :: n0=0, n1=0, n2=0, n3=0, n4=0, n5=0, n6=0, n7=0, n8=0, n9=0
open(unit=1, file='temp.txt', status='replace')
rewind(1)
!-------- calculating fibonacci numbers -------
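! use iwp literals: 5**0.5 and 2. would be evaluated in default (single) precision
fi = (1.0_iwp + sqrt(5.0_iwp))/2.0_iwp
do i=0,1477
fib = (fi**i - (1.0_iwp-fi)**i)/sqrt(5.0_iwp)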
write(1,*)fib,i
end do
!----------------------------------------------
rewind(1)
do i=0,1477
read(1,'(a)')line
line = adjustl(line)
write(*,'(a)')line
read(line,'(a1)')digit
if(digit.eq.'0') n0=n0+1
if(digit.eq.'1') n1=n1+1
if(digit.eq.'2') n2=n2+1
if(digit.eq.'3') n3=n3+1
if(digit.eq.'4') n4=n4+1
if(digit.eq.'5') n5=n5+1
if(digit.eq.'6') n6=n6+1
if(digit.eq.'7') n7=n7+1
if(digit.eq.'8') n8=n8+1
if(digit.eq.'9') n9=n9+1
end do
close(1)
write(*,'("Total number of different digits")')
write(*,'("Number of digits 0: ",i5)')n0
write(*,'("Number of digits 1: ",i5)')n1
write(*,'("Number of digits 2: ",i5)')n2
write(*,'("Number of digits 3: ",i5)')n3
write(*,'("Number of digits 4: ",i5)')n4
write(*,'("Number of digits 5: ",i5)')n5
write(*,'("Number of digits 6: ",i5)')n6
write(*,'("Number of digits 7: ",i5)')n7
write(*,'("Number of digits 8: ",i5)')n8
write(*,'("Number of digits 9: ",i5)')n9
read(*,*)
end program SO1658805
Aw... I just read that you need the digit counts stored in an array, while I just counted them.
Oh well, ... "left as an exercise for the reader ..." :-)
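The part "left as an exercise" (keeping the totals in an array instead of ten separate variables) could look like the following C sketch of the same counting loop: it looks at the first non-blank character of each line and, if it is a digit d, increments counts[d]. The name count_leading_digits is hypothetical, and taking an already-open FILE * mirrors the rewind(1) followed by the read loop in the Fortran program.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts[d] becomes the number of lines whose first
 * non-blank character is the digit d. Leading spaces and tabs are skipped,
 * much like adjustl() in the Fortran program; other lines are ignored. */
void count_leading_digits(FILE *in, long counts[10])
{
    char line[128];
    int at_line_start = 1;   /* 0 while reading the tail of a long line */

    assert(in != NULL && counts != NULL);
    for (int d = 0; d < 10; d++)
        counts[d] = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (at_line_start) {
            const char *p = line;
            while (*p == ' ' || *p == '\t')
                p++;
            if (isdigit((unsigned char)*p))
                counts[*p - '0']++;
        }
        at_line_start = strchr(line, '\n') != NULL;
    }
}
```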
Get first character of each line in text file
Try this:
setlocal EnableDelayedExpansion
set file=c:\klantenlijst.txt
FOR /F "delims=~" %%i IN (%file%) DO (
set var=%%i
set var=!var:~0,1!
echo !var!
)
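You can't apply substring syntax such as ~0,1 directly to FOR loop variables like %%i, so copy the value into an ordinary variable first. Inside the loop, read that variable as !var! (delayed expansion), because %var% would be expanded only once, before the loop runs.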
Manipulating first character of each line in a file with .Replace()
Try the following:
Get-Content C:\Users\Administrator\Desktop\123.txt | ForEach-Object {
if ($_) {
$_.Substring(0, 1).ToUpper() + $_.Substring(1)
} else {
$_
}
} > .\Desktop\finish.txt
- Get-Content reads the input file line by line and sends each line, stripped of its line terminator, through the pipeline.
- ForEach-Object processes each line in the associated script block, in which $_ represents the line at hand:
  - if ($_) tests whether the line is nonempty, i.e. whether it has at least 1 character; if not, the else block simply passes the empty line through.
  - $_.Substring(0, 1).ToUpper() converts the line's 1st character to uppercase, implicitly using the current culture (with a single character, this is equivalent to applying (Get-Culture).TextInfo.ToTitleCase()).
  - + $_.Substring(1) appends the rest of the line.
- Only > rather than >> is needed to write to the output file, because the entire pipeline's output is written at once.
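For comparison, the same transformation can be done as a single streaming pass in C. This minimal sketch (capitalize_lines is a hypothetical name) copies one stream to another and upper-cases the character that starts each line; empty lines pass through unchanged, mirroring the else branch. One assumed simplification: toupper() follows the C locale, so unlike .NET's ToUpper() it is not culture-aware.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, upper-casing the first
 * character of every line via toupper() (C locale rules, not .NET
 * cultures). Returns 0 on success, -1 if a write fails. */
int capitalize_lines(FILE *in, FILE *out)
{
    int c;
    int at_line_start = 1;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (at_line_start && c != '\n')   /* empty lines pass through */
            c = toupper(c);
        if (putc(c, out) == EOF)
            return -1;
        at_line_start = (c == '\n');
    }
    return 0;
}
```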