How to Efficiently Read the First Character from Each Line of a Text File

How to efficiently read the first character from each line of a text file?

01/04/2015 Edited to bring the better solution to the top.


Update 2: Changing the scan() method to run on an open connection, instead of opening and closing a connection on every iteration, lets it read line by line and eliminates the looping. The timing improved quite a bit.

## scan() on open connection 
conn <- file("bigtest.txt", "rt")
substr(scan(conn, what = "", sep = "\n", quiet = TRUE), 1, 1)
close(conn)

I also discovered the stri_read_lines() function in the stringi package. Its help file says it's experimental at the moment, but it is very fast.

## stringi::stri_read_lines()
library(stringi)
stri_sub(stri_read_lines("bigtest.txt"), 1, 1)

Here are the timings for these two methods.

## timings
library(microbenchmark)

microbenchmark(
    scan = {
        conn <- file("bigtest.txt", "rt")
        substr(scan(conn, what = "", sep = "\n", quiet = TRUE), 1, 1)
        close(conn)
    },
    stringi = {
        stri_sub(stri_read_lines("bigtest.txt"), 1, 1)
    }
)
# Unit: milliseconds
#     expr      min       lq     mean   median       uq      max neval
#     scan 50.00170 50.10403 50.55055 50.18245 50.56112 54.64646   100
#  stringi 13.67069 13.74270 14.20861 13.77733 13.86348 18.31421   100

Original [slower] answer :

You could try read.fwf() (fixed width file), setting the width to a single 1 to capture the first character on each line.

read.fwf("test.txt", 1, stringsAsFactors = FALSE)[[1L]]
# [1] "A" "B" "C" "D" "E"

Not fully tested of course, but works for the test file and is a nice function for getting substrings without having to read the entire file.


Update 1 : read.fwf() is not very efficient, calling scan() and read.table() internally. We can skip the middle-men and try scan() directly.

lines <- count.fields("test.txt")   ## length is num of lines in file
skip <- seq_along(lines) - 1        ## set up the 'skip' arg for scan()
read <- function(n) {
    ## sep = "\n" keeps each line as one string, even if it contains spaces
    ch <- scan("test.txt", what = "", sep = "\n", nlines = 1L, skip = n, quiet = TRUE)
    substr(ch, 1, 1)
}
vapply(skip, read, character(1L))
# [1] "A" "B" "C" "D" "E"

version$platform
# [1] "x86_64-pc-linux-gnu"

Read first character on each line in a file

Using a Scanner would make the code considerably cleaner:

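// Requires: import java.io.File; import java.io.IOException; import java.util.Scanner;
// `figures` is assumed to be an int[] field declared elsewhere in the class.
private static void openandprint() throws IOException {
    int i = 0;
    // Wrap the name in a File; new Scanner("final.txt") would scan the literal string.
    try (Scanner s = new Scanner(new File("final.txt"))) {
        while (s.hasNextLine()) {
            int change2Int = s.nextInt();
            s.nextLine(); // ignore the rest of the line
            figures[i] = change2Int;
            i++;
        }
    }
}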

C read lines of file to array prints only first character from line

You want this:

#include <stdio.h>
#include <stdlib.h>   // malloc
#include <string.h>

#if 0 // change this to 1 if strdup is not available on your platform
// Duplicate a string (google strdup for more information)
char *strdup(const char *source)
{
    char *newstring = malloc(strlen(source) + 1);
    if (newstring)
        strcpy(newstring, source);

    return newstring;
}
#endif

char **loadNames(FILE* file) {
    char **list = malloc(sizeof(char*) * 42); // you want 42 pointers to char, not 42 chars
    const int BUFFER_SIZE = 256; // just 256, not {256}
    char buffer[BUFFER_SIZE]; // no need to allocate dynamically here
    int count = 0; // you forgot to initialize to 0

    while (fgets(buffer, BUFFER_SIZE, file)) {
        list[count] = strdup(buffer); // you need to duplicate the string,
                                      // not just assign a char
        count++;
    }

    return list;
}

There are still some problems with this code:

  • if there are more than 42 lines in the file, you're in trouble: the loop writes past the end of list.
  • loadNames returns a pointer to an array of pointers, each of which points to a line of the file. The problem is that once you've called loadNames, you don't know how many lines have actually been read.
  • for brevity, there is no error checking whatsoever.
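One way to address the three points above is sketched below. It is only an illustration, not part of the original answer: the names load_lines and free_lines, the starting capacity of 16, and the choice to strip the trailing newline are all assumptions. The array grows with realloc, the line count is returned through an out-parameter, and allocation failures are reported by returning NULL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Free an array returned by load_lines (hypothetical helper). */
void free_lines(char **lines, size_t count) {
    for (size_t i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
}

/* Read every line of `file` into a heap-allocated array of strings.
 * On success, returns the array and stores the number of lines in *count.
 * On allocation failure, frees everything, sets *count to 0, returns NULL.
 * Kept simple on purpose: a line longer than 255 characters is split
 * across several entries. */
char **load_lines(FILE *file, size_t *count) {
    size_t capacity = 16;              /* assumed starting size */
    size_t n = 0;
    char buffer[256];
    char **list = malloc(capacity * sizeof *list);

    *count = 0;
    if (list == NULL)
        return NULL;

    while (fgets(buffer, sizeof buffer, file)) {
        if (n == capacity) {           /* array is full: double it */
            char **grown = realloc(list, 2 * capacity * sizeof *list);
            if (grown == NULL) {
                free_lines(list, n);
                return NULL;
            }
            list = grown;
            capacity *= 2;
        }
        buffer[strcspn(buffer, "\n")] = '\0';   /* strip the newline */
        char *copy = malloc(strlen(buffer) + 1);
        if (copy == NULL) {
            free_lines(list, n);
            return NULL;
        }
        strcpy(copy, buffer);
        list[n++] = copy;
    }

    *count = n;
    return list;
}
```

A caller would open the file, call load_lines(fp, &n), read names[i][0] for the first character of line i, and finish with free_lines(names, n).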

Fortran: How do I read the first character from each line of a text file?

Although the suggestions were in place, several things were forgotten: the range of the REAL kind and some formatting problems.

Anyway, here's one patched-up solution, compiled and working, so see if it works for you. I've taken the liberty of choosing my own method for calculating the Fibonacci numbers.

  program SO1658805
implicit none

integer, parameter :: iwp = selected_real_kind(15,310)
real(iwp) :: fi, fib
integer :: i
character(60) :: line
character(1) :: digit
integer :: n0=0, n1=0, n2=0, n3=0, n4=0, n5=0, n6=0, n7=0, n8=0, n9=0

open(unit=1, file='temp.txt', status='replace')
rewind(1)
!-------- calculating fibonacci numbers -------
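! keep the constants in kind iwp; 5**0.5 would be evaluated in default (single) precision
fi = (1 + sqrt(5._iwp))/2
do i=0,1477
fib = (fi**i - (1-fi)**i)/sqrt(5._iwp)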
write(1,*)fib,i
end do
!----------------------------------------------
rewind(1)

do i=0,1477
read(1,'(a)')line
line = adjustl(line)
write(*,'(a)')line

read(line,'(a1)')digit

if(digit.eq.'0') n0=n0+1
if(digit.eq.'1') n1=n1+1
if(digit.eq.'2') n2=n2+1
if(digit.eq.'3') n3=n3+1
if(digit.eq.'4') n4=n4+1
if(digit.eq.'5') n5=n5+1
if(digit.eq.'6') n6=n6+1
if(digit.eq.'7') n7=n7+1
if(digit.eq.'8') n8=n8+1
if(digit.eq.'9') n9=n9+1
end do
close(1)

write(*,'("Total number of different digits")')
write(*,'("Number of digits 0: ",i5)')n0
write(*,'("Number of digits 1: ",i5)')n1
write(*,'("Number of digits 2: ",i5)')n2
write(*,'("Number of digits 3: ",i5)')n3
write(*,'("Number of digits 4: ",i5)')n4
write(*,'("Number of digits 5: ",i5)')n5
write(*,'("Number of digits 6: ",i5)')n6
write(*,'("Number of digits 7: ",i5)')n7
write(*,'("Number of digits 8: ",i5)')n8
write(*,'("Number of digits 9: ",i5)')n9

read(*,*)

end program SO1658805

Aw, ... I just read that you need the number of digits stored in an array, while I just counted them in separate variables.

Oh well, ... "left as an exercise for the reader ..." :-)
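The array-based tally mentioned above comes down to indexing a ten-element counter by the leading digit instead of keeping ten separate variables. Here is a minimal sketch of that idea, written in C rather than Fortran. The function name count_leading_digits and the 255-character line limit are assumptions made for the illustration, not part of the original program.

```c
#include <ctype.h>
#include <stdio.h>

/* Tally the first non-blank character of each line in `file`.
 * counts[d] is incremented when that character is the digit d;
 * lines that start with anything else are read but not tallied.
 * Returns the number of lines read.
 * Assumes lines are shorter than 255 characters, as in the 60-character
 * records of the Fortran program above. */
long count_leading_digits(FILE *file, long counts[10]) {
    char line[256];
    long lines = 0;

    for (int d = 0; d < 10; d++)
        counts[d] = 0;

    while (fgets(line, sizeof line, file)) {
        const char *p = line;
        while (*p == ' ' || *p == '\t')   /* skip leading blanks, like adjustl */
            p++;
        if (isdigit((unsigned char)*p))
            counts[*p - '0']++;           /* the digit itself is the index */
        lines++;
    }
    return lines;
}
```

Printing the result is then a single loop over counts[0] through counts[9].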

Get first character of each line in text file

Try this:

setlocal EnableDelayedExpansion
set file=c:\klantenlijst.txt
FOR /F "delims=~" %%i IN (%file%) DO (
set var=%%i
set var=!var:~0,1!
echo !var!
)

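You can't do string manipulations directly on for loop variables such as %%i. Substring expansion only works on environment variables, so the value is first copied into var and then read with delayed expansion (!var:~0,1!).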

Manipulating first character of each line in a file with .Replace()

Try the following:

Get-Content C:\Users\Administrator\Desktop\123.txt | ForEach-Object {
if ($_) {
$_.Substring(0, 1).ToUpper() + $_.Substring(1)
} else {
$_
}
} > .\Desktop\finish.txt
  • Get-Content reads the input file line by line and sends each line - stripped of its line terminator - through the pipeline.

  • ForEach-Object processes each line in the associated script block, in which $_ represents the line at hand:

    • if ($_) tests if the line is nonempty, i.e. if there's at least 1 character; if not, the else block simply passes the empty line through.
    • $_.Substring(0, 1).ToUpper() converts the line's 1st character to uppercase, implicitly using the current culture (with a single character, this is equivalent to applying (Get-Culture).TextInfo.ToTitleCase()).
    • + $_.Substring(1) appends the rest of the line.
  • Only > rather than >> is needed to write to the output file, because the entire pipeline's output is written at once.


