Issue when importing dataset: `Error in scan(...): line 1 did not have 145 elements`

This error is fairly self-explanatory: data appear to be missing from the first line of your data file (or the second line, since you're using header = TRUE).

Here's a mini example:

## Create a small dataset to play with
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file="test.txt")

R automatically detects that it should expect rownames plus two columns (3 elements), but it doesn't find 3 elements on line 2, so you get an error:

read.table("test.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
# line 2 did not have 3 elements

Look at the data file and see if there is indeed a problem:

cat(readLines("test.txt"), sep = "\n")
# V1 V2
# First 1 2
# Second 2
# Third 3 8
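
For a file too large to inspect by eye, the same check can be automated. The following is a minimal sketch using count.fields() from the utils package (loaded by default). The temporary file and the variable names are illustrative, and it assumes whitespace-separated fields as in the example above.

```r
# Recreate the example file in a temporary location so the block is self-contained
tmp <- tempfile(fileext = ".txt")
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file = tmp)

# Number of whitespace-separated fields on each line of the file
n_fields <- count.fields(tmp)
n_fields
# [1] 2 3 2 3

# Drop the header line, then flag data lines that fall short of the widest line.
# The index is relative to the data, matching the numbering in the error message.
data_fields <- n_fields[-1]
short_lines <- which(data_fields < max(data_fields))
short_lines
# [1] 2
```

Here the flagged line is the "Second" row, the same line 2 that scan() complains about.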

Manual correction might be needed, or we can assume that the first value in the "Second" row belongs in the first column and that the remaining values should be NA. If that is the case, fill = TRUE is enough to solve your problem.

read.table("test.txt", header = TRUE, fill = TRUE)
# V1 V2
# First 1 2
# Second 2 NA
# Third 3 8

R is also smart enough to figure out how many elements it needs even if rownames are missing:

cat("V1 V2\n1\n2 5\n3 8\n", file="test2.txt")
cat(readLines("test2.txt"), sep = "\n")
# V1 V2
# 1
# 2 5
# 3 8
read.table("test2.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
# line 1 did not have 2 elements
read.table("test2.txt", header = TRUE, fill = TRUE)
# V1 V2
# 1 1 NA
# 2 2 5
# 3 3 8

Confusing error in R: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 42 elements)

read.table wants to return a data.frame, which must have an element in each column. Therefore R expects each row to have the same number of elements and it doesn't fill in empty spaces by default. Try read.table("/PathTo/file.csv" , fill = TRUE ) to fill in the blanks.

e.g.

read.table( text= "Element1 Element2
Element5 Element6 Element7" , fill = TRUE , header = FALSE )
# V1 V2 V3
#1 Element1 Element2
#2 Element5 Element6 Element7
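
One caveat worth checking in the documentation: read.table decides how many columns to create from the first five lines of input, so fill = TRUE alone may not be enough when a longer row appears later in the file. A possible workaround, sketched below with made-up data, is to count the fields first and pass explicit col.names. The names txt, con and n_max are illustrative only.

```r
# Made-up input: the first five lines have 2 fields, the sixth has 3
txt <- "a b
c d
e f
g h
i j
k l m"

# Find the widest line; close the text connection explicitly when done
con <- textConnection(txt)
n_max <- max(count.fields(con))
close(con)

# Supplying col.names of that length tells read.table how many columns to allocate
df <- read.table(text = txt, header = FALSE, fill = TRUE,
                 col.names = paste0("V", seq_len(n_max)))
dim(df)
# [1] 6 3
```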

A note on whether or not to set header = FALSE... read.table tries to automatically determine if you have a header row thus:

header is set to TRUE if and only if the first row contains one fewer field than the number of columns
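
That rule can be seen in a small sketch (the inline data are made up for illustration). When header is not supplied, a first line with one fewer field than the rest is treated as column names and the first column becomes the row names. When every line has the same number of fields, the first line is read as data.

```r
# First line has 2 fields, the others have 3: header is inferred as TRUE
with_header <- read.table(text = "V1 V2
First 1 2
Third 3 8")
names(with_header)
# [1] "V1" "V2"
rownames(with_header)
# [1] "First" "Third"

# Every line has 3 fields: header is inferred as FALSE, default names are used
no_header <- read.table(text = "a b c
1 2 3")
names(no_header)
# [1] "V1" "V2" "V3"
nrow(no_header)
# [1] 2
```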

R read.table error: line 1 did not have 52 elements


read.table(text="KL02222200001010000101061101131101471101224  661  321 344  30 1  411  551  571  524  34 6  55 6  56 6 734 904 904 844 8941004 994 964 891 991 99116 2120 1132 21 174 81-99-99 81-99-99 81-99-99 804  001 83 43 53 11 11 12 11 01 13   061  1861   461  2261  001  001 501-9999-99999-99999
KL02222200001020000101731101631101591101654 911 241 674 15 1 321 891 621 614 31 6 67 6 53 6 764 834 834 814 984 734 884 864 981 731 87116 2116 2116 31 234 71-99-99 71-99-99 71-99-99 704 2211 73 83 83 11 11 11 01 11 13 001 001 001 001 001 001 701-9999-99999-99999
KL02222200001030000101371101211100991101194 821 581 244 52 1 651 751 641 674 51 6 55 6 49 6 784 774 774 774 814 744 804 784 811 741 80116 3120 4116 31 334 71-99-99 71-99-99 71-99-99 704 001 81 81 81 11 11 11 01 01 01 001 001 001 2461 001 0011001-9999-99999-99999",
as.is = TRUE, sep = "", header = FALSE, strip.white = TRUE, fill = TRUE)

Result (converted to a tibble for better readability):

# A tibble: 3 x 52
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
<chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 KL022222~ 661 321 344 30 1 411 551 571 524 34 6
2 KL022222~ 911 241 674 15 1 321 891 621 614 31 6
3 KL022222~ 821 581 244 52 1 651 751 641 674 51 6
# ... with 40 more variables: V13 <int>, V14 <int>, V15 <int>, V16 <int>,
# V17 <int>, V18 <int>, V19 <int>, V20 <int>, V21 <int>, V22 <int>,
# V23 <int>, V24 <int>, V25 <int>, V26 <int>, V27 <int>, V28 <int>,
# V29 <int>, V30 <int>, V31 <chr>, V32 <chr>, V33 <chr>, V34 <chr>,
# V35 <int>, V36 <int>, V37 <int>, V38 <int>, V39 <int>, V40 <int>,
# V41 <int>, V42 <int>, V43 <int>, V44 <int>, V45 <int>, V46 <int>,
# V47 <int>, V48 <int>, V49 <int>, V50 <int>, V51 <chr>, V52 <chr>

read.table not working for importing a .dat

The problem is there are spaces in the header row, so just skip that with skip = 1.

From there, we can extract the odd and even columns using the repeating logical vectors c(TRUE, FALSE) and c(FALSE, TRUE), which R recycles across all columns.

The final line of the data has some empty values, so remove those with complete.cases().

data <- read.table("https://www2.isye.gatech.edu/~jeffwu/book/data/BrainandBodyWeight.dat",
header = FALSE, fill = TRUE, skip = 1)

result <- data.frame(Body.Wt = unname(unlist(data[, c(TRUE, FALSE)])),
                     Brain.Wt = unname(unlist(data[, c(FALSE, TRUE)])))

result <- result[complete.cases(result),]
head(result)
Body.Wt Brain.Wt
1 3.385 44.5
2 0.480 15.5
3 1.350 8.1
4 465.000 423.0
5 36.330 119.5
6 27.660 115.0
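
The recycling trick is easier to follow on a small in-memory example. The sketch below uses a toy data frame (wide, with values borrowed from the first rows of the output above) as a stand-in for the downloaded file, so it runs without network access. The object names are illustrative.

```r
# Toy stand-in for the wide layout: (body, brain) pairs sit side by side,
# and the last row is only partly filled
wide <- data.frame(V1 = c(3.385, 0.480), V2 = c(44.5, 15.5),
                   V3 = c(1.35, NA),     V4 = c(8.1, NA))

# The length-2 logical vectors are recycled across all four columns
odd  <- wide[, c(TRUE, FALSE)]   # selects V1 and V3
even <- wide[, c(FALSE, TRUE)]   # selects V2 and V4

# Stack each set of columns into one vector, then drop the incomplete pair
long <- data.frame(Body.Wt  = unname(unlist(odd)),
                   Brain.Wt = unname(unlist(even)))
long <- long[complete.cases(long), ]
nrow(long)
# [1] 3
```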

R: read .tab files

We can use the fill argument of read.table so that rows with fewer elements are padded with NA:

data <- read.table("functions.tab", header = FALSE, sep = "\t", fill = TRUE)
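
A self-contained sketch of the same call, assuming a small tab-separated file written to a temporary location (the contents and the name tmp are made up for illustration):

```r
# Write a made-up tab-separated file whose second line is one field short
tmp <- tempfile(fileext = ".tab")
writeLines(c("a\t1\t2", "b\t3", "c\t4\t5"), tmp)

# fill = TRUE pads the short line; the missing numeric value becomes NA
df <- read.table(tmp, header = FALSE, sep = "\t", fill = TRUE)
df$V3
# [1]  2 NA  5
```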

