Duplicate 'Row.Names' Are Not Allowed Error

duplicate 'row.names' are not allowed error

Tell read.table not to use row.names:

systems <- read.table("http://getfile.pl?test.csv", 
                      header=TRUE, sep=",", row.names=NULL)

and now your rows will simply be numbered.

Also look at read.csv, a wrapper for read.table that already sets sep=',' and header=TRUE, so that your call simplifies to

systems <- read.csv("http://getfile.pl?test.csv", row.names=NULL)
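To see the effect of row.names=NULL without a network file, here is a minimal sketch that uses a small made-up CSV string (the data and the name csv_text are assumptions for illustration, not taken from the original question). The header has one fewer field than the data rows and the first column repeats a value, which is the situation that typically triggers the error.

```r
# Hypothetical CSV: the header has 2 fields, each data row has 3,
# and the first column contains the duplicated value "x".
csv_text <- "a,b\nx,1,2\nx,3,4\n"

# Default call: the first column is used as row names, so the duplicate fails.
msg <- tryCatch(
  {
    read.csv(text = csv_text)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With row.names = NULL the rows are simply numbered and the read succeeds.
systems <- read.csv(text = csv_text, row.names = NULL)
print(systems)
```

Note that the first column is kept as ordinary data in this case, so it is worth checking that the column headers still line up with the values you expect.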

R duplicate 'row.names' are not allowed

According to the R documentation for read.table,

If there is a header and the first row contains one fewer field 
than the number of columns, the first column in the input is used
for the row names. Otherwise if row.names is missing, the rows are numbered.

So I'd suggest that the header row may have one fewer field than the data rows have columns, in which case read.table() uses the first column (which contains more than one copy of molecular_function) as the row names.
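One way to check this diagnosis is to count the fields on each line before reading the file. The sketch below uses base R's count.fields() on a small temporary file; the file contents are made up for illustration and stand in for the real input.

```r
# Hypothetical file contents: header with 2 fields, data rows with 3.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "x,1,2", "x,3,4"), tmp)

# count.fields() reports the number of fields found on each line.
n_fields <- count.fields(tmp, sep = ",")
print(n_fields)

# TRUE when the header is one field short of the first data row, which is
# the condition under which the first column becomes the row names.
header_is_short <- n_fields[1] == n_fields[2] - 1
print(header_is_short)

unlink(tmp)
```

If header_is_short is TRUE for your file, either fix the header or pass row.names=NULL.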

Desperate to solve duplicate 'row.names' are not allowed in R plm package: There are no duplicates

This looks like a bug in the plm function. Your du column in gust has named values; that is causing plm to crash.

You can work around the bug by removing those names:

gust$du <- unname(gust$du)

After I do that, I get successful results:

> summary(plm(du ~ g, data = gust, index = c("ETreg", "year"), model = "pooling"))
Pooling Model

Call:
plm(formula = du ~ g, data = gust, model = "pooling", index = c("ETreg", 
    "year"))

Balanced Panel: n = 3, T = 11, N = 33

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-2.53175 -1.02819  0.27557  0.77953  3.84676 

Coefficients:
             Estimate Std. Error t-value  Pr(>|t|)    
(Intercept)  0.851158   0.263640  3.2285  0.002939 ** 
g           -0.365347   0.053228 -6.8639 1.079e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    157.46
Residual Sum of Squares: 62.488
R-Squared:      0.60314
Adj. R-Squared: 0.59034
F-statistic: 47.1127 on 1 and 31 DF, p-value: 1.0794e-07
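For readers unfamiliar with named values, here is a small sketch of what unname() does, independent of plm. The vector, the demo data frame, and the helper strip_column_names are hypothetical and only illustrate the idea; they are not part of the original data.

```r
# A named numeric vector: each value carries a label.
du <- c(a = 0.5, b = -0.1, c = 0.3)
print(names(du))

# unname() returns the same values without the labels.
du_plain <- unname(du)
print(names(du_plain))

# Hypothetical helper: strip names from every column of a data frame.
strip_column_names <- function(df) {
  df[] <- lapply(df, unname)
  df
}

gust_demo <- strip_column_names(data.frame(du = du, g = c(1, 2, 3)))

# Logical vector showing which columns still carry names.
still_named <- vapply(gust_demo, function(col) !is.null(names(col)), logical(1))
print(still_named)
```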

duplicate 'row.names' are not allowed -- still killing me

I believe the problem is that the row names are numbers such as 199990901000, which exceed the largest integer value, .Machine$integer.max (2147483647). Although the row names of a data.frame are stored as character, values this large might still cause problems in later processing steps.

I therefore suggest treating the first column as a regular data column rather than as row.names.

The code below worked for me to read the file and to coerce many columns to factor:

library(data.table)
url <- sprintf("https://docs.google.com/uc?id=%s&export=download", 
               "1NwcvwwaPLWaSmKOuQiVrWAK4iKn9f10S")
d5_17cou  <- fread(url, dec = ",", colClasses = list(character = 1L))
cols <- names(d5_17cou)[8:37]
d5_17cou[, (cols) := lapply(.SD, as.factor), .SDcols = cols]
str(d5_17cou)

Classes ‘data.table’ and 'data.frame':    22431 obs. of  39 variables:
 $ S007      : chr  "199905600001" "199905600002" "199905600003" "199905600004" ...
 $ S003A     : int  56 56 56 56 56 56 56 56 56 56 ...
 $ cou.year  : int  561999 561999 561999 561999 561999 561999 561999 561999 561999 561999 ...
 $ year      : int  1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
 $ s017ay    : num  0.692 1.051 1.051 0.752 0.752 ...
 $ uitem     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ item      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ a025r     : Factor w/ 2 levels "1","3": 2 2 2 2 1 1 1 2 1 2 ...
 $ a034r     : Factor w/ 2 levels "1","3": 1 2 1 1 1 2 1 2 1 1 ...
 $ a038r     : Factor w/ 2 levels "1","3": 1 2 2 2 2 1 2 1 2 1 ...
 $ a040r     : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ a041r     : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ a042r     : Factor w/ 2 levels "1","3": 2 1 2 2 1 1 1 1 1 1 ...
 $ c001r     : Factor w/ 2 levels "1","3": 1 1 2 1 1 1 1 1 1 1 ...
 $ c024r     : Factor w/ 2 levels "1","3": 2 2 2 2 1 2 2 1 2 2 ...
 $ c037r     : Factor w/ 2 levels "1","3": 1 1 1 1 2 2 1 2 2 1 ...
 $ charity   : Factor w/ 2 levels "1","3": 1 1 2 1 1 1 1 2 2 2 ...
 $ clz.outgr4: Factor w/ 2 levels "1","3": 2 1 1 1 1 1 1 2 1 1 ...
 $ d019r     : Factor w/ 2 levels "1","3": 2 2 2 2 2 2 2 2 2 2 ...
 $ d023r     : Factor w/ 2 levels "1","3": 2 2 1 1 2 2 2 2 2 2 ...
 $ e014r     : Factor w/ 2 levels "1","3": 1 2 1 1 2 2 2 2 2 2 ...
 $ e018r     : Factor w/ 2 levels "1","3": 2 2 2 2 1 1 1 2 2 1 ...
 $ e035r     : Factor w/ 2 levels "1","3": 2 2 1 2 2 1 1 2 2 2 ...
 $ e114r     : Factor w/ 2 levels "1","3": 2 1 1 1 1 1 1 1 1 1 ...
 $ e143r     : Factor w/ 2 levels "1","3": 2 1 1 2 2 1 1 1 1 2 ...
 $ e146r     : Factor w/ 2 levels "1","3": 1 1 2 1 1 2 1 1 2 1 ...
 $ e190rr    : Factor w/ 2 levels "1","3": 2 1 2 2 2 1 2 2 2 2 ...
 $ f022r     : Factor w/ 2 levels "1","3": 1 1 1 2 1 1 1 1 1 1 ...
 $ f028r     : Factor w/ 2 levels "1","3": 1 2 2 2 2 1 2 1 1 1 ...
 $ f051r     : Factor w/ 2 levels "1","3": 1 2 2 2 2 1 1 2 1 1 ...
 $ f064r     : Factor w/ 2 levels "1","3": 1 2 2 2 1 1 2 2 2 1 ...
 $ f066r     : Factor w/ 2 levels "1","3": 1 2 2 2 2 1 2 2 2 2 ...
 $ f121r     : Factor w/ 2 levels "1","3": 1 2 2 1 1 2 2 1 2 2 ...
 $ helpef    : Factor w/ 2 levels "1","3": 1 2 2 2 2 2 2 2 2 2 ...
 $ jpay      : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ prices1   : Factor w/ 2 levels "1","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ psub.all  : Factor w/ 2 levels "1","3": 2 1 1 1 1 2 1 2 2 1 ...
 $ oriend    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ dupl      : int  0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, ".internal.selfref")=<externalptr>

Note that the first column, S007, is explicitly read in as a character column (otherwise fread() would read it as integer64) and is now part of the dataset. Consequently, the positions of all subsequent columns shift by one.

BTW, fread() is much faster than read.table().
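The integer-range point can also be checked in base R without data.table. The following is a minimal sketch using a made-up two-row extract (csv_text is an assumption for illustration); it keeps the identifier column as character via colClasses.

```r
# Identifiers like this one do not fit into R's 32-bit integer type.
id <- "199905600001"
print(as.numeric(id) > .Machine$integer.max)

# Coercing to integer is out of range and yields NA (with a warning).
overflowed <- suppressWarnings(as.integer(id))
print(overflowed)

# Hypothetical two-row extract; colClasses keeps S007 as character.
csv_text <- "S007,year\n199905600001,1999\n199905600002,1999\n"
d <- read.csv(text = csv_text, colClasses = c(S007 = "character"))
str(d)
```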

Error in `.rowNamesDF<-`(x, value = value) : 'row.names' duplicate are not allowed. In addition: Warning message: non-unique values

We don't need a for loop here. Just index the data.frame to subset the columns, unlist them, and construct the data.frame directly:

out <- data.frame(country = unlist(total_authority[c(1, 3)]),
                  score = unlist(total_authority[c(2, 4)]),
                  year = rep(names(total_authority)[c(2, 4)],
                             each = nrow(total_authority)))
row.names(out) <- NULL

-output

> out
          country                     score year
1         Albania 0.00000000000000003122502 1994
2         Algeria 0.00000000000000003122502 1994
3  American Somoa 0.00000000000000003122502 1994
4          Angola 0.00000000000000003122502 1994
5        Anguilla 0.00000000000000003122502 1994
6         Antigua 0.00000000000000003122502 1994
7       Argentina 0.00289122132708816018468 1994
8         Armenia 0.00000000000000003122502 1994
9           Aruba 0.00000528966979389429013 1994
10      Australia 0.00622391681538347982944 1994
11        Albania 0.00000320558770721281009 1995
12        Algeria 0.00000000000000002775558 1995
13 American Somoa 0.00000000000000002775558 1995
14         Angola 0.00000000000000002775558 1995
15       Anguilla 0.00000000000000002775558 1995
16        Antigua 0.00000000000000002775558 1995
17      Argentina 0.02245380108584869860433 1995
18        Armenia 0.00000000000000002775558 1995
19          Aruba 0.00000000000000002775558 1995
20      Australia 0.40763348337921900821357 1995

As for the duplicate row.names error: it occurs because the authority column was created with single brackets ([), which return a data.frame with a single column. We need a vector instead, so extract the column with double brackets ([[):

final_output <- data.frame()
for (count in 1:2) {
  df <- data.frame(country = actors)
  df$year <- rep(names(total_authority)[2 * count], nrow(df))
  df$authority <- total_authority[[2 * count]]
  final_output <- rbind(final_output, df)
}

-output

> final_output
          country year                 authority
1         Albania 1994 0.00000000000000003122502
2         Algeria 1994 0.00000000000000003122502
3  American Somoa 1994 0.00000000000000003122502
4          Angola 1994 0.00000000000000003122502
5        Anguilla 1994 0.00000000000000003122502
6         Antigua 1994 0.00000000000000003122502
7       Argentina 1994 0.00289122132708816018468
8         Armenia 1994 0.00000000000000003122502
9           Aruba 1994 0.00000528966979389429013
10      Australia 1994 0.00622391681538347982944
11        Albania 1995 0.00000320558770721281009
12        Algeria 1995 0.00000000000000002775558
13 American Somoa 1995 0.00000000000000002775558
14         Angola 1995 0.00000000000000002775558
15       Anguilla 1995 0.00000000000000002775558
16        Antigua 1995 0.00000000000000002775558
17      Argentina 1995 0.02245380108584869860433
18        Armenia 1995 0.00000000000000002775558
19          Aruba 1995 0.00000000000000002775558
20      Australia 1995 0.40763348337921900821357
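The difference between [ and [[ is easy to verify on a toy example. The sketch below uses a made-up two-row data.frame (total_authority_demo is a hypothetical stand-in, not the real data) to show what each form returns.

```r
# Hypothetical stand-in for the real data: one country column, one year column.
total_authority_demo <- data.frame(
  country = c("Albania", "Algeria"),
  `1994` = c(0.1, 0.2),
  check.names = FALSE
)

# Single brackets keep the data.frame wrapper; double brackets extract the vector.
as_df  <- total_authority_demo[2]
as_vec <- total_authority_demo[[2]]
print(class(as_df))
print(class(as_vec))

# Assigning the extracted vector gives an ordinary numeric column.
df <- data.frame(country = total_authority_demo$country)
df$authority <- as_vec
str(df)
```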
