How to Write Data from R to PostgreSQL Tables with an Auto-Incrementing Primary Key

How to add an auto-incrementing primary key to an existing table in PostgreSQL?

(Updated - Thanks to the people who commented)

Modern Versions of PostgreSQL

Suppose you have a table named test1 to which you want to add an auto-incrementing surrogate primary key column named id. In recent versions of PostgreSQL, the following command should be sufficient:

   ALTER TABLE test1 ADD COLUMN id SERIAL PRIMARY KEY;
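If you are on PostgreSQL 10 or newer, an identity column is the standards-conforming alternative to SERIAL; a sketch with the same effect for this purpose:

    -- PostgreSQL 10+: identity column instead of SERIAL
    ALTER TABLE test1 ADD COLUMN id integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY;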

Older Versions of PostgreSQL

In old versions of PostgreSQL (prior to 8.x?) you had to do all the dirty work. The following sequence of commands should do the trick:

    ALTER TABLE test1 ADD COLUMN id INTEGER;
    CREATE SEQUENCE test_id_seq OWNED BY test1.id;
    ALTER TABLE test1 ALTER COLUMN id SET DEFAULT nextval('test_id_seq');
    UPDATE test1 SET id = nextval('test_id_seq');
    ALTER TABLE test1 ADD PRIMARY KEY (id);

Again, in recent versions of Postgres this is roughly equivalent to the single command above.
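Either way, a quick sanity check is to insert a row without supplying an id and confirm one was assigned (DEFAULT VALUES assumes the table's remaining columns are nullable or have defaults):

    INSERT INTO test1 DEFAULT VALUES;
    SELECT max(id) FROM test1; -- the new row received the next id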

How to write a table in PostgreSQL from R?

Ok, I'm not sure why dbWriteTable() would be failing; there may be some kind of version/protocol mismatch. Perhaps you could try installing the latest versions of R and the RPostgreSQL package, and upgrading the PostgreSQL server on your system, if possible.

Regarding the INSERT INTO workaround failing for large data: what is often done in the IT world when large amounts of data must be moved and a one-shot transfer is infeasible, impractical, or flaky is sometimes referred to as batching or batch processing. Basically, you divide the data into smaller chunks and send each chunk one at a time, as sketched below.
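In SQL terms, each chunk becomes one multi-row INSERT; a minimal sketch, using a hypothetical table emp:

    -- one batch of three rows; the next chunk would be a second INSERT
    INSERT INTO emp (name, dept) VALUES
        ('alice', 'hr'),
        ('bob',   'it'),
        ('carol', 'it');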

As a random example, a few years ago I wrote some Java code to query for employee information from an HR LDAP server which was constrained to only provide 1000 records at a time. So basically I had to write a loop to keep sending the same request (with the query state tracked using some kind of weird cookie-based mechanism) and accumulating the records into a local database until the server reported the query complete.

Here's some code that manually constructs the SQL to create an empty table based on a given data.frame, and then inserts the contents of the data.frame into the table using a parameterized batch size. It's mostly built around calls to paste() to build the SQL strings and dbSendQuery() to send the actual queries. I also use postgresqlDataType() for the table creation.

    ## connect to the DB
    library('RPostgreSQL'); ## loads DBI automatically
    drv <- dbDriver('PostgreSQL');
    con <- dbConnect(drv,host=...,port=...,dbname=...,user=...,password=...);

    ## define helper functions
    createEmptyTable <- function(con,tn,df) {
        ## derive a column spec from the data.frame's names and types
        sql <- paste0(
            "create table \"",tn,"\" (",
            paste0(collapse=',','"',names(df),'" ',sapply(df[0,],postgresqlDataType)),
            ");"
        );
        dbSendQuery(con,sql);
        invisible();
    };

    insertBatch <- function(con,tn,df,size=100L) {
        if (nrow(df)==0L) return(invisible());
        cnt <- (nrow(df)-1L)%/%size+1L; ## number of batches
        for (i in seq(0L,length.out=cnt)) {
            ## take this batch's rows and build one multi-row insert
            batch <- df[seq(i*size+1L,min(nrow(df),(i+1L)*size)),];
            sql <- paste0(
                "insert into \"",tn,"\" values (",
                do.call(paste,c(sep=',',collapse='),(',lapply(batch,shQuote))),
                ");"
            );
            dbSendQuery(con,sql);
        };
        invisible();
    };

    ## generate test data
    NC <- 1e2L; NR <- 1e3L; df <- as.data.frame(replicate(NC,runif(NR)));

    ## run it
    tn <- 't1';
    if (dbExistsTable(con,tn)) dbRemoveTable(con,tn);
    createEmptyTable(con,tn,df);
    insertBatch(con,tn,df);
    res <- dbReadTable(con,tn);
    all.equal(df,res);
    ## [1] TRUE

Note that I didn't bother prepending a row.names column to the database table, unlike dbWriteTable(), which always seems to include such a column (and doesn't seem to provide any means of preventing it).

Change primary key to auto increment

I figured it out: just add an auto-incrementing default value to playerid:

    create sequence player_id_seq;
    alter table player alter playerid set default nextval('player_id_seq');
    select setval('player_id_seq', 2000051); -- set to the current highest value of playerid
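If you'd rather not hard-code the current maximum, a subquery can compute it (the same setval() pattern reappears further below):

    select setval('player_id_seq', (select max(playerid) from player));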

How to set auto increment primary key in PostgreSQL?

Try this command:

    ALTER TABLE your_table ADD COLUMN key_column BIGSERIAL PRIMARY KEY;

Run it as the same database user who created the table.

postgres autoincrement not updated on explicit id inserts

That's how it's supposed to work: nextval('test_id_seq') is only called when the system needs a value for the column and you have not provided one. If you provide a value, no such call is made, and consequently the sequence is not "updated".

You could work around this by manually setting the value of the sequence after your last insert with explicitly provided values:

    SELECT setval('test_id_seq', (SELECT MAX(id) FROM "test"));

For a serial column, the name of the sequence is autogenerated and defaults to tablename_columnname_seq.
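If you'd rather not rely on that naming convention (renames or truncation of long identifiers can break it), pg_get_serial_sequence() looks the sequence up for you:

    SELECT setval(pg_get_serial_sequence('test', 'id'),
                  (SELECT MAX(id) FROM "test"));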

PostgreSQL - create an auto-increment column for non-primary key

You may try making the item_id column SERIAL. I don't know whether it's possible to alter the existing item_id column to SERIAL, so you might have to drop the column and then add it back, something like this:

    ALTER TABLE yourTable DROP COLUMN item_id;
    ALTER TABLE yourTable ADD COLUMN item_id SERIAL;

If there is data in the item_id column already, it may not line up with the newly generated serial values anyway, so hopefully there is no harm in dropping it.

What's the PostgreSQL datatype equivalent to MySQL AUTO INCREMENT?

Yes, SERIAL is the PostgreSQL equivalent.

    CREATE TABLE foo (
        id SERIAL,
        bar varchar
    );

    INSERT INTO foo (bar) VALUES ('blah');
    INSERT INTO foo (bar) VALUES ('blah');

    SELECT * FROM foo;

    +----+------+
    | id | bar  |
    +----+------+
    |  1 | blah |
    |  2 | blah |
    +----+------+

SERIAL is just a create-table-time macro around sequences. You cannot alter an existing column into a SERIAL.
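Roughly speaking, per the PostgreSQL documentation, the SERIAL column above expands to something like this:

    CREATE SEQUENCE foo_id_seq;
    CREATE TABLE foo (
        id integer NOT NULL DEFAULT nextval('foo_id_seq'),
        bar varchar
    );
    ALTER SEQUENCE foo_id_seq OWNED BY foo.id;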

loading dataframe into table postgres and pandas with auto-incrementing id

Try using if_exists="append" in your to_sql() call. If you use "replace" instead, it may recreate the table using only the columns in the Excel file.


