Is it possible to insert (add) a row to a SQLite db table using dplyr package?
You can perform SQL operations on a database/table created via dplyr, but you have to drop down to RSQLite/DBI calls and change how you make the database/table:
library(dplyr)
my_db <- src_sqlite("my_db.sqlite3", create=TRUE)
copy_to(my_db, iris, "my_table", temporary=FALSE) # need to set temporary to FALSE
# grab the db connection from the object created by src_sqlite
# and issue the INSERT that way
res <- dbSendQuery(my_db$con,
                   "INSERT INTO my_table VALUES (9.9, 9.9, 9.9, 9.9, 'new')")
dbClearResult(res)  # release the result set when done
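With current DBI, the same insert can be written as a parameterized statement via dbExecute(), which binds the values for you (no manual quoting) and doesn't leave a result set open. A minimal sketch, using dbConnect() directly instead of the now-deprecated src_sqlite(); the table name my_table is just illustrative:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table", iris)

# Parameterized INSERT: DBI binds the values, so no manual quoting is needed
dbExecute(
  con,
  "INSERT INTO my_table VALUES (:sl, :sw, :pl, :pw, :sp)",
  params = list(sl = 9.9, sw = 9.9, pl = 9.9, pw = 9.9, sp = "new")
)

n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM my_table")$n  # 151
dbDisconnect(con)
```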
Can't append data to sqlite3 with dplyr db_write_table()
No, you shouldn't use db_write_table() instead of db_insert_table(), since that can't be generalized across backends. And you shouldn't use the tidyverse versions rather than the corresponding DBI:: functions, since the tidyverse helpers are for internal use and not designed to be robust enough for end users. See the discussion at https://github.com/tidyverse/dplyr/issues/3120#issuecomment-339034612 :
Actually, I don't think you should be using these functions at all. Despite that SO post, these are not user facing functions. You should be calling the DBI functions directly.
-- Hadley Wickham, package author.
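Following that advice, the DBI function for appending rows is dbAppendTable() (DBI >= 1.0), which takes a data frame whose columns match the target table. A sketch against a throwaway SQLite table:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table", head(iris))

# The data frame's columns must match the table's columns
new_rows <- data.frame(
  Sepal.Length = 9.9, Sepal.Width = 9.9,
  Petal.Length = 9.9, Petal.Width = 9.9,
  Species = "new"
)
dbAppendTable(con, "my_table", new_rows)  # returns the number of rows appended

n_rows <- nrow(dbReadTable(con, "my_table"))  # 6 original rows + 1 appended
dbDisconnect(con)
```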
Adding column to sqlite database
I'm not aware of a way of doing this with dplyr, but you can do it with RSQLite directly. The problem is not actually with RSQLite, but the fact that I don't know how to pass a list to mutate. Note that, in your code, something like this would work:
cars_tbl %>% mutate(new_col = another_column / 3.14)
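Note that mutate() on a lazy tbl only adds the column to the generated query; it does not change the table stored in the database. A quick sketch to illustrate:

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
cars <- data.frame(year = c(1999, 2007, 2009, 2017),
                   model = c("Ford", "Toyota", "Toyota", "BMW"))
DBI::dbWriteTable(con, "cars", cars)

cars_tbl <- tbl(con, "cars")
cars_tbl %>% mutate(new_col = year / 3.14)  # column exists only in this query

on_disk <- DBI::dbGetQuery(con, "SELECT * FROM cars")
names(on_disk)  # still just "year" and "model"
DBI::dbDisconnect(con)
```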
Anyway, here is my alternative. I've created a toy cars data frame.
cars <- data.frame(year=c(1999, 2007, 2009, 2017), model=c("Ford", "Toyota", "Toyota", "BMW"))
I open a connection and actually create the table,
dbcon <- dbConnect(RSQLite::SQLite(), "cars.db")
dbWriteTable(dbcon, name = "cars", value = cars)
Add the new column and check,
dbGetQuery(dbcon, "ALTER TABLE cars ADD COLUMN new_col TEXT")
dbGetQuery(dbcon, "SELECT * FROM cars")
year model new_col
1 1999 Ford <NA>
2 2007 Toyota <NA>
3 2009 Toyota <NA>
4 2017 BMW <NA>
And then you can update the new column. The only tricky thing is that you have to provide a WHERE clause; in this case I use the year.
new_values <- sample(c("A","B","C"), nrow(cars), replace = TRUE)
new_values
[1] "C" "B" "B" "B"
dbGetPreparedQuery(dbcon, "UPDATE cars SET new_col = ? where year=?",
bind.data=data.frame(new_col=new_values,
year=cars$year))
dbGetQuery(dbcon, "SELECT * FROM cars")
year model new_col
1 1999 Ford C
2 2007 Toyota B
3 2009 Toyota B
4 2017 BMW B
As a unique index you could always use rownames(cars), but you would have to add it as a column in your data frame and then in your table.
EDIT after suggestion by @krlmlr: indeed it is much better to use dbExecute instead of the deprecated dbGetPreparedQuery,
dbExecute(dbcon, "UPDATE cars SET new_col = :new_col where year = :year",
params=data.frame(new_col=new_values,
year=cars$year))
EDIT after comments: I did not think about this a few days ago, but even though it is SQLite you can use the rowid. I've tested this and it works.
dbExecute(dbcon, "UPDATE cars SET new_col = :new_col where rowid = :id",
params=data.frame(new_col=new_values,
id=rownames(cars)))
Although you have to make sure that the rowids in the table are the same as your rownames. Anyway, you can always get your rowids like this:
dbGetQuery(dbcon, "SELECT rowid, * FROM cars")
rowid year model new_col
1 1 1999 Ford C
2 2 2007 Toyota B
3 3 2009 Toyota B
4 4 2017 BMW B
Select specific rows from sqlite database using dbplyr in R
filter() works with a SQLite db:
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, iris, "iris")
iris_db <- tbl(con, "iris")
iris_db %>%
filter(row_number() %in% c(1, 8, 20))
# Source: lazy query [?? x 5]
# Database: sqlite 3.29.0 [:memory:]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <chr>
#1 5.1 3.5 1.4 0.2 setosa
#2 5 3.4 1.5 0.2 setosa
#3 5.1 3.8 1.5 0.3 setosa
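If you want to see what dbplyr sends to the database, add show_query(); row_number() translates to a window function, so this relies on a reasonably recent SQLite (the one bundled with RSQLite is fine). A sketch:

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, iris, "iris")
iris_db <- tbl(con, "iris")

query <- iris_db %>% filter(row_number() %in% c(1, 8, 20))
show_query(query)  # prints the translated SQL

res <- collect(query)  # bring the 3 selected rows into R
DBI::dbDisconnect(con)
```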
How can I append data to a PostgreSQL table with `dplyr` without `collect()`?
dplyr does not include commands to insert or update records in a database, so there is no complete native dplyr solution for this. But you can combine dplyr with regular SQL statements to avoid bringing the data into R.
Let's start by reproducing your steps before the collect() statement:
library(dplyr)
pg <- src_postgres()
reg_data <- tbl(pg, "reg_data")
reg_results <-
reg_data %>%
summarize(r_squared=regr_r2(y, x),
num_obs=regr_count(y, x),
constant=regr_intercept(y, x),
slope=regr_slope(y, x),
mean_analyst_fog=regr_avgx(y, x),
mean_manager_fog=regr_avgy(y, x))
Now, you could use compute() instead of collect() to create a temporary table in the database:
temp.table.name <- paste0(sample(letters, 10, replace = TRUE), collapse = "")
reg_results <- reg_results %>% compute(name=temp.table.name)
Here, temp.table.name is a random table name. Passing name = temp.table.name to compute() assigns this random name to the temporary table being created.
Now, we will use the RPostgreSQL package to create an insert query that uses the results stored in the temporary table. As the temporary table only lives in the connection created by src_postgres(), we need to reuse that connection.
library(RPostgreSQL)
copyconn <- pg$con
class(copyconn) <- "PostgreSQLConnection" # I get an error if I don't fix the class
Finally, the insert query:
sql <- paste0("INSERT INTO destination_table SELECT * FROM ", temp.table.name, ";")
dbSendQuery(copyconn, sql)
So, everything is happening in the database and the data is not brought into R.
EDIT: Previous versions of this post broke encapsulation by extracting the temporary table's name from reg_results. This is avoided by passing the name = option to compute().
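The same compute()-then-INSERT pattern can be sketched end to end against SQLite (the names mtcars_db, tmp_summary, and destination_table are illustrative; with PostgreSQL only the connection setup differs):

```r
library(dplyr)
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars_db", temporary = FALSE)
dbExecute(con, "CREATE TABLE destination_table (cyl REAL, mean_mpg REAL)")

# A lazy summary query, analogous to reg_results
summary_q <- tbl(con, "mtcars_db") %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE))

# Materialize it as a temporary table with a name we control
summary_q <- compute(summary_q, name = "tmp_summary")

# Append from the temporary table without bringing the data into R
dbExecute(con, "INSERT INTO destination_table SELECT * FROM tmp_summary")
dest <- dbGetQuery(con, "SELECT * FROM destination_table")  # one row per cyl value
dbDisconnect(con)
```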
dplyr:: append to postgresql remote source if uid not present
Per @hadley's comment to this answer, you can use dbWriteTable(append = TRUE)
. Here's an example with SQLite:
library(dplyr)
# Create some example data
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dd <- data.frame(uid = 1:20, value = LETTERS[1:20])
dat <- data.frame(uid = 15:26, value = LETTERS[15:26])
copy_to(con, dd, "my_data",
temporary = FALSE,
indexes = list("uid", "value")
)
# Identify rows to append
dd_db <- tbl(con, "my_data")
dd_uid <- select(dd_db, uid) %>%
collect() %>%
unlist()
dat_to_append <- filter(dat, !(uid %in% dd_uid))
DBI::dbWriteTable(con, "my_data", dat_to_append, append = TRUE)
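Alternatively, the comparison itself can stay in the database: upload the candidate rows to a temporary table and use anti_join() to keep only the uids not already present. A sketch with the same example data (dat_tmp is an illustrative name):

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dd <- data.frame(uid = 1:20, value = LETTERS[1:20])
dat <- data.frame(uid = 15:26, value = LETTERS[15:26])
copy_to(con, dd, "my_data", temporary = FALSE)

dd_db <- tbl(con, "my_data")
dat_db <- copy_to(con, dat, "dat_tmp")  # a temporary table by default

# Rows of dat whose uid is not already in my_data, computed by SQLite
to_append <- anti_join(dat_db, dd_db, by = "uid") %>% collect()
DBI::dbWriteTable(con, "my_data", as.data.frame(to_append), append = TRUE)

n_after <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM my_data")$n  # 20 + 6
DBI::dbDisconnect(con)
```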