How to open a huge .sql file
A SQL editor can open a file of up to about 500 MB even on a machine without very good specs, so something else seems to be going wrong. If you want to copy data from one database to another, try the Import/Export Wizard, an SSIS package, or a command-line utility. Generating a table script and then running the inserts is not a good approach.
How to open and work with a very large .SQL file that was generated in a dump?
Two things that might be helpful:
- Use pv to see how much of the .sql file has already been read. This gives you a progress bar, which at least tells you it's not stuck.
- Log into MySQL and use SHOW PROCESSLIST to see what MySQL is currently executing. If it's still running, just let it run to completion.
If binary logging is turned on, it may really help to turn off the binlog for the duration of the restore. Another thing that may or may not help: if you have the choice, use the fastest disks available. You may have that option if you're running on a hosting provider like Amazon. You're going to really feel the pain if you're doing this on, for example, a standard EC2 host.
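The progress idea behind pv can also be sketched in a few lines of Python, in case pv is not available. This is only an illustration under assumptions: the function name read_with_progress and the chunk size are made up here, and the demo writes a small temporary file as a stand-in for a real dump.

```python
import os
import tempfile


def read_with_progress(path, chunk_size=1024 * 1024):
    """Yield (chunk, fraction_read) pairs while streaming a file in binary chunks."""
    total = os.path.getsize(path)
    done = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            done += len(chunk)
            # Guard against division by zero for an empty file.
            yield chunk, (done / total if total else 1.0)


if __name__ == "__main__":
    # Hypothetical stand-in for a real dump file.
    with tempfile.NamedTemporaryFile("wb", suffix=".sql", delete=False) as tmp:
        tmp.write(b"INSERT INTO t VALUES (1);\n" * 1000)
    for _, fraction in read_with_progress(tmp.name, chunk_size=8192):
        print(f"\r{fraction:6.1%} read", end="")
    print()
    os.remove(tmp.name)
```

Because only one chunk is held in memory at a time, the same loop works for a dump of any size; each chunk could be forwarded to whatever consumes the SQL.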
Edit very large sql dump/text file (on linux)
Rather than removing the first few lines, try editing them to be whitespace.
The hexedit program can do this: it reads files in chunks, so opening a 10GB file is no different from opening a 100KB file to it.
$ hexedit largefile.sql.dump
tab (switch to ASCII side)
space (repeat as needed until your header is gone)
F2 (save)/Ctrl-X (save and exit)/Ctrl-C (exit without saving)
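If you would rather script the same trick than type spaces by hand, the following Python sketch overwrites the start of a file in place. It is an assumption-laden illustration, not part of hexedit: the helper name blank_prefix is hypothetical, and you have to work out the number of header bytes yourself.

```python
import os
import tempfile


def blank_prefix(path, n_bytes):
    """Overwrite the first n_bytes of a file with spaces, keeping line breaks."""
    with open(path, "r+b") as handle:
        head = handle.read(n_bytes)
        # Keep CR/LF bytes so line numbers in later error messages still match.
        blanked = bytes(b if b in (0x0A, 0x0D) else 0x20 for b in head)
        handle.seek(0)
        handle.write(blanked)


if __name__ == "__main__":
    # Hypothetical stand-in for a large dump with an unwanted header line.
    with tempfile.NamedTemporaryFile("wb", suffix=".sql", delete=False) as tmp:
        tmp.write(b"-- header\nSELECT 1;\n")
    blank_prefix(tmp.name, 9)
    with open(tmp.name, "rb") as handle:
        print(handle.read())
    os.remove(tmp.name)
```

Only the first n_bytes are read and rewritten, so the file size stays the same and the rest of the file is never loaded, which is what makes this practical on multi-gigabyte dumps.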
How do you import a large MS SQL .sql file?
From the command prompt, start up sqlcmd:
sqlcmd -S <server> -i C:\<your file here>.sql
Just replace <server> with the location of your SQL box and <your file here> with the name of your script. Don't forget, if you're using a named SQL instance the syntax is:
sqlcmd -S <server>\instance
Here is the list of all arguments you can pass sqlcmd:
Sqlcmd [-U login id] [-P password]
[-S server] [-H hostname] [-E trusted connection]
[-d use database name] [-l login timeout] [-t query timeout]
[-h headers] [-s colseparator] [-w screen width]
[-a packetsize] [-e echo input] [-I Enable Quoted Identifiers]
[-c cmdend] [-L[c] list servers[clean output]]
[-q "cmdline query"] [-Q "cmdline query" and exit]
[-m errorlevel] [-V severitylevel] [-W remove trailing spaces]
[-u unicode output] [-r[0|1] msgs to stderr]
[-i inputfile] [-o outputfile] [-z new password]
[-f <codepage> | i:<codepage>[,o:<codepage>]] [-Z new password and exit]
[-k[1|2] remove[replace] control characters]
[-y variable length type display width]
[-Y fixed length type display width]
[-p[1] print statistics[colon format]]
[-R use client regional setting]
[-b On error batch abort]
[-v var = "value"...] [-A dedicated admin connection]
[-X[1] disable commands, startup script, environment variables [and exit]]
[-x disable variable substitution]
[-? show syntax summary]
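If the import needs to be automated, the command can be assembled and launched from a script. The Python sketch below uses only flags from the listing above (-S, -E, -b, -i, -d). The function names, the server and instance names, and the choice of a trusted connection are assumptions for illustration.

```python
import subprocess


def build_sqlcmd_args(server, script_path, database=None, instance=None):
    """Build an argument list for sqlcmd using flags from the usage listing."""
    # A named instance is addressed as <server>\<instance>.
    target = f"{server}\\{instance}" if instance else server
    # -E: trusted connection, -b: abort the batch on error, -i: input file.
    args = ["sqlcmd", "-S", target, "-E", "-b", "-i", script_path]
    if database:
        args += ["-d", database]
    return args


def run_script(server, script_path, **kwargs):
    """Run the script; raises CalledProcessError if sqlcmd exits non-zero."""
    return subprocess.run(build_sqlcmd_args(server, script_path, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical server, instance, database, and path; only prints the command.
    print(build_sqlcmd_args("myhost", r"C:\dump.sql", database="Sales", instance="SQLEXPRESS"))
```

Passing the arguments as a list avoids shell quoting problems with backslashes and spaces in the script path.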
Manipulation of Large Files in R
Like @Dominic Comtois, I would also recommend using SQL.
R can handle fairly big data (there is a nice benchmark on 2 billion rows in which it beats Python), but because R works mostly in memory you need a good machine to make it work. Still, your case doesn't require loading more than a 4.5 GB file at once, so it should be quite doable on a personal computer; see the second approach for a fast non-database solution.
You can use R to load the data into a SQL database and later query it from the database.
If you don't know SQL, you may want to use a simple database. The simplest way from R is to use RSQLite (unfortunately, since v1.1 it is not lite any more). You don't need to install or manage any external dependency: the RSQLite package contains the embedded database engine.
library(RSQLite)
library(data.table)
conn <- dbConnect(dbDriver("SQLite"), dbname="mydbfile.db")
monthfiles <- c("month1","month2") # ...
# write data
for(monthfile in monthfiles){
  dbWriteTable(conn, "mytablename", fread(monthfile), append=TRUE)
  cat("data for",monthfile,"loaded to db\n")
}
# query data
df <- dbGetQuery(conn, "select * from mytablename where customerid = 1")
# for bigger result sets I would recommend converting to a data.table by reference
setDT(df)
dbDisconnect(conn)
That's all. You get to use SQL without much of the overhead usually associated with databases.
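For readers outside R, the same load-then-query pattern can be sketched with Python's built-in sqlite3 module. This is an assumption-heavy illustration rather than a translation of the R code: the helper name load_csv is hypothetical, every file is assumed to be a CSV with a header row, and all values are stored as text.

```python
import csv
import sqlite3


def load_csv(conn, table, csv_path):
    """Append the rows of one CSV file (with a header row) to a SQLite table."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        cols = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        # Columns are created without types, so values are stored as text.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        # The reader is consumed row by row; the file is never fully in memory.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()


def rows_for_customer(conn, table, customer_id):
    """Fetch all rows for one customer; the id is compared as text."""
    query = f'SELECT * FROM "{table}" WHERE customerid = ?'
    return conn.execute(query, (str(customer_id),)).fetchall()
```

A call sequence would look like conn = sqlite3.connect("mydbfile.db"), then load_csv(conn, "mytablename", path) once per monthly file, then rows_for_customer(conn, "mytablename", 1).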
If you prefer to stick with the approach from your post, I think you can speed it up dramatically by writing the CSV files by group during aggregation in data.table.
library(data.table)
monthfiles <- c("month1","month2") # ...
# write data
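for(monthfile in monthfiles){
  # write.csv ignores append= (with a warning) and would overwrite each month,
  # so use write.table and emit the header only when the file is first created
  fread(monthfile)[, write.table(.SD, file=paste0(CustomerID,".csv"), sep=",", row.names=FALSE,
                                 col.names=!file.exists(paste0(CustomerID,".csv")), append=TRUE), by=CustomerID]
  cat("data for",monthfile,"written to csv\n")
}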
This way you use data.table's fast unique and do the subsetting while grouping, which is also very fast. Below is a working example of the approach.
library(data.table)
data.table(a=1:4,b=5:6)[,write.csv(.SD,file=paste0(b,".csv")),b]
Update 2016-12-05:
Starting from data.table 1.9.8+ you can replace write.csv with fwrite; there is an example in this answer.