Decompress Gz File Using R

Decompress gz file using R

If you really want to uncompress the file, just use the untar function, which supports gzip-compressed tar archives.
For example:

untar('chadwick-0.5.3.tar.gz')
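Note that untar only applies to tar archives; a plain .gz file that is not a tarball needs gzfile or a gunzip-style tool instead. As a minimal sketch of the untar workflow that runs without any download, the following builds a tiny .tar.gz in a temporary directory (the names demo.tar.gz, payload and example.csv are made up for illustration), lists its contents, and extracts it. Passing tar = "internal" is an assumption made here so that the example does not depend on a system tar binary.

```r
# Build a small .tar.gz in a temporary directory so the example is self-contained.
work <- tempfile("untar_demo_")
dir.create(work)
old_wd <- setwd(work)
dir.create("payload")
writeLines(c("id,value", "1,a", "2,b"), file.path("payload", "example.csv"))
tar("demo.tar.gz", files = "payload", compression = "gzip", tar = "internal")
setwd(old_wd)

archive <- file.path(work, "demo.tar.gz")

# Inspect the archive without extracting anything.
contents <- untar(archive, list = TRUE, tar = "internal")

# Extract into a separate directory rather than the working directory.
out_dir <- file.path(work, "extracted")
untar(archive, exdir = out_dir, tar = "internal")

extracted <- read.csv(file.path(out_dir, "payload", "example.csv"))
```

Using exdir keeps the extracted files out of the current working directory, which is usually what you want in scripts.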

Decompress .tif.gz file using R

The following works for me:

url <- paste0("https://data.chc.ucsb.edu/products/CHIRPS-2.0/",
              "global_daily/tifs/p05/1981/chirps-v2.0.1981.01.01.tif.gz")
download.file(url, "chirps-v2.0.1981.01.01.tif.gz")
R.utils::gunzip("chirps-v2.0.1981.01.01.tif.gz", remove = FALSE)

File: chirps-v2.0.1981.01.01.tif
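If R.utils is not installed, roughly the same result can be had with base R by streaming bytes through a gzfile connection. The sketch below is not the R.utils implementation; gunzip_base is a hypothetical helper name, and the demo file is generated locally rather than downloaded. Binary mode ("rb"/"wb") matters for files such as .tif.

```r
# Hypothetical helper: decompress a plain .gz file using only base R.
gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk = 65536L) {
  inn <- gzfile(src, open = "rb")
  on.exit(close(inn), add = TRUE)
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    buf <- readBin(inn, what = "raw", n = chunk)
    if (length(buf) == 0L) break   # end of compressed stream
    writeBin(buf, out)
  }
  invisible(dest)
}

# Demo on a locally generated binary file (stands in for the .tif.gz).
src <- tempfile(fileext = ".bin.gz")
payload <- as.raw(rep(0:255, 20))
con <- gzfile(src, "wb")
writeBin(payload, con)
close(con)

dest <- gunzip_base(src)
restored <- readBin(dest, what = "raw", n = file.size(dest))
```

Like gunzip(..., remove = FALSE), this leaves the compressed original in place.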


Downloading and extracting .gz data file using R

Some additional options, with base R:

url <- "http://cbio.mskcc.org/microrna_data/human_predictions_S_C_aug2010.txt.gz"
tmp <- tempfile()
##
download.file(url,tmp)
##
data <- read.csv(
  gzfile(tmp),
  sep="\t",
  header=TRUE,
  stringsAsFactors=FALSE)
names(data)[1] <- sub("X\\.","",names(data)[1])
##
R> head(data)
   mirbase_acc mirna_name gene_id gene_symbol transcript_id ext_transcript_id           mirna_alignment
1 MIMAT0000062 hsa-let-7a    5270    SERPINE2    uc002vnu.2         NM_006216   uuGAUAUGUUGGAUGAU-GGAGu
2 MIMAT0000062 hsa-let-7a  494188      FBXO47    uc002hrc.2      NM_001008777 uugaUA-UGUU--GGAUGAUGGAGu
3 MIMAT0000062 hsa-let-7a   80025       PANK2    uc002wkc.2         NM_153638   uugauaUGUUGG-AUGAUGGAgu
4 MIMAT0000062 hsa-let-7a   26036      ZNF451    uc003pdp.2          AK027074    uuGAUAUGUUGGAUGAUGGAGu
5 MIMAT0000062 hsa-let-7a     586       BCAT1    uc001rgd.3         NM_005504    uugaUAUGUUGGAUGAUGGAGu
6 MIMAT0000062 hsa-let-7a   22903       BTBD3    uc002wnz.2         NM_014962  uuGAUAUGUUGGAU-GAUGG-AGu
                  alignment            gene_alignment mirna_start mirna_end gene_start gene_end
1     | :|: ||:|| ||| ||||    aaCGGUGAAAUCU-CUAGCCUCu           2        21        495      516
2     || |||:  ::||||||||:  acaaAUCACAGUUUUUACUACCUUc           2        19        459      483
3         |::||: ||||||||     aauuucAUGACUGUACUACCUga           3        17         77       99
4      || || |   | |||||||     ccCUCUAGA---UUCUACCUCa           2        21       1282     1300
5        :|| |:   ||||||||     guagGUAAAGGAAACUACCUCa           2        19       6410     6431
6    || || ||| || ||||| ||   uaCUUUAAAACAUAUCUACCAUCu           2        21       2265     2288
              genome_coordinates conservation align_score seed_cat energy mirsvr_score
1 [hg19:2:224840068-224840089:-]       0.5684         122        0 -14.73      -0.7269
2  [hg19:17:37092945-37092969:-]       0.6464         140        0 -16.38      -0.1156
3    [hg19:20:3904018-3904040:+]       0.6522         139        0 -16.04      -0.2066
4   [hg19:6:56966300-56966318:+]       0.7627         144        7 -14.51      -0.8609
5  [hg19:12:24964511-24964532:-]       0.6775         150        7 -15.09      -0.2735
6  [hg19:20:11906579-11906602:+]       0.5740         131        0 -12.59      -0.2540
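A couple of variations on the above, sketched on a small locally written file instead of the real download (the three columns and two rows are cut down from the output shown; the temp file is an assumption for illustration). read.delim already defaults to tab-separated input with a header, a plain file path to a gzip-compressed file can be passed directly because R detects the compression when opening it for reading, and check.names = FALSE keeps the leading "#" in the first column name so it can be stripped without the "X." detour.

```r
# Write a tiny gzip-compressed, tab-separated file with a "#"-prefixed header.
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, "wt")
writeLines(c("#mirbase_acc\tgene_symbol\talign_score",
             "MIMAT0000062\tSERPINE2\t122",
             "MIMAT0000062\tFBXO47\t140"), con)
close(con)

# Explicit gzfile() connection, as in the answer above.
via_gzfile <- read.delim(gzfile(path), stringsAsFactors = FALSE,
                         check.names = FALSE)

# Plain path: the compression is detected automatically on read.
via_path <- read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)

# Strip the leading "#" from the first column name.
names(via_gzfile) <- sub("^#", "", names(via_gzfile))
names(via_path) <- sub("^#", "", names(via_path))
```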

Or if you are on a Unix-like system, you could obtain the .txt file (either outside of R or using system or system2 from within R) like this:

[nathan@nrussell tmp]$ url="http://cbio.mskcc.org/microrna_data/human_predictions_S_C_aug2010.txt.gz"
[nathan@nrussell tmp]$ wget "$url" && gunzip human_predictions_S_C_aug2010.txt.gz

and then proceed as above, where you are reading in human_predictions_S_C_aug2010.txt from wherever wget and gunzip were executed,

data <- read.csv(
  "~/tmp/human_predictions_S_C_aug2010.txt",
  stringsAsFactors=FALSE,
  header=TRUE,
  sep="\t")

in my case.

Using R to download gzipped data file, extract, and import data

I like Ramnath's approach, but I would use temp files like so:

tmpdir <- tempdir()

url <- 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
file <- basename(url)
download.file(url, file)

untar(file, compressed = 'gzip', exdir = tmpdir )
list.files(tmpdir)

The list.files() call should produce something like this:

[1] "TicDataDescr.txt" "dictionary.txt"   "ticdata2000.txt"  "ticeval2000.txt"  "tictgts2000.txt"

which you could parse if you needed to automate this process for a lot of files.
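One way that automation might look, as a rough sketch: loop over a vector of archive paths, extract each into its own subdirectory, and collect the resulting file lists. The helper names extract_archives and make_archive are hypothetical, the two demo archives are created locally rather than downloaded, and tar = "internal" is assumed so the example does not rely on a system tar.

```r
# Hypothetical helper: extract each .tar.gz into its own folder under `root`
# and return the extracted file paths, keyed by archive name.
extract_archives <- function(archives, root = tempfile("extract_")) {
  dir.create(root, recursive = TRUE, showWarnings = FALSE)
  out <- lapply(archives, function(a) {
    dest <- file.path(root, sub("\\.tar\\.gz$", "", basename(a)))
    untar(a, exdir = dest, tar = "internal")
    list.files(dest, recursive = TRUE, full.names = TRUE)
  })
  names(out) <- basename(archives)
  out
}

# Demo only: build a one-file archive so the example runs offline.
make_archive <- function(name, lines) {
  src <- tempfile("src_")
  dir.create(src)
  writeLines(lines, file.path(src, paste0(name, ".txt")))
  old <- setwd(src)
  on.exit(setwd(old))
  tar(paste0(name, ".tar.gz"), files = paste0(name, ".txt"),
      compression = "gzip", tar = "internal")
  file.path(src, paste0(name, ".tar.gz"))
}

archives <- c(make_archive("a", "one"), make_archive("b", c("two", "three")))
extracted <- extract_archives(archives)
```

Giving each archive its own destination folder avoids name clashes when several archives contain files with the same name.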

unzip a tar.gz file?

fn <- "http://s.wordpress.org/resources/survey/wp2011-survey.tar.gz"
download.file(fn,destfile="tmp.tar.gz")
untar("tmp.tar.gz",list=TRUE)  ## check contents
untar("tmp.tar.gz")
## or, if you just want to extract the target file:
untar("tmp.tar.gz",files="wp2011-survey/anon-data.csv")
X <- read.csv("wp2011-survey/anon-data.csv")

Tom Wenseleers points out that the archive package can help with this:

library(archive)
library(readr)
read_csv(archive_read("tmp.tar.gz", file = 3), col_types = cols())

and that archive::archive_extract("tmp.tar.gz", files="wp2011-survey/anon-data.csv") is quite a bit faster than the built-in base R untar (especially for large archives). It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.
