Using R to Download Newest Files from an FTP Server

Using R to download the newest files from an FTP server

This should work:

library(RCurl)
url <- "ftp://yourServer/"            # keep the trailing slash so the file URLs built below are valid
userpwd <- "yourUser:yourPass"
filenames <- getURL(url, userpwd = userpwd,
                    ftp.use.epsv = FALSE, dirlistonly = TRUE)
# the listing comes back as a single string; split it into a vector of file names
filenames <- strsplit(filenames, "\r*\n")[[1]]

times <- lapply(strsplit(filenames, "[-.]"), function(x){
  # rebuild "YYYY-MM-DD-HH-MM-SS" from the pieces of the file name
  time <- paste(c(substr(x[1], nchar(x[1]) - 3, nchar(x[1])), x[2:6]),
                collapse = "-")
  as.POSIXct(time, format = "%Y-%m-%d-%H-%M-%S", tz = "GMT")
})
ind <- which.max(times)
dat <- try(getURL(paste(url, filenames[ind], sep = ""), userpwd = userpwd))

So dat now contains the contents of the newest file.
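
Since getURL() returns the file body as a single character string, a short follow-up sketch (assuming the newest file really is a plain CSV) parses dat into a data frame:

# 'dat' holds the CSV text; turn it into a data frame
if (!inherits(dat, "try-error")) {
  newest <- read.csv(text = dat, stringsAsFactors = FALSE)
}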

To make it reproducible, everyone else can use this example vector in place of the directory listing above:

filenames <- c("FileA2014-03-05-10-24-12.csv", "FileB2014-03-06-10-25-12.csv")

Import File from FTP to R

I think you get this error message because the function download.file() has no argument named credentials.

I would try passing the credentials in the URL itself:

url = "ftp://username:password@ftppath/www_logs/testfolder/test.csv"
download.file(url, destfile = "test.csv")

If you want to load the file into an R data.frame, you could try something like this:

library(RCurl) 
url <- "ftp://ftppath/www_logs/testfolder/test.csv"
text_data <- getURL(url, userpwd = "username:password", connecttimeout = 60)
df <- read.csv(text = text_data)
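
If the target is not plain text (say, a zipped log file), a similar sketch with RCurl's getBinaryURL() should work, using the same credentials; the test.zip path here is just a hypothetical stand-in:

library(RCurl)
bin_url  <- "ftp://ftppath/www_logs/testfolder/test.zip"  # hypothetical binary file
bin_data <- getBinaryURL(bin_url, userpwd = "username:password", connecttimeout = 60)
writeBin(bin_data, "test.zip")                            # save the raw bytes locally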

Downloading files from FTP with R

Use the curl package to retrieve the directory listing:

> library(curl)
> url = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/"
> h = new_handle(dirlistonly=TRUE)
> con = curl(url, "r", h)
> tbl = read.table(con, stringsAsFactors=TRUE, fill=TRUE)
> close(con)
> head(tbl)
V1
1 12-0210_Druart_Uterus_J0N-Co_1a_ORBI856.raw.mzML
2 12-0210_Druart_Uterus_J0N-Co_2a_ORBI857.raw.mzML
3 12-0210_Druart_Uterus_J0N-Co_3a_ORBI858.raw.mzML
4 12-0210_Druart_Uterus_J10N-Co_1a_ORBI859.raw.mzML
5 12-0210_Druart_Uterus_J10N-Co_2a_ORBI860.raw.mzML
6 12-0210_Druart_Uterus_J10N-Co_3a_ORBI861.raw.mzML

Paste the relevant file names onto the URL and use:

urls <- paste0(url, tbl[1:5,1])
fls = basename(urls)
curl_fetch_disk(urls[1], fls[1])
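
To fetch all of the selected files rather than just the first, the same call can be looped over both vectors; a minimal sketch reusing the urls and fls built above:

# download every selected file into the working directory
for (i in seq_along(urls)) {
  curl_fetch_disk(urls[i], fls[i])
}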

Using R to access FTP Server and Download Files Results in Status 530 Not logged in

I think you're going to need a more defensive strategy when working with this FTP server:

library(curl)  # ++gd > RCurl
library(purrr) # consistent "data first" functional & piping idioms FTW
library(dplyr) # progress bar

# We'll use this to fill in the years
ftp_base <- "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/%s/"

dir_list_handle <- new_handle(ftp_use_epsv=FALSE, dirlistonly=TRUE, crlf=TRUE,
                              ssl_verifypeer=FALSE, ftp_response_timeout=30)

# Since you, yourself, noted the server was perhaps behaving strangely or under load
# it's prbly a much better idea (and a practice of good netizenship) to cache the
# results somewhere predictable rather than a temporary, ephemeral directory
cache_dir <- "./gsod_cache"
dir.create(cache_dir, showWarnings=FALSE)

# Given the sporadic efficacy of server connection, we'll wrap our calls
# in safe & retry functions. Change this variable if you want to have it retry
# more times.
MAX_RETRIES <- 6

# Wrapping the memory fetcher (for dir listings)
s_curl_fetch_memory <- safely(curl_fetch_memory)
retry_cfm <- function(url, handle) {

  i <- 0
  repeat {
    i <- i + 1
    res <- s_curl_fetch_memory(url, handle=handle)
    if (!is.null(res$result)) return(res$result)
    if (i==MAX_RETRIES) { stop("Too many retries...server may be under load") }
  }

}

# Wrapping the disk writer (for the actual files)
# Note the use of the cache dir. It won't waste your bandwidth or the
# server's bandwidth or CPU if the file has already been retrieved.
s_curl_fetch_disk <- safely(curl_fetch_disk)
retry_cfd <- function(url, path) {

  # you should prbly be a bit more thorough than `basename` since
  # i think there are issues with the 1971 and 1972 filenames.
  # Gotta leave some work up to the OP
  cache_file <- sprintf("%s/%s", cache_dir, basename(url))
  if (file.exists(cache_file)) return()

  i <- 0
  repeat {
    i <- i + 1
    if (i==MAX_RETRIES) { stop("Too many retries...server may be under load") }
    res <- s_curl_fetch_disk(url, cache_file)
    if (!is.null(res$result)) return()
  }

}

# the stations and years
station <- c("983240-99999", "983250-99999", "983270-99999", "983280-99999",
"984260-41231", "984290-99999", "984300-99999", "984320-99999",
"984330-99999")
years <- 1960:2016

# progress indicators are like bowties: cool
pb <- progress_estimated(length(years))
walk(years, function(yr) {

  # the year we're working on
  year_url <- sprintf(ftp_base, yr)

  # fetch the directory listing
  tmp <- retry_cfm(year_url, handle=dir_list_handle)
  con <- rawConnection(tmp$content)
  fils <- readLines(con)
  close(con)

  # sift out only the target stations
  map(station, ~grep(., fils, value=TRUE)) %>%
    keep(~length(.)>0) %>%
    flatten_chr() -> fils

  # grab the stations files
  walk(paste(year_url, fils, sep=""), retry_cfd)

  # tick off progress
  pb$tick()$print()

})

You may also want to set curl_interrupt to TRUE in the curl handle if you want to be able to stop/esc/interrupt the downloads.
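
After the loop has finished (or been interrupted and restarted), a quick look at the cache directory defined above shows what was actually retrieved; already-cached files are skipped on a re-run:

length(list.files(cache_dir))  # number of cached station files
head(list.files(cache_dir))    # a few of the file names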

Download or Connect CSV data from FTP with R

For an SFTP server I use the getURL() function from the RCurl package.

With your input, and an SFTP server, the first call below would download the file.

Maybe it is similar for a regular FTP connection (second call)?

# sftp
RCurl::getURL(url = "sftp://user123:Test123@example.com/my_file/example.csv")

# maybe works for a ftp connection
RCurl::getURL(url = "ftp://user123:Test123@example.com/my_file/example.csv")

hope it helps :)


