Download All Files from a Folder on a Website

How do I download an HTTP directory, with all its files and sub-directories, as they appear in the online file/folder listing?

Solution:

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/

Explanation:

  • This downloads all files and subfolders of the ddd directory
  • -r : recurse into subdirectories
  • -np : do not ascend to parent directories such as ccc/
  • -nH : do not create a top-level folder named after the hostname
  • --cut-dirs=3 : save the content directly under ddd by omitting the first 3 path components aaa, bbb, ccc
  • -R index.html : exclude the index.html listing files

Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/

How do I download an HTTP directory, with all its files and sub-directories as they appear in the online file/folder listing, using Go?

The algorithm used by the wget --recursive implementation is described here: https://www.gnu.org/software/wget/manual/html_node/Recursive-Download.html

Basically, you fetch the page, parse the HTML, and follow each href link (and CSS link if necessary); the links can be extracted as shown here: https://vorozhko.net/get-all-links-from-html-page-with-go-lang.

Once you have all the links, request each one: based on the Content-Type header, save the response if it is not text/html, or parse it for further links if it is.
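A minimal sketch of that loop in Go is below. It assumes the golang.org/x/net/html package for parsing, follows only links that stay below the starting directory (roughly wget's -np behaviour), and saves non-HTML responses flat in the current directory. The crawl and extractLinks helpers are just illustrative names; depth limits, retries, and recreating the remote directory layout are left out.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
    "os"
    "path"
    "strings"

    "golang.org/x/net/html"
)

var visited = map[string]bool{}

// crawl fetches pageURL; HTML responses are parsed for further links under root,
// anything else is saved into the current directory under its base name.
func crawl(pageURL, root string) {
    if visited[pageURL] {
        return
    }
    visited[pageURL] = true

    resp, err := http.Get(pageURL)
    if err != nil {
        fmt.Fprintln(os.Stderr, "fetch:", err)
        return
    }
    defer resp.Body.Close()

    if strings.HasPrefix(resp.Header.Get("Content-Type"), "text/html") {
        for _, link := range extractLinks(resp.Body, pageURL) {
            // only follow links below the starting directory (like wget -np)
            if strings.HasPrefix(link, root) {
                crawl(link, root)
            }
        }
        return
    }

    // not HTML: save the body under its base name
    out, err := os.Create(path.Base(resp.Request.URL.Path))
    if err != nil {
        fmt.Fprintln(os.Stderr, "create:", err)
        return
    }
    defer out.Close()
    io.Copy(out, resp.Body)
}

// extractLinks returns the absolute form of every href in the document.
func extractLinks(r io.Reader, base string) []string {
    var links []string
    baseURL, err := url.Parse(base)
    if err != nil {
        return links
    }
    doc, err := html.Parse(r)
    if err != nil {
        return links
    }
    var walk func(*html.Node)
    walk = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key == "href" {
                    if ref, err := url.Parse(a.Val); err == nil {
                        links = append(links, baseURL.ResolveReference(ref).String())
                    }
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            walk(c)
        }
    }
    walk(doc)
    return links
}

func main() {
    // placeholder URL from the wget example above
    start := "http://hostname/aaa/bbb/ccc/ddd/"
    crawl(start, start)
}

After go get golang.org/x/net/html, run it with go run and point start at the directory listing you want to mirror.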

Download files from the URL of a site directory

Keep in mind: if the web server disallows directory listing, you cannot get the file names. But if the directory listing is enabled on the web server, you can get the list of files with the wget command. For example:

wget -nv -r -np -A '*.*' https://example.com/directory/

Download all CSV files from folders and subfolders at a URL in R

How about this:

#all year/month combinations
eg <- expand.grid(2012:2021,
                  c("01", "02", "03", "04", "05", "06",
                    "07", "08", "09", "10", "11", "12"))

#build the source URL and a local destination name for each combination
eg$url <- paste("http://tiservice.hii.or.th/opendata/data_catalog/daily_rain/",
                eg[, 1], "/",
                paste(eg[, 1], eg[, 2], sep = ""),
                "/ABRT.csv", sep = "")
eg$dest <- paste(eg[, 1], eg[, 2], "ABRT.csv", sep = "_")

#download each file
for (i in 1:nrow(eg)) {
  curl::curl_download(eg$url[i], eg$dest[i])
}

Download all the files (.zip and .txt) from a webpage using R

Here, try this. It works because the file URLs follow a repeatable pattern. Getting the file names out of the web page is a tad clunky, but it does seem to work.

Many of the text files may be missing an end-of-line marker (this is common) and may throw an error; however, it is probably not an important one. If that happens, open the downloaded .txt file to make sure it downloaded correctly. No doubt there is a way to automate that step, but I'm out of time for this one.

#get the page listing the files
page <- "https://pubs.usgs.gov/sir/2007/5107/downloads/"
a <- readLines(page)

#find the lines of interest
loc.txt <- grep(".txt", a)
loc.zip <- grep(".zip", a)

#A convenience function that takes
#  line   : a line from the original page
#  marker : the file-type marker used to locate the file name
#  page   : the URL of the original page
#------------------------------------
convfn <- function(line, marker, page){
  i  <- unlist(gregexpr(pattern = 'href="', line)) + 6
  i2 <- unlist(gregexpr(pattern = marker, line)) + 3
  #target file
  .destfile <- substring(line, i[1], i2[1])
  #target url (page already ends with "/")
  .url <- paste(page, .destfile, sep = "")
  #print targets
  cat(.url, '\n', .destfile, '\n')
  #the workhorse function
  download.file(url = .url, destfile = .destfile)
}
#--------------------------------------------

#files will be saved in your working directory
#use setwd() to change it if needed
print(getwd())

#get the .txt files and download them
sapply(a[loc.txt],
       FUN = convfn,
       marker = '.txt"', #this is the key part, it locates the text file name
       page = page)

#get the .zip files and download them
sapply(a[loc.zip],
       FUN = convfn,
       marker = '.zip"', #this is the key part, it locates the zip file name
       page = page)

