regex to remove the webpage part of a url in ruby
If your heart is set on using a regex and you know that your URLs will be fairly straightforward, you could use (.*)/.* to capture everything before the last / in your URL.
irb(main):007:0> url = "www.example.com/home/index.html"
=> "www.example.com/home/index.html"
irb(main):008:0> regex = "(.*)/.*"
=> "(.*)/.*"
irb(main):009:0> url =~ /#{regex}/
=> 0
irb(main):010:0> $1
=> "www.example.com/home"
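The same idea as a small reusable method, as a sketch. strip_page is a hypothetical helper name, and it assumes the input has no query string or fragment. Anchoring with \A and \z and using [^/]* for the last segment makes it explicit that only the final segment is dropped.

```ruby
# Hypothetical helper: drop everything after the last "/" in a simple URL.
# Assumes the URL has no query string or fragment.
def strip_page(url)
  # Greedy (.*) backtracks to the last "/", so group 1 is everything before it.
  match = url.match(%r{\A(.*)/[^/]*\z})
  match ? match[1] : url # no "/" at all: return the input unchanged
end

puts strip_page("www.example.com/home/index.html") # prints www.example.com/home
```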
How do I remove a URL from a string in Ruby?
That seems to be working fine for a regular string:
require 'uri' # URI::regexp comes from the uri standard library
my_str = "Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM\" \\l \"Top)"
puts "str before: #{my_str}" # => str before: Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (" \l "Top)
But yours might contain some garbage (non-printable) characters. Take, for instance, a random null character right before the first slash:
# vv - random null character
my_str = "Top (http:\0//www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM\" \\l \"Top)"
# looks the same vv
puts "str before: #{my_str}" # => str before: Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (//www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
Now, if you copy the printed output and paste it back in, it works again, because the invisible null character does not survive the copy and paste:
# Copied from the output of the line below `# looks the same vv`
my_str = 'Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)'
puts "str before: #{my_str}" # => str before: Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (" \l "Top)
So on our end it ends up looking as if it works. You might try removing all non-printable characters first and see whether that works for you:
my_str = "Top (http:\0//www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM\" \\l \"Top)"
my_str.gsub!(/[^[:print:]]/, '') # strip non-printable characters such as the null byte
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (" \l "Top)
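Both steps can be wrapped in one method. This is a sketch: remove_urls is a hypothetical name, and it deliberately swaps URI::regexp for a much simpler pattern (http:// or https:// followed by anything that is not whitespace or a double quote). That pattern is an assumption about what the URLs look like, not a full URI grammar.

```ruby
# Simplified URL pattern (assumption): scheme, "://", then no whitespace or '"'.
URL_PATTERN = %r{https?://[^\s"]+}

def remove_urls(str)
  # First drop non-printable characters (e.g. "\0"), then remove the URLs.
  str.gsub(/[^[:print:]]/, "").gsub(URL_PATTERN, "")
end

dirty = "Top (http:\0//www.example.com/Results/page.htm\" \\l \"Top)"
puts remove_urls(dirty) # prints Top (" \l "Top)
```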
remove hostname and port from url using regular expression
In JavaScript you can use this code:
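var url = "http://localhost:7001/www.facebook.com"; // avoid shadowing the global URL constructor
// Drop "http(s)://host:port/" and keep everything after it (ports have up to 5 digits)
var newURL = url.replace(/^https?:\/\/[^\/:]+:[0-9]{1,5}\/(.*)/, '$1');
alert(newURL); // www.facebook.com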
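Since the rest of this page is Ruby, here is a sketch of the same thing in Ruby, once with a regex and once with the standard URI library. The method names are hypothetical, and the regex version assumes an explicit port is always present; the URI version does not need one.

```ruby
require "uri"

# Regex route: delete "http(s)://host:port/" from the front of the string.
# Assumption: the URL always has an explicit port, as in the example.
def strip_host_and_port(url)
  url.sub(%r{\Ahttps?://[^/:]+:\d{1,5}/}, "")
end

# URI route: let the parser find the path, then drop its leading "/".
def path_without_slash(url)
  URI.parse(url).path.delete_prefix("/")
end

url = "http://localhost:7001/www.facebook.com"
puts strip_host_and_port(url) # prints www.facebook.com
puts path_without_slash(url)  # prints www.facebook.com
```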
Regards,
Victor
Ruby Regular expression to match a url
You can try this:
/https?:\/\/[\S]+/
The \S means any non-whitespace character.
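In Ruby, String#scan returns every match of that pattern. A small sketch with made-up sample text; note that \S+ is greedy, so trailing punctuation such as a full stop ends up in the match.

```ruby
# Match "http://" or "https://" followed by one or more non-whitespace characters.
URL_REGEX = %r{https?://\S+}

text = "Docs: https://www.ruby-lang.org/en/ and http://example.com."
urls = text.scan(URL_REGEX)
p urls # prints ["https://www.ruby-lang.org/en/", "http://example.com."]
```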
Getting parts of a URL (Regex)
A single regex to parse and break up a full URL, including query parameters and anchors, e.g. https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash
^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?[^#]*)?(#[\w\-]+)?$
RegExp positions:
url: RegExp['$&'],
protocol: RegExp.$2,
host: RegExp.$3,
path: RegExp.$4,
file: RegExp.$6,
query: RegExp.$7,
hash: RegExp.$8
You could then further parse the host ('.' delimited) quite easily.
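Ruby's regexes support named groups, which avoids counting positions. This is a sketch of the same pattern in Ruby's extended (x) mode; URL_PARTS is a hypothetical constant name, and the query group is assumed to stop at # so that the fragment gets its own group. Like the pattern above, it expects a file name after the last / of the path.

```ruby
URL_PARTS = %r{
  \A
  (?:(?<protocol>https?|ftp):/)?/?  # optional "scheme:/", then the second "/"
  (?<host>[^:/\s]+)                 # host name, up to ":" or "/"
  (?<path>(?:/\w+)*/)               # directory part, ending in "/"
  (?<file>[\w\-.]+[^\#?\s]+)        # file name, up to "?" or the fragment
  (?<query>\?[^\#]*)?               # optional query string
  (?<hash>\#[\w\-]+)?               # optional fragment
  \z
}x

url = "https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash"
parts = URL_PARTS.match(url)
parts.named_captures.each { |name, value| puts "#{name}: #{value}" }

# Splitting the host on "." as suggested above
p parts[:host].split(".") # prints ["www", "google", "com"]
```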
What I would do is use something like this:
^(.*:)//([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$
proto $1
host $2
port $3
the-rest $4
Then further parse 'the rest' to be as specific as possible. Doing it all in one regex is, well, a bit crazy.
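A rough Ruby sketch of that two-step approach: the coarse regex first, then a second, simpler split of "the rest" into path and query. coarse_split is a hypothetical name, and splitting on the first ? is an assumption that ignores fragments.

```ruby
# Step 1: the coarse regex (protocol, host, optional port, everything else).
COARSE = %r{\A(.*:)//([A-Za-z0-9\-.]+)(:[0-9]+)?(.*)\z}

def coarse_split(url)
  match = COARSE.match(url)
  return nil unless match

  proto, host, port, rest = match.captures
  # Step 2: parse "the rest" further; here only path vs. query, split on the first "?".
  path, query = rest.split("?", 2)
  { proto: proto, host: host, port: port, path: path, query: query }
end

p coarse_split("http://www.example.com:8080/dir/page.html?x=1")
```

Keeping the first regex coarse means each later step only has to deal with one component.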
Getting all links of a webpage using Ruby
Why don't you use groups in your pattern?
For example:
/(https?:\/\/[^\s"'<>]+)/i
That way the first group will already be the link you are searching for.
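A sketch of that idea in Ruby with String#scan, with the group wrapped around the whole URL. With one capturing group, scan returns a single-element array per match, hence the flatten. The set of characters that ends a URL (whitespace, quotes, angle brackets) is an assumption, and for real pages an HTML parser is usually more reliable than a regex.

```ruby
# The group wraps the whole URL; the "i" flag also accepts "HTTP://" and "HTTPS://".
LINK_PATTERN = %r{(https?://[^\s"'<>]+)}i

html = %q(<a href="http://example.com/a">A</a> <a href='HTTPS://example.org/b?x=1'>B</a>)
links = html.scan(LINK_PATTERN).flatten
p links # prints ["http://example.com/a", "HTTPS://example.org/b?x=1"]
```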
Simple regex to replace first part of URL
You could use a regex like this:
(https?://)(.*?)(/.*)
You can use the capturing groups in the substitution and concatenate them with the strings you need to generate the new URLs.
The idea of the regex is to capture the strings before and after the domain and use \1 + staticpages + \3.
If you want to change the protocol to ftp instead, you can rearrange the capturing group indexes and use this replacement string:
ftp://\2\3
So, you would have:
ftp://localhost:3000/something
ftp://www.domainname.com/something
ftp://domainname.com/something
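In Ruby the same substitutions can be done with String#sub, where \1, \2 and \3 in a single-quoted replacement string refer to the capture groups. The input URLs below are assumptions reconstructed from the outputs above, and staticpages is used literally as the new host name.

```ruby
PATTERN = %r{(https?://)(.*?)(/.*)}

# Assumed inputs matching the ftp:// outputs shown above.
urls = %w[
  http://localhost:3000/something
  http://www.domainname.com/something
  http://domainname.com/something
]

urls.each do |url|
  puts url.sub(PATTERN, '\1staticpages\3') # replace the host: \1 + staticpages + \3
  puts url.sub(PATTERN, 'ftp://\2\3')      # replace the protocol: ftp:// + \2 + \3
end
```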
How to parse a URL and extract the required substring
I'd do it this way:
require 'uri'
uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first
=> "something"
URI is built into Ruby. It's not the most full-featured but it's plenty capable of doing this task for most URLs. If you have IRIs then look at Addressable::URI.
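The parsed URI also exposes the other components directly, which covers several of the questions on this page without a hand-written regex. A short sketch with a made-up example URL; File.dirname on the path is one way to drop the page part, assuming the path ends in a file name.

```ruby
require "uri"

uri = URI.parse("http://something.example.com:8080/directory/page.html?q=1#top")

puts uri.scheme   # prints http
puts uri.host     # prints something.example.com
puts uri.port     # prints 8080
puts uri.path     # prints /directory/page.html
puts uri.query    # prints q=1
puts uri.fragment # prints top

# Dropping the page part of the path
puts File.dirname(uri.path) # prints /directory
```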