Why Can't Python Sockets Resolve URLs with HTTP in Them

For this to be answered, you need to understand how the TCP/IP stack works. BTW, I'll just ignore the OSI model because it is mostly useless in most real-world situations.

At the bottom of it all, we have the protocols used to transfer bits and bytes over physical or wireless links. This includes things such as Ethernet, 802.11, MAC stuff, etc. It's direct communication from one machine to another machine it is directly connected to. Period.

Then, over that, we have one of the gems of protocol design, the Internet Protocol. Unlike what you may think, it's a (relatively) very simple protocol. Its purpose is to define the concepts of hosts, addresses, routing, quality of service, and a few other things. It's very minimalistic in outlook. Thanks to the IP, one machine can indirectly connect to another through a whole arrangement of networks and gateways (that usually means routers).

The IP by itself, however, has certain pitfalls. Namely...

  • There's no concept of ports.
  • Everything must be represented by internet addresses, that is, there are no thingslikethis.com, also known as domains.
  • The IP is unreliable, meaning that packets can get lost (with no notification whatsoever), duplicated, corrupted, etc... This does happen in modern networks. There's no concept of "connection" whatsoever. Just packets, period.

So, to solve (most of) these problems, along comes a not-so-bright gem of protocol design, the Transmission Control Protocol. (I'm ignoring UDP for simplicity's sake.) The TCP's purpose is to allow a reliable and stateful connection to be established over an unreliable routing protocol, usually the IP. To do so, it adds considerable overhead to the packets, which is sometimes undesirable. It also has some nice extra features, such as ports. The idea is that a port represents a "service" or "application" that runs inside a host, along with other applications. This is a fundamental concept of multitasking systems. A pair made up of a host's address and a port is referred to as a "socket". A pair of sockets, one in host A pointing to host B, and another in host B pointing to host A, is called a connection once the three-way handshake has completed. Thanks to the TCP, we can now say things such as 192.168.1.123:8080, send data there, and be confident that the data either never reaches the destination, or reaches it successfully and correctly. However, still no domains!
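To make the "pair of sockets" idea concrete, here is a minimal sketch that opens a TCP connection from a process to itself over the loopback address. The function name loopback_roundtrip is made up for illustration, and the sketch assumes a small payload (a large one could block, because both ends run in a single thread).

```python
import socket


def loopback_roundtrip(payload):
    """Send payload over a TCP connection on 127.0.0.1 and return what arrived.

    Returns (received_bytes, client_endpoint, server_endpoint), where each
    endpoint is an (address, port) pair, i.e. a "socket" in the TCP sense.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the operating system to pick any free port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        server_endpoint = listener.getsockname()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # connect() returns once the three-way handshake has completed.
            client.connect(server_endpoint)
            conn, client_endpoint = listener.accept()
            with conn:
                client.sendall(payload)
                # Tell the other side that no more data is coming.
                client.shutdown(socket.SHUT_WR)
                received = b""
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    received += chunk
        return received, client_endpoint, server_endpoint
```

Calling loopback_roundtrip(b"hello") should give back the same bytes plus two endpoints that share the address 127.0.0.1 but differ in port: one port chosen for the listener, one for the client.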

Enter the Domain Name System. It defines a hierarchical structure of "domains", symbolic names representing either a host or another hierarchical structure of the same kind. For instance, we have the top-level domain com, and its subdomain google, which refers to some address such as 201.191.202.174. We refer to domains in reverse-order notation, separated by dots, in the style of google.com. With the IP plus the TCP plus the DNS, we can now say things such as google.com:21, and get a reliable connection to it. Hurray!

It's now worth noting that when Python talks about "sockets", like most libraries, languages, and operating systems, it means sockets in the TCP sense. And, as we already know, TCP can only handle things of the style 192.168.1.123:8080. However, Python's socket.socket.connect, though mostly a wrapper around C/POSIX's connect(3), gives you one small abstraction on top of it: it performs the appropriate dance with the DNS if a hostname is provided instead of an actual address. Nonetheless, the abstraction ends there. It knows nothing about URL schemes such as http:// or about paths.
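The name lookup described above can also be done by hand. The following is only a sketch of the idea, not a description of what connect does internally; resolve_ipv4 is a hypothetical helper name, and it is restricted to IPv4 for brevity.

```python
import socket


def resolve_ipv4(host, port):
    """Return the (address, port) pairs a TCP client could try for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # only the sockaddr part, an (address, port) tuple, is kept.
    return [info[4] for info in infos]
```

Passing a hostname such as "google.com" triggers a DNS lookup, while passing a literal address such as "127.0.0.1" just returns it unchanged. Passing a full URL fails with socket.gaierror, because a URL is not a hostname.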

However, what about funky things such as https://qwe.rty.uio/asd/fgh.html? To handle this, enter one of the most complex parts of the equation. Understood by none but glorified by all, the Hypertext Transfer Protocol. HTTP is a somewhat vaguely multipurpose protocol. It relies on Uniform Resource Identifiers, which are most of the time Uniform Resource Locators. This allows you to use the slash (/) after a domain in order to address a "resource" inside it, such as an image or webpage. URIs also define the way the resource is accessed (http:// means "through the HTTP", https:// means "through the HTTPS", ftp:// means "through the FTP", etc.). HTTP adds a long list of extra funky things that are necessary for the World Wide Web (often incorrectly called "the Internet") to work the way it does, such as sessions, authentication, encryption, status codes, caching, proxies, file downloads, etc.

tl;dr: Python's socket library is a thin wrapper around C's that happens to add a DNS resolution step in front. Apart from that, it works with vanilla TCP concepts.
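In practice this means the URL has to be taken apart before the socket layer sees it. A minimal sketch using the standard library's urllib.parse follows; the names url_to_endpoint and DEFAULT_PORTS are invented for this example, and the port table only covers the schemes mentioned above.

```python
from urllib.parse import urlsplit

# Well-known default ports for the schemes mentioned in the text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def url_to_endpoint(url):
    """Split a URL into (hostname, port, path).

    The (hostname, port) part is what socket.connect understands;
    the path belongs in the HTTP request sent after connecting.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no hostname found in {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no port given and no default known for {parts.scheme!r}")
    return parts.hostname, port, parts.path or "/"
```

The first two values can then be handed to connect as a tuple, while the path goes into the request line. Note that connecting to port 443 gives you a plain TCP stream; speaking HTTPS over it additionally needs TLS, for example through the ssl module.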

Why can't python socket resolve domain name here?

Your code has several mistakes:

You're calling sendto(), a primitive meant for UDP (datagram) sockets, on a socket that isn't one.
Also, the format of the address parameter is incorrect; for AF_INET sockets it is a tuple (hostname, port).

So, either create a proper UDP socket:

import socket

# SOCK_DGRAM creates a UDP socket, so sendto() with an address is valid
mysocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
mysocket.sendto(b'data', ('hostname', 9999))

or use connect + send/sendall instead:

import socket

# SOCK_STREAM creates a TCP socket: connect first, then send
mysocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysocket.connect(('hostname', 9999))
mysocket.sendall(b'data')

How can I fix 301 error using python sockets

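HTTP status 301 means Moved Permanently; it is not an error. It tells you the resource now lives at a different URL than the one you asked for (currently /, as set in the first line of your request, GET / HTTP). The new URL is given in the response you received, http_response, normally in its Location header.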
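As a rough sketch of how the new address could be pulled out of a raw response read from a socket, assuming the response headers have arrived completely: parse_redirect is a hypothetical helper, and the parsing is deliberately simplified compared with a real HTTP client.

```python
def parse_redirect(raw_response):
    """Return (status_code, location) from raw HTTP response bytes.

    location is None unless the status is 3xx and a Location header exists.
    """
    # The header section ends at the first empty line.
    head, _, _ = raw_response.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line looks like: HTTP/1.1 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so normalize them.
            headers[name.strip().lower()] = value.strip()

    location = headers.get("location") if 300 <= status_code < 400 else None
    return status_code, location
```

If the returned location is not None, the request has to be repeated against that URL, which may mean a different host, a different path, or https instead of http.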

Why am I getting the error connection refused in Python? (Sockets)

Instead of

host = socket.gethostname() #Get the local machine name
port = 12397 # Reserve a port for your service
s.bind((host,port)) #Bind to the port

you should try

port = 12397 # Reserve a port for your service
s.bind(('', port)) #Bind to the port

so that the listening socket isn't too restricted. Otherwise it may listen on only one interface, and that interface may not be the one attached to the local network.

One example could be that it only listens to 127.0.0.1, which makes connecting from a different host impossible.

Error when accessing incorrect url in python requests

It's likely because the hostname can't be resolved.

You can handle this with a try/except block:

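import requests

url = "http://sdfsdfasdfsfsdf.com"
try:
    res = requests.get(url, timeout=10)
    if res.status_code == 200:
        print("ok")
    else:
        print("wrong")
except requests.exceptions.RequestException:
    # Base class for requests errors: DNS failures, refused connections, timeouts
    print("wrong")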

python socket http library post request not working when data is passed

I have no idea where you've got your knowledge of HTTP from, but neither your GET nor your POST request is a proper request according to the HTTP standard. For GET you are adding an additional \r\n at the end (i.e. GET / HTTP/1.0\r\nHost: example.com\r\n\r\n\r\n, the last \r\n being one too many), and for POST you do the same and also add some invalid data (TCRLF) after the request body.

The additional \r\n you add after the request header counts against the Content-Length you gave, which means that the HTTP body as read by the server is two bytes shorter than the body you intended. Together with the TRCLF you send after the body, the data you actually send is 6 bytes longer than the value you give in the Content-Length header.

Please refer to the HTTP standard for what HTTP should look like; that's what standards are for. Do not try to get a proper understanding of how HTTP works by just looking at traffic or code samples you find on the internet. Simple code samples in particular are often wrong or incomplete, and might work with some servers or in some use cases but not in general.

Anytime I pass data to the post, it doesn't get a response.

I cannot reproduce the claim that no response at all is returned. When running your unmodified code I actually get a response from the server, which includes among other things the server's view of your request body:

...    
"data": "\r\n{'Assignment': ",
...

This is your intended body prefixed with your wrong \r\n and shortened by two bytes to match the claimed Content-Length. When I modify your code to send proper requests, without the wrong \r\n in various places, I get the expected:

...
"data": "{'Assignment': 1}",
...
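For reference, a request with the framing described above (exactly one empty line after the headers, and a Content-Length equal to the body's byte length) could be assembled along these lines. This is a sketch, not your code: build_post_request is an invented name, and the host, path, and content type are placeholders.

```python
def build_post_request(host, path, body, content_type="application/json"):
    """Build the raw bytes of an HTTP/1.1 POST request.

    body must already be bytes; Content-Length is computed from it so the
    header can never disagree with what is actually sent.
    """
    header_lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
        "Connection: close",
    ]
    # Header lines are separated by \r\n; one empty line ends the header.
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    # Nothing follows the body: no extra \r\n and no other trailer.
    return head.encode("ascii") + body
```

The result can be passed straight to sendall on a connected TCP socket. Computing the length from the encoded bytes rather than from a string matters as soon as the body contains non-ASCII characters.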

Python3 Failed to establish connection socket.gaierror: Name or service not known

host='www.miet.ac.in%0a', port=80

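The problem is with your string interpolation: the %0a at the end of the host is a URL-encoded newline (\n), so the hostname you are building has a stray newline appended and cannot be resolved.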
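A common source of such a newline is a hostname read from a file or from user input. Assuming that is the case here (the question's code is not shown, so this is a guess), stripping whitespace before building the URL avoids it; build_url is a hypothetical helper.

```python
def build_url(raw_host, path="/"):
    """Build an http:// URL from a hostname that may carry stray whitespace."""
    # strip() removes leading and trailing whitespace, including "\n".
    host = raw_host.strip()
    return f"http://{host}{path}"
```

With this, a value such as "www.miet.ac.in\n" yields http://www.miet.ac.in/ instead of a host ending in %0a.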

socket.error: [Errno 48] Address already in use

You already have a process bound to the default port (8000). If you ran the same module before, most likely that process is still bound to the port. Try to locate the other process first:

$ ps -fA | grep python
501 81651 12648 0 9:53PM ttys000 0:00.16 python -m SimpleHTTPServer

The command arguments are included, so you can spot the one running SimpleHTTPServer if more than one python process is active. You may want to test if http://localhost:8000/ still shows a directory listing for local files.

The second number is the process number; stop the server by sending it a signal:

kill 81651

This sends a standard SIGTERM signal; if the process is unresponsive you may have to resort to tougher methods like sending a SIGKILL (kill -s KILL <pid> or kill -9 <pid>) signal instead. See Wikipedia for more details.

Alternatively, run the server on a different port, by specifying the alternative port on the command line:

$ python -m SimpleHTTPServer 8910
Serving HTTP on 0.0.0.0 port 8910 ...

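then access the server as http://localhost:8910, where 8910 can be any number from 1024 up to 65535, provided the port is not already taken. (On Python 3 the module is called http.server, so the command becomes python3 -m http.server 8910.)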
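Whether a port is already taken can also be checked from Python before starting the server. The following is a small sketch that simply tries to bind; port_is_free is a made-up name, and the answer can go stale if another process grabs the port right after the check.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can currently be bound to (host, port).

    The default host "" means all interfaces, like the 0.0.0.0 shown above.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "Address already in use", or a permission error
            # for ports below 1024.
            return False
        return True
```

The probe socket is closed again when the with block ends, so the check itself does not keep the port occupied.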

Error trying to connect to socket

Turns out that @t.m.adam was right.

The code works fine except that TRUE should be True.

It could have been a host or server problem earlier. I shall delete the question if others can get the same output below:

C:\Users\Kane\Desktop>python networking.py
HTTP/1.1 200 OK
Date: Sun, 17 Sep 2017 00:12:07 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "1d3-54f6609240717"
Accept-Ranges: bytes
Content-Length: 467
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

