Fastest way to get IPv4 address from string
Since we are speaking about maximizing throughput of IP address parsing, I suggest using a vectorized solution.
Here is an x86-specific fast solution (it needs SSE4.1, or at least SSSE3 with a slightly slower fallback):
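#include <immintrin.h> // SSSE3/SSE4.1 intrinsics (_mm_shuffle_epi8, _mm_maddubs_epi16, _mm_extract_epi32)
#include <string.h>    // memset, used by MyInit below
// UINT32 is the Windows typedef (<windows.h>); with other compilers use uint32_t from <stdint.h>
__m128i shuffleTable[65536]; //can be reduced 256x, see @IwillnotexistIdonotexist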
UINT32 MyGetIP(const char *str) {
__m128i input = _mm_lddqu_si128((const __m128i*)str); //"192.167.1.3"
input = _mm_sub_epi8(input, _mm_set1_epi8('0')); //1 9 2 254 1 6 7 254 1 254 3 208 245 0 8 40
__m128i cmp = input; //...X...X.X.XX... (signs)
UINT32 mask = _mm_movemask_epi8(cmp); //6792 - magic index
__m128i shuf = shuffleTable[mask]; //10 -1 -1 -1 8 -1 -1 -1 6 5 4 -1 2 1 0 -1
__m128i arr = _mm_shuffle_epi8(input, shuf); //3 0 0 0 | 1 0 0 0 | 7 6 1 0 | 2 9 1 0
__m128i coeffs = _mm_set_epi8(0, 100, 10, 1, 0, 100, 10, 1, 0, 100, 10, 1, 0, 100, 10, 1);
__m128i prod = _mm_maddubs_epi16(coeffs, arr); //3 0 | 1 0 | 67 100 | 92 100
prod = _mm_hadd_epi16(prod, prod); //3 | 1 | 167 | 192 | ? | ? | ? | ?
__m128i imm = _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 6, 4, 2, 0);
prod = _mm_shuffle_epi8(prod, imm); //3 1 167 192 0 0 0 0 0 0 0 0 0 0 0 0
return _mm_extract_epi32(prod, 0);
// return (UINT32(_mm_extract_epi16(prod, 1)) << 16) + UINT32(_mm_extract_epi16(prod, 0)); //no SSE 4.1
}
And here is the required precalculation for shuffleTable:
void MyInit() {
memset(shuffleTable, -1, sizeof(shuffleTable));
int len[4];
for (len[0] = 1; len[0] <= 3; len[0]++)
for (len[1] = 1; len[1] <= 3; len[1]++)
for (len[2] = 1; len[2] <= 3; len[2]++)
for (len[3] = 1; len[3] <= 3; len[3]++) {
int slen = len[0] + len[1] + len[2] + len[3] + 4;
int rem = 16 - slen;
for (int rmask = 0; rmask < 1<<rem; rmask++) {
// { int rmask = (1<<rem)-1; //note: only maximal rmask is possible if strings are zero-padded
int mask = 0;
char shuf[16] = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1};
int pos = 0;
for (int i = 0; i < 4; i++) {
for (int j = 0; j < len[i]; j++) {
shuf[(3-i) * 4 + (len[i]-1-j)] = pos;
pos++;
}
mask ^= (1<<pos);
pos++;
}
mask ^= (rmask<<slen);
_mm_store_si128(&shuffleTable[mask], _mm_loadu_si128((__m128i*)shuf));
}
}
}
Full code with testing is available here. On an Ivy Bridge processor it prints:
C0A70103
Time = 0.406 (1556701184)
Time = 3.133 (1556701184)
This means that the suggested solution has about 7.7 times the throughput of the OP's code (3.133 / 0.406 ≈ 7.7). It processes 336 million addresses per second (single core at 3.4 GHz).
Now I'll try to explain how it works. Note that the comment on each line of the listing shows the contents of the value just computed. All arrays are printed in little-endian order (lowest byte first), whereas the _mm_set_* intrinsics take their arguments from the highest element down.
First of all, we load 16 bytes from an unaligned address with the lddqu instruction. Note that in 64-bit mode heap memory is typically allocated in 16-byte chunks, so this usually works out automatically. On 32-bit it may theoretically cause an out-of-range access. In either mode, such an over-read can only fault if the 16 bytes cross into an unmapped page; the subsequent code works properly regardless of the values of the bytes after the end of the string. Anyway, you'd better ensure that each IP address takes at least 16 bytes of storage.
Then we subtract '0' from all the chars. After that, '.' turns into -2 and the terminating zero byte turns into -48, while all the digits remain nonnegative (0 to 9). Now we take the bitmask of the sign bits of all the bytes with _mm_movemask_epi8.
Depending on the value of this mask, we fetch a nontrivial 16-byte shuffling mask from the lookup table shuffleTable. The table is quite large: 1 MB in total (65536 entries of 16 bytes each), and it takes quite some time to precompute. However, it does not take precious space in the CPU cache, because only 81 elements of this table are really used: each part of an IP address can be one, two or three digits long, hence 3^4 = 81 variants in total.
Note that random garbage bytes after the end of the string may in principle cause more entries of the lookup table to be touched, increasing its cache footprint.
EDIT: you can find a version modified by @IwillnotexistIdonotexist in the comments, which uses a lookup table of only 4 KB (it is a bit slower, though).
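The ingenious _mm_shuffle_epi8 intrinsic allows us to reorder the bytes with our shuffle mask. As a result, the XMM register contains four 4-byte blocks, each holding the digits of one octet in little-endian order (least significant digit first; bytes whose shuffle index is -1 are zeroed). We convert each block into a 16-bit number with _mm_maddubs_epi16 (multiplying the digits by 1, 10, 100 and adding adjacent pairs) followed by _mm_hadd_epi16 (adding the two halves of each block). Then we reorder the bytes of the register so that the whole IP address occupies the lower 4 bytes.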
Finally, we extract the lower 4 bytes from the XMM register into a general-purpose register. This is done with the SSE4.1 intrinsic _mm_extract_epi32. If you don't have SSE4.1, replace it with the commented-out line that uses _mm_extract_epi16, but it will run a bit slower.
Here is the generated assembly (MSVC 2013), so that you can check that your compiler does not generate anything suspicious:
lddqu xmm1, XMMWORD PTR [rcx]
psubb xmm1, xmm6
pmovmskb ecx, xmm1
mov ecx, ecx //useless, see @PeterCordes and @IwillnotexistIdonotexist
add rcx, rcx //can be removed, see @EvgenyKluev
pshufb xmm1, XMMWORD PTR [r13+rcx*8]
movdqa xmm0, xmm8
pmaddubsw xmm0, xmm1
phaddw xmm0, xmm0
pshufb xmm0, xmm7
pextrd eax, xmm0, 0
P.S. If you are still reading this, be sure to check out the comments =)
How to convert string to IP address and vice versa
Use inet_pton() to convert a string to an address, and inet_ntop() if you need it the other way around. Do not use inet_ntoa(), inet_aton() and similar functions, as they are deprecated and don't support IPv6.
Here is a nice guide with quite a few examples.
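// IPv4 demo of inet_ntop() and inet_pton()
#include <arpa/inet.h>  // inet_pton, inet_ntop, INET_ADDRSTRLEN
#include <netinet/in.h> // struct sockaddr_in
#include <stdio.h>
#include <sys/socket.h> // AF_INET
int main(void) {
    struct sockaddr_in sa;
    char str[INET_ADDRSTRLEN];
    // store this IP address in sa (inet_pton returns 1 on success):
    if (inet_pton(AF_INET, "192.0.2.33", &(sa.sin_addr)) != 1) return 1;
    // now get it back and print it
    inet_ntop(AF_INET, &(sa.sin_addr), str, INET_ADDRSTRLEN);
    printf("%s\n", str); // prints "192.0.2.33"
    return 0;
}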
Fastest way to get numerical value of the request's IP Address in ASP.NET
HttpContext does not seem to be doing any more magic than what you already see: a string value in HttpRequest.UserHostAddress.
Some background info:
HttpContext.Current.Request is of type System.Web.HttpRequest, which takes a System.Web.HttpWorkerRequest as a parameter when instantiated.
HttpWorkerRequest is an abstract class instantiated by hosting implementations, e.g. System.Web.Hosting.IIS7WorkerRequest in the case of IIS, which implements the abstract method GetRemoteAddress() of HttpWorkerRequest; this method is used internally by HttpRequest.UserHostAddress.
IIS7WorkerRequest knows that REMOTE_ADDR is the IIS property it needs to read and, after going through a few more layers of abstraction while passing around the request context, it all finally ends in a call to MgdGetServerVariableW(IntPtr pHandler, string pszVarName, out IntPtr ppBuffer, out int pcchBufferSize); in webengine.dll, which simply writes a string of length pcchBufferSize into ppBuffer containing the same stuff you get from HttpRequest.UserHostAddress.
Since I doubt that there are other parts of the HttpContext that get fed request-sender related information, I'm assuming you'll have to keep doing your own magic for the conversion, for which there are plenty of ideas in the link I posted in the comments.
Using python, what's the fastest way to see which of around one million ip addresses fit into three cidrs?
If you're on Python 3.3 or higher, a decent solution is to use the ipaddress module. Convert your CIDRs to network objects with ipaddress.ip_network up front, then convert your addresses to address objects (with ipaddress.ip_address if they might be IPv4 or IPv6, or directly with ipaddress.IPv4Address/ipaddress.IPv6Address if they are of a known type, which skips a layer of wrapping).
You can test for membership relatively cheaply with the in operator; e.g. if you stored your networks in a sequence (e.g. a list/tuple) you could do:
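import ipaddress
networks = [ipaddress.ip_network(cidr) for cidr in cidrs]  # cidrs: your three CIDR strings
for address in map(ipaddress.ip_address, stream_of_string_addresses):
    if any(address in network for network in networks):
        ...  # got a match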
There are more efficient solutions (particularly if you're talking about many networks, not just three), but this is straightforward, relatively memory efficient, and leaves you with a useful object (not just the raw address string) for further processing.
Getting IPV4 address as a String
What would happen if someone passed in nil data?
+ (NSString *)getStringFromAddressData:(NSData *)dataIn {
if (dataIn != nil) {
struct sockaddr_in *socketAddress = nil;
socketAddress = (struct sockaddr_in *)[dataIn bytes];
NSString *ipString = [NSString stringWithFormat: @"%s", inet_ntoa(socketAddress->sin_addr)];
return ipString;
}
return @"";
}
How to extract an IP address from an HTML string?
Make your group non-capturing, because re.findall returns only the captured groups when the pattern contains any:
import re
ip = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)
Result:
['165.91.15.131']
Notes:
- If you are parsing HTML it might be a good idea to look at BeautifulSoup.
- Your regular expression matches some invalid IP addresses such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
how to get ip address as a string
ip is not initialized.
char *ip = malloc(20);
ip[0] = 0;
You should check the result of malloc to avoid dereferencing a NULL pointer.