C++ Remove Punctuation from String

How to remove punctuation from a String in C

Just a sketch of an algorithm using functions provided by ctype.h:

#include <ctype.h>

void remove_punct_and_make_lower_case(char *p)
{
    char *src = p, *dst = p;

    while (*src)
    {
        if (ispunct((unsigned char)*src))
        {
            /* Skip this character */
            src++;
        }
        else if (isupper((unsigned char)*src))
        {
            /* Make it lowercase */
            *dst++ = tolower((unsigned char)*src);
            src++;
        }
        else if (src == dst)
        {
            /* Increment both pointers without copying */
            src++;
            dst++;
        }
        else
        {
            /* Copy character */
            *dst++ = *src++;
        }
    }

    *dst = 0;
}

Standard caveats apply: Completely untested; refinements and optimizations left as exercise to the reader.
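One possible refinement, offered as a sketch rather than a tested replacement: the src == dst branch only avoids rewriting a byte with itself, and tolower already returns non-uppercase characters unchanged, so both special cases can be folded into a single copy. The name remove_punct_lower_simple is hypothetical and not part of the original answer.

```c
#include <ctype.h>

/* Hypothetical simplified variant of the function above:
   drops punctuation and lowercases everything else, in place. */
void remove_punct_lower_simple(char *p)
{
    char *dst = p;

    for (const char *src = p; *src; src++)
    {
        unsigned char c = (unsigned char)*src;

        if (!ispunct(c))
        {
            /* tolower returns non-uppercase characters unchanged */
            *dst++ = (char)tolower(c);
        }
    }

    *dst = 0;
}
```

The argument must be a writable buffer such as char s[] = "Hello, World!"; passing a string literal would be undefined behavior because the function writes into it.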

Remove punctuation at beginning and end of a string

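#include <ctype.h>
#include <string.h>

char *rm_punct(char *str) {
    char *p = str;                  /* first character to keep */
    char *t = str + strlen(str);    /* one past the last character to keep */

    /* Skip leading punctuation */
    while (ispunct((unsigned char)*p)) p++;
    /* Chop trailing punctuation; t > p also covers the empty string */
    while (t > p && ispunct((unsigned char)t[-1])) *--t = 0;

    /* also if you want to preserve the original address */
    { size_t i;
      for (i = 0; i <= (size_t)(t - p); i++) {   /* <= copies the terminating 0 too */
          str[i] = p[i];
      } p = str; } /* --- */

    return p;
}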
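The same trimming can also be sketched with memmove, which is specified to handle overlapping source and destination and keeps the empty-string and all-punctuation cases in one code path. The name trim_punct is hypothetical; this is an alternative under the same assumptions (a writable, NUL-terminated buffer), not the original author's code.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant: trims leading and trailing punctuation in place
   and always returns the original address. */
char *trim_punct(char *str)
{
    char *head = str;
    char *tail = str + strlen(str);   /* one past the last character */

    /* The terminating 0 is not punctuation, so this stops at the end */
    while (ispunct((unsigned char)*head))
        head++;
    while (tail > head && ispunct((unsigned char)tail[-1]))
        tail--;

    size_t len = (size_t)(tail - head);
    memmove(str, head, len);   /* regions may overlap, so not memcpy */
    str[len] = '\0';
    return str;
}
```

Punctuation inside the string is left alone: a buffer holding "...hello, world!!" ends up as "hello, world".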

Remove punctuation from characters in C

I updated your chip_punct function to give the desired result.

  1. I call isalnum to check for alphanumeric - letters and numbers. isalpha checks only for letters.
  2. I use a flag addSpace to remember whether the last char was non-alphanumeric. Then, before adding a char, I check whether a space needs to be added.
  3. I remember the first ch position with ch0 to avoid adding a space at the beginning.

#include <ctype.h>

void chip_punct(char *ch) {
    int addSpace = 0;   // Flag: a space is needed before the next kept char
    char *ch0 = ch;     // Remember first position
    for (char *p = ch; *p; ++p)
    {
        if (isalnum((unsigned char)*p))   // Check if char is alphanumeric
        {
            if (addSpace && ch > ch0)     // Add a space, but never at the start
                *ch++ = ' ';
            *ch++ = *p;
            addSpace = 0;
        }
        else
            addSpace = 1;
    }
    *ch = '\0';
}

Output:

1 g7 h 8 zs892 fd sa

There is no space between zs89 and the 2 from the next line because main prints each line without a separator. You can change that by printing a trailing space after each line: printf("%s ", ch);
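Since the original main is not shown, here is a minimal sketch of the same idea under assumed names: a hypothetical helper, chip_and_join, cleans each line with chip_punct and joins the results in one buffer with single spaces. Unlike printf("%s ", ch) it leaves no trailing space. The chip_punct in the block is a copy of the function above so the sketch compiles on its own.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy of chip_punct from the answer, included for self-containment. */
static void chip_punct(char *ch)
{
    int addSpace = 0;
    char *ch0 = ch;
    for (char *p = ch; *p; ++p)
    {
        if (isalnum((unsigned char)*p))
        {
            if (addSpace && ch > ch0)
                *ch++ = ' ';
            *ch++ = *p;
            addSpace = 0;
        }
        else
            addSpace = 1;
    }
    *ch = '\0';
}

/* Hypothetical helper: cleans each line in place, then joins the results
   in out with single spaces. Returns 0 on success, -1 if out is too small. */
int chip_and_join(char *lines[], size_t n, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++)
    {
        chip_punct(lines[i]);
        size_t len = strlen(lines[i]);
        if (len == 0)
            continue;                     /* nothing left on this line */

        size_t sep = used > 0 ? 1 : 0;    /* separator between lines only */
        if (used + sep + len + 1 > cap)
            return -1;
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, lines[i], len + 1);   /* copies the 0 as well */
        used += len;
    }
    return 0;
}
```

With made-up input lines "1, g7! h--8", "zs89" and "2 fd;;sa", the joined buffer reads "1 g7 h 8 zs89 2 fd sa", with the space between zs89 and 2 that the plain printf("%s", ch) version lacks.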

Most efficient way to remove punctuation marks from string in c++

No string copies. No heap allocation. No heap deallocation.

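#include <ctype.h>
#include <iterator>
#include <string>

void strip_punct(std::string& inp)
{
    auto to = std::begin(inp);      // write position; never passes the read position
    for (auto from : inp)
        if (!ispunct(static_cast<unsigned char>(from)))
            *to++ = from;           // keep non-punctuation, compacting in place
    inp.resize(std::distance(std::begin(inp), to));   // shrinking does not reallocate
}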

Comparing to:

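#include <algorithm>
#include <ctype.h>
#include <iterator>
#include <string>

void strip_punct_re(std::string& inp)
{
    // ::ispunct names the C function explicitly, so the call is not
    // ambiguous with the std::ispunct template from <locale>
    inp.erase(std::remove_if(std::begin(inp), std::end(inp), ::ispunct), std::end(inp));
}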

I created a variety of workloads. As a baseline input, I created a string containing all char values between 32 and 127, then appended this string num times to create the test string. I called both strip_punct and strip_punct_re iters times, each time with a fresh copy of the test string. I ran each workload 10 times, timing each run, and averaged the timings after dropping the lowest and highest results. I tested optimized release builds from VS2015 on Windows 10 on a Microsoft Surface Book 4 (Skylake). I used SetPriorityClass() to set the process to HIGH_PRIORITY_CLASS and timed the results using QueryPerformanceFrequency/QueryPerformanceCounter. All timings were performed without a debugger attached.

num      iters     seconds    seconds (re)    improvement
10000    1000      2.812      2.947           4.78%
1000     10000     2.786      2.977           6.85%
100      100000    2.809      2.952           5.09%
By varying num and iters while keeping the number of processed bytes the same, I was able to see that the cost is primarily influenced by the number of bytes processed rather than per-call overhead. Reading the disassembly confirmed this.

So this version is ~5% faster and generates only 30% as much code.


