How to remove punctuation from a String in C
Just a sketch of an algorithm using functions provided by ctype.h:
#include <ctype.h>

void remove_punct_and_make_lower_case(char *p)
{
    char *src = p, *dst = p;

    while (*src)
    {
        if (ispunct((unsigned char)*src))
        {
            /* Skip this character */
            src++;
        }
        else if (isupper((unsigned char)*src))
        {
            /* Make it lowercase */
            *dst++ = tolower((unsigned char)*src);
            src++;
        }
        else if (src == dst)
        {
            /* Increment both pointers without copying */
            src++;
            dst++;
        }
        else
        {
            /* Copy character */
            *dst++ = *src++;
        }
    }
    *dst = 0;
}
Standard caveats apply: completely untested; refinements and optimizations are left as an exercise for the reader.
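One possible refinement, offered only as a sketch: tolower() returns its argument unchanged when it is not an uppercase letter, and copying a character onto itself is harmless, so the three copying branches can be folded into one. The names strip_and_lower and strip_and_lower_example are hypothetical, and the expected result assumes the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical compact variant: drop punctuation and lowercase in place. */
void strip_and_lower(char *s)
{
    char *dst = s;

    for (const char *src = s; *src; ++src) {
        unsigned char c = (unsigned char)*src;
        if (!ispunct(c))
            *dst++ = (char)tolower(c);  /* no-op for non-uppercase chars */
    }
    *dst = '\0';
}

/* Example call, checked with assert (assumes the "C" locale). */
void strip_and_lower_example(void)
{
    char buf[] = "Hello, World! It's 9 AM.";

    strip_and_lower(buf);
    assert(strcmp(buf, "hello world its 9 am") == 0);
}
```

The buffer must be writable, so pass an array rather than a string literal.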
Remove punctuation at beginning and end of a string
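#include <ctype.h>
#include <string.h>

char *rm_punct(char *str) {
    char *p = str;
    char *t;
    if (*str == '\0') return str; /* empty string: nothing to trim */
    t = str + strlen(str) - 1;
    /* skip leading punctuation */
    while (ispunct((unsigned char)*p)) p++;
    /* cut trailing punctuation */
    while (p < t && ispunct((unsigned char)*t)) { *t = 0; t--; }
    /* also if you want to preserve the original address */
    { int i;
      for (i = 0; i <= t - p + 1; i++) { /* the last copy moves the terminator */
          str[i] = p[i];
      }
      p = str; } /* --- */
    return p;
}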
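As an alternative sketch of the same idea, the trimmed text can be shifted back to the original address with memmove(), which is defined for overlapping regions. The names trim_punct and trim_punct_example are hypothetical, and this variant treats the end pointer as one past the last character so that an empty string needs no special case.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical variant: trim leading and trailing punctuation in place
   and keep the result at the original address. */
char *trim_punct(char *str)
{
    char *head = str;
    char *tail = str + strlen(str);   /* one past the last character */
    size_t len;

    while (head < tail && ispunct((unsigned char)*head))
        head++;
    while (tail > head && ispunct((unsigned char)tail[-1]))
        tail--;

    len = (size_t)(tail - head);
    memmove(str, head, len);          /* source and destination may overlap */
    str[len] = '\0';
    return str;
}

/* Example call: punctuation inside the string is kept. */
void trim_punct_example(void)
{
    char buf[] = "...hi, there!!";

    assert(strcmp(trim_punct(buf), "hi, there") == 0);
}
```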
Remove punctuation from characters in C
I updated your chip_punct function to give the desired result.
- I call isalnum to check for alphanumeric characters (letters and digits); isalpha checks only for letters.
- I use a flag, addSpace, to remember whether the last char was not alphanumeric. Then, before adding a char, I check if a space needs to be added.
- I remember the first ch position with ch0 to avoid adding a space at the beginning.
#include <ctype.h>

void chip_punct(char *ch) {
    int addSpace = 0;   // Flag if need to add space
    char *ch0 = ch;     // Remember first position
    for (char *p = ch; *p; ++p)
    {
        if (isalnum((unsigned char)*p))   // Check if char is alphanumeric
        {
            if (addSpace && ch > ch0)     // Check if need to add space
                *ch++ = ' ';
            *ch++ = *p;
            addSpace = 0;
        }
        else
            addSpace = 1;
    }
    *ch = '\0';
}
Output:
1 g7 h 8 zs892 fd sa
The lack of a space between zs89 and 2 from the next line is because main prints them with no space. You can change that by adding a space: printf("%s ", ch);
Most efficient way to remove punctuation marks from string in C++
No string copies. No heap allocation. No heap deallocation.
void strip_punct(string& inp)
{
    auto to = begin(inp);
    for (auto from : inp)
        if (!ispunct(static_cast<unsigned char>(from))) // cast avoids undefined behavior for negative char values
            *to++ = from;
    inp.resize(distance(begin(inp), to));
}
Comparing to:
void strip_punct_re(string& inp)
{
    inp.erase(remove_if(begin(inp), end(inp), ispunct), end(inp));
}
I created a variety of workloads. As a baseline input, I created a string containing all char values between 32 and 127. I repeated this string num times to create my test string. I called both strip_punct and strip_punct_re with a copy of the test string iters times. I ran each workload 10 times, timing each run, and averaged the timings after dropping the lowest and highest results. I tested using release builds (optimized) from VS2015 on Windows 10 on a Microsoft Surface Book 4 (Skylake). I used SetPriorityClass() to set the process to HIGH_PRIORITY_CLASS and timed the results using QueryPerformanceFrequency/QueryPerformanceCounter. All timings were performed without a debugger attached.
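The input construction and the averaging step can be sketched in C roughly as follows. This is not the original harness: the names make_test_string, trimmed_mean and benchmark_helpers_example are hypothetical, and the sketch assumes that "between 32 and 127" is inclusive at both ends and that exactly one lowest and one highest timing are dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Build the input: every char value from 32 to 127 (assumed inclusive),
   repeated num times. Returns a malloc'd, NUL-terminated buffer that the
   caller must free, or NULL on allocation failure. */
char *make_test_string(size_t num)
{
    enum { FIRST = 32, LAST = 127, SPAN = LAST - FIRST + 1 };
    char *buf = malloc(num * SPAN + 1);

    if (!buf)
        return NULL;
    for (size_t i = 0; i < num; ++i)
        for (int c = FIRST; c <= LAST; ++c)
            buf[i * SPAN + (size_t)(c - FIRST)] = (char)c;
    buf[num * SPAN] = '\0';
    return buf;
}

/* Mean of n timings after dropping one lowest and one highest value.
   Requires n >= 3. */
double trimmed_mean(const double *t, size_t n)
{
    double sum = t[0], lo = t[0], hi = t[0];

    for (size_t i = 1; i < n; ++i) {
        sum += t[i];
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return (sum - lo - hi) / (double)(n - 2);
}

/* Example with made-up timings: the outlier 100.0 and the minimum 1.0 are dropped. */
void benchmark_helpers_example(void)
{
    const double timings[] = {1.0, 2.0, 3.0, 4.0, 100.0};
    char *s = make_test_string(2);

    assert(trimmed_mean(timings, 5) == 3.0);
    assert(s != NULL && strlen(s) == 192);
    free(s);
}
```

Dropping the extremes before averaging reduces the influence of a single run disturbed by other activity on the machine.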
num     iters    seconds   seconds (re)   improvement
10000   1000     2.812     2.947          4.78%
1000    10000    2.786     2.977          6.85%
100     100000   2.809     2.952          5.09%
By varying num and iters while keeping the number of processed bytes the same, I was able to see that the cost is primarily influenced by the number of bytes processed rather than per-call overhead. Reading the disassembly confirmed this.
So this version (strip_punct) is ~5% faster and generates 30% of the code.