PHP to clean up pasted Microsoft input
HTML Purifier will create standards-compliant markup and filter out many possible attacks (such as XSS).
For faster cleanups that don't require XSS filtering, I use the PECL extension Tidy, which is a binding for the Tidy HTML utility.
If those don't help you, I suggest you switch to FCKEditor, which has this feature built in.
Remove MS Word HTML using PHP
http://htmlpurifier.org/
This will do what you want.
How to clean up garbage text from a string using PHP?
Word documents (.doc and .docx) are not plain text files. They use specialized file formats that store formatting and font information alongside the text, so the text does not simply start at byte 0. A .docx file is actually a ZIP archive containing a collection of XML files for the document content, its styles, and so on.
Your best bet is to use a text input form, or find code that extracts just the text. Alternatively, download the .doc files to your own computer and open them with your own copy of MS Word.
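To illustrate the point about .docx being XML inside a ZIP archive: the document body lives in word/document.xml, where each paragraph is a <w:p> element and the visible text sits in <w:t> runs. Below is a minimal JavaScript sketch, assuming you have already unzipped that file with a ZIP library of your choice. extractDocxText and decodeXmlEntities are hypothetical helper names, and the regular expressions are a shortcut for simple documents, not a substitute for a real XML parser.

```javascript
// Hypothetical helper: decode the five predefined XML entities.
// &amp; is handled last so "&amp;lt;" correctly becomes "&lt;".
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Hypothetical helper: pull the plain text out of the XML string taken
// from word/document.xml (unzipping is assumed to be done already).
function extractDocxText(documentXml) {
  // <w:p> or <w:p ...> up to the next </w:p>; does not match <w:pPr>
  const paraRe = /<w:p[\s>][\s\S]*?<\/w:p>/g;
  // <w:t> or <w:t xml:space="preserve">; does not match <w:tab/> or <w:tbl>
  const runRe = /<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/g;
  const paragraphs = [];
  for (const para of documentXml.match(paraRe) || []) {
    let text = "";
    for (const m of para.matchAll(runRe)) {
      text += decodeXmlEntities(m[1]);
    }
    paragraphs.push(text);
  }
  // one line of output per Word paragraph
  return paragraphs.join("\n");
}
```

For example, a body with the paragraphs "Hello world" (split across two runs) and "A &amp; B" comes back as "Hello world\nA & B".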
Formatted PHP code in Microsoft Word
I know one way.
Open this page: http://qbnz.com/highlighter/demo.php
It is an online PHP syntax highlighter.
(1) Copy your PHP code and paste it into the text area labelled 'Input via a text field:'.
(2) In the 'Options' select box below that text area, choose 'Line numbers: none'.
(3) Click the 'Highlight!' button at the bottom of the page.
(4) The highlighted PHP code will be shown.
(5) Select and copy the highlighted code, then paste it into Word. You will see the colored code in your Word document.
Those are the detailed steps; hope this helps.
PHP regular expression for removing mso tags
Code:
$html = "<p style='mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; padding: 4px;' class=MsoNormal>text</P>";
// remove every "mso-...: value; " declaration (the parentheses are the regex delimiters, i = case-insensitive)
$cleanHtml = preg_replace('(mso-[a-z\-: ]+; )i', '', $html);
echo $cleanHtml;
Output:
<p style='padding: 4px;' class=MsoNormal>text</P>
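The PHP pattern above only matches a declaration followed by "; ", so an mso-* declaration that is the last one in the style attribute survives. Here is a JavaScript sketch of the same idea that makes the trailing semicolon optional and stops the value at a quote. stripMsoStyles is a hypothetical name, and like the original it works on the raw HTML string with a regular expression rather than parsing the markup.

```javascript
// Hypothetical helper: remove mso-* CSS declarations from Word-generated HTML.
function stripMsoStyles(html) {
  // \bmso-<name>, optional spaces, a colon, the value up to ; or a quote,
  // then an optional semicolon and any whitespace after it.
  // "MsoNormal" in class names is left alone because it has no "mso-".
  return html.replace(/\bmso-[a-z-]+\s*:[^;"']*;?\s*/gi, "");
}
```

On the input from the PHP example this gives the same output, and on `<span style="color: red; mso-bidi-font-weight: bold">x</span>` it leaves `<span style="color: red; ">x</span>`.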
Clean Microsoft Word Pasted Text using JavaScript
Here is the function I wound up writing; it does the job fairly well (as far as I can tell, anyway).
I am certainly open to suggestions for improvement if anyone has any. Thanks.
function cleanWordPaste( in_word_text ) {
    // parse the pasted HTML in a detached element and keep only its text
    var tmp = document.createElement("DIV");
    tmp.innerHTML = in_word_text;
    var newString = tmp.textContent || tmp.innerText;
    // this next piece converts double line breaks into break tags
    // and removes the seemingly endless Word style/comment code
    newString = newString.replace(/\n\n/g, "<br />").replace(/.*<!--.*-->/g, "");
    // this next piece removes any break tags (up to 10) at the beginning
    // (i is declared locally so it does not leak into the global scope)
    for (var i = 0; i < 10; i++) {
        if (newString.substr(0, 6) == "<br />") {
            newString = newString.replace("<br />", "");
        }
    }
    return newString;
}
Hope this is helpful to some of you.
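As one possible improvement, the string-handling half of cleanWordPaste can be done without the ten-iteration loop: a single anchored regular expression strips every leading break tag. This is a sketch, assuming the plain text has already been taken from textContent as in the function above; textToBreaks is a hypothetical name. Note one behavior change: it removes all leading break tags, not just the first ten.

```javascript
// Hypothetical helper: the text post-processing from cleanWordPaste,
// separated from the DOM step so it can run and be tested anywhere.
function textToBreaks(text) {
  return text
    // every double line break becomes a <br /> tag, as in the original
    .replace(/\n\n/g, "<br />")
    // drop single-line <!-- ... --> residue, as in the original
    .replace(/.*<!--.*-->/g, "")
    // one anchored regex removes all leading <br /> tags
    .replace(/^(?:<br \/>)+/, "");
}
```

Keeping the DOM step (innerHTML/textContent) and the string step apart also makes the string step easy to test outside a browser.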
How to separate data pasted from Excel into a textarea
Have a look at this question:
Parse form textarea by comma or new line
Applied to your code:
<?php
if(isset($_POST['url']))
{
$input = $_POST['url'];
// split on any run of \r and \n characters; PREG_SPLIT_NO_EMPTY drops empty pieces
$data = preg_split("/[\r\n]+/", $input, -1, PREG_SPLIT_NO_EMPTY);
var_dump($data);
}
?>
The $data array will then contain one entry per non-empty pasted line.
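Each line from Excel is usually one spreadsheet row, with cells separated by tab characters, so a second split on tabs gives you the individual cells. Here is a JavaScript sketch of that two-level split, assuming Excel's usual tab-separated clipboard format; parseExcelPaste is a hypothetical name. It does not handle cells that themselves contain line breaks, which Excel wraps in quotes.

```javascript
// Hypothetical helper: split text pasted from Excel into rows of cells.
function parseExcelPaste(input) {
  return input
    .split(/\r?\n/)                  // one entry per row (\r\n or \n endings)
    .filter((row) => row !== "")     // drop empty lines, e.g. the trailing one
    .map((row) => row.split("\t"));  // one entry per cell
}
```

For example, "a\tb\r\nc\td\r\n" becomes [["a", "b"], ["c", "d"]].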