Why Is "&#8203;" Being Injected into My HTML

Why is &#8203; being injected into my HTML?

It is an issue in the source. The live example that you provided starts with the following bytes (i.e., they appear before <!DOCTYPE html>): 0xE2 0x80 0x8B. This can be seen e.g. using Rex Swain’s HTTP Viewer by selecting “Hex” under “Display Format”. Also note that validating the page with the W3C Markup Validator gives information that suggests that there is something very wrong at the start of the document, especially the message “Line 1, Column 1: Non-space characters found without seeing a doctype first.”

What happens in the validator and in the Chrome tools – as well as e.g. in Firebug – is that the bytes 0xE2 0x80 0x8B are taken as character data, which implicitly starts the body element (since character data cannot validly appear in the head element or before it), implying an empty head element before it.

The solution, of course, is to remove those bytes. Browsers usually ignore them, but you should not rely on such error handling, and the bytes prevent useful HTML validation. How you remove them, and how they got there in the first place, depends on your authoring environment.
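How the bytes get removed depends on the tooling, but the check itself is small. The following is a minimal sketch, assuming the page is available as a byte array; the names ZWSP_BYTES, startsWithZwsp and stripLeadingZwsp are illustrative and not part of any library.

```javascript
// UTF-8 encoding of U+200B (ZERO WIDTH SPACE), the bytes reported above.
const ZWSP_BYTES = [0xe2, 0x80, 0x8b];

// Returns true when the byte array begins with the ZERO WIDTH SPACE bytes.
function startsWithZwsp(bytes) {
  return ZWSP_BYTES.every((b, i) => bytes[i] === b);
}

// Skips every leading ZERO WIDTH SPACE sequence and returns the remainder.
function stripLeadingZwsp(bytes) {
  let offset = 0;
  while (startsWithZwsp(bytes.subarray(offset))) {
    offset += ZWSP_BYTES.length;
  }
  return bytes.subarray(offset);
}

// Sample input: the three stray bytes followed by the doctype.
const page = new Uint8Array([
  ...ZWSP_BYTES,
  ...new TextEncoder().encode("<!DOCTYPE html>"),
]);

console.log(startsWithZwsp(page)); // true
console.log(new TextDecoder().decode(stripLeadingZwsp(page))); // <!DOCTYPE html>
```

Stripping the bytes at serving time only hides the symptom; it is still better to find the editor or template that writes them.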

Since the page is declared (in HTTP headers) as being UTF-8 encoded, those bytes represent the ZERO WIDTH SPACE (U+200B) character. It has no visible glyph and no width, so you won’t notice anything in the visual presentation even though browsers treat it as being data at the start of the body element. The notation &#8203; is a character reference for it (8203 is the decimal form of hexadecimal 200B), presumably used by browser tools to indicate the presence of a normally invisible character.
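The link between the three bytes and the character reference can be checked directly. This is a small sketch using the standard TextDecoder API; the variable names are only for illustration.

```javascript
// The three bytes found before <!DOCTYPE html>.
const bytes = new Uint8Array([0xe2, 0x80, 0x8b]);

// Decode them as UTF-8, which is the encoding declared for the page.
const text = new TextDecoder("utf-8").decode(bytes);

// A single character comes out; read its code point.
const codePoint = text.codePointAt(0);
const hex = "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");

// Decimal character reference, the form shown by the browser tools.
const reference = "&#" + codePoint + ";";

console.log(text.length, hex, reference); // 1 U+200B &#8203;
```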

It is possible that the software that produced the HTML document was meant to insert ZERO WIDTH NO-BREAK SPACE (U+FEFF) instead. That would have been valid, since by a special convention, UTF-8 encoded data may start with this character, also known as byte order mark (BOM) when appearing at the start of data. Using U+200B instead of U+FEFF sounds like an error that software is unlikely to make, but human beings may be mistaken that way if they think of the Unicode names of the characters.

When injecting html into the DOM, why are js files not obtained?

I am going to quote some of what @alohci's comment said.

If it's via innerHTML or similar, then contained scripts don't get fetched or run.

This turns out to be true for what I have been experiencing. Updating the DOM this way is really only for injecting HTML in real time. Updating the DOM with HTML that contained script tags, and expecting those scripts to get loaded, was indeed the wrong path to go down. Doing something like this turns out to be a lot more complicated than meets the eye, due to Chrome trying to prevent XSS of course.
I am marking this as the answer, because it explains why I was having the issues I was.
I am not looking for a fix anymore, but if anyone reading this comes across a solution, please post the solution.
Thanks to all who replied.
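One approach that is often suggested, shown here only as a sketch and not as something the poster tried: script elements created with createElement and then inserted are fetched and run, so each inert script left behind by innerHTML can be swapped for a fresh copy. The function name reviveScripts and its parameters are hypothetical.

```javascript
// Replaces each inert <script> inside `container` with a newly created one.
// `doc` is the Document that owns `container` (normally `document`).
// Returns the number of scripts that were replaced.
function reviveScripts(container, doc) {
  const inertScripts = Array.from(container.querySelectorAll("script"));
  for (const inert of inertScripts) {
    const fresh = doc.createElement("script");
    // Copy src, type, and any other attributes onto the new element.
    for (const { name, value } of Array.from(inert.attributes)) {
      fresh.setAttribute(name, value);
    }
    // Copy inline code, if there is any.
    fresh.textContent = inert.textContent;
    // Swapping the element in is what makes the browser process it.
    inert.parentNode.replaceChild(fresh, inert);
  }
  return inertScripts.length;
}

// Browser usage (assumes `container` and `fragment` exist on the page):
//   container.innerHTML = fragment;
//   reviveScripts(container, document);
```

This runs whatever scripts the markup contains, so it should only be used with HTML you trust, and external scripts added this way are not guaranteed to run in document order.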

How do I prevent CSS interference in an injected piece of HTML?

You can't prevent that from happening. However, you can override the CSS rules. Give your main element a unique id (which really should be unique by obfuscation, like "yourapplicationname_mainelement_name" or something), then override all possible styles that might give strange effects on your HTML.

Your plugin:

<div id="yourapplicationname_mainelement_name">
  <p>My paragraph that must not be styled</p>
</div>

Your css:

#yourapplicationname_mainelement_name p {
  display: block;
  color: black;
  background: white;
  position: relative;
  /* ... and so on ... */
}

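Because the id makes your CSS rules more specific than ordinary element and class selectors, they will override most settings present on the page where your HTML is injected. Host rules that use !important, or selectors that contain an id of their own, can still win.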
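To see why the id-based rule usually wins, it helps to compare specificity as a triple of (ids, classes, element names). The sketch below is a deliberately simplified calculator that only understands ids, classes and element names; it ignores attribute selectors, pseudo-classes, !important and inline styles. The second selector is a made-up example of a host page rule.

```javascript
// Returns [ids, classes, elements] for a simple selector string.
function specificity(selector) {
  const ids = selector.match(/#[\w-]+/g) || [];
  const classes = selector.match(/\.[\w-]+/g) || [];
  // Remove ids and classes, then count the remaining element names.
  const rest = selector.replace(/[#.][\w-]+/g, " ");
  const elements = rest.match(/[a-zA-Z][\w-]*/g) || [];
  return [ids.length, classes.length, elements.length];
}

// Positive when `a` is more specific than `b`, negative when less, 0 when equal.
function compareSpecificity(a, b) {
  const sa = specificity(a);
  const sb = specificity(b);
  for (let i = 0; i < 3; i++) {
    if (sa[i] !== sb[i]) return sa[i] - sb[i];
  }
  return 0;
}

const pluginRule = "#yourapplicationname_mainelement_name p";
const hostRule = "body div.article p.intro"; // hypothetical host page selector

console.log(specificity(pluginRule)); // [ 1, 0, 1 ]
console.log(specificity(hostRule)); // [ 0, 2, 3 ]
console.log(compareSpecificity(pluginRule, hostRule) > 0); // true
```

A single id outranks any number of classes and element names, which is why the plugin rule wins here even though the host selector is longer.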

Further... It might be hard to see which rules are the most important. You can use Firebug or a similar tool to understand which rule is overriding another. You'll have a hard time without it when developing your application.

Security - All Possible Causes for HTML Injection

This is not “HTML injection” as that term is normally used. That normally refers to vulnerabilities where a web application writes user input into an HTML template without escaping characters like < to &lt;. (In PHP, this is a common problem caused by forgetting to call htmlspecialchars() in the output template, and the language not doing it automatically for you.)

However, if someone has changed the contents of an existing .php file on the server, you have a much worse compromise than a simple HTML injection XSS. To be able to write to the file, the attacker must have either gained access to your administrative account, or exploited a file-write vulnerability in the web app while the app is running as a user that has write access to its own .php files. (Usually you should run a web app as a low-privilege user with no permission to write to app files. But in some sad shared hosting scenarios, one user may be all you get for everything.)

Regardless of what your host said, the number one cause of this kind of compromise today is, by far, FTP credential loss. The usual pattern is that a machine that is used to access the FTP server has been infected with password-stealing malware, typically as a result of visiting an infected page on another web site (which itself is often hacked).

So definitely run virus checkers on all machines that have access to the FTP server; use multiple AVs, and do an OS reinstall on any infected machines you find, as AVs are bad at detection and worse at removal. Then change the password of the account and stop using unencrypted FTP, for god's sake, it's 2013. If your host refuses to provide SFTP access, then this, along with the poor advice above, would be good reason to dump them.

File uploading is a part of my site. Could they have done some sort of buffer attack and made my site interpret a file as PHP commands?

It's conceivable; file upload is a very difficult function to get right in general. If the web site is running as a user that has write access to any location from which your web server will run .php files, then any file-write vuln escalates to an execute-arbitrary-code vuln. So audit what bits of the filesystem have write access for the web server user, and lock it down as much as possible, in addition to performing strong whitelist validation on any variables used for filenames in your app. (Best: never use any client-supplied data to form a filename.)
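The filename advice can be sketched as follows. This is a Node.js illustration rather than PHP, and everything in it (the function name storageNameFor, the allowed extension list) is an assumption for the example; the same idea carries over to any server language. The stored name is generated on the server, and the only thing taken from the client name is an extension that must be on a whitelist.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical whitelist; adjust to the file types the site really accepts.
const ALLOWED_EXTENSIONS = new Set([".png", ".jpg", ".gif"]);

// Builds a storage filename that contains no client-controlled text
// except a whitelisted, lower-cased extension.
function storageNameFor(clientFilename) {
  const extension = path.extname(clientFilename).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(extension)) {
    throw new Error("File type not allowed");
  }
  // 16 random bytes become 32 hex characters.
  return crypto.randomBytes(16).toString("hex") + extension;
}

console.log(storageNameFor("../../holiday.PNG")); // 32 hex characters + ".png"
```

A name such as shell.php is rejected outright, and path components like ../ never reach the filesystem because the client-supplied name is discarded. This complements, and does not replace, keeping the upload directory out of any location where the server executes .php files.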

My injected script runs after the target-page's javascript, despite using run_at: document_start?

Injected scripts with src attributes are executed asynchronously in Chrome.

Consider this HTML file:

<html><head>
    <title>Chrome early script inject, Test Page</title>
    <script>
        var s = document.createElement ("script");
        s.src = "localInject1.js";
        document.documentElement.appendChild (s);
    </script>
</head>
<body>
<p>See the console.</p>
<script src="localInject2.js"></script>
<script> console.log("In page: ", new Date().getTime()); </script>
</body>
</html>

where localInject1.js is just:

console.log("In page's script 1: ", new Date().getTime());

and localInject2.js is:

console.log("In page's script 2: ", new Date().getTime());

Nominally, the console should show:

In page's script 1:   ...
In page's script 2:   ...
In page:   ...

But frequently, and especially on cache-less reloads, you'll see:

In page's script 2:   ...
In page:   ...
In page's script 1:   ...

Setting s.async = false; makes no difference.

I'm not sure if this is a bug or not. Firefox also gets the scripts out of order, but seems more consistent about it. There don't seem to be any relevant Chrome bugs, and the specs are unclear to me.

Workaround:

Scripts with code set by textContent, rather than a file linked by src, execute immediately as you would expect.

For example, if you changed injector.js to:

var s = document.createElement ("script");
s.src = chrome.extension.getURL ("inject.js");
document.documentElement.appendChild (s);

var s = document.createElement ('script');
s.textContent = 'console.log ("Text runs correctly!", new Date().getTime() );';
document.documentElement.appendChild (s);

console.log("Inject finished", new Date().getTime() );

you would see:

Text runs correctly! ...
Inject finished ...
In page
Inside inject.js

Admittedly this can be a pain in a content script, but it may be all you can do (in Chrome) until this issue is resolved by the W3C and browser vendors.

Preventing HTML and Script injections in Javascript

You can encode the < and > to their HTML equivalents.

html = html.replace(/</g, "&lt;").replace(/>/g, "&gt;");
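A slightly fuller variant is sketched below, assuming the text may also contain ampersands or end up inside a quoted attribute value. The helper name escapeHtml and the HTML_ESCAPES table are illustrative, not a library API. Doing the replacement in a single pass avoids escaping the ampersands of entities the function itself has just produced.

```javascript
// Characters that are significant in HTML text and quoted attribute values.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Returns `text` with HTML-significant characters replaced by entities.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("x")</script>'));
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

In a browser, assigning untrusted text through textContent instead of innerHTML avoids the need for manual escaping in the first place.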
