Getting the source HTML of the current page from chrome extension
Inject a script into the page whose source you want, and have it message the result back to the popup.
manifest.json
{
"name": "Get pages source",
"version": "1.0",
"manifest_version": 2,
"description": "Get pages source from a popup",
"browser_action": {
"default_icon": "icon.png",
"default_popup": "popup.html"
},
"permissions": ["tabs", "<all_urls>"]
}
popup.html
<!DOCTYPE html>
<html style=''>
<head>
<script src='popup.js'></script>
</head>
<body style="width:400px;">
<div id='message'>Injecting Script....</div>
</body>
</html>
popup.js
chrome.runtime.onMessage.addListener(function(request, sender) {
    if (request.action == "getSource") {
        // Look the element up explicitly; the `message` variable below is local to onWindowLoad
        document.querySelector('#message').innerText = request.source;
    }
});
function onWindowLoad() {
    var message = document.querySelector('#message');
    chrome.tabs.executeScript(null, {
        file: "getPagesSource.js"
    }, function() {
        // If you try and inject into an extensions page or the webstore/NTP you'll get an error
        if (chrome.runtime.lastError) {
            message.innerText = 'There was an error injecting script : \n' + chrome.runtime.lastError.message;
        }
    });
}
window.onload = onWindowLoad;
getPagesSource.js
// @author Rob W <http://stackoverflow.com/users/938089/rob-w>
// Demo: var serialized_html = DOMtoString(document);
function DOMtoString(document_root) {
var html = '',
node = document_root.firstChild;
while (node) {
switch (node.nodeType) {
case Node.ELEMENT_NODE:
html += node.outerHTML;
break;
case Node.TEXT_NODE:
html += node.nodeValue;
break;
case Node.CDATA_SECTION_NODE:
html += '<![CDATA[' + node.nodeValue + ']]>';
break;
case Node.COMMENT_NODE:
html += '<!--' + node.nodeValue + '-->';
break;
case Node.DOCUMENT_TYPE_NODE:
// (X)HTML documents are identified by public identifiers
html += "<!DOCTYPE " + node.name + (node.publicId ? ' PUBLIC "' + node.publicId + '"' : '') + (!node.publicId && node.systemId ? ' SYSTEM' : '') + (node.systemId ? ' "' + node.systemId + '"' : '') + '>\n';
break;
}
node = node.nextSibling;
}
return html;
}
chrome.runtime.sendMessage({
action: "getSource",
source: DOMtoString(document)
});
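The manifest above targets Manifest V2, where chrome.tabs.executeScript is available. As a rough sketch only, assuming a Manifest V3 extension that declares the "scripting" and "activeTab" permissions, chrome.scripting.executeScript hands the injected function's return value straight back, so no message passing is needed. The helper name getPageSource and the chromeApi parameter are illustrative and not part of the original answer.

```javascript
// Hypothetical helper: read the serialized DOM of a tab under Manifest V3.
// chromeApi is passed in so the function can be tried outside the browser;
// in a real popup you would pass the global `chrome` object.
function getPageSource(chromeApi, tabId, callback) {
  chromeApi.scripting.executeScript(
    {
      target: { tabId: tabId },
      // Runs inside the page; its return value is sent back to the extension.
      func: function () {
        return document.documentElement.outerHTML;
      }
    },
    function (results) {
      if (chromeApi.runtime.lastError) {
        callback(new Error(chromeApi.runtime.lastError.message), null);
        return;
      }
      // Only the main frame is targeted here, so the first result is the page.
      callback(null, results[0].result);
    }
  );
}
```

In a popup this would typically be called with the id of the active tab obtained from chrome.tabs.query({ active: true, currentWindow: true }, ...).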
chrome extension: Getting the source HTML of the current page on page load
This comes down to permissions.
Your example is a bit incomplete, but as far as I can see you are using the "activeTab" permission.
According to the activeTab docs, the extension gains access to the current tab (e.g. to its source) after the user performs any of these actions:
- Executing a browser action
- Executing a page action
- Executing a context menu item
- Executing a keyboard shortcut from the commands API
- Accepting a suggestion from the omnibox API
That's why you can get sources after opening the popup.
To get access to tabs without those actions, you need to request the following permissions:
tabs
<all_urls>
Note that this lets you run content scripts on every tab, not only the active one.
Here is the simplest example:
manifest.json
{
"name": "Getting Started Example",
"version": "1.0",
"description": "Build an Extension!",
"permissions": ["tabs", "<all_urls>"],
"background": {
"scripts": ["background.js"],
"persistent": false
},
"manifest_version": 2
}
background.js
chrome.tabs.onUpdated.addListener(function (tabId, info) {
    if (info.status === 'complete') {
        // Pass tabId so the script runs in the tab that finished loading, not whichever tab is active
        chrome.tabs.executeScript(tabId, {
            code: "document.documentElement.innerHTML" // or 'file: "getPagesSource.js"'
        }, function(result) {
            if (chrome.runtime.lastError) {
                console.error(chrome.runtime.lastError.message);
            } else {
                console.log(result)
            }
        });
    }
});
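Because onUpdated fires for every tab, the listener above also fires for pages where injection is refused (extension pages, the Web Store, the new tab page), as the comment in the popup example notes. A small guard can skip those before calling executeScript. This is only a sketch: the name isInjectableUrl and the exact set of rejected URLs are assumptions, and reading the tab's URL relies on the "tabs" permission already requested above.

```javascript
// Hypothetical guard: decide whether a tab URL is worth trying to inject into.
function isInjectableUrl(url) {
  if (typeof url !== "string") return false;
  // Only ordinary web pages pass; chrome://, chrome-extension://, about:
  // and file:// URLs are rejected by this sketch.
  if (!/^https?:\/\//i.test(url)) return false;
  // The Chrome Web Store refuses content scripts even though it is https.
  if (/^https?:\/\/chrome\.google\.com\/webstore/i.test(url)) return false;
  if (/^https?:\/\/chromewebstore\.google\.com/i.test(url)) return false;
  return true;
}
```

The onUpdated callback receives the tab as its third argument, so the check could be written as info.status === 'complete' && isInjectableUrl(tab.url). The lastError check in the callback is still worth keeping, since this list is not exhaustive.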
chrome extension get source html not currentPage
Use XMLHttpRequest to download whatever the server responds with when the URL is requested. On some sites this is only a small stub page with a script loader, which would render the real content only if the browser loaded the page normally.
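A minimal sketch of that download step, assuming the extension holds a host permission covering the target URL. The helper name fetchRawSource and the optional XhrCtor parameter are illustrative and exist only so the function can be tried outside an extension page.

```javascript
// Hypothetical helper: download the raw server response for a URL.
// XhrCtor defaults to the browser's XMLHttpRequest.
function fetchRawSource(url, callback, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open("GET", url, true);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      // This is the markup as sent by the server, before any script ran.
      callback(null, xhr.responseText);
    } else {
      callback(new Error("HTTP " + xhr.status), null);
    }
  };
  xhr.onerror = function () {
    callback(new Error("Network error while requesting " + url), null);
  };
  xhr.send();
}
```

The callback receives either an Error or the response text, never both.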
To get a fully rendered source or DOM tree of an arbitrary URL you'll have to load it in a tab first. To make the process less distracting for the user, load it in a pinned tab:
chrome.tabs.create({url: "https://google.com", pinned: true}, function(tab) {
    // .... wait for the tab to load, get the source
});
(The simplest form of waiting, which doesn't require any additional permissions, is to check tab.status == "complete" periodically from the above callback. Otherwise use webNavigation.onCompleted, for example, or inject a content script with the run-of-the-mill "DOMContentLoaded" or "load" event handlers.)
Or load the page in an IFRAME, though some sites forbid the browser to do it.
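The periodic status check mentioned above could look roughly like this. It is a sketch, not the answer's own code: the name waitForTabComplete and the 250 ms default interval are assumptions, and it re-reads the tab through chrome.tabs.get because the tab object handed to the create callback is a one-time snapshot.

```javascript
// Hypothetical helper: poll a tab until chrome.tabs reports it as loaded.
// tabsApi is normally chrome.tabs; intervalMs is an arbitrary polling period.
function waitForTabComplete(tabsApi, tabId, onReady, intervalMs) {
  var delay = intervalMs || 250;
  function poll() {
    // Ask for a fresh snapshot of the tab on every attempt.
    tabsApi.get(tabId, function (tab) {
      if (tab && tab.status === "complete") {
        onReady(tab);
      } else if (tab) {
        setTimeout(poll, delay);
      }
      // No tab object means the tab is gone, so polling simply stops.
    });
  }
  poll();
}

// Example use inside the chrome.tabs.create callback:
// waitForTabComplete(chrome.tabs, tab.id, function () { /* inject here */ });
```

Polling keeps the permission list short; an event listener such as webNavigation.onCompleted avoids the timer but needs its own permission.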
Chrome extension : get source code of active tab
Your manifest has both "content_scripts" (which run in the context of the page on document_idle) and "browser_action" scripts (which run in an isolated context when the extension's menu button is clicked).
In popup.html you reference popup.js, so when popup.js calls document.documentElement.outerHTML you're getting the content of popup.html, not the active tab.
You reference both popup.js and popup1.js, which is confusing. You're currently running the same code in both the popup and the page context, which is almost guaranteed to break in one or the other. By convention use content.js in "content_scripts" and reference popup.js in the action popup.html.
"content_scripts" run in every page, whether users click on the extension or not. Your current manifest is adding ["popup1.js","jquery-1.10.2.js","jquery-ui.js","bootstrap.min.js"] to every page, which is needlessly slow.
Avoid using jQuery in Chrome extensions. It's fairly large and a browser standardisation library doesn't add much when you know for absolute certain that all your users are on Chrome. If you can't code without it then try to restrict it to just your popup or load it in dynamically.
You set "scripts": ["background.js"], which runs constantly in the background and isn't needed at all in your current code. If you need to do things outside of the action button, consider using event pages instead.
Use the Chrome API to get from the context of the popup to the page. You need to query chrome.tabs to get the active tab, and then call chrome.tabs.executeScript to execute script in the context of that tab.
Google's API uses callbacks, but in this example I'm going to use chrome-extension-async to allow use of promises (there are other libraries that do this too).
In popup.html (assuming you use bower install chrome-extension-async):
<!doctype html>
<html>
<head>
<script type="text/javascript" src="bower_components/chrome-extension-async/chrome-extension-async.js"></script>
<script type="text/javascript" src="popup.js"></script>
</head>
<body style="width: 600px; height: 300px;">
<button value="Test" id="check-1"> </button>
</body>
</html>
In popup.js (discard popup1.js):
function scrapeThePage() {
// Keep this function isolated - it can only call methods you set up in content scripts
var htmlCode = document.documentElement.outerHTML;
return htmlCode;
}
document.addEventListener('DOMContentLoaded', () => {
// Hook up #check-1 button in popup.html
const fbshare = document.querySelector('#check-1');
fbshare.addEventListener('click', async () => {
// Get the active tab
const tabs = await chrome.tabs.query({ active: true, currentWindow: true });
const tab = tabs[0];
// We have to convert the function to a string
const scriptToExec = `(${scrapeThePage})()`;
// Run the script in the context of the tab
const scraped = await chrome.tabs.executeScript(tab.id, { code: scriptToExec });
// Result will be an array of values from the execution
// For testing this will be the same as the console output if you ran scriptToExec in the console
alert(scraped[0]);
});
});
If you do it this way you don't need any "content_scripts" in manifest.json. You don't need jQuery or jQuery UI or Bootstrap either.
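If you would rather not add a promise wrapper, the same query-then-execute flow can be written with the plain callback API. This is a sketch under Manifest V2 assumptions; the helper name getActiveTabHtml and the chromeApi parameter are illustrative, and in the popup you would pass the global chrome object.

```javascript
// Hypothetical helper: fetch the active tab's HTML with plain callbacks.
function getActiveTabHtml(chromeApi, callback) {
  chromeApi.tabs.query({ active: true, currentWindow: true }, function (tabs) {
    if (!tabs || tabs.length === 0) {
      callback(new Error("No active tab found"), null);
      return;
    }
    chromeApi.tabs.executeScript(
      tabs[0].id,
      { code: "document.documentElement.outerHTML" },
      function (results) {
        // lastError is set when injection is refused, e.g. on chrome:// pages.
        if (chromeApi.runtime.lastError) {
          callback(new Error(chromeApi.runtime.lastError.message), null);
          return;
        }
        // executeScript yields one value per injected frame.
        callback(null, results[0]);
      }
    );
  });
}
```

Wired to the button, this would be called from the click handler as getActiveTabHtml(chrome, function (err, html) { ... }).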
How to get actual HTML elements in Chrome extension, not original source code
Any elements that you see in the JavaScript inspector, but not in the HTML source code, are either (a) automatically added by the browser, which supplies missing elements (e.g. no <body> tag) or corrects invalid structure (e.g. an unclosed <p> tag) to make the document valid, or (b) added by JavaScript.
Any technique you use to inspect the document from your Chrome extension will automatically see the document as you see it in the document inspector. You don't need to do anything specific. All of the elements that have been created by the browser or by JavaScript will be there. For example, you could use document.querySelectorAll('*') to get an array-like object containing all of them, or document.body.outerHTML to get the HTML code.
The harder task would actually be if you wanted to get the original, uncorrected source code.
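The two inspection calls mentioned above can be combined into one small snapshot function. This is a sketch only: the name snapshotDom is an assumption, and the function has to run in the page context (a content script or injected code), where you would pass the page's document.

```javascript
// Hypothetical helper: summarize the live DOM as the inspector shows it.
function snapshotDom(doc) {
  var all = doc.querySelectorAll("*");
  return {
    elementCount: all.length,
    // Tag names make browser-inserted elements such as BODY easy to spot.
    tagNames: Array.prototype.map.call(all, function (el) {
      return el.tagName;
    }),
    bodyHtml: doc.body.outerHTML
  };
}
```

Everything returned reflects the current DOM, including nodes added by the parser or by scripts, not the markup the server originally sent.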
Storing source code of current browser in chrome extension
If you want to show HTML code inside an HTML element, you may need to set the text content (innerText), not the HTML content (innerHTML):
bth1.onclick = function scrapeThePage() {
// Keep this function isolated - it can only call methods you set up in content scripts
var htmlCode = document.documentElement.outerHTML;
var btn = document.getElementById("mybtn1");
btn.innerText = htmlCode;
}
http://jsfiddle.net/1jk94r50/
Google Chrome Extensions: Get Current Page HTML (incl. Ajax or updated HTML)
I followed the exact solution here, and this gave me the Page Source HTML:
Getting the source HTML of the current page from chrome extension
The solution is to inject the HTML into the Popup.