How do I get the HTML source from the page?
Use
document.documentElement.outerHTML
(which includes the <html> element itself) or
document.documentElement.innerHTML
(which gives only its contents).
Python - Get HTML Source Code of a Web Page
Why not use the requests module?
import requests
r = requests.get("https://example.com")
print(r.text)
Or, to answer your question directly: urllib2 is part of the Python 2 standard library, so there is nothing to install (in Python 3 it was merged into urllib.request and urllib.error).
For requests:
pip install requests
easy_install requests
requests depends on urllib3, which pip will install automatically; if you need it on its own:
pip install urllib3
easy_install urllib3
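If you would rather avoid third-party packages entirely, here is a minimal sketch using only the Python 3 standard library (urllib.request); the helper name get_html and the example URL are just for illustration:

```python
from urllib.request import urlopen

def get_html(url):
    # urlopen returns a file-like response object; read() yields raw bytes,
    # which we decode using the charset declared in the response headers
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)

# print(get_html("https://example.com"))
```

Note that unlike requests, urlopen raises an HTTPError for non-2xx status codes, so wrap the call in try/except if you need to handle error pages.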
Get html source code from a website and then get an element from the html file
For Node.js there are two native modules for fetching: http and https. If you're looking to scrape with a Node.js application, you should probably use https: get the page's HTML, then parse it with an HTML parser; I'd recommend cheerio. Here's an example:
// native Node.js module
const https = require('https')
// don't forget to `npm install cheerio` to get the parser!
const cheerio = require('cheerio')
// custom fetch for Node.js; note that https.get only performs GET
// requests, so the method and payload arguments are ignored here
const fetch = (method, url, payload = undefined) => new Promise((resolve, reject) => {
  https.get(url, res => {
    const dataBuffers = []
    res.on('data', data => dataBuffers.push(data.toString('utf8')))
    res.on('end', () => resolve(dataBuffers.join('')))
  }).on('error', reject)
})
const scrapeHtml = url =>
  fetch('GET', url).then(html => {
    const cheerioPage = cheerio.load(html)
    // cheerioPage is now a loaded html parser with a similar interface to jQuery
    // FOR EXAMPLE, to find a table with the id productData, you would do this:
    const productTable = cheerioPage('table#productData')
    // the object returned from a cheerio search is jQuery-like, so you can
    // perform further searches inside it with .find():
    const productRows = productTable.find('tr')
    // now we have a reference to every row in the table; the result set is
    // array-like, and cheerio's own .each iterates over it
    const productsTextData = []
    productRows.each((i, row) => {
      productsTextData.push(cheerioPage(row).text().trim())
    })
    return productsTextData
  })
scrapeHtml(/*URL TO SCRAPE HERE*/)
.then(data => {
// expect the data returned to be an array of text from each
// row in the table from the html we loaded. Now we can do whatever
// else you want with the scraped data.
console.log('data: ', data)
})
.catch(err => console.log('err: ', err))
Happy scraping!
How to get html source code of a web page
If you can write scripts for Node.js, here is a small example using the puppeteer library. It logs the page source code after the page is loaded in a headless (invisible) Chrome, with dynamic content generated by page scripts:
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: true, defaultViewport: null });
try {
  const [page] = await browser.pages();
  await page.goto('https://example.org/');
  console.log(await page.content());
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}
How to get HTML page source in C#
Here is one way, using HttpClient:
string url = "https://www.digikala.com/";
string result = null;
using (HttpClient client = new HttpClient())
using (HttpResponseMessage response = client.GetAsync(url).Result)
using (HttpContent content = response.Content)
{
    result = content.ReadAsStringAsync().Result;
}
and the result variable will contain the page as HTML.
then you can save it to a file like this
System.IO.File.WriteAllText("path/filename.html", result);
NOTE: you have to use the namespace
using System.Net.Http;
Update: if you are using a legacy version of Visual Studio, you can see this answer for how to use WebClient
and WebRequest
for the same purpose, but updating your Visual Studio is actually the better solution.
How to get html source after login to a website?
Logging in
I looked at the website and how its login system works, and you are making a couple of incorrect assumptions about it. The way you log in to this specific website is by sending a POST request to "https://lobby-api.ogame.gameforge.com/users" with data in the "application/x-www-form-urlencoded" format. The data needed is shown in the following table:
Key                   | Value
----------------------|------------------
credentials[email]    | the email here
credentials[password] | the password here
Once you send this request you will receive a cookie called "PHPSESSID". You can use this cookie to make subsequent requests, for example to "https://lobby.ogame.gameforge.com/?language=tr", which is the page you are trying to reach when going to "index.php".
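Putting the flow above into code: here is a minimal sketch in Python using only the standard library. The form fields and URLs are the ones described above; the helper name login_and_fetch is just for illustration:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener

def login_and_fetch(email, password,
                    login_url="https://lobby-api.ogame.gameforge.com/users",
                    lobby_url="https://lobby.ogame.gameforge.com/?language=tr"):
    # an opener with a cookie jar keeps the PHPSESSID cookie between requests
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    # urlencode produces the application/x-www-form-urlencoded body;
    # passing data= makes urllib send a POST request
    form = urlencode({
        "credentials[email]": email,
        "credentials[password]": password,
    }).encode()
    opener.open(login_url, data=form)
    # the cookie jar now holds PHPSESSID, so this request is authenticated
    with opener.open(lobby_url) as page:
        return page.read().decode("utf-8")
```

The same flow works with the requests library by using a requests.Session(), which stores cookies automatically.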
More issues
However, once you do load this page and render the HTML, you will find that it does not contain anything interesting, such as the list of servers, which is probably what you are after.
Here is the HTML:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no">
<meta name="theme-color" content="#000000">
<link rel="shortcut icon" href="/favicon.ico">
<script type="text/javascript" src="/config/configuration.js"></script>
<title>OGame Lobby</title>
<link href="https://s3-static.geo.gfsrv.net/browsergamelobby/ogame/1.0.8/css/main.2e4c281d.css" rel="stylesheet">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
<div class="planet"></div>
<script type="text/javascript" src="https://s3-static.geo.gfsrv.net/browsergamelobby/ogame/1.0.8/js/main.edde2ed8.js"></script>
</body>
</html>
The JavaScript then builds the actual content on the page. This leaves you two options: you could use a browser component as suggested by Andrius Naruševičius, or you could use the API that the JavaScript itself uses. To figure out the API, use the Network tab in your browser's dev tools. This way may be more complicated initially, but in the end it should be easier and make for cleaner code, because the API is designed to be consumed programmatically, whereas the HTML was never designed to be parsed; it was made for the browser to render, not for a human to pick apart. However, depending on what you intend to do with the list of servers, it might actually be easier to use Andrius's approach; you will have to make that decision for yourself.
How to proceed if you choose my route
You can learn about the Chrome dev tools Network tab here and by using Google (obviously). You can test your API calls using software like Postman.
If you know nothing about web requests/APIs, cookies, and session IDs, you should not start here; learn what those are first by looking them up.