How to Get the HTML Source from the Page

How do I get the HTML source from the page?

Use

document.documentElement.outerHTML

which returns the markup of the whole document, including the <html> tag itself, or

document.documentElement.innerHTML

which returns only the markup inside it. Note that both reflect the current DOM, not necessarily the original source the server sent.

Python - Get HTML Source Code of a Web Page

Why don't you use the requests module?

import requests

r = requests.get("https://example.com")
print(r.text)

Or, to answer your question more directly: you don't need to install urllib2 at all. It was part of the Python 2 standard library, and in Python 3 it was merged into the built-in urllib.request module, so there is nothing to download. If you want requests instead, install it with pip (easy_install is long deprecated):

pip install requests

Note that requests depends on urllib3, but pip pulls it in automatically, so you do not need to install it yourself.
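If you want the standard-library route the original urllib2 question was pointing at, here is a minimal Python 3 sketch using urllib.request (no third-party packages needed):

from urllib.request import urlopen

# fetch the page and decode the raw bytes into text
with urlopen("https://example.com") as response:
    html = response.read().decode("utf-8")

print(html)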

Get html source code from a website and then get an element from the html file

Node.js has two native fetching modules: http and https. If you're looking to scrape with a Node.js application, you should probably use https to get the page's HTML and then parse it with an HTML parser; I'd recommend cheerio. Here's an example:

// native Node.js module
const https = require('https')
// don't forget to `npm install cheerio` to get the parser!
const cheerio = require('cheerio')

// custom fetch helper for Node.js (https.get only issues GET requests)
const fetch = url => new Promise((resolve, reject) => {
  https.get(url, res => {
    // collect the response body chunk by chunk, then join it into one string
    const dataBuffers = []
    res.on('data', data => dataBuffers.push(data.toString('utf8')))
    res.on('end', () => resolve(dataBuffers.join('')))
  }).on('error', reject)
})

const scrapeHtml = url => new Promise((resolve, reject) => {
  fetch(url)
    .then(html => {
      const cheerioPage = cheerio.load(html)
      // cheerioPage is now a loaded html parser with a similar interface to jQuery.
      // For example, to find a table with the id productData, you would do this:
      const productTable = cheerioPage('table#productData')

      // a cheerio result supports jQuery-like traversal, so we can search
      // inside it directly for the rows:
      const productRows = productTable.find('tr')

      // now we have a reference to every row in the table. The object returned
      // from a cheerio search is array-like, but native array methods such as
      // .map don't work on it directly, so we use a manual loop:
      let i = 0
      let prodRowText
      const productsTextData = []
      while (i < productRows.length) {
        // wrapping a raw element with cheerioPage() gives it the jQuery-like API
        prodRowText = cheerioPage(productRows[i]).text().trim()
        productsTextData.push(prodRowText)
        i++
      }
      resolve(productsTextData)
    })
    .catch(reject)
})

scrapeHtml(/*URL TO SCRAPE HERE*/)
  .then(data => {
    // expect the data returned to be an array of text from each row in the
    // table from the html we loaded. Now we can do whatever else we want
    // with the scraped data.
    console.log('data: ', data)
  })
  .catch(err => console.log('err: ', err))

Happy scraping!

How to get html source code of a web page

If you can write scripts for Node.js, here is a small example using the Puppeteer library. It logs the page source code after the page has loaded in headless (invisible) Chrome, including dynamic content generated by page scripts:

import puppeteer from 'puppeteer';

// Chrome launches headless (invisible) by default
const browser = await puppeteer.launch();

try {
  const [page] = await browser.pages();
  await page.goto('https://example.org/');
  console.log(await page.content());
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}

how to get html page source by C#

Here is one way, using HttpClient:

string url = "https://www.digikala.com/";
// declared outside the using blocks so it stays in scope afterwards
string result;

using (HttpClient client = new HttpClient())
{
    using (HttpResponseMessage response = client.GetAsync(url).Result)
    {
        using (HttpContent content = response.Content)
        {
            result = content.ReadAsStringAsync().Result;
        }
    }
}

The result variable will then contain the page as HTML, and you can save it to a file like this:

System.IO.File.WriteAllText("path/filename.html", result);

NOTE: you have to include the namespace

using System.Net.Http;

Update: if you are using a legacy version of Visual Studio, you can see this answer for using WebClient and WebRequest for the same purpose, but updating your Visual Studio is actually the better solution.

How to get html source after login to a website?

Logging in

I looked at the website and how its login system works, and you are making a couple of incorrect assumptions about it. The way you log in to this specific website is by sending a request to "https://lobby-api.ogame.gameforge.com/users" and giving it data in the "application/x-www-form-urlencoded" format. The data needed is shown in the following table:

Key                   | Value
----------------------|-------------------
credentials[email]    | the email here
credentials[password] | the password here

Once you send this request, you will receive a cookie called "PHPSESSID". You can use this cookie to make subsequent requests, for example to "https://lobby.ogame.gameforge.com/?language=tr", which is the page you are trying to reach when going to "index.php".
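Here is a minimal sketch of that flow in Python with the requests library, assuming the endpoint accepts a plain POST of the form fields above (untested; the site may also require extra headers or tokens):

import requests

# a Session stores the PHPSESSID cookie between requests for us
session = requests.Session()

# requests encodes a dict passed via `data=` as
# application/x-www-form-urlencoded by default
login = session.post(
    "https://lobby-api.ogame.gameforge.com/users",
    data={
        "credentials[email]": "the email here",
        "credentials[password]": "the password here",
    },
)
login.raise_for_status()

# the PHPSESSID cookie is now in session.cookies, so this request is authenticated
page = session.get("https://lobby.ogame.gameforge.com/?language=tr")
print(page.text)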

More issues

However, once you do load this page and render the HTML, you will find that it does not contain anything interesting, such as the servers, which is probably what you are after. Here is the HTML:

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no">
<meta name="theme-color" content="#000000">
<link rel="shortcut icon" href="/favicon.ico">
<script type="text/javascript" src="/config/configuration.js"></script>
<title>OGame Lobby</title>
<link href="https://s3-static.geo.gfsrv.net/browsergamelobby/ogame/1.0.8/css/main.2e4c281d.css" rel="stylesheet">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
<div class="planet"></div>
<script type="text/javascript" src="https://s3-static.geo.gfsrv.net/browsergamelobby/ogame/1.0.8/js/main.edde2ed8.js"></script>
</body>
</html>

The JavaScript then loads and builds the actual content on the page. This leaves you two options: you could use a browser component, as suggested by Andrius Naruševičius, or you could use the API that the JavaScript itself uses. To figure out the API, use the Network tab in your browser's dev tools (see the sketch below). This way may be more complicated initially, but in the end it should be easier and make for cleaner code, because the API was designed to be used by people (the people who made it), whereas the HTML was made for the browser to render, not for a human to parse. However, depending on what you intend to do with the list of servers, it might actually be easier to use Andrius's way; you will have to make that decision for yourself.
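To illustrate the API route: once you have spotted a promising request in the Network tab, you can replay it with the authenticated session from the login sketch above. The endpoint path below is purely hypothetical, for illustration only; substitute whatever request you actually observe:

import requests

session = requests.Session()  # in practice, reuse the logged-in session from the sketch above

# HYPOTHETICAL endpoint path, for illustration only: replace it with the
# real request you see in the Network tab
response = session.get("https://lobby-api.ogame.gameforge.com/servers")
response.raise_for_status()
print(response.json())  # such endpoints typically return JSON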

How to go on if you choose to go my route

You can learn about the Chrome dev tools Network tab here and by using Google (obviously). You can test your API calls using software like Postman.
If you know nothing about web requests/APIs, cookies, and session IDs, you should not start here; learn what those are first. To do that, just look them up on Google.


