How to Get Results from the Wikipedia API with PHP

How to get results from the Wikipedia API with PHP?

The problem you are running into here is related to the MW API's User-Agent policy: you must supply a User-Agent header, and that header must include some means of contacting you.

You can do this with file_get_contents() with a stream context:

$opts = array(
    'http' => array(
        'user_agent' => 'MyBot/1.0 (http://www.mysite.com/)'
    )
);
$context = stream_context_create($opts);

$url = 'http://en.wikipedia.org/w/api.php?action=query&titles=Your_Highness&prop=revisions&rvprop=content&rvsection=0';
var_dump(file_get_contents($url, FALSE, $context));

Having said that, it might be considered more "standard" to use cURL, and this will certainly give you more control:

$url = 'http://en.wikipedia.org/w/api.php?action=query&titles=Your_Highness&prop=revisions&rvprop=content&rvsection=0';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, 'MyBot/1.0 (http://www.mysite.com/)');

$result = curl_exec($ch);

if (!$result) {
    exit('cURL Error: ' . curl_error($ch));
}

var_dump($result);

PHP: How to retrieve extract text from Wiki API

There are various possible solutions to this.

You could use reset() / current() against the pages property to get the first / current item in that array, or you could loop over that property with a foreach and ignore the keys. You could also use array_values() on the pages property to force sequential indices, or use array_keys() on it to get a list of the page ids and use those to access each item. (There are other ways.)

The foreach option is going to be your best bet.

foreach($wiki_array['query']['pages'] as $page)

$page inside the loop will be the array that you're after.

You should then make sure you can deal with multiple results properly.
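For instance, a title that doesn't exist comes back as an entry with a negative page id and a missing flag. A sketch over hardcoded sample data (the values are illustrative, not a live response):

```php
<?php
// Simulated decoded API response: 'pages' is keyed by page id, and a
// missing title appears under a negative key with a 'missing' flag.
$wiki_array = [
    'query' => [
        'pages' => [
            '21721040' => [
                'pageid'  => 21721040,
                'title'   => 'Stack Overflow',
                'extract' => 'Stack Overflow is ...',
            ],
            '-1' => ['title' => 'No such page', 'missing' => ''],
        ],
    ],
];

foreach ($wiki_array['query']['pages'] as $page) {
    if (isset($page['missing'])) {
        echo "Not found: {$page['title']}\n";
        continue;
    }
    echo "{$page['title']}: {$page['extract']}\n";
}
```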

Extracting data from Wikipedia API

$pageid was returning an array with one element. If you only want to get the first one, you should do this:

$pageid = $data->query->pageids[0];

You were probably getting this warning:

 Array to string conversion 

Full code:

$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=extracts|info&exintro&titles=google&format=json&explaintext&redirects&inprop=url&indexpageids';

$json = file_get_contents($url);
$data = json_decode($json);

$pageid = $data->query->pageids[0];
echo $data->query->pages->$pageid->title;

Getting Wikipedia API

You could do this:

$ua = array();
$ua[] = 'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0';
$ua[] = 'content-type:application/json; charset=utf-8';

$data = json_decode(get("https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=$query",$ua));

foreach ($data->query->pages as $pid) {
    echo 'Your pageid = ' . $pid->pageid . PHP_EOL;
    echo 'Title = ' . $pid->title . PHP_EOL;
    echo 'extract = ' . $pid->extract . PHP_EOL;
}

RESULT

Your pageid = 7529378
Title = Facebook
extract = Facebook (stylized as facebook) is an American online social media and social networking service based in Menlo Park, California, and a flagship service of the namesake company Facebook, Inc. It was founded by Mark Zuckerberg, along with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, and Chris Hughes.
The founders of Facebook initially limited membership to Harvard students. Membership was expanded to Columbia, Stanford, and Yale before being expanded to the rest of the Ivy League, MIT, and higher education institutions in the Boston area, then various other universities, and lastly high school students. Since 2006, anyone who claims to be at least 13 years old has been allowed to become a registered user of Facebook, though this may vary depending on local laws. The name comes from the face book directories often given to American university students.
Facebook can be accessed from devices with Internet connectivity, such as personal computers, tablets and smartphones. After registering, users can create a profile revealing information about themselves. They can post text, photos and multimedia which is shared with any other users that have agreed to be their "friend", or, with a different privacy setting, with any reader. Users can also use various embedded apps, join common-interest groups, buy and sell items or services on Marketplace, and receive notifications of their Facebook friends' activities and activities of Facebook pages they follow. Facebook claimed that it had 2.74 billion monthly active users as of September 2020, and it was the most downloaded mobile app of the 2010s globally.Facebook has been the subject of numerous controversies, often involving user privacy (as with the Cambridge Analytica data scandal), political manipulation (as with the 2016 U.S. elections), mass surveillance, psychological effects such as addiction and low self-esteem, and content such as fake news, conspiracy theories, copyright infringement, and hate speech. Commentators have accused Facebook of willingly facilitating the spread of such content and also exaggerating its number of users in order to appeal to advertisers. As of January 21, 2021, Alexa Internet ranks Facebook seventh in global internet usage.

Note: I removed the , true argument so that json_decode() returns a PHP object rather than an associative array.

OR simply

echo 'PageId = ' . array_keys((array)$data->query->pages)[0];
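Note that get() is not a PHP built-in; the snippet above assumes a small helper that performs the HTTP request with the given headers. A minimal sketch of such a helper (the name and signature are assumptions, mirroring the call above):

```php
<?php
// Hypothetical helper assumed by the snippet above: fetches $url with the
// given request headers and returns the response body as a string.
function get(string $url, array $headers = []): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        exit('cURL Error: ' . $error);
    }
    curl_close($ch);
    return $body;
}
```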

Get Page ID from Title with the Wiki API (Non-English)

You are making the request to English Wikipedia instead of Vietnamese. Change the en to vi in your call and you will get results. See here:

https://vi.wikipedia.org/w/api.php?action=query&titles=Trung%20%C4%90%C3%B4ng&prop=iwlinks&format=json
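In PHP, the language switch can be parameterised by building the URL from the subdomain and an encoded title. A small sketch (the helper name is illustrative):

```php
<?php
// Build the same query against any language edition of Wikipedia
// by swapping the subdomain and URL-encoding the title.
function wikiApiUrl(string $lang, string $title): string
{
    return "https://{$lang}.wikipedia.org/w/api.php?action=query"
         . '&titles=' . rawurlencode($title)
         . '&prop=iwlinks&format=json';
}

echo wikiApiUrl('vi', 'Trung Đông'), "\n";
```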

Is there a Wikipedia API just for retrieve the content summary?

There's a way to get the entire "introduction section" without any HTML parsing! Similar to AnthonyS's answer, but with an additional explaintext parameter, you can get the introduction section as plain text.

Query

Getting Stack Overflow's introduction in plain text:

Using the page title:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Stack%20Overflow

Or use pageids:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&pageids=21721040

JSON Response

(warnings stripped)

{
    "query": {
        "pages": {
            "21721040": {
                "pageid": 21721040,
                "ns": 0,
                "title": "Stack Overflow",
                "extract": "Stack Overflow is a privately held website, the flagship site of the Stack Exchange Network, created in 2008 by Jeff Atwood and Joel Spolsky, as a more open alternative to earlier Q&A sites such as Experts Exchange. The name for the website was chosen by voting in April 2008 by readers of Coding Horror, Atwood's popular programming blog.\nIt features questions and answers on a wide range of topics in computer programming. The website serves as a platform for users to ask and answer questions, and, through membership and active participation, to vote questions and answers up or down and edit questions and answers in a fashion similar to a wiki or Digg. Users of Stack Overflow can earn reputation points and \"badges\"; for example, a person is awarded 10 reputation points for receiving an \"up\" vote on an answer given to a question, and can receive badges for their valued contributions, which represents a kind of gamification of the traditional Q&A site or forum. All user-generated content is licensed under a Creative Commons Attribute-ShareAlike license. Questions are closed in order to allow low quality questions to improve. Jeff Atwood stated in 2010 that duplicate questions are not seen as a problem but rather they constitute an advantage if such additional questions drive extra traffic to the site by multiplying relevant keyword hits in search engines.\nAs of April 2014, Stack Overflow has over 2,700,000 registered users and more than 7,100,000 questions. Based on the type of tags assigned to questions, the top eight most discussed topics on the site are: Java, JavaScript, C#, PHP, Android, jQuery, Python and HTML."
            }
        }
    }
}

Documentation: API: query/prop=extracts
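From PHP, that response can be decoded without knowing the page id in advance, since the pages object is keyed by page id. A sketch using a trimmed copy of the JSON above:

```php
<?php
// A trimmed version of the JSON response shown above (extract shortened).
$json = '{"query":{"pages":{"21721040":{"pageid":21721040,"ns":0,'
      . '"title":"Stack Overflow",'
      . '"extract":"Stack Overflow is a privately held website..."}}}}';

$data = json_decode($json, true);

// The page id is unknown ahead of time, so take the first entry of 'pages'.
$page = current($data['query']['pages']);
echo $page['title'], "\n";    // Stack Overflow
echo $page['extract'], "\n";
```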

How to use Wikipedia API to search for values input by a user?

There are a few things to know about the Wikipedia API.

Consider the url that you have shared:

var url = "https://en.wikipedia.org/w/api.php?action=opensearch&search="+ searchTerm + "&format=json&callback=?";

There are two parts in the API URL.

  1. The API entry point: https://en.wikipedia.org/w/api.php - this is the URL to which you make all your API calls, i.e. the part common to all API calls.
  2. Parameters: the rest of the URL is made up of parameters, which specify exactly what you want from the API call. Some of the parameters are explained below.

action parameter: There are many action parameters available in the Wikipedia API. The action=query parameter is used to get information about a Wikipedia article. Another common action parameter is action=opensearch, which is used to search Wikipedia and is the one used in the URL above. To read more on the action parameter, go here.

Each action parameter may also have its own sub-parameters, such as the search parameter used in the URL above, which tells the API what term to search for.

The format parameter tells the API which format you want the result in. It is usually json; php and xml are also supported but deprecated. More on this here.

callback=? may have been added to your query to trigger a JSONP response and avoid violating the Same Origin Policy. More information on cross-site requests with the Wikipedia API is available here.
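When calling the API from server-side PHP rather than from browser JavaScript, the Same Origin Policy does not apply, so callback=? can be dropped and the plain JSON decoded directly. A sketch of decoding an opensearch response (the payload below is a hardcoded illustrative sample, not a live result):

```php
<?php
// opensearch returns a four-element array:
// [search term, [titles], [descriptions], [urls]]
$sample = '["php",["PHP","PHP-FPM"],["",""],'
        . '["https://en.wikipedia.org/wiki/PHP","https://en.wikipedia.org/wiki/PHP-FPM"]]';

$results = json_decode($sample, true);

echo $results[0], "\n";     // the search term: php
echo $results[1][0], "\n";  // first matching title: PHP
echo $results[3][0], "\n";  // URL of the first match
```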
