Accessing Main Picture of Wikipedia Page by API

http://en.wikipedia.org/w/api.php

Look at prop=images.

It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g.:
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url

or to calculate the URL via the filename's hash.
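
As a rough sketch of the hash approach: the function below assumes the commonly described Wikimedia upload layout, in which the path contains the first one and first two hex digits of the MD5 hash of the underscore-normalized file name. The function name and the `repo` parameter are made up for illustration, and files uploaded locally to a wiki (rather than to Commons) live under a different directory, so the imageinfo call above remains the reliable way to get the URL.

```python
import hashlib
from urllib.parse import quote


def upload_url_from_filename(filename, repo="commons"):
    """Guess the upload.wikimedia.org URL of a file from its name alone.

    Assumption: the file lives in the given repo directory ("commons" here);
    locally uploaded files would need e.g. "en" instead.
    """
    # Drop a "File:" or "Image:" namespace prefix if present.
    for prefix in ("File:", "Image:"):
        if filename.startswith(prefix):
            filename = filename[len(prefix):]
    # MediaWiki stores names with underscores and an upper-case first letter.
    name = filename.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    # The directory levels come from the MD5 hash of the normalized name.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "https://upload.wikimedia.org/wikipedia/{}/{}/{}/{}".format(
        repo, digest[0], digest[:2], quote(name)
    )
```

Calling `upload_url_from_filename("File:INSERT_EXAMPLE_FILE_NAME_HERE.jpg")` returns a URL string without making any network request, so it is worth checking that the URL actually resolves before relying on it.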

Unfortunately, the first image in the array returned by prop=images cannot be guaranteed to be the infobox image. The list is not reliably in page order (another answer here observes that it comes back alphabetically), and even in page order a page will sometimes include an image before the infobox (most of the time icons for metadata about the page, e.g. "this article is locked").

Searching the array of images for the first image that includes the page title is probably the best guess for the infobox image.
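
A minimal sketch of that heuristic, assuming you already have the list of image titles from prop=images. The function name is hypothetical and the file names in the usage note are invented; the match is a plain case-insensitive substring test, so it remains a guess.

```python
def guess_infobox_image(image_titles, page_title):
    """Return the first image title that contains the page title, or None."""
    # Compare case-insensitively and treat underscores as spaces.
    needle = page_title.replace("_", " ").casefold()
    for title in image_titles:
        if needle in title.replace("_", " ").casefold():
            return title
    return None
```

For example, with the invented list `["File:Padlock-silver.svg", "File:Standing jaguar.jpg", "File:Jaguar head.jpg"]` and the page title `"Jaguar"`, the padlock icon is skipped and `"File:Standing jaguar.jpg"` is returned.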

How to get the default image of Wikipedia article?

Use the MediaWiki API with prop=pageimages. For example, for the Wikipedia article Jaguar with a requested maximum thumbnail size of 500 pixels, the query is:

https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&titles=Jaguar&pithumbsize=500

From the response you can also derive the URL of the largest (original) version of the image: remove the /thumb part of the path and everything from the pixel size (e.g. 500px-) to the end of the link.
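
The string manipulation can be sketched as follows. The function name is hypothetical, and the sketch assumes the thumbnail URL has the usual .../thumb/<hash dirs>/<name>/<width>px-<name> shape; the example URL in the usage note is illustrative, not taken from a real response.

```python
from urllib.parse import urlsplit, urlunsplit


def original_from_thumb(thumb_url):
    """Turn a thumbnail URL into the URL of the full-size file."""
    parts = urlsplit(thumb_url)
    segments = parts.path.split("/")
    if "thumb" not in segments:
        return thumb_url  # nothing to strip, assume it is already the original
    segments.remove("thumb")  # drop the first "thumb" path segment
    segments.pop()            # drop the trailing "500px-Name.jpg" segment
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Given `https://upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Standing_jaguar.jpg/500px-Standing_jaguar.jpg`, this yields `https://upload.wikimedia.org/wikipedia/commons/0/0a/Standing_jaguar.jpg`.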

How to retrieve some jpeg image from Wikipedia page by API?

Most Wikipedia pages are associated with an item in Wikidata. If an item has a "main image", it is stored in the image (P18) property. You can access this property through the MediaWiki API for Wikidata with the wbgetentities action:

https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Paris

In this example, the article Paris (titles=Paris) in English Wikipedia (sites=enwiki) returns the Wikidata item (Q90) with its image property. You can make the request more specific by adding &props=claims, which omits unneeded information such as labels, descriptions and sitelinks. The result will include:

{
  "entities": {
    "Q90": {
      "claims": {
        "P18": [ {
          "mainsnak": {
            "datavalue": {
              "value": "Paris - Eiffelturm und Marsfeld2.jpg"
            }
          }
        } ]
      }
    }
  }
}

where the value "Paris - Eiffelturm und Marsfeld2.jpg" is the main image of the article.
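
A small sketch of pulling that value out of the parsed JSON. It assumes the response has the shape shown above and has already been decoded into a Python dict; both function names are hypothetical. The second helper builds a Special:FilePath link on Commons, which redirects to the actual file.

```python
from urllib.parse import quote


def main_image_names(response):
    """Collect the P18 file names from a decoded wbgetentities response."""
    names = []
    for entity in response.get("entities", {}).values():
        for claim in entity.get("claims", {}).get("P18", []):
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
            if isinstance(value, str):
                names.append(value)
    return names


def commons_file_path(name):
    """Build a Special:FilePath URL for a Commons file name."""
    return ("https://commons.wikimedia.org/wiki/Special:FilePath/"
            + quote(name.replace(" ", "_")))
```

An item can carry more than one P18 claim, or none at all, which is why the sketch returns a list rather than a single name.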

Accessing full url of all page images Wikipedia API

The API does not return all results at once; it defaults to 10 results per request. At the beginning of the response you will see a value for the gimcontinue parameter. Pass it back like this to get more images: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimcontinue=1092923|Google_bike.jpg

Alternatively, you can ask for more images at once using gimlimit like this: http://en.wikipedia.org/w/api.php?action=query&pageids=1092923&generator=images&prop=imageinfo&iiprop=url|dimensions|mime&format=xml&gimlimit=500
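
Following the continuation automatically can be sketched like this. The sketch assumes format=json (not the format=xml used in the URLs above), where the continuation values arrive in a top-level "continue" object. `query_all` and its `fetch` argument are hypothetical: `fetch` stands for any function that takes a dict of parameters, performs the HTTP request, and returns the decoded JSON.

```python
def query_all(fetch, params):
    """Yield every response batch, following the API's "continue" values."""
    request = dict(params)
    while True:
        response = fetch(request)
        yield response
        if "continue" not in response:
            break  # no more batches
        # Merge the continuation values (e.g. gimcontinue) into the
        # original parameters for the next request.
        request = {**params, **response["continue"]}
```

Keeping the HTTP call outside the loop means the same function works with whatever HTTP client you prefer, and it can be exercised with a fake `fetch` that returns canned responses.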

How to get Wikipedia image by title?

You can do this with the API.

https://en.wikipedia.org/w/api.php?action=query&prop=info|extracts|pageimages|images&inprop=url&exsentences=1&titles=india

The prop parameter controls what is returned; prop=pageimages gives the page image. If you also want the page's short description (its Wikidata terms), use prop=pageimages|pageterms.

To get the original image, use piprop=original.

To get a thumbnail instead, use piprop=thumbnail&pithumbsize=500, where 500 is the requested thumbnail size in pixels.

When requesting JSON, add format=json&formatversion=2 to the API query.

Original Image Size

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=original&titles=Albert Einstein

Thumbnail Image Size

https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=thumbnail&pithumbsize=500&titles=Albert Einstein

By adjusting these parameters you can fetch just the data you need, for use in web and mobile apps.
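
As an illustration, the two queries above can be built and their responses read roughly like this. The function names and the `thumb_size` parameter are hypothetical; the parsing assumes formatversion=2, where query.pages is a list and the image appears under "original" or "thumbnail" with a "source" URL.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def page_image_query(title, thumb_size=None):
    """Build the pageimages query URL for the original or a thumbnail."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "pageimages",
        "titles": title,
    }
    if thumb_size is None:
        params["piprop"] = "original"
    else:
        params["piprop"] = "thumbnail"
        params["pithumbsize"] = str(thumb_size)
    return API + "?" + urlencode(params)


def extract_page_image(response):
    """Return the image URL from a decoded formatversion=2 response, or None."""
    for page in response.get("query", {}).get("pages", []):
        image = page.get("original") or page.get("thumbnail")
        if image:
            return image.get("source")
    return None
```

Pages without a page image simply have neither key, in which case `extract_page_image` returns `None`.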

Is there a clean wikipedia API just for retrieving an image by article name?

To retrieve all images from a MediaWiki article, query for the images property:

http://en.wikipedia.org/w/api.php?action=query&prop=images&titles=Main%20Page

Then you can ask for paths to specific sizes:

https://en.wikipedia.org/w/api.php?action=query&titles=File:Bubolz%20Grass.jpg&prop=imageinfo&iiprop=url&iiurlwidth=220

Finally, you can use the first query as a generator for the second, to get all the data in one request:

https://en.wikipedia.org/w/api.php?action=query&generator=images&titles=Albert%20Einstein&prop=imageinfo&iiprop=url&iiurlwidth=220

Note that this is part of the MediaWiki core API, and not specific to Wikipedia or even the Wikimedia universe.
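
Reading the result of the generator query might look like the sketch below. It assumes the response was requested with format=json and already decoded; the function name is hypothetical. With iiurlwidth set, each imageinfo entry carries a "thumburl" next to the full "url", and the sketch falls back to "url" when no thumbnail URL is present.

```python
def thumb_urls(response):
    """Map each file title to its thumbnail URL (or full URL as a fallback)."""
    pages = response.get("query", {}).get("pages", {})
    # The default JSON format keys pages by id; formatversion=2 uses a list.
    if isinstance(pages, dict):
        pages = pages.values()
    result = {}
    for page in pages:
        for info in page.get("imageinfo", []):
            result[page["title"]] = info.get("thumburl", info.get("url"))
    return result
```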

How can I get the principal image from MediaWiki API?

As others have noted, Wikipedia articles don't really have any such thing as a "principal image", so your first problem will be deciding how to choose between the different images used on a given page. Some possible selection criteria might be:

  • Biggest image in the article.
  • First image exceeding some specific minimum dimensions, e.g. 60 × 60 pixels.
  • First image referenced directly in the article's source text, rather than through a template.

For the first two options, you'll want to fetch the rendered HTML code of the page via action=parse and use an HTML parser to find the img tags in the code, like this:

http://en.wikipedia.org/w/api.php?action=parse&page=English_language&prop=text|images

(The reason you can't just get the sizes of the images, as used on the page, directly from the API is that that information isn't actually stored anywhere in the MediaWiki database.)
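
A sketch of the HTML route using only the Python standard library. It assumes the rendered HTML has already been extracted from the action=parse response and that the img tags carry numeric width and height attributes; the class and function names are hypothetical, and the 60-pixel default mirrors the example threshold above.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect (src, width, height) for every <img> with numeric dimensions."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        try:
            width, height = int(attr["width"]), int(attr["height"])
        except (KeyError, TypeError, ValueError):
            return  # skip images without usable dimensions
        self.images.append((attr.get("src"), width, height))


def collect_images(html):
    collector = ImageCollector()
    collector.feed(html)
    return collector.images


def first_large_image(html, min_size=60):
    """First image whose width and height are both at least min_size."""
    for src, width, height in collect_images(html):
        if width >= min_size and height >= min_size:
            return src
    return None


def biggest_image(html):
    """Image with the largest displayed area, or None if there are none."""
    images = collect_images(html)
    if not images:
        return None
    return max(images, key=lambda img: img[1] * img[2])[0]
```

Note that the src values in rendered pages are often protocol-relative (starting with //), so a scheme may need to be prepended before fetching them.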


For the last option, what you want is the source wikitext of the article, available via prop=revisions with rvprop=content:

http://en.wikipedia.org/w/api.php?action=query&titles=English_language&prop=revisions|images&rvprop=content

Note that many images in infoboxes and such are specified as parameters to a template, so just parsing for [[Image:...]] syntax will miss some of them. A better solution is probably to just get the list of all images used on the page via prop=images (which you can do in the same query, as I showed above) and look for their names (with or without Image: / File: prefix) in the wikitext.

Keep in mind the various ways in which MediaWiki automatically normalizes page (and image) names: most notably, underscores are mapped to spaces, consecutive whitespace is collapsed to a single space and the first letter of the name is capitalized. If you decide to go this way, here's some sample PHP code that will convert a list of file names into a regexp that should match any of them in wikitext:

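foreach ($names as &$name) {
    // Normalize underscores and whitespace runs to single spaces
    $name = trim( preg_replace( '/[_\s]+/u', ' ', $name ) );
    // Escape regexp metacharacters, including the "/" delimiter
    $name = preg_quote( $name, '/' );
    // Match the first character case-insensitively
    $name = preg_replace( '/^(\\\\?.)/us', '(?i:$1)', $name );
    // Let each space match any run of underscores or whitespace
    $name = preg_replace( '/\\\\? /u', '[_\s]+', $name );
}
unset( $name ); // break the reference left over from the loop
$regexp = '/' . implode( '|', $names ) . '/u';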

For example, when given the list:

Anglospeak(800px)Countries.png
Anglospeak.svg
Circle frame.svg
Commons-logo.svg
Flag of Argentina.svg
Flag of Aruba.svg

the generated regexp will be:

/(?i:A)nglospeak\(800px\)Countries\.png|(?i:A)nglospeak\.svg|(?i:C)ircle[_\s]+frame\.svg|(?i:C)ommons\-logo\.svg|(?i:F)lag[_\s]+of[_\s]+Argentina\.svg|(?i:F)lag[_\s]+of[_\s]+Aruba\.svg/u
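
For comparison, here is a rough Python take on the same idea. It is not a line-by-line port of the PHP above: it builds one pattern per file name and then reports which name occurs earliest in the wikitext. The function names are hypothetical, and the wikitext in the usage note is invented.

```python
import re


def name_pattern(name):
    """Regex source matching a file name as it may be written in wikitext."""
    # Split on runs of underscores/whitespace, dropping empty pieces.
    parts = [p for p in re.split(r"[_\s]+", name) if p]
    if not parts:
        raise ValueError("empty file name")
    first = parts[0]
    # The first character is case-insensitive; the rest is matched literally.
    head = "(?i:" + re.escape(first[0]) + ")" + re.escape(first[1:])
    tail = [re.escape(p) for p in parts[1:]]
    # Any run of underscores or whitespace may separate the words.
    return r"[_\s]+".join([head] + tail)


def first_image_in_wikitext(wikitext, names):
    """Return the name whose first occurrence comes earliest, or None."""
    best = None
    for name in names:
        match = re.search(name_pattern(name), wikitext)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), name)
    return best[1] if best else None
```

With the wikitext `"{{Infobox\n| image = circle_frame.svg\n}}\n[[File:Anglospeak.svg|thumb]]"` and the names `["Anglospeak.svg", "Circle frame.svg", "Commons-logo.svg"]`, the infobox parameter is found first and `"Circle frame.svg"` is returned.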

How to get the first image of any wiki page

It seems the images are returned in alphabetical order rather than page order... weird.

Anyway, this might work better:

https://en.wikipedia.org/w/api.php?action=parse&text={{Barack_Obama}}&prop=images

Unfortunately, only the first image is usable, but at least it's the right one.

Wikipedia api to get the jpeg image from wikipage

The API does not support that kind of request; you will have to sort that out on your side. For an easy overview of what you can do, use the API Sandbox. Here is a query with your parameters, but in the sandbox:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=json&iiprop=url&iiurlwidth=400&titles=Kiel&generator=images
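
Sorting it out on your side could be as simple as the sketch below, which keeps only the files whose title ends in .jpg or .jpeg. It assumes the decoded JSON response of the query above; the function name is hypothetical. Checking the extension is a simplification, and requesting iiprop=url|mime and comparing the MIME type would be a stricter test.

```python
def jpeg_urls(response):
    """Return the URLs of all JPEG files in a generator=images response."""
    pages = response.get("query", {}).get("pages", {})
    # The default JSON format keys pages by id; formatversion=2 uses a list.
    if isinstance(pages, dict):
        pages = pages.values()
    urls = []
    for page in pages:
        if not page.get("title", "").lower().endswith((".jpg", ".jpeg")):
            continue
        for info in page.get("imageinfo", []):
            # Prefer the scaled thumbnail (present when iiurlwidth is set).
            url = info.get("thumburl") or info.get("url")
            if url:
                urls.append(url)
    return urls
```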


