Linking in Wikipedia

Linking in Wikipedia

Nick Whitelegg-2

Some of you might be aware that Freemap has the potential facility to look
up details on a place (town, village etc) by clicking on it on the map. I
was thinking, rather than rewriting descriptions of places from scratch, to
make use of Wikipedia entries on them instead. The idea would be to fetch
an entry from Wikipedia and display it in one of the <div>s of Freemap, to
provide a more seamless user experience than navigating externally to
Wikipedia.

Technically, this would be possible but it would involve parsing a rather
hairy hierarchy of <div>s on the Wikipedia content. Is anyone aware of any
plans by Wikipedia to provide entries as a concise XML description rather
than a full HTML page? I can see that having enormous potential.

Also, as long as I referenced Wikipedia I take it that this sort of thing
would be legal? I'm not fully up on the implications of the FDL - how
would it affect Freemap as a whole? Would it mean the whole Freemap site
would have to be FDL licensed?

Thanks,
Nick


_______________________________________________
Openstreetmap mailing list
[hidden email]
http://bat.vr.ucl.ac.uk/cgi-bin/mailman/listinfo/openstreetmap

Re: Linking in Wikipedia

frank mohr
Nick Whitelegg wrote:

> Some of you might be aware that Freemap has the potential facility to look
> up details on a place (town, village etc) by clicking on it on the map. I
> was thinking, rather than rewriting descriptions of places from scratch, to
> make use of Wikipedia entries on them instead. The idea would be to fetch
> an entry from Wikipedia and display it in one of the <div>s of Freemap, to
> provide a more seamless user experience than navigating externally to
> Wikipedia.
>
> Technically, this would be possible but it would involve parsing a rather
> hairy hierarchy of <div>s on the Wikipedia content. Is anyone aware of any
> plans by Wikipedia to provide entries as a concise XML description rather
> than a full HTML page? I can see that having enormous potential.
>
> Also, as long as I referenced Wikipedia I take it that this sort of thing
> would be legal? I'm not fully up on the implications of the FDL - how
> would it affect Freemap as a whole? Would it mean the whole Freemap site
> would have to be FDL licensed?
>
> Thanks,
> Nick
>
There are two similar projects that link Google Earth
(and NASA World Wind) to Wikipedia.

(Sorry, all in German.)
News items:
http://www.heise.de/newsticker/meldung/61346
http://www.heise.de/newsticker/meldung/61605
Project pages:
http://www.kartographie.uni-trier.de/p/h/users/sk/Google_Earth/google_earth_de_wikipedia.htm
http://www.polybos.de/

Wikipedia's georeferencing project is at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Geographical_coordinates

       

       
               

Re: Linking in Wikipedia

Lars Aronsson
In reply to this post by Nick Whitelegg-2
Nick Whitelegg wrote:
> Technically, this would be possible but it would involve parsing a rather
> hairy hierarchy of <div>s on the Wikipedia content. Is anyone aware of any
> plans by Wikipedia to provide entries as a concise XML description rather
> than a full HTML page? I can see that having enormous potential.

The German Wikipedia has been doing this for biographical
metadata about persons ("Personendaten"), which is useful for
searching by birth year, death year, profession, and nationality.
The German Wikipedia-on-DVD, published by a private company, has
such a search function.  I don't know of any concrete plans to use
a similar system for geographic places, but I guess it could make
a lot of sense.

You might also want to look at Wikitravel.org where places are
described from a tourist's perspective.

The German Personendaten approach consists of a wiki template,
which is manually edited into every article, e.g.:

 {{Personendaten|
   NAME=Churchill, Winston
  |ALTERNATIVNAMEN=Winston Leonard Spencer Churchill
  |KURZBESCHREIBUNG=Britischer Premierminister während des Zweiten Weltkriegs
  |GEBURTSDATUM=[[30. November]] [[1874]]
  |GEBURTSORT=[[Blenheim Palace]] bei [[Woodstock]], [[Großbritannien]]
  |STERBEDATUM=[[24. Januar]] [[1965]]
  |STERBEORT=[[London]], [[Großbritannien]]
 }}

Here "Geburtsdatum" means "birthdate", "Sterbeort" means "place of
death", etc.

This template text (within {{ and }}) can easily be extracted from
the raw wiki text and processed into a database table.
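
As a rough illustration of that extraction, here is a minimal Python
sketch. The function names (extract_template, split_params,
parse_template) are invented for this example, the sample text is a
shortened copy of the Churchill template above, and the code assumes
the template is well formed. It is one possible approach, not how
Wikipedia or any existing tool does it.

```python
SAMPLE = """Some article text.
{{Personendaten|
   NAME=Churchill, Winston
  |GEBURTSDATUM=[[30. November]] [[1874]]
  |STERBEORT=[[London]], [[Großbritannien]]
}}
More text."""


def extract_template(wikitext, name):
    """Return the text between {{ and }} of the first {{name ...}} call."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth, i = 0, start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start + 2:i - 2]
        else:
            i += 1
    return None  # unbalanced braces


def split_params(body):
    """Split on '|' characters that are not inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(wikitext, name):
    """Return the KEY=value fields of the template as a dict, or None."""
    body = extract_template(wikitext, name)
    if body is None:
        return None
    fields = {}
    for part in split_params(body)[1:]:  # the first part is the template name
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

Each resulting dict maps directly onto one row of a database table,
with the template's field names as columns.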

The English Wikipedia article http://en.wikipedia.org/wiki/Berlin 
contains the geographic coordinates of that city in a template
format {{coor dm|52|31|N|13|24|E|}} but the corresponding article
http://en.wikipedia.org/wiki/London doesn't.  However, I assume
you wanted something more than just the geo coordinates, more like
the German Personendaten approach?

If you can write a description of what you need, I might bring
this to the Wikimania conference in Frankfurt, two weeks from now.


--
  Lars Aronsson ([hidden email])
  Aronsson Datateknik - http://aronsson.se


Re: Linking in Wikipedia

Nick Whitelegg-2
In reply to this post by Nick Whitelegg-2





Lars Aronsson wrote on 19/07/2005 14:23:10:



>Here "Geburtsdatum" means "birthdate", "Sterbeort" means "place of
>death", etc.

>This template text (within {{ and }}) can easily be extracted from
>the raw wiki text and processed into a database table.

Is it possible to obtain the "source code" of a Wikipedia page? Some
Wikipedia index pages have "view source" but other pages do not. If so, then
presumably one could request the "source" of a page from
Wikipedia and then reformat it for one's own site.

>The English Wikipedia article http://en.wikipedia.org/wiki/Berlin
>contains the geographic coordinates of that city in a template
>format {{coor dm|52|31|N|13|24|E|}} but the corresponding article
>http://en.wikipedia.org/wiki/London doesn't.  However, I assume
>you wanted something more than just the geo coordinates, more like
>the German Personendaten approach?

What would be good is something along the lines of:

- User visits http://www.free-map.org.uk/
- User clicks on a place name, e.g. Fernhurst
- A request is made to Wikipedia for the Wikipedia article on Fernhurst
- Wikipedia sends back the Fernhurst article in XML which can be processed
by the client.
- The XML might be something like:

<wikipedia_article>
<title>Fernhurst</title>
<description>
Fernhurst is a medium-sized village in West Sussex. Originally developing
around the Village Green, the centre of the village is now a quarter of a
mile west on the main road. etc etc etc

</description>
<references>
..... links to other relevant articles ...
</references>
</wikipedia_article>

This would be a lot easier to process than a complete HTML page.
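
To show how little client code such a format would need, here is a
minimal Python sketch using the standard library's ElementTree. The
XML layout is the hypothetical one proposed above, not something
Wikipedia provides; the <link> element inside <references> and the
function name parse_article are further assumptions made only for
this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical response in the proposed format.
SAMPLE_XML = """<wikipedia_article>
<title>Fernhurst</title>
<description>
Fernhurst is a medium sized village in West Sussex.
</description>
<references>
<link>West Sussex</link>
</references>
</wikipedia_article>"""


def parse_article(xml_text):
    """Pull title, description and reference links out of the XML."""
    root = ET.fromstring(xml_text)
    description = root.findtext("description", default="")
    return {
        "title": root.findtext("title", default="").strip(),
        # Collapse the line breaks and indentation inside <description>.
        "description": " ".join(description.split()),
        "references": [link.text for link in root.iter("link")],
    }
```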

Nick







Re: Linking in Wikipedia

Lars Aronsson
Nick Whitelegg wrote:
> Is it possible to obtain the "source code" of a Wikipedia page? Some
> wikipedia index pages have "view source" but not other pages. If so, then
> presumably what one could do is request the "source" of a page from
> Wikipedia then use the source to format it into your own site.

Every Wikipedia page has an "edit" tab that brings up the source
wikitext in an HTML form textarea, with a save button underneath.  
The exception is pages that are write protected, where the tab
instead reads "view source" and there is no save button.  But the
wikitext source in the textarea is the same.

So all you need to do is to use "wget" to fetch
http://en.wikipedia.org/w/index.php?title=Berlin&action=edit
and use some regexp to extract everything between <textarea>
and </textarea>.
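
The same idea can be sketched in Python instead of wget plus a
separate regexp step. The URL pattern is the one given above; the
function names are invented for this example, and whether the server
accepts such requests or keeps the same page layout is an assumption.
The wikitext inside the textarea is HTML-escaped, so the sketch also
unescapes it.

```python
import html
import re
import urllib.parse
import urllib.request

# Matches the first <textarea ...> ... </textarea> pair, across lines.
TEXTAREA_RE = re.compile(r"<textarea[^>]*>(.*?)</textarea>",
                         re.DOTALL | re.IGNORECASE)


def edit_url(title, base="http://en.wikipedia.org/w/index.php"):
    """Build the URL of the edit page for an article title."""
    query = urllib.parse.urlencode({"title": title, "action": "edit"})
    return base + "?" + query


def extract_wikitext(page_html):
    """Return the unescaped textarea content, or None if there is none."""
    match = TEXTAREA_RE.search(page_html)
    if match is None:
        return None
    return html.unescape(match.group(1))


def fetch_wikitext(title):
    """Download the edit page and extract its wikitext (needs network)."""
    with urllib.request.urlopen(edit_url(title)) as response:
        page = response.read().decode("utf-8", errors="replace")
    return extract_wikitext(page)
```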

But this is impractical for large-scale metadata extraction, such as
harvesting all "Personendaten" for all 250,000 articles in the
German Wikipedia. Instead you can download the entire database and
import it into your own MySQL instance.  You can get both the
current (cur) and archived previous versions (old) of every
article in every language.  But beware that this is a lot of data,
many gigabytes.
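
If importing into MySQL is more than you need, a cruder alternative
(swapped in here, not the approach described above) is to stream the
compressed dump and pick out template calls with a regexp. The sketch
below is a minimal version under several assumptions: the function
name scan_dump is invented, the dump is assumed to be gzip-compressed
text in the given encoding, and templates with nested braces are
simply skipped.

```python
import gzip
import re


def scan_dump(path, template_name, encoding="utf-8"):
    """Yield every simple {{template_name|...}} call found in the dump."""
    pattern = re.compile(r"\{\{" + re.escape(template_name) + r"\|[^{}]*\}\}")
    # Read line by line so the whole file never has to fit in memory.
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as dump:
        for line in dump:
            for hit in pattern.findall(line):
                yield hit
```

Because it is a generator, the hits can be processed one at a time
while the file is still being read.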

The architecture is described starting at
http://meta.wikimedia.org/wiki/MediaWiki_architecture
and the database download is available at
http://download.wikimedia.org/

> What would be good is something along the lines of:
>
> - User visits http://www.free-map.org.uk/
> - User clicks on a place name, e.g. Fernhurst
> - A request is made to Wikipedia for the Wikipedia article on Fernhurst
> - Wikipedia sends back the Fernhurst article in XML which can be processed
> by the client.

Would this assume that Wikipedia has an article on Fernhurst?  (It
does not.)  For many place names, Wikipedia has "disambiguation
pages" that branch off into the specific articles, such as
http://en.wikipedia.org/wiki/San_Jose , in which case you would
want to access http://en.wikipedia.org/wiki/San_Jose%2C_California

If you download the current articles of the English Wikipedia
(http://download.wikimedia.org/wikipedia/en/20050623_cur_table.sql.gz 
size 1.0 gigabyte) and dig through to find geo coordinates, you
could extract a list like this:

  Lat.          Long.          Article name
  -----------   ------------   -----------------------
  37°18'15" N   121°52'22" W   San Jose, California
  52°31'    N    13°24'    E   Berlin

Since the London article doesn't contain coordinates, it would be
missing from this list.  You would not find Fernhurst, since there
is no Wikipedia article on this place.  And you would find no geo
coordinates in the disambiguation page "San Jose", which is fine.
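
Building such a list could look roughly like the following Python
sketch. It recognises the {{coor dm|...}} form quoted earlier and, as
an assumption, analogous "coor d" and "coor dms" forms with the same
argument order; the function names are invented for this example.
Articles without a recognised template simply return None and drop
out of the list.

```python
import re

# Arguments of a coor template: numbers, N/S, numbers, E/W, optional extras.
COOR_RE = re.compile(r"\{\{coor (?:dms|dm|d)\|([^{}]*)\}\}")


def to_decimal(parts, hemisphere):
    """Convert degrees[, minutes[, seconds]] to signed decimal degrees."""
    value = sum(float(part) / 60 ** i for i, part in enumerate(parts))
    return -value if hemisphere in ("S", "W") else value


def find_coordinates(wikitext):
    """Return (lat, lon) from the first coor template, or None."""
    match = COOR_RE.search(wikitext)
    if match is None:
        return None
    numbers, lat = [], None
    for arg in (a.strip() for a in match.group(1).split("|")):
        if arg in ("N", "S"):
            lat, numbers = to_decimal(numbers, arg), []
        elif arg in ("E", "W"):
            return (lat, to_decimal(numbers, arg)) if lat is not None else None
        elif arg:
            numbers.append(arg)
    return None  # template ended before a longitude was found
```

South and west come out negative, which is the usual convention for
decimal degrees.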

For the article http://en.wikipedia.org/wiki/Limehouse
there is no lat-long coordinate, but an OS Grid Reference, that
your script could pick up and convert to something useful.
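
As a sketch of the first half of that conversion, the Python below
turns a two-letter OS grid reference into full numeric eastings and
northings in metres. The function name is invented for this example.
Going on from eastings/northings to latitude and longitude needs a
projection and datum conversion that is deliberately not shown here.

```python
def grid_ref_to_easting_northing(grid_ref):
    """Convert e.g. 'TQ 365 810' to (easting, northing) in metres."""
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or "I" in letters:
        raise ValueError("expected two grid letters (the letter I is unused)")
    if not digits.isdigit() or len(digits) % 2 or len(digits) > 10:
        raise ValueError("expected an even number of digits, at most 10")

    # Letter positions in the 25-letter alphabet that skips "I".
    l1, l2 = (ord(c) - ord("A") for c in letters)
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # The first letter picks a 500 km square, the second a 100 km
    # square inside it; combine them into 100 km units from the origin.
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)

    # Pad each half to five digits: "365" means 36500 m into the square.
    half = len(digits) // 2
    easting = e100km * 100000 + int(digits[:half].ljust(5, "0"))
    northing = n100km * 100000 + int(digits[half:].ljust(5, "0"))
    return easting, northing
```

The result is the south-west corner of the referenced square, so a
six-figure reference is only accurate to about 100 metres.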

Now you can fit your free-map with links at these coordinates.

Still missing is the XML export.  You might do without this, by
simply opening the plain HTML page from Wikipedia in a new window
or browser tab.  That would remove the need for postprocessing.
We could still discuss this XML export as a feature request, but
it doesn't really stop you from doing the rest of the work.

Suppose there are geo coordinates in the Wikipedia articles on
Europe, Great Britain, England, London, City of London, Tower
Hamlets, and Limehouse.  At which zoom level would you show the
individual townships and where would you show the overall London
or England link instead?  How do you tell?  Which extra fields
would you need in the coordinate list above?  How should that
information best be written into the wikitext source?


--
  Lars Aronsson ([hidden email])
  Aronsson Datateknik - http://aronsson.se


Re: Linking in Wikipedia

Nick Whitelegg-2
In reply to this post by Nick Whitelegg-2





Lars Aronsson wrote on 19/07/2005 16:54:42:



>Every Wikipedia page has an "edit" tab that brings up the source
>wikitext in an HTML form textarea, with a save button underneath.
>The exception is pages that are write protected, where the tab
>instead reads "view source" and there is no save button.  But the
>wikitext source in the textarea is the same.

Thanks, that's the sort of thing I was looking for :-) A bit inelegant, as
you're requesting an edit but not actually editing the page, but in the
absence of an XML "feed" back from Wikipedia it'll do nicely.

Still think that supplying articles as XML would be very elegant from a
"web service" point of view, allowing other websites and client apps to
display and search Wikipedia information. Isn't this what the whole
excitement around Web Services revolved around?

Nick






Re: Linking in Wikipedia

Lars Aronsson
Nick Whitelegg wrote:

> Still think that supplying articles as XML would be very elegant
> from a "web service" point of view, allowing other websites and
> client apps to display and search Wikipedia information. Isn't
> this what the whole excitement around Web Services revolved
> around?

Possibly.  But Wikipedia revolves around open content, not web
services.  The content is open.  You can download it,
reformat it, and redistribute it in XML or any format you wish,
provided you follow the GFDL license.  If the format you want
requires Wikipedia to change its rules, e.g. by adding metadata in
a standardized way, I guess the Wikipedia community would be
interested in hearing your needs and suggestions.

Projects such as IMDb.com input data in a highly structured
format, where person P is an actor in movie M, which was directed
by person D, and first showed in year Y.  Wikipedia on the other
hand only uses one big text window with plain text marked with
occasional tags.  Extracting structured information after the fact
can be hard.  If you find geo coordinates in an article, you can
conclude that this article describes a place and not a person or
an abstract concept, but right now you cannot know if that place
is a town, a building, or a county.  The German "Personendaten"
metadata is a compromise, creating a structured format template
within the plain text.  We could argue that a {{placename}}
template should be introduced in Wikipedia, but which attributes
should it have?  Lat-long, yes.  Perhaps also type (town, country,
landmark building, mountain peak, etc.) and population.  But
should it also have an attribute for the year when the town was
founded?
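
To make the question concrete, here is one possible shape for such a
record, sketched in Python. Everything in it is hypothetical: no
{{placename}} template is known to exist, and the field names (name,
lat, long, type, population) are only the attributes floated above,
chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    name: str
    lat: float                        # decimal degrees, north positive
    lon: float                        # decimal degrees, east positive
    kind: str                         # e.g. "town", "country", "mountain peak"
    population: Optional[int] = None  # left out when unknown


def to_template(place):
    """Render a Place as wikitext for the hypothetical template."""
    fields = [("name", place.name), ("lat", place.lat),
              ("long", place.lon), ("type", place.kind)]
    if place.population is not None:
        fields.append(("population", place.population))
    body = "\n".join(f" |{key}={value}" for key, value in fields)
    return "{{placename\n" + body + "\n}}"
```

Keeping optional attributes such as population out of the rendered
text when they are unknown avoids empty fields that editors would
have to maintain by hand.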


--
  Lars Aronsson ([hidden email])
  Aronsson Datateknik - http://aronsson.se
