Wikipedia page-to-page links by page_id

bsrgpfxpyy

New Member
What?:
I'm trying to get a page-to-page link map (matrix) of Wikipedia pages keyed by \[code\]page_id\[/code\], in the following format:
\[code\]from1 to1 to2 to3 ...
from2 to1 to2 to3 ...
...\[/code\]

Why?:
I'm looking for a data set (pages from Wikipedia) to try out PageRank.

Problem:
At dumps.wikimedia.org it is possible to download pages-articles.xml, which is XML in this format:
\[code\]<page>
  <title>...</title>
  <id>...</id> // pageid
  <text>...</text>
</page>\[/code\]
I will use it for retrieving the articles (\[code\]text\[/code\]). There is also the base per-page data (page.sql), which contains some details about pages by \[code\]page_id\[/code\], and the last dump that seems relevant to me is pagelinks.sql, which contains page-to-page link records. The problem is that the \[code\]pagelinks\[/code\] table has the following fields: \[code\]pl_from\[/code\], \[code\]pl_namespace\[/code\] and \[code\]pl_title\[/code\], so link targets are identified by title rather than by \[code\]page_id\[/code\].

Idea:
Create a temporary database, import the \[code\]page\[/code\] and \[code\]pagelinks\[/code\] tables, and build this matrix from the \[code\]pagelinks\[/code\] table by looking up the \[code\]page_id\[/code\] for each \[code\]pl_title\[/code\].

Question:
Is there a place to get this kind of matrix of page-to-page links by \[code\]page_id\[/code\], so that I don't need to create it on my own? Or, if not, is there any faster way to get this kind of matrix than the idea I've outlined?
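
For reference, the idea above (resolving each \[code\]pl_title\[/code\] to a \[code\]page_id\[/code\] with a join) could look roughly like the sketch below. It is only an illustration under some assumptions: it uses an in-memory SQLite database with a few made-up rows instead of the real dumps, it assumes the \[code\]page\[/code\] table exposes \[code\]page_id\[/code\], \[code\]page_namespace\[/code\] and \[code\]page_title\[/code\], and the function names \[code\]build_link_map\[/code\], \[code\]format_lines\[/code\] and \[code\]demo\[/code\] are hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_link_map(conn):
    """Return {from_page_id: sorted list of target page_ids}.

    Links whose (namespace, title) has no row in `page` are dropped
    by the inner join (for example, links to pages that do not exist).
    """
    rows = conn.execute(
        """
        SELECT pl.pl_from, p.page_id
        FROM pagelinks AS pl
        JOIN page AS p
          ON p.page_namespace = pl.pl_namespace
         AND p.page_title = pl.pl_title
        """
    )
    link_map = defaultdict(set)
    for src, dst in rows:
        link_map[src].add(dst)
    return {src: sorted(dsts) for src, dsts in link_map.items()}


def format_lines(link_map):
    """Render the map as 'from to1 to2 ...' lines, one per source page."""
    return [
        " ".join(str(n) for n in [src, *link_map[src]])
        for src in sorted(link_map)
    ]


def demo():
    """Build a toy database (made-up rows) and return its link map."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE pagelinks (
            pl_from INTEGER,
            pl_namespace INTEGER,
            pl_title TEXT
        );
        -- An index on (namespace, title) keeps the title lookup fast.
        CREATE INDEX page_name ON page (page_namespace, page_title);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "A"), (2, 0, "B"), (3, 0, "C")],
    )
    conn.executemany(
        "INSERT INTO pagelinks VALUES (?, ?, ?)",
        [(1, 0, "B"), (1, 0, "C"), (2, 0, "A"), (3, 0, "Missing")],
    )
    return build_link_map(conn)


if __name__ == "__main__":
    for line in format_lines(demo()):
        print(line)
```

With the real dumps the same join would run against the imported \[code\]page\[/code\] and \[code\]pagelinks\[/code\] tables. Whether to also resolve redirects or restrict the result to a single namespace depends on what the PageRank experiment needs, so the sketch leaves both out.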
 