bluesoul

If you want to get or record data from the wiki, you have a few ways to do it.

  1. The Wikidot XML-RPC API (list of methods, arguments, and returns)
  2. Page Scraping
  3. Crafting requests to ajax-module-connector.php
  4. Using third-party APIs such as SCPPER, CROM, and SCUTTLE
  5. Using purpose-built packages like pyscp

Depending on what exactly you are looking to accomplish, you may have to take a combination of approaches, as they all have strengths and weaknesses.

| Method | Pros | Cons |
|---|---|---|
| XML-RPC API | The only officially sanctioned method, generous rate limit of 240 requests per minute. | Requires an API key, is not feature-complete. |
| Page Scraping | If you can see it on a page, you can capture it. | Lots of work, lots of extraneous data (read: bandwidth), comparatively slow, harder error handling. |
| ajax-module-connector.php | Returns just about everything there is to get on the wiki. | Inconsistent argument names (page_id vs. pageId, etc.), returns HTML which is needlessly bulky and must be parsed. |
| Third Party APIs | Does most of the legwork for you. | None of them are feature-complete. |
| Purpose-built packages | Easy to use, returns useful data. | Needs to match with your desired programming language. |
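To make the 240 requests per minute figure concrete, here is a minimal sketch of a client-side throttle. The `Throttle` class and its `clock`/`sleep` parameters are hypothetical names used for illustration only; nothing here is part of the Wikidot API, and the server may count requests differently than this assumes.

```python
import time


class Throttle:
    """Space calls out so they never exceed a per-minute budget."""

    def __init__(self, per_minute=240, clock=time.monotonic, sleep=time.sleep):
        # 240 requests per minute works out to one request every 0.25 s.
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each API request should keep a single-threaded client under the limit.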

I find it useful to have at least a basic understanding of the Wikidot database schema, which is available here. Notably, this is where pages are defined and this is for revisions. You'll notice that everything has a unique identifier, which is good news.

Some tasks are incredibly easy in one way and nearly impossible in another.

| Task | Recommended | Using… | Not Recommended |
|---|---|---|---|
| Listing all Pages | XML-RPC API, SCUTTLE | pages.select, /pages on SCUTTLE | Page Scraping |
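As a rough illustration of the "Listing all Pages" row, the sketch below calls pages.select through Python's standard xmlrpc.client. The endpoint URL, the use of the application name and API key as HTTP basic-auth credentials, and the `site` argument are assumptions recalled from the Wikidot API documentation; check them against the list of methods before relying on them. `make_client` and `list_pages` are hypothetical helper names.

```python
from urllib.parse import quote
from xmlrpc.client import ServerProxy

# Assumed endpoint; verify against the official API documentation.
API_HOST = "www.wikidot.com/xml-rpc-api.php"


def make_client(app_name, api_key):
    """Build an XML-RPC proxy; no request is sent until a method is called."""
    url = "https://{}:{}@{}".format(
        quote(app_name, safe=""), quote(api_key, safe=""), API_HOST
    )
    return ServerProxy(url)


def list_pages(client, site):
    """Return whatever pages.select yields for `site` (assumed: page names)."""
    return client.pages.select({"site": site})
```

With a real key this would be used as `list_pages(make_client("my-app", "my-key"), "scp-wiki")`, where both credentials are placeholders.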

ajax-module-connector.php can connect to a ton of modules, listed here. Generally speaking, you can find which module triggers the desired response by watching the network view in your browser's development tools and looking for a POST request to ajax-module-connector.php. In doing so you'll also notice a wikidot_token7 key sent both as a cookie and as POST data, and that the two values match. Any random string of 6-32 lowercase letters and numbers will suffice for this value, as long as the cookie and the POST field carry the same string. The POST will also contain a moduleName value that corresponds to the PHP file in /php/modules as linked above, an identifier of some sort, and perhaps some extra values for things like pagination. Some commonly used modules are listed here.

Example usage in Python:

import random
import string

import requests

# Any random string of 6-32 lowercase letters and digits works as the token.
token = ''.join(
    random.choice(string.ascii_lowercase + string.digits) for _ in range(6)
)

# The same token must be sent both as a cookie and in the POST data.
cookies = requests.cookies.RequestsCookieJar()
cookies.set('wikidot_token7', token, domain='www.scp-wiki.net', path='/')
p = requests.post(
    'http://www.scp-wiki.net/ajax-module-connector.php',
    data={
        'wikidot_token7': token,
        'categoryId': '1900210',
        'moduleName': 'forum/ForumRecentPostsListModule',
        'page': '1',
        'limit': '5',
    },
    cookies=cookies,
)

# The response is JSON; the rendered HTML lives under the 'body' key.
response = p.json()
body = response['body']
print(body)
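The same pattern can be wrapped in a small helper that also guards against the page_id / pageId / user_id / userId mix-ups. This is only a sketch: `IDENTIFIER_KEY`, `make_token`, and `build_request` are hypothetical names, and the mapping holds just a few entries copied from the module table on this page, so extend it as needed.

```python
import random
import string

# Identifier argument each module expects, copied from the module table.
IDENTIFIER_KEY = {
    "backlinks/BacklinksModule": "page_id",
    "pagerate/PageRateWidgetModule": "pageId",
    "userinfo/UserInfoMemberOfModule": "user_id",
    "userinfo/UserRecentPostsListModule": "userId",
}


def make_token(length=8):
    """Return a random token of lowercase letters and digits (6-32 chars)."""
    if not 6 <= length <= 32:
        raise ValueError("token length must be between 6 and 32")
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


def build_request(module_name, identifier, **extra):
    """Return (data, cookies) ready to pass to requests.post()."""
    token = make_token()
    data = {
        "wikidot_token7": token,  # must match the cookie below
        "moduleName": module_name,
        IDENTIFIER_KEY[module_name]: str(identifier),
    }
    # Optional arguments such as page or limit are sent as strings too.
    data.update({key: str(value) for key, value in extra.items()})
    cookies = {"wikidot_token7": token}
    return data, cookies
```

The returned pair can be handed to `requests.post(url, data=data, cookies=cookies)`. An unknown module name raises `KeyError`, which is deliberate: it is easier to debug than a silent false negative.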

| Module | Task | Identifier(s) | Notes |
|---|---|---|---|
| changes/SiteChangesListModule | Renders the view seen on "Recent Changes". | None, the domain is the identifier. | Does not return the unique revision_id but will give the page, the common revision ID (e.g., Revision 6 instead of revision_id 8675309), and the user_id and username of the user making the change. The XML feed provides the actual revision_id and is recommended when viable. |
| backlinks/BacklinksModule | Gets backlinks. | page_id | An absence of a <ul> element indicates an orphaned page. This module will return common backlinks as well as includes using the [[include]] tag as separate elements. |
| history/PageDiffModule | Returns formatted diffs. | from_revision_id, to_revision_id | Returns the fully formed "pretty" diff with <del> and <ins> tags indicating removals and additions. |
| history/PageHistoryModule | Loads the pager for page history. | page_id | This will not actually give you any page information. |
| history/PageRevisionListModule | Loads the table of revisions for a page. | page_id, page, perpage, options | Without a perpage argument it will default to only 1 revision, which is nearly useless. options and page can be omitted, and perpage can be set arbitrarily high (e.g., 99999). You can associate Wikidot revision_ids with common revision IDs here using <tr id="revision-row-$revision_id"><td>$revision_number.</td> |
| history/PageSourceModule | Loads the Wikidot source for a revision. | revision_id | The source is enclosed in <div class="page-source"> and will still need the usual conversion of HTML special characters back to a more database-friendly format, e.g., <br> to \n and &gt; to > |
| history/PageVersionModule | Loads the HTML for a revision. | revision_id | The source is prepended with a <div id="page-version-info"> that provides some metadata like the timestamp, the user_id and username responsible for the version, and the edit description. |
| pagerate/PageRateWidgetModule | Loads the current rating. | pageId | About 33% less costly to use than PageRateModule, and returns the same useful data. |
| pagerate/WhoRatedPageModule | Loads the list of who voted on a page and how. | pageId | Everything you need is available here: the user_id and username. Deleted users are identified by their old ID number. This can time out (error 504) on pages with large numbers of votes (e.g., 173). |
| userinfo/UserChangesListModule | Shows a user's edits across all wikis. | userId | Everything you need to correlate a revision ID is technically here, but the revision ID itself is not present. You have a wiki, a page, the common revision number, and a timestamp. |
| userinfo/UserInfoMemberOfModule | Shows a user's wiki memberships. | user_id | WARNING: Note the potential for a false negative here. If you use userId instead of user_id, it will not error out; instead you will be told the user is not a member of any sites. This holds true for the Moderator and Admin modules as well. |
| userinfo/UserInfoModeratorOfModule | Lists the wikis the user is a moderator of. | user_id | WARNING: Note the potential for a false negative here. If you use userId instead of user_id, it will not error out; instead you will be told the user is not a moderator of any sites. This holds true for the Member and Admin modules as well. |
| userinfo/UserInfoAdminOfModule | Lists the wikis the user is an admin or master admin of. | user_id | WARNING: Note the potential for a false negative here. If you use userId instead of user_id, it will not error out; instead you will be told the user is not an admin of any sites. This holds true for the Member and Moderator modules as well. |
| userinfo/UserRecentPostsModule | Lists the most recent forum posts by the user. | user_id | There's quite a bit of useful information available here, including the post_id, the thread_id of the parent thread, and the timestamp. |
| userinfo/UserRecentPostsListModule | Lists forum posts the user has made recently. | userId, page, limit, categoryId, options | This can be called instead from the beginning with page: 1, but is marginally less efficient if you're only interested in the first 20 posts. Note that limit will accept any numeric value, allowing you to capture a user's entire post history. Only userId is required. |
| userinfo/UserInfoProfileModule | Displays information the user has volunteered about themselves. | user_id | This includes a lot of useful things if the user has filled in fields on the Wikidot profile side, including their name, date of birth, and the timestamp for how long they've been a user. |
| users/UserInfoWinModule | Displays information about the user as it relates to the active wiki. | user_id | This includes all the useful stuff from UserInfoProfileModule as well as their relation to the wiki referenced by the domain the request originates from, including their role and how long they've been a member of the site. |
| report/FlagUserModule | Flags an abusive user. | targetUserId | You do actually need to be logged in to hit this module, to avoid abuse of the abuse system. |
| viewsource/ViewSourceModule | Displays the source code of the most recent revision of the page. | page_id, raw | The source is enclosed within <div class="page-source">, or you can send along raw with a string value of true and get pure Wikidot source back (no <br> tags, but there are HTML special characters to convert). |
| membership/MembersListModule | Shows a paginated list of members. | page | Unlike most paginated modules, you cannot control the per-page amount on this one and it will provide 100 users at a time. Ordered oldest to newest with no sort options. |
| forum/ForumStartModule | Renders the forum homepage. | hidden | What you're looking for here are the individual forum categories, which are found in <a href="/forum/c-$category_id/$category_title">. If you pass along hidden: true it will include hidden forums. |
| forum/ForumViewCategoryModule | Gives a paged list of threads in a particular forum. | c, p | This will give you a paginated view of forum c that can be selected by page p. Each row here is a thread that is prefixed with t- and that gets used in… |
| forum/ForumViewThreadModule | Renders a single thread. | t, pageNo | The nested divs allow you to set parent comments so comments thread correctly. A div id of fpc is the outer container for a comment, and within it may be more fpc elements, and at the innermost layer will be a post element. The post_id is highly visible at each step of the way and fpc exists solely as a threading mechanism. The presence of a pager in the code indicates multiple pages exist, and the maximum is displayed and can be readily scraped. |
| forum/ForumCommentsListModule | Shows discussion about a wiki page. | pageId | There is a useful nugget in here, in that you can load a page by page_id and the first thing back is the forumThreadId, so it's still useful for association. Otherwise this is useless on paginated discussion as there's no page argument available. |
| forum/ForumRecentPostsListModule | Shows recent posts in a category. | categoryId, page, options, limit | The limit argument accepts any numeric value but defaults to 20. The pager present does not tell you how many pages there actually are, so you would need to verify this against the count visible from the Start module. |
| forum/sub/ForumPostRevisionModule | Retrieves previous versions of a post. | revisionId | The payload here will return the PostId, a content string with the HTML of the revision, and a title string of the post's subject, but does not include the edited timestamp. |
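Several of the notes in the table describe markers to look for in the returned HTML. The sketch below shows one way to pull them out with regular expressions. The function names are hypothetical, and the patterns assume the markup looks exactly as quoted in the table (including a self-closing or plain <br> followed by an optional newline), so treat it as a starting point. A real HTML parser would be more robust.

```python
import html
import re

# Patterns built from the markup quoted in the module table.
REVISION_ROW = re.compile(r'<tr id="revision-row-(\d+)">\s*<td>(\d+)\.</td>')
PAGE_SOURCE = re.compile(r'<div class="page-source">(.*?)</div>', re.DOTALL)
FORUM_CATEGORY = re.compile(r'<a href="/forum/c-(\d+)/([^"]*)">')


def revision_numbers(body):
    """PageRevisionListModule: map revision_id to common revision number."""
    return {int(rid): int(num) for rid, num in REVISION_ROW.findall(body)}


def page_source(body):
    """PageSourceModule: return the page source as plain text, or None."""
    match = PAGE_SOURCE.search(body)
    if match is None:
        return None
    # Turn <br> tags back into newlines, then undo HTML entity escaping.
    text = re.sub(r'<br\s*/?>\n?', '\n', match.group(1))
    return html.unescape(text).strip()


def forum_categories(body):
    """ForumStartModule: map category_id to the title slug in the link."""
    return {int(cid): title for cid, title in FORUM_CATEGORY.findall(body)}


def is_orphan(body):
    """BacklinksModule: no <ul> element means nothing links to the page."""
    return "<ul" not in body
```

Each function takes the `body` string from the JSON response of the corresponding module.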
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License