Nova Resource:Wikisource/Documentation

From Wikitech-static
Revision as of 17:36, 10 February 2020 by imported>MusikAnimal (ScriptAlias /tool "/var/www/tool/public")

Wikisource is a VPS project for Wikisource-related tools.

Wikisource Export

Creating a new instance

Create a new m1.large instance running on Debian Buster (or m1.small for staging instances). Once the instance has been spawned, SSH in and follow these steps:

  1. Install PHP and Apache, along with some dependencies:
    sudo apt update && sudo apt -y upgrade
    sudo apt -y install php php-common
    sudo apt -y install php-cli php-fpm php-json php-xml php-mysql php-sqlite3 php-intl php-zip php-mbstring php-curl
    sudo apt -y install apache2 libapache2-mod-php
    sudo apt -y install zip unzip calibre
    
  2. Install Composer by following its installation instructions, making sure to install it into the /usr/local/bin directory under the filename composer, e.g.:
    sudo php composer-setup.php --install-dir=/usr/local/bin --filename=composer
    
  3. Remove the html directory created by Apache, then clone the repository:
    cd /var/www && sudo rm -rf html
    sudo git clone https://github.com/wsexport/tool.git
    cd /var/www/tool
    
  4. Become the root user with sudo su root
  5. Run sudo composer install --no-dev
  6. Make sure that all files in the repository are owned by www-data:
    sudo chown -R www-data:www-data .
    
  7. Create the web server configuration file at /etc/apache2/sites-available/wsexport.conf with the following:
    <VirtualHost *:80>
            DocumentRoot /var/www/tool/public
            ServerName wsexport.wmflabs.org
            
            php_value memory_limit 512M
    
            # Requests with these user agents are denied.
            SetEnvIfNoCase User-Agent "(uCrawler|Baiduspider|CCBot|scrapy\.org|kinshoobot|YisouSpider|Sogou web spider|yandex\.com\/bots|twitterbot|TweetmemeBot|SeznamBot|datasift\.com\/bot|Googlebot|Yahoo! Slurp|Python-urllib|BehloolBot|MJ13bot|SemrushBot|facebookexternalhit|rcdtokyo\.com|Pcore-HTTP|yacybot|ltx71|RyteBot|bingbot|python-requests|Cloudflare-AMP|Mr\.4x3|MSIE 7\.0; AOL 9\.5|Acoo Browser|AcooBrowser|MSIE 6\.0; Windows NT 5\.1; SV1; QQDownload|\.NET CLR 2\.0\.50727|MSIE 7\.0; Windows NT 5\.1; Trident\/4\.0; SV1; QQDownload|Frontera|tigerbot|Slackbot|Discordbot|LinkedInBot|BLEXBot|filterdb\.iss\.net|SemanticScholarBot|FemtosearchBot|BrandVerity|Zuuk crawler|archive\.org_bot|mediawords bot|Qwantify\/Bleriot|Pinterestbot|EarwigBot|Citoid \(Wikimedia|GuzzleHttp|PageFreezer|Java\/|SiteCheckerBot|Re\-re Studio|^R \(|GoogleDocs|WinHTTP|cis455crawler|WhatsApp|Archive\-It|lua\-resty\-http|crawler4j|libcurl|dygg\-robot|GarlikCrawler|Gluten Free Crawler|WordPress|Paracrawl|7Siters|Microsoft Office Excel)" bad_bot=yes
    
            CustomLog ${APACHE_LOG_DIR}/access.log combined expr=!(reqenv('bad_bot')=='yes'||reqenv('dontlog')=='yes')
            ErrorLog ${APACHE_LOG_DIR}/error.log
    
            ScriptAlias /tool "/var/www/tool/public"
            <Directory /var/www/tool/public/>
                    Options Indexes FollowSymLinks
                    AllowOverride All
                    Require all granted
                    DirectoryIndex book.php
            </Directory>
    
            <Directory /var/www/tool/>
                    Options Indexes FollowSymLinks
                    AllowOverride None
                    Require all granted
                    Deny from env=bad_bot
            </Directory>
    
            ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
            <Directory /usr/lib/cgi-bin/>
                    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                    Require all granted
            </Directory>
    
            ErrorDocument 403 "Access denied"
            RewriteCond "%{HTTP_REFERER}" "^http://127\.0\.0\.1:(5500|8002)/index\.html" [NC]
            RewriteRule .* - [R=403,L]
            RewriteCond "%{HTTP_USER_AGENT}" "^[Ww]get"
            RewriteRule .* - [R=403,L]
            
            RewriteEngine On
            RewriteCond %{HTTP:X-Forwarded-Proto} !https
            RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R=301,L]
    </VirtualHost>
    
  8. Disable the mpm_event Apache module, enable the PHP 7.3 and rewrite modules, enable the wsexport site configuration, and reload Apache:
    sudo a2dismod mpm_event
    sudo a2enmod php7.3
    sudo a2enmod rewrite
    sudo a2ensite wsexport
    sudo service apache2 reload
    
  9. (Re)start Apache:
    sudo service apache2 restart
    
    Moving forward, you should use sudo service apache2 graceful to restart the server.
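
After steps 1 and 2, it can be useful to confirm that the expected command-line tools actually ended up on the PATH. The following is a minimal sketch and not part of the original setup: missing_commands is a hypothetical helper name, and the command list in the usage note is an assumption based on the packages installed above (ebook-convert is the converter shipped with calibre).

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not on PATH.
# Prints one missing command name per line.
# Returns 0 if every command was found, 1 if at least one is missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        # command -v is the POSIX way to test whether a command resolves.
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, running missing_commands php composer git zip unzip ebook-convert prints nothing and returns 0 when all of the tools are available, and otherwise lists the ones that still need attention.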
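
The SetEnvIfNoCase line in step 7 flags a request when its User-Agent header matches the regular expression anywhere in the string, ignoring case. To get a feel for which agents a pattern would catch before editing the Apache configuration, something like the following sketch can be run locally. It relies on grep -E rather than Apache's own regular-expression engine, so it is only an approximation. The is_bad_bot function name is hypothetical, and BAD_BOT_PATTERN holds just a small illustrative subset of the full alternation.

```shell
#!/bin/sh
# Illustrative subset of the alternation used in the SetEnvIfNoCase line.
# Assumption: the full pattern from the Apache config would be pasted here.
BAD_BOT_PATTERN='(Googlebot|bingbot|python-requests|libcurl|Java/|archive\.org_bot)'

# Hypothetical helper: succeeds (exit status 0) when the given user agent
# string matches the pattern, and fails (exit status 1) otherwise.
#   -E  extended regular expressions (needed for the | alternation)
#   -i  case-insensitive, mirroring SetEnvIfNoCase
#   -q  quiet: only the exit status matters
is_bad_bot() {
    printf '%s\n' "$1" | grep -Eiq -- "$BAD_BOT_PATTERN"
}
```

A call such as is_bad_bot "Googlebot/2.1" && echo "would be flagged" shows whether a given agent string would set bad_bot under the sampled pattern. Because the match is unanchored, short alternatives such as Java/ match any user agent containing that substring.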