Uploading large files

Requirements

You need:

  • the URL of the media file to upload
  • a text file with the first revision content
  • the name of the user account for this first revision and upload

MediaWiki currently doesn't support files greater than 4 GB (the size is stored as a 32-bit unsigned integer), while our Swift backend storage is limited to 5 GB. See phab:T191804 and phab:T191802 for discussions about raising these limits to 5 GB and beyond, respectively.
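
One way to check this up front, assuming the server hosting the file reports a Content-Length header (use the URL the requestor provided):

# the reported size in bytes must stay below 2^32 = 4294967296 for MediaWiki
curl -sIL <URL> | grep -i content-length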

Step 1: download files

Download the files to mwmaint1002 (or if there's not enough space, deploy1001).

wget <URL>

At this stage, it is worth checking the hash of the file if one is known.
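
For example, if the requestor supplied a SHA-256 checksum (the filename below is a placeholder):

# compute the checksum of the downloaded file and compare it with the provided one
sha256sum 'example-video.webm'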

Requestors are advised to provide a direct link to the file to be uploaded. Occasionally, however, they use a public cloud service instead (such as Google Drive), which usually does not provide direct download links.

From Google Drive, it is possible to download a file using its unique ID via en:rclone:

urbanecm@titanium  /nas/urbanecm/wmf-rehosting
$ rclone -P backend copyid <config>: '<fileid>' '<filename>'

where:

  • <config> refers to the name of the rclone config entry (you can use rclone config to see/edit the config entries)
  • <fileid> is the ID of the file at Google (for https://drive.google.com/file/d/1K9QrMXyhPqlvc-vQRjVmT8YrbgYjfelC/view, the ID is 1K9QrMXyhPqlvc-vQRjVmT8YrbgYjfelC)
  • <filename> is the name you want to store the file under
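
For illustration (the remote name gdrive and the output filename are placeholders; the file ID is the example from the list above), an invocation could look like:

# hypothetical: copy a single Google Drive file, identified by its ID, through
# the rclone remote named "gdrive" into the current directory
rclone -P backend copyid gdrive: '1K9QrMXyhPqlvc-vQRjVmT8YrbgYjfelC' 'example-video.webm'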

Since rclone is not installed on production servers, this requires downloading the file to a temporary location first and then transferring it to the maintenance server. It does not, however, mean downloading the file to the administrator's own laptop (which might have capacity or connection speed issues).
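
Purely as a sketch (the destination host name, the path, and whether you need a bastion hop are assumptions that depend on your access setup), the final copy to the maintenance server could look something like:

# hypothetical transfer from the machine where rclone ran to the maintenance
# host; you may need to go through a bastion (e.g. ssh ProxyJump) to reach it
scp 'example-video.webm' mwmaint1002.eqiad.wmnet:/tmp/uploads/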

Step 2: import image to Commons

Server-side uploads run much faster because of minimal network overhead, and as a result can put extra strain on the job queue, especially for videos, which require transcoding. It's recommended to add some delay between uploads with the --sleep parameter. Because several factors (resolution, frame rate, length) affect how long a video takes to transcode, it can be worth uploading one video, seeing how long the median transcode takes, and then sleeping for that length to avoid queuing up a large number of transcodes.

mwscript importImages.php --wiki=commonswiki --sleep=SECONDS --comment-ext=txt --user=USERNAME /tmp/uploads
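
As a sketch of what /tmp/uploads could contain (filenames are placeholders; check importImages.php's comment-file matching for the exact naming it expects), --comment-ext=txt tells the script to read the text file with the first revision content, mentioned in the requirements, from a sidecar .txt file next to each media file:

ls /tmp/uploads
# Example_video.webm        <- the media file to upload
# Example_video.webm.txt    <- first revision content for its file page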