PERCCli

From Wikitech-static
Revision as of 10:04, 23 August 2022 by imported>Slyngshede

Later (2022) models of the Dell PowerEdge servers have migrated from the MegaRAID controller and the MegaCli tool to PERC6 (PowerEdge RAID Controller) cards. These RAID cards are managed with Dell's perccli and perccli64 tools. We use perccli64, which is the 64-bit version.

The full manual from Dell for the perccli tool is available here: Dell EMC PowerEdge RAID Controller Command Line Interface

Installation

The perccli64 tool is installed by Puppet on the relevant servers. Servers with PERC cards are identified by the PCI ID of those cards. As we do not have permission to redistribute the Dell software, the DEB packages are only available via our private APT repository (see Reprepro for information on the private repo).


Monitoring

Integration into monitoring is handled by the get-raid-status-perccli script. It can also be run manually:

 $ sudo /usr/local/lib/nagios/plugins/get-raid-status-perccli

The perccli64 tool can output data as JSON when the "J" option is appended to a command. For example, the following command lists all installed RAID controllers as JSON:

 $ sudo perccli64 show all J

To use the perccli64 tool to locate errors, first find the available controllers and their IDs. We typically have only one controller per server, so the ID will normally be 0. Verify that this is the case using:

 $ sudo perccli64 show all
 CLI Version = 007.1910.0000.0000 Oct 08, 2021
 Operating system = Linux 5.10.0-17-amd64
 Status Code = 0
 Status = Success
 Description = None
 
 Number of Controllers = 1
 Host Name = dumpsdata1007
 Operating System  = Linux 5.10.0-17-amd64
 
 System Overview :
 ===============
 
 ---------------------------------------------------------------------------
 Ctl Model           Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth
 ---------------------------------------------------------------------------
   0 PERCH750Adapter     8  14   2     0   2     0 Opt On  -  N      0 Opt
 ---------------------------------------------------------------------------
 
 ...
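
When checking many hosts it can be handy to reduce this output to the controller ID and its health. The following is a minimal sketch, not part of the deployed tooling: controller_health is a hypothetical helper name, and it assumes the System Overview table keeps the layout shown above, with the controller ID in the first column and Hlth in the last.

```shell
# Hypothetical helper: print "<controller id> <health>" for each row of the
# "System Overview" table read from stdin.
# Assumption: the first column is the numeric controller ID and the last
# column is Hlth, as in the sample output above.
controller_health() {
    awk '
        /^ *System Overview/ { in_tbl = 1; next }          # table starts here
        in_tbl && $1 ~ /^[0-9]+$/ { print $1, $NF; seen = 1; next }
        in_tbl && seen && /^ *-+ *$/ { exit }              # rule after last row
    '
}

# Usage (on a host with the tool installed):
#   sudo perccli64 show all | controller_health
```

Restricting the match to the System Overview section avoids picking up numeric rows from the parts of the output that are elided above.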

Battery status

To view the state of the BBU (Battery Backup Unit), run the following command, where /c0 is controller 0 (the first controller):

 $ sudo perccli64 /c0/bbu show status
 CLI Version = 007.1910.0000.0000 Oct 08, 2021
 Operating system = Linux 5.10.0-17-amd64
 Controller = 0
 Status = Success
 Description = None
 
 
 BBU_Info :
 ========
 
 ----------------------
 Property      Value
 ----------------------
 Type          BBU
 Voltage       3938 mV
 Current       0 mA
 Temperature   36 C
 Battery State Optimal
 ----------------------
 ...
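
For a quick scripted check, the "Battery State" row can be extracted from this output. This is a rough sketch under the assumption that the row is labelled exactly as in the sample above; bbu_state is a hypothetical helper name, and any value other than "Optimal" is simply treated as a failure.

```shell
# Hypothetical helper: read `perccli64 /c0/bbu show status` output on stdin,
# print the value of the "Battery State" row, and return non-zero unless
# that value is "Optimal" (or if the row is missing).
bbu_state() {
    awk '
        /^ *Battery State/ {
            sub(/^ *Battery State +/, "")   # drop the property name
            sub(/ +$/, "")                  # drop trailing spaces
            print
            ok = ($0 == "Optimal")
        }
        END { exit (ok ? 0 : 1) }
    '
}

# Usage:
#   sudo perccli64 /c0/bbu show status | bbu_state || echo "BBU needs attention"
```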

Relearn cycle

A BBU learn cycle fully discharges and then recharges the battery, so that the controller can measure how much capacity the battery has lost over time. After a cycle the controller updates its record of the BBU's capacity. A cycle may take in excess of 24 hours.

Check that a cycle is not currently running, or see when the next cycle will start automatically:

 $ sudo perccli64 /c0/bbu show learn
 ... 
 BBU Learn :
 =========
 
 -----------------------------------------------------
 Property           Value
 -----------------------------------------------------
 Auto Learn Mode    Transparent
 Schedule Time      SUN, October 30, 2022 at 14:38:57
 Interval           12 Weeks 6 Days
 Learn Cycle Active No
 -----------------------------------------------------

To force a relearning cycle run:

 $ sudo perccli64 /c0/bbu start learn
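
The two steps above (check, then start) can be combined so that a cycle is only forced when none is running. This is a sketch only: learn_cycle_active is a hypothetical helper, and it assumes that an active cycle is reported as "Yes" in the "Learn Cycle Active" row, which is the counterpart of the "No" shown in the sample output.

```shell
# Hypothetical helper: succeed (exit 0) when the "Learn Cycle Active" row read
# from stdin ends in "Yes". Assumption: "Yes" is the value used while a cycle
# is running; the sample output above only shows "No".
learn_cycle_active() {
    awk '
        /^ *Learn Cycle Active/ { active = ($NF == "Yes") }
        END { exit (active ? 0 : 1) }
    '
}

# Usage: only force a learn cycle when none is running.
#   if sudo perccli64 /c0/bbu show learn | learn_cycle_active; then
#       echo "learn cycle already running" >&2
#   else
#       sudo perccli64 /c0/bbu start learn
#   fi
```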


Virtual disk (array) status

To list all virtual disks (RAID arrays) and their status, run:

 $ sudo perccli64 /c0/vall show
 Virtual Drives :
 ==============
 
 ----------------------------------------------------------------
 DG/VD TYPE   State Access Consist Cache Cac sCC       Size Name
 ----------------------------------------------------------------
 1/238 RAID1  Optl  RW     Yes     RWBD  -   OFF 446.625 GB
 0/239 RAID10 Optl  RW     Yes     RWBD  -   OFF  43.661 TB
 ----------------------------------------------------------------

The information above indicates that we have two disk groups (DG), 1 and 0, which back virtual disks (VD) 238 and 239 respectively. Both are currently "Optimal" (Optl).
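
To spot problems without reading the whole table, the rows can be filtered on the State column. A minimal sketch, assuming the column order shown above (DG/VD first, State third); nonoptimal_vds is a hypothetical helper name.

```shell
# Hypothetical helper: read `perccli64 /c0/vall show` output on stdin and
# print "<DG/VD> <State>" for every virtual disk whose State is not "Optl".
# Prints nothing when all virtual disks are optimal.
nonoptimal_vds() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && $3 != "Optl" { print $1, $3 }'
}

# Usage:
#   sudo perccli64 /c0/vall show | nonoptimal_vds
```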

More details on an individual VD can be had by running the following, where v238 is virtual disk 238, the 446 GB array shown by the previous command:

 $ sudo perccli64 /c0/v238 show all
 ... 
 PDs for VD 238 :
 ============== 
 
 -----------------------------------------------------------------------------
 EID:Slt DID State DG       Size Intf Med SED PI SeSz Model           Sp Type
 -----------------------------------------------------------------------------
 64:12     0 Onln   1 446.625 GB SATA SSD N   N  512B HFS480G3H2X069N U  -
 64:13     4 Onln   1 446.625 GB SATA SSD N   N  512B HFS480G3H2X069N U  -
 -----------------------------------------------------------------------------
 ...
 VD238 Properties :  
 ================
 Strip Size = 512 KB
 Number of Blocks = 936640512
 Span Depth = 1
 Number of Drives Per Span = 2
 Write Cache(initial setting) = WriteBack
 Disk Cache Policy = Disk's Default
 Encryption = None
 Data Protection = None
 Active Operations = None
 Exposed to OS = Yes
 OS Drive Name = /dev/sda
 Creation Date = 29-06-2022
 Creation Time = 04:55:14 PM
 Emulation type = default
 Cachebypass size = Cachebypass-64k
 Cachebypass Mode = Cachebypass Intelligent
 Is LD Ready for OS Requests = Yes
 SCSI NAA Id = 670b5e80fe06a9002a4f4072aa6e59b2
 Unmap Enabled = N/A

Among other useful information, this identifies the block device the OS sees for the virtual disk (in this case /dev/sda) and the physical devices used to construct the virtual disk, as well as their state.
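
The device name can also be pulled out of this output directly, for instance to match a virtual disk against lsblk or kernel log entries. A small sketch, assuming the "OS Drive Name = ..." line appears as in the sample above; vd_os_device is a hypothetical helper name.

```shell
# Hypothetical helper: read `perccli64 /c0/v<N> show all` output on stdin and
# print the value of the "OS Drive Name" property (e.g. /dev/sda).
vd_os_device() {
    awk -F ' = ' '/^ *OS Drive Name/ { print $2; exit }'
}

# Usage:
#   sudo perccli64 /c0/v238 show all | vd_os_device
```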

Physical drives

To debug a physical drive, we first need to be able to identify and locate the physical devices. The following command outputs a lot of information, but we mainly care about the topology (if we don't already know the layout of the virtual disks) and the drive list.

 $ sudo perccli64 /c0/dall show all
 ...
 ------------------------------------------------------------------------------
 DG Arr Row EID:Slot DID Type   State BT       Size PDC  PI SED DS3  FSpace TR
 ------------------------------------------------------------------------------
  0 -   -   -        -   RAID10 Optl  N   43.661 TB dflt N  N   dflt N      N
  0 0   -   -        -   RAID1  Optl  N   43.661 TB dflt N  N   dflt N      N
  0 0   0   64:0     1   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   1   64:1     2   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   2   64:2     6   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   3   64:3     7   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   4   64:4     5   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   5   64:5     9   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   6   64:6     3   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   7   64:7     12  DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   8   64:8     8   DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   9   64:9     11  DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   10  64:10    10  DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  0 0   11  64:11    13  DRIVE  Onln  N    7.276 TB dflt N  N   dflt -      N
  1 -   -   -        -   RAID1  Optl  N  446.625 GB dflt N  N   dflt N      N
  1 0   -   -        -   RAID1  Optl  N  446.625 GB dflt N  N   dflt N      N
  1 0   0   64:12    0   DRIVE  Onln  N  446.625 GB dflt N  N   dflt -      N
  1 0   1   64:13    4   DRIVE  Onln  N  446.625 GB dflt N  N   dflt -      N
 ------------------------------------------------------------------------------
 
 ...
 DG Drive LIST :
 =============
 
 ----------------------------------------------------------------------------------
 EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                Sp Type
 ----------------------------------------------------------------------------------
 64:0      1 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:1      2 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:2      6 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:3      7 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:4      5 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:5      9 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:6      3 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:7     12 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:8      8 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:9     11 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:10    10 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:11    13 Onln   0   7.276 TB SATA HDD N   N  512B TOSHIBA MG06ACA800EY U  -
 64:12     0 Onln   1 446.625 GB SATA SSD N   N  512B HFS480G3H2X069N      U  -
 64:13     4 Onln   1 446.625 GB SATA SSD N   N  512B HFS480G3H2X069N      U  -
 ----------------------------------------------------------------------------------

As with the controllers, there will typically be only one enclosure, and we just need to get the correct ID, in this case EID 64.
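
The drive list can be filtered to show only drives that are not online, printed as the /cX/eY/sZ paths that perccli64 expects. This is a sketch under stated assumptions: drives_not_online is a hypothetical helper, it relies on the drive list columns shown above (EID:Slt first, State third), and a state other than Onln does not necessarily mean the drive has failed, so the result is only a starting point for investigation.

```shell
# Hypothetical helper: read `perccli64 /c0/dall show all` output on stdin and
# print "<perccli path> <State>" for each drive-list row whose State is not
# "Onln". The controller ID defaults to 0 and may be given as $1.
# Topology rows are skipped because their first column is the DG number,
# not EID:Slt.
drives_not_online() {
    awk -v ctl="${1:-0}" '
        $1 ~ /^[0-9]+:[0-9]+$/ && $3 != "Onln" {
            split($1, loc, ":")            # loc[1] = enclosure, loc[2] = slot
            printf "/c%s/e%s/s%s %s\n", ctl, loc[1], loc[2], $3
        }
    '
}

# Usage:
#   sudo perccli64 /c0/dall show all | drives_not_online
```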

Replace a drive

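To locate the failed disk (in this example the disk in slot 12) in the enclosure (enclosure 64) and stop it, run:

 $ sudo perccli64 /c0/e64/s12 start locate
 $ sudo perccli64 /c0/e64/s12 spindown

The first command lights up the indicator LED for that drive and the second spins the drive down. To do the reverse, use "stop locate" and "spinup". Dell's manual writes the enclosure part of the path as optional (/cx[/ex]/sx); the square brackets are notation only and should not be typed.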

Verify that the virtual disk is rebuilding after drive replacement (where the virtual disk is ID 238 on controller 0):

 $ sudo perccli64 /c0/v238 show
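
A rebuild can take a long time, so it may be convenient to poll the state instead of re-running the command by hand. The sketch below assumes that the single-VD output contains the same "DG/VD TYPE State ..." row as the vall listing earlier on this page; vd_state is a hypothetical helper name and the 60 second interval is arbitrary.

```shell
# Hypothetical helper: read `perccli64 /c0/v<N> show` (or `/c0/vall show`)
# output on stdin and print the State column for virtual disk $1.
# Assumption: the row starts with "<DG>/<VD>" and State is the third column.
vd_state() {
    awk -v vd="$1" '$1 ~ ("^[0-9]+/" vd "$") { print $3; exit }'
}

# Usage: poll once a minute until virtual disk 238 reports Optl again.
#   until [ "$(sudo perccli64 /c0/v238 show | vd_state 238)" = "Optl" ]; do
#       sleep 60
#   done
```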