Disk Alignment
Recovered from the older tannerjc.net wiki snapshot dated January 23, 2016.
References
- NetApp: Database Layout with Data ONTAP
  - starting partition offset should be divisible by 4096 [4k] which is the block size for the WAFL file system
  - for best performance partitions should start on a 4096 byte [4k] boundary, as 4096=8*512
  - fdisk was used to decrease the number of sectors per track to 56 and to proportionally increase the number of cylinders
- http://www.yellow-bricks.com/2010/04/08/aligning-your-vms-virtual-harddisks/
- http://blogs.netapp.com/storage_nuts_n_bolts/2009/01/mbrscanmbralign.html
- http://netapptips.com/2010/03/16/lun-types-and-linux-partition-alignment/
- https://www.redhat.com/archives/linux-lvm/2009-March/msg00023.html
  - PV extent size defaults to 64k which is a multiple of 4k [which will be aligned when there are no partitions]
- https://www.redhat.com/archives/lvm-devel/2009-January/msg00148.html
- http://post-office.corp.redhat.com/archives/tech-list/2009-July/msg00031.html
  - make certain that the start of a PV’s data area (pe_start) is aligned on a full MD stripe width boundary.
  - To do so each LVM2 PV should be created using the pvcreate --dataalignment argument, e.g.: pvcreate --dataalignment 256K …
  - the start of a PV’s data area (pe_start) can be determined with: pvs -o +pe_start
- http://post-office.corp.redhat.com/archives/tech-list/2009-July/msg00032.html
  - [M]ost disks have firmware that does selective read ahead. This basically means that for very small requests, even though the disk could read the remainder of a cylinder in preemptively for performance reasons, it won’t because it assumes you’ll never touch the subsequent data. If you want decent sequential I/O performance, your requests must be larger than the disk firmware’s cutoff for a small request. That cutoff is usually in the 8-16k range. So, if you aren’t issuing at least 8-16k read requests, expect the sequential performance of the array to drop like a rock.
  - If you do the striped LVM, then you’ve already done away with the increased data safety you got by splitting up the raid5 arrays in the first place, so I would just do one big raid5 array with no LVM.
  - the kernel/sysfs enhancements are not in RHEL5 or below (as of 2.6.18-194.11.1)
- http://post-office.corp.redhat.com/archives/tech-list/2010-January/msg00263.html
- https://www.redhat.com/archives/rhelv5-list/2010-August/msg00092.html
  A huge problem in the PC world is that many devices and Windows itself formats with 255/63 heads/sectors, which results in a non-even cylinder. Anaconda is only going to preserve the existing geometry, for compatibility. It’s not Anaconda causing the issue, at least that has been my experience.
  Going back to when kernel 2.6 / parted was first adopted not by just Anaconda, but many other installers, with official, to-the-spec Extended Interrupt 13h support, most everyone dual-booting ran into issues. And even when not, there were disconnects between the BIOS and disk geometry. Since then every installer has been rather conservative, preserving existing geometry.
  One way to address this is to use parted to slice the disk so even when 255/63 head/sector geometry is utilized for various compatibility, things are LBA even. I’ve seen Anaconda able to do this at times. But there’s still a lot of legacy issues and conservative defaults involved. Which is why dropping to Alt-F2 and ensuring geometry is setup as you desire is the best way.
  Until EFI/GPT takes over in the PC world, I don’t see this being always addressed automatically and programmatically. Just my view.
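The MD stripe width mentioned in the tech-list reference above is simple arithmetic: chunk size times the number of data-bearing disks. A minimal sketch, assuming a hypothetical helper name (md_stripe_width_kib) and that only RAID levels 0, 5 and 6 are of interest; the 5-disk, 64 KiB chunk layout is an illustrative assumption chosen so the result matches the 256K value used in the pvcreate example.

```shell
#!/bin/bash
# Hypothetical helper (not part of mdadm or LVM): full-stripe width in KiB.
# usage: md_stripe_width_kib LEVEL CHUNK_KIB NUM_DISKS
md_stripe_width_kib() {
    local level=$1 chunk_kib=$2 disks=$3 data_disks
    case "$level" in
        0) data_disks=$disks ;;            # no parity
        5) data_disks=$((disks - 1)) ;;    # one disk's worth of parity
        6) data_disks=$((disks - 2)) ;;    # two disks' worth of parity
        *) echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
    echo $((chunk_kib * data_disks))
}

# Assumed example layout: 5-disk RAID5 with 64 KiB chunks.
md_stripe_width_kib 5 64 5    # prints 256
```

The printed value is what would be passed to pvcreate --dataalignment (with a K suffix) so that pe_start lands on a full stripe boundary.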
WHY
Hard disks with a 4KiB physical sector size come in a couple of different configurations. In the first, a legacy logical sector size of 512 bytes is exported so that the O/S and applications can continue to treat the device as if it has a 512 byte physical sector size. Internally, the drive’s firmware will map the logical sectors to physical sectors, performing read/modify/write where necessary. Historically, the first partition on a device was created at sector 63. Since this does not line up on a 4KiB boundary, that means that all 4KiB I/Os will be misaligned. The 4KiB number is relevant for two reasons: first, that is the physical sector size of the device, and second, that is the page size of x86 architectures and the file system block size is bounded by the host’s page size. This means that as the file system issues a single 4KiB write or read, it triggers I/O to two physical disk sectors instead of one. When dealing with writes, it requires a read-modify-write of two sectors instead of a single write operation (requiring no read). As you can imagine, performance suffers, especially for random write workloads.
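The sector 63 problem described above can be checked with a few lines of shell arithmetic. This is only a sketch: is_aligned is a hypothetical helper name, and the defaults (512-byte logical sectors, 4096-byte boundary) are assumptions taken from the text.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a partition start falls on the boundary.
# usage: is_aligned START_SECTOR [SECTOR_BYTES] [BOUNDARY_BYTES]
is_aligned() {
    local start=$1 sector_bytes=${2:-512} boundary=${3:-4096}
    [ $(( (start * sector_bytes) % boundary )) -eq 0 ]
}

# 63 is the historical first-partition start; 64 and 2048 are common aligned starts.
for s in 63 64 2048; do
    if is_aligned "$s"; then
        echo "sector $s: aligned"
    else
        echo "sector $s: misaligned, $(( (s * 512) % 4096 )) bytes into a 4KiB block"
    fi
done
```

Sector 63 starts at byte 32256, which is 3584 bytes into a 4KiB block, so every 4KiB file system block straddles two physical sectors.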
The second configuration adopted by 4KiB physical sector size devices does not have this translation layer. Instead, a 4KiB logical sector size is exported. As a result, legacy BIOSes are not able to boot from these devices (UEFI is required). Also, any application which made an assumption about the physical sector size of the disk will encounter problems, most likely in the form of I/O errors.
To address disks using the first configuration, the notion of I/O topology support was added to the Linux Kernel(TM). For devices that report topology information, the kernel and user-space utilities (such as parted) will be able to make informed decisions on how to layout data on the drives. The idea is to avoid misaligned I/O. If no topology information is available, then parted, for example, will start partitions at 1MiB. Other userspace utilities, such as lvm, have been modified to consume the topology information as well. This work was part of RHEL 6, and is available in 6.0. If you are running a release older than RHEL 6.0, then I would suggest manually partitioning these disks using sector mode, and ensure the start of each partition lies on a 4KiB boundary.
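When partitioning by hand in sector mode, as suggested above for releases older than RHEL 6.0, the start sector can be rounded up to the next boundary before it is handed to the partitioning tool. A minimal sketch, assuming a hypothetical helper name (align_up_sector) and 512-byte logical sectors by default:

```shell
#!/bin/bash
# Hypothetical helper: round a start sector up to the next aligned sector.
# usage: align_up_sector MIN_START_SECTOR [ALIGN_BYTES] [SECTOR_BYTES]
align_up_sector() {
    local min=$1 align_bytes=${2:-4096} sector_bytes=${3:-512}
    local step=$((align_bytes / sector_bytes))    # sectors per boundary
    echo $(( (min + step - 1) / step * step ))
}

align_up_sector 63            # 4KiB boundary: prints 64
align_up_sector 63 1048576    # 1MiB boundary: prints 2048
```

The second call reproduces the 1MiB default that parted falls back to when no topology information is available.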
Disks which export both a logical and a physical sector size of 4KiB are detected and should work fine under RHEL 6.0 as non-boot disks. Support for installing and booting from these devices (on UEFI-based machines) has been implemented for 6.1.
For more information on the I/O topology support present in RHEL 6, please see:
Aligning all layers of the stack
- Move the partition’s beginning offset to sector 64
- fdisk expert mode: fdisk /dev/mapper/mpath1
- parted:
- Make aligned PVs
- Enable data_alignment_detection in lvm.conf
- pvcreate --dataalignment 64k /dev/mapper/mpath1p1
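Partition offset and PV data alignment add up: the first physical extent sits at (partition start in bytes) + pe_start on the underlying device. A rough sketch of that check follows; pe_start_is_aligned is a hypothetical helper name, and 512-byte sectors and a 4 KiB default boundary are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: is the first PE aligned on the underlying device?
# usage: pe_start_is_aligned PART_START_SECTOR PE_START_KIB [BOUNDARY_KIB]
pe_start_is_aligned() {
    local part_start=$1 pe_start_kib=$2 boundary_kib=${3:-4}
    # Absolute byte offset of the first PE, assuming 512-byte sectors.
    local abs_bytes=$(( part_start * 512 + pe_start_kib * 1024 ))
    echo "first PE at byte $abs_bytes"
    [ $(( abs_bytes % (boundary_kib * 1024) )) -eq 0 ]
}

# Partition at sector 64 with a 64k pe_start: aligned to 4 KiB.
pe_start_is_aligned 64 64 && echo aligned
# Same pe_start behind a legacy sector 63 partition: misaligned.
pe_start_is_aligned 63 64 || echo misaligned
```

The second call shows why aligning the PV alone is not enough: a misaligned partition shifts every layer above it.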
Checking alignment attributes
F13, 2.5" drive
[root@x200 ~]# dumpe2fs /dev/mapper/vg_x200-lv_root | fgrep -i "block size"
dumpe2fs 1.41.10 (10-Feb-2009)
Block size: 4096
[root@x200 ~]# vgs -v
Finding all volume groups
Finding volume group vg_x200
VG Attr Ext #PV #LV #SN VSize VFree VG UUID
vg_x200 wz--n- 32.00m 1 3 0 595.66g 0 Cj5BLG-DkKN-Cjzy-SnEB-ZjAS-x53Q-tOteK7
[root@x200 ~]# pvs -o +pe_start
PV VG Fmt Attr PSize PFree 1st PE
/dev/sda2 vg_x200 lvm2 a- 595.66g 0 192.00k
[root@x200 ~]# fgrep align /etc/lvm/lvm.conf | egrep -v \#
md_chunk_alignment = 1
data_alignment_detection = 1
data_alignment = 0
data_alignment_offset_detection = 1
[root@x200 ~]# parted /dev/sda print
Model: ATA WDC WD6400BEVT-0 (scsi)
Disk /dev/sda: 640GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1049kB 525MB 524MB primary ext4 boot
2 525MB 640GB 640GB primary lvm
[root@x200 ~]# dd if=/dev/sda of=/tmp/sda.out bs=512 count=1; file /tmp/sda.out
1+0 records in
1+0 records out
512 bytes (512 B) copied, 9.6662e-05 s, 5.3 MB/s
/tmp/sda.out: x86 boot sector; GRand Unified Bootloader, stage1 version 0x3, boot drive 0x80, 1st sector stage2 0x84a42, GRUB version 0.94; \
partition 1: ID=0x83, active, starthead 32, startsector 2048, 1024000 sectors; \
partition 2: ID=0x8e, starthead 221, startsector 1026048, 1249236992 sectors, code offset 0x48
[root@x200 ~]# fdisk -lu /dev/sda
Disk /dev/sda: 640.1 GB, 640135028736 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250263728 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00019cd8
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1026047 512000 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 1026048 1250263039 624618496 8e Linux LVM
[root@x200 ~]# for i in `find /sys/block/sda/ | fgrep -v -e sda1 -e sda2 | fgrep -e alignment -e block_size -e io_size` ; do echo -n "$i " ; cat $i; done;
/sys/block/sda/alignment_offset 0
/sys/block/sda/discard_alignment 0
/sys/block/sda/queue/logical_block_size 512
/sys/block/sda/queue/physical_block_size 512
/sys/block/sda/queue/minimum_io_size 512
/sys/block/sda/queue/optimal_io_size 0
[root@x200 ~]# sg_inq -p 0xb0 /dev/sda
VPD INQUIRY: Block limits page (SBC)
Optimal transfer length granularity: 1 blocks
Maximum transfer length: 0 blocks
Optimal transfer length: 0 blocks
Maximum prefetch, xdread, xdwrite transfer length: 0 blocks
Maximum unmap LBA count: 0
Maximum unmap block descriptor count: 0
Optimal unmap granularity: 0
Unmap granularity alignment valid: 0
Unmap granularity alignment: 0
[root@x200 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD6400BEVT-0 Rev: 01.0
Type: Direct-Access ANSI SCSI revision: 05
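The per-partition start sectors can also be read straight from sysfs instead of parsing fdisk output. A minimal sketch, assuming a hypothetical function name (check_partition_alignment) and relying on the start attribute under /sys/block/DEV/PART/ being expressed in 512-byte units; the directory is passed as an argument so the function only reads files and can be pointed at a copy.

```shell
#!/bin/bash
# Hypothetical helper: report alignment of every partition under a sysfs
# block device directory. Returns non-zero if any partition is misaligned.
# usage: check_partition_alignment SYS_BLOCK_DIR [BOUNDARY_BYTES]
check_partition_alignment() {
    local dir=$1 boundary=${2:-4096} f part start rc=0
    for f in "$dir"/*/start; do
        [ -r "$f" ] || continue                  # skip if the glob matched nothing
        part=$(basename "$(dirname "$f")")
        start=$(cat "$f")
        if [ $(( (start * 512) % boundary )) -eq 0 ]; then
            echo "$part start=$start aligned"
        else
            echo "$part start=$start MISALIGNED"
            rc=1
        fi
    done
    return $rc
}

# Example (read-only): check_partition_alignment /sys/block/sda
```

For the drive shown above, with partitions starting at sectors 2048 and 1026048, both would be reported as aligned.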
Alignment Scenarios
Scenario 1

Scenario 2

Scenario 3

Scenario 4

Scenario 5

Scenario 6

Kickstart
#!/bin/bash
# Detect the SAN vendor from the attached SCSI devices.
SANVENDOR=$(cat /proc/scsi/scsi | fgrep -e NETAPP -e EMC | awk '{print $2}' | sort | uniq)
# Disk size as reported by parted (third field of the "Disk /dev/..." line).
SIZE1=$(parted /dev/mapper/mpath1 print | grep '^Disk /' | awk '{print $3}')
#echo $SANVENDOR
if [ "$SANVENDOR" = "NETAPP" ]; then
    echo "it's a $SANVENDOR with a size of $SIZE1"
    # Deactivate the volume group and wipe the old partition table.
    vgchange -an user > /dev/null 2>&1
    dd if=/dev/zero of=/dev/mapper/mpath1 bs=512 count=1 > /dev/null 2>&1
    echo "######### parted"
    # Start the single partition at sector 64 (a 4k boundary).
    parted -s /dev/mapper/mpath1 mktable msdos
    parted -s /dev/mapper/mpath1 mkpart primary 64s "$SIZE1"
    parted -s /dev/mapper/mpath1 set 1 lvm on
    # Clear any stale LVM metadata at the start of the new partition.
    dd if=/dev/zero of=/dev/mapper/mpath1p1 bs=1M count=1 > /dev/null 2>&1
    echo
    echo "######### pvcreate,vgcreate,lvcreate"
    pvremove --force /dev/mapper/mpath1p1
    pvcreate --force /dev/mapper/mpath1p1
    vgcreate --alloc contiguous --physicalextentsize 65536k user /dev/mapper/mpath1p1
    lvcreate -L 19500 -n data user
    lvcreate -L 19500 -n apps user
    echo "######### mkfs"
    mkfs.ext3 /dev/mapper/user-data
    mkfs.ext3 /dev/mapper/user-apps
    # Print every layer so the alignment can be verified in the install log.
    echo "######### parted print"
    parted /dev/mapper/mpath1 print
    echo "######### fdisk -lu"
    fdisk -lu /dev/mapper/mpath1
    echo "######### pvs"
    pvs -o +pe_start
    echo "######### vgs"
    vgs -v -o +vg_extent_size user
    echo "######### lvs"
    lvs -o +seg_count,seg_size,seg_pe_ranges user
    echo "######### dumpe2fs"
    dumpe2fs /dev/mapper/user-data | fgrep -i "block size"
    dumpe2fs /dev/mapper/user-apps | fgrep -i "block size"
    #echo "/dev/user/apps /apps ext3 defaults 1 2" >> /etc/fstab
    #echo "/dev/user/data /data ext3 defaults 1 2" >> /etc/fstab
fi