Recovered from the older tannerjc.net wiki snapshot dated January 23, 2016.

References

Going back to when kernel 2.6 and parted were first adopted, not just by Anaconda but by many other installers, with official, to-the-spec Extended Interrupt 13h support, almost everyone dual-booting ran into issues. Even when they did not, there were disconnects between the BIOS and the disk geometry. Since then every installer has been rather conservative, preserving existing geometry.

One way to address this is to use parted to slice the disk so that, even when 255/63 head/sector geometry is used for compatibility, the layout is still LBA-even. I’ve seen Anaconda able to do this at times. But there are still a lot of legacy issues and conservative defaults involved, which is why dropping to Alt-F2 and ensuring the geometry is set up as you desire is the best way.

Until EFI/GPT takes over in the PC world, I don’t see this being always addressed automatically and programmatically. Just my view.

WHY

Hard disks with a 4KiB physical sector size come in a couple of different configurations. In the first, a legacy logical sector size of 512 bytes is exported so that the O/S and applications can continue to treat the device as if it has a 512 byte physical sector size. Internally, the drive’s firmware will map the logical sectors to physical sectors, performing read/modify/write where necessary. Historically, the first partition on a device was created at sector 63. Since this does not line up on a 4KiB boundary, that means that all 4KiB I/Os will be misaligned. The 4KiB number is relevant for two reasons: first, that is the physical sector size of the device, and second, that is the page size of x86 architectures and the file system block size is bounded by the host’s page size. This means that as the file system issues a single 4KiB write or read, it triggers I/O to two physical disk sectors instead of one. When dealing with writes, it requires a read-modify-write of two sectors instead of a single write operation (requiring no read). As you can imagine, performance suffers, especially for random write workloads.
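
The arithmetic behind the sector-63 problem can be sketched in a few lines of shell. This is only an illustration: the helper name is_aligned is made up for this page, and the defaults assume a disk that exports 512-byte logical sectors on top of 4KiB physical sectors.

```shell
# is_aligned START_SECTOR [LOGICAL_BYTES] [PHYSICAL_BYTES]
# Hypothetical helper: succeeds when the byte offset of START_SECTOR is a
# multiple of the physical sector size (defaults: 512-byte logical, 4KiB physical).
is_aligned() {
    start_sector=$1
    logical_bytes=${2:-512}
    physical_bytes=${3:-4096}
    [ $(( start_sector * logical_bytes % physical_bytes )) -eq 0 ]
}

# Compare the historical start sector (63) with two aligned alternatives.
for s in 63 64 2048; do
    if is_aligned "$s"; then
        echo "sector $s: aligned"
    else
        echo "sector $s: misaligned by $(( s * 512 % 4096 )) bytes"
    fi
done
```

Sector 63 sits at byte 32256, which is 3584 bytes past the previous 4KiB boundary, so every 4KiB file system block straddles two physical sectors. Sectors 64 and 2048 both land exactly on a boundary.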

The second configuration adopted by 4KiB physical sector size devices does not have this translation layer. Instead, a 4KiB logical sector size is exported. As a result, legacy BIOSes are not able to boot from these devices (UEFI is required). Also, any application which made an assumption about the physical sector size of the disk will encounter problems, most likely in the form of I/O errors.

To address disks using the first configuration, the notion of I/O topology support was added to the Linux Kernel(TM). For devices that report topology information, the kernel and user-space utilities (such as parted) will be able to make informed decisions on how to lay out data on the drives. The idea is to avoid misaligned I/O. If no topology information is available, then parted, for example, will start partitions at 1MiB. Other userspace utilities, such as lvm, have been modified to consume the topology information as well. This work was part of RHEL 6, and is available in 6.0. If you are running a release older than RHEL 6.0, then I would suggest manually partitioning these disks using sector mode, and ensuring the start of each partition lies on a 4KiB boundary.
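
When partitioning by hand in sector mode, the start sector has to be rounded up to the next boundary. A minimal sketch of that rounding follows; next_aligned_sector is a hypothetical helper, not part of parted or any other tool, and it assumes the boundary is a whole multiple of the logical sector size.

```shell
# next_aligned_sector SECTOR [LOGICAL_BYTES] [BOUNDARY_BYTES]
# Hypothetical helper: prints the first sector at or after SECTOR whose byte
# offset is a multiple of BOUNDARY_BYTES (defaults: 512-byte sectors, 4KiB boundary).
next_aligned_sector() {
    sector=$1
    logical_bytes=${2:-512}
    boundary_bytes=${3:-4096}
    per_boundary=$(( boundary_bytes / logical_bytes ))
    echo $(( (sector + per_boundary - 1) / per_boundary * per_boundary ))
}

next_aligned_sector 63               # prints 64   (next 4KiB boundary)
next_aligned_sector 63 512 1048576   # prints 2048 (the 1MiB default mentioned above)
```

The result can then be given to parted with the s unit suffix, as in the 64s start used by the kickstart script further down this page.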

Disks which export both a logical and a physical sector size of 4KiB are detected and should work fine under RHEL 6.0 as non-boot disks. Support for installing and booting from these devices (on UEFI-based machines) has been implemented for 6.1.

For more information on the I/O topology support present in RHEL 6, please see:

Aligning all layers of the stack

  • Move the partition’s beginning offset to sector 64
      • fdisk expert mode: fdisk /dev/mapper/mpath1
      • parted:
  • Make aligned PVs
      • Enable data_alignment_detection in lvm.conf
      • pvcreate --dataalignment=64k /dev/mapper/mpath1p1
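
Because each layer adds its own offset, the position that matters is the absolute one: partition start plus the PV's first physical extent. The sketch below adds the two and checks the sum against a boundary. The function names are hypothetical, partition starts are taken to be in 512-byte sectors, and the first-PE value is taken in KiB as printed by pvs -o +pe_start.

```shell
# stack_offset_bytes PART_START_SECTOR PE_START_KIB
# Hypothetical helper: absolute byte offset on the disk of the first LVM extent.
stack_offset_bytes() {
    echo $(( $1 * 512 + $2 * 1024 ))
}

# stack_is_aligned PART_START_SECTOR PE_START_KIB [BOUNDARY_BYTES]
# Succeeds when that absolute offset is a multiple of the boundary (default 4KiB).
stack_is_aligned() {
    [ $(( $(stack_offset_bytes "$1" "$2") % ${3:-4096} )) -eq 0 ]
}

# Partition at sector 1026048 with a first PE at 192k, the values shown for
# /dev/sda2 in the transcript on this page.
stack_offset_bytes 1026048 192       # prints 525533184
if stack_is_aligned 1026048 192; then echo "aligned to 4KiB"; fi
```

A partition starting at sector 63 fails this check whatever the first-PE value is, as long as that value is a multiple of 4KiB, which is why the partition offset has to be fixed first.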

Checking alignment attributes

Fedora 13, 2.5-inch drive

[root@x200 ~]# dumpe2fs /dev/mapper/vg_x200-lv_root | fgrep -i "block size"
dumpe2fs 1.41.10 (10-Feb-2009)
Block size:               4096
[root@x200 ~]# vgs -v
    Finding all volume groups
    Finding volume group vg_x200
  VG      Attr   Ext    #PV #LV #SN VSize   VFree VG UUID
  vg_x200 wz--n- 32.00m   1   3   0 595.66g    0  Cj5BLG-DkKN-Cjzy-SnEB-ZjAS-x53Q-tOteK7
[root@x200 ~]# pvs -o +pe_start
  PV         VG      Fmt  Attr PSize   PFree 1st PE
  /dev/sda2  vg_x200 lvm2 a-   595.66g    0  192.00k
[root@x200 ~]# fgrep align /etc/lvm/lvm.conf  | egrep -v \#
    md_chunk_alignment = 1
    data_alignment_detection = 1
    data_alignment = 0
    data_alignment_offset_detection = 1
[root@x200 ~]# parted /dev/sda print
Model: ATA WDC WD6400BEVT-0 (scsi)
Disk /dev/sda: 640GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      1049kB  525MB  524MB  primary  ext4         boot
 2      525MB   640GB  640GB  primary               lvm
[root@x200 ~]# dd if=/dev/sda of=/tmp/sda.out bs=512 count=1; file /tmp/sda.out
1+0 records in
1+0 records out
512 bytes (512 B) copied, 9.6662e-05 s, 5.3 MB/s
/tmp/sda.out: x86 boot sector; GRand Unified Bootloader, stage1 version 0x3, boot drive 0x80, 1st sector stage2 0x84a42, GRUB version 0.94; \
partition 1: ID=0x83, active, starthead 32, startsector 2048, 1024000 sectors; \
partition 2: ID=0x8e, starthead 221, startsector 1026048, 1249236992 sectors, code offset 0x48
[root@x200 ~]# fdisk -lu /dev/sda

Disk /dev/sda: 640.1 GB, 640135028736 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250263728 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00019cd8

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2         1026048  1250263039   624618496   8e  Linux LVM
[root@x200 ~]# for i in `find  /sys/block/sda/ | fgrep -v -e sda1 -e sda2  | fgrep -e alignment -e block_size -e io_size` ; do echo -n "$i  "; cat $i; done
/sys/block/sda/alignment_offset  0
/sys/block/sda/discard_alignment  0
/sys/block/sda/queue/logical_block_size  512
/sys/block/sda/queue/physical_block_size  512
/sys/block/sda/queue/minimum_io_size  512
/sys/block/sda/queue/optimal_io_size  0
[root@x200 ~]# sg_inq -p 0xb0 /dev/sda
VPD INQUIRY: Block limits page (SBC)
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch, xdread, xdwrite transfer length: 0 blocks
  Maximum unmap LBA count: 0
  Maximum unmap block descriptor count: 0
  Optimal unmap granularity: 0
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
[root@x200 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD6400BEVT-0 Rev: 01.0
  Type:   Direct-Access                    ANSI  SCSI revision: 05
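
The same check can be scripted against sysfs rather than read off the fdisk and parted output by eye. This is a rough sketch: report_partition_alignment is a made-up name, and the optional second argument exists only so the function can be pointed at a copy of the sysfs tree. It relies on the per-partition start file, which is expressed in 512-byte units, and on queue/physical_block_size.

```shell
# report_partition_alignment DISK [SYSFS_ROOT]
# Hypothetical helper: for every partition of DISK, report whether its start
# offset is a multiple of the disk's physical block size.
report_partition_alignment() {
    disk=$1
    sysfs_root=${2:-/sys/block}
    phys=$(cat "$sysfs_root/$disk/queue/physical_block_size")
    for start_file in "$sysfs_root/$disk"/*/start; do
        [ -r "$start_file" ] || continue          # disk has no partitions
        part=$(basename "$(dirname "$start_file")")
        start=$(cat "$start_file")                # always in 512-byte units
        if [ $(( start * 512 % phys )) -eq 0 ]; then
            echo "$part start=$start aligned"
        else
            echo "$part start=$start misaligned"
        fi
    done
}

# Usage (reads the live system):
#   report_partition_alignment sda
```

On the drive above, which reports a 512-byte physical block size, every partition passes trivially. The check only becomes interesting on devices that report 4096.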

Alignment Scenarios

Scenario 1

Disk-alignment-1

Scenario 2

Disk-alignment-2

Scenario 3

Disk-alignment-6

Scenario 4

Disk-alignment-3

Scenario 5

Disk-alignment-4

Scenario 6

Disk-alignment-5

Kickstart

#!/bin/bash

# Detect the SAN vendor (NETAPP or EMC) from the attached SCSI devices.
SANVENDOR=`cat /proc/scsi/scsi  | fgrep -e NETAPP -e EMC | awk '{print $2}' | sort | uniq`
# Size of the multipath device: third field of parted's "Disk" line.
SIZE1=`parted /dev/mapper/mpath1 print | grep Disk | awk '{print $3}'`

#echo $SANVENDOR

if [ "$SANVENDOR" = "NETAPP" ]; then
        echo "it's a $SANVENDOR with a size of $SIZE1"

        # Deactivate any existing "user" volume group before repartitioning.
        vgchange -an user &> /dev/null

        # Wipe the old partition table (first sector of the device).
        dd if=/dev/zero of=/dev/mapper/mpath1 bs=512 count=1 &> /dev/null

        echo "######### parted"
        parted -s /dev/mapper/mpath1 mktable msdos
        # Start the partition at sector 64 so it sits on a 4KiB boundary.
        parted -s /dev/mapper/mpath1 mkpart primary 64s "$SIZE1"
        parted -s /dev/mapper/mpath1 set 1 lvm on

        # Clear stale LVM or file system signatures at the start of the new partition.
        dd if=/dev/zero of=/dev/mapper/mpath1p1 bs=1M count=1 &> /dev/null

        echo
        echo "######### pvcreate,vgcreate,lvcreate"
        pvremove --force /dev/mapper/mpath1p1
        pvcreate --force  /dev/mapper/mpath1p1
        vgcreate --alloc contiguous --physicalextentsize 65536k user /dev/mapper/mpath1p1

        lvcreate -L 19500 -n data user
        lvcreate -L 19500 -n apps user

        echo "######### mkfs"
        mkfs.ext3 /dev/mapper/user-data
        mkfs.ext3 /dev/mapper/user-apps

        # Print every layer so the alignment can be verified afterwards.
        echo "######### parted print"
        parted /dev/mapper/mpath1 print
        echo "######### fdisk -lu"
        fdisk -lu /dev/mapper/mpath1

        echo "######### pvs"
        pvs -o +pe_start
        echo "######### vgs"
        vgs -v -o +vg_extent_size user
        echo "######### lvs"
        lvs -o +seg_count,seg_size,seg_pe_ranges user

        echo "######### dumpe2fs"
        dumpe2fs /dev/mapper/user-data | fgrep -i "block size"
        dumpe2fs /dev/mapper/user-apps | fgrep -i "block size"

        #echo "/dev/user/apps          /apps                   ext3    defaults        1 2" >> /etc/fstab
        #echo "/dev/user/data          /data                   ext3    defaults        1 2" >> /etc/fstab

fi
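
The vendor detection at the top of the script can be pulled out into a small function, which makes it easier to try against a saved copy of /proc/scsi/scsi. This is a sketch only: san_vendor is a hypothetical name, and it matches the Vendor: field exactly instead of grepping whole lines, so a model string containing EMC or NETAPP would not trigger it.

```shell
# san_vendor [FILE]
# Hypothetical helper: prints each distinct SAN vendor (NETAPP or EMC) found in
# the "Vendor:" lines of FILE, which defaults to /proc/scsi/scsi.
san_vendor() {
    awk '$1 == "Vendor:" && ($2 == "NETAPP" || $2 == "EMC") { print $2 }' \
        "${1:-/proc/scsi/scsi}" | sort -u
}

# Usage:
#   SANVENDOR=$(san_vendor)
#   if [ "$SANVENDOR" = "NETAPP" ]; then ... ; fi
```

If a host sees both vendors, the function prints two lines and the string comparison in the script will not match, so such hosts fall through without being repartitioned.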