Ext4 layout and inode/block usage accounting

inode definition

inode

The inode (index node) is a data structure in a Unix-style file system 
that describes a file-system object such as a file or a directory. 
Each inode stores the attributes and disk block locations of the object's data.
File-system object attributes may include metadata (times of last change, access, modification),
as well as owner and permission data.

(from the Wikipedia article on inode)

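The inode data structure describes files, directories, and other objects in a file system. Each inode stores the metadata of a file-system object, such as modification time, access time, and permissions. According to Dennis Ritchie, the "i" in inode probably stands for "index": the access information of files was organized as a flat array stored on disk.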

block

The ext4 file system allocates storage space in units of blocks.

Environment setup

Linux distribution: CentOS 7

Use fdisk to create a 10M partition and format it as ext4

$ fdisk /dev/sdb
...
#         Start          End    Size  Type            Name
 1         2048      8390655      4G  Linux filesyste
 2      8390656      8411135     10M  Linux filesyste
[root@centosgpt ~]#

$ mkfs.ext4 /dev/sdb2
$ mount /dev/sdb2 /root/inode

ext4 layout

Component               Size
Group 0 Padding         1024 bytes
ext4 Super Block        1 block
Group Descriptors       many blocks
Reserved GDT Blocks     many blocks
Data Block Bitmap       1 block
inode Bitmap            1 block
inode Table             many blocks
Data Blocks             many more blocks

Because flex_bg is enabled and Flex block group size is 16, the dumpe2fs output below shows that the Data Block Bitmap and inode Bitmap areas are each extended to 16 blocks (only 2 of which are actually used).

Experiment

Disk partitioning

  • The inode-related defaults can be seen in the mke2fs configuration

cat /etc/mke2fs.conf

...
[defaults]
...
        blocksize = 4096
        inode_size = 256
        inode_ratio = 16384
...

[fs_types]

        ext4 = {
               features = has_journal,extent,huge_file,flex_bg,uninit_bg,
               dir_nlink,extra_isize,64bit
               inode_size = 256
               inode_ratio = 16384
        }

        small = {
                blocksize = 1024
                inode_size = 128
                inode_ratio = 4096
        }

...

If no parameters are specified, the 10M partition is created with the small file system type.

The type is matched from fs_blocks_count and the default block size of 1024:

    meg = (1024 * 1024) / EXT2_BLOCK_SIZE(sb);
    if (fs_blocks_count < 3 * meg)
        size_type = "floppy";
    else if (fs_blocks_count < 512 * meg)
        size_type = "small";
    else if (fs_blocks_count < 4 * 1024 * 1024 * meg)
        size_type = "default";
    else if (fs_blocks_count < 16 * 1024 * 1024 * meg)
        size_type = "big";
    else
        size_type = "huge";
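The threshold chain above can be replayed for the 10M partition. The following is a minimal Python sketch, not the mke2fs implementation itself; the function name mke2fs_size_type is made up for illustration, and the block size of 1024 is taken from the text above.

```python
def mke2fs_size_type(fs_blocks_count: int, block_size: int = 1024) -> str:
    """Replay the size_type thresholds quoted from mke2fs.c (illustrative only)."""
    meg = (1024 * 1024) // block_size  # number of blocks in one MiB
    if fs_blocks_count < 3 * meg:
        return "floppy"
    if fs_blocks_count < 512 * meg:
        return "small"
    if fs_blocks_count < 4 * 1024 * 1024 * meg:
        return "default"
    if fs_blocks_count < 16 * 1024 * 1024 * meg:
        return "big"
    return "huge"


# The 10M partition holds 10240 blocks of 1024 bytes each.
print(mke2fs_size_type(10240))
```

With 10240 blocks the count falls between 3 * meg (3072) and 512 * meg (524288), so the small profile from mke2fs.conf applies.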

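The device size can be obtained by sending the corresponding ioctl request, and the two requests differ in what they return: BLKGETSIZE returns the device size in 512-byte sectors, while BLKGETSIZE64 returns it in bytes.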

#if defined(__linux__) && defined(_IO) && !defined(BLKGETSIZE)
#define BLKGETSIZE _IO(0x12,96)    
/* return device size */
#endif

#if defined(__linux__) && defined(_IOR) && !defined(BLKGETSIZE64)
#define BLKGETSIZE64 _IOR(0x12,114,size_t) 
/* return device size in bytes (u64 *arg) */
#endif
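To make the two macros concrete, here is a rough Python sketch that rebuilds the request numbers and queries a device. It assumes the generic Linux _IOC bit layout used on x86 and ARM (some architectures encode requests differently) and an 8-byte size_t; the helper names _ioc and device_size_bytes are hypothetical.

```python
import fcntl
import struct

# Assumed generic Linux _IOC layout: nr (8 bits), type (8), size (14), dir (2).
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30
_IOC_NONE, _IOC_READ = 0, 2
SIZEOF_SIZE_T = 8  # assumption: 64-bit userspace


def _ioc(direction: int, type_: int, nr: int, size: int) -> int:
    """Compose an ioctl request number the way the _IO/_IOR macros do."""
    return ((direction << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT)
            | (type_ << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


BLKGETSIZE = _ioc(_IOC_NONE, 0x12, 96, 0)                 # size in 512-byte sectors
BLKGETSIZE64 = _ioc(_IOC_READ, 0x12, 114, SIZEOF_SIZE_T)  # size in bytes


def device_size_bytes(path: str) -> int:
    """Ask the kernel for the size of a block device in bytes (needs read access)."""
    with open(path, "rb") as dev:
        buf = fcntl.ioctl(dev.fileno(), BLKGETSIZE64, b"\0" * 8)
    return struct.unpack("Q", buf)[0]
```

Calling device_size_bytes("/dev/sdb2") as root on the test machine would be expected to report the size of the 10M partition; the call is left out here because it depends on the device being present.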

Formatting the ext4 file system

[root@centosgpt ~]# mkfs.ext4  /dev/sdb2
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
2560 inodes, 10240 blocks
512 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=10485760
2 block groups
8192 blocks per group, 8192 fragments per group
1280 inodes per group
Superblock backups stored on blocks:
        8193

Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done
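The figures printed by mkfs.ext4 can be cross-checked against the small profile (inode_ratio = 4096, inode_size = 128, blocksize = 1024). This is a simplified sketch: the real mke2fs applies additional rounding rules that do not matter for these particular numbers, and the variable names are chosen only for this example.

```python
import math

block_size = 1024        # "Block size=1024"
blocks_count = 10240     # "10240 blocks"
inode_ratio = 4096       # small profile: one inode per 4096 bytes
inode_size = 128         # small profile
blocks_per_group = 8192  # "8192 blocks per group"
first_data_block = 1     # "First data block=1"

# One inode for every inode_ratio bytes of file system space.
inodes = blocks_count * block_size // inode_ratio

# Block groups cover the blocks from first_data_block onward.
groups = math.ceil((blocks_count - first_data_block) / blocks_per_group)

inodes_per_group = inodes // groups
inode_blocks_per_group = inodes_per_group * inode_size // block_size

print(inodes, groups, inodes_per_group, inode_blocks_per_group)
```

The results line up with "2560 inodes", "2 block groups" and "1280 inodes per group" in the output above, and with the 160 inode table blocks per group that dumpe2fs reports.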

Querying file system information

[root@centosgpt ~]# dumpe2fs /dev/sdb2
dumpe2fs 1.42.9 (28-Dec-2013)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          2ef6b0f7-7591-4018-87bd-e0d4141852fe
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2560
Block count:              10240
Reserved block count:     512
Free blocks:              8715
Free inodes:              2549
First block:              1
Block size:               1024
Fragment size:            1024
Group descriptor size:    64
Reserved GDT blocks:      79
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         1280
Inode blocks per group:   160
Flex block group size:    16
Filesystem created:       Fri Sep 11 10:56:03 2020
Last mount time:          n/a
Last write time:          Fri Sep 11 10:56:03 2020
Mount count:              0
Maximum mount count:      -1
Last checked:             Fri Sep 11 10:56:03 2020
Check interval:           0 (<none>)
Lifetime writes:          1190 kB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      2dcd05a3-e862-4734-bc0d-e033a7b00f5c
Journal backup:           inode blocks
Journal features:         (none)
Journal size:             1024k
Journal length:           1024
Journal sequence:         0x00000001
Journal start:            0

Group 0: (Blocks 1-8192)
  Checksum 0xad60, unused inodes 1269
  Primary superblock at 1, Group descriptors at 2-2
  Reserved GDT blocks at 3-81
  Block bitmap at 82 (+81), Inode bitmap at 98 (+97)
  Inode table at 114-273 (+113)
  6749 free blocks, 1269 free inodes, 2 directories, 1269 unused inodes
  Free blocks: 1444-8192
  Free inodes: 12-1280
Group 1: (Blocks 8193-10239) [INODE_UNINIT]
  Checksum 0xd9bb, unused inodes 1280
  Backup superblock at 8193, Group descriptors at 8194-8194
  Reserved GDT blocks at 8195-8273
  Block bitmap at 83 (bg #0 + 82), Inode bitmap at 99 (bg #0 + 98)
  Inode table at 274-433 (bg #0 + 273)
  1966 free blocks, 1280 free inodes, 0 directories, 1280 unused inodes
  Free blocks: 8274-10239
  Free inodes: 1281-2560

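The upper part of the output is the superblock information; the lower part is the block group information.

Besides being stored in Group 0, the superblock and the group descriptors are also backed up in Group 1.

Group 0 block usage

Group 0 block usage: 8192 blocks in total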

Contents                Size (blocks)
super block             1
group descriptors       1
Reserved GDT blocks     79
Block bitmap            1+1
Inode bitmap            1+1
Inode table             160+160
First data block        1
Journal                 1024
free blocks             6749

First data block. This must be at least 1 for 1k-block filesystems and is typically 0 for all other block sizes.

1+1+79+2+2+320+1+1024 = 1430. Free blocks start at 1444, so blocks 1-1443 are in use; 1443 - 1430 = 13 blocks are still unaccounted for.
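The same accounting can be written out as a small Python check. The figures are copied from the Group 0 table and the dumpe2fs output; the dictionary keys are just labels for this sketch.

```python
group0_blocks = 8192  # Group 0 covers blocks 1-8192
free_blocks = 6749    # "6749 free blocks" in the dumpe2fs output

accounted = {
    "super block": 1,
    "group descriptors": 1,
    "reserved GDT blocks": 79,
    "block bitmaps (group 0 + group 1)": 2,
    "inode bitmaps (group 0 + group 1)": 2,
    "inode tables (group 0 + group 1)": 320,
    "first data block": 1,
    "journal": 1024,
}

used_blocks = group0_blocks - free_blocks              # blocks 1-1443
unaccounted = used_blocks - sum(accounted.values())    # still to be explained

print(used_blocks, sum(accounted.values()), unaccounted)
```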

Looking at the mkfs.ext4 v1.42.9 source, the remaining 13 blocks are used mainly by the root directory and the lost+found directory.

  • Creating the directory structure:

https://github.com/tytso/e2fsprogs/blob/v1.42.9/misc/mke2fs.c

...
int main (int argc, char *argv[])
{
        ...
        create_root_dir(fs);
        create_lost_and_found(fs);
        ...
}
...
  • Root directory creation: allocates 1 block

create_root_dir mainly creates the directory and writes the related inode information

static void create_root_dir(ext2_filsys fs)
{
     ext2fs_mkdir(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, 0);
     ...
}

Creating a directory allocates 1 block:
https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/mkdir.c

errcode_t ext2fs_mkdir(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t inum, const char *name)
{
  ...
  /*
   * Allocate a data block for the directory
   */
  retval = ext2fs_new_block2(fs, 0, 0, &blk);
  ...
}

  • lost+found directory creation: allocates 12 blocks

create_lost_and_found allocates 1 block through ext2fs_mkdir and another 11 blocks through ext2fs_expand_dir

https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/ext2_fs.h

#define EXT2_NDIR_BLOCKS       12

https://github.com/tytso/e2fsprogs/blob/v1.42.9/misc/mke2fs.c

static void create_lost_and_found(ext2_filsys fs)
{
   retval = ext2fs_mkdir(fs, EXT2_ROOT_INO, 0, name);
   for (i=1; i < EXT2_NDIR_BLOCKS; i++) {
      ...
      ext2fs_expand_dir(fs, ino);
      ...
   }
   ...
}
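As a rough illustration of where the 13 blocks come from, the loop above can be replayed in Python. The sketch assumes that every iteration reaches ext2fs_expand_dir and that each call adds exactly one block; the parts elided with "..." in the excerpt are not modeled, and the function name is made up for this example.

```python
EXT2_NDIR_BLOCKS = 12  # from ext2_fs.h


def lost_and_found_blocks() -> int:
    """Count blocks allocated for lost+found under the stated assumptions."""
    blocks = 1                            # ext2fs_mkdir allocates the first block
    for _ in range(1, EXT2_NDIR_BLOCKS):  # i = 1 .. 11
        blocks += 1                       # one block per ext2fs_expand_dir call
    return blocks


root_dir_blocks = 1  # create_root_dir calls ext2fs_mkdir once
total_dir_blocks = root_dir_blocks + lost_and_found_blocks()

print(lost_and_found_blocks(), total_dir_blocks)
```

Under these assumptions the two directories take 1 + 12 = 13 blocks, the same as the gap found in the Group 0 accounting.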
  • The call chain of ext2fs_expand_dir is shown below; the block allocation itself is implemented in expand_dir_proc

https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/expanddir.c

errcode_t ext2fs_expand_dir(ext2_filsys fs, ext2_ino_t dir)
{
  ...
  retval = ext2fs_block_iterate3(fs, dir, BLOCK_FLAG_APPEND,
                                 0, expand_dir_proc, &es);
  ...
}

ext2fs_block_iterate3 ultimately calls expand_dir_proc, which creates the block for the directory

https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/block.c

errcode_t ext2fs_block_iterate3(ext2_filsys fs,
    ext2_ino_t ino,
    int flags,
    char *block_buf,
    int (*func)(ext2_filsys fs,
               blk64_t  *blocknr,
               e2_blkcnt_t  blockcnt,
               blk64_t  ref_blk,
               int      ref_offset,
               void *priv_data),
    void *priv_data);

static int expand_dir_proc(...)
{
  ...
  retval = ext2fs_new_dir_block(fs, 0, 0, &block);
  ...
}

Group 0 inode usage

Contents                Size (inodes)
root dir                1
lost and found          1
Reserved inodes         8
EXT2_BAD_INO            1
free inodes             1269
  • root_dir and lost+found each take 1 inode, 2 inodes in total
  • reserve_inodes marks the reserved inodes 3-10 as in use, 8 inodes in total
static void reserve_inodes(ext2_filsys fs)
{
    ext2_ino_t  i;

    for (i = EXT2_ROOT_INO + 1; i < EXT2_FIRST_INODE(fs->super); i++)
        ext2fs_inode_alloc_stats2(fs, i, +1, 0);
    ext2fs_mark_ib_dirty(fs);
}
  • create_bad_block_inode takes the EXT2_BAD_INO inode, 1 inode in total
static void create_bad_block_inode(ext2_filsys fs, badblocks_list bb_list)
{
  ...
  ext2fs_inode_alloc_stats2(fs, EXT2_BAD_INO, +1, 0);
  ...
}
  • Reserved special inode numbers

https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/ext2_fs.h

/*
 * Special inode numbers
 */

#define EXT2_BAD_INO        1  /* Bad blocks inode */
#define EXT2_ROOT_INO       2  /* Root inode */
#define EXT4_USR_QUOTA_INO  3  /* User quota inode */
#define EXT4_GRP_QUOTA_INO  4  /* Group quota inode */
#define EXT2_BOOT_LOADER_INO    5  /* Boot loader inode */
#define EXT2_UNDEL_DIR_INO  6  /* Undelete directory inode */
#define EXT2_RESIZE_INO         7  /* Reserved group descriptors inode */
#define EXT2_JOURNAL_INO    8  /* Journal inode */
#define EXT2_EXCLUDE_INO    9  /* The "exclude" inode, for snapshots */
#define EXT4_REPLICA_INO   10  /* Used by non-upstream feature */
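The inode accounting for Group 0 can be checked the same way. The sketch below uses the constants quoted above together with "First inode: 11" and "Inodes per group: 1280" from the dumpe2fs output; the assumption that lost+found receives the first non-reserved inode (11) is consistent with "Free inodes: 12-1280" but is not shown in the excerpts.

```python
EXT2_BAD_INO = 1
EXT2_ROOT_INO = 2
first_inode = 11         # "First inode: 11" (EXT2_FIRST_INODE)
inodes_per_group = 1280  # "Inodes per group: 1280"

# reserve_inodes(): i runs from EXT2_ROOT_INO + 1 up to first_inode - 1.
reserved = list(range(EXT2_ROOT_INO + 1, first_inode))

used_inodes = (
    1                # EXT2_BAD_INO, marked by create_bad_block_inode
    + 1              # root directory (inode 2)
    + len(reserved)  # inodes 3-10
    + 1              # lost+found, assumed to be inode 11
)
free_inodes = inodes_per_group - used_inodes

print(reserved, used_inodes, free_inodes)
```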

Group 1

Group 1 block usage: 10239-8193+1 = 2047 blocks in total

Contents                Size (blocks)
super block             1
group descriptors       1
Reserved GDT blocks     79
free blocks             1966
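A quick Python check of the Group 1 figures, using only numbers from the dumpe2fs output (the variable names are illustrative):

```python
group1_first, group1_last = 8193, 10239
group1_blocks = group1_last - group1_first + 1

# Group 1 holds only the backup metadata; its bitmaps and inode table
# are stored in Group 0 because of flex_bg.
backup_superblock = 1
backup_group_descriptors = 1
reserved_gdt_blocks = 79

group1_free = group1_blocks - (
    backup_superblock + backup_group_descriptors + reserved_gdt_blocks
)

print(group1_blocks, group1_free)
```

The remainder agrees with "1966 free blocks" reported for Group 1.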

Journal (jbd2)

To measure the space taken by the journal, we move the journal to a separate device by creating another 10M partition

# fdisk /dev/sdb

#         Start          End    Size  Type            Name
 1         2048      8390655      4G  Linux filesyste
 2      8390656      8411135     10M  Linux filesyste
 3      8411136      8431615     10M  Linux filesyste

# Takes effect after saving and rebooting

# mkfs.ext4 -O journal_dev /dev/sdb3

# mkfs.ext4 -J device=/dev/sdb3 /dev/sdb2

The free space in Group 0 is now 7773 blocks. Compared with the previous 6749, usage has dropped by 1024 blocks, which is exactly the space the journal occupied.

[root@centosgpt ~]# dumpe2fs /dev/sdb2
dumpe2fs 1.42.9 (28-Dec-2013)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          6102514b-9353-43ff-b9dc-011480c16120
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2560
Block count:              10240
Reserved block count:     512
Free blocks:              9739
Free inodes:              2549
First block:              1
Block size:               1024
Fragment size:            1024
Group descriptor size:    64
Reserved GDT blocks:      79
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         1280
Inode blocks per group:   160
Flex block group size:    16
Filesystem created:       Wed Sep 16 05:42:45 2020
Last mount time:          n/a
Last write time:          Wed Sep 16 05:42:45 2020
Mount count:              0
Maximum mount count:      -1
Last checked:             Wed Sep 16 05:42:45 2020
Check interval:           0 (<none>)
Lifetime writes:          164 kB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal UUID:             8bd1e1a5-2a58-465b-81de-bea1b7a0c1cb
Journal device:           0x0813
Default directory hash:   half_md4
Directory Hash Seed:      57e0e8c6-5646-4a44-af1a-57a974e1d4d8

Group 0: (Blocks 1-8192)
  Checksum 0x1a35, unused inodes 1269
  Primary superblock at 1, Group descriptors at 2-2
  Reserved GDT blocks at 3-81
  Block bitmap at 82 (+81), Inode bitmap at 98 (+97)
  Inode table at 114-273 (+113)
  7773 free blocks, 1269 free inodes, 2 directories, 1269 unused inodes
  Free blocks: 97, 100-113, 435-8192
  Free inodes: 12-1280
Group 1: (Blocks 8193-10239) [INODE_UNINIT]
  Checksum 0x1a5b, unused inodes 1280
  Backup superblock at 8193, Group descriptors at 8194-8194
  Reserved GDT blocks at 8195-8273
  Block bitmap at 83 (bg #0 + 82), Inode bitmap at 99 (bg #0 + 98)
  Inode table at 274-433 (bg #0 + 273)
  1966 free blocks, 1280 free inodes, 0 directories, 1280 unused inodes
  Free blocks: 8274-10239
  Free inodes: 1281-2560
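The "Free blocks:" lines printed by dumpe2fs can be summed directly to confirm the journal size. Below is a small Python sketch; count_blocks is a hypothetical helper, and the two strings are the Group 0 free ranges with the internal journal and with the external journal.

```python
def count_blocks(ranges: str) -> int:
    """Count blocks in a dumpe2fs-style list such as '97, 100-113, 435-8192'."""
    total = 0
    for part in ranges.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            total += int(last) - int(first) + 1  # inclusive range
        else:
            total += 1                           # single block
    return total


free_with_internal_journal = count_blocks("1444-8192")
free_with_external_journal = count_blocks("97, 100-113, 435-8192")

print(free_with_internal_journal, free_with_external_journal,
      free_with_external_journal - free_with_internal_journal)
```

The difference between the two counts is the 1024 blocks that the internal journal used.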

External journal layout, from the ext4 wiki

1024 bytes of padding | ext4 Superblock | Journal Superblock | descriptor_block (data_blocks or revocation_block) [more data or revocations] commit_block | [more transactions...]

We can inspect the actual layout of the journal device:

[root@centosgpt ~]# dumpe2fs /dev/sdb3
dumpe2fs 1.42.9 (28-Dec-2013)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          8bd1e1a5-2a58-465b-81de-bea1b7a0c1cb
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      journal_dev
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              0
Block count:              10240
Reserved block count:     0
Free blocks:              0
Free inodes:              0
First block:              1
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         0
Inode blocks per group:   0
Filesystem created:       Wed Sep 16 05:42:35 2020
Last mount time:          n/a
Last write time:          Wed Sep 16 05:42:35 2020
Mount count:              0
Maximum mount count:      -1
Last checked:             Wed Sep 16 05:42:35 2020
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      9cdd050f-d565-4c9c-be25-eae6f209783d

Journal block size:       1024
Journal length:           10240
Journal first block:      3
Journal sequence:         0x00000001
Journal start:            0
Journal number of users:  1
Journal users:            6102514b-9353-43ff-b9dc-011480c16120

Flexible Block Groups

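Several block groups are tied together into one logical block group: their block bitmaps are placed together, their inode bitmaps are placed together, and their inode tables are placed together as well.

Flex block group size: 16 means that the inode bitmaps, block bitmaps, and inode tables of these 16 block groups are laid out contiguously.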

# debugfs -R stats /dev/sdb2

 Group  0: block bitmap at 82, inode bitmap at 98, inode table at 114
           204 free blocks, 739 free inodes, 1 used directory, 0 unused inodes
           [Checksum 0x3566]
 Group  1: block bitmap at 83, inode bitmap at 99, inode table at 274
           0 free blocks, 1280 free inodes, 0 used directories, 0 unused inodes
           [Checksum 0x23fa]

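Each block bitmap occupies 1 block (83 - 82 = 1); blocks 82-97, 16 blocks in total, are set aside for block bitmaps.
Each inode bitmap occupies 1 block (99 - 98 = 1); blocks 98-113, 16 blocks in total, are set aside for inode bitmaps.
Each inode table occupies 160 blocks (274 - 114 = 160).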
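The positions in the debugfs output follow a simple pattern, which the following Python sketch reproduces. It assumes the contiguous packing observed on this particular file system (first block bitmap at block 82, 16 slots per bitmap area, 160 inode table blocks per group); flex_layout is a hypothetical helper, not an e2fsprogs function.

```python
def flex_layout(group: int,
                first_block_bitmap: int = 82,
                flex_size: int = 16,
                inode_blocks_per_group: int = 160):
    """Return (block bitmap, inode bitmap, (inode table first, last)) for a group."""
    block_bitmap = first_block_bitmap + group
    inode_bitmap = first_block_bitmap + flex_size + group
    table_first = (first_block_bitmap + 2 * flex_size
                   + group * inode_blocks_per_group)
    table_last = table_first + inode_blocks_per_group - 1
    return block_bitmap, inode_bitmap, (table_first, table_last)


for g in range(2):
    print(g, flex_layout(g))
```

For groups 0 and 1 this gives the same block numbers as the dumpe2fs output: bitmaps at 82/98 and 83/99, inode tables at 114-273 and 274-433.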

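packed_meta_blocks (mke2fs)

With flex_bg, the metadata can be packed at the head of the file system and placed on a faster storage device, which speeds up file system operations.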

# Size of a block device in 512-byte sectors
size_of() {
  blockdev --getsz "$1"
}

# Concatenate an SSD (first) and an HDD (second) into one linear dm device
mkdmsetup() {
  _ssd=/dev/$1
  _hdd=/dev/$2
  _size_of_ssd=$(size_of "$_ssd")
  printf '%s\n' \
    "0 $_size_of_ssd linear $_ssd 0" \
    "$_size_of_ssd $(size_of "$_hdd") linear $_hdd 0" | dmsetup create "dm-${1}-${2}"
}

mkdmsetup sdg1 sdb

mkfs.ext4 -O ^has_journal,flex_bg,^uninit_bg,^sparse_super,sparse_super2,^extra_isize,^dir_nlink,^resize_inode \
  -E packed_meta_blocks=1,lazy_itable_init=0 -G 32768 -I 128 -i $((1024*512)) \
  /dev/mapper/dm-sdg1-sdb

(from David Casier, "Fwd: Fwd: how disable double write WAL")
