inode definition
The inode (index node) is a data structure in a Unix-style file system
that describes a file-system object such as a file or a directory.
Each inode stores the attributes and disk block locations of the object's data.
File-system object attributes may include metadata (times of last change, access, modification),
as well as owner and permission data.
(from the Wikipedia article on inode)
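The inode data structure describes file-system objects such as files and directories. Each inode stores the metadata of its file-system object, such as modification time, access time, and permissions. According to Dennis Ritchie, the i in inode probably stands for index: the list of files being accessed was organized as a one-dimensional array stored on disk.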
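To make the definition concrete, the sketch below reads a few of the attributes an inode stores for a file. It is only an illustration using Python's standard `os.stat`; the helper name `describe_inode` is made up for this example and is not part of any tool discussed here.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Return a few of the attributes stored in the inode of `path`."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                 # inode number
        "mode": stat.filemode(st.st_mode),  # file type and permission bits
        "uid": st.st_uid,                   # owner
        "links": st.st_nlink,               # hard-link count
        "mtime": st.st_mtime,               # last modification of the data
        "ctime": st.st_ctime,               # last change of the inode itself
        "atime": st.st_atime,               # last access
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        for key, value in describe_inode(tmp.name).items():
            print(f"{key:>6}: {value}")
```

Note that the file name is not among these attributes: it lives in the directory entry that points to the inode.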
block
The ext4 file system allocates storage space in units of blocks.
Environment setup
Linux distribution: CentOS 7
Use fdisk to create a 10M partition and format it as ext4:
$ fdisk /dev/sdb
...
# Start End Size Type Name
1 2048 8390655 4G Linux filesyste
2 8390656 8411135 10M Linux filesyste
[root@centosgpt ~]#
$ mkfs.ext4 /dev/sdb2
$ mount /dev/sdb2 /root/inode
ext4 layout
Group 0 Padding | ext4 Super Block | Group Descriptors | Reserved GDT Blocks | Data Block Bitmap | inode Bitmap | inode Table | Data Blocks |
---|---|---|---|---|---|---|---|
1024 bytes | 1 block | many blocks | many blocks | 1 block | 1 block | many blocks | many more blocks |
Because flex_bg is enabled with Flex block group size: 16, the dumpe2fs output below shows that the data block bitmap area and the inode bitmap area are each extended to 16 blocks (only 2 of which are actually used).
Experiment
Disk partitioning
- The inode-related settings can be seen in /etc/mke2fs.conf:
cat /etc/mke2fs.conf
...
[defaults]
...
blocksize = 4096
inode_size = 256
inode_ratio = 16384
...
[fs_types]
ext4 = {
features = has_journal,extent,huge_file,flex_bg,uninit_bg,
dir_nlink,extra_isize,64bit
inode_size = 256
inode_ratio = 16384
}
small = {
blocksize = 1024
inode_size = 128
inode_ratio = 4096
}
...
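If no parameters are given, the 10M partition is created with the small file-system type. mke2fs picks the type by comparing fs_blocks_count against thresholds derived from the block size (1024 by default here):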
meg = (1024 * 1024) / EXT2_BLOCK_SIZE(sb);
if (fs_blocks_count < 3 * meg)
size_type = "floppy";
else if (fs_blocks_count < 512 * meg)
size_type = "small";
else if (fs_blocks_count < 4 * 1024 * 1024 * meg)
size_type = "default";
else if (fs_blocks_count < 16 * 1024 * 1024 * meg)
size_type = "big";
else
size_type = "huge";
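The threshold logic above can be restated in a few lines of Python to check which profile a given device lands in. This is a sketch of the excerpt only, not the mke2fs implementation; the function name `size_type_for` is made up for illustration.

```python
def size_type_for(fs_blocks_count, block_size):
    """Mirror the size_type thresholds from the mke2fs excerpt."""
    meg = (1024 * 1024) // block_size  # number of blocks in one MiB
    if fs_blocks_count < 3 * meg:
        return "floppy"
    if fs_blocks_count < 512 * meg:
        return "small"
    if fs_blocks_count < 4 * 1024 * 1024 * meg:
        return "default"
    if fs_blocks_count < 16 * 1024 * 1024 * meg:
        return "big"
    return "huge"


if __name__ == "__main__":
    # The 10M partition from this article: 10240 blocks of 1024 bytes.
    print(size_type_for(10240, 1024))  # small
```

In other words, anything from 3 MiB up to (but not including) 512 MiB is treated as small, which is why the 10M partition picks up blocksize = 1024, inode_size = 128 and inode_ratio = 4096.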
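The device size itself can be obtained by issuing ioctl requests. The unit depends on the request: BLKGETSIZE returns the device size as a count of 512-byte sectors, while BLKGETSIZE64 returns the size in bytes.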
#if defined(__linux__) && defined(_IO) && !defined(BLKGETSIZE)
#define BLKGETSIZE _IO(0x12,96)
/* return device size */
#endif
#if defined(__linux__) && defined(_IOR) && !defined(BLKGETSIZE64)
#define BLKGETSIZE64 _IOR(0x12,114,size_t)
/* return device size in bytes (u64 *arg) */
#endif
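For example, the following program reads the sector count and computes the device size from it:
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int fd;
    unsigned long numblocks = 0; /* size in 512-byte sectors */

    if (argc < 2)
        return 1;
    fd = open(argv[1], O_RDONLY);
    if (fd < 0 || ioctl(fd, BLKGETSIZE, &numblocks) < 0) {
        perror(argv[1]);
        return 1;
    }
    close(fd);
    printf("Number of blocks: %lu, this makes %.3f GB\n", numblocks,
           (double)numblocks * 512.0 / (1024 * 1024 * 1024));
    return 0;
}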
Format the ext4 file system
[root@centosgpt ~]# mkfs.ext4 /dev/sdb2
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
2560 inodes, 10240 blocks
512 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=10485760
2 block groups
8192 blocks per group, 8192 fragments per group
1280 inodes per group
Superblock backups stored on blocks:
8193
Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done
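The inode numbers in this output are consistent with the small profile from /etc/mke2fs.conf (inode_ratio = 4096, inode_size = 128). The sketch below is a simplified estimate, assuming one inode per inode_ratio bytes and no further rounding; the real mke2fs applies additional rounding per block group, which happens not to change the result for this partition. The function names are made up for illustration.

```python
def estimate_inode_count(block_count, block_size, inode_ratio):
    """One inode per `inode_ratio` bytes of file-system space (simplified)."""
    return (block_count * block_size) // inode_ratio


def inode_table_blocks(inodes_per_group, inode_size, block_size):
    """Blocks needed to hold one group's inode table, rounded up."""
    table_bytes = inodes_per_group * inode_size
    return -(-table_bytes // block_size)  # ceiling division


if __name__ == "__main__":
    inodes = estimate_inode_count(10240, 1024, 4096)
    per_group = inodes // 2  # the output above reports 2 block groups
    print(inodes, per_group, inode_table_blocks(per_group, 128, 1024))
    # 2560 1280 160
```

The 160 blocks per inode table computed here reappear later as "Inode blocks per group: 160" in the dumpe2fs output.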
Query file-system information
[root@centosgpt ~]# dumpe2fs /dev/sdb2
dumpe2fs 1.42.9 (28-Dec-2013)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 2ef6b0f7-7591-4018-87bd-e0d4141852fe
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2560
Block count: 10240
Reserved block count: 512
Free blocks: 8715
Free inodes: 2549
First block: 1
Block size: 1024
Fragment size: 1024
Group descriptor size: 64
Reserved GDT blocks: 79
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 1280
Inode blocks per group: 160
Flex block group size: 16
Filesystem created: Fri Sep 11 10:56:03 2020
Last mount time: n/a
Last write time: Fri Sep 11 10:56:03 2020
Mount count: 0
Maximum mount count: -1
Last checked: Fri Sep 11 10:56:03 2020
Check interval: 0 (<none>)
Lifetime writes: 1190 kB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 2dcd05a3-e862-4734-bc0d-e033a7b00f5c
Journal backup: inode blocks
Journal features: (none)
Journal size: 1024k
Journal length: 1024
Journal sequence: 0x00000001
Journal start: 0
Group 0: (Blocks 1-8192)
Checksum 0xad60, unused inodes 1269
Primary superblock at 1, Group descriptors at 2-2
Reserved GDT blocks at 3-81
Block bitmap at 82 (+81), Inode bitmap at 98 (+97)
Inode table at 114-273 (+113)
6749 free blocks, 1269 free inodes, 2 directories, 1269 unused inodes
Free blocks: 1444-8192
Free inodes: 12-1280
Group 1: (Blocks 8193-10239) [INODE_UNINIT]
Checksum 0xd9bb, unused inodes 1280
Backup superblock at 8193, Group descriptors at 8194-8194
Reserved GDT blocks at 8195-8273
Block bitmap at 83 (bg #0 + 82), Inode bitmap at 99 (bg #0 + 98)
Inode table at 274-433 (bg #0 + 273)
1966 free blocks, 1280 free inodes, 0 directories, 1280 unused inodes
Free blocks: 8274-10239
Free inodes: 1281-2560
The upper part of the output is the superblock information and the lower part is the block group information.
Besides being stored in Group 0, the superblock and the group descriptors are also backed up in Group 1.
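The block ranges of the two groups follow directly from three superblock fields: Block count, First block and Blocks per group. The sketch below reproduces them; it is a simplification that assumes every group, including a short last one, is kept, and the function name is made up for illustration.

```python
def block_group_ranges(block_count, first_block, blocks_per_group):
    """Return the (first, last) block number of each block group."""
    ranges = []
    start = first_block
    while start < block_count:
        # The last group may be shorter than blocks_per_group.
        end = min(start + blocks_per_group, block_count) - 1
        ranges.append((start, end))
        start += blocks_per_group
    return ranges


if __name__ == "__main__":
    print(block_group_ranges(10240, 1, 8192))
    # [(1, 8192), (8193, 10239)]
```

Block 0 is not part of any group here because First block is 1, which is why Group 1 ends up with only 2047 blocks.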
Group 0 block usage
Group 0 block usage: 8192 blocks in total
Content | Size (blocks) |
---|---|
super block | 1 |
group descriptors | 1 |
Reserved GDT blocks | 79 |
Block bitmap | 1+1 |
Inode bitmap | 1+1 |
Inode table | 160+160 |
First data block | 1 |
Journal | 1024 |
free blocks | 6749 |
First data block. This must be at least 1 for 1k-block filesystems and is typically 0 for all other block sizes.
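1+1+79+2+2+320+1+1024 = 1430. Blocks 1-1443 are in use (the free range starts at 1444), so 1443 - 1430 = 13 blocks are still unaccounted for.
According to the mkfs.ext4 v1.42.9 source code, these remaining 13 blocks are used by the root directory and the lost+found directory.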
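The accounting above can be double-checked with a short script. The figures are copied from the table and from the dumpe2fs output for Group 0; the dictionary keys and the function name are only labels chosen for this sketch.

```python
# Block usage of Group 0, as listed in the table above.
GROUP0_USAGE = {
    "super block": 1,
    "group descriptors": 1,
    "reserved GDT blocks": 79,
    "block bitmaps": 1 + 1,
    "inode bitmaps": 1 + 1,
    "inode tables": 160 + 160,
    "first data block": 1,
    "journal": 1024,
}
BLOCKS_IN_GROUP = 8192  # Group 0 covers blocks 1-8192
FREE_BLOCKS = 6749      # "6749 free blocks" reported for Group 0


def unaccounted_blocks(usage, total, free):
    """Used blocks that are not explained by the known items."""
    used = total - free
    return used - sum(usage.values())


if __name__ == "__main__":
    print(sum(GROUP0_USAGE.values()))  # 1430
    print(unaccounted_blocks(GROUP0_USAGE, BLOCKS_IN_GROUP, FREE_BLOCKS))  # 13
```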
- Creating the directory structure:
https://github.com/tytso/e2fsprogs/blob/v1.42.9/misc/mke2fs.c
...
int main (int argc, char *argv[])
{
...
create_root_dir(fs);
create_lost_and_found(fs);
...
}
...
- Root directory creation, which allocates 1 block
create_root_dir mainly creates the directory and writes the related inode information:
static void create_root_dir(ext2_filsys fs)
{
ext2fs_mkdir(fs, EXT2_ROOT_INO, EXT2_ROOT_INO, 0);
...
}
Creating a directory allocates 1 block:
https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/mkdir.c
errcode_t ext2fs_mkdir(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t inum, const char *name)
{
/*
* Allocate a data block for the directory
*/
retval = ext2fs_new_block2(fs, 0, 0, &blk);
}
- lost+found directory creation, which allocates 12 blocks
create_lost_and_found allocates 1 block through ext2fs_mkdir and 11 more blocks through ext2fs_expand_dir:
https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/ext2_fs.h
#define EXT2_NDIR_BLOCKS 12
https://github.com/tytso/e2fsprogs/blob/v1.42.9/misc/mke2fs.c
static void create_lost_and_found(ext2_filsys fs)
{
retval = ext2fs_mkdir(fs, EXT2_ROOT_INO, 0, name);
for (i=1; i < EXT2_NDIR_BLOCKS; i++) {
...
ext2fs_expand_dir(fs, ino);
...
}
...
}
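The loop above explains where the 12 blocks of lost+found come from. The sketch below counts them, assuming that every loop iteration calls ext2fs_expand_dir exactly once (the elided parts of the loop are ignored), and adds the single block of the root directory. The function names are made up for illustration.

```python
EXT2_NDIR_BLOCKS = 12  # from ext2_fs.h


def lost_and_found_blocks(ndir_blocks=EXT2_NDIR_BLOCKS):
    """Blocks allocated for lost+found, following the excerpt's loop."""
    blocks = 1  # ext2fs_mkdir allocates the first block
    for _ in range(1, ndir_blocks):
        blocks += 1  # assumed: each ext2fs_expand_dir call adds one block
    return blocks


def directory_blocks_after_mkfs():
    """Root directory (1 block) plus lost+found."""
    root_dir_blocks = 1
    return root_dir_blocks + lost_and_found_blocks()


if __name__ == "__main__":
    print(lost_and_found_blocks())        # 12
    print(directory_blocks_after_mkfs())  # 13
```

The total of 13 matches the number of blocks left unexplained in the Group 0 accounting.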
- The call chain of ext2fs_expand_dir is shown below; the block allocation itself is implemented in expand_dir_proc.
https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/expanddir.c
errcode_t ext2fs_expand_dir(ext2_filsys fs, ext2_ino_t dir)
{
...
retval = ext2fs_block_iterate3(fs, dir, BLOCK_FLAG_APPEND,
0, expand_dir_proc, &es);
...
}
ext2fs_block_iterate3 eventually calls expand_dir_proc, which allocates the new block for the directory:
https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/block.c
errcode_t ext2fs_block_iterate3(ext2_filsys fs,
ext2_ino_t ino,
int flags,
char *block_buf,
int (*func)(ext2_filsys fs,
blk64_t *blocknr,
e2_blkcnt_t blockcnt,
blk64_t ref_blk,
int ref_offset,
void *priv_data),
void *priv_data)
static int expand_dir_proc(...)
{
...
retval = ext2fs_new_dir_block(fs, 0, 0, &block);
...
}
Group 0 inode usage
Content | Size (inodes) |
---|---|
root dir | 1 |
lost and found | 1 |
Reserved inodes | 8 |
EXT2_BAD_INO | 1 |
free inodes | 1269 |
- root_dir and lost+found each allocate 1 inode, 2 inodes in total
reserve_inodes
reserves inodes 3-10, 8 inodes in total
static void reserve_inodes(ext2_filsys fs)
{
ext2_ino_t i;
for (i = EXT2_ROOT_INO + 1; i < EXT2_FIRST_INODE(fs->super); i++)
ext2fs_inode_alloc_stats2(fs, i, +1, 0);
ext2fs_mark_ib_dirty(fs);
}
create_bad_block_inode
allocates the EXT2_BAD_INO inode, 1 inode in total
static void create_bad_block_inode(ext2_filsys fs, badblocks_list bb_list)
{
...
ext2fs_inode_alloc_stats2(fs, EXT2_BAD_INO, +1, 0);
...
}
- Reserved special inode numbers:
https://github.com/tytso/e2fsprogs/blob/v1.42.9/lib/ext2fs/ext2_fs.h
/*
* Special inode numbers
*/
#define EXT2_BAD_INO 1 /* Bad blocks inode */
#define EXT2_ROOT_INO 2 /* Root inode */
#define EXT4_USR_QUOTA_INO 3 /* User quota inode */
#define EXT4_GRP_QUOTA_INO 4 /* Group quota inode */
#define EXT2_BOOT_LOADER_INO 5 /* Boot loader inode */
#define EXT2_UNDEL_DIR_INO 6 /* Undelete directory inode */
#define EXT2_RESIZE_INO 7 /* Reserved group descriptors inode */
#define EXT2_JOURNAL_INO 8 /* Journal inode */
#define EXT2_EXCLUDE_INO 9 /* The "exclude" inode, for snapshots */
#define EXT4_REPLICA_INO 10 /* Used by non-upstream feature */
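Putting the pieces together, the inode usage of Group 0 right after mkfs can be recomputed from the constants above. This is a sketch under one assumption that the excerpts do not state explicitly: lost+found receives the first non-reserved inode (11, the "First inode" value reported by dumpe2fs).

```python
EXT2_BAD_INO = 1
EXT2_ROOT_INO = 2
FIRST_INODE = 11         # "First inode: 11" in the dumpe2fs output
INODES_PER_GROUP = 1280  # "Inodes per group: 1280"


def used_inodes_after_mkfs():
    """Inode numbers in use in Group 0 right after mkfs.ext4."""
    used = {EXT2_BAD_INO, EXT2_ROOT_INO}
    # reserve_inodes(): EXT2_ROOT_INO + 1 up to, but excluding, the first inode
    used.update(range(EXT2_ROOT_INO + 1, FIRST_INODE))
    # Assumption: lost+found takes the first non-reserved inode.
    used.add(FIRST_INODE)
    return used


if __name__ == "__main__":
    used = used_inodes_after_mkfs()
    print(len(used))                     # 11
    print(INODES_PER_GROUP - len(used))  # 1269
```

The result agrees with "1269 free inodes" and "Free inodes: 12-1280" reported for Group 0.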
Group 1
Group 1 block usage: 10239 - 8193 + 1 = 2047 blocks in total
Content | Size (blocks) |
---|---|
super block | 1 |
group descriptors | 1 |
Reserved GDT blocks | 79 |
free blocks | 1966 |
Journal (jbd2)
To measure how much space the journal occupies, we move the journal to a separate device, using another 10M partition:
# fdisk /dev/sdb
# Start End Size Type Name
1 2048 8390655 4G Linux filesyste
2 8390656 8411135 10M Linux filesyste
3 8411136 8431615 10M Linux filesyste
# Save the partition table, then reboot for it to take effect
# mkfs.ext4 -O journal_dev /dev/sdb3
# mkfs.ext4 -J device=/dev/sdb3 /dev/sdb2
The free space in Group 0 has grown to 7773 blocks, up from 6749 before. The difference of 1024 blocks is exactly the space the journal used to occupy.
[root@centosgpt ~]# dumpe2fs /dev/sdb2
dumpe2fs 1.42.9 (28-Dec-2013)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 6102514b-9353-43ff-b9dc-011480c16120
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2560
Block count: 10240
Reserved block count: 512
Free blocks: 9739
Free inodes: 2549
First block: 1
Block size: 1024
Fragment size: 1024
Group descriptor size: 64
Reserved GDT blocks: 79
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 1280
Inode blocks per group: 160
Flex block group size: 16
Filesystem created: Wed Sep 16 05:42:45 2020
Last mount time: n/a
Last write time: Wed Sep 16 05:42:45 2020
Mount count: 0
Maximum mount count: -1
Last checked: Wed Sep 16 05:42:45 2020
Check interval: 0 (<none>)
Lifetime writes: 164 kB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal UUID: 8bd1e1a5-2a58-465b-81de-bea1b7a0c1cb
Journal device: 0x0813
Default directory hash: half_md4
Directory Hash Seed: 57e0e8c6-5646-4a44-af1a-57a974e1d4d8
Group 0: (Blocks 1-8192)
Checksum 0x1a35, unused inodes 1269
Primary superblock at 1, Group descriptors at 2-2
Reserved GDT blocks at 3-81
Block bitmap at 82 (+81), Inode bitmap at 98 (+97)
Inode table at 114-273 (+113)
7773 free blocks, 1269 free inodes, 2 directories, 1269 unused inodes
Free blocks: 97, 100-113, 435-8192
Free inodes: 12-1280
Group 1: (Blocks 8193-10239) [INODE_UNINIT]
Checksum 0x1a5b, unused inodes 1280
Backup superblock at 8193, Group descriptors at 8194-8194
Reserved GDT blocks at 8195-8273
Block bitmap at 83 (bg #0 + 82), Inode bitmap at 99 (bg #0 + 98)
Inode table at 274-433 (bg #0 + 273)
1966 free blocks, 1280 free inodes, 0 directories, 1280 unused inodes
Free blocks: 8274-10239
Free inodes: 1281-2560
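The "Free blocks:" lines printed by dumpe2fs can be turned back into counts to confirm the 1024-block difference. The parser below is a small sketch for the comma-separated range format seen in these outputs; the function name is made up for illustration.

```python
def count_in_ranges(spec):
    """Count the items in a dumpe2fs range list such as '97, 100-113'."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            total += int(last) - int(first) + 1  # inclusive range
        else:
            total += 1  # a single block number
    return total


if __name__ == "__main__":
    with_internal_journal = count_in_ranges("1444-8192")
    with_external_journal = count_in_ranges("97, 100-113, 435-8192")
    print(with_internal_journal)                          # 6749
    print(with_external_journal)                          # 7773
    print(with_external_journal - with_internal_journal)  # 1024
```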
External journal layout, as described on the ext4 wiki:
1024 bytes of padding | ext4 Superblock | Journal Superblock | descriptor_block (data_blocks or revocation_block) [more data or revocations] commit_block | [ more transactions…] |
---|---|---|---|---|
The actual layout of the journal device can be inspected as follows:
[root@centosgpt ~]# dumpe2fs /dev/sdb3
dumpe2fs 1.42.9 (28-Dec-2013)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 8bd1e1a5-2a58-465b-81de-bea1b7a0c1cb
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: journal_dev
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 0
Block count: 10240
Reserved block count: 0
Free blocks: 0
Free inodes: 0
First block: 1
Block size: 1024
Fragment size: 1024
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 0
Inode blocks per group: 0
Filesystem created: Wed Sep 16 05:42:35 2020
Last mount time: n/a
Last write time: Wed Sep 16 05:42:35 2020
Mount count: 0
Maximum mount count: -1
Last checked: Wed Sep 16 05:42:35 2020
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Default directory hash: half_md4
Directory Hash Seed: 9cdd050f-d565-4c9c-be25-eae6f209783d
Journal block size: 1024
Journal length: 10240
Journal first block: 3
Journal sequence: 0x00000001
Journal start: 0
Journal number of users: 1
Journal users: 6102514b-9353-43ff-b9dc-011480c16120
Flexible Block Groups
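Several block groups are combined into one logical block group: their block bitmaps are packed together, their inode bitmaps are packed together, and their inode tables are packed together as well.
Flex block group size: 16
The inode bitmaps, block bitmaps, and inode tables of these 16 block groups are stored contiguously.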
# debugfs -R stats /dev/sdb2
Group 0: block bitmap at 82, inode bitmap at 98, inode table at 114
204 free blocks, 739 free inodes, 1 used directory, 0 unused inodes
[Checksum 0x3566]
Group 1: block bitmap at 83, inode bitmap at 99, inode table at 274
0 free blocks, 1280 free inodes, 0 used directories, 0 unused inodes
[Checksum 0x23fa]
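Each block bitmap occupies 1 block (83 - 82); the block bitmap area 82-97 spans 16 blocks in total.
Each inode bitmap occupies 1 block (99 - 98); the inode bitmap area 98-113 spans 16 blocks in total.
Each inode table occupies 160 blocks (274 - 114).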
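The positions above follow a simple pattern: within a flex group, the metadata of group g sits at a fixed offset from the metadata of group 0. The sketch below encodes that pattern using the block numbers observed on this particular file system as defaults; they are not general ext4 constants, and the function name is made up for illustration.

```python
def flex_bg_metadata(group,
                     block_bitmap_start=82,
                     inode_bitmap_start=98,
                     inode_table_start=114,
                     inode_table_blocks=160):
    """Locate one group's metadata inside the packed flex_bg area.

    The default start blocks are the values seen in this article's
    dumpe2fs output, not universal constants.
    """
    table_first = inode_table_start + group * inode_table_blocks
    return {
        "block_bitmap": block_bitmap_start + group,
        "inode_bitmap": inode_bitmap_start + group,
        "inode_table": (table_first, table_first + inode_table_blocks - 1),
    }


if __name__ == "__main__":
    for g in range(2):
        print(g, flex_bg_metadata(g))
```

For groups 0 and 1 this reproduces the values reported by dumpe2fs: bitmaps at 82/98 and 83/99, inode tables at 114-273 and 274-433.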
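packed_meta_blocks (mke2fs)
With flex_bg, the metadata can be packed at the beginning of the file system and placed on a faster storage device, which speeds up file-system operations.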
size_of() {
blockdev --getsize $1
}
mkdmsetup() {
_ssd=/dev/$1
_hdd=/dev/$2
_size_of_ssd=$(size_of $_ssd)
echo "0 $_size_of_ssd linear $_ssd 0
$_size_of_ssd $(size_of $_hdd) linear $_hdd 0" | dmsetup create dm-${1}-${2}
}
mkdmsetup sdg1 sdb
mkfs.ext4 -O \
^has_journal,flex_bg,^uninit_bg,^sparse_super,sparse_super2,^extra_isize,^dir_nlink,^resize_inode \
-E packed_meta_blocks=1,lazy_itable_init=0 -G 32768 -I 128 -i \
$((1024*512)) /dev/mapper/dm-sdg1-sdb