Well, filesystems may sound like a very simple concept, especially in Unix environments, but there are some interesting limitations and tricks a Unix admin must keep in mind when working with them.
I'm taking some notes about filesystems, with very simple definitions, specifically for common filesystems in Linux. I'll also include my references here; some of them are interesting.
A bit of reading about inodes and Unix filesystem features
Unix Inodes
Many Unix filesystems (Berkeley Fast Filesystem, Linux ext2fs, Sun
ufs, ...) take an approach that combines some of the ideas above.
Disk Allocation Considerations
- limitations on file size and total partition size
- internal and external fragmentation
- overhead to store and access index blocks
- layout of files, inodes, directories, etc., as they affect performance (disk head movement, rotational latency): many Unix filesystems keep clusters of inodes at a variety of locations throughout the filesystem, so that inodes and the disk blocks they reference can be close together
- it may be worth reorganizing files occasionally to improve layout
Free Space Management
With any of these methods of allocation, we need some way to keep
track of free disk blocks.
Two main options:
- bit vector: keep a vector, one bit per disk block
  - 0 means the corresponding block is free, 1 means it is in use
  - finding a free block requires a search for the first 0 bit, which can be efficient given hardware support
  - for large disks the vector may be too big to keep in main memory, so it must be on disk, which makes traversal slow
  - with a block size of 2^12 bytes (4KB) and a disk size of 2^33 bytes (8GB), there are 2^21 blocks, so we need 2^21 bits (256KB) for the bit vector
  - it is easy to allocate contiguous space for files
- free list: keep a linked list of free blocks
  - with linked allocation, the existing links can simply be reused to form a free list
  - with FAT, the FAT entries for unallocated blocks can store the free list
  - no wasted space
  - it can be difficult to allocate contiguous blocks
  - allocate from the head of the list and add deallocated blocks to the tail, both O(1) operations
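To make the two options concrete, here is a minimal in-memory sketch in Python. The class and function names are hypothetical; a real filesystem keeps the bitmap (or the free-list links) on disk and scans a word at a time instead of bit by bit. The `bitmap_bytes` helper also checks the sizing arithmetic: 2^33 / 2^12 = 2^21 blocks, so 2^21 bits, which is 2^18 bytes or 256KB.

```python
from collections import deque

BLOCK_SIZE = 2**12  # 4KB blocks, as in the example above
DISK_SIZE = 2**33   # 8GB disk

def bitmap_bytes(disk_size: int, block_size: int) -> int:
    """Bytes needed for a bit vector with one bit per disk block."""
    nblocks = disk_size // block_size
    return (nblocks + 7) // 8

class BitmapAllocator:
    """Toy bit vector: bit i is 1 if block i is in use, 0 if it is free."""

    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def _is_used(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def _mark(self, i: int, used: bool) -> None:
        if used:
            self.bits[i // 8] |= 1 << (i % 8)
        else:
            self.bits[i // 8] &= ~(1 << (i % 8))

    def allocate(self) -> int:
        """Return the first free block (first 0 bit), scanning bit by bit."""
        for i in range(self.nblocks):
            if not self._is_used(i):
                self._mark(i, True)
                return i
        raise OSError("no free blocks")

    def allocate_contiguous(self, n: int) -> int:
        """Find and claim n adjacent free blocks; return the first block number."""
        run_start, run_len = 0, 0
        for i in range(self.nblocks):
            if self._is_used(i):
                run_start, run_len = i + 1, 0
            else:
                run_len += 1
                if run_len == n:
                    for j in range(run_start, run_start + n):
                        self._mark(j, True)
                    return run_start
        raise OSError(f"no run of {n} free blocks")

    def free(self, i: int) -> None:
        self._mark(i, False)

class FreeList:
    """Toy free list: allocate from the head, append freed blocks at the tail (both O(1))."""

    def __init__(self, nblocks: int):
        self.blocks = deque(range(nblocks))

    def allocate(self) -> int:
        if not self.blocks:
            raise OSError("no free blocks")
        return self.blocks.popleft()

    def free(self, i: int) -> None:
        self.blocks.append(i)

print(bitmap_bytes(DISK_SIZE, BLOCK_SIZE) // 1024, "KB")  # prints: 256 KB
```

The bit vector makes `allocate_contiguous` a simple scan for a run of zeros, while the free list only ever hands out whatever block is at its head, which is why contiguous allocation is harder with it.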
Performance Optimization
Caching is an important optimization for disk accesses. A disk cache may be located:
- in main memory
- in the disk controller
- inside the disk drive itself
Safety and Recovery
When a disk cache is used, there could be data in memory that has been
"written" by programs but has not yet been physically written to the
disk. This can cause problems in the event of a system crash or power
failure.
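A tiny Python model of a write-back cache shows where the risk comes from: `write()` returns as soon as the data is in memory, and only `flush()` copies dirty blocks to the (simulated) disk. The class is hypothetical and ignores eviction, write ordering, and real device I/O.

```python
class WriteBackCache:
    """Toy write-back block cache: writes reach the 'disk' only on flush()."""

    def __init__(self, disk: dict):
        self.disk = disk      # stand-in for the physical device: block -> bytes
        self.cache = {}       # block number -> data held in memory
        self.dirty = set()    # blocks "written" by programs but not yet on disk

    def read(self, block: int) -> bytes:
        if block not in self.cache:
            self.cache[block] = self.disk.get(block, b"")
        return self.cache[block]

    def write(self, block: int, data: bytes) -> None:
        self.cache[block] = data  # the program sees the write as done...
        self.dirty.add(block)     # ...but the disk has not been updated yet

    def flush(self) -> None:
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()

    def crash(self) -> None:
        """Simulate a power failure: everything held only in memory is lost."""
        self.cache.clear()
        self.dirty.clear()
```

Any block written between the last `flush()` and a `crash()` simply never reaches the disk, which is exactly the window that consistency checkers and journaling deal with.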
If the system detects this situation, typically on bootup after such a
failure, a consistency checker is run. In Unix, this is usually the
fsck program, and in Windows, scandisk or some variant. This checks for
and repairs, if possible, inconsistencies in the filesystem.
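A real consistency checker runs several passes. As a hedged illustration, this Python sketch models just one of them: reconciling the free-block bitmap with the blocks the inodes actually point to. The data layout (a dict from inode number to a list of block numbers) is an assumption for illustration, and metadata blocks such as the superblock are ignored.

```python
from collections import Counter

def check_block_bitmap(inodes: dict[int, list[int]], used: set[int]) -> tuple[set[int], dict[str, list[int]]]:
    """Toy fsck-style pass. Returns (repaired set of in-use blocks, report of problems)."""
    refs = Counter(b for blocks in inodes.values() for b in blocks)
    referenced = set(refs)
    report = {
        # referenced by an inode but marked free: could be handed out twice
        "marked_free_but_in_use": sorted(referenced - used),
        # marked in use but owned by no inode: leaked space
        "leaked": sorted(used - referenced),
        # claimed by more than one inode: needs a human or a smarter repair
        "multiply_claimed": sorted(b for b, n in refs.items() if n > 1),
    }
    # The repaired bitmap is simply "in use exactly when something references it".
    return referenced, report
```

For example, with inodes `{2: [10, 11], 3: [11, 12]}` and a bitmap marking `{10, 12, 13}` as used, block 11 is wrongly marked free, block 13 is leaked, and block 11 is claimed twice.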
Journaling Filesystems
One way to avoid data loss when a filesystem is left in an inconsistent
state is to move to a log-structured or journaling filesystem:
- updates to the filesystem are recorded as transactions
- transactions are written immediately to a log, though the actual filesystem may not yet be updated
- transactions in the log are asynchronously applied to the actual filesystem, at which time the transaction is removed from the log
- if the system crashes, any pending transactions can be applied to the filesystem. The main benefits are less chance of significant inconsistencies, and that those inconsistencies can be corrected from the unfinished transactions, avoiding the long consistency check
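The ordering that makes journaling work can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: each transaction is a dict of absolute block contents, so replaying one twice is harmless, and the log is a Python list rather than an on-disk area with commit records. The class name `Journal` is hypothetical.

```python
class Journal:
    """Toy write-ahead journal: log first, apply to the filesystem later."""

    def __init__(self):
        self.log: list[dict[int, str]] = []  # committed but unapplied transactions ("on disk")
        self.fs: dict[int, str] = {}         # the actual filesystem: block number -> contents

    def commit(self, updates: dict[int, str]) -> None:
        """Step 1: write the whole transaction to the log before touching the filesystem."""
        self.log.append(dict(updates))

    def checkpoint(self) -> None:
        """Step 2: apply logged transactions in order, removing each only after it is applied."""
        while self.log:
            self.fs.update(self.log[0])
            self.log.pop(0)

    def recover(self) -> None:
        """Step 3: after a crash, replay the pending log instead of checking the whole disk."""
        self.checkpoint()
```

Because a transaction leaves the log only after it has been applied, a crash at any point leaves either nothing to do or a transaction that can safely be replayed.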
Examples:
- ReiserFS, a Linux journaling filesystem (I recommend reading this page)
- ext3fs, also for Linux
- jfs, IBM's journaling filesystem, available for AIX and Linux
- a related idea in FreeBSD's filesystem: Soft Updates
- journaling extensions to Macintosh HFS disks, called Elvis, supposedly coming in OS X 10.2.2
- NTFS does some journaling, but some claim it is not "fully journaled"
- the term "journaling" may also refer to systems that maintain the transaction log for a longer time, giving the ability to "undo" changes and retrieve a previous state of a filesystem
From: Ext4 filesystem layout
Overview
An ext4 file system is split into a series of block groups. To reduce
performance difficulties due to fragmentation, the block allocator tries
very hard to keep each file's blocks within the same group, thereby
reducing seek times.
The number of blocks in a block group can be calculated as
8 * block_size_in_bytes. With the default block size of 4KiB, each group
will contain 32,768 blocks, for a length of 128MiB.
( It's a good idea to group things. )
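The formula follows from the block bitmap fitting in a single block: a block of `block_size` bytes holds `8 * block_size` bits, one per block in the group. A quick Python check of the arithmetic (the function name is just for illustration, and it assumes the bitmap fills exactly one block, as the formula implies):

```python
def block_group_geometry(block_size: int) -> tuple[int, int]:
    """Return (blocks per group, bytes per group) when one bitmap block covers the group."""
    blocks_per_group = 8 * block_size          # one bit per block, 8 bits per byte
    return blocks_per_group, blocks_per_group * block_size

for bs in (1024, 2048, 4096, 65536):
    n, size = block_group_geometry(bs)
    print(f"{bs // 1024}KiB blocks: {n} blocks/group, {size // 2**20}MiB per group")
```

For 4KiB blocks this gives 32,768 blocks and 128MiB per group, matching the figures above.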
Blocks
ext4 allocates storage space in units of "blocks". A block is a group
of sectors between 1KiB and 64KiB, and the number of sectors must be an
integral power of 2. Blocks are in turn grouped into larger units
called block groups. Block size is specified at mkfs time and typically
is 4KiB.
You may experience mounting problems if the block size is greater
than the page size (e.g. 64KiB blocks on an i386, which only has 4KiB
memory pages).
By default a filesystem can contain 2^32 blocks; if the '64bit'
feature is enabled, then a filesystem can have 2^64 blocks.
| Item | 32-bit mode, 1KiB | 32-bit mode, 2KiB | 32-bit mode, 4KiB | 32-bit mode, 64KiB | 64-bit mode, 1KiB | 64-bit mode, 2KiB | 64-bit mode, 4KiB | 64-bit mode, 64KiB |
|---|---|---|---|---|---|---|---|---|
| Blocks | 2^32 | 2^32 | 2^32 | 2^32 | 2^64 | 2^64 | 2^64 | 2^64 |
| Inodes | 2^32 | 2^32 | 2^32 | 2^32 | 2^32 | 2^32 | 2^32 | 2^32 |
| File System Size | 4TiB | 8TiB | 16TiB | 256TiB | 16ZiB | 32ZiB | 64ZiB | 1YiB |
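Every entry in the File System Size row is just the addressable block count times the block size. This Python sketch recomputes the row; the helper names are hypothetical, and `human` only handles exact power-of-two sizes. Note that the 64KiB case in 32-bit mode works out to 2^32 × 2^16 = 2^48 bytes, i.e. 256TiB.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]

def human(nbytes: int) -> str:
    """Format an exact power-of-two byte count with binary prefixes."""
    i = 0
    while nbytes >= 1024 and nbytes % 1024 == 0 and i < len(UNITS) - 1:
        nbytes //= 1024
        i += 1
    return f"{nbytes}{UNITS[i]}"

def max_fs_size(block_size: int, sixty_four_bit: bool) -> int:
    """Maximum filesystem size = number of addressable blocks * block size."""
    block_count = 2**64 if sixty_four_bit else 2**32
    return block_count * block_size

for mode in (False, True):
    sizes = [human(max_fs_size(bs, mode)) for bs in (1024, 2048, 4096, 65536)]
    print("64-bit" if mode else "32-bit", sizes)
# prints:
# 32-bit ['4TiB', '8TiB', '16TiB', '256TiB']
# 64-bit ['16ZiB', '32ZiB', '64ZiB', '1YiB']
```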
( Nice, so the maximum filesystem size is simply the number of addressable blocks, 2^32 or 2^64 depending on whether the '64bit' feature is enabled, multiplied by the block size. You have some room to play with the block size, but you should keep blocks no bigger than your memory page size. So can't I mount a filesystem created on a machine with bigger pages on one with smaller pages? It seems we may have difficulties, so don't count on it unless you've tried it. )
Layout
The layout of a standard block group is approximately as follows
(each of these fields is discussed in a separate section below):
| Group 0 Padding | ext4 Super Block | Group Descriptors | Reserved GDT Blocks | Data Block Bitmap | inode Bitmap | inode Table | Data Blocks |
|---|---|---|---|---|---|---|---|
| 1024 bytes | 1 block | many blocks | many blocks | 1 block | 1 block | many blocks | many more blocks |
For the special case of block group 0, the first 1024 bytes are unused,
to allow for the installation of x86 boot sectors and other oddities.
The superblock will start at offset 1024 bytes, whichever block that
happens to be (usually 0). However, if for some reason the block size
= 1024, then block 0 is marked in use and the superblock goes in block 1.
For all other block groups, there is no padding.
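The superblock rule above reduces to a single `divmod`. As a further hedged sketch, the second function reads the block size back out of a raw image. It relies on two on-disk superblock fields that I'm taking from the ext4 disk-layout documentation rather than from this page: the magic number `0xEF53` (little-endian) at offset 0x38 and `s_log_block_size` at offset 0x18, with block size = 1024 << s_log_block_size.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # group-0 padding: the superblock always starts 1024 bytes in
EXT4_MAGIC = 0xEF53       # s_magic, little-endian at offset 0x38 within the superblock

def superblock_location(block_size: int) -> tuple[int, int]:
    """Return (block number, offset within that block) of the primary superblock."""
    return divmod(SUPERBLOCK_OFFSET, block_size)

def block_size_from_image(raw: bytes) -> int:
    """Parse the block size from at least the first 2048 bytes of an ext2/3/4 device or image."""
    sb = raw[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError("no ext2/3/4 superblock at offset 1024")
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)
    return 1024 << log_block_size
```

With 1KiB blocks `superblock_location` returns `(1, 0)` (block 1, as described above); with 4KiB blocks it returns `(0, 1024)`. To try it on a real image, something like `block_size_from_image(open("disk.img", "rb").read(2048))` should work, where `disk.img` is a hypothetical file name.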
To be continued...