My Ideas ,Linux related Links, etc..: August 2015

Well, filesystems may sound like a very simple concept, especially in Unix environments, but there are some interesting limitations and tricks that a Unix admin must keep in mind when working with them.
I'm taking some notes about filesystems and very simple definitions specifically for common file systems in Linux. I'll also include my references in here. Some are interesting.

A bit of reading about Inodes and Unix filesystem features

Unix Inodes

Many Unix filesystems (Berkeley Fast Filesystem, Linux ext2fs, Sun ufs, ...) take a combined approach: direct block pointers for small files plus several levels of index blocks for larger ones.

each file is indexed by an inode

inodes are special disk blocks set aside just for this purpose (see df -i to see how many of these exist on your favorite Unix filesystem)

they are created when the filesystem is created

the number of inodes limits the total number of files/directories that can be stored in the filesystem

the inode itself consists of

administrative information (permissions, timestamps, etc.)

a number of direct pointers (typically 12) that point to the first 12 data blocks of the file

a single indirect pointer that points to a disk block which in turn is used as an index block, if the file is too big to be indexed entirely by the direct blocks

a double indirect pointer that points to a disk block which is a collection of pointers to disk blocks which are index blocks, used if the file is too big to be indexed by the direct and single indirect blocks

a triple indirect pointer that points to an index block of index blocks of index blocks...

interesting reading on your favorite FreeBSD system: /sys/ufs/ufs/dinode.h

small files need only the direct blocks, so there is little waste in space or extra disk reads in those cases

medium sized files may use indirect blocks

only large files make use of (and incur the overhead of) the double or triple indirect blocks, and that is reasonable since those files are large anyway

since the disk is now broken into two different types of blocks - inodes and data blocks, there must be some way to determine where the inodes are, and to keep track of free inodes and disk blocks. This is done by a superblock, located at a fixed position in the filesystem. The superblock is usually replicated on the disk to avoid catastrophic failure in case of corruption of the main superblock
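To make the pointer scheme above a bit more concrete, here is a minimal sketch in Python. It assumes 12 direct pointers, 4 KiB blocks and 4-byte block pointers (so 1024 pointers per index block). Those numbers and the function names are my own illustrative assumptions, not the on-disk format of any particular filesystem.

```python
# Sketch of classic inode block addressing.
# All parameters below are assumptions chosen for illustration.
BLOCK_SIZE = 4096                            # bytes per block
POINTER_SIZE = 4                             # bytes per block pointer
N_DIRECT = 12                                # direct pointers in the inode
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # pointers held by one index block


def max_file_blocks():
    """Number of data blocks reachable through all pointer levels."""
    single = PTRS_PER_BLOCK
    double = PTRS_PER_BLOCK ** 2
    triple = PTRS_PER_BLOCK ** 3
    return N_DIRECT + single + double + triple


def pointer_level(block_index):
    """Return which pointer level maps the given file block index."""
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    if block_index < N_DIRECT:
        return "direct"
    block_index -= N_DIRECT
    if block_index < PTRS_PER_BLOCK:
        return "single indirect"
    block_index -= PTRS_PER_BLOCK
    if block_index < PTRS_PER_BLOCK ** 2:
        return "double indirect"
    block_index -= PTRS_PER_BLOCK ** 2
    if block_index < PTRS_PER_BLOCK ** 3:
        return "triple indirect"
    raise ValueError("file block index beyond what the inode can address")


if __name__ == "__main__":
    for index in (5, 12, 12 + 1024):
        print(index, pointer_level(index))
    print("max file size in bytes:", max_file_blocks() * BLOCK_SIZE)
```

With these assumed parameters the maximum file size comes out at roughly 4 TiB, and almost all of it is reachable only through the triple indirect pointer.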

Disk Allocation Considerations

limitations on file size, total partition size

internal, external fragmentation

overhead to store and access index blocks

layout of files, inodes, directories, etc, as they affect performance - disk head movement, rotational latency - many unix filesystems keep clusters of inodes at a variety of locations throughout the file system, to allow inodes and the disk blocks they reference to be close together

may want to reorganize (defragment) files occasionally to improve layout

Free Space Management
With any of these methods of allocation, we need some way to keep track of free disk blocks.
Two main options:

bit vector - keep a vector, one bit per disk block

0 means the corresponding block is free, 1 means it is in use

search for a free block requires search for the first 0 bit, can be efficient given hardware support

for a large disk the vector may be too big to keep entirely in main memory, so it must live on disk, which makes traversal slow

with block size 2¹² or 4KB, disk size 2³³ or 8 GB, there are 2²¹ blocks, so we need 2²¹ bits (256 KB) for bit vector

easy to allocate contiguous space for files

free list - keep a linked list of free blocks

with linked allocation, can just use existing links to form a free list

with FAT, use FAT entries for unallocated blocks to store free list

no wasted space

can be difficult to allocate contiguous blocks

allocate from head of list, deallocated blocks added to tail, both O(1) operations
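Both options can be sketched in a few lines of Python. This is only an in-memory toy under my own assumptions: the class and function names are made up, and a `deque` stands in for the on-disk linked list of free blocks.

```python
from collections import deque


class BitVector:
    """Free-space bit vector: bit 0 = free, bit 1 = in use."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)

    def _is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Find the first 0 bit, mark it in use, return its block number."""
        for block in range(self.n_blocks):
            if not self._is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("no free blocks")

    def free(self, block):
        """Clear the bit so the block can be handed out again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


class FreeList:
    """Free list: allocate from the head, return freed blocks to the tail."""

    def __init__(self, n_blocks):
        self.blocks = deque(range(n_blocks))

    def allocate(self):
        if not self.blocks:
            raise RuntimeError("no free blocks")
        return self.blocks.popleft()   # O(1)

    def free(self, block):
        self.blocks.append(block)      # O(1)


def bitmap_bytes(disk_bytes, block_bytes):
    """Size of the bit vector in bytes: one bit per disk block."""
    n_blocks = disk_bytes // block_bytes
    return (n_blocks + 7) // 8


if __name__ == "__main__":
    # 8 GB disk with 4 KB blocks: 2^21 blocks, one bit each
    print(bitmap_bytes(2 ** 33, 2 ** 12) // 1024, "KiB")
```

Note the difference in behaviour: the bit vector always hands out the lowest-numbered free block, which helps keep allocations contiguous, while the free list hands out blocks in whatever order they were freed.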

Performance Optimization
Caching is an important optimization for disk accesses.
A disk cache may be located:

main memory

disk controller

internal to disk drive

Safety and Recovery

When a disk cache is used, there could be data in memory that has been "written" by programs but which has not yet been physically written to the disk. This can cause problems in the event of a system crash or power failure.
If the system detects this situation, typically on bootup after such a failure, a consistency checker is run. In Unix, this is usually the fsck program, and in Windows, scandisk or some variant. This checks for and repairs, if possible, inconsistencies in the filesystem.
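A program that cannot afford to lose a write can ask the operating system to push its cached data out to the device. Here is a minimal Python sketch. The helper name `durable_write` and the demo file name are my own. Also keep in mind that a cache inside the drive itself may still hold the data for a moment, so this narrows the window rather than closing it completely.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes and ask the OS to push them to the storage device."""
    with open(path, "wb") as f:
        f.write(data)         # lands in Python's own buffer first
        f.flush()             # hand the data to the kernel's cache
        os.fsync(f.fileno())  # ask the kernel to write its cached data out


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.gettempdir(), "fsync_demo.txt")
    durable_write(demo_path, b"important data\n")
    with open(demo_path, "rb") as f:
        print(f.read())
    os.remove(demo_path)
```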

Journaling Filesystems
One way to avoid data loss when a filesystem is left in an inconsistent state is to move to a log-structured or journaling filesystem.

record updates to the filesystem as transactions

transactions are written immediately to a log, though the actual filesystem may not yet be updated

transactions in the log are asynchronously applied to the actual filesystem, at which time the transaction is removed from the log

if the system crashes, any pending transactions can be applied to the filesystem - main benefits are less chance of significant inconsistencies, and that those inconsistencies can be corrected from the unfinished transactions, avoiding the long consistency check
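The steps above can be sketched as a toy in Python. Everything here is a simplification of my own: a dict stands in for the disk, a list for the log, and the class `ToyJournalFS` is hypothetical and has nothing to do with how ext3 or ReiserFS are really implemented.

```python
class ToyJournalFS:
    """In-memory toy: a dict stands in for the disk, a list for the log."""

    def __init__(self):
        self.disk = {}      # the "actual filesystem" state
        self.journal = []   # pending transactions, oldest first

    def write(self, name, data):
        """Record the update in the log only; the disk is not touched yet."""
        self.journal.append(("write", name, data))

    def delete(self, name):
        """Record a deletion in the log."""
        self.journal.append(("delete", name, None))

    def apply_one(self):
        """Apply the oldest logged transaction, then drop it from the log."""
        if not self.journal:
            return False
        op, name, data = self.journal[0]
        if op == "write":
            self.disk[name] = data
        elif op == "delete":
            self.disk.pop(name, None)
        self.journal.pop(0)   # removed only after it has been applied
        return True

    def recover(self):
        """After a crash, replay everything still pending in the log."""
        while self.apply_one():
            pass


if __name__ == "__main__":
    fs = ToyJournalFS()
    fs.write("a.txt", "hello")
    fs.write("b.txt", "world")
    fs.apply_one()            # only a.txt reached the disk before the "crash"
    fs.recover()              # replay what is left in the log
    print(fs.disk)
```

Each logged operation here gives the same result if it is applied twice, so replaying a transaction that was already half applied before the crash does no harm. That is the property that lets recovery skip the full consistency check.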

Examples:

ReiserFS, a linux journaling filesystem - I recommend reading this page

ext3fs, also for linux

jfs, IBM journaling filesystem, available for AIX, Linux

Related idea in FreeBSD's filesystem: Soft Updates

Journaling extensions to Macintosh HFS disks, called Elvis, supposedly coming in OS X 10.2.2

NTFS does some journaling, but some claim it is not "fully journaled"

the term "journaling" may also refer to systems that maintain the transaction log for a longer time, giving the ability to "undo" changes and retrieve a previous state of a filesystem

From: Ext4 filesystem layout

Overview

An ext4 file system is split into a series of block groups. To reduce performance difficulties due to fragmentation, the block allocator tries very hard to keep each file's blocks within the same group, thereby reducing seek times. The number of blocks in a block group can be calculated as 8 * block_size_in_bytes, because the group's one-block bitmap holds one bit per block. With the default block size of 4KiB, each group will contain 32,768 blocks, for a length of 128MiB. (It's good to group things.)
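The arithmetic is easy to check for other block sizes. A small Python sketch (the function names are mine), using the fact that a one-block bitmap has 8 * block_size bits and therefore tracks that many blocks:

```python
def blocks_per_group(block_size):
    """A one-block bitmap has 8 * block_size bits, one bit per block."""
    return 8 * block_size


def group_size_bytes(block_size):
    """Length of one block group in bytes."""
    return blocks_per_group(block_size) * block_size


for block_size in (1024, 2048, 4096):
    mib = group_size_bytes(block_size) // (1024 * 1024)
    print(block_size, blocks_per_group(block_size), f"{mib} MiB")
```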

Blocks

ext4 allocates storage space in units of "blocks". A block is a group of sectors between 1KiB and 64KiB, and the number of sectors must be an integral power of 2. Blocks are in turn grouped into larger units called block groups. Block size is specified at mkfs time and typically is 4KiB. You may experience mounting problems if block size is greater than page size (i.e. 64KiB blocks on an i386 which only has 4KiB memory pages). By default a filesystem can contain 2^32 blocks; if the '64bit' feature is enabled, then a filesystem can have 2^64 blocks.

                   32-bit mode                      64-bit mode
Item               1KiB    2KiB    4KiB    64KiB    1KiB    2KiB    4KiB    64KiB
Blocks             2^32    2^32    2^32    2^32     2^64    2^64    2^64    2^64
Inodes             2^32    2^32    2^32    2^32     2^32    2^32    2^32    2^32
File System Size   4TiB    8TiB    16TiB   256TiB   16ZiB   32ZiB   64ZiB   1YiB

(Nice, so the maximum filesystem size is dictated by the width of the block numbers, 2^32 or 2^64 blocks, multiplied by the block size. You have some room to play with the block size, but it should not be bigger than the memory page size of the machine that mounts the filesystem. But can't I mount a filesystem created on a machine with bigger pages on one with smaller pages? It seems we may have difficulties, so don't be sure until you've tried it.)
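The maximum sizes are simply the number of addressable blocks times the block size, so they can be recomputed in Python. This is a small sketch with helper names of my own. For example, 2^32 blocks of 64KiB come to 2^48 bytes, which is 256TiB.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def human(n_bytes):
    """Format a power-of-two byte count with binary units."""
    unit = 0
    while n_bytes >= 1024 and unit < len(UNITS) - 1:
        n_bytes //= 1024
        unit += 1
    return f"{n_bytes}{UNITS[unit]}"


def max_fs_size(block_size, block_number_bits):
    """Maximum size = number of addressable blocks * block size."""
    return (2 ** block_number_bits) * block_size


for bits in (32, 64):
    for kib in (1, 2, 4, 64):
        print(f"{bits}-bit", f"{kib}KiB", human(max_fs_size(kib * 1024, bits)))
```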

Layout

The layout of a standard block group is approximately as follows (each of these fields is discussed in a separate section below):

Group 0 Padding        1024 bytes
ext4 Super Block       1 block
Group Descriptors      many blocks
Reserved GDT Blocks    many blocks
Data Block Bitmap      1 block
inode Bitmap           1 block
inode Table            many blocks
Data Blocks            many more blocks

For the special case of block group 0, the first 1024 bytes are unused, to allow for the installation of x86 boot sectors and other oddities. The superblock will start at offset 1024 bytes, whichever block that happens to be (usually 0). However, if for some reason the block size = 1024, then block 0 is marked in use and the superblock goes in block 1. For all other block groups, there is no padding.
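As a small illustration, the sketch below works out which block holds the superblock and reads the block size back from a filesystem image. It relies on two field offsets from the ext4 disk layout documentation: s_log_block_size at offset 0x18 and the 0xEF53 magic number (s_magic) at offset 0x38 of the superblock. The function names are my own, and the code works on a plain bytes object, not a real device.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # bytes from the start of the filesystem
EXT_MAGIC = 0xEF53         # s_magic value shared by ext2/ext3/ext4


def superblock_block(block_size):
    """Block number that contains byte offset 1024."""
    return SUPERBLOCK_OFFSET // block_size


def read_block_size(image):
    """Return the block size recorded in an ext2/3/4 image (bytes object)."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)  # s_log_block_size
    (magic,) = struct.unpack_from("<H", sb, 0x38)           # s_magic
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    return 1024 << log_block_size


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        print(size, "-> superblock in block", superblock_block(size))
```

To try `read_block_size` on something real, you could read the first 2048 bytes of an image file made with mkfs.ext4 and pass them in.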

Still to continue.....


Saturday, August 01, 2015
