Concatenate files by inode - Linux

Is there a method in Linux to concatenate existing files, essentially turning 2 files into 1 file with 2 fragments? I'm imagining updating the first file's inode pointers to include the second file's blocks and then removing the second file's inode.

This is not "physically" possible on most filesystems, and there is no Linux system call to do it.
Consider the case of appending two files to each other, where each file is 1 GB + 1 byte. Simply splicing the blocks together would leave a 1-byte partial extent in the middle of the file; most filesystems have no way of representing this, as they only allow a partial extent at the end of a file.
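To make the arithmetic concrete, here is a tiny illustrative program (the 4096-byte block size is an assumption; it is a common default but filesystem-dependent):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t block_size = 4096;              /* assumed block size */
    const uint64_t file1_size = (1ULL << 30) + 1;  /* 1 GB + 1 byte */

    /* The first file ends this many bytes into its final block. */
    uint64_t tail = file1_size % block_size;       /* = 1 */

    /* To reuse the second file's blocks unchanged, the concatenated data
     * would have to continue at byte offset `tail` inside a block, i.e. a
     * partial block in the middle of the file - which block- and
     * extent-mapped filesystems cannot describe. */
    printf("file 1 occupies %llu full blocks plus a %llu-byte tail\n",
           (unsigned long long)(file1_size / block_size),
           (unsigned long long)tail);
    return 0;
}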

Related

Squashfs check compressed file size

Is there any way to check the final size of a specific file after compression in a squashfs filesystem?
I'm looking through the mksquashfs/unsquashfs command-line options but I can't find anything.
Using the -info option in mksquashfs only prints the size before compression.
Thanks
This isn't feasible to do with much granularity, because compression is done at block level, not file level.
A file may be recorded as starting 50 KB into the buffer produced by decompressing block 50 and ending 50 bytes into decompressed block 52 (ignoring fragments here, which are a separate concern) -- but that doesn't let you map back to the position inside the compressed copy of block 50 where that file starts. (You can easily determine the compression ratio for block 51, but you can't easily figure out the ratios for the parts of the file contained in blocks 50 and 52 in our example, because those blocks are shared with other contents.)
So the information isn't exposed because it isn't readily available. This design actually makes storage of numerous (similar) small files significantly more efficient, because a single compression context is used for all of them (and decompressing a block to retrieve one file may mean the files next to it are already decompressed in memory)... but without potentially-unfounded assumptions (such as assuming that all contents within a block share that block's average ratio) it doesn't help with working out how well each individual item compressed, because the items aren't compressed individually in the first place.
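If you were willing to accept that uniform-ratio assumption anyway, the estimate would look something like the sketch below. All block geometry values here are hypothetical, not read from a real squashfs image:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical per-block geometry: how large each block is uncompressed
 * and how many bytes its compressed form occupies on disk. */
struct block_info {
    uint32_t uncompressed;
    uint32_t compressed;
};

/* Estimate a file's on-disk footprint, assuming every byte in a block
 * compresses at that block's average ratio. The file starts start_off
 * bytes into blocks[first] and ends end_off bytes into blocks[last]. */
static double estimate_compressed(const struct block_info *blocks,
                                  int first, int last,
                                  uint32_t start_off, uint32_t end_off)
{
    double total = 0.0;
    for (int i = first; i <= last; i++) {
        double ratio = (double)blocks[i].compressed / blocks[i].uncompressed;
        uint32_t bytes = blocks[i].uncompressed;
        if (i == first)
            bytes -= start_off;
        if (i == last)
            bytes -= blocks[i].uncompressed - end_off;
        total += bytes * ratio;
    }
    return total;
}

int main(void)
{
    /* Made-up numbers for a file spanning "blocks 50..52" as in the answer:
     * it starts 50 KB into the first block and ends 50 bytes into the last. */
    struct block_info blocks[] = {
        { 131072, 60000 },   /* block 50 */
        { 131072, 58000 },   /* block 51, fully owned by the file */
        { 131072, 61000 },   /* block 52 */
    };
    printf("estimated compressed size: %.0f bytes\n",
           estimate_compressed(blocks, 0, 2, 50 * 1024, 50));
    return 0;
}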

If a torrent contains multiple files, how do you know which piece corresponds to each file?

I'm building a BitTorrent client application in Java and I have 2 small questions:
Can a torrent contain folders? Recursively?
If a torrent contains n files (not directories, for simplicity), do I need to create n files with their corresponding sizes? When I receive a piece from a peer, how do I know to which file the piece belongs?
For example, here is a torrent which contains 2 files:
TorrentInfo{Created By: ruTorrent (PHP Class - Adrien Gibrat)
Main tracker: http://tracker.hebits.net:35777/tracker.php?do=announce&passkey=5d3ab309eda55c1e7183975099958ab2
Comment: null
Info_hash: c504216ca4a113d26f023a10a1249ca3a6217997
Name: Veronica.2017.1080p.BluRay.DTS-HD.MA.5.1.x264-HDH
Piece Length: 16777216
Pieces: 787
Total Size: null
Is Single File Torrent: false
File List:
TorrentFile{fileLength=13202048630, fileDirs=[Veronica.2017.1080p.BluRay.DTS-HD.MA.5.1.x264-HDH.mkv]}
TorrentFile{fileLength=62543, fileDirs=[Veronica.2017.1080p.BluRay.DTS-HD.MA.5.1.x264-HDH.srt]}
The docs don't say much: https://wiki.theory.org/index.php/BitTorrentSpecification
What you are doing is similar to what I did...
The parts of the specification quoted below are the important ones for your questions.
1. Yes; no.
Info in Multiple File Mode
name: the name of the directory in which to store all the files. This is purely advisory. (string)
path: a list containing one or more string elements that together represent the path and filename. Each element in the list corresponds to either a directory name or (in the case of the final element) the filename. For example, the file "dir1/dir2/file.ext" would consist of three string elements: "dir1", "dir2", and "file.ext". This is encoded as a bencoded list of strings such as l4:dir14:dir28:file.exte
Info in Single File Mode
name: the filename. This is purely advisory. (string)
The filename can include the folder name.
2. Maybe;
Whether you need to create n files with their corresponding sizes depends on whether you need to download all n files.
Peer wire protocol (TCP)
piece: <len=0009+X><id=7><index><begin><block>
The piece message is variable length, where X is the length of the block. The payload contains the following information:
index: integer specifying the zero-based piece index
begin: integer specifying the zero-based byte offset within the piece
block: block of data, which is a subset of the piece specified by index.
For the purposes of piece boundaries in the multi-file case, consider the file data as one long continuous stream, composed of the concatenation of each file in the order listed in the files list. The number of pieces and their boundaries are then determined in the same manner as the case of a single file. Pieces may overlap file boundaries.
Apologies for my English; I am not a native speaker...
Can a torrent contain folders? Recursively?
Yes.
Sort of. In BEP3, nested directories are mapped into path elements, i.e. /dir1/dir2/dir3/file.ext is represented as path: ["dir1", "dir2", "dir3", "file.ext"] in the file list. BEP52 changes this to a tree-based structure more closely resembling a directory tree.
If a torrent contains n files (not directories, for simplicity), do I need to create n files with their corresponding sizes? When I receive a piece from a peer, how do I know to which file the piece belongs?
The BitTorrent wire protocol deals with a contiguous address space of bytes which is grouped into fixed-size pieces. How a client stores those bytes locally is in principle up to the implementation, but if you want to store them in the file layout described in the .torrent, then you have to calculate a mapping between the piece address space and file offsets. In BEP3, files are not aligned to piece boundaries, so a single piece can straddle multiple files. BEP 47 and BEP 52 aim to simplify this by introducing padding files or implicit alignment gaps, respectively.
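A minimal sketch of that mapping, using the piece length and file lengths from the example torrent above (the code is illustrative, not taken from any particular client). Running it for the last piece shows it straddling the end of the .mkv and the entire .srt:

#include <stdio.h>
#include <stdint.h>

/* Files in the order they appear in the .torrent "files" list. */
struct torrent_file {
    const char *path;
    uint64_t    length;
};

/* Pieces index one long stream made of all files concatenated in list
 * order, so walk the files and print every overlap with the piece. */
static void map_piece(const struct torrent_file *files, int nfiles,
                      uint64_t piece_length, uint32_t piece_index,
                      uint64_t total_size)
{
    uint64_t piece_start = (uint64_t)piece_index * piece_length;
    uint64_t piece_end   = piece_start + piece_length;
    if (piece_end > total_size)
        piece_end = total_size;              /* the last piece is usually shorter */

    uint64_t file_start = 0;
    for (int i = 0; i < nfiles; i++) {
        uint64_t file_end = file_start + files[i].length;
        if (file_end > piece_start && file_start < piece_end) {
            uint64_t lo = piece_start > file_start ? piece_start : file_start;
            uint64_t hi = piece_end   < file_end   ? piece_end   : file_end;
            printf("piece %u -> %s: file offset %llu, %llu bytes\n",
                   (unsigned)piece_index, files[i].path,
                   (unsigned long long)(lo - file_start),
                   (unsigned long long)(hi - lo));
        }
        file_start = file_end;
    }
}

int main(void)
{
    struct torrent_file files[] = {
        { "Veronica.2017.1080p.BluRay.DTS-HD.MA.5.1.x264-HDH.mkv", 13202048630ULL },
        { "Veronica.2017.1080p.BluRay.DTS-HD.MA.5.1.x264-HDH.srt", 62543ULL },
    };
    uint64_t total = files[0].length + files[1].length;

    map_piece(files, 2, 16777216, 786, total);   /* last of the 787 pieces (0-based) */
    return 0;
}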

Zipping a folder into equal size parts

I've been using 7Zip for a few years now and always liked that I could zip a folder into several parts of a specific size. For example, the website BOX only allows uploads under 100MB so anything I wanted to put into BOX, I just split the zip file into 95MB files. However, recently I've needed to do something similar except instead of breaking into a certain size, I need to split them up into a specific number of files but all equaling the same size. Right now, 7zip breaks them into the max size you allow and the last file is any remaining data ranging from 1KB up to the limit specified.
For example, say I have an 826 MB file; I want it zipped up into 5 files that are all the same size. Is there any program out there that will do this?
Thanks in advance!
I don't know of any program that does this, but if this is something that you're doing regularly, you could write a script or small program (a sketch follows the list) that:
Finds out the size of the file
Calculates the maximum piece size to use if you want to split it into n pieces.
Constructs a corresponding 7zip command
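A rough sketch of such a program, using 7-Zip's -v volume-size switch; the output archive name and the use of the input size as an upper bound on the archive size are my own assumptions:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Print a 7z command that splits <file> into at most <parts> volumes of
 * roughly equal size. The volume size is derived from the uncompressed
 * input size, which is an upper bound on the archive size, so compression
 * may produce fewer volumes than requested. */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <parts>\n", argv[0]);
        return 1;
    }

    struct stat st;
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }

    long long parts = atoll(argv[2]);
    if (parts <= 0)
        return 1;

    long long vol = ((long long)st.st_size + parts - 1) / parts;  /* ceil(size / parts) */

    char cmd[4096];
    snprintf(cmd, sizeof cmd, "7z a -v%lldb \"%s.7z\" \"%s\"",
             vol, argv[1], argv[1]);
    printf("%s\n", cmd);   /* print the command; hand it to system() to run it */
    return 0;
}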

data pointers in inode data structure

I have gone through the inode code in the Linux kernel but I am unable to figure out where the data pointers are in the inode. I know that there are 15 pointers [0-14], of which 12 are direct, 1 single indirect, 1 double indirect and 1 triple indirect.
Can someone please locate these data members? Please also explain how you located them, as I have searched Google many times with different keywords, all in vain.
It is up to each specific filesystem how it accesses its data, so there are no "data pointers" in general (some filesystems may be virtual, meaning they generate their data on the fly or retrieve it from the network).
If you're interested in ext4, you can look up the ext4-specific inode structure (struct ext4_inode) in fs/ext4/ext4.h, where an inode's data is indeed referenced by the numbers of 12 direct blocks, 1 single-indirect, 1 double-indirect and 1 triple-indirect block.
This means that blocks [0..11] of an inode's data have block numbers e4inode->i_block[0], ..., i_block[11], whereas e4inode->i_block[12] is the number of a block that is itself filled with data block numbers (so it covers the inode's data blocks in the range [12, 12 + fs->block_size / sizeof(__le32))). The same trick is applied to i_block[13], only it holds doubly-indirect indices (blocks filled with the numbers of blocks that hold lists of the blocks holding the actual data), starting from index 12 + fs->block_size / sizeof(__le32), and i_block[14] holds triply-indirect indices.
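The arithmetic behind those ranges, as a standalone sketch (assuming a 4096-byte block size, so 1024 block numbers fit in one indirect block; this only classifies a logical block number, it does not read anything from disk):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t block_size = 4096;                            /* assumed */
    const uint64_t per_block  = block_size / sizeof(uint32_t);   /* __le32 entries per block */
    const uint64_t logical    = 100000;                          /* logical block within the file */

    if (logical < 12) {
        printf("direct: i_block[%llu]\n", (unsigned long long)logical);
    } else if (logical < 12 + per_block) {
        printf("single indirect via i_block[12], slot %llu\n",
               (unsigned long long)(logical - 12));
    } else if (logical < 12 + per_block + per_block * per_block) {
        uint64_t off = logical - 12 - per_block;
        printf("double indirect via i_block[13], slots %llu/%llu\n",
               (unsigned long long)(off / per_block),
               (unsigned long long)(off % per_block));
    } else {
        uint64_t off = logical - 12 - per_block - per_block * per_block;
        printf("triple indirect via i_block[14], slots %llu/%llu/%llu\n",
               (unsigned long long)(off / (per_block * per_block)),
               (unsigned long long)(off / per_block % per_block),
               (unsigned long long)(off % per_block));
    }
    return 0;
}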
As explained here:
http://computer-forensics.sans.org/blog/2010/12/20/digital-forensics-understanding-ext4-part-1-extents
Ext4 uses extents instead of block pointers to track the file content.
If you are interested in the ext2/ext3 data structures, where block pointers are used:
http://www.slashroot.in/how-does-file-deletion-work-linux
has many good diagrams illustrating them. And this:
http://mcgrewsecurity.com/training/extx.pdf
has, on page 16, examples of the details of "block pointers" (which are basically block numbers, or offsets relative to the start of the disk image; one block is usually 512 bytes).
If you want to walk the filesystem physically, say for an ext3-formatted hard drive, see this:
http://wiki.sleuthkit.org/index.php?title=FS_Analysis
But you can always just use the dd command to do everything; you only need to know where to start and stop reading. For many reasons, the input for dd is usually a replica of the hard disk image itself.
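In C, such a positioned read from an image is a one-liner with pread; a sketch (the image name, block size and block number are example values, and the equivalent dd command would be dd if=disk.img bs=4096 skip=1234 count=1):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const off_t block_size = 4096;     /* example: use the filesystem's block size */
    const off_t block_no   = 1234;     /* example block number to read */

    int fd = open("disk.img", O_RDONLY);   /* a copy of the disk image */
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(block_size);
    ssize_t n = pread(fd, buf, block_size, block_no * block_size);
    if (n < 0) { perror("pread"); return 1; }

    printf("read %zd bytes from block %lld\n", n, (long long)block_no);
    free(buf);
    close(fd);
    return 0;
}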

Atomic file save on Linux without losing metadata

I'm working on a Perl-based file synchronization tool. It downloads files into a temporary directory (which is guaranteed to be on the same filesystem as the real file) and then moves the temporary files into place over the old ones, preserving metadata like permissions, ownership, and ACLs. I'm wondering how to achieve that last step on Linux.
On Mac OS X, at least in C, I would use the exchangedata function. This takes two filenames as arguments and swaps their contents, leaving all metadata (besides mtime) intact. It guarantees that the operation is atomic—all readers will see either the old file or the new one, never something in between. Unfortunately, I don't think it's available on Linux.
I know that rename moves atomically, but it doesn't preserve metadata. On the other hand, I could open the file and overwrite the data with the contents of the new one, which would preserve all metadata but would not be an atomic operation. Any suggestions on tackling this problem?
The only approach I see here is to read the metadata from the file you are replacing, apply that to the temporary file, and then rename the temporary file over the old file. (rename preserves the source file attributes, obviously.)
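A minimal sketch of that approach in C, covering only ownership, permission bits and the final rename; ACLs and extended attributes, which the question also cares about, would need additional libacl/libattr calls that are not shown here:

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy basic metadata from oldpath onto tmppath, then atomically move
 * tmppath over oldpath. Both must live on the same filesystem. */
static int replace_preserving_metadata(const char *tmppath, const char *oldpath)
{
    struct stat st;
    if (stat(oldpath, &st) != 0)
        return -1;

    /* Ownership first (may require privilege), then the permission bits. */
    if (chown(tmppath, st.st_uid, st.st_gid) != 0)
        return -1;
    if (chmod(tmppath, st.st_mode & 07777) != 0)
        return -1;

    /* The rename itself is atomic: readers see either the old file or the new one. */
    return rename(tmppath, oldpath);
}

int main(void)
{
    if (replace_preserving_metadata("file.tmp", "file") != 0) {
        perror("replace");
        return 1;
    }
    return 0;
}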
Filesystem-specific, but...
The XFS_IOC_SWAPEXT ioctl swaps the extents of two file descriptors on XFS.
#include <xfs/xfs.h>
#include <xfs/xfs_dfrag.h>

/* fd1 is the target file, fd2 the temporary file whose extents are swapped
 * in; the elided fields (...) must be filled in as well - see xfs_fsr. */
xfs_swapext_t sx = {
    ...,
    .sx_fdtarget = fd1,
    .sx_fdtmp = fd2,
    ...
};
xfs_swapext(fd1, &sx);
See the sources to xfs_fsr for example usage.
