Linux: writes are split into 512K chunks
The blame is indeed on the block layer; the SCSI layer itself has little regard for the size. You should check, though, that the underlying layers can actually pass your request through, especially with regard to direct I/O, since such a request may be split into many small pages and require a scatter-gather list longer than what the hardware, or even just the driver, can support (libata is/was somewhat limited here).
You should look at and tune the files under /sys/class/block/$DEV/queue. There are assorted files there; the one most likely to match what you need is max_sectors_kb, but you can simply experiment and see what works for you. You may need to tune the partitions' variables as well.
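As a quick sketch (device names and which sysfs files are present vary by system, and the 1024 value below is just an example), you can inspect and raise the limit like this:

```shell
#!/bin/sh
# Inspect the request-size limits the block layer exposes for each disk.
# max_hw_sectors_kb is the hardware ceiling; max_sectors_kb is the
# tunable soft limit and can be raised up to that ceiling (needs root).
for q in /sys/class/block/*/queue; do
    [ -r "$q/max_sectors_kb" ] || continue
    dev=$(basename "$(dirname "$q")")
    hw=$(cat "$q/max_hw_sectors_kb")
    cur=$(cat "$q/max_sectors_kb")
    echo "$dev: max_sectors_kb=$cur (hardware limit: $hw)"
    # To raise the soft limit (as root), e.g.:
    # echo 1024 > "$q/max_sectors_kb"
done
```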
How to split a struct into chunks in order to save the bytes to another data structure in C/C++?
You haven't allocated memory for disk.
int main()
{
disk = malloc(sizeof(*disk)*NUM_SECTORS); // Allocate memory for disk
Update_Bitmaps();
free(disk); // Free the allocated memory.
return 0;
}
Also, the following lines are not correct.
memcpy(&bytes, b, 512);
//Disk_Write(2, (char *) bytes); /* offset for data bitmap is 2 */
memcpy(&bytes, (char *) &dataMap + 512, 512);
They need to be
memcpy(bytes, b, 512);
// ^^ just bytes, not &bytes.
//Disk_Write(2, (char *) bytes); /* offset for data bitmap is 2 */
memcpy(bytes, (char *) &dataMap + 512, 512);
// ^^ just bytes, not &bytes.
splitting files in unix
I assume you're using split -b, which will be more CPU-efficient than splitting by lines, but it still reads the whole input file and writes it out to each file. If the serial nature of this portion of split's execution is your bottleneck, you can use dd to extract the chunks of the file in parallel. You will need a distinct dd command for each parallel process. Here's one example command line (assuming the_input_file is a large file; this extracts a bit from the middle):
dd skip=400 count=1 if=the_input_file bs=512 of=_output
To make this work you will need to choose appropriate values of count and bs (those above are very small). Each worker will also need to choose a different value of skip so that the chunks don't overlap. But this is efficient; dd implements skip with a seek operation.
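The per-worker arithmetic can be sketched as follows. The file name, block size, chunk size, and worker count are made-up demo values, and a small input file is generated on the fly so the sketch is self-contained; with a real large file, pick bs and count so that workers * bs * count covers the whole file:

```shell
#!/bin/sh
# A sketch of parallel chunk extraction with dd.
input=demo_input.bin
dd if=/dev/zero of="$input" bs=65536 count=64 2>/dev/null  # 4 MiB demo file

workers=4
bs=65536        # block size per dd read/write
count=16        # blocks per chunk => each chunk is 1 MiB

i=0
while [ "$i" -lt "$workers" ]; do
    # worker i skips i*count blocks so the chunks don't overlap;
    # dd implements skip= with a seek, so no worker re-reads earlier data
    dd if="$input" of="chunk_$i" bs="$bs" skip=$((i * count)) \
       count="$count" 2>/dev/null &
    i=$((i + 1))
done
wait    # block until every dd worker has finished
```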
Of course, this is still not as efficient as implementing your data consumer process in such a way that it can read a specified chunk of the input file directly, in parallel with other similar consumer processes. But I assume if you could do that you would not have asked this question.
What does O_DIRECT really mean?
(This answer pertains to Linux - other OSes may have different caveats/semantics)
Let's start with the sub-question:
If I open a file with O_DIRECT flag, does it mean that whenever a write(blocking mode) to that file returns, the data is on disk?
No (as @michael-foukarakis commented) - if you need a guarantee your data made it to non-volatile storage you must use/add something else.
What does O_DIRECT really mean?
It's a hint that you want your I/O to bypass the Linux kernel's caches. What will actually happen depends on things like:
- Disk configuration
- Whether you are opening a block device or a file in a filesystem
- If using a file within a filesystem:
  - The exact filesystem used and the options in use on the filesystem and the file
- Whether you've correctly aligned your I/O
- Whether a filesystem has to do a new block allocation to satisfy your I/O
- If the underlying disk is local, what layers you have in your kernel storage stack before you reach the disk block device
- Linux kernel version
- ...
The list above is not exhaustive.
In the "best" case, setting O_DIRECT will avoid making extra copies of data while transferring it, and the call will return after the transfer is complete. You are more likely to be in this case when directly opening block devices of "real" local disks. As previously stated, even this property doesn't guarantee that the data of a successful write() call will survive sudden power loss. IF the data is DMA'd out of RAM to non-volatile storage (e.g. a battery-backed RAID controller) or the RAM itself is persistent storage THEN you may have a guarantee that the data reached stable storage that can survive power loss. To know if this is the case you have to qualify your hardware stack, so you can't assume this in general.
In the "worst" case, O_DIRECT can mean nothing at all even though setting it wasn't rejected and subsequent calls "succeed". Sometimes things in the Linux storage stack (like certain filesystem setups) can choose to ignore it because of what they have to do or because you didn't satisfy the requirements (which is legal) and just silently do buffered I/O instead (i.e. write to a buffer/satisfy the read from already buffered data). It is unclear whether extra effort will be made to ensure that the data of an acknowledged write was at least "with the device" (but in the O_DIRECT and barriers thread Christoph Hellwig posts that the O_DIRECT fallback will ensure data has at least been sent to the device). A further complication is that using O_DIRECT implies nothing about file metadata, so even if the write data is "with the device" by call completion, key file metadata (like the size of the file, because you were doing an append) may not be. Thus you may not actually be able to get at the data you thought had been transferred after a crash (it may appear truncated, all zeros, etc.).
While brief testing can make it look like using O_DIRECT alone always implies data will be on disk after a write returns, changing things (e.g. using an Ext4 filesystem instead of XFS) can weaken what is actually achieved in very drastic ways.
As you mention "guarantee that the data" (rather than metadata), perhaps you're looking for O_DSYNC/fdatasync()? If you want to guarantee metadata was written too, you will have to look at O_SYNC/fsync().
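As an illustrative aside (not part of the original answer), the distinction is visible even from the shell with GNU dd: conv=fdatasync forces an fdatasync() before dd exits, while oflag=direct requests O_DIRECT and may simply fail on filesystems that don't support it, which is exactly the "no guarantee" point above:

```shell
#!/bin/sh
# Write a small file and force the data (not necessarily all metadata)
# to stable storage before dd returns, via fdatasync().
dd if=/dev/zero of=synced.bin bs=4096 count=4 conv=fdatasync 2>/dev/null

# Requesting O_DIRECT instead: this needs aligned sizes and may fail
# with EINVAL on filesystems (e.g. tmpfs) that don't support it.
dd if=/dev/zero of=direct.bin bs=4096 count=4 oflag=direct 2>/dev/null \
    || echo "O_DIRECT not supported on this filesystem"
```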
References
- Ext4 Wiki: Clarifying Direct IO's Semantics. Also contains notes about what O_DIRECT does on a few non-Linux OSes.
- The "[PATCH 1/1 linux-next] ext4: add compatibility flag check to the patch" LKML thread has a reply from Ext4 lead dev Ted Ts'o talking about how filesystems can fall back to buffered I/O for O_DIRECT rather than failing the open() call.
- In the "ubifs: Allow O_DIRECT" LKML thread Btrfs lead developer Chris Mason states Btrfs resorts to buffered I/O when O_DIRECT is requested on compressed files.
- ZFS on Linux commit message discussing the semantics of O_DIRECT in different scenarios. Also see the (at the time of writing, mid-2020) proposed new O_DIRECT semantics for ZFS on Linux (the interactions are complex and defy a brief explanation).
- Linux open(2) man page (search for O_DIRECT in the Description section and the Notes section).
- The "Ensuring data reaches disk" LWN article.
- The infamous Linus Torvalds O_DIRECT LKML thread summary (for even more context you can see the full LKML thread).
Linux: Using split on limited space
My attempt:
#! /bin/bash
if [ $# -gt 2 -o $# -lt 1 -o ! -f "$1" ]; then
echo "Usage: ${0##*/} <filename> [<split size in M>]" >&2
exit 1
fi
bsize=${2:-100}
bucket=$( echo $bsize '* 1024 * 1024' | bc )
size=$( stat -c '%s' "$1" )
chunks=$( echo $size / $bucket | bc )
rest=$( echo $size % $bucket | bc )
[ $rest -ne 0 ] && let chunks++
while [ $chunks -gt 0 ]; do
let chunks--
fn=$( printf '%s_%03d.%s' "${1%.*}" $chunks "${1##*.}" )
skip=$(( bsize * chunks ))
dd if="$1" of="$fn" bs=1M skip=${skip} || exit 1
truncate -c -s ${skip}M "$1" || exit 1
done
The above assumes bash(1), and Linux implementations of stat(1), dd(1), and truncate(1). It should be pretty much as fast as it gets, since it uses dd(1) to copy chunks of the initial file. It also uses bc(1) to make sure arithmetic operations in the 20GB range don't overflow anything. However, the script was only tested on smaller files, so double-check it before running it against your data.
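One detail worth noting: because the chunk counter is zero-padded (%03d), a plain shell glob already lists the chunks in ascending order, so reassembling them is just cat. A tiny self-contained demonstration with fake chunk files (the names here are hypothetical):

```shell
#!/bin/sh
# The script above names chunks <base>_NNN.<ext> with a zero-padded
# counter, so globbing returns them in order and cat rebuilds the file.
printf 'AAAA' > sample_000.dat
printf 'BBBB' > sample_001.dat
printf 'CCCC' > sample_002.dat
cat sample_*.dat > sample.dat   # glob order: 000, 001, 002
cat sample.dat                  # -> AAAABBBBCCCC
```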
How do I extract a single chunk of bytes from within a file?
Try dd:
dd skip=102567 count=253 if=input.binary of=output.binary bs=1
The option bs=1 sets the block size, making dd read and write one byte at a time. The default block size is 512 bytes. The value of bs also affects the behavior of skip and count, since the numbers given to skip and count are the numbers of blocks that dd will skip and read/write, respectively.
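If the chunk is large, bs=1 is slow because dd performs one read() and one write() per byte. GNU dd (this is a GNU extension, not POSIX) lets you keep a big block size while still addressing byte offsets; the sketch below generates its own input file so the offsets are valid:

```shell
#!/bin/sh
# Create a demo input: 200000 'x' bytes so the offsets below exist.
yes x | tr -d '\n' | head -c 200000 > input.binary

# Extract 253 bytes starting at byte offset 102567. With
# iflag=skip_bytes,count_bytes (GNU dd only), skip= and count= are
# interpreted as byte counts, so dd can still use a large block size.
dd if=input.binary of=output.binary bs=65536 \
   iflag=skip_bytes,count_bytes skip=102567 count=253 2>/dev/null
```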