How the Util of iOStat Is Computed

How the util of iostat is computed?

%util is named busy in the source code of iostat: https://code.google.com/p/tester-higkoo/source/browse/trunk/Tools/iostat/iostat.c#380

Busy is counted as percent ratio of Ticks to deltams, limited to 100%

busy = 100.0 * blkio.ticks / deltams; /* percentage! */
if (busy > 100.0) busy = 100.0;

DeltaMS is sum of system load for the period of time (user time + system time + idle time + iowait)/ ncpu.

double deltams = 1000.0 *
((new_cpu.user + new_cpu.system +
new_cpu.idle + new_cpu.iowait) -
(old_cpu.user + old_cpu.system +
old_cpu.idle + old_cpu.iowait)) / ncpu / HZ;

Ticks - is the Time of requests in queue for the period

blkio.ticks = new_blkio[p].ticks
- old_blkio[p].ticks;

In more current version of sysstat the code is bit different:
http://sources.debian.net/src/sysstat/10.2.0-1/iostat.c#L959

/*       rrq/s wrq/s   r/s   w/s  rsec  wsec  rqsz  qusz await r_await w_await svctm %util */
printf(" %8.2f %8.2f %7.2f %7.2f %8.2f %8.2f %8.2f %8.2f %7.2f %7.2f %7.2f %6.2f %6.2f\n",
...
/*
* Again: Ticks in milliseconds.
* In the case of a device group (option -g), shi->used is the number of
* devices in the group. Else shi->used equals 1.
*/
shi->used ? xds.util / 10.0 / (double) shi->used
: xds.util / 10.0); /* shi->used should never be null here */

xds is filled in the compute_ext_disk_stats(&sdc, &sdp, itv, &xds); http://sources.debian.net/src/sysstat/10.2.0-1/common.c?hl=679#L679

/*
* Macros used to display statistics values.
*
* HZ is 1024 on IA64 and % should be normalized to 100.
*/
#define S_VALUE(m,n,p) (((double) ((n) - (m))) / (p) * HZ)

xds->util = S_VALUE(sdp->tot_ticks, sdc->tot_ticks, itv);

And there is the filling of tot_ticks from iostat.c

  * @ioi        Current sample statistics.
* @ioj Previous sample statistics.
* @itv Interval of time.
...

sdc.tot_ticks = ioi->tot_ticks;
sdp.tot_ticks = ioj->tot_ticks;

tot_ticks are read from "sysfs stat for current block device or partition" in read_sysfs_file_stat (iostat.c:487), and ioi and ioj are current and previous stat.

how to get disk read/write bytes per second from /proc in programming on linux?

TLDR Sector is 512 bytes (octets; 1 sector is 512 bytes; each bytes is 8 bits; every bit is either 0 or 1, but not superposition of them).

"The standard sector size of 512 bytes for magnetic disks was established ....[dubious – discuss] " (c) wiki https://en.wikipedia.org/wiki/Disk_sector

How to check sector size for io statistics (in /proc) in linux:

Check how iostat tool works (it shows kilobyte per second when started as iostat 1) - it is part of sysstat package:

https://github.com/sysstat/sysstat/blob/master/iostat.c

 * Read stats from /proc/diskstats.
void read_diskstats_stat(int curr)
...
/* major minor name rio rmerge rsect ruse wio wmerge wsect wuse running use aveq */
i = sscanf(line, "%u %u %s %lu %lu %lu %lu %lu %lu %lu %u %u %u %u",
&major, &minor, dev_name,
&rd_ios, &rd_merges_or_rd_sec, &rd_sec_or_wr_ios, &rd_ticks_or_wr_sec,
&wr_ios, &wr_merges, &wr_sec, &wr_ticks, &ios_pgr, &tot_ticks, &rq_ticks);

if (i == 14) {
....
sdev.rd_sectors = rd_sec_or_wr_ios;
....
sdev.wr_sectors = wr_sec;
....

* @fctr Conversion factor.
...
if (DISPLAY_KILOBYTES(flags)) {
printf(" kB_read/s kB_wrtn/s kB_read kB_wrtn\n");
*fctr = 2;
}
...
/* rrq/s wrq/s r/s w/s rsec wsec rqsz qusz await r_await w_await svctm %util */
... 4 columns skipped
cprintf_f(4, 8, 2,
S_VALUE(ioj->rd_sectors, ioi->rd_sectors, itv) / fctr,
S_VALUE(ioj->wr_sectors, ioi->wr_sectors, itv) / fctr,

So, read sector count and divide by two to get kilobyte/s (seems like 1 sector read is 0.5 kb read; 2 sector read is 1 kb read and so on). We can conclude that the sector is always 512 bytes. Same is stated in the doc, isn't it?:

internet search for "/proc/diskstats" ->

https://www.kernel.org/doc/Documentation/ABI/testing/procfs-diskstats ->

https://www.kernel.org/doc/Documentation/iostats.txt "I/O statistics fields" by ricklind from usa's ibm

Field  3 -- # of sectors read
This is the total number of sectors read successfully.

Field 7 -- # of sectors written
This is the total number of sectors written successfully.

No info about sector size here (why?). Is the source code being the best documentation (it may be)? The writer of /proc/diskstats is in kernel sources in file block/genhd.c, function diskstats_show:

http://lxr.free-electrons.com/source/block/genhd.c?v=4.4#L1149

1170                 seq_printf(seqf, "%4d %7d %s %lu %lu %lu "
1171 "%u %lu %lu %lu %u %u %u %u\n",
...
1176 part_stat_read(hd, sectors[READ]),
...
1180 part_stat_read(hd, sectors[WRITE]),

Structure sectors is defined in http://lxr.free-electrons.com/source/include/linux/genhd.h?v=4.4#L82

 82 struct disk_stats {
83 unsigned long sectors[2]; /* READs and WRITEs */

It is read with part_stat_read and written with __part_stat_add

http://lxr.free-electrons.com/source/include/linux/genhd.h?v=4.4#L307

Adding to the sectors counter ... is... at http://lxr.free-electrons.com/source/block/blk-core.c?v=4.4#L2264

2264 void blk_account_io_completion(struct request *req, unsigned int bytes)
2265 {
2266 if (blk_do_io_stat(req)) {
2267 const int rw = rq_data_dir(req);
2268 struct hd_struct *part;
2269 int cpu;
2270
2271 cpu = part_stat_lock();
2272 part = req->part;
2273 part_stat_add(cpu, part, sectors[rw], bytes >> 9);
2274 part_stat_unlock();
2275 }
2276 }

It uses hard-coded "bytes >> 9" to compute sector size from request size in bytes (why round down??) or for human, not no-floating-point compiler, it is the same as bytes / 512.

There is also blk_rq_sectors function (unused here...) to get sector count from request, which does the same >>9 from bytes to sectors
http://lxr.free-electrons.com/source/include/linux/blkdev.h?v=4.4#L853

841 static inline unsigned int blk_rq_bytes(const struct request *rq)
842 {
843 return rq->__data_len;
844 }

853 static inline unsigned int blk_rq_sectors(const struct request *rq)
854 {
855 return blk_rq_bytes(rq) >> 9;
856 }

Authors of FS/VFS subsystem in Linux says in reply to https://lkml.org/lkml/2015/8/17/234 "Why is SECTOR_SIZE = 512 inside kernel ?" (2015):

 #define SECTOR_SHIFT 9

Message https://lkml.org/lkml/2015/8/17/269 by Theodore Ts'o:

It's cast in stone. There are too many places all over the kernel,
especially in a huge number of file systems, which assume that the
sector size is 512 bytes. So above the block layer, the sector size
is always going to be 512.

This is actually better for user space programs using
/proc/diskstats, since they don't need to know whether a particular
underlying hardware is using 512, 4k, (or if the HDD manufacturers
fantasies become true 32k or 64k) sector sizes.

For similar reason, st_blocks in struct size is always in units of 512
bytes. We don't want to force userspace to have to figure out whether
the underlying file system is using 1k, 2k, or 4k. For that reason
the units of st_blocks is always going to be 512 bytes, and this is
hard-coded in the POSIX standard.

Retrieve CPU usage and memory usage of a single process on Linux?

ps -p <pid> -o %cpu,%mem,cmd

(You can leave off "cmd" but that might be helpful in debugging).

Note that this gives average CPU usage of the process over the time it has been running.

Will an IO blocked process show 100% CPU utilization in 'top' output?

Even a single IO-bound process will rarely show high CPU utilization because the operating system has scheduled its IO and is usually just waiting for it to complete. So top cannot accurately distinguish between an IO-bound process and a non-IO-bound process that merely periodically uses the CPU. In fact, a system horribly overloaded with all IO-bound processes, barely able to accomplish anything can exhibit very low CPU utilization.

Using only top, as a first pass, you can indeed merely keep adding threads/processes until CPU utilization levels off to determine the approximate configuration for a given machine.

Heavy mysql usage CPU or Memory

I would go for 32GB memory and maybe more harddisks in RAID. CPU won't help that much - you have eough cpu power. You also need to configure mysql correctly.

  • Leave 1-2 GB for OS cache and for temp tables.
  • Increase tmp_table_size
  • remove swap
  • optimize query_cache_size (don't make it too big - see mysql documentation about it)
  • periodically run FLUSH QUERY CACHE. if your query cache is <512 MB - run it every 5 minutes. This doesn't clean the cache, it optimizes it (defragment). This is from mysql docs:

Defragment the query cache to better
utilize its memory. FLUSH QUERY CACHE
does not remove any queries from the
cache, unlike FLUSH TABLES or RESET
QUERY CACHE.

However I noticed that the other solution has the half disk space: 850GB, which might be reduced number of hard disks. That's generally a bad idea. The biggest problem in databases is hard disks. If you use RAID5 - make sure you don't use less hard disks. If you don't use raid at all - I would suggest raid 0.



Related Topics



Leave a reply



Submit