Guide to Unix/Commands/File Analysing

file
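file displays the type of a file. To get the MIME type instead, use the -i option; this is the behaviour of the file implementation commonly shipped with Linux distributions, whereas macOS uses -I for the same purpose.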

Examples

$ file Unix.txt
Unix.txt: ASCII text

$ file -i Unix.txt
Unix.txt: text/plain; charset=us-ascii

Links:
 * file, opengroup.org
 * file man page, man.cat-v.org

wc
wc tells you the number of lines, words and bytes in a file. The -c option counts bytes; use -m to count characters, which can differ for multibyte text.

Examples:

$ wc hello.txt
2      6      29 hello.txt

$ wc -l hello.txt
2 hello.txt

$ wc -w hello.txt
6 hello.txt

$ wc -c hello.txt
29 hello.txt
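
The difference between bytes (-c) and characters (-m) only shows up with multibyte text. The following is a minimal sketch, assuming the input is UTF-8; the octal escapes spell the two-byte UTF-8 encoding of the letter é, and the variable names are illustrative only. The character count depends on the current locale, so no fixed value is claimed for it.

```shell
# "\303\251" is the two-byte UTF-8 encoding of the letter é.
bytes=$(printf '\303\251\n' | wc -c)
chars=$(printf '\303\251\n' | wc -m)

echo "bytes: $bytes"   # 3: two bytes for é plus one for the newline
echo "chars: $chars"   # depends on the locale in effect
```

Some implementations pad the numbers with leading spaces, so scripts usually treat the result as a number rather than as an exact string.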

Links:
 * wc, opengroup.org
 * wc man page, man.cat-v.org
 * 6.1 wc in GNU Coreutils manual, gnu.org

cksum
Outputs a particular variant of the 32-bit cyclic redundancy check (CRC) checksum of a file, several files or standard input, together with the size of each input in bytes; in recent versions of GNU Coreutils and in some other implementations, it can output other checksums via the -a option. This variant of 32-bit CRC differs from the CRC-32 used by zip, PNG and zlib; for one thing, cksum calculates the CRC not from the octet stream of the file or input alone but from that stream with its length appended.

The CRC output by cksum can be used to detect accidental modifications to files: if the checksum has not changed, the file is very likely undamaged. The default CRC checksum is not cryptographic: it guards only against accidental modifications, not against malicious (intentional) ones.

Recent versions of GNU Coreutils cksum allow a choice among several kinds of checksums, including cryptographic ones, via the -a option. These include sysv, bsd, crc, md5, sha1, sha224, sha256, sha384, sha512, blake2b, and sm3. None of these checksums is the CRC-32 of zip, PNG and zlib. OpenBSD cksum provides the -a option as well, although its list of algorithms differs slightly. FreeBSD cksum allows a choice of one of three checksum algorithms in addition to the default one via the -o1, -o2 and -o3 options; -o3 is the CRC-32 of zip, PNG and zlib; this applies to macOS as well.

Examples:

$ cksum /etc/passwd
3052342160 2119 /etc/passwd

Some "cksum" implementations provide other algorithms, such as "md5" and "sha1":

$ cksum -a sha1 /etc/passwd
SHA1 (/etc/passwd) = 816d937ca4cdb4dee92d5002610fae63b639d224

You can test "cksum" by feeding it a string via standard input:

$ printf 'Guide to UNIX' | cksum
2195826759 13
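
To illustrate how the length of the input enters the default CRC, here is a slow, bit-by-bit sketch in shell. It is not how any real cksum is implemented; the function names posix_cksum and crc_update are made up for this example. It assumes that shell arithmetic is wider than 32 bits (as in bash or dash on 64-bit systems) and is meant for short inputs only. The generator polynomial 0x04C11DB7, the zero initial value and the final complement follow the POSIX description of cksum.

```shell
# Fold one octet (given as $1) into the global variable crc, most significant bit first.
crc_update() {
  crc=$(( crc ^ ($1 << 24) ))
  i=0
  while [ "$i" -lt 8 ]; do
    if [ $(( crc & 0x80000000 )) -ne 0 ]; then
      crc=$(( ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF ))
    else
      crc=$(( (crc << 1) & 0xFFFFFFFF ))
    fi
    i=$(( i + 1 ))
  done
}

# Read standard input and print "CRC LENGTH".
posix_cksum() {
  crc=0
  len=0
  # od prints every input byte as an unsigned decimal number.
  for byte in $(od -An -v -tu1); do
    crc_update "$byte"
    len=$(( len + 1 ))
  done
  # Append the length, least significant octet first, using as few octets as needed.
  n=$len
  while [ "$n" -gt 0 ]; do
    crc_update $(( n & 0xFF ))
    n=$(( n >> 8 ))
  done
  # The final CRC is the bitwise complement of the remainder.
  echo "$(( ~crc & 0xFFFFFFFF )) $len"
}

printf 'Guide to UNIX' | posix_cksum
```

The result can be compared with the output of the system cksum for the same input. Leaving out the loop that appends the length yields a different value, which is one reason this checksum does not match other 32-bit CRCs.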

Links:
 * cksum, opengroup.org
 * sum man page, man.cat-v.org
 * cksum, freebsd.org
 * 6.3 cksum in GNU Coreutils manual, gnu.org
 * cksum, openbsd.org
 * , wikipedia.org
 * Catalogue of parametrised CRC algorithms, reveng.sourceforge.io
 * cksum.c in coreutils, github.com
 * cksum folder in file_cmds-188, opensource.apple.com

sum
A legacy tool that outputs a certain kind of checksum of a file, several files or standard input, together with sizes. It is not covered by POSIX; POSIX codified cksum as a replacement tool instead, using a kind of checksum different from those used by legacy sum. Different variants of legacy sum used different algorithms. The legacy algorithms used by variants of sum are provided by the FreeBSD cksum via the -o1 and -o2 options, and by recent versions of GNU Coreutils cksum via the -a option.

GNU Coreutils sum allows a choice of the legacy algorithm via the -r and -s options.

The two commonly used legacy algorithms are as follows.

The BSD sum, -r in GNU sum:
 * Initialize checksum to 0
 * For each byte of the input stream:
   * Perform a 16-bit bitwise right rotation by 1 bit on the checksum
   * Add the byte to the checksum, and apply modulo 2 ^ 16 to the result, thereby keeping it within 16 bits
 * The result is a 16-bit checksum
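
The steps above can be sketched as a small shell function. The name bsd_sum is made up for this example, and the function prints only the checksum, not the block count that the real tool also reports.

```shell
# Sketch of the BSD checksum; reads standard input.
bsd_sum() {
  checksum=0
  # od prints every input byte as an unsigned decimal number.
  for byte in $(od -An -v -tu1); do
    # Rotate the 16-bit checksum right by one bit.
    checksum=$(( (checksum >> 1) | ((checksum & 1) << 15) ))
    # Add the byte and keep the result within 16 bits.
    checksum=$(( (checksum + byte) & 0xFFFF ))
  done
  echo "$checksum"
}

printf 'abc' | bsd_sum    # prints 16556
```

Because of the rotation, the result depends on the order of the bytes, not only on their values.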

The System V sum, -s in GNU sum:
 * checksum0 = sum of all bytes of the input stream modulo 2 ^ 32
 * checksum1 = (checksum0 modulo 2 ^ 16) + (checksum0 / 2 ^ 16), using integer division
 * checksum = (checksum1 modulo 2 ^ 16) + (checksum1 / 2 ^ 16), again using integer division
 * The result is a 16-bit checksum calculated from the initial 32-bit plain byte sum
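
The same computation as a shell sketch, with the hypothetical function name sysv_sum. As with the BSD example, only the checksum is printed, not the block count.

```shell
# Sketch of the System V checksum; reads standard input.
sysv_sum() {
  total=0
  # Plain sum of all bytes, kept within 32 bits.
  for byte in $(od -An -v -tu1); do
    total=$(( (total + byte) & 0xFFFFFFFF ))
  done
  # Fold the 32-bit sum into 16 bits, twice.
  folded=$(( (total & 0xFFFF) + (total >> 16) ))
  echo $(( (folded & 0xFFFF) + (folded >> 16) ))
}

printf 'abc' | sysv_sum   # prints 294, that is 97 + 98 + 99
```

Unlike the BSD algorithm, this one gives the same result for any reordering of the same bytes.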

Links:
 * cksum, opengroup.org
 * sum man page, man.cat-v.org
 * cksum, freebsd.org
 * 6.2 sum in GNU Coreutils manual, gnu.org
 * , wikipedia.org
 * sum.c in coreutils, github.com
 * sum1.c in freebsd-src, github.com
 * sum2.c in freebsd-src, github.com
 * sum.c in Seventh Edition Unix, tuhs.org

stat
Outputs file or file system status, including size, access rights, creation and modification times and more. The command appears to be absent from POSIX, which specifies only the stat() function (the system call); as a result, the options differ between implementations.
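
Since the options are not standardized, scripts that need a single field often try more than one syntax. The sketch below prints the size of a temporary file; it relies on GNU stat accepting -c with the %s format and BSD stat accepting -f with the %z format. The file and variable names are illustrative only.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'hello\n' > "$f"

# Try the GNU syntax first and fall back to the BSD syntax if it is rejected.
size=$(stat -c %s "$f" 2>/dev/null || stat -f %z "$f")
echo "size in bytes: $size"   # 6: five letters plus the newline
```

Running stat with only a file name as its argument prints the full, human-readable status on both kinds of systems, although the layout differs.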

Links:
 * stat, man7.org
 * stat, freebsd.org
 * stat in GNU Coreutils manual, gnu.org

grep
Outputs the lines that match a regular expression or, with -v, the lines that do not match it; further options count the matching lines (-c), list the names of files with matches (-l), and so on. See Grep Wikibook.
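
A few common invocations, as a minimal sketch run against a temporary file created for the purpose (the file content is made up for illustration):

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'apple\nbanana\ncherry\n' > "$f"

grep 'an' "$f"         # lines containing "an": banana
grep -v 'an' "$f"      # lines not containing "an": apple and cherry
grep -c '^[ab]' "$f"   # number of lines starting with a or b: 2
grep -n 'rr' "$f"      # matching lines with their line numbers: 3:cherry
```

grep exits with status 1 when no line matches, which scripts can use as a test without looking at the output.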

Links:
 * grep, opengroup.org
 * grep man page, man.cat-v.org
 * grep, freebsd.org
 * GNU Grep 3.0, gnu.org

diff
Compares the content of two files line by line and outputs the differences. See also diff3.
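
A minimal sketch with two temporary files that differ in their second line; the file names a and b and the variable names are illustrative only.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'one\ntwo\nthree\n' > "$dir/a"
printf 'one\n2\nthree\n'   > "$dir/b"

# diff exits with status 1 when the files differ, so the status is
# captured here rather than treated as a failure.
status=0
out=$(diff "$dir/a" "$dir/b") || status=$?
printf '%s\n' "$out"
echo "exit status: $status"
```

With the default output format of GNU or BSD diff, this reports the change as `2c2`, followed by `< two`, `---` and `> 2`, and the exit status is 1. The -u option selects the unified format used by most patch files.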

Links:
 * diff, opengroup.org
 * diff man page, man.cat-v.org
 * diff, freebsd.org
 * Comparing and Merging Files, gnu.org

diff3
Compares the content of three files line by line and outputs the differences. See also diff.
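
A typical use is to compare two edited versions of a file against their common ancestor. The sketch below assumes GNU diff3, whose -m option outputs a merged file; the file names mine, older and yours are illustrative, and the script skips the comparison if diff3 is not installed.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'one\ntwo\nthree\nfour\nfive\n' > "$dir/older"
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$dir/mine"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$dir/yours"

merged=
if command -v diff3 >/dev/null 2>&1; then
  # Show which of the three files differs in each changed region.
  diff3 "$dir/mine" "$dir/older" "$dir/yours" || true
  # Merge both sets of changes relative to the common ancestor.
  merged=$(diff3 -m "$dir/mine" "$dir/older" "$dir/yours") || true
  printf '%s\n' "$merged"
else
  echo "diff3 is not installed" >&2
fi
```

Because the two edits touch different lines, the merged output should contain both the changed first line and the changed last line; overlapping edits would instead be reported as conflicts.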

Links:
 * diff3 man page, man.cat-v.org
 * diff3, freebsd.org
 * Comparing and Merging Files, gnu.org

cmp
Compares two files byte by byte and outputs the byte number and line number of the first difference, if any. Outputs nothing if the files are byte-for-byte identical. Differences beyond the first one are not reported unless the -l option is used.
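
A minimal sketch using two temporary files that differ in a single byte; the file and variable names are illustrative only. It uses -s, which suppresses all output so that only the exit status carries the result, and -l, which lists each differing byte.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'one\ntwo\n' > "$dir/a"
printf 'one\n2wo\n' > "$dir/b"

# With -s nothing is printed; the exit status is 0 only for identical files.
if cmp -s "$dir/a" "$dir/b"; then
  result=identical
else
  result=different
fi
echo "$result"

# With -l each differing byte is listed: its offset in decimal,
# then the byte value from each file in octal.
cmp -l "$dir/a" "$dir/b" || true
```

Here the only difference is at byte 5, where the first file has the letter t (octal 164) and the second has the digit 2 (octal 62).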

Links:
 * cmp, opengroup.org
 * cmp man page, man.cat-v.org
 * cmp, freebsd.org
 * Invoking cmp in Comparing and Merging Files, gnu.org

strings
Outputs the sequences of printable characters found in files, which is mainly useful for inspecting binary files.
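
A minimal sketch that builds a small file mixing printable text with non-printable bytes and then extracts the text. The file content is made up for illustration, and the script checks for the command first because strings is often packaged separately, for example with the GNU Binary Utilities.

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
# Printable runs separated by non-printable bytes, written with octal escapes.
printf '\001\002hello world\000\003ab\000\004Unix\000' > "$f"

if command -v strings >/dev/null 2>&1; then
  # By default only runs of at least 4 printable characters are reported,
  # so "ab" is skipped.
  strings "$f"
else
  echo "strings is not installed" >&2
fi
```

The minimum length can be changed with the -n option, for example -n 2 to include shorter runs.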

Links:
 * strings, opengroup.org
 * strings, freebsd.org
 * strings in GNU Binary Utilities, sourceware.org
 * strings (Unix)