Skip to content

Text Manipulating Cmds

Linux Text Manipulation Commands

awk suitable for smaller data processing.

  • Works like sed, line-by-line, but separate a line into parts to process.
  • The default separation char is space or [tab]
  • awk 'condition1{action1} condition2{action2} ...' filename
  • can use $numb to access which part, starting from 1:
    • last -n 5 | awk '{print $1 "\t" $3}'
    • $0 represents the entire line
  • additionally, awk has internal variables accessible:
    • NF how many parts on this line
    • NR which line is current line
    • FS current separation char
    • cat /etc/passwd | awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t " $3}'
  • using conditions for different outputs
    • i.e. awk 'NR==1{printf "%10s %10s %10s %10s %10s\n",$1,$2,$3,$4,"Total" } NR>=2{total = $2 + $3 + $4; printf "%10s %10d %10d %10d %10.2f\n", $1, $2, $3, $4, total}'

col for simple process of a text file, like converting [tab] with spaces, etc.

cut gets some part of info out of a line of text, like a log

  • -d: [sparation char]
  • -f: nth_part
  • -c: get range of characters

diff compare two pure-text files/dirs and output the differences.

  • diff [-bBi] from-file to-file
  • -b: ignore diff of one of more spaces, like "about me" and "about me" are the same
  • -B: ignore empty lines
  • -i: ignore capitalized differences
  • diff can be used on directories to show difference in files

grep supports regex, analyze a block of text and get the lines containing the match

  • grep [-A] [-B] [--color=auto] 'search_regex' filename
  • -A is display n lines after the result line
  • -B: is display n lines before the result line
  • -n: show line-number
  • -v: reverse the condition
  • Extended regex
    • '|' means OR i.e: grep -v '^$' file | grep -v '^#' gives the same result as grep -v -E '^$|^#' file, to show lines without empty lines and commented lines
    • grouping '()', egrep -n 'g(la|oo)d' file finds 'good' or 'glad' lines

join, paste, expand

  • join merge two files by comparing them and only put together similar parts/lines.
    • Files should be sorted before doing join.
  • paste is simpler, just connected two lines together with a [tab]
  • expand converts [tab] as a number of spaces

patch

  • use diff to generate difference file, then apply difference file on the old file to patch updates.
  • diff -Naur passwd.old passwd.new > passwd.patch
  • patch -pN < patch_file apply patch
  • patch -R -pN < patch_file restore old file from patch

pr for processing pure text and format to be print-ready.

printf format lines columns to be visually appealing

sed useful for analyzing input, replace, delete, append, or extract text and lines

sed [-nefr] ['action'] [filename] - -n: silent mode, only processed lines being output - -e [script]: have the script added to the command to be executed - -f [filename]: read script from a file - -r: let sed work with extended regex - -i: direct modify the file instead of output results - [action]: - in the form of [n1[,n2]]function; function has: - a: insert a line after i.e. nl /etc/passwd | sed '2a drink tea' - c: replace lines b/w n1,n2 - d: delete matched line i.e. nl /etc/passwd | sed '2,5d' - i: insert a line before - p: print (stdout) selected lines of data/text i.e. nl /etc/passwd | sed -n '5,7p' is same as ln file | head -n 7 | tail -n 3 - s: find and replace inline! 1,20s/old_phrase/new_phrase/g here the phrase part supports regex!

sort arranges text lines in the order we want.

  • -f: ignore capitalized difference
  • -b: ignore the space at the beginning
  • -M: arrange using month
  • -n: use number to arrange
  • -r: reversed order
  • -u: uniq lines only (filter out repeated lines)
  • -t: separation char for columns (fields), default is [tab]
  • -k [n]: use nth field to arrange

split useful for splitting a large file into smaller ones according to size, or number of lines.

tee redirects data as well as saving part of the data.

  • last | tee last.list | cut -d " " -f1

tr deletes / replaces some text within a block of text

  • last | tr '[a-z]' 'A-Z' will replace all lower case with upper

wc shows text stats like number of characters, lines, english words.

  • -l: lines
  • -w: words
  • -m: characters

uniq shows only unique (non-repeated) lines only

  • -c show count

xargs provides pipe access to the commands that don't support pipes

xclip copy STDOUT piped from other commands to the clipboard; MacOS equivalent is pbcopy