Linux Text Manipulation Commands
awk is suited to processing small amounts of data.
- Works like sed, line by line, but splits each line into fields to process.
- The default field separator is whitespace (runs of spaces or [tab])
awk 'condition1{action1} condition2{action2} ...' filename
- use $n to access the nth field, counting from 1:
last -n 5 | awk '{print $1 "\t" $3}'
- $0 represents the entire line
- additionally, awk has built-in variables:
NF
: number of fields on the current line
NR
: number of the current line
FS
: current field separator
cat /etc/passwd | awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t " $3}'
- conditions can select a different action for different lines, e.g. (reading from standard input):
awk 'NR==1{printf "%10s %10s %10s %10s %10s\n",$1,$2,$3,$4,"Total" } NR>=2{total = $2 + $3 + $4; printf "%10s %10d %10d %10d %10.2f\n", $1, $2, $3, $4, total}'
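As a rough sketch of how the pieces above fit together, the following uses a made-up file pay.txt (its name, columns, and values are assumptions for illustration) to show NR conditions, field access, FS, and NF. `$NF`, which refers to the last field, is standard awk but is not mentioned above.

```shell
# Hypothetical sample data: a header line, then a name and three amounts.
tmpdir=$(mktemp -d)
cat > "$tmpdir/pay.txt" <<'EOF'
Name 1st 2nd 3rd
alice 100 200 300
bob 10 20 30
EOF

# NR==1 matches the header line; NR>=2 matches every data line.
awk_totals=$(awk 'NR==1 {print $1, "Total"}
                  NR>=2 {total = $2 + $3 + $4; print $1, total}' "$tmpdir/pay.txt")
printf '%s\n' "$awk_totals"

# FS changes the field separator; NF is the field count, $NF the last field.
awk_fields=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN {FS=":"} {print NR, NF, $NF}')
printf '%s\n' "$awk_fields"

rm -r "$tmpdir"
```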
col does simple processing of a text file, such as converting [tab] to spaces.
cut extracts part of each line of text, such as a log line
-d
: [separator char]
-f
: nth field
-c
: range of characters
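A minimal sketch of the three options, using a made-up colon-separated log line (the line content is an assumption for illustration):

```shell
# Hypothetical log line: date, level, and message separated by colons.
line='2024-01-05:ERROR:disk full'

# -d sets the separator, -f picks the field.
level=$(printf '%s\n' "$line" | cut -d ':' -f 2)

# -c picks a character range; here the first 10 characters (the date).
date_part=$(printf '%s\n' "$line" | cut -c 1-10)

printf '%s\n' "$level"      # ERROR
printf '%s\n' "$date_part"  # 2024-01-05
```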
diff compares two plain-text files or directories and outputs the differences.
diff [-bBi] from-file to-file
-b
: ignore differences in the number of spaces, so "about me" and "about  me" are the same
-B
: ignore blank lines
-i
: ignore case differences
- diff can be used on directories to show which files differ
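The effect of the options can be checked through diff's exit status, which is 0 when no differences are found and 1 when some are. The sketch below uses two made-up files (old.txt and new.txt are hypothetical names) that differ only in spacing and capitalization:

```shell
tmpdir=$(mktemp -d)
printf 'about me\nHello\n' > "$tmpdir/old.txt"
printf 'about   me\nhello\n' > "$tmpdir/new.txt"

# A plain comparison reports differences (exit status 1).
plain_status=0
diff "$tmpdir/old.txt" "$tmpdir/new.txt" > /dev/null || plain_status=$?

# With -b and -i the spacing and case differences are ignored (exit status 0).
relaxed_status=0
diff -b -i "$tmpdir/old.txt" "$tmpdir/new.txt" > /dev/null || relaxed_status=$?

echo "plain=$plain_status relaxed=$relaxed_status"
rm -r "$tmpdir"
```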
grep supports regex; it scans a block of text and prints the lines containing a match
grep [-A n] [-B n] [--color=auto] 'search_regex' filename
-A
: display n lines after the matching line
-B
: display n lines before the matching line
-n
: show line numbers
-v
: invert the match (show the lines that do not match)
- Extended regex
  - '|' means OR, e.g.:
  grep -v '^$' file | grep -v '^#'
  gives the same result as grep -v -E '^$|^#' file
  , showing the file without empty lines and comment lines
  - grouping with '()':
  egrep -n 'g(la|oo)d' file
  finds lines containing 'good' or 'glad' (egrep is equivalent to grep -E)
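The options above can be tried on a small throwaway file. This is a sketch only; sample.conf and its contents are made up for illustration:

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.conf" <<'EOF'
# a comment

good setting
glad setting
gd setting
EOF

# Drop blank lines and comment lines in one pass with an extended-regex OR.
active=$(grep -v -E '^$|^#' "$tmpdir/sample.conf")

# Grouping: match "glad" or "good" and prefix each hit with its line number.
matches=$(grep -n -E 'g(la|oo)d' "$tmpdir/sample.conf")

# -A 1 also prints one line after each matching line.
context=$(grep -A 1 'glad' "$tmpdir/sample.conf")

printf '%s\n' "$matches"
rm -r "$tmpdir"
```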
join, paste, expand
- join merges two files by comparing them and combining only the lines that share a matching field.
- Files should be sorted before doing join.
- paste is simpler: it just joins corresponding lines together with a [tab]
- expand converts each [tab] into a number of spaces
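A short sketch contrasting the three commands. The files names.txt and roles.txt and their contents are invented for illustration; both are already sorted on the first field, which is what join compares by default.

```shell
tmpdir=$(mktemp -d)
printf '1 alice\n2 bob\n3 carol\n' > "$tmpdir/names.txt"
printf '1 admin\n3 guest\n4 root\n' > "$tmpdir/roles.txt"

# join keeps only lines whose first field appears in both files.
joined=$(join "$tmpdir/names.txt" "$tmpdir/roles.txt")

# paste glues line N of each file together with a tab, matching or not.
pasted=$(paste "$tmpdir/names.txt" "$tmpdir/roles.txt")

# expand -t 4 replaces each tab with spaces up to the next 4-column tab stop.
expanded=$(printf 'a\tb\n' | expand -t 4)

printf '%s\n' "$joined"
rm -r "$tmpdir"
```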
patch
- use diff to generate a difference file, then apply that file to the old file to patch in the updates.
diff -Naur passwd.old passwd.new > passwd.patch
patch -pN < patch_file
applies the patch (N is the number of leading path components to strip from the file names in the patch)
patch -R -pN < patch_file
restores the old file from the patch
pr processes plain text and formats it to be print-ready.
printf formats lines into columns so the output is visually aligned
sed is useful for analyzing input and for replacing, deleting, appending, or extracting text and lines
sed [-nefri] ['action'] [filename]
- -n
: silent mode, only explicitly processed lines are output
- -e
[script]: add the script to the commands to be executed
- -f
[filename]: read the script from a file
- -r
: let sed work with extended regex
- -i
: modify the file directly instead of printing the result
- [action]
: in the form of [n1[,n2]]function
; the functions are:
  - a
  : append a line after, e.g. nl /etc/passwd | sed '2a drink tea'
  - c
  : replace the lines between n1 and n2
  - d
  : delete the matched lines, e.g. nl /etc/passwd | sed '2,5d'
  - i
  : insert a line before
  - p
  : print the selected lines to stdout, e.g. nl /etc/passwd | sed -n '5,7p'
  is the same as nl /etc/passwd | head -n 7 | tail -n 3
  - s
  : find and replace within the line, e.g. 1,20s/old_phrase/new_phrase/g
  here the old_phrase part supports regex
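The sketch below runs the p, d, s, and a functions on five made-up input lines. It assumes GNU sed, since the one-line form of `a` used here (and in the example above) is a GNU extension:

```shell
# Five sample lines stand in for a real file.
input=$(printf 'one\ntwo\nthree\nfour\nfive\n')

# -n plus p: print only lines 2 to 3.
printed=$(printf '%s\n' "$input" | sed -n '2,3p')

# d: delete lines 2 to 4.
deleted=$(printf '%s\n' "$input" | sed '2,4d')

# s with an address range: replace every "o" with "0" on lines 1 to 2 only.
replaced=$(printf '%s\n' "$input" | sed '1,2s/o/0/g')

# a: append a new line after line 1 (GNU one-line form); show the first two lines.
appended=$(printf '%s\n' "$input" | sed '1a drink tea' | head -n 2)

printf '%s\n' "$replaced"
```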
sort arranges text lines in the order we want.
-f
: ignore case differences
-b
: ignore leading blanks
-M
: sort by month name
-n
: sort numerically
-r
: reverse the order
-u
: unique lines only (filter out repeated lines)
-t
: separator char for columns (fields); by default fields are split on blanks
-k
[n]: sort by the nth field
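A small sketch of -t, -k, -n, -u, and -r on made-up name:number records (the data is invented for illustration):

```shell
data=$(printf 'carol:30\nalice:4\nbob:30\nalice:4\n')

# -t ':' sets the separator, -k 2 sorts on the second field, -n compares as numbers.
# Lines with equal keys fall back to a whole-line comparison.
by_number=$(printf '%s\n' "$data" | sort -t ':' -k 2 -n)

# -u drops the repeated line, -r reverses the order.
unique_reversed=$(printf '%s\n' "$data" | sort -u -r)

printf '%s\n' "$by_number"
printf '%s\n' "$unique_reversed"
```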
split divides a large file into smaller ones by size or by number of lines.
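As an illustration of splitting by line count, the sketch below cuts a ten-line file into pieces of at most four lines and then reassembles it. The names big.txt and part_ are assumptions chosen for the example:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$tmpdir/big.txt"

# -l 4 puts at most 4 lines in each piece; pieces are named part_aa, part_ab, ...
split -l 4 "$tmpdir/big.txt" "$tmpdir/part_"

piece_count=$(ls "$tmpdir"/part_* | wc -l | tr -d ' ')
last_piece=$(cat "$tmpdir/part_ac")

# Concatenating the pieces in name order gives back the original file.
cat "$tmpdir"/part_* > "$tmpdir/rebuilt.txt"
same=no
cmp -s "$tmpdir/big.txt" "$tmpdir/rebuilt.txt" && same=yes

echo "pieces=$piece_count same=$same"
rm -r "$tmpdir"
```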
tee passes data along the pipeline while also saving a copy of it to a file.
last | tee last.list | cut -d " " -f1
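The same pattern can be tried without depending on the output of last. In this sketch two made-up lines stand in for it, and full.list is a hypothetical file name:

```shell
tmpdir=$(mktemp -d)

# tee writes the full input to full.list and also passes it on to cut.
first_fields=$(printf 'alice pts/0\nbob pts/1\n' \
  | tee "$tmpdir/full.list" \
  | cut -d ' ' -f 1)

saved=$(cat "$tmpdir/full.list")

printf '%s\n' "$first_fields"   # only the first column reaches stdout
printf '%s\n' "$saved"          # the file holds the complete lines
rm -r "$tmpdir"
```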
tr deletes or replaces characters within a block of text
last | tr 'a-z' 'A-Z'
will replace all lower-case letters with upper-case ones
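A minimal sketch of both uses, replacing and deleting, on made-up input strings. The -d option (delete the listed characters) is standard tr but is not shown above:

```shell
# Replace: map each lower-case letter to its upper-case counterpart.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# Delete: -d removes every character in the given set.
no_digits=$(printf 'a1b2c3\n' | tr -d '0-9')

printf '%s\n' "$upper"      # HELLO WORLD
printf '%s\n' "$no_digits"  # abc
```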
wc shows text statistics such as the number of lines, words, and characters.
-l
: lines
-w
: words
-m
: characters
uniq collapses adjacent repeated lines so each is shown once; sort the input first to catch all repeats
-c
: show count
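Because uniq only compares neighbouring lines, it is usually paired with sort. A small sketch on made-up input:

```shell
# Without sorting, non-adjacent repeats are not collapsed.
unsorted=$(printf 'a\nb\na\n' | uniq)

# Sorted first, then counted; awk reshapes "count value" into "value=count".
counts=$(printf 'b\na\nb\na\nb\n' | sort | uniq -c | awk '{print $2 "=" $1}')

printf '%s\n' "$unsorted"
printf '%s\n' "$counts"
```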
xargs builds command lines from standard input, so commands that don't read from a pipe can take piped data as arguments
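For instance, echo ignores standard input, but xargs can hand it the piped words as arguments. A minimal sketch, where the "item:" label is just an illustrative prefix:

```shell
# -n 1 runs the command once per input word, appending the word as an argument.
items=$(printf 'one two three\n' | xargs -n 1 echo item:)

printf '%s\n' "$items"
```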
xclip copies the STDOUT piped from other commands to the clipboard; the macOS equivalent is pbcopy