
Text Processing

Commands for transforming, filtering, sorting, and comparing text. Most text processing commands produce TextOutput objects. The notable exception is sort, which is a pipeline bridge that preserves the original typed objects passing through it.

sed

Stream editor for filtering and transforming text line by line.

sed [OPTIONS] EXPRESSION [FILE...]

Flag      Description
-n        Suppress default output; only print when explicitly requested with p
-i        Edit files in place
-E        Use extended regular expressions (unescaped +, ?, |, (), {})
-e EXPR   Add a sed expression (use multiple -e for chained edits)

Command             Syntax                     Description
Substitute          s/pattern/replacement/     Replace first match on each line
Substitute global   s/pattern/replacement/g    Replace all matches on each line
Delete              d                          Delete the line
Print               p                          Print the line (useful with -n)
Transliterate       y/src/dst/                 Replace each character in src with the corresponding character in dst

Addresses restrict which lines a command applies to.

Address        Example          Description
Line number    3d               Apply to line 3 only
Range          2,5d             Apply to lines 2 through 5
Range to end   3,$d             Apply from line 3 to end of input
Regex          /error/d         Apply to lines matching the pattern
Regex range    /start/,/end/d   Apply from first match of start through first match of end

Output: TextOutput objects (one per output line).

# Replace first occurrence of 'foo' with 'bar' on each line
echo "foo foo foo" | sed 's/foo/bar/'
bar foo foo

# Global replacement
echo "foo foo foo" | sed 's/foo/bar/g'
bar bar bar

# Delete lines containing 'debug'
cat log.txt | sed '/debug/d'

# Print only lines matching a pattern (suppress default output)
cat data.txt | sed -n '/ERROR/p'

# Edit a file in place
sed -i 's/oldname/newname/g' config.txt

# Multiple expressions with -e
cat input.txt | sed -e 's/alpha/ALPHA/g' -e '/^$/d'

# Transliterate characters (map h, e, l, o to their uppercase forms)
echo "hello" | sed 'y/helo/HELO/'
HELLO
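
The address forms listed earlier have no worked example, so here is a minimal sketch of three of them. It is written for a plain POSIX shell and assumes this sed follows POSIX sed semantics; the sample input and the variable names (input, range_out, regex_out, tail_out) are illustrative only.

```shell
# Sample input: five numbered lines (hypothetical data).
input=$(printf 'line1 ok\nline2 error\nline3 ok\nline4 error\nline5 ok\n')

# Line range: delete lines 2 through 4.
range_out=$(printf '%s\n' "$input" | sed '2,4d')

# Regex address with -n and p: print only lines containing "error".
regex_out=$(printf '%s\n' "$input" | sed -n '/error/p')

# Range to end: delete from line 3 to the end of input.
tail_out=$(printf '%s\n' "$input" | sed '3,$d')

printf '%s\n---\n%s\n---\n%s\n' "$range_out" "$regex_out" "$tail_out"
```

Single quotes around the expression keep the shell from expanding the $ in 3,$d.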

awk

Pattern-scanning and processing language. Each input line is split into fields that you can reference by position.

awk [OPTIONS] 'PROGRAM' [FILE...]

Flag         Description
-F SEP       Set field separator (default: whitespace)
-v VAR=VAL   Set a variable before execution begins

Reference    Description
$0           The entire line
$1, $2, …    Individual fields (1-indexed)
$NF          The last field

Variable   Description
NR         Current line number (1-based)
NF         Number of fields on the current line
FS         Input field separator
OFS        Output field separator (default: space)

An awk program consists of pattern/action pairs. The action runs for each line where the pattern matches.

Pattern      Example                 Description
(none)       { print $1 }            Runs on every line
BEGIN        BEGIN { OFS="," }       Runs once before input
END          END { print NR }        Runs once after all input
/regex/      /error/ { print }       Lines matching the regex
Expression   $3 > 100 { print $1 }   Lines where the expression is true

Function                Description
length(s)               Length of string s (or $0 if omitted)
substr(s, start)        Substring from position start (1-based)
substr(s, start, len)   Substring of length len from start
tolower(s)              Convert to lowercase
toupper(s)              Convert to uppercase
gsub(/re/, repl)        Replace all matches in $0
sub(/re/, repl)         Replace first match in $0

Function            Description
print expr, ...     Print values separated by OFS
printf "fmt", ...   Formatted output (C-style %s, %d, %f)

Output: TextOutput objects (one per output line).

# Print the second field of each line
echo "alice 90" | awk '{ print $2 }'
90

# Set field separator to colon
cat /etc/passwd | awk -F: '{ print $1 }'

# Filter lines where column 3 exceeds a threshold
ps aux | awk '$3 > 5.0 { print $1, $11, $3 }'

# Use BEGIN/END blocks
cat data.csv | awk -F, 'BEGIN { sum=0 } { sum+=$2 } END { print "Total:", sum }'

# Printf for formatted output
echo "hello world" | awk '{ printf "%s has %d chars\n", $1, length($1) }'
hello has 5 chars

# Pattern matching with regex
cat log.txt | awk '/ERROR/ { print NR, $0 }'
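
The -v flag, OFS, and the string functions are listed above without an example, so the following sketch combines them. It assumes this awk matches POSIX awk behavior and is written for a plain POSIX shell; the CSV data and the variable names (data, passed, initials, average, min) are hypothetical.

```shell
# Hypothetical CSV input: name,score
data=$(printf 'alice,90\nbob,75\ncarol,88\n')

# -v sets min before the program runs; OFS is the separator print puts between values.
passed=$(printf '%s\n' "$data" |
  awk -F, -v min=80 'BEGIN { OFS=":" } $2 >= min { print toupper($1), $2 }')

# substr and length applied to a single field.
initials=$(printf '%s\n' "$data" |
  awk -F, '{ print substr($1, 1, 1), length($1) }')

# END runs after the last line, when NR holds the total line count.
average=$(printf '%s\n' "$data" |
  awk -F, '{ sum += $2 } END { printf "%.1f\n", sum / NR }')

printf '%s\n' "$passed" "$initials" "$average"
```

Here passed holds the two rows scoring at least 80, joined with a colon because OFS was changed in BEGIN.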

cut

Extract selected fields or character positions from each line.

cut [OPTIONS] [FILE...]

Flag        Description
-d DELIM    Set field delimiter (default: tab)
-f FIELDS   Select fields by number (e.g. 1, 1,3, 1-3)
-c CHARS    Select characters by position (e.g. 1-5, 2,4)

Spec   Description
N      Single field or character position
N,M    Multiple specific positions
N-M    Range from N to M (inclusive)

Output: TextOutput objects (one per output line).

# Extract second field from colon-delimited input
echo "alice:90:A" | cut -d: -f2
90

# Extract multiple fields
echo "one:two:three:four" | cut -d: -f1,3
one:three

# Extract a range of fields
echo "a:b:c:d:e" | cut -d: -f2-4
b:c:d

# Extract characters by position
echo "abcdefgh" | cut -c1-4
abcd
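
The default tab delimiter and the N,M character form are not exercised above. A small sketch, assuming this cut behaves like POSIX cut (including mixing a list and a range in one spec, which the table does not state explicitly); variable names are illustrative.

```shell
# The default delimiter is a tab, so -d can be omitted for tab-separated input.
second=$(printf 'alice\t90\tA\n' | cut -f2)

# Non-adjacent character positions.
chars=$(printf 'abcdefgh\n' | cut -c2,4)

# A list and a range combined in one spec (assumed to be supported here).
mixed=$(printf 'a:b:c:d:e\n' | cut -d: -f1,3-4)

printf '%s\n' "$second" "$chars" "$mixed"
```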

tr

Translate, squeeze, or delete characters.

pipeline | tr [OPTIONS] SET1 [SET2]

Flag   Description
-d     Delete characters in SET1
-s     Squeeze repeated characters in SET1 (or SET2 when translating)

Character ranges are supported as a-z, A-Z, 0-9, etc.

Set   Expands to
a-z   All lowercase letters
A-Z   All uppercase letters
0-9   All digits

Output: TextOutput objects (one per output line).

# Convert lowercase to uppercase
echo "hello world" | tr 'a-z' 'A-Z'
HELLO WORLD

# Delete digits
echo "abc123def" | tr -d '0-9'
abcdef

# Squeeze repeated spaces
echo "hello    world" | tr -s ' '
hello world

# Translate and squeeze
echo "aabbcc" | tr -s 'abc' 'xyz'
xyz
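
Because tr reads only from the pipeline, several tr stages are often chained. The sketch below assumes POSIX tr semantics and a plain POSIX shell; the input strings and variable names are made up for illustration.

```shell
# Lowercase, squeeze runs of spaces, then turn each space into an underscore.
slug=$(printf 'Hello   World 2024\n' | tr 'A-Z' 'a-z' | tr -s ' ' | tr ' ' '_')

# With -s and two sets, translation happens first and the result is squeezed.
one_step=$(printf 'a   b  c\n' | tr -s ' ' '_')

# -d accepts an explicit list of characters as well as ranges.
no_vowels=$(printf 'pipeline\n' | tr -d 'aeiou')

printf '%s\n' "$slug" "$one_step" "$no_vowels"
```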

uniq

Filter adjacent duplicate lines.

uniq [OPTIONS] [FILE...]

Flag   Description
-c     Prefix each line with its count of consecutive occurrences
-d     Only print lines that are repeated

Output: TextOutput objects (one per output line).

# Remove consecutive duplicates
echo "apple" "apple" "banana" "banana" "apple" | uniq
apple
banana
apple

# Count occurrences
echo "a" "a" "b" "a" "a" "a" | uniq -c
2 a
1 b
3 a

# Show only repeated lines
echo "x" "y" "y" "z" | uniq -d
y
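
Since uniq only compares adjacent lines, counting every occurrence across the whole input usually means sorting first. A minimal sketch, assuming POSIX sort and uniq and a plain POSIX shell; the word list is hypothetical, and the awk stage is only there to normalize the count padding, which varies between implementations.

```shell
# Hypothetical input with no two equal lines next to each other.
words=$(printf 'apple\nbanana\napple\ncherry\nbanana\napple\n')

# Without sorting, uniq -d finds nothing: no duplicates are adjacent.
unsorted_dups=$(printf '%s\n' "$words" | uniq -d)

# After sorting, equal lines are adjacent and uniq can see them.
sorted_dups=$(printf '%s\n' "$words" | sort | uniq -d)

# Count every occurrence; awk strips the leading padding uniq -c may add.
counts=$(printf '%s\n' "$words" | sort | uniq -c | awk '{ print $1, $2 }')

printf '%s\n' "$sorted_dups" "$counts"
```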

sort

Sort lines of text. This is a pipeline bridge command: it sorts by the BashText representation but preserves the original typed objects.

sort [OPTIONS] [FILE...]

Flag     Description
-r       Reverse the sort order
-n       Numeric sort
-u       Remove duplicates (unique output)
-f       Case-insensitive sort
-k N     Sort by field N (1-based)
-t SEP   Use SEP as the field delimiter
-h       Human-numeric sort (e.g. 2K, 1G)
-V       Version-number sort (e.g. 1.2, 1.10)
-M       Month sort (e.g. Jan, Feb)
-c       Check whether input is already sorted (sets exit code)

Output: Original pipeline objects are passed through unchanged. When reading from files, TextOutput objects are produced.

# Basic alphabetical sort
echo "banana" "apple" "cherry" | sort
apple
banana
cherry

# Numeric sort in reverse
echo "10" "2" "30" "1" | sort -rn
30
10
2
1

# Sort by a specific field
echo "alice 90" "bob 75" "carol 88" | sort -k2 -n
bob 75
carol 88
alice 90

# Pipeline bridge: typed objects survive sorting
$procs = ps aux | sort -k3 -rn | head 5
$procs[0].CPU # Real decimal value
$procs[0].Command # Process name

# Sort file sizes (human-numeric)
ls -lh | sort -k5 -h

# Unique lines only
echo "a" "b" "a" "c" "b" | sort -u
a
b
c
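
The -t, -V, and -c flags have no example above. The sketch below assumes they behave as in GNU sort (-V is not part of POSIX) and is written for a plain POSIX shell; the data and variable names are illustrative.

```shell
# Sort comma-separated rows numerically by the second field.
by_score=$(printf 'alice,90\nbob,75\ncarol,88\n' | sort -t, -k2 -n)

# Version sort orders 1.10 after 1.9, unlike a plain text sort.
versions=$(printf '1.10\n1.2\n1.9\n' | sort -V)

# -c prints nothing on success; the exit code reports whether input is in order.
if printf '2\n1\n3\n' | sort -c -n 2>/dev/null; then
  in_order=yes
else
  in_order=no
fi

printf '%s\n' "$by_score" "$versions" "$in_order"
```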

column

Format input into aligned columns.

column [OPTIONS] [FILE...]

Flag     Description
-t       Table mode: detect columns and align them
-s SEP   Use SEP as the input column separator (used with -t)

Output: TextOutput objects (one per output line).

# Align whitespace-delimited data into columns
echo "name age city" "alice 30 portland" "bob 25 seattle" | column -t
name   age  city
alice  30   portland
bob    25   seattle

# Use a custom separator
echo "alice:30:portland" "bob:25:seattle" | column -t -s:
alice  30  portland
bob    25  seattle

# Format command output into a table
mount | column -t

join

Join two sorted files on a common field (like a relational join).

join [OPTIONS] FILE1 FILE2

Flag     Description
-t SEP   Use SEP as the field delimiter (default: space)
-1 N     Join on field N of the first file (default: 1)
-2 N     Join on field N of the second file (default: 1)

Output: TextOutput objects (one per matched pair).

# Join two files on the first field
# names.txt: "1 alice\n2 bob\n3 carol"
# scores.txt: "1 90\n2 75\n3 88"
join names.txt scores.txt
1 alice 90
2 bob 75
3 carol 88

# Join on different fields with a custom delimiter
join -t, -1 2 -2 1 data.csv lookup.csv

# Join employee and department data
join employees.txt departments.txt
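
Both inputs must already be sorted on the join field, and the -1/-2 example above shows no output. The following sketch covers both points, assuming POSIX join semantics (the join field is printed first, followed by the remaining fields of each file); the file contents and the temporary directory are hypothetical.

```shell
tmp=$(mktemp -d)

# Unsorted inputs: sort each on the join field before joining.
printf '3 carol\n1 alice\n2 bob\n' > "$tmp/names.txt"
printf '2 75\n3 88\n1 90\n' > "$tmp/scores.txt"
sort "$tmp/names.txt" > "$tmp/names.sorted"
sort "$tmp/scores.txt" > "$tmp/scores.sorted"
by_id=$(join "$tmp/names.sorted" "$tmp/scores.sorted")

# Join field 2 of the first file against field 1 of the second, comma-delimited.
printf 'alice,1\nbob,2\n' > "$tmp/data.csv"
printf '1,admin\n2,user\n' > "$tmp/lookup.csv"
by_role=$(join -t, -1 2 -2 1 "$tmp/data.csv" "$tmp/lookup.csv")

rm -rf "$tmp"
printf '%s\n' "$by_id" "$by_role"
```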

paste

Merge lines from multiple files side by side.

paste [OPTIONS] FILE1 [FILE2...]

Flag       Description
-d DELIM   Use DELIM as the output delimiter (default: tab)
-s         Serial mode: paste each file as a single line

Output: TextOutput objects (one per output line).

# Merge two files side by side (tab-separated)
# names.txt: "alice\nbob\ncarol"
# ages.txt: "30\n25\n28"
paste names.txt ages.txt
alice	30
bob	25
carol	28

# Use comma as delimiter
paste -d, names.txt ages.txt
alice,30
bob,25
carol,28

# Serial mode: each file becomes one line
paste -s names.txt
alice	bob	carol
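
Serial mode combines with -d, which is a common way to turn a column of values into one delimited row. A minimal sketch, assuming POSIX paste behavior; the files are created in a temporary directory and their contents are illustrative.

```shell
tmp=$(mktemp -d)
printf 'alice\nbob\ncarol\n' > "$tmp/names.txt"
printf '30\n25\n28\n' > "$tmp/ages.txt"

# One file in serial mode: all of its lines joined by commas.
csv_row=$(paste -s -d, "$tmp/names.txt")

# Two files in serial mode: one output line per input file.
both=$(paste -s -d, "$tmp/names.txt" "$tmp/ages.txt")

rm -rf "$tmp"
printf '%s\n' "$csv_row" "$both"
```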

comm

Compare two sorted files line by line. Output is displayed in three columns: lines only in file 1, lines only in file 2, and lines common to both.

comm [OPTIONS] FILE1 FILE2

Flag   Description
-1     Suppress column 1 (lines only in file 1)
-2     Suppress column 2 (lines only in file 2)
-3     Suppress column 3 (lines common to both)

Output: TextOutput objects (one per output line). Columns are tab-indented.

# Show all three columns
# file1.txt (sorted): "apple\nbanana\ncherry"
# file2.txt (sorted): "banana\ncherry\ndate"
comm file1.txt file2.txt
apple
		banana
		cherry
	date

# Show only lines common to both files
comm -12 file1.txt file2.txt
banana
cherry

# Show lines unique to the first file
comm -23 file1.txt file2.txt
apple
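
Taken together, the three suppression flags give intersection and difference over two line sets. The sketch below assumes POSIX comm and sort and a plain POSIX shell; the file names and contents are hypothetical, and both inputs are sorted first because comm expects sorted files.

```shell
tmp=$(mktemp -d)
printf 'cherry\napple\nbanana\n' | sort > "$tmp/a.txt"
printf 'date\nbanana\ncherry\n' | sort > "$tmp/b.txt"

# Intersection: suppress columns 1 and 2, keeping only common lines.
common=$(comm -12 "$tmp/a.txt" "$tmp/b.txt")

# Difference a minus b: suppress columns 2 and 3.
only_a=$(comm -23 "$tmp/a.txt" "$tmp/b.txt")

# Difference b minus a: suppress columns 1 and 3.
only_b=$(comm -13 "$tmp/a.txt" "$tmp/b.txt")

rm -rf "$tmp"
printf '%s\n' "$common" "$only_a" "$only_b"
```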

diff

Compare two files and show the differences.

diff [OPTIONS] FILE1 FILE2

Flag   Description
-u     Unified diff format (shows context lines with +/- markers)

Output: TextOutput objects (one per output line). Produces no output if the files are identical.

# Normal diff format
# old.txt: "alpha\nbeta\ngamma"
# new.txt: "alpha\nBETA\ngamma\ndelta"
diff old.txt new.txt
2c2
< beta
---
> BETA
3a4
> delta

# Unified diff format
diff -u old.txt new.txt
--- old.txt
+++ new.txt
@@ -1,3 +1,4 @@
 alpha
-beta
+BETA
 gamma
+delta

# Check if two files are identical (no output means identical)
diff config.txt config.bak
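
In a script, the identity check is usually done on the exit status instead of by inspecting output. The sketch below assumes this diff follows the POSIX convention (exit 0 when the files are identical, 1 when they differ), which the text above does not state; the file names and contents are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/a.txt"
cp "$tmp/a.txt" "$tmp/b.txt"
printf 'alpha\nBETA\n' > "$tmp/c.txt"

# Identical files: assumed exit status 0, and no output.
if diff "$tmp/a.txt" "$tmp/b.txt" > /dev/null; then same_ab=yes; else same_ab=no; fi

# Differing files: assumed exit status 1.
if diff "$tmp/a.txt" "$tmp/c.txt" > /dev/null; then same_ac=yes; else same_ac=no; fi

# Capture the normal-format report; "|| true" keeps a non-zero status from aborting a strict script.
changes=$(diff "$tmp/a.txt" "$tmp/c.txt" || true)

rm -rf "$tmp"
printf '%s\n' "$same_ab" "$same_ac" "$changes"
```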