Text Processing

Commands for transforming, filtering, sorting, and comparing text. Most text processing commands produce TextOutput objects. The notable exception is sort, which is a pipeline bridge that preserves the original typed objects passing through it.

sed

Stream editor for filtering and transforming text line by line.

sed [OPTIONS] EXPRESSION [FILE...]

Flags

Flag	Description
`-n`	Suppress default output; only print when explicitly requested with `p`
`-i`	Edit files in place
`-E`	Use extended regular expressions (unescaped `+`, `?`, `|`, `()`, `{}`)
`-e EXPR`	Add a sed expression (use multiple `-e` for chained edits)

Supported commands

Command	Syntax	Description
Substitute	`s/pattern/replacement/`	Replace first match on each line
Substitute global	`s/pattern/replacement/g`	Replace all matches on each line
Delete	`d`	Delete the line
Print	`p`	Print the line (useful with `-n`)
Transliterate	`y/src/dst/`	Replace each character in src with the corresponding character in dst
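The Python sketch below is a rough model of how the three editing commands differ. It is not this shell's implementation. `s///` maps to `re.sub` with a replacement count, and `y///` maps to `str.translate`. Keep in mind that Python's regex syntax is not sed's. Also, in standard sed an `&` in the replacement stands for the whole match, but `re.sub` gives `&` no special meaning. The function names are hypothetical.

```python
import re


def sed_substitute(line: str, pattern: str, replacement: str, all_matches: bool = False) -> str:
    """Model `s/pattern/replacement/` (first match) and `s/.../.../g` (all matches)."""
    # count=1 stops after the first match; count=0 means "replace every match"
    return re.sub(pattern, replacement, line, count=0 if all_matches else 1)


def sed_transliterate(line: str, src: str, dst: str) -> str:
    """Model `y/src/dst/`: character i of src becomes character i of dst."""
    if len(src) != len(dst):
        raise ValueError("y/src/dst/ needs src and dst of equal length")
    return line.translate(str.maketrans(src, dst))


print(sed_substitute("foo foo foo", "foo", "bar"))                    # bar foo foo
print(sed_substitute("foo foo foo", "foo", "bar", all_matches=True))  # bar bar bar
print(sed_transliterate("hello", "helo", "HELO"))                     # HELLO
```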

Address ranges

Addresses restrict which lines a command applies to.

Address	Example	Description
Line number	`3d`	Apply to line 3 only
Range	`2,5d`	Apply to lines 2 through 5
Range to end	`3,$d`	Apply from line 3 to end of input
Regex	`/error/d`	Apply to lines matching the pattern
Regex range	`/start/,/end/d`	Apply from first match of start through first match of end
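The regex range is the least obvious row above. The Python sketch below models `sed '/start/,/end/d'` under POSIX sed rules, and whether this shell follows them exactly is an assumption. Under those rules the end pattern is first tested on the line after the one that opened the range, and a later start match opens a new range. `delete_regex_range` is a hypothetical name.

```python
import re


def delete_regex_range(lines, start, end):
    """Model sed '/start/,/end/d' with POSIX range rules."""
    start_re, end_re = re.compile(start), re.compile(end)
    kept, in_range = [], False
    for line in lines:
        if not in_range:
            if start_re.search(line):
                in_range = True      # opening line is deleted; end is not tested on it
            else:
                kept.append(line)    # outside any range: passes through
        elif end_re.search(line):
            in_range = False         # closing line is deleted too
        # any other line inside the range is deleted
    return kept


log = ["boot", "BEGIN trace", "detail 1", "detail 2", "END trace", "shutdown"]
print(delete_regex_range(log, "BEGIN", "END"))  # ['boot', 'shutdown']
```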

Return type

TextOutput objects (one per output line).

Examples

# Replace first occurrence of 'foo' with 'bar' on each line
echo "foo foo foo" | sed 's/foo/bar/'

bar foo foo

# Global replacement
echo "foo foo foo" | sed 's/foo/bar/g'

bar bar bar

# Delete lines containing 'debug'
cat log.txt | sed '/debug/d'

# Print only lines matching a pattern (suppress default output)
cat data.txt | sed -n '/ERROR/p'

# Edit a file in place
sed -i 's/oldname/newname/g' config.txt

# Multiple expressions with -e
cat input.txt | sed -e 's/alpha/ALPHA/g' -e '/^$/d'

# Transliterate characters (map h, e, l, o to uppercase)
echo "hello" | sed 'y/helo/HELO/'

HELLO

awk

Pattern-scanning and processing language. Each input line is split into fields that you can reference by position.

awk [OPTIONS] 'PROGRAM' [FILE...]

Flags

Flag	Description
`-F SEP`	Set field separator (default: whitespace)
`-v VAR=VAL`	Set a variable before execution begins

Field access

Reference	Description
`$0`	The entire line
`$1`, `$2`, …	Individual fields (1-indexed)
`$NF`	The last field
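The Python sketch below models field splitting for the default separator and for a single-character `-F` separator. `awk_split` and `awk_field` are hypothetical names, and regex separators are not modeled. Python's `str.split()` treats any Unicode whitespace as a blank, so it matches awk only for ordinary spaces, tabs, and newlines.

```python
def awk_split(line, fs=None):
    """Split a record into fields for the two simplest FS settings."""
    if fs is None or fs == " ":
        # Default FS: runs of blanks separate fields; leading/trailing blanks are ignored
        return line.split()
    # A single non-space character is matched literally (regex FS is not modeled)
    return line.split(fs)


def awk_field(line, n, fs=None):
    """Return $n; $0 is the whole record and fields past NF are empty strings."""
    if n == 0:
        return line
    fields = awk_split(line, fs)
    return fields[n - 1] if n <= len(fields) else ""


record = "root:x:0:0:root:/root:/bin/sh"
nf = len(awk_split(record, ":"))
print(nf, awk_field(record, 1, ":"), awk_field(record, nf, ":"))  # 7 root /bin/sh
```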

Built-in variables

Variable	Description
`NR`	Number of lines read so far (the current line number; in `END`, the total)
`NF`	Number of fields on the current line
`FS`	Input field separator
`OFS`	Output field separator (default: space)

Program structure

An awk program consists of pattern/action pairs. The action runs for each line where the pattern matches.

Pattern	Example	Description
(none)	`{ print $1 }`	Runs on every line
`BEGIN`	`BEGIN { OFS="," }`	Runs once before input
`END`	`END { print NR }`	Runs once after all input
`/regex/`	`/error/ { print }`	Lines matching the regex
Expression	`$3 > 100 { print $1 }`	Lines where the expression is true
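The evaluation order can be modeled in a few lines of Python. `BEGIN` runs once. Every pattern/action rule is then tried against every record, in program order. Finally `END` runs once, with `NR` holding the total. `run_awk` is a hypothetical helper. Its patterns and actions are Python callables rather than awk source, and it models numeric comparison with `float()`.

```python
def run_awk(lines, rules, begin=None, end=None):
    """Model awk's control flow: BEGIN once, every rule on every record, END once.

    rules is a list of (pattern, action) pairs. pattern is None (matches every
    record) or a predicate called with (fields, env); action is called the same way.
    fields is a 0-indexed Python list, so fields[0] plays the role of $1.
    """
    env = {"NR": 0, "out": []}
    if begin:
        begin(env)                      # BEGIN: runs before any input is read
    for line in lines:
        env["NR"] += 1                  # NR counts records read so far
        fields = line.split()           # default FS: runs of whitespace
        env["NF"] = len(fields)
        for pattern, action in rules:   # rules are tried in program order
            if pattern is None or pattern(fields, env):
                action(fields, env)
    if end:
        end(env)                        # END: NR now holds the total record count
    return env["out"]


# Roughly: awk '$2 > 80 { print $1 } END { print NR }'
rules = [(lambda f, env: float(f[1]) > 80, lambda f, env: env["out"].append(f[0]))]
print(run_awk(["alice 90", "bob 75", "carol 88"], rules,
              end=lambda env: env["out"].append(str(env["NR"]))))
```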

String functions

Function	Description
`length(s)`	Length of string s (or `$0` if omitted)
`substr(s, start)`	Substring from position start (1-based)
`substr(s, start, len)`	Substring of length len from start
`tolower(s)`	Convert to lowercase
`toupper(s)`	Convert to uppercase
`gsub(/re/, repl)`	Replace all matches in `$0`
`sub(/re/, repl)`	Replace first match in `$0`

Output functions

Function	Description
`print expr, ...`	Print values separated by OFS
`printf "fmt", ...`	Formatted output (C-style `%s`, `%d`, `%f`)

Return type

TextOutput objects (one per output line).

Examples

# Print the second field of each line
echo "alice 90" | awk '{ print $2 }'

90

# Set field separator to colon
cat /etc/passwd | awk -F: '{ print $1 }'

# Filter lines where column 3 exceeds a threshold
ps aux | awk '$3 > 5.0 { print $1, $11, $3 }'

# Use BEGIN/END blocks
cat data.csv | awk -F, 'BEGIN { sum=0 } { sum+=$2 } END { print "Total:", sum }'

# Printf for formatted output
echo "hello world" | awk '{ printf "%s has %d chars\n", $1, length($1) }'

hello has 5 chars

# Pattern matching with regex
cat log.txt | awk '/ERROR/ { print NR, $0 }'

cut

Extract selected fields or character positions from each line.

cut [OPTIONS] [FILE...]

Flags

Flag	Description
`-d DELIM`	Set field delimiter (default: tab)
`-f FIELDS`	Select fields by number (e.g. `1`, `1,3`, `1-3`)
`-c CHARS`	Select characters by position (e.g. `1-5`, `2,4`)

Field/character specifications

Spec	Description
`N`	Single field or character position
`N,M`	Multiple specific positions
`N-M`	Range from N to M (inclusive)
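The Python sketch below shows how a field or character list is interpreted. It assumes only the `N`, `N,M`, and `N-M` forms from the table. Note that cut outputs selections in input order, so `-f3,1` prints field 1 before field 3. The model does not cover GNU cut's handling of lines that contain no delimiter, which are printed unchanged unless `-s` is given. All names are hypothetical.

```python
def parse_list(spec):
    """Expand a cut list such as '1,3' or '2-4' into sorted 1-based positions."""
    positions = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")  # open-ended forms like '3-' are not handled
            positions.update(range(int(lo), int(hi) + 1))
        else:
            positions.add(int(part))
    return sorted(positions)  # cut emits selections in input order, not spec order


def cut_fields(line, spec, delim="\t"):
    fields = line.split(delim)
    return delim.join(fields[p - 1] for p in parse_list(spec) if p <= len(fields))


def cut_chars(line, spec):
    return "".join(line[p - 1] for p in parse_list(spec) if p <= len(line))


print(cut_fields("one:two:three:four", "1,3", ":"))  # one:three
print(cut_fields("a:b:c:d:e", "2-4", ":"))           # b:c:d
print(cut_chars("abcdefgh", "1-4"))                  # abcd
```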

Return type

TextOutput objects (one per output line).

Examples

# Extract second field from colon-delimited input
echo "alice:90:A" | cut -d: -f2

90

# Extract multiple fields
echo "one:two:three:four" | cut -d: -f1,3

one:three

# Extract a range of fields
echo "a:b:c:d:e" | cut -d: -f2-4

b:c:d

# Extract characters by position
echo "abcdefgh" | cut -c1-4

abcd

tr

Translate, squeeze, or delete characters.

pipeline | tr [OPTIONS] SET1 [SET2]

Flags

Flag	Description
`-d`	Delete characters in SET1
`-s`	Squeeze repeated characters in SET1 (or SET2 when translating)

Character sets

A range written as `X-Y` expands to every character from X through Y, inclusive. Common ranges:

Set	Expands to
`a-z`	All lowercase letters
`A-Z`	All uppercase letters
`0-9`	All digits
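The Python sketch below models translate, delete, and squeeze, including `X-Y` range expansion. It makes several simplifying assumptions:

- SET1 and SET2 must expand to the same length. GNU tr instead pads a shorter SET2 with its last character.
- Escapes and character classes such as `[:digit:]` are not handled.
- `-d` combined with `-s` is not modeled.

The names are hypothetical.

```python
def expand(spec):
    """Expand ranges such as 'a-z' into the full run of characters."""
    out, i = [], 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            out.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            out.append(spec[i])  # a lone or trailing '-' is literal
            i += 1
    return "".join(out)


def tr(text, set1, set2=None, delete=False, squeeze=False):
    s1 = expand(set1)
    if delete:
        return "".join(ch for ch in text if ch not in s1)  # -d: drop SET1 characters
    squeeze_set = s1
    if set2 is not None:
        s2 = expand(set2)
        text = text.translate(str.maketrans(s1, s2))  # raises ValueError if lengths differ
        squeeze_set = s2  # when translating, -s squeezes characters of SET2
    if squeeze:
        out = []
        for ch in text:
            if out and ch == out[-1] and ch in squeeze_set:
                continue  # collapse a run of the same squeezable character
            out.append(ch)
        text = "".join(out)
    return text


print(tr("hello world", "a-z", "A-Z"))           # HELLO WORLD
print(tr("abc123def", "0-9", delete=True))       # abcdef
print(tr("hello    world", " ", squeeze=True))   # hello world
print(tr("aabbcc", "abc", "xyz", squeeze=True))  # xyz
```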

Return type

TextOutput objects (one per output line).

Examples

# Convert lowercase to uppercase
echo "hello world" | tr 'a-z' 'A-Z'

HELLO WORLD

# Delete digits
echo "abc123def" | tr -d '0-9'

abcdef

# Squeeze repeated spaces
echo "hello    world" | tr -s ' '

hello world

# Translate and squeeze
echo "aabbcc" | tr -s 'abc' 'xyz'

xyz

uniq

Filter adjacent duplicate lines.

uniq [OPTIONS] [FILE...]

Flags

Flag	Description
`-c`	Prefix each line with its count of consecutive occurrences
`-d`	Only print lines that are repeated
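uniq compares each line only with its neighbour, so it maps directly onto `itertools.groupby`, which groups consecutive equal items. This is also why duplicates that are not adjacent need a `sort` first. The Python sketch below models `-c` and `-d`. It assumes the count is right-aligned in seven columns, as in the `-c` example. `uniq` here is a hypothetical function name.

```python
from itertools import groupby


def uniq(lines, count=False, repeated=False):
    out = []
    for line, run in groupby(lines):      # groupby only merges adjacent equal lines
        n = sum(1 for _ in run)
        if repeated and n < 2:
            continue                      # -d: skip lines that occur only once in a row
        out.append(f"{n:7d} {line}" if count else line)  # -c: count right-aligned in 7 columns
    return out


print(uniq(["apple", "apple", "banana", "banana", "apple"]))  # ['apple', 'banana', 'apple']
print(uniq(["a", "a", "b", "a", "a", "a"], count=True))
print(uniq(["x", "y", "y", "z"], repeated=True))              # ['y']
```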

Return type

TextOutput objects (one per output line).

Examples

# Remove consecutive duplicates
echo "apple" "apple" "banana" "banana" "apple" | uniq

apple
banana
apple

# Count occurrences
echo "a" "a" "b" "a" "a" "a" | uniq -c

      2 a
      1 b
      3 a

# Show only repeated lines
echo "x" "y" "y" "z" | uniq -d

y

sort

Sort lines of text. This is a pipeline bridge command: it sorts by the BashText representation but preserves the original typed objects.

sort [OPTIONS] [FILE...]

Flags

Flag	Description
`-r`	Reverse the sort order
`-n`	Numeric sort
`-u`	Remove duplicates (unique output)
`-f`	Case-insensitive sort
`-k N`	Sort by field N (1-based)
`-t SEP`	Use SEP as the field delimiter
`-h`	Human-numeric sort (e.g. `2K`, `1G`)
`-V`	Version-number sort (e.g. `1.2`, `1.10`)
`-M`	Month sort (e.g. `Jan`, `Feb`)
`-c`	Check whether input is already sorted (sets exit code)

Return type

Original pipeline objects are passed through unchanged. When reading from files, TextOutput objects are produced.
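The Python sketch below makes the bridge behavior concrete, under stated assumptions. `Proc` and its `text()` method are hypothetical stand-ins for a typed pipeline object and its BashText representation. The sort key is computed from text, but `sorted()` reorders the original objects, so fields such as `cpu` keep their real types afterwards.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical typed pipeline object; field names are illustrative only."""
    user: str
    cpu: float
    command: str

    def text(self) -> str:
        # Hypothetical stand-in for the object's BashText representation
        return f"{self.user} {self.cpu} {self.command}"


def numeric_key(token: str) -> float:
    # Simplification: sort -n reads a leading number; here anything not fully numeric sorts as 0
    try:
        return float(token)
    except ValueError:
        return 0.0


def bridge_sort(objs, field=None, numeric=False, reverse=False):
    """Sort objects by (a field of) their text form, returning the objects themselves."""
    def key(obj):
        text = obj.text()
        if field is not None:
            fields = text.split()
            text = fields[field - 1] if field <= len(fields) else ""
        return numeric_key(text) if numeric else text
    # The text is only a sort key; sorted() returns the original objects
    return sorted(objs, key=key, reverse=reverse)


procs = [Proc("root", 0.5, "init"), Proc("alice", 12.0, "python"), Proc("bob", 3.2, "vim")]
top = bridge_sort(procs, field=2, numeric=True, reverse=True)  # like: sort -k2 -rn
print(top[0].command, top[0].cpu)  # python 12.0
```

Python's `sorted()` is stable, so ties keep their input order. GNU sort instead breaks ties by comparing whole lines unless `-s` is given. This shell's tie-breaking rule is not stated above.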

Examples

# Basic alphabetical sort
echo "banana" "apple" "cherry" | sort

apple
banana
cherry

# Numeric sort in reverse
echo "10" "2" "30" "1" | sort -rn

30
10
2
1

# Sort by a specific field
echo "alice 90" "bob 75" "carol 88" | sort -k2 -n

bob 75
carol 88
alice 90

# Pipeline bridge: typed objects survive sorting
$procs = ps aux | sort -k3 -rn | head 5
$procs[0].CPU      # Real decimal value
$procs[0].Command  # Process name

# Sort file sizes (human-numeric)
ls -lh | sort -k5 -h

# Unique lines only
echo "a" "b" "a" "c" "b" | sort -u

a
b
c

column

Format input into aligned columns.

column [OPTIONS] [FILE...]

Flags

Flag	Description
`-t`	Table mode: detect columns and align them
`-s SEP`	Use SEP as the input column separator (used with `-t`)
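The Python sketch below is a minimal model of `column -t`. It assumes each column is padded to its widest cell and columns are separated by two spaces, which reproduces the example output below. `column_t` is a hypothetical name, and multi-character `-s` separators are not modeled.

```python
def column_t(lines, sep=None):
    """Align cells into columns: pad each to its column's widest cell, join with two spaces."""
    rows = [line.split(sep) if sep else line.split() for line in lines]
    ncols = max((len(r) for r in rows), default=0)
    widths = [max(len(r[c]) for r in rows if c < len(r)) for c in range(ncols)]
    out = []
    for r in rows:
        # Pad every cell except the last, so lines carry no trailing spaces
        cells = [cell.ljust(widths[c]) for c, cell in enumerate(r[:-1])]
        out.append("  ".join(cells + r[-1:]))
    return out


for line in column_t(["name age city", "alice 30 portland", "bob 25 seattle"]):
    print(line)
```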

Return type

TextOutput objects (one per output line).

Examples

# Align whitespace-delimited data into columns
echo "name age city" "alice 30 portland" "bob 25 seattle" | column -t

name   age  city
alice  30   portland
bob    25   seattle

# Use a custom separator
echo "alice:30:portland" "bob:25:seattle" | column -t -s:

alice  30  portland
bob    25  seattle

# Format command output into a table
mount | column -t

join

Join two sorted files on a common field, like a relational inner join: only lines whose join field appears in both files are output.

join [OPTIONS] FILE1 FILE2

Flags

Flag	Description
`-t SEP`	Use SEP as the field delimiter (default: space)
`-1 N`	Join on field N of the first file (default: 1)
`-2 N`	Join on field N of the second file (default: 1)
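join requires sorted input because it walks both files in step, like a merge. The Python sketch below models that inner merge join. It assumes keys compare as plain strings (real join uses the locale's collation) and that every line has the key field. `join_sorted` is hypothetical. Output follows the POSIX layout: the join field, then the remaining fields of file 1, then those of file 2.

```python
def join_sorted(left, right, f1=1, f2=1, sep=None):
    """Inner merge join of two lists of lines already sorted on their key fields."""
    out_sep = sep if sep is not None else " "
    a = [line.split(sep) for line in left]
    b = [line.split(sep) for line in right]
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        ka, kb = a[i][f1 - 1], b[j][f2 - 1]
        if ka < kb:
            i += 1          # key only in the first file: unpaired, dropped
        elif ka > kb:
            j += 1          # key only in the second file: unpaired, dropped
        else:
            # Find the run of lines sharing this key on each side
            i_end = i
            while i_end < len(a) and a[i_end][f1 - 1] == ka:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][f2 - 1] == ka:
                j_end += 1
            # Emit every pairing: key, rest of the file 1 line, rest of the file 2 line
            for ra in a[i:i_end]:
                for rb in b[j:j_end]:
                    rest = ra[:f1 - 1] + ra[f1:] + rb[:f2 - 1] + rb[f2:]
                    out.append(out_sep.join([ka] + rest))
            i, j = i_end, j_end
    return out


print(join_sorted(["1 alice", "2 bob", "3 carol"], ["1 90", "2 75", "3 88"]))
# ['1 alice 90', '2 bob 75', '3 carol 88']
```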

Return type

TextOutput objects (one per matched pair).

Examples

# Join two files on the first field
# names.txt: "1 alice\n2 bob\n3 carol"
# scores.txt: "1 90\n2 75\n3 88"
join names.txt scores.txt

1 alice 90
2 bob 75
3 carol 88

# Join on different fields with a custom delimiter
join -t, -1 2 -2 1 data.csv lookup.csv

# Join employee and department data
join employees.txt departments.txt

paste

Merge lines from multiple files side by side.

paste [OPTIONS] FILE1 [FILE2...]

Flags

Flag	Description
`-d DELIM`	Use DELIM as the output delimiter (default: tab)
`-s`	Serial mode: paste each file as a single line
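The Python sketch below models both modes using `itertools.zip_longest`. It assumes a single-character delimiter, whereas real paste cycles through a multi-character `-d` list. When files have different lengths, the exhausted ones contribute empty fields, so the delimiter still appears. `paste` here is a hypothetical function name.

```python
from itertools import zip_longest


def paste(files, delim="\t", serial=False):
    """files is a list of files, each given as a list of lines."""
    if serial:
        # -s: every file becomes one output line
        return [delim.join(lines) for lines in files]
    # Default: line i of each file, side by side; exhausted files yield ""
    return [delim.join(row) for row in zip_longest(*files, fillvalue="")]


names = ["alice", "bob", "carol"]
ages = ["30", "25", "28"]
print(paste([names, ages], delim=","))  # ['alice,30', 'bob,25', 'carol,28']
print(paste([names], serial=True))      # ['alice\tbob\tcarol']
print(paste([names, ["30"]]))           # ['alice\t30', 'bob\t', 'carol\t']
```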

Return type

TextOutput objects (one per output line).

Examples

# Merge two files side by side (tab-separated)
# names.txt: "alice\nbob\ncarol"
# ages.txt: "30\n25\n28"
paste names.txt ages.txt

alice  30
bob  25
carol  28

# Use comma as delimiter
paste -d, names.txt ages.txt

alice,30
bob,25
carol,28

# Serial mode: each file becomes one line
paste -s names.txt

alice  bob  carol

comm

Compare two sorted files line by line. Output is displayed in three columns: lines only in file 1, lines only in file 2, and lines common to both.

comm [OPTIONS] FILE1 FILE2

Flags

Flag	Description
`-1`	Suppress column 1 (lines only in file 1)
`-2`	Suppress column 2 (lines only in file 2)
`-3`	Suppress column 3 (lines common to both)

Return type

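TextOutput objects (one per output line). Column 2 lines are indented by one tab and column 3 lines by two; each suppressed column to the left removes one level of indentation.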
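Because both inputs are sorted, comm can classify every line in a single merge pass. The Python sketch below models that pass, using plain string comparison in place of the locale's collation order. `comm` here is a hypothetical function name. The sketch also shows how suppressed columns drop their indentation.

```python
def comm(file1, file2, suppress=()):
    """Classify lines of two sorted lists into comm's three columns."""
    out = []

    def emit(col, line):
        if col in suppress:
            return
        # One tab per visible column to the left of this one
        indent = sum(1 for c in range(1, col) if c not in suppress)
        out.append("\t" * indent + line)

    i = j = 0
    while i < len(file1) or j < len(file2):
        if j == len(file2) or (i < len(file1) and file1[i] < file2[j]):
            emit(1, file1[i])   # only in file 1
            i += 1
        elif i == len(file1) or file2[j] < file1[i]:
            emit(2, file2[j])   # only in file 2
            j += 1
        else:
            emit(3, file1[i])   # in both
            i += 1
            j += 1
    return out


a = ["apple", "banana", "cherry"]
b = ["banana", "cherry", "date"]
print(comm(a, b))                   # ['apple', '\t\tbanana', '\t\tcherry', '\tdate']
print(comm(a, b, suppress=(1, 2)))  # ['banana', 'cherry']
```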

Examples

# Show all three columns
# file1.txt (sorted): "apple\nbanana\ncherry"
# file2.txt (sorted): "banana\ncherry\ndate"
comm file1.txt file2.txt

apple
    banana
    cherry
  date

# Show only lines common to both files
comm -12 file1.txt file2.txt

banana
cherry

# Show lines unique to the first file
comm -23 file1.txt file2.txt

apple

diff

Compare two files and show the differences.

diff [OPTIONS] FILE1 FILE2

Flags

Flag	Description
`-u`	Unified diff format (shows context lines with `+`/`-` markers)

Return type

TextOutput objects (one per output line). Produces no output if the files are identical.
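For these two example inputs, Python's standard `difflib.unified_diff` produces the same unified hunk as the `-u` example, which makes it handy for checking expected output in tests. Like the example, it prints no timestamps on the header lines. difflib uses its own matching heuristic rather than diff's algorithm, so on other inputs the hunks can differ.

```python
import difflib

old = ["alpha", "beta", "gamma"]
new = ["alpha", "BETA", "gamma", "delta"]

# lineterm="" because these lines carry no trailing newlines
patch = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt", lineterm=""))
print("\n".join(patch))

# Identical inputs produce no output at all, as with diff
print(list(difflib.unified_diff(old, old, lineterm="")))  # []
```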

Examples

# Normal diff format
# old.txt: "alpha\nbeta\ngamma"
# new.txt: "alpha\nBETA\ngamma\ndelta"
diff old.txt new.txt

2c2
< beta
---
> BETA
3a4
> delta

# Unified diff format
diff -u old.txt new.txt

--- old.txt
+++ new.txt
@@ -1,3 +1,4 @@
 alpha
-beta
+BETA
 gamma
+delta

# Check if two files are identical (no output means identical)
diff config.txt config.bak