Text Processing
Commands for transforming, filtering, sorting, and comparing text. Most text processing commands produce TextOutput objects. The notable exception is sort, which is a pipeline bridge that preserves the original typed objects passing through it.
sed
Stream editor for filtering and transforming text line by line.

    sed [OPTIONS] EXPRESSION [FILE...]

| Flag | Description |
|---|---|
| `-n` | Suppress default output; print only when explicitly requested with `p` |
| `-i` | Edit files in place |
| `-E` | Use extended regular expressions (unescaped `+`, `?`, `\|`, `()`, `{}`) |
| `-e EXPR` | Add a sed expression (use multiple `-e` for chained edits) |
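
As a small illustration of `-E` and `-n`, the sketch below assumes this shell's sed follows standard sed semantics for these two flags. The input strings are made up for the example, and `printf` is used to emit one line per argument so the snippet also runs in a stock POSIX shell.

```shell
# Hypothetical input; with -E the + quantifier needs no backslash.
masked=$(echo "order 12345 shipped" | sed -E 's/[0-9]+/N/')

# -n suppresses the default output, so only lines selected by p are printed.
errors=$(printf '%s\n' 'INFO ok' 'ERROR disk' 'INFO done' | sed -n '/ERROR/p')

echo "$masked"   # order N shipped
echo "$errors"   # ERROR disk
```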
Supported commands

| Command | Syntax | Description |
|---|---|---|
| Substitute | `s/pattern/replacement/` | Replace first match on each line |
| Substitute global | `s/pattern/replacement/g` | Replace all matches on each line |
| Delete | `d` | Delete the line |
| Print | `p` | Print the line (useful with `-n`) |
| Transliterate | `y/src/dst/` | Replace each character in `src` with the corresponding character in `dst` |
Address ranges
Addresses restrict which lines a command applies to.

| Address | Example | Description |
|---|---|---|
| Line number | `3d` | Apply to line 3 only |
| Range | `2,5d` | Apply to lines 2 through 5 |
| Range to end | `3,$d` | Apply from line 3 to end of input |
| Regex | `/error/d` | Apply to lines matching the pattern |
| Regex range | `/start/,/end/d` | Apply from first match of `start` through first match of `end` |
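
The address forms in the table can be tried on a tiny input. This is a minimal sketch, assuming the shell's sed matches standard sed behavior for addresses; the five-line input is invented for the example, and `printf` stands in for multi-argument `echo` so the snippet runs in a stock POSIX shell too.

```shell
# Hypothetical five-line input, one word per line.
input=$(printf '%s\n' one two three four five)

# Line range: print only lines 2 through 4.
range=$(printf '%s\n' "$input" | sed -n '2,4p')

# Regex range: delete from the line matching 'two' through the line matching 'four'.
kept=$(printf '%s\n' "$input" | sed '/two/,/four/d')

# Range to end: delete from line 3 to the end of input.
first_two=$(printf '%s\n' "$input" | sed '3,$d')

printf '%s\n' "$range"      # two, three, four
printf '%s\n' "$kept"       # one, five
printf '%s\n' "$first_two"  # one, two
```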
Return type
TextOutput objects (one per output line).

Examples

    # Replace first occurrence of 'foo' with 'bar' on each line
    echo "foo foo foo" | sed 's/foo/bar/'
    bar foo foo

    # Global replacement
    echo "foo foo foo" | sed 's/foo/bar/g'
    bar bar bar

    # Delete lines containing 'debug'
    cat log.txt | sed '/debug/d'

    # Print only lines matching a pattern (suppress default output)
    cat data.txt | sed -n '/ERROR/p'

    # Edit a file in place
    sed -i 's/oldname/newname/g' config.txt

    # Multiple expressions with -e
    cat input.txt | sed -e 's/alpha/ALPHA/g' -e '/^$/d'

    # Transliterate characters (map h, e, l, o to their uppercase forms)
    echo "hello" | sed 'y/helo/HELO/'
    HELLO

awk
Pattern-scanning and processing language. Each input line is split into fields that you can reference by position.

    awk [OPTIONS] 'PROGRAM' [FILE...]

| Flag | Description |
|---|---|
| `-F SEP` | Set field separator (default: whitespace) |
| `-v VAR=VAL` | Set a variable before execution begins |
Field access

| Reference | Description |
|---|---|
| `$0` | The entire line |
| `$1`, `$2`, … | Individual fields (1-indexed) |
| `$NF` | The last field |
Built-in variables

| Variable | Description |
|---|---|
| `NR` | Current line number (1-based) |
| `NF` | Number of fields on the current line |
| `FS` | Input field separator |
| `OFS` | Output field separator (default: space) |
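
To make the built-in variables concrete, here is a short sketch that assumes standard awk semantics for `NR`, `NF`, `$NF`, and `OFS`. The inputs are illustrative only, and `printf` is used so the lines arrive one per record in any POSIX shell.

```shell
# Print the line number, field count, and last field of each line.
summary=$(printf '%s\n' 'a b c' 'd e' | awk '{ print NR, NF, $NF }')

# -F sets the input separator; OFS controls what print puts between values.
rejoined=$(echo "x:y" | awk -F: 'BEGIN { OFS="-" } { print $1, $2 }')

printf '%s\n' "$summary"   # "1 3 c" then "2 2 e"
echo "$rejoined"           # x-y
```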
Program structure
An awk program consists of pattern/action pairs. The action runs for each line where the pattern matches.

| Pattern | Example | Description |
|---|---|---|
| (none) | `{ print $1 }` | Runs on every line |
| `BEGIN` | `BEGIN { OFS="," }` | Runs once before input |
| `END` | `END { print NR }` | Runs once after all input |
| `/regex/` | `/error/ { print }` | Lines matching the regex |
| Expression | `$3 > 100 { print $1 }` | Lines where the expression is true |
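
The pattern kinds can be combined in one program. The following is a sketch under the assumption that the shell's awk evaluates `BEGIN`, expression patterns, and `END` the way standard awk does; the names, scores, and the counter variable `n` are hypothetical.

```shell
# BEGIN sets the output separator, the expression pattern selects scores above 80,
# and END reports how many lines were selected.
report=$(printf '%s\n' 'alice 90' 'bob 75' 'carol 88' |
  awk 'BEGIN { OFS="," } $2 > 80 { print $1, $2; n += 1 } END { print "passed", n }')

printf '%s\n' "$report"
# alice,90
# carol,88
# passed,2
```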
String functions

| Function | Description |
|---|---|
| `length(s)` | Length of string `s` (or `$0` if omitted) |
| `substr(s, start)` | Substring from position `start` (1-based) |
| `substr(s, start, len)` | Substring of length `len` from `start` |
| `tolower(s)` | Convert to lowercase |
| `toupper(s)` | Convert to uppercase |
| `gsub(/re/, repl)` | Replace all matches in `$0` |
| `sub(/re/, repl)` | Replace first match in `$0` |
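
A few of these functions in action, as a minimal sketch that assumes they behave as in standard awk; the sample string and variable names are made up for illustration.

```shell
line="hello world"

# substr is 1-based: take the first five characters, then uppercase them.
upper=$(echo "$line" | awk '{ print toupper(substr($0, 1, 5)) }')

# Two-argument substr runs to the end of the string.
rest=$(echo "$line" | awk '{ print substr($0, 7) }')

# gsub replaces every match in $0; sub replaces only the first.
all=$(echo "$line" | awk '{ gsub(/o/, "0"); print }')
first=$(echo "$line" | awk '{ sub(/o/, "0"); print length($0), $0 }')

echo "$upper"   # HELLO
echo "$rest"    # world
echo "$all"     # hell0 w0rld
echo "$first"   # 11 hell0 world
```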
Output functions

| Function | Description |
|---|---|
| `print expr, ...` | Print values separated by `OFS` |
| `printf "fmt", ...` | Formatted output (C-style `%s`, `%d`, `%f`) |
Return type
TextOutput objects (one per output line).

Examples

    # Print the second field of each line
    echo "alice 90" | awk '{ print $2 }'
    90

    # Set field separator to colon
    cat /etc/passwd | awk -F: '{ print $1 }'

    # Filter lines where column 3 exceeds a threshold
    ps aux | awk '$3 > 5.0 { print $1, $11, $3 }'

    # Use BEGIN/END blocks
    cat data.csv | awk -F, 'BEGIN { sum=0 } { sum+=$2 } END { print "Total:", sum }'

    # Printf for formatted output
    echo "hello world" | awk '{ printf "%s has %d chars\n", $1, length($1) }'
    hello has 5 chars

    # Pattern matching with regex
    cat log.txt | awk '/ERROR/ { print NR, $0 }'

cut
Extract selected fields or character positions from each line.

    cut [OPTIONS] [FILE...]

| Flag | Description |
|---|---|
| `-d DELIM` | Set field delimiter (default: tab) |
| `-f FIELDS` | Select fields by number (e.g. `1`, `1,3`, `1-3`) |
| `-c CHARS` | Select characters by position (e.g. `1-5`, `2,4`) |
Field/character specifications

| Spec | Description |
|---|---|
| `N` | Single field or character position |
| `N,M` | Multiple specific positions |
| `N-M` | Range from N to M (inclusive) |
Return type
TextOutput objects (one per output line).

Examples

    # Extract second field from colon-delimited input
    echo "alice:90:A" | cut -d: -f2
    90

    # Extract multiple fields
    echo "one:two:three:four" | cut -d: -f1,3
    one:three

    # Extract a range of fields
    echo "a:b:c:d:e" | cut -d: -f2-4
    b:c:d

    # Extract characters by position
    echo "abcdefgh" | cut -c1-4
    abcd

tr
Translate, squeeze, or delete characters.

    pipeline | tr [OPTIONS] SET1 [SET2]

| Flag | Description |
|---|---|
| `-d` | Delete characters in SET1 |
| `-s` | Squeeze repeated characters in SET1 (or SET2 when translating) |
Character sets
Character ranges such as `a-z`, `A-Z`, and `0-9` are supported.

| Set | Expands to |
|---|---|
| `a-z` | All lowercase letters |
| `A-Z` | All uppercase letters |
| `0-9` | All digits |
Return type
TextOutput objects (one per output line).

Examples

    # Convert lowercase to uppercase
    echo "hello world" | tr 'a-z' 'A-Z'
    HELLO WORLD

    # Delete digits
    echo "abc123def" | tr -d '0-9'
    abcdef

    # Squeeze repeated spaces
    echo "hello   world" | tr -s ' '
    hello world

    # Translate and squeeze
    echo "aabbcc" | tr -s 'abc' 'xyz'
    xyz

uniq
Filter adjacent duplicate lines.

    uniq [OPTIONS] [FILE...]

| Flag | Description |
|---|---|
| `-c` | Prefix each line with its count of consecutive occurrences |
| `-d` | Only print lines that are repeated |
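
Because uniq compares only adjacent lines, duplicates that are scattered through the input are not detected unless the input is sorted first. The sketch below illustrates this, assuming standard `sort` and `uniq -d` behavior; the input letters are arbitrary, and `printf` is used so the snippet runs in a stock POSIX shell.

```shell
# Hypothetical input in which no two equal lines are adjacent.
input=$(printf '%s\n' a b a c b)

# Without sorting, uniq -d sees no adjacent repeats and prints nothing.
unsorted=$(printf '%s\n' "$input" | uniq -d)

# Sorting first brings equal lines together, so both repeated values are reported.
sorted=$(printf '%s\n' "$input" | sort | uniq -d)

echo "unsorted: [$unsorted]"   # unsorted: []
printf '%s\n' "$sorted"        # a, then b
```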
Return type
TextOutput objects (one per output line).

Examples

    # Remove consecutive duplicates
    echo "apple" "apple" "banana" "banana" "apple" | uniq
    apple
    banana
    apple

    # Count consecutive occurrences
    echo "a" "a" "b" "a" "a" "a" | uniq -c
    2 a
    1 b
    3 a

    # Show only repeated lines
    echo "x" "y" "y" "z" | uniq -d
    y

sort
Sort lines of text. This is a pipeline bridge command: it sorts by the BashText representation but preserves the original typed objects.

    sort [OPTIONS] [FILE...]

| Flag | Description |
|---|---|
| `-r` | Reverse the sort order |
| `-n` | Numeric sort |
| `-u` | Remove duplicates (unique output) |
| `-f` | Case-insensitive sort |
| `-k N` | Sort by field N (1-based) |
| `-t SEP` | Use SEP as the field delimiter |
| `-h` | Human-numeric sort (e.g. `2K`, `1G`) |
| `-V` | Version-number sort (e.g. `1.2`, `1.10`) |
| `-M` | Month sort (e.g. `Jan`, `Feb`) |
| `-c` | Check whether input is already sorted (sets exit code) |
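
The difference between the default comparison and `-n` is easiest to see on a delimited field. This is a sketch assuming standard sort semantics for `-t`, `-k`, `-n`, and `-r`; the service names and counts are invented, and `printf` replaces multi-argument `echo` so it runs in a stock POSIX shell.

```shell
# Hypothetical comma-separated records: name,count.
input=$(printf '%s\n' 'web,10' 'db,2' 'cache,30')

# Default comparison on field 2 is textual, so "10" sorts before "2".
textual=$(printf '%s\n' "$input" | sort -t, -k2)

# -n compares field 2 as a number; -r reverses to largest first.
numeric=$(printf '%s\n' "$input" | sort -t, -k2 -n)
largest_first=$(printf '%s\n' "$input" | sort -t, -k2 -rn)

printf '%s\n' "$textual"        # web,10  db,2  cache,30
printf '%s\n' "$numeric"        # db,2  web,10  cache,30
printf '%s\n' "$largest_first"  # cache,30  web,10  db,2
```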
Return type
Original pipeline objects are passed through unchanged. When reading from files, TextOutput objects are produced.

Examples

    # Basic alphabetical sort
    echo "banana" "apple" "cherry" | sort
    apple
    banana
    cherry

    # Numeric sort in reverse
    echo "10" "2" "30" "1" | sort -rn
    30
    10
    2
    1

    # Sort by a specific field
    echo "alice 90" "bob 75" "carol 88" | sort -k2 -n
    bob 75
    carol 88
    alice 90

    # Pipeline bridge: typed objects survive sorting
    $procs = ps aux | sort -k3 -rn | head 5
    $procs[0].CPU       # Real decimal value
    $procs[0].Command   # Process name

    # Sort file sizes (human-numeric)
    ls -lh | sort -k5 -h

    # Unique lines only
    echo "a" "b" "a" "c" "b" | sort -u
    a
    b
    c

column
Format input into aligned columns.

    column [OPTIONS] [FILE...]

| Flag | Description |
|---|---|
| `-t` | Table mode: detect columns and align them |
| `-s SEP` | Use SEP as the input column separator (used with `-t`) |
Return type
TextOutput objects (one per output line).

Examples

    # Align whitespace-delimited data into columns
    echo "name age city" "alice 30 portland" "bob 25 seattle" | column -t
    name   age  city
    alice  30   portland
    bob    25   seattle

    # Use a custom separator
    echo "alice:30:portland" "bob:25:seattle" | column -t -s:
    alice  30  portland
    bob    25  seattle

    # Format command output into a table
    mount | column -t

join
Join two sorted files on a common field (like a relational join).

    join [OPTIONS] FILE1 FILE2

| Flag | Description |
|---|---|
| `-t SEP` | Use SEP as the field delimiter (default: space) |
| `-1 N` | Join on field N of the first file (default: 1) |
| `-2 N` | Join on field N of the second file (default: 1) |
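
Since output is produced only for matched pairs, a key that appears in just one file is dropped, much like an inner join. The sketch below shows this under the assumption that join behaves as the standard utility does; the file names, their contents, and the temporary directory are all hypothetical.

```shell
# Create two small sorted files in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' '3 carol' > "$dir/names.txt"
printf '%s\n' '1 90' '3 88' > "$dir/scores.txt"

# Key 2 has no line in scores.txt, so bob does not appear in the result.
joined=$(join "$dir/names.txt" "$dir/scores.txt")

rm -r "$dir"
printf '%s\n' "$joined"
# 1 alice 90
# 3 carol 88
```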
Return type
TextOutput objects (one per matched pair).

Examples

    # Join two files on the first field
    # names.txt: "1 alice\n2 bob\n3 carol"
    # scores.txt: "1 90\n2 75\n3 88"
    join names.txt scores.txt
    1 alice 90
    2 bob 75
    3 carol 88

    # Join on different fields with a custom delimiter
    join -t, -1 2 -2 1 data.csv lookup.csv

    # Join employee and department data
    join employees.txt departments.txt

paste
Merge lines from multiple files side by side.

    paste [OPTIONS] FILE1 [FILE2...]

| Flag | Description |
|---|---|
| `-d DELIM` | Use DELIM as the output delimiter (default: tab) |
| `-s` | Serial mode: paste each file as a single line |
Return type
TextOutput objects (one per output line).

Examples

    # Merge two files side by side (tab-separated)
    # names.txt: "alice\nbob\ncarol"
    # ages.txt: "30\n25\n28"
    paste names.txt ages.txt
    alice    30
    bob      25
    carol    28

    # Use comma as delimiter
    paste -d, names.txt ages.txt
    alice,30
    bob,25
    carol,28

    # Serial mode: each file becomes one line
    paste -s names.txt
    alice    bob    carol

comm
Compare two sorted files line by line. Output is displayed in three columns: lines only in file 1, lines only in file 2, and lines common to both.

    comm [OPTIONS] FILE1 FILE2

| Flag | Description |
|---|---|
| `-1` | Suppress column 1 (lines only in file 1) |
| `-2` | Suppress column 2 (lines only in file 2) |
| `-3` | Suppress column 3 (lines common to both) |
Return type
TextOutput objects (one per output line). Columns are tab-indented.

Examples

    # Show all three columns
    # file1.txt (sorted): "apple\nbanana\ncherry"
    # file2.txt (sorted): "banana\ncherry\ndate"
    comm file1.txt file2.txt
    apple
            banana
            cherry
        date

    # Show only lines common to both files
    comm -12 file1.txt file2.txt
    banana
    cherry

    # Show lines unique to the first file
    comm -23 file1.txt file2.txt
    apple

diff
Compare two files and show the differences.

    diff [OPTIONS] FILE1 FILE2

| Flag | Description |
|---|---|
| `-u` | Unified diff format (shows context lines with `+`/`-` markers) |
Return type
TextOutput objects (one per output line). Produces no output if the files are identical.

Examples

    # Normal diff format
    # old.txt: "alpha\nbeta\ngamma"
    # new.txt: "alpha\nBETA\ngamma\ndelta"
    diff old.txt new.txt
    2c2
    < beta
    ---
    > BETA
    3a4
    > delta

    # Unified diff format
    diff -u old.txt new.txt
    --- old.txt
    +++ new.txt
    @@ -1,3 +1,4 @@
     alpha
    -beta
    +BETA
     gamma
    +delta

    # Check if two files are identical (no output means identical)
    diff config.txt config.bak