Regular expressions come in handy when you need to filter output from a file with grep or a similar Linux command.
The Apache access log is one of several log files produced by an Apache HTTP server. It records data about every request the server processes.
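To quickly find every domain or subdomain that appears in a URL in the log, and count how many times each one occurs, you can use the following command. Most of these URLs are referrers, but the pattern matches any URL on the line, so hosts that crawlers put in their user-agent strings are counted too (which is probably where entries such as ahrefs.com and www.semrush.com in the output below come from).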
grep -Eo 'https?://[a-zA-Z0-9!@#$&()_`.+,-]*' access.log | cut -d '/' -f3 | sort | uniq -c | sort -nr
Example output:
296 aljazvidmar.si
60 www.apple.com
47 www.semrush.com
39 ahrefs.com
38 webmaster.petalsearch.com
22 www.google.com
9 opensiteexplorer.org
8 dataforseo.com
7 www.bing.com
5 www.google.com.hk
2 napoveda.seznam.cz
2 ecairn.com
2 duckduckgo.com
1 photo.adesignstudio.net
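To try the pipeline without a real server log, here is a minimal sketch that writes a few made-up lines in Apache's combined log format to a temporary file and runs the same commands on them, wrapped in a hypothetical count_url_hosts function. All IPs and host names are invented for illustration. The hyphen sits last in the bracket expression so that hyphenated hosts such as my-blog.example.org are kept whole, and the last log line carries a URL only in its user agent, which shows how crawler sites end up in the counts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count the hosts of all http(s) URLs in a file,
# most frequent first. The hyphen is last in the bracket expression so
# it is matched literally instead of forming a range.
count_url_hosts() {
  grep -Eo 'https?://[a-zA-Z0-9!@#$&()_`.+,-]*' "$1" \
    | cut -d '/' -f3 | sort | uniq -c | sort -nr
}

# Made-up sample log in combined format, written to a temporary file
# that is removed when the script exits.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "https://www.google.com/" "Mozilla/5.0"
203.0.113.6 - - [10/Oct/2023:13:56:01 +0000] "GET /blog HTTP/1.1" 200 734 "https://my-blog.example.org/post" "Mozilla/5.0"
203.0.113.7 - - [10/Oct/2023:13:57:12 +0000] "GET /about HTTP/1.1" 200 301 "https://www.google.com/search" "Mozilla/5.0"
203.0.113.8 - - [10/Oct/2023:13:58:40 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0; +http://bot.example.net/info)"
EOF

count_url_hosts "$log"
```

With GNU grep and coreutils this should list www.google.com with a count of 2, followed by my-blog.example.org and bot.example.net with 1 each. The uppercase "HTTP/1.1" in the request line is not matched because grep is case-sensitive by default.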
Command breakdown with comments:
-E, --extended-regexp
Interpret PATTERNS as extended regular expressions (EREs).
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
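https?:// matches "http" with an optional "s" (the ? makes the preceding character optional), followed by "://". The bracket expression after it, repeated any number of times by *, matches letters, digits and the listed special characters. The slash is not in the set, so each match usually stops right after the host name. Inside a bracket expression a backslash is an ordinary character and a hyphen between two characters forms a range, so a literal hyphen must be placed first or last.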
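cut -d '/' -f3 splits each match on "/" and keeps the third field, the host name (the first two fields are "https:" and the empty string between the two slashes)
sort sorts the host names so that identical ones are adjacent, which uniq needs
uniq -c collapses each run of identical lines into one and prefixes it with the count
sort -nr sorts the result numerically in reverse order, so the most frequent hosts come first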
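To see what each stage does, this sketch feeds a single made-up log line through the pipeline and prints the intermediate results. The log line, host names and variable names are hypothetical. It also runs the original bracket expression with "\\-`" for comparison: since a backslash is literal inside brackets, "\-`" is a range from "\" to "`" and the hyphen itself is not matched.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical combined-format log line: referrer and user agent both contain a URL.
line='198.51.100.4 - - [10/Oct/2023:14:02:11 +0000] "GET / HTTP/1.1" 200 512 "https://news.example-site.org/page?id=1" "ExampleBot/2.0 (+https://crawler.example.com/about)"'

# grep -Eo: each URL prefix on its own line, ending before the first "/" after the host.
after_grep=$(printf '%s\n' "$line" | grep -Eo 'https?://[a-zA-Z0-9!@#$&()_`.+,-]*')
printf 'after grep:\n%s\n\n' "$after_grep"

# cut -d '/' -f3: the fields are "https:", "" and the host, so field 3 is the host.
after_cut=$(printf '%s\n' "$after_grep" | cut -d '/' -f3)
printf 'after cut:\n%s\n\n' "$after_cut"

# sort | uniq -c | sort -nr: group identical hosts, count them, most frequent first.
printf 'final:\n'
printf '%s\n' "$after_cut" | sort | uniq -c | sort -nr

# Original bracket expression: the hyphen is not in the set, so the
# first match is cut short at "https://news.example".
original=$(printf '%s\n' "$line" | grep -Eo 'https?://[a-zA-Z0-9!@#$&()\\-`.+,]*' | sed -n 1p)
printf '\noriginal pattern, first match: %s\n' "$original"
```

Capturing each stage in a variable is only for display; in practice the single pipeline from the top of the post does the same work in one pass.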