I'm assuming that you're doing this in a shell loop over all IP addresses, possibly with the IP addresses coming from a text file. Yes, that would be slow, with one invocation of sed or grep per IP address.
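For reference, I imagine that loop looks roughly like this (a guess on my part; your actual script may differ, and I'm assuming the addresses live in a file called ip.list, the same file used further down):

# A guess at the slow variant: one grep invocation per address.
while IFS= read -r ip; do
    grep "^$ip" /var/log/http/access.log >"/tmp/access-$ip.log"
done <ip.list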
Instead, you may get away with a single use of sed, if you prepare carefully.
First, we have to create a sed script, and we do that from a file ip.list which contains the IP addresses, one address per line:
sed -e 'h' \
    -e 's/\./\\./g' \
    -e 's#.*#/^&[[:blank:]]/w /tmp/access-#' \
    -e 'G' \
    -e 's/\n//' \
    -e 's/$/.log/' ip.list >ip.sed
This sed stuff does, for each IP address,
- Copy the address to the "hold space" (an extra buffer in sed).
- Change each . in the "pattern space" (the input line) into \. (to match the dots literally; your code did not do this).
- Prepend /^ and append [[:blank:]]/w /tmp/access- to the pattern space.
- Append the unmodified input line from the hold space to the pattern space, with a newline in-between.
- Delete that newline.
- Append .log to the end of the line (and implicitly output the result).
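For comparison, the same ip.sed file could also be generated with a single awk invocation. This is just an equivalent sketch of the transformation described above, not something you need in addition to the sed command:

# Escape the dots and emit one /regex/w command per address.
# The doubled-up backslashes are needed because both the string literal and
# gsub() consume one level of escaping.
awk '{ pat = $1; gsub(/\./, "\\\\.", pat)
       printf "/^%s[[:blank:]]/w /tmp/access-%s.log\n", pat, $1 }' ip.list >ip.sed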
For a file that contains
127.0.0.1
10.0.0.1
10.0.0.100
this would create the sed script
/^127\.0\.0\.1[[:blank:]]/w /tmp/access-127.0.0.1.log
/^10\.0\.0\.1[[:blank:]]/w /tmp/access-10.0.0.1.log
/^10\.0\.0\.100[[:blank:]]/w /tmp/access-10.0.0.100.log
Note that you will have to match a blank character (space or tab) after the IP address, otherwise the log entries for 10.0.0.100 would also end up in the /tmp/access-10.0.0.1.log file. Your code omitted this.
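To see the problem, compare the two expressions against a single made-up log line (purely illustrative):

# The shorter address matches the start of the longer one unless we require a
# blank after it; grep -c prints the number of matching lines.
printf '10.0.0.100 some log entry\n' | grep -c '^10\.0\.0\.1'               # 1 (wrong match)
printf '10.0.0.100 some log entry\n' | grep -c '^10\.0\.0\.1[[:blank:]]'    # 0 (correctly rejected)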
The generated ip.sed script can then be used on your log file (no looping):
sed -n -f ip.sed /var/log/http/access.log
I haven't ever tested writing to 1200 files from one and the same sed script. If it doesn't work, try the awk variation below instead.
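Alternatively, before giving up on sed, you could split the generated script into smaller pieces and run sed once per piece. This is equally untested, and the chunk- file name prefix is just an example:

# Run sed several times with, say, 300 write targets per run instead of 1200.
split -l 300 ip.sed chunk-
for f in chunk-*; do
    sed -n -f "$f" /var/log/http/access.log
done
rm -f chunk-*

The log gets read once per chunk, but that is still far fewer passes than once per address.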
A similar solution with awk involves reading the IP addresses into an associative array first and then matching the first field of each log line against them. This requires only a single awk invocation:
awk 'FNR == NR { list[$1] = 1; next }
     $1 in list { name = "/tmp/access-" $1 ".log"; print >>name; close(name) }' ip.list /var/log/http/access.log
Here, we give awk both the IP list and the log file at the same time. When FNR == NR we know we're still reading the first file (the list), so we add the IP addresses to the associative array list as keys and continue with the next line of input.
If the FNR == NR condition is not true, we're reading from the second file (the log file) and we test whether the very first field of the input line is a key in list (this is a plain string comparison, not a regular expression match). If it is, we append the line to the appropriately named file.
We have to be careful to close the output file after each write, as we might otherwise run out of open file descriptors. This means a lot of opening and closing of files for appending, but it's still going to be faster than calling awk (or any other utility) once per IP address.
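If that open/close churn turns out to be the bottleneck, one variation (my own untested sketch; /tmp/access.sorted is just a scratch file name I made up) is to sort the log on its first field first, so that each output file is opened and closed only once:

# Sort the log by IP so each per-IP file is written in one contiguous run.
sort -k1,1 /var/log/http/access.log >/tmp/access.sorted &&
awk 'FNR == NR { list[$1] = 1; next }
     $1 in list {
         name = "/tmp/access-" $1 ".log"
         # only close when we move on to a new address
         if (name != prev) { if (prev != "") close(prev); prev = name }
         print >>name
     }' ip.list /tmp/access.sorted

Whether the extra sort pays off depends on the size of the log; sorting a huge file is not free either.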
I'd be interested in knowing if these things work for you and what the approximate running time might be. I have tested the solutions only on extremely small sets of data.
Of course, we could also go with your idea of just brute forcing it by throwing multiple instances of e.g. grep at the system in parallel.
Ignoring the fact that we don't match the dots in the IP addresses correctly, we might do something like
xargs -P 4 -n 100 sh -c '
    for n do
        grep "^$n[[:blank:]]" /var/log/http/access.log >"/tmp/access-$n.log"
    done' sh <ip.list
Here, xargs will hand at most 100 IP addresses at a time from the ip.list file to a short shell script, and it will run up to four invocations of that script in parallel.
The short shell script:
for n do
    grep "^$n[[:blank:]]" /var/log/http/access.log >"/tmp/access-$n.log"
done
This just iterates over the (at most) 100 IP addresses that xargs gives it on its command line and applies pretty much the same grep command that you had; the difference is that up to four of these loops will be running in parallel.
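If you want to convince yourself of how xargs batches the addresses before letting the real grep loops loose, you can swap in a harmless echo (just a throwaway test; -P is left out so the output stays in order):

# Each printed line corresponds to one invocation of the inner script.
xargs -n 100 sh -c 'echo "this batch contains $# addresses"' sh <ip.list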
Increase -P 4 to -P 16 or something related to the number of CPUs that you have. The speedup probably would not be linear as each parallel instance of grep would read from and write to the same disk.
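If you'd rather derive that number from the machine than hard-code it, something like the following usually works (getconf _NPROCESSORS_ONLN is widely available but not required by POSIX):

# Ask the system for the number of online processors; fall back to 4 if the
# answer is missing or not a number (some getconf implementations print
# "undefined" for unknown variables).
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
case $cpus in (''|*[!0-9]*) cpus=4;; esac
xargs -P "$cpus" -n 100 sh -c '
    for n do
        grep "^$n[[:blank:]]" /var/log/http/access.log >"/tmp/access-$n.log"
    done' sh <ip.list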
Except for the -P flag to xargs, everything in this answer should run on any POSIX system. The -P flag for xargs is non-standard, but it is implemented in GNU xargs and on the BSD systems.
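If your xargs does not have -P at all, a plain-POSIX approximation is to split ip.list into a few parts yourself and run one background loop per part (again untested; the part- prefix is just an example name):

# Split the address list into four roughly equal pieces and run the grep loop
# on each piece in the background.
split -l "$(( ( $(wc -l <ip.list) + 3 ) / 4 ))" ip.list part-
for f in part-*; do
    while IFS= read -r n; do
        grep "^$n[[:blank:]]" /var/log/http/access.log >"/tmp/access-$n.log"
    done <"$f" &
done
wait
rm -f part-*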