
I am looking for a command that will accept (as input) multiple lines of text, each line containing a single integer, and output the sum of these integers.

As a bit of background, I have a log file which includes timing measurements. Through grepping for the relevant lines and a bit of sed reformatting I can list all of the timings in that file. I would like to work out the total. I can pipe this intermediate output to any command in order to do the final sum. I have always used expr in the past, but unless it runs in RPN mode I do not think it is going to cope with this (and even then it would be tricky).
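For concreteness, a sketch of that intermediate step; the log format, the grep pattern, and the sed expression are all invented for illustration and will differ from the real ones:

```shell
# Invented log format; the real grep pattern and sed expression will differ.
printf 'INFO step done, elapsed 120 ms\nINFO step done, elapsed 340 ms\n' \
  | grep 'elapsed' \
  | sed -n 's/.*elapsed \([0-9][0-9]*\) ms.*/\1/p'
# emits one integer per line (120, then 340); the goal is their sum, 460
```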

How can I get the summation of integers?


47 Answers


Bit of awk should do it?

awk '{s+=$1} END {print s}' mydatafile

Note: some versions of awk have some odd behaviours if you are going to be adding anything exceeding 2^31 (2147483647). See comments for more background. One suggestion is to use printf rather than print:

awk '{s+=$1} END {printf "%.0f", s}' mydatafile

27 Comments

There's a lot of awk love in this room! I like how a simple script like this could be modified to add up a second column of data just by changing the $1 to $2
There's not a practical limit, since it will process the input as a stream. So, if it can handle a file of X lines, you can be pretty sure it can handle X+1.
I once wrote a rudimentary mailing list processer with an awk script run via the vacation utility. Good times. :)
just used this for a: count all documents’ pages script: ls $@ | xargs -i pdftk {} dump_data | grep NumberOfPages | awk '{s+=$2} END {print s}'
Be careful, it will not work with numbers greater than 2147483647 (i.e., 2^31), that's because awk uses a 32 bit signed integer representation. Use awk '{s+=$1} END {printf "%.0f", s}' mydatafile instead.

Paste typically merges lines of multiple files, but it can also be used to convert the individual lines of a file into a single line. The delimiter flag lets you pass a 1+2+3-style expression to bc.

paste -s -d+ infile | bc

Alternatively, when piping from stdin,

<commands> | paste -s -d+ - | bc

20 Comments

Very nice! I would have put a space before the "+", just to help me parse it better, but that was very handy for piping some memory numbers through paste & then bc.
Much easier to remember and type than the awk solution. Also, note that paste can use a dash - as the filename, which lets you pipe the numbers from the output of a command into paste's standard input without creating a file first: <commands> | paste -sd+ - | bc
I have a file with 100 million numbers. The awk command takes 21s; the paste command takes 41s. But good to meet 'paste' nevertheless!
@Abhi: Interesting :D I guess it would take me 20s to figure out the awk command so it evens out though until I try 100 million and one numbers :D
@George You can leave out the -, though. (It is useful if you wanted to combine a file with stdin).

The one-liner version in Python:

$ python -c "import sys; print(sum(int(l) for l in sys.stdin))"

or, in a functional formulation rather than using a loop:

$ python -c "import sys; print(sum(map(int, sys.stdin)))"
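If the extracted lines may include blanks (easy to produce with a sed pipeline), a guarded variant of the same idea; the blank-line filter is my addition, not part of the original answer:

```shell
# Skip empty or whitespace-only lines before converting to int.
printf '1\n\n2\n3\n' \
  | python3 -c "import sys; print(sum(int(l) for l in sys.stdin if l.strip()))"
# prints 6
```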

11 Comments

Above one-liner doesn't work for files in sys.argv[], but that one does stackoverflow.com/questions/450799/…
True- the author said he was going to pipe output from another script into the command and I was trying to make it as short as possible :)
Shorter version would be python -c"import sys; print(sum(map(int, sys.stdin)))"
I love this answer for its ease of reading and flexibility. I needed the average size of files smaller than 10Mb in a collection of directories and modified it to this: find . -name '*.epub' -exec stat -c %s '{}' \; | python -c "import sys; nums = [int(n) for n in sys.stdin if int(n) < 10000000]; print(sum(nums)/len(nums))"
You can also filter out non numbers if you have some text mixed in: import sys; print(sum(int(''.join(c for c in l if c.isdigit())) for l in sys.stdin))

I would put a big WARNING on the commonly approved solution:

awk '{s+=$1} END {print s}' mydatafile # DO NOT USE THIS!!

that is because in this form awk uses a 32 bit signed integer representation: it will overflow for sums that exceed 2147483647 (i.e., 2^31).

A more general answer (for summing integers) would be:

awk '{s+=$1} END {printf "%.0f\n", s}' mydatafile # USE THIS INSTEAD

11 Comments

Because the problem is actually in the "print" function. Awk uses 64 bit integers, but for some reason print downscales them to 32 bit.
The print bug appears to be fixed, at least for awk 4.0.1 & bash 4.3.11, unless I'm mistaken: echo -e "2147483647 \n 100" |awk '{s+=$1}END{print s}' shows 2147483747
Using floats just introduces a new problem: echo 999999999999999999 | awk '{s+=$1} END {printf "%.0f\n", s}' produces 1000000000000000000
Shouldn't just using "%ld" on 64bit systems work to not have printf truncate to 32bit? As @Patrick points out, floats aren't a great idea here.
@yerforkferchips, where should %ld be placed in the code? I tried echo -e "999999999999999999" | awk '{s+=$1} END {printf "%ld\n", s}' but it still produced 1000000000000000000.

With jq:

seq 10 | jq -s 'add' # 'add' is equivalent to 'reduce .[] as $item (0; . + $item)'

4 Comments

Is there a way to do this with rq?
I think I know what could be the next question, so I will add the answer here :) calculate average: seq 10 | jq -s 'add / length' ref here
this is slow. jq -s 'add' is 8x slower than the C version
@milahu adding 1 million values using jq -s add takes 1 second, I can live with that.

Plain bash:

$ cat numbers.txt 
1
2
3
4
5
6
7
8
9
10
$ sum=0; while read num; do ((sum += num)); done < numbers.txt; echo $sum
55
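The same loop also works at the end of a pipeline, which matches the question's grep/sed flow; note that bash runs the while in a subshell here, so the sum must be printed inside the braces:

```shell
# Sum piped input; echo must stay inside the group, since the
# subshell's variables vanish once the pipeline ends.
seq 10 | { sum=0; while read num; do ((sum += num)); done; echo $sum; }
# prints 55
```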

4 Comments

@rjack, where is num defined? I believe somehow it is connected to the < numbers.txt expression, but it is not clear how.
@Atcold num is defined in the while expression. while read XX means "use while to read a value, then store that value in XX"
this is slow, 100x slower than the C version
dc -f infile -e '[+z1<r]srz1<rp'

Note that negative numbers written with a - prefix must be translated for dc, since it uses a _ prefix rather than - for negatives. For example, via tr '-' '_' | dc -f- -e '...'.

Edit: Since this answer got so many votes "for obscurity", here is a detailed explanation:

The expression [+z1<r]srz1<rp does the following:

[   interpret everything to the next ] as a string
  +   push two values off the stack, add them and push the result
  z   push the current stack depth
  1   push one
  <r  pop two values and execute register r if the original top-of-stack (1)
      is smaller
]   end of the string, will push the whole thing to the stack
sr  pop a value (the string above) and store it in register r
z   push the current stack depth again
1   push 1
<r  pop two values and execute register r if the original top-of-stack (1)
    is smaller
p   print the current top-of-stack

As pseudo-code:

  1. Define "add_top_of_stack" as:
    1. Remove the two top values off the stack and add the result back
    2. If the stack has two or more values, run "add_top_of_stack" recursively
  2. If the stack has two or more values, run "add_top_of_stack"
  3. Print the result, now the only item left in the stack

To really understand the simplicity and power of dc, here is a working Python script that implements some of the commands from dc and executes a Python version of the above command:

### Implement some commands from dc
registers = {'r': None}
stack = []
def add():
    stack.append(stack.pop() + stack.pop())
def z():
    stack.append(len(stack))
def less(reg):
    if stack.pop() < stack.pop():
        registers[reg]()
def store(reg):
    registers[reg] = stack.pop()
def p():
    print(stack[-1])

### Python version of the dc command above

# The equivalent to -f: read a file and push every line to the stack
import fileinput
for line in fileinput.input():
    stack.append(int(line.strip()))

def cmd():
    add()
    z()
    stack.append(1)
    less('r')

stack.append(cmd)
store('r')
z()
stack.append(1)
less('r')
p()

4 Comments

dc is just the tool of choice to use. But I would do it with a little less stack ops. Assumed that all lines really contain a number: (echo "0"; sed 's/$/ +/' inp; echo 'pq')|dc.
The online algorithm: dc -e '0 0 [+?z1<m]dsmxp'. So we don't save all the numbers on stack before processing but read and process them one by one (to be more precise, line by line, since one line can contain several numbers). Note that empty line can terminate an input sequence.
@ikrabbe that's great. It can actually be shortened by one more character: the space in the sed substitution can be removed, as dc doesn't care about spaces between arguments and operators. (echo "0"; sed 's/$/+/' inputFile; echo 'pq')|dc
this is slow. dc -f - -e '[+z1<r]srz1<rp' is 250x slower than the C version. dc -e '0 0 [+?z1<m]dsmxp' is 15x slower than the C version

Pure and short bash.

f=$(cat numbers.txt)
echo $(( ${f//$'\n'/+} ))
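To see the intermediate expansion (the printf input here is just an illustration):

```shell
f=$(printf '1\n2\n3')
echo "${f//$'\n'/+}"        # the substitution yields: 1+2+3
echo $(( ${f//$'\n'/+} ))   # arithmetic evaluation yields: 6
```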

8 Comments

This is the best solution because it does not create any subprocess if you replace first line with f=$(<numbers.txt).
any way of having the input from stdin ? like from a pipe ?
@njzk2 If you put f=$(cat); echo $(( ${f//$'\n'/+} )) in a script, then you can pipe anything to that script or invoke it without arguments for interactive stdin input (terminate with Control-D).
@loentar The <numbers.txt is an improvement, but, overall, this solution is only efficient for small input files; for instance, with a file of 1,000 input lines the accepted awk solution is about 20 times faster on my machine - and also consumes less memory, because the file is not read all at once.
i didn't downvote, but wanna note this is a horrific solution - summing up from 1 to 99999 took 26.7 seconds on a machine with M1 Max and bash 5.2.15, versus 0.053 secs on awk using jot, and 0.22 secs generating via another awk. Even summing every integer to 100 mil was only 11.5 seconds, and just 1 min 55secs summing all the way to 1 billion. perl came in just slower than awk
perl -lne '$x += $_; END { print $x; }' < infile.txt

7 Comments

And I added them back: "-l" ensures that output is LF-terminated as shell `` backticks and most programs expect, and "<" indicates this command can be used in a pipeline.
You are right. As an excuse: Each character in Perl one-liners requires a mental work for me, therefore I prefer to strip as many characters as possible. The habit was harmful in this case.
One of the few solutions that doesn't load everything into RAM.
I find it curious just how undervalued this answer is in comparison with the top-rated ones (that use non-shell tools) -- while it's faster and simpler than those. It's almost the same syntax as awk but faster (as benchmarked in another well-voted answer here) and without any caveats, and it's much shorter and simpler than python, and faster (flexibility can be added just as easily). One needs to know the basics of the language used for it, but that goes for any tool. I get the notion of a popularity of a tool but this question is tool agnostic. All these were published the same day.
(disclaimer for my comment above: I know and use and like Perl and Python, as good tools.)

I've done a quick benchmark on the existing answers which

  • use only standard tools (sorry for stuff like lua or rocket),
  • are real one-liners,
  • are capable of adding huge amounts of numbers (100 million), and
  • are fast (I ignored the ones which took longer than a minute).

I always added the numbers of 1 to 100 million which was doable on my machine in less than a minute for several solutions.

Here are the results:

Python

:; seq 100000000 | python -c 'import sys; print sum(map(int, sys.stdin))'
5000000050000000
# 30s
:; seq 100000000 | python -c 'import sys; print sum(int(s) for s in sys.stdin)'
5000000050000000
# 38s
:; seq 100000000 | python3 -c 'import sys; print(sum(int(s) for s in sys.stdin))'
5000000050000000
# 27s
:; seq 100000000 | python3 -c 'import sys; print(sum(map(int, sys.stdin)))'
5000000050000000
# 22s
:; seq 100000000 | pypy -c 'import sys; print(sum(map(int, sys.stdin)))'
5000000050000000
# 11s
:; seq 100000000 | pypy -c 'import sys; print(sum(int(s) for s in sys.stdin))'
5000000050000000
# 11s

Awk

:; seq 100000000 | awk '{s+=$1} END {print s}'
5000000050000000
# 22s

Paste & Bc

This ran out of memory on my machine. It worked for half the size of the input (50 million numbers):

:; seq 50000000 | paste -s -d+ - | bc
1250000025000000
# 17s
:; seq 50000001 100000000 | paste -s -d+ - | bc
3750000025000000
# 18s

So I guess it would have taken ~35s for the 100 million numbers.

Perl

:; seq 100000000 | perl -lne '$x += $_; END { print $x; }'
5000000050000000
# 15s
:; seq 100000000 | perl -e 'map {$x += $_} <> and print $x'
5000000050000000
# 48s

Ruby

:; seq 100000000 | ruby -e "puts ARGF.map(&:to_i).inject(&:+)"
5000000050000000
# 30s

C

Just for comparison's sake I compiled the C version and tested this also, just to have an idea how much slower the tool-based solutions are.

#include <stdio.h>
int main(int argc, char** argv) {
    long sum = 0;
    long i = 0;
    while(scanf("%ld", &i) == 1) {
        sum = sum + i;
    }
    printf("%ld\n", sum);
    return 0;
}

 

:; seq 100000000 | ./a.out 
5000000050000000
# 8s

Conclusion

C is of course fastest with 8s, but the Pypy solution only adds a very little overhead of about 30% to 11s. But, to be fair, Pypy isn't exactly standard. Most people only have CPython installed which is significantly slower (22s), exactly as fast as the popular Awk solution.

The fastest solution based on standard tools is Perl (15s).

8 Comments

The paste + bc approach was just what I was looking for to sum hex values, thanks!
Just for fun, in Rust: use std::io::{self, BufRead}; fn main() { let stdin = io::stdin(); let mut sum: i64 = 0; for line in stdin.lock().lines() { sum += line.unwrap().parse::<i64>().unwrap(); } println!("{}", sum); }
awesome answer. not to nitpick but it is the case that if you decided to include those longer-running results, the answer would be even more awesome!
@StevenLu I felt the answer would just be longer and thus less awesome (to use your words). But I can understand that this feeling needs not be shared by everybody :)
Next: numba + parallelisation

My fifteen cents:

$ cat file.txt | xargs  | sed -e 's/\ /+/g' | bc

Example:

$ cat text
1
2
3
3
4
5
6
78
9
0
1
2
3
4
576
7
4444
$ cat text | xargs  | sed -e 's/\ /+/g' | bc 
5148

3 Comments

My input could contain blank lines, so I used what you posted here plus a grep -v '^$'. Thanks!
wow!! your answer is amazing! my personal favorite from all in the tread
Love this and +1 for pipeline. Very simple and easy solution for me

Using the GNU datamash util:

seq 10 | datamash sum 1

Output:

55

If the input data is irregular, with spaces and tabs in odd places, this may confuse datamash; in that case, either use the -W switch:

<commands...> | datamash -W sum 1

...or use tr to clean up the whitespace:

<commands...> | tr -d '[[:blank:]]' | datamash sum 1

If the input is large enough, the output will be in scientific notation.

seq 100000000 | datamash sum 1

Output:

5.00000005e+15

To convert that to decimal, use the --format option:

seq 100000000 | datamash  --format '%.0f' sum 1

Output:

5000000050000000

2 Comments

This works great with my github CLI pagination counting <3
limitation: datamash works only with streams, not with files, so it will use only one CPU core, no parallelization

BASH solution, if you want to make this a command (e.g. if you need to do this frequently):

addnums () {
  local total=0
  while read val; do
    (( total += val ))
  done
  echo $total
}

Then usage:

addnums < /tmp/nums
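Since the function reads standard input, it also works as the last stage of a pipeline (the definition is repeated here so the snippet is self-contained):

```shell
addnums () {
  local total=0
  while read val; do
    (( total += val ))
  done
  echo $total
}

seq 10 | addnums   # prints 55
```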

Comments


Plain bash one-liner

$ cat > /tmp/test
1 
2 
3 
4 
5
^D

$ echo $(( $(cat /tmp/test | tr "\n" "+" ) 0 ))

2 Comments

No cat needed: echo $(( $( tr "\n" "+" < /tmp/test) 0 ))
tr isn't exactly "plain Bash" /nitpick

You can use num-utils, although it may be overkill for what you need. This is a set of programs for manipulating numbers in the shell that can do several nifty things, including, of course, adding them up. It's a bit out of date, but the tools still work and can be useful if you need to do something more.

https://suso.suso.org/programs/num-utils/index.phtml

It's really simple to use:

$ seq 10 | numsum
55

But runs out of memory for large inputs.

$ seq 100000000 | numsum
Terminado (killed)

2 Comments

Example: numsum numbers.txt.
Example with pipe: printf "%s\n" 1 3 5 | numsum

I cannot avoid submitting this; it is the most generic approach to this question:

jot 1000000 | sed '2,$s/$/+/;$s/$/p/' | dc

It can be found over here; I was the OP, and the answer came from the audience:

And here are its special advantages over awk, bc, perl, GNU's datamash and friends:

  • it uses standard utilities common in any unix environment
  • it does not depend on buffering and thus does not choke on really long inputs
  • it implies no particular precision limits, or integer size for that matter; hello AWK friends!
  • no different code is needed if floating point numbers need to be added instead
  • it theoretically runs unhindered in the most minimal of environments

6 Comments

Please include the code related to the question in the answer and not refer to a link
It also happens to be much slower than all the other solutions, more than 10 times slower than the datamash solution
@GabrielRavier OP doesn't define speed as a first requirement, so in absence of that a generic working solution would be preferred. FYI. datamash is not standard across all Unix platforms, fi. MacOSX appears to be lacking that.
@fgeorgatos this is true, but I just wanted to point out to everyone else looking at this question that this answer is, in fact, very slow compared to what you can get on most Linux systems.
@fgeorgatos After some calculations, I can confirm it is even more than 10 times faster. time seq 10000000 | sed '2,$s/$/+/;$s/$/p/' | dc gives me the correct result in 43 seconds whereas time seq 10000000 | datamash sum 1 does it in 1 second, making it more than 40 times faster. Also, a "compiled assembly program" as you call it, would be much more convoluted a solution, likely not much faster and would be much more likely to give incorrect solutions

I realize this is an old question, but I like this solution enough to share it.

% cat > numbers.txt
1 
2 
3 
4 
5
^D
% cat numbers.txt | perl -lpe '$c+=$_}{$_=$c'
15

If there is interest, I'll explain how it works.

3 Comments

Please don't. We like to pretend that -n and -p are nice semantic things, not just some clever string pasting ;)
Yes please, do explain :) (I'm not a Perl typea guy.)
Try running "perl -MO=Deparse -lpe '$c+=$_}{$_=$c'" and looking at the output, basically -l uses newlines and both input and output separators, and -p prints each line. But in order to do '-p', perl first adds some boiler plate (which -MO=Deparse) will show you, but then it just substitutes and compiles. You can thus cause an extra block to be inserted with the '}{' part and trick it into not printing on each line, but print at the very end.
sed 's/^/.+/' infile | bc | tail -1

Comments


The following works in bash:

I=0

for N in `cat numbers.txt`
do
    I=`expr $I + $N`
done

echo $I

2 Comments

Command expansion should be used with caution when files can be arbitrarily large. With numbers.txt of 10MB, the cat numbers.txt step would be problematic.
Indeed, however (if not for the better solutions found here) I would use this one until I actually encountered that problem.

Pure bash and in a one-liner :-)

$ cat numbers.txt
1
2
3
4
5
6
7
8
9
10


$ I=0; for N in $(cat numbers.txt); do I=$(($I + $N)); done; echo $I
55

2 Comments

Why are there two (( parenthesis ))?
Not really pure bash due to cat. make it pure bash by replacing cat with $(< numbers.txt)

Here's a nice and clean Raku (formerly known as Perl 6) one-liner:

say [+] slurp.lines

We can use it like so:

% seq 10 | raku -e "say [+] slurp.lines"
55

It works like this:

slurp without any arguments reads from standard input by default; it returns a string. Calling the lines method on a string returns a list of lines of the string.

The brackets around + turn + into a reduction meta operator which reduces the list to a single value: the sum of the values in the list. say then prints it to standard output with a newline.

One thing to note is that we never explicitly convert the lines to numbers—Raku is smart enough to do that for us. However, this means our code breaks on input that definitely isn't a number:

% echo "1\n2\nnot a number" | raku -e "say [+] slurp.lines"
Cannot convert string to number: base-10 number must begin with valid digits or '.' in '⏏not a number' (indicated by ⏏)
  in block <unit> at -e line 1

4 Comments

say [+] lines is actually enough :-)
@ElizabethMattijsen: cool! how does that work?
lines without any arguments has the same semantics as slurp without any arguments, but it produces a Seq of Str, rather than a single Str.
See extended discussion (and other variations) here: nntp.perl.org/group/perl.perl6.users/2019/09/msg7027.html

Alternative pure Perl, fairly readable, no packages or options required:

perl -e "map {$x += $_} <> and print $x" < infile.txt

2 Comments

or a tiny bit shorter: perl -e 'map {$x += $_} <>; print $x' infile.txt
Memory required is almost 2GB for a large input of 10 million numbers

For Ruby Lovers

ruby -e "puts ARGF.map(&:to_i).inject(&:+)" numbers.txt

Comments


You can do it in Python:

import sys
lines = sys.stdin.readlines()
ints = map(int, lines)
s = sum(ints)
print(s)

Sebastian pointed out a one liner script:

cat filename | python -c "from fileinput import input; print(sum(map(int, input())))"

A benefit of using fileinput.input is that it reads from the standard input stream by default, or alternatively from one or more input files named as arguments.
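Both modes, written in Python 3 syntax (the file name is just an example):

```shell
# stdin mode: no arguments, so fileinput.input() reads the pipe.
printf '1\n2\n3\n' \
  | python3 -c "from fileinput import input; print(sum(map(int, input())))"
# prints 6

# file-argument mode (nums.txt is a hypothetical file of integers):
# python3 -c "from fileinput import input; print(sum(map(int, input())))" nums.txt
```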

5 Comments

python -c"from fileinput import input; print sum(map(int, input()))" numbers.txt
cat is overused, redirect stdin from file: python -c "..." < numbers.txt
@rjack: cat is used to demonstrate that script works both for stdin and for files in argv[] (like while(<>) in Perl). If your input is in a file then '<' is unnecessary.
But < numbers.txt demonstrates that it works on stdin just as well as cat numbers.txt | does. And it doesn't teach bad habits.
@XiongChiamiov If you care so much about habits, using the notation command < file is a bad habit itself. Use < file command instead. Bash one-liners should be easy to read left-to-right from input to output.

The following should work (assuming your number is the second field on each line).

awk 'BEGIN {sum=0} \
 {sum=sum + $2} \
END {print "tot:", sum}' Yourinputfile.txt

1 Comment

You don't really need the {sum=0} part

C (not simplified)

seq 1 10 | tcc -run <(cat << EOF
#include <stdio.h>
int main(int argc, char** argv) {
    int sum = 0;
    int i = 0;
    while(scanf("%d", &i) == 1) {
        sum = sum + i;
    }
    printf("%d\n", sum);
    return 0;
}
EOF)

3 Comments

I had to upvote the comment. There's nothing wrong with the answer - it's quite good. However, to show that the comment makes the answer awesome, I'm just upvoting the comment.
impressive, this has the same performance as the gcc-compiled C version
nitpick: int should be long. there should be a newline between EOF and )
$ cat n
2
4
2
7
8
9
$ perl -MList::Util -le 'print List::Util::sum(<>)' < n
32

Or, you can type in the numbers on the command line:

$ perl -MList::Util -le 'print List::Util::sum(<>)'
1
3
5
^D
9

However, this one slurps the file so it is not a good idea to use on large files. See j_random_hacker's answer which avoids slurping.

Comments


One-liner in Racket:

racket -e '(define (g) (define i (read)) (if (eof-object? i) empty (cons i (g)))) (foldr + 0 (g))' < numlist.txt

1 Comment

this is slow, 25x slower than the C version

My version:

seq -5 10 | xargs printf "- - %s" | xargs  | bc

1 Comment

Shorter: seq -s+ -5 10 | bc

Real-time summing to let you monitor progress of some number-crunching task.

$ cat numbers.txt 
1
2
3
4
5
6
7
8
9
10

$ cat numbers.txt | while read new; do total=$(($total + $new)); echo $total; done
1
3
6
10
15
21
28
36
45
55

(There is no need to initialize $total to zero here. Note, however, that because the while loop runs in a subshell, $total is not accessible after the pipeline finishes.)

Comments
