Which is faster: using 1 thread to loop over the whole array, or using 4 threads where each thread loops over one whole row?
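To make the two options concrete, here is a minimal Java sketch. It assumes a hypothetical 4x4 int array and uses summing as a stand-in for whatever work is done per element; the class and method names are illustrative only, not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RowTraversal {
    // Option 1: a single thread walks the whole array.
    static long sumSingleThread(int[][] grid) {
        long total = 0;
        for (int[] row : grid) {
            for (int value : row) {
                total += value; // stand-in for the real per-element work
            }
        }
        return total;
    }

    // Option 2: one thread per row; each thread walks only its own row.
    static long sumThreadPerRow(int[][] grid) throws InterruptedException {
        AtomicLong total = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int[] row : grid) {
            Thread t = new Thread(() -> {
                long rowSum = 0;
                for (int value : row) {
                    rowSum += value;
                }
                total.addAndGet(rowSum); // thread-safe accumulation across rows
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join(); // wait for every row thread to finish
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int[][] grid = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16} };
        System.out.println(sumSingleThread(grid)); // 136
        System.out.println(sumThreadPerRow(grid)); // 136
    }
}
```

Both options compute the same result; the thread-per-row version only pays off if the per-element work is heavy enough to outweigh the cost of creating and joining the threads.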
This depends heavily on what needs to be done to each element in the array. If it is a CPU-bound operation that takes a while to run, then you can get some performance improvement by processing each element in its own thread (or by setting up a fixed-size thread pool and submitting each element as its own task). Otherwise, as @kan mentioned, you probably won't see any improvement.
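A minimal sketch of the fixed-size thread pool approach, using java.util.concurrent's ExecutorService. The CPU-bound work is a hypothetical placeholder (expensive), and sizing the pool to the number of available processors is an assumption, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolPerElement {
    // Hypothetical CPU-bound work standing in for "what needs to be done to each element".
    static long expensive(int value) {
        long acc = value;
        for (int i = 0; i < 1_000_000; i++) {
            acc = (acc * 31 + i) % 1_000_003;
        }
        return acc;
    }

    // Submits each element as its own task to a fixed-size pool and collects results in order.
    static long[][] processAll(int[][] grid) throws InterruptedException, ExecutionException {
        int threads = Runtime.getRuntime().availableProcessors(); // assumed pool size
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<List<Future<Long>>> futures = new ArrayList<>();
            for (int[] row : grid) {
                List<Future<Long>> rowFutures = new ArrayList<>();
                for (int value : row) {
                    Callable<Long> task = () -> expensive(value);
                    rowFutures.add(pool.submit(task));
                }
                futures.add(rowFutures);
            }
            long[][] results = new long[grid.length][];
            for (int r = 0; r < grid.length; r++) {
                List<Future<Long>> rowFutures = futures.get(r);
                results[r] = new long[rowFutures.size()];
                for (int c = 0; c < rowFutures.size(); c++) {
                    results[r][c] = rowFutures.get(c).get(); // blocks until that task finishes
                }
            }
            return results;
        } finally {
            pool.shutdown(); // let the worker threads exit once all tasks are done
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] grid = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16} };
        long[][] results = processAll(grid);
        System.out.println(results[0][0] == expensive(1));
    }
}
```

A fixed pool caps the number of threads no matter how many elements there are, so the cost of creating threads does not grow with the array size.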
I just want to print each element!
If the processing of each element is mostly an IO operation, then you are going to be limited by IO, not CPU. In that case your program is not going to run any faster if you run each element's print operation in a separate thread.
Which is better from a multi-core point of view, in terms of performance and memory?
Performance is going to be about the same. A single-threaded solution will use less memory because only one thread is used (each thread needs its own stack), but the extra memory used by multiple threads (at least with only 16) is relatively small.