I have a complex interpreter that reads commands from (sometimes) multiple files (the exact details are out of scope), and it requires iterating over these files (some can be gigabytes in size, preventing them from being buffered whole) multiple times.
I am looking to increase the speed of reading in each command from a file.
I have used the RDTSC (time stamp counter) register to micro-benchmark the code enough to know that over 80% of the time is spent reading from the files.
Here is the thing: the program that generates the input file is literally faster than my small interpreter is at reading that file back in. In other words, instead of writing the file out I could (in theory) link the data generator directly to the interpreter and skip the file entirely, but writing shouldn't be faster than reading, right?
What am I doing wrong? Or is writing supposed to be 2x to 3x (at least) faster than reading from a file?
I have considered mmap, but some of the results at http://lemire.me/blog/archives/2012/06/26/which-is-fastest-read-fread-ifstream-or-mmap/ appear to indicate it is no faster than ifstream. Would mmap help in this case?
Details:
I have (so far) tried adding a buffer, tweaking parameters, and removing the ifstream buffer (that last one slowed it down by 6x in my test case). I am currently at a loss for ideas after searching around.
The important section of the code is below. It does the following:
- if data is left in the buffer, copy from the buffer to memblock (where it is then used)
- if no data is left in the buffer, check how much data remains in the file: if more than a buffer's worth, read a buffer-sized chunk; if less, read only what is left
//if data in buffer
if(leftInBuffer[activefile] > 0)
{
    //cout << bufferloc[activefile] << "\n";
    memcpy(memblock, (buffer[activefile]) + bufferloc[activefile], 16);
    bufferloc[activefile] += 16;
    leftInBuffer[activefile] -= 16;
}
else //buffers blank
{
    //read in block
    long blockleft = (cfilemax - cfileplace) / 16;
    int read = 0;
    /* slow block starts here */
    if(blockleft >= MAXBUFELEMENTS)
    {
        currentFile->read((char *)(&(buffer[activefile][0])), 16 * MAXBUFELEMENTS);
        leftInBuffer[activefile] = 16 * MAXBUFELEMENTS;
        bufferloc[activefile] = 0;
        read = 16 * MAXBUFELEMENTS;
    }
    else //read in part of the block
    {
        currentFile->read((char *)(&(buffer[activefile][0])), 16 * blockleft);
        leftInBuffer[activefile] = 16 * blockleft;
        bufferloc[activefile] = 0;
        read = 16 * blockleft;
    }
    /* slow block ends here */
    memcpy(memblock, (buffer[activefile]) + bufferloc[activefile], 16);
    bufferloc[activefile] += 16;
    leftInBuffer[activefile] -= 16;
}
Edit: this is on a Mac, OS X 10.9.5, with an i7 and an SSD.
Solution:
As was suggested below, mmap was able to increase the speed by about 10x.
(For anyone else who searches for this.) Specifically, open with:
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap
#include <sys/stat.h>  // fstat
#include <cstdint>
#include <cstdio>      // perror
#include <string>
using std::string;

uint8_t * openMMap(string name, long & size)
{
    int m_fd;
    struct stat statbuf;
    uint8_t * m_ptr_begin;
    if ((m_fd = open(name.c_str(), O_RDONLY)) < 0)
    {
        perror("can't open file for reading");
        return nullptr;
    }
    if (fstat(m_fd, &statbuf) < 0)
    {
        perror("fstat in openMMap failed");
        return nullptr;
    }
    if ((m_ptr_begin = (uint8_t *)mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, m_fd, 0)) == MAP_FAILED)
    {
        perror("mmap in openMMap failed");
        return nullptr;
    }
    size = statbuf.st_size;
    return m_ptr_begin;
}
Read by:
uint8_t * mmfile = openMMap("my_file", length);
uint32_t * memblockmm;
memblockmm = (uint32_t *)mmfile; // treat the mapped file as a uint32_t array
uint32_t data = memblockmm[0];   // take one 32-bit entry
mmfile += 4; // advance 4 bytes: one 32-bit entry, since each element of mmfile is 8 bits
(It was also suggested that the generator could write to std::cout, the interpreter could read from std::cin, and the two could simply be piped together, skipping the file.)