I'm experimenting with some newer JavaScript features such as async/await and generators. I have a function readPages with the signature

async function* readPages(....): AsyncIterableIterator<string> {}

and I want to concatenate the results of this function with a delimiter. Here is how I'm doing it now:

let array = new Array<string>();

for await (const page of readPages(...))
    array.push(page);

let result = array.join(pagesDelimiter);

That seems pretty verbose to me. Can it be done better?

Here is the full code for reference:

import * as fs from 'fs';
import { PDFJSStatic, PDFDocumentProxy } from 'pdfjs-dist';
const PDFJS: PDFJSStatic = require('pdfjs-dist');
PDFJS.disableWorker = true;

async function* readPages(doc: PDFDocumentProxy, wordsDelimiter = '\t'): AsyncIterableIterator<string> {
    for (let i = 1; i <= doc.numPages; i++) {
        const page = await doc.getPage(i);
        const textContent = await page.getTextContent();
        yield textContent.items.map(item => item.str).join(wordsDelimiter);
    }
}

async function pdfToText(filename: string, pagesDelimiter = '\n', wordsDelimiter = '\t') {
    const data = new Uint8Array(fs.readFileSync(filename));
    const doc = await PDFJS.getDocument(data);

    const array = new Array<string>();

    for await (const page of readPages(doc, wordsDelimiter))
        array.push(page);

    return array.join(pagesDelimiter);
}

pdfToText('input.pdf').then(console.log);
  • 4 lines (one declaration, one loop with an array push, and a final line for the concatenation) is... verbose? I don't see any repeated code, a big ball of mud, or a plate of incomprehensible, entangled spaghetti. I don't understand what exactly you are trying to "refactor". What are you planning to improve? Commented Sep 15, 2018 at 11:23
  • Something like readPages(...).join(delimiter) :-) Commented Sep 15, 2018 at 11:24
  • So you want more "functional" code instead of a plain loop. That immediately makes me think of RxJS Observables. You might end up with more lines, but with a functional approach. Commented Sep 15, 2018 at 11:28
  • I just think this is a good candidate for avoiding a for loop. If writing ['a', 'b', 'c'].join('_') is preferable to a for loop, this should be too (if a solution exists). RxJS can't operate on Promises (or maybe I'm wrong), and it is another library. I'm looking for a native JavaScript solution. Commented Sep 15, 2018 at 11:38
  • RxJS can operate on promises (Observable.fromPromise() ;) ). It can operate on iterables as well (Observable.from). Not sure how to mix the two right now, though. Commented Sep 15, 2018 at 11:45

1 Answer

OK, I played with that code a little more, and I think it is currently not possible to handle this task any better than with a for-await-of loop. But you can hide that loop behind a function added to the prototype:

declare global {
    interface AsyncIterableIterator<T> {
        toPromise(): Promise<T[]>;
    }
}

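// On a runtime with native async generators, this expression resolves to
// the prototype object shared by all async generator objects.
(async function* (): any {})().constructor.prototype.toPromise = async function<T>(this: AsyncIterableIterator<T>): Promise<T[]> {
    const result = new Array<T>();

    // Drain the iterator, collecting every yielded item in order.
    for await (const item of this)
        result.push(item);

    return result;
};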

so my code

const array = new Array<string>();

for await (const page of readPages(...))
    array.push(page);

const result = array.join(pagesDelimiter);

becomes

const array = await readPages(...).toPromise();
const result = array.join(pagesDelimiter);

Yeah, and I'm aware that extending built-in prototypes is questionable. But it was interesting to figure out how to extend the async iterator prototype :-).
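
For anyone who would rather not patch a built-in prototype, the same loop can probably live in a standalone helper instead. This is only a minimal sketch: collect, joinAsync and demoPages are hypothetical names that do not appear in the code above, and demoPages merely stands in for readPages so the snippet runs on its own.

```typescript
// Hypothetical helper: drains any async iterable into an array.
async function collect<T>(source: AsyncIterable<T>): Promise<T[]> {
    const items: T[] = [];

    for await (const item of source)
        items.push(item);

    return items;
}

// Hypothetical helper: an async counterpart of Array.prototype.join.
async function joinAsync(source: AsyncIterable<string>, delimiter: string): Promise<string> {
    return (await collect(source)).join(delimiter);
}

// Stand-in for readPages, used only to make the sketch runnable.
async function* demoPages(): AsyncIterableIterator<string> {
    yield 'page one';
    yield 'page two';
}
```

With something like this in place, the body of pdfToText could shrink to a single call such as joinAsync(readPages(doc, wordsDelimiter), pagesDelimiter). The for-await-of loop is still there, just written once and kept out of the global prototype chain.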
