Clojure: getting data from a DB, transforming it and printing it to the console

Question

I have the following task.
I need to create a console application that takes one parameter: the number of records to generate. Each record is a person's address and name. I created a table adress with state, city and zip-code fields, and another table with first and last name columns. I use HugSQL to work with PostgreSQL. I want to randomly combine addresses with first and last names and print the result to the console; the number of generated values depends on the argument passed to the application. This is my code:

(ns project.core
  (:require
    [project.db.get :as get]))

(defn parse-int [s]
  (Integer. (re-find  #"\d+" s )))

(def usa-data (get/usa))

(defn usa-adress-getter []
  (let [data (into {} (shuffle  usa-data))
        city (get data :city)
        state (get data :state)
        zip (get data :zip_code)]
    (str state " " city " " zip)))

(defn repeater [times]
  (dotimes [i times]
    (println (usa-adress-getter))))

(defn -main [value]
  (repeater (parse-int value)))

Here I am only checking the result of the usa-adress-getter function. But evaluating it takes too long: my limit is 1 million values in 1 minute. How can I speed up the evaluation? The function (get/usa) retrieves all rows from the adress table.

Your usa-adress-getter looks strange. Does it even work properly? It really shouldn't, because you shadow clojure.core/get with project.db/get. Please check the code.
– leetwinski, Commented Oct 11, 2017 at 5:28
Also, (into {} (shuffle usa-data)) looks suspicious: usa-data should be a sequence of records, so pouring the whole sequence into a map makes no sense. Maybe it should be (into {} (first (shuffle usa-data)))? Anyhow, this seems to be the key to the low performance: you eagerly shuffle a million items on every iteration, and that is really slow (about 250 ms on my machine). I would advise you to go with rand-nth: (into {} (rand-nth usa-data))
– leetwinski, Commented Oct 11, 2017 at 5:33
Also, if it is not mandatory to print records one by one, it may be better to construct the whole collection and then print it once, like (clojure.pprint/pprint (repeatedly (parse-int value) usa-adress-getter)), and throw the repeater function away.
– leetwinski, Commented Oct 11, 2017 at 5:47
The usa-adress-getter function works fine. I have just separated the DB retrieval code and put it in get.clj. That file contains the path to the SQL and the function (defn usa [] (all-usa db/dbspec))
– Andrei Belkevich, Commented Oct 11, 2017 at 8:38
Could you please provide an example of usa-data and the desired output?
– leetwinski, Commented Oct 11, 2017 at 10:13
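
The rand-nth suggestion from the comments above could look roughly like this. It is only a sketch: the sample rows are made up and stand in for the result of (get/usa), and it assumes each row is a map with :state, :city and :zip_code keys. The function keeps the question's spelling, usa-adress-getter.

```clojure
;; Hypothetical sample rows standing in for the result of (get/usa).
;; Holding them in a vector matters: rand-nth uses count and nth,
;; which are fast on a vector but linear on a lazy sequence.
(def usa-data
  (vec [{:state "NY" :city "New York"    :zip_code "10001"}
        {:state "CA" :city "Los Angeles" :zip_code "90001"}
        {:state "TX" :city "Austin"      :zip_code "73301"}]))

(defn usa-adress-getter
  "Picks one random row and formats it, without shuffling the whole collection."
  []
  (let [{:keys [state city zip_code]} (rand-nth usa-data)]
    (str state " " city " " zip_code)))

(defn repeater
  "Prints `times` randomly chosen addresses, one per line."
  [times]
  (dotimes [_ times]
    (println (usa-adress-getter))))
```

Each call now does a single random lookup instead of shuffling every row, so the cost per generated value no longer grows with the size of the table.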

Wout Neirynck · Accepted Answer · 2017-10-11 09:44:31Z

It's difficult to say from this piece of code where the performance bottlenecks are, but here are some tips:

Use type hints for hotspots. Sometimes the Clojure compiler can't infer the types and falls back to reflection, and then type hints can speed things up a lot. In this case, you could set it on the usa-adress-getter fn: (defn ^String usa-adress-getter [] ...)
You could consider modifying your query so that it returns the concatenated string from the database (using an SQL function like concat). That way, you won't need to get the values from the hash and build the string yourself.
Printing stuff may be slow, so you could eliminate the per-record printing, like @leetwinski says.
You have to measure the performance of your code in detail, otherwise you won't be able to tell whether a modification has gained you any speed. For instance, you could add a timer that prints the number of msecs for every 1000 records processed. And of course, make only one change at a time.
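
A few of these tips can be combined in one rough sketch. It makes several assumptions: the rows are hypothetical stand-ins for (get/usa); the strings are built once in Clojure rather than in SQL; and the timer covers the whole run rather than every 1000 records. The names rows, addresses, generate and timed-print are made up for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical rows standing in for the result of (get/usa).
(def rows
  [{:state "NY" :city "New York"    :zip_code "10001"}
   {:state "CA" :city "Los Angeles" :zip_code "90001"}])

;; Build each address string once, up front, instead of on every call.
(def addresses
  (mapv (fn [{:keys [state city zip_code]}]
          (str state " " city " " zip_code))
        rows))

(defn generate
  "Returns n randomly chosen address strings joined by newlines."
  [n]
  (str/join "\n" (repeatedly n #(rand-nth addresses))))

(defn timed-print
  "Prints n addresses with a single println and reports the elapsed
  milliseconds on stderr. Returns the elapsed milliseconds."
  [n]
  (let [start (System/nanoTime)
        out   (generate n)]
    (println out)
    (let [ms (/ (- (System/nanoTime) start) 1e6)]
      (binding [*out* *err*]
        (println "generated" n "rows in" ms "ms"))
      ms)))
```

Calling (timed-print 1000000) before and after each individual change gives a number to compare, which is the point of the last tip.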

Hope this helps.
