I am looking for some high-level advice on how to get started with the following requirements:

We have a Ruby Sinatra API service running on Heroku that syncs users' email with our system.

We store the users' emails in a Postgres database, split into subject, text body, and HTML fields.

I want to use Elasticsearch to search these emails, but a search must only cover emails that are in that user's inbox.

Can anyone give me the first steps for indexing the Postgres emails table, and for filtering the search so that it is confined to a single user's emails?

The schema for the emails table is:

CREATE TABLE emails
(
  id serial NOT NULL,
  subject text,
  body text,
  personal boolean,
  sent_at timestamp without time zone,
  created_at timestamp without time zone,
  updated_at timestamp without time zone,
  addresses text,
  account_id integer NOT NULL,
  sender_user_id integer,
  sender_contact_id integer,
  html text,
  folder text,
  draft boolean DEFAULT false,
  check_for_response timestamp without time zone,
  send_time timestamp without time zone,
  send_time_jid text,
  check_for_response_jid text,
  message_id text,
  in_reply_to text,
  CONSTRAINT emails_pkey PRIMARY KEY (id),
  CONSTRAINT emails_account_id_fkey FOREIGN KEY (account_id)
      REFERENCES accounts (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT emails_sender_contact_id_fkey FOREIGN KEY (sender_contact_id)
      REFERENCES contacts (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT emails_sender_user_id_fkey FOREIGN KEY (sender_user_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
)
  • what's the schema for the email database? Commented Dec 5, 2013 at 17:28
  • @kielni I have updated the question with the schema Commented Dec 5, 2013 at 20:54

1 Answer

It sounds like the only fields you care about for search purposes are body, account_id, and folder. You can always add more later (for example, indexing the dates would enable date-range searches). The folder name should not be analyzed, so that Elasticsearch indexes it as a single exact value instead of tokenizing and lowercasing it. Otherwise "Inbox" would be indexed as "inbox" and a term (exact match) filter on "Inbox" would find nothing. With the field left unanalyzed, a term filter retrieves only the emails in a specific folder.

Here's a mapping that includes just those three fields, plus the id:

{
  "email" : {
    "properties" : {
      "account_id" : { "type" : "integer" },
      "body" : { "type" : "string" },
      "folder" : { "type" : "string", "index" : "not_analyzed" },
      "id" : { "type" : "integer" }
    }
  }
}
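
The mapping only describes the fields; copying the rows out of Postgres and into Elasticsearch is a separate step. As a rough sketch of one way to do it, the Ruby below turns email rows into the newline-delimited JSON body that the bulk endpoint (POST /_bulk) accepts. The index name "emails", the bulk_body helper, and the sample rows are assumptions made up for illustration; the rows are plain hashes keyed by column name, standing in for whatever your database library returns. The _type entry matches the "email" mapping above and applies to Elasticsearch versions that still use mapping types.

```ruby
require 'json'

# Hypothetical names: the index name is an assumption, the type name
# mirrors the "email" mapping shown above.
INDEX_NAME = 'emails'
TYPE_NAME  = 'email'

# Build the newline-delimited JSON body for POST /_bulk.
# Each row becomes two lines: an action line, then the document itself.
# Only the fields present in the mapping are copied into the document.
def bulk_body(rows)
  lines = rows.flat_map do |row|
    action = { index: { _index: INDEX_NAME, _type: TYPE_NAME, _id: row['id'] } }
    source = {
      id: row['id'],
      account_id: row['account_id'],
      body: row['body'],
      folder: row['folder']
    }
    [JSON.generate(action), JSON.generate(source)]
  end
  # The bulk endpoint requires the body to end with a newline.
  lines.join("\n") + "\n"
end

# Made-up sample rows; in the real service these would come from the emails table.
rows = [
  { 'id' => 1, 'account_id' => 111, 'body' => 'one two three', 'folder' => 'Inbox' },
  { 'id' => 2, 'account_id' => 222, 'body' => 'four five', 'folder' => 'Sent' }
]

puts bulk_body(rows)
```

Sending the rows in batches (say, a few hundred per request) keeps each bulk request small. The same helper could be called whenever the sync job stores new emails, so the index stays up to date.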

Here's how you can search. Use term filters to restrict the results to a specific folder ("Inbox") and to a specific user (account_id=111). Use the query string to find specific words, phrases, etc.

{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "term": {
              "folder": "Inbox"
            }
          },
          {
            "term": {
              "account_id": 111
            }
          }
        ]
      },
      "query": {
        "query_string": {
          "query": "one"
        }
      }
    }
  }
}
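
Since the service is written in Ruby, the same request body can be built from a hash. This is only a sketch: the inbox_search helper and its parameters are hypothetical names, and it assumes the account id is taken from the authenticated session rather than from anything the client sends, since that filter is what confines the search to one user's emails.

```ruby
require 'json'

# Build the filtered query shown above for one account.
# account_id is coerced with Integer() so a non-numeric value raises
# instead of ending up in the filter.
def inbox_search(account_id, text, folder = 'Inbox')
  {
    query: {
      filtered: {
        filter: {
          and: [
            { term: { folder: folder } },
            { term: { account_id: Integer(account_id) } }
          ]
        },
        query: {
          query_string: { query: text }
        }
      }
    }
  }
end

# Print the JSON that would be sent as the search request body.
puts JSON.pretty_generate(inbox_search(111, 'one'))
```

Note that the filtered query belongs to older Elasticsearch releases; it was deprecated in 2.0 and removed in 5.0. On later versions the equivalent is a bool query with the two term clauses under "filter" and the query_string clause under "must".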