
I want to extract data from Elasticsearch, and my function looks like this:

import re

def recent_crawl_index(es, vulTime):
    ## Using a regex to get the image name.
    # It is inefficient to fetch them one by one using doc['hits']['hits'][n]['_source']['docker_image_short_name']
    # because thousands of documents are stored per image.
    # u? matches the u'' prefix that Python 2 adds when printing unicode strings.
    pattern = re.compile(r"docker_image_short_name': u?'(.+?)'")
    query = {
        "query": {
            "bool": {"must": [{"range": {"@timestamp": {"gt": vulTime}}}]}
        }
    }
    page = es.search(index='crawledframe-*', body=query, scroll='1m', size=1000)
    sid = page['_scroll_id']
    num_page = page['hits']['total']

    imglist = []
    while num_page > 0:
        print(num_page)
        print(vulTime)
        imgs = pattern.findall(str(page))
        imglist += imgs

        page = es.scroll(scroll_id=sid, scroll='1m')
        sid = page['_scroll_id']  # always continue with the latest scroll id
        num_page = len(page['hits']['hits'])

    return list(set(imglist))  # remove duplicates
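For comparison, here is a minimal sketch of the same search-then-scroll loop in Python 3. It assumes an elasticsearch-py style client that exposes search(index=..., body=..., scroll=..., size=...) and scroll(scroll_id=..., scroll=...); the function name collect_image_names is hypothetical. Instead of running a regex over str(page), it reads the field from each hit's _source, uses source filtering so the server only returns docker_image_short_name, and passes the newest _scroll_id to every scroll call. It does not by itself explain the RequestError. The elasticsearch package also ships elasticsearch.helpers.scan, which wraps this kind of loop.

```python
def collect_image_names(es, index, since, page_size=1000, keep_alive="1m"):
    """Return the distinct docker_image_short_name values newer than `since`.

    `es` is assumed to be an elasticsearch-py style client (hypothetical
    helper, not part of any library).
    """
    query = {
        # Source filtering: only ship the one field we need per document.
        "_source": ["docker_image_short_name"],
        "query": {"bool": {"must": [{"range": {"@timestamp": {"gt": since}}}]}},
    }
    page = es.search(index=index, body=query, scroll=keep_alive, size=page_size)
    names = set()
    # Keep scrolling until the server returns an empty page of hits.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            name = hit.get("_source", {}).get("docker_image_short_name")
            if name is not None:
                names.add(name)
        # Continue with the most recent scroll id returned by the server.
        page = es.scroll(scroll_id=page["_scroll_id"], scroll=keep_alive)
    return names
```

Usage would look like imgnames = collect_image_names(es, 'crawledframe-*', vulTime). Reading _source directly avoids depending on how Python happens to print the response dict.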

I only want to extract "docker_image_short_name".

But I got the following error (printed output included):

num_page: 2327261
vulTime : 0001-01-01
Traceback (most recent call last):
  File "test.py", line 68, in <module>
    worker_main()
  File "test.py", line 63, in worker_main
    imgnames = recent_crawl_index(es, vulTime)
  File "test.py", line 45, in recent_crawl_index
    page = es.scroll(scroll_id = sid, scroll = '1m')
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 1024, in scroll
    params=params, body=body)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: <exception str() failed>

I don't understand why this error occurs, because I use the same logic in other code, and es.search() itself doesn't raise an error.

Answer

It seems you are using the wrong version of Elasticsearch DSL.

What you need to do is the following:

  • Check your Elasticsearch version: curl -XGET 'localhost:9200'
  • Then match your Elasticsearch version with the compatible version of Elasticsearch DSL. For example, if your Elasticsearch version is 1.x, run:

      pip uninstall elasticsearch-dsl
      pip install "elasticsearch-dsl<2.0.0"
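The version check in the steps above can be scripted. The sketch below parses the JSON body that GET / returns (its version.number field) and builds a pip requirement string. The 1.x case follows this answer; the rule for 2.x and later is an assumption based on elasticsearch-dsl's documented convention that its major version tracks the server's, so verify it against the library's compatibility notes. The low-level elasticsearch package used in the question follows a similar convention and is worth checking too. The function name dsl_requirement is hypothetical.

```python
import json


def dsl_requirement(root_response_json):
    """Map the JSON body of GET / on an Elasticsearch node to a pip requirement.

    Assumption: elasticsearch-dsl major version N targets Elasticsearch N.x
    for N >= 2; 1.x servers need elasticsearch-dsl<2.0.0.
    """
    info = json.loads(root_response_json)
    major = int(info["version"]["number"].split(".")[0])
    if major <= 1:
        return "elasticsearch-dsl<2.0.0"
    return "elasticsearch-dsl>={0}.0.0,<{1}.0.0".format(major, major + 1)


sample = '{"name": "node-1", "version": {"number": "1.7.5"}}'
print(dsl_requirement(sample))  # elasticsearch-dsl<2.0.0
```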
