Python: from http to csv

Question

I need to write a string from HTTP into CSV file. My columns have to be: LATITUDE,LONGITUDE, OSM_ID, HIGHWAY, UPDATED_AT

This is a sample of the output of HTTP link starting from top:

{
  "datetime": "2018-06-08T08:26:09.375Z",
  "success": true,
  "bbox": {
    "xmin": "12.335513",
    "ymin": "42.035682",
    "xmax": "12.758896",
    "ymax": "42.050826"
  },
  "data": [
    {
      "aggregate_id": 30201274,
      "ppe": 0.316954620298806,
      "geom": {
        "type": "Point",
        "coordinates": [
          12.532972800901,
          42.045435384225
        ]
      },
      "osm_id": "37015042",
      "highway": "motorway",
      "updated_at": "2018-01-20T03:27:11.047Z"
    },
    {
      "aggregate_id": 30201275,
      "ppe": 0.318124963244448,
      "geom": {
        "type": "Point",
        "coordinates": [
          12.5329908742,
          42.045615145535
        ]
      },
      "osm_id": "37015042",
      "highway": "motorway",
      "updated_at": "2018-01-20T03:27:11.047Z"
    },
    {
      "aggregate_id": 30201276,
      "ppe": 0.204792151096739,
      "geom": {
        "type": "Point",
        "coordinates": [
          12.533008947499,
          42.045794906844
        ]
      },
      "osm_id": "37015042",
      "highway": "motorway",
      "updated_at": "2018-01-20T03:27:11.047Z"
    },
    {
      "aggregate_id": 30201277,
      "ppe": 0.194797261691664,
      "geom": {
        "type": "Point",
        "coordinates": [
          12.533030586679,
          42.045974206816
        ]
      },
      "osm_id": "37015042",
      "highway": "motorway",
      "updated_at": "2018-01-20T03:27:11.047Z"
    }
  ]
}

Each row is separated by ','.

I wrote this code

import pandas as pd
import csv
import urllib.request

from urllib.request import urlopen

CSV_URL = 'http://www.smartroadsense.it/bb/12.335513/42.035682/12.758896/42.050826'

request = urllib.request.Request(CSV_URL)
response = urllib.request.urlopen(request)
response.read().decode('utf-8')

#write into csv
colNames = ["longitude","latitude","ppe","osm_id","highway","updated_at"]
data = pd.read_csv(CSV_URL, names=colNames, sep=',')

The problem is how to split the string from http into rows. Someone can help me?

The question has nothing to do with machine-learning - kindly do not spam the tag (edited & removed). — desertnaut
– desertnaut, Commented Jun 8, 2018 at 10:55
Your source string's format is not CSV, it's JSON. Use json.load() to parse it into a Python dict, then build your csv from this dict (you don't need panda for this, the stdlib's csv module will be enough). — bruno desthuilliers
– bruno desthuilliers, Commented Jun 8, 2018 at 11:13

Ashish Acharya · Accepted Answer · 2018-06-08 11:18:51Z

2

Here's a simple implementation using pandas and requests:

import pandas as pd
import requests

url = 'http://www.smartroadsense.it/bb/12.335513/42.035682/12.758896/42.050826'
response = requests.get(url).json()

df = pd.DataFrame(response['data'])

You can extract the longitude and latitude from the geom column like this:

df['longitude'] = df.apply(lambda row: dict(row['geom'])['coordinates'][0], axis=1)
df['latitude'] = df.apply(lambda row: dict(row['geom'])['coordinates'][1], axis=1)

Finally, save the required columns to a csv file like so:

df[['latitude', 'longitude', 'osm_id', 'highway', 'updated_at']].to_csv('output.csv', index=False)

edited Jun 8, 2018 at 11:18

answered Jun 8, 2018 at 11:13

Ashish Acharya

3,4091 gold badge19 silver badges25 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Manuel Over a year ago

Thanks for your time! I have another question: could you explain me the following instruction, please? 'lambda row: dict(row['geom'])['coordinates'][0]', in particular I didn't understand how the program can read each row.

Ashish Acharya Over a year ago

the apply function (with axis=1) applies some transformation to all the rows of the DataFrame.

Sacha · Accepted Answer · 2018-06-08 11:25:19Z

1

Here is another way to do it :

import pandas as pd
import urllib.request
from urllib.request import urlopen
CSV_URL = "http://www.smartroadsense.it/bb/12.335513/42.035682/12.758896/42.050826"

request = urllib.request.Request(CSV_URL)
response = urllib.request.urlopen(request)
test = json.loads(response.read())["data"]

def transform_row(x):
  x["longitude"] = x["geom"]["coordinates"][0]
  x["latitude"] = x["geom"]["coordinates"][1]
  del x["geom"]
  return x

res = list(map(transform_row, test))
pd.DataFrame(res)[['latitude', 'longitude', 'osm_id', 'highway', 'updated_at']].to_csv('output.csv', index=False)

edited Jun 8, 2018 at 11:25

answered Jun 8, 2018 at 11:19

Sacha

2703 silver badges10 bronze badges

Comments

Shahar · Accepted Answer · 2018-06-08 11:46:24Z

A straightforward approach.

Fetch the data
Order the data
Export to CSV

import csv
import requests


def main():
    url = 'http://www.smartroadsense.it/bb/12.335513/42.035682/12.758896/42.050826'
    data = fetch_data(url)
    rows = [row(r) for r in data]
    fields = ['LATITUDE', 'LONGITUDE', 'OSM_ID', 'HIGHWAY', 'UPDATED_AT']
    with open('output.csv', "w") as output:
        writer = csv.writer(output, lineterminator='\n')
        writer.writerows([fields])
        writer.writerows(rows)


def row(data):
    r = list()
    r.append(data['geom']['coordinates'][0])  # LATITUDE
    r.append(data['geom']['coordinates'][1])  # LONGITUDE
    r.append(data['osm_id'])  # OSM_ID
    r.append(data['highway'])  # HIGHWAY
    r.append(data['updated_at'])  # UPDATED_AT
    return r


def fetch_data(url):
    response = requests.get(url).json()
    return response['data']

RoadRunner · Accepted Answer · 2018-06-08 12:31:33Z

Adding to other answers, here is another way of doing it with csv.writer:

from requests import get
from csv import writer

CSV_URL = 'http://www.smartroadsense.it/bb/12.335513/42.035682/12.758896/42.050826'

response = get(CSV_URL).json()

headers = ['LATITUDE', 'LONGITUDE', 'OSM_ID', 'HIGHWAY', 'UPDATED_AT']
columns = ["osm_id", "highway", "updated_at"]

with open('output.csv', 'w') as out:
    csv_writer = writer(out)
    csv_writer.writerow(headers)
    for row in response['data']:
        items = row['geom']['coordinates'] + [row[col] for col in columns]
        csv_writer.writerow(items)

print(open('output.csv').read())

Which outputs this csv file:

LATITUDE,LONGITUDE,OSM_ID,HIGHWAY,UPDATED_AT
12.532972800901,42.045435384225,37015042,motorway,2018-01-20T03:27:11.047Z
12.5329908742,42.045615145535,37015042,motorway,2018-01-20T03:27:11.047Z
12.533008947499,42.045794906844,37015042,motorway,2018-01-20T03:27:11.047Z
12.533030586679,42.045974206816,37015042,motorway,2018-01-20T03:27:11.047Z
...

Collectives™ on Stack Overflow

Python: from http to csv

4 Answers 4

2 Comments

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

2 Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related