How to make a nested dictionary from a text file in python?

Question

I have a text file that is structured like so:

SOURCE: RCM
DESTINATIONS BEGIN
JCK SF3
DESTINATIONS END
SOURCE: TRO
DESTINATIONS BEGIN
GFN SF3
SYD SF3 DH4
DESTINATIONS END

I am trying to create a nested dictionary where the resulting dictionary would look like:

handout_routes = {
'RCM': {'JCK': ['SF3']},
'TRO': {'GFN': ['SF3'], 'SYD': ['SF3', 'DH4']}
}

Now, this is just a sample of the data, but when reading it we can assume the following: The very first line begins with SOURCE: followed by a three-letter IATA airport code. The line after every line that begins with SOURCE: is DESTINATIONS BEGIN. There are one or more lines between DESTINATIONS BEGIN and DESTINATIONS END, and every DESTINATIONS BEGIN has a corresponding DESTINATIONS END. Each line between them starts with a three-letter IATA airport code, followed by one or more three-character alphanumeric plane codes, all separated by spaces. The line after DESTINATIONS END either begins with SOURCE: or is the end of the file.
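The assumptions above amount to a small grammar, so one option is a parser that checks each rule and fails loudly when a line doesn't fit. This is a sketch under those stated assumptions, not code from any answer: `parse_routes` is a hypothetical name, and treating airport codes as three uppercase letters and plane codes as three uppercase letters or digits is an extra assumption.

```python
import re

# Assumed patterns, derived from the format rules listed above:
# airports are three uppercase letters, plane codes three uppercase letters/digits.
AIRPORT = re.compile(r"[A-Z]{3}")
PLANE = re.compile(r"[A-Z0-9]{3}")


def parse_routes(lines):
    """Build {source: {destination: [plane codes]}}, raising ValueError on format errors."""
    routes = {}
    # Strip whitespace and skip blank lines; one shared iterator drives every loop below.
    it = (s for s in (line.strip() for line in lines) if s)
    for line in it:
        if not line.startswith("SOURCE:"):
            raise ValueError(f"expected a SOURCE: line, found {line!r}")
        source = line[len("SOURCE:"):].strip()
        if not AIRPORT.fullmatch(source):
            raise ValueError(f"bad airport code after SOURCE: {source!r}")
        if next(it, None) != "DESTINATIONS BEGIN":
            raise ValueError(f"DESTINATIONS BEGIN must follow SOURCE: {source}")
        destinations = {}
        for entry in it:
            if entry == "DESTINATIONS END":
                break
            dest, *planes = entry.split()
            if not (AIRPORT.fullmatch(dest) and planes
                    and all(PLANE.fullmatch(p) for p in planes)):
                raise ValueError(f"bad destination line {entry!r}")
            destinations[dest] = planes
        else:
            # The shared iterator ran out before DESTINATIONS END appeared.
            raise ValueError(f"missing DESTINATIONS END for {source}")
        if not destinations:
            raise ValueError(f"no destinations listed for {source}")
        routes[source] = destinations
    return routes


sample = """SOURCE: RCM
DESTINATIONS BEGIN
JCK SF3
DESTINATIONS END
SOURCE: TRO
DESTINATIONS BEGIN
GFN SF3
SYD SF3 DH4
DESTINATIONS END
"""
print(parse_routes(sample.splitlines()))
# {'RCM': {'JCK': ['SF3']}, 'TRO': {'GFN': ['SF3'], 'SYD': ['SF3', 'DH4']}}
```

Because the inner loop shares the same iterator as the outer one, a missing DESTINATIONS END is detected by the `for ... else` branch rather than by reading ahead.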

So far I've tried

with open ("file_path", encoding='utf-8') as text_data:
    answer = {}
    for line in text_data:
        line = line.split()
        if not line:  # empty line?
            continue
        answer[line[0]] = line[1:]
    print(answer)

But it returns the data like this:

{'SOURCE:': ['WYA'], 'DESTINATIONS': ['END'], 'KZN': ['146'], 'DYU': ['320']}

I think the problem is in how I structured the code that reads the file. It's possible my code is far too simple for what needs to be done with it. Any help would be appreciated. Thank you.
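Running the same loop on the sample data shows why the structure is lost: every line's first word becomes a top-level key, so the two SOURCE: lines (and the repeated DESTINATIONS lines) overwrite each other and nothing is nested. A minimal reproduction, with `io.StringIO` standing in for the real file:

```python
import io

SAMPLE = """SOURCE: RCM
DESTINATIONS BEGIN
JCK SF3
DESTINATIONS END
SOURCE: TRO
DESTINATIONS BEGIN
GFN SF3
SYD SF3 DH4
DESTINATIONS END
"""

answer = {}
for line in io.StringIO(SAMPLE):  # stands in for the opened file
    words = line.split()
    if not words:
        continue
    # Every line is handled the same way, so repeated first words overwrite earlier entries.
    answer[words[0]] = words[1:]

print(answer)
# {'SOURCE:': ['TRO'], 'DESTINATIONS': ['END'], 'JCK': ['SF3'], 'GFN': ['SF3'], 'SYD': ['SF3', 'DH4']}
```

The fix is to keep state between lines: remember the current SOURCE and store destination lines under it, which is what the answers do in different ways.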

Matthew B · Accepted Answer · 2021-08-09 03:22:40Z


Here's a program I wrote that works quite well:

def unpack(file):
    contents: dict = {}
    source: str

    for line in file.splitlines():
        line = line.strip()  # drop surrounding spaces so stray whitespace can't break the checks

        if line.startswith('DESTINATIONS'):
            # these lines don't affect the program, so we ignore them
            pass

        elif not line:
            # empty line, so we ignore it
            pass

        elif line.startswith('SOURCE'):
            source = line.rpartition(' ')[-1]
            if source not in contents:
                contents[source] = {}

        else:
            # first code is the destination, the rest are plane codes
            idx, *data = line.split()
            contents[source][idx] = data

    return contents

with open('file.txt') as file:
  handout_routes = unpack(file.read())
  print(handout_routes)

4 Comments

Trent Xavier

This is putting me on the right track, but it only returns this: {'AER': {}}. Perhaps I'm implementing your code incorrectly? What does it return for you?

Matthew B

That's odd, for me it returns {'RCM': {'JCK': ['SF3']}, 'TRO': {'GFN': ['SF3'], 'SYD': ['SF3', 'DH4']}}, exactly the dict you said it should return. Could you show me the file you're trying to open?

Trent Xavier

Sure, it's a .dat file. It's a large dataset, so how can I show you?

Trent Xavier

Never mind, it was my implementation that was wrong. This works great! Thanks!
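Since the dataset mentioned in the comments is large, one possible adjustment to Matthew B's `unpack()` (a sketch, not part of the answer) is to accept any iterable of lines instead of one big string. An open file object can then be passed straight in and read line by line, rather than loaded whole with `file.read()`. The name `unpack_lines` is hypothetical, and it assumes SOURCE lines contain the colon, as in the question.

```python
import io
from typing import Dict, Iterable, List


def unpack_lines(lines: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Intended to build the same dict as unpack(), from any iterable of lines."""
    contents: Dict[str, Dict[str, List[str]]] = {}
    source = None
    for line in lines:
        line = line.strip()
        # Blank lines and the DESTINATIONS BEGIN/END markers carry no data.
        if not line or line.startswith("DESTINATIONS"):
            continue
        if line.startswith("SOURCE:"):
            source = line[len("SOURCE:"):].strip()
            contents.setdefault(source, {})
        elif source is None:
            raise ValueError(f"destination line before any SOURCE: {line!r}")
        else:
            dest, *planes = line.split()
            contents[source][dest] = planes
    return contents


# io.StringIO stands in for the real file; with a file on disk this would be
#     with open("file.txt", encoding="utf-8") as file:
#         handout_routes = unpack_lines(file)
sample = io.StringIO("SOURCE: RCM\nDESTINATIONS BEGIN\nJCK SF3\nDESTINATIONS END\n")
print(unpack_lines(sample))  # {'RCM': {'JCK': ['SF3']}}
```

Iterating a text-mode file yields one line at a time, so memory use stays proportional to the resulting dict rather than to the file size.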

David Culbreth · 2021-08-09 04:29:15Z

I know there's already an accepted answer, but I used an approach that may actually help you find the formatting errors in your file, rather than just ignoring the extra bits:

from tokenize import TokenInfo, tokenize, ENCODING, ENDMARKER, NEWLINE, NAME
from typing import Callable, Generator

class TripParseException(Exception):
    pass

def assert_token_string(token:TokenInfo, expected_string: str):
    if token.string != expected_string:
        raise TripParseException("Unable to parse trip file: expected {}, found {} in line {} ({})".format(
            expected_string, token.string, str(token.start[0]), token.line
        ))
def assert_token_type(token:TokenInfo, expected_type: int):
    if token.type != expected_type:
        raise TripParseException("Unable to parse trip file: expected type {}, found type {} in line {} ({})".format(
            expected_type, token.type, str(token.start[0]), token.line
        ))

def parse_destinations(token_stream: Generator[TokenInfo, None, None])->dict:
    destinations = dict()
    assert_token_string(next(token_stream), "DESTINATIONS")
    assert_token_string(next(token_stream), "BEGIN")
    assert_token_type(next(token_stream), NEWLINE)
    current_token = next(token_stream)
    while(current_token.string != "DESTINATIONS"):
        assert_token_type(current_token, NAME)
        destination = current_token.string
        plane_codes = list()
        current_token = next(token_stream)
        while(current_token.type != NEWLINE):
            assert_token_type(current_token, NAME)
            plane_codes.append(current_token.string)
            current_token = next(token_stream)
        destinations[destination] = plane_codes
        # current token is NEWLINE, get the first token on the next line.
        current_token = next(token_stream)


    # Just parsed "DESTINATIONS", expecting "DESTINATIONS END"
    assert_token_string(next(token_stream), "END")
    assert_token_type(next(token_stream), NEWLINE)
    return destinations

def parse_trip(token_stream: Generator[TokenInfo, None, None]):
    current_token = next(token_stream)
    if(current_token.type == ENDMARKER):
        return None, None
    assert_token_string(current_token, "SOURCE")
    assert_token_string(next(token_stream), ":")
    tok_origin = next(token_stream)
    assert_token_type(tok_origin, NAME)
    assert_token_type(next(token_stream), NEWLINE)
    destinations = parse_destinations(token_stream)

    return tok_origin.string, destinations

def parse_trips(readline: Callable[[], bytes]) -> dict:
    token_gen = tokenize(readline)
    assert_token_type(next(token_gen), ENCODING)
    trips = dict()
    while(True):
        origin, destinations = parse_trip(token_gen)
        if(origin is not None and destinations is not None):
            trips[origin] = destinations
        else:
            break

    return trips

Then your implementation would look like this:

import pprint

with open("trips.dat", "rb") as trips_file:
    trips = parse_trips(trips_file.readline)
    pprint.pprint(trips)

which yields the expected result:

{'RCM': {'JCK': ['SF3']}, 'TRO': {'GFN': ['SF3'], 'SYD': ['SF3', 'DH4']}}

This approach is also more flexible if you later want to add other information to your files.
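To see what the tokenize-based parser actually works with, it helps to dump the token stream for a few lines. The input below is hypothetical, in the question's format. Codes such as SF3 come through as NAME tokens, but a purely numeric plane code like 146 (the kind that shows up in the question's output) is a NUMBER token, so the `assert_token_type(current_token, NAME)` check in `parse_destinations` would raise `TripParseException` on it. Accepting NUMBER as well is one way around that, although mixed codes that start with a digit may not tokenize as a single token at all.

```python
import io
from tokenize import tok_name, tokenize

# Hypothetical input; "146" is a purely numeric plane code.
data = b"SOURCE: RCM\nDESTINATIONS BEGIN\nJCK SF3\nKZN 146\nDESTINATIONS END\n"

for tok in tokenize(io.BytesIO(data).readline):
    # tok_name maps the integer token type to a readable name such as NAME or NUMBER.
    print(tok.start[0], tok_name[tok.type], repr(tok.string))
```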

frogcoder · 2021-08-09 06:24:23Z


from itertools import takewhile
import re


def destinations(lines):
    # Yield [destination, *plane_codes] for each line up to DESTINATIONS END.
    if next(lines).startswith('DESTINATIONS BEGIN'):
        dest = takewhile(lambda l: not l.startswith('DESTINATIONS END'), lines)
        yield from map(str.split, dest)


def sources(lines):
    # Raw string so that \s and \w reach the regex engine unchanged.
    source = re.compile(r'SOURCE:\s*(\w+)')
    # The := operator requires Python 3.8 or newer.
    while m := source.match(next(lines, '')):
        yield (m.group(1),
               {dest: crafts for dest, *crafts in destinations(lines)})


with open('file_path', encoding='utf-8') as text_data:
    handout_routes = dict(sources(text_data))
print(handout_routes)
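This answer relies on a detail of `itertools.takewhile`: when the predicate fails, `takewhile` has already pulled that element from the underlying iterator and discards it. Because `destinations()` and `sources()` share one file iterator, the DESTINATIONS END line is consumed by `takewhile`, so the next `next(lines)` call in `sources()` lands on the following SOURCE: line. A small demonstration, with a list iterator in place of the file:

```python
from itertools import takewhile

lines = iter(["JCK SF3", "DESTINATIONS END", "SOURCE: TRO"])

# takewhile stops at the first line that fails the predicate...
block = list(takewhile(lambda l: not l.startswith("DESTINATIONS END"), lines))
print(block)        # ['JCK SF3']

# ...and that terminating line has already been consumed from the shared iterator.
print(next(lines))  # SOURCE: TRO
```

This also means each destinations block must be fully consumed before the next source is read, which the dict comprehension inside `sources()` guarantees.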
