I'm a chemist and very new to programming. I try to write programs to make my life easier when handling data. After scouring StackOverflow all day, I was finally able to write a short Python script that parses a text file containing blocks of similar data separated by blank lines. My code works well, but it doesn't parse the last block, and I'm not sure why. I tried searching for an answer but couldn't find one that helped.

In a typical text file, there are 361 blocks of data each containing information to construct a molecule in 3-D space with a different torsion angle for a set of four atoms. Here is an example of a text file I have tried parsing which only includes the first two blocks.

!Coordinate: -51.45857  Energy: *****
6 0.006074 0.000915 0.000760
6 0.003070 -0.004811 1.496641
6 1.065644 -0.015789 2.367841
6 2.500078 -0.010542 1.993114
6 3.043633 -0.885454 1.109936
6 2.319723 -2.061360 0.571949
6 1.651211 -3.009615 1.308815
16 0.964940 -4.223294 0.280714
6 1.598121 -3.476004 -1.156548
6 2.300403 -2.353600 -0.830192
1 2.774538 -1.713316 -1.566133
6 1.370973 -4.039010 -2.492108
6 2.306097 -3.847669 -3.514857
6 2.051238 -4.378854 -4.772466
7 0.959825 -5.084236 -5.080872
6 0.075629 -5.271691 -4.098835
6 0.226680 -4.776825 -2.808825
1 -0.547454 -4.952070 -2.067650
1 -0.811208 -5.846075 -4.358490
1 2.771093 -4.237936 -5.576037
1 3.231185 -3.312215 -3.327250
6 1.484740 -3.110171 2.791981
1 2.271126 -2.537323 3.291578
1 0.521994 -2.699519 3.116631
1 1.545489 -4.149268 3.130100
6 4.425208 -0.728995 0.567929
6 5.293981 -1.825349 0.536092
6 6.575924 -1.699782 0.012540
6 7.002467 -0.480078 -0.506308
6 6.138453 0.611969 -0.498798
6 4.860426 0.488085 0.033453
1 4.189564 1.341843 0.040929
1 6.459401 1.563510 -0.912065
1 8.000697 -0.382509 -0.922563
1 7.242127 -2.557541 0.005802
1 4.957135 -2.781274 0.928240
6 3.298894 1.044689 2.682189
6 2.806965 2.352662 2.756428
6 3.525634 3.346796 3.410575
6 4.740700 3.044040 4.018965
6 5.230033 1.741208 3.969123
6 4.514468 0.749369 3.308424
1 4.901734 -0.264238 3.270300
1 6.171693 1.494548 4.450468
1 5.300110 3.817950 4.536063
1 3.131670 4.358007 3.451132
1 1.851909 2.586965 2.294231
6 0.644628 0.032167 3.735978
6 -0.708788 0.041750 3.903716
16 -1.501825 0.018225 2.355367
6 -1.460523 0.074589 5.163238
6 -0.916630 -0.463354 6.334489
6 -1.645855 -0.393694 7.514376
7 -2.861426 0.150339 7.612820
6 -3.380262 0.652483 6.490232
6 -2.733763 0.643955 5.260195
1 -3.211536 1.093615 4.394957
1 -4.369681 1.095963 6.579511
1 -1.232419 -0.806908 8.432018
1 0.055022 -0.946356 6.323493
1 1.348290 0.078304 4.560069
1 -0.126732 -1.007882 -0.406234
1 -0.790297 0.637669 -0.396423
1 0.964526 0.378020 -0.366958

!Coordinate: -52.45859  Energy: *****
6 0.016006 0.016117 -0.001167
6 0.008091 0.004202 1.494640
6 1.068924 -0.017801 2.367520
6 2.503392 -0.009246 1.992562
6 3.048080 -0.887580 1.113704
6 2.322345 -2.062968 0.576734
6 1.653555 -3.010561 1.314091
16 0.963790 -4.222595 0.286393
6 1.595670 -3.475347 -1.151441
6 2.300257 -2.354228 -0.825550
1 2.774156 -1.714212 -1.561877
6 1.365619 -4.037046 -2.487061
6 2.299829 -3.846714 -3.510831
6 2.042363 -4.376373 -4.768547
7 0.949180 -5.079357 -5.076134
6 0.065835 -5.265841 -4.093142
6 0.219443 -4.772314 -2.802916
1 -0.554143 -4.946542 -2.060928
1 -0.822495 -5.838195 -4.352173
1 2.761473 -4.236208 -5.572914
1 3.226175 -3.313192 -3.323941
6 1.489754 -3.111703 2.797517
1 2.276398 -2.538063 3.295797
1 0.527124 -2.702199 3.123917
1 1.552391 -4.150812 3.135284
6 4.429609 -0.733119 0.571119
6 5.297405 -1.830292 0.541209
6 6.579288 -1.706863 0.016976
6 7.006698 -0.488561 -0.504432
6 6.143617 0.604255 -0.498839
6 4.865654 0.482526 0.034036
1 4.195506 1.336862 0.039937
1 6.465258 1.554673 -0.914138
1 8.004858 -0.392683 -0.921241
1 7.244728 -2.565225 0.011647
1 4.959792 -2.785154 0.935276
6 3.299443 1.049518 2.679214
6 2.802410 2.355625 2.752465
6 3.517994 3.353520 3.404255
6 4.735140 3.056448 4.011255
6 5.229631 1.755519 3.962371
6 4.517166 0.759897 3.304042
1 4.908495 -0.252167 3.266846
1 6.172959 1.513307 4.442708
1 5.292160 3.833294 4.526530
1 3.119965 4.363158 3.444095
1 1.845729 2.585480 2.291417
6 0.646126 0.024358 3.735114
6 -0.707598 0.038379 3.900610
16 -1.498235 0.027194 2.350780
6 -1.461222 0.067452 5.159060
6 -0.920246 -0.476558 6.328874
6 -1.650941 -0.410067 7.508021
7 -2.865451 0.136232 7.607012
6 -3.381624 0.644001 6.485729
6 -2.733382 0.639158 5.256592
1 -3.209059 1.093310 4.392548
1 -4.370246 1.089176 6.575392
1 -1.239629 -0.827839 8.424551
1 0.050370 -0.961638 6.317360
1 1.348629 0.063341 4.560569
1 -0.118307 -0.990869 -0.412258
1 -0.776594 0.657152 -0.398933
1 0.977453 0.391011 -0.363519

Each block contains the following information:

  1. Header line containing the torsion angle.
  2. Each line after the header line contains 4 columns: atomic number, x, y, z

I need to do the following to each block:

  1. Extract the torsion angle. Delete line after extracting the torsion angle.
  2. Change each atomic number to the corresponding element.
  3. Write a separate *.xyz file which has the element instead of the atomic number and number of atoms at the top.

Here is a sample of my code:

import os
import re

#I just paste the file path for now. And change \ to \\ 
filepath = os.path.normpath("file.xyz") 

#Dictionary for atomic number and element
replacements = {'1': 'H', '6': 'C', '7': 'N', '16':'S'} 

#Open read and write files
originalFile = open(filepath, 'r') 
writeEditedFile = open('output_all(edited).txt', 'w')
readEditedFile = open('output_all(edited).txt', 'r')

#Replace atomic numbers with element symbol
for lines in originalFile:
    writeEditedFile.write(re.sub('(^\d+)', lambda m: replacements[m.group()], lines)) 

#Extract torsion angle and append to array
with open('output_all(edited).txt', 'r') as wEF: 
    torsionAngles = []
    for line in wEF:
        if '!' in line:
            for number in line.split():
                try:
                    torsionAngles.append(str(float(number)))
                except ValueError:
                    pass

#Write each line into a new file until a blank line
#The file is closed and a new one is opened
#This should continue until the last block
with readEditedFile as rEF:
    record = 0
    separateFile = open('Step_' + str(record+1) + '_TorsionAngle_' + torsionAngles[record] + '.xyz', 'w')
    separateFile.write('64 \n \n')
    for lines in rEF:
        if lines == "\n":
            record += 1
            separateFile.close()
            separateFile = open('Step_'+ str(record+1) + '_TorsionAngle_' + torsionAngles[record] + '.xyz', 'w')
            separateFile.write('64 \n \n')
        else:
            if '!' in lines:
                lines = ''
            else:
                separateFile.write(lines)

Sorry for the sloppy code! Here is an example of the first two files it outputs:

Filename: Step_1_TorsionAngle_-51.45857.xyz

64 

C 0.006074 0.000915 0.000760
C 0.003070 -0.004811 1.496641
C 1.065644 -0.015789 2.367841
C 2.500078 -0.010542 1.993114
C 3.043633 -0.885454 1.109936
C 2.319723 -2.061360 0.571949
C 1.651211 -3.009615 1.308815
S 0.964940 -4.223294 0.280714
C 1.598121 -3.476004 -1.156548
C 2.300403 -2.353600 -0.830192
H 2.774538 -1.713316 -1.566133
C 1.370973 -4.039010 -2.492108
C 2.306097 -3.847669 -3.514857
C 2.051238 -4.378854 -4.772466
N 0.959825 -5.084236 -5.080872
C 0.075629 -5.271691 -4.098835
C 0.226680 -4.776825 -2.808825
H -0.547454 -4.952070 -2.067650
H -0.811208 -5.846075 -4.358490
H 2.771093 -4.237936 -5.576037
H 3.231185 -3.312215 -3.327250
C 1.484740 -3.110171 2.791981
H 2.271126 -2.537323 3.291578
H 0.521994 -2.699519 3.116631
H 1.545489 -4.149268 3.130100
C 4.425208 -0.728995 0.567929
C 5.293981 -1.825349 0.536092
C 6.575924 -1.699782 0.012540
C 7.002467 -0.480078 -0.506308
C 6.138453 0.611969 -0.498798
C 4.860426 0.488085 0.033453
H 4.189564 1.341843 0.040929
H 6.459401 1.563510 -0.912065
H 8.000697 -0.382509 -0.922563
H 7.242127 -2.557541 0.005802
H 4.957135 -2.781274 0.928240
C 3.298894 1.044689 2.682189
C 2.806965 2.352662 2.756428
C 3.525634 3.346796 3.410575
C 4.740700 3.044040 4.018965
C 5.230033 1.741208 3.969123
C 4.514468 0.749369 3.308424
H 4.901734 -0.264238 3.270300
H 6.171693 1.494548 4.450468
H 5.300110 3.817950 4.536063
H 3.131670 4.358007 3.451132
H 1.851909 2.586965 2.294231
C 0.644628 0.032167 3.735978
C -0.708788 0.041750 3.903716
S -1.501825 0.018225 2.355367
C -1.460523 0.074589 5.163238
C -0.916630 -0.463354 6.334489
C -1.645855 -0.393694 7.514376
N -2.861426 0.150339 7.612820
C -3.380262 0.652483 6.490232
C -2.733763 0.643955 5.260195
H -3.211536 1.093615 4.394957
H -4.369681 1.095963 6.579511
H -1.232419 -0.806908 8.432018
H 0.055022 -0.946356 6.323493
H 1.348290 0.078304 4.560069
H -0.126732 -1.007882 -0.406234
H -0.790297 0.637669 -0.396423
H 0.964526 0.378020 -0.366958

Filename: Step_2_TorsionAngle_-52.45859.xyz

64 

C 0.016006 0.016117 -0.001167
C 0.008091 0.004202 1.494640
C 1.068924 -0.017801 2.367520
C 2.503392 -0.009246 1.992562
C 3.048080 -0.887580 1.113704
C 2.322345 -2.062968 0.576734
C 1.653555 -3.010561 1.314091
S 0.963790 -4.222595 0.286393
C 1.595670 -3.475347 -1.151441
C 2.300257 -2.354228 -0.825550
H 2.774156 -1.714212 -1.561877
C 1.365619 -4.037046 -2.487061
C 2.299829 -3.846714 -3.510831
C 2.042363 -4.376373 -4.768547
N 0.949180 -5.079357 -5.076134
C 0.065835 -5.265841 -4.093142
C 0.219443 -4.772314 -2.802916
H -0.554143 -4.946542 -2.060928
H -0.822495 -5.838195 -4.352173
H 2.761473 -4.236208 -5.572914
H 3.226175 -3.313192 -3.323941
C 1.489754 -3.111703 2.797517
H 2.276398 -2.538063 3.295797
H 0.527124 -2.702199 3.123917
H 1.552391 -4.150812 3.135284
C 4.429609 -0.733119 0.571119
C 5.297405 -1.830292 0.541209
C 6.579288 -1.706863 0.016976
C 7.006698 -0.488561 -0.504432
C 6.143617 0.604255 -0.498839
C 4.865654 0.482526 0.034036
H 4.195506 1.336862 0.039937
H 6.465258 1.554673 -0.914138
H 8.004858 -0.392683 -0.921241
H 7.244728 -2.565225 0.011647
H 4.959792 -2.785154 0.935276
C 3.299443 1.049518 2.679214
C 2.802410 2.355625 2.752465
C 3.517994 3.353520 3.404255
C 4.735140 3.056448 4.011255
C 5.229631 1.755519 3.962371
C 4.517166 0.759897 3.304042
H 4.908495 -0.252167 3.266846
H 6.172959 1.513307 4.442708
H 5.292160 3.833294 4.526530
H 3.119965 4.363158 3.444095
H 1.845729 2.585480 2.291417
C 0.646126 0.024358 3.735114
C -0.707598 0.038379 3.900610
S -1.498235 0.027194 2.350780
C -1.461222 0.067452 5.159060
C -0.920246 -0.476558 6.328874
C -1.650941 -0.410067 7.508021
N -2.865451 0.136232 7.607012
C -3.381624 0.644001 6.485729
C -2.733382 0.639158 5.256592
H -3.209059 1.093310 4.392548
H -4.370246 1.089176 6.575392
H -1.239629 -0.827839 8.424551
H 0.050370 -0.961638 6.317360
H 1.348629 0.063341 4.560569
H -0.118307 -0.990869 -0.412258
H -0.776594 0.657152 -0.398933
H 0.977453 0.391011 -0.363519

The simple code does what I want it to do for every block except the last one! Any suggestions or tips would be greatly appreciated! Thanks for reading my post!

1 Answer

This script reads the sample input data (as written in the question) from file.txt and writes two files, Step_1_TorsionAngle_-51.45857.xyz and Step_2_TorsionAngle_-52.45859.xyz:

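import re

# Map atomic numbers (as strings) to element symbols
replacements = {'1': 'H', '6': 'C', '7': 'N', '16':'S'}

# Read the whole input file into memory
with open('file.txt', 'r') as f_in:
    data = f_in.read()

# One torsion angle per "!Coordinate: ... Energy" header line
torsion_angles = re.findall(r'!Coordinate:\s+(.*?)\s+Energy', data)
# One block per run of lines that starts with a digit and stops before the next "!" header (or at the end of the data)
blocks = [b.splitlines() for b in re.findall(r'^(\d.*?)(?=\s*!|\Z)', data, flags=re.DOTALL|re.M)]

for step, (angle, block) in enumerate(zip(torsion_angles, blocks), 1):
    with open('Step_{}_TorsionAngle_{}.xyz'.format(step, angle), 'w') as f_out:
        # An .xyz file starts with the atom count, then a comment line (left blank here)
        f_out.write(str(len(block)) + '\n\n')
        # Swap the atomic number in the first column for its element symbol
        lines = [' '.join([replacements[s[0]], *s[1:]]) for s in [v.split() for v in block]]
        f_out.write('\n'.join(lines))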

The contents of each file look like this:

64

C 0.006074 0.000915 0.000760
C 0.003070 -0.004811 1.496641
C 1.065644 -0.015789 2.367841
C 2.500078 -0.010542 1.993114

...etc.
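
As for why the script in the question misses the last block: a likely cause (not verified against the full 361-block file) is that writeEditedFile is never closed before output_all(edited).txt is read back, so the final lines may still be sitting in Python's write buffer rather than on disk. The last separateFile is also never closed after the loop, which can have the same effect in a long-running session such as an IDE console. Here is a minimal sketch of the first step using with blocks, which close the file on exit. The function name replace_atomic_numbers and its parameters src_path and dst_path are made up for illustration.

```python
import re

# Same atomic-number-to-symbol mapping as in the question
REPLACEMENTS = {'1': 'H', '6': 'C', '7': 'N', '16': 'S'}


def replace_atomic_numbers(src_path, dst_path):
    """Copy src_path to dst_path, swapping leading atomic numbers for symbols.

    Hypothetical helper for illustration; a number that is missing from
    REPLACEMENTS raises KeyError, as it would in the original script.
    """
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            # Header lines start with "!", so the pattern leaves them untouched
            dst.write(re.sub(r'^\d+', lambda m: REPLACEMENTS[m.group()], line))
    # Leaving the with block closes dst, so every buffered line is on disk
    # before anything opens dst_path for reading.
```

The same idea applies to the per-block output files: opening each one in its own with block, or calling separateFile.close() after the loop, makes sure the final file is written out too.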

1 Comment

Thank you so much! This does everything I needed! It does give me an error, though, which I assume is because it's trying to parse the non-existent 362nd block:

lines = [' '.join([replacements[s[0]], *s[1:]]) for s in [v.split() for v in block]]
IndexError: list index out of range

Is there any way to suppress this error? I'm not sure if this will be a problem when constructing a GUI out of this. Regardless, I get every file now! I can live with the error. Thank you so much!
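
One plausible explanation for that IndexError, assuming the real file ends with one or more blank lines: the regex lets the last block run to the end of the data, so the trailing blank line becomes an empty entry in block, v.split() returns an empty list for it, and s[0] fails. Rather than suppressing the error, it may be simpler to skip empty lines. The sketch below splits the text on blank lines instead of using the lookahead regex; parse_blocks is a made-up name and this is only one way to do it.

```python
import re

# Same atomic-number-to-symbol mapping as in the question
REPLACEMENTS = {'1': 'H', '6': 'C', '7': 'N', '16': 'S'}


def parse_blocks(text):
    """Return a list of (torsion_angle, xyz_lines) pairs, one per block.

    Hypothetical helper for illustration only.
    """
    results = []
    # strip() drops leading/trailing blank lines; then split on blank lines
    for chunk in re.split(r'\n\s*\n', text.strip()):
        lines = [ln for ln in chunk.splitlines() if ln.strip()]
        if not lines:
            continue  # nothing but whitespace, so there is no block here
        header, atoms = lines[0], lines[1:]
        match = re.search(r'!Coordinate:\s+(\S+)', header)
        if match is None:
            continue  # chunk does not start with a "!Coordinate:" header
        xyz = []
        for atom in atoms:
            # First column is the atomic number, the rest are x, y, z
            number, *coords = atom.split()
            xyz.append(' '.join([REPLACEMENTS[number], *coords]))
        results.append((match.group(1), xyz))
    return results
```

Each returned pair can then be written out as in the answer: the atom count is len(xyz_lines), followed by a blank comment line and the joined lines.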
