
I want this output written to a CSV file:

['https://www.lendingclub.com/loans/personal-loans' '6.16% to 35.89%']
['https://www.lendingclub.com/loans/personal-loans' '1% to 6%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.discover.com/personal-loans/' '6.99% to 24.99%']

However, when I run the code to write the output to CSV, only the last line ends up in the file:

['https://www.discover.com/personal-loans/' '6.99% to 24.99%']

Could it be because my printed output is not comma-separated? I tried to get around needing a comma by using a space as the delimiter. Let me know your thoughts. I would love some help on this because I am having the hardest time reshaping this collected data.

import datetime
import csv
import re
import requests as r
import numpy as np
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.marcus.com/us/en/personal-loans',
                 'https://www.discover.com/personal-loans/']

#cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    #captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        try:
            irate = str(matches[0])
            array = np.asarray(irate)
            array2 = np.append(link,irate)
            array2 = np.asarray(array2)
            print(array2)
            #with open('test.csv', "w") as csv_file:
            #    writer = csv.writer(csv_file, delimiter=' ')
            #    for line in test:
            #        writer.writerow(line)
        except IndexError:
            pass

2 Answers


When it comes to writing CSV files, pandas comes in handy.

import datetime
import requests as r
from bs4 import BeautifulSoup as bs
import numpy as np
import re
import pandas as pd

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                'https://www.marcus.com/us/en/personal-loans',
                'https://www.discover.com/personal-loans/']

df = pd.DataFrame({'Link':[],'APR Rate':[]})
#cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    #captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        irate = ''
        try:
            irate = str(matches[0])
            df2 = pd.DataFrame({'Link':[link],'APR Rate':[irate]})
            df = pd.concat([df,df2],join="inner")
        except IndexError:
            pass
df.to_csv('CSV_File.csv', index=False)

I store each link and its irate value in a data frame df2 and concatenate it onto the parent data frame df. At the end, I write the parent data frame df to a csv file.
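
One note on performance: pd.concat inside a loop copies the growing frame on every iteration. If that ever becomes a bottleneck, a common alternative (just a sketch reusing the same names and regex as above, not a tested drop-in) is to collect plain rows in a list and build the DataFrame once at the end:

rows = []
for link in plcompetitors:
    soup = bs(r.get(link).text, 'html.parser')
    for n in soup.find_all(text=re.compile('[0-9]%')):
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', str(n))
        if matches:
            rows.append({'Link': link, 'APR Rate': matches[0]})  # one dict per match

df = pd.DataFrame(rows, columns=['Link', 'APR Rate'])  # built once, no repeated copying
df.to_csv('CSV_File.csv', index=False)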


1 Comment

Clever way of going about this. I kept trying to put everything in the same dataframe in a previous iteration of code and couldn't get it to work. Thanks!

I think the problem is that you are opening the file in write-mode (the "w" in open('test.csv', "w")), meaning that Python overwrites what's already written in the file. I think you're looking for append-mode:

# open the file before the loop, and close it after
csv_file = open("test.csv", 'a')             # change the 'w' to an 'a'
csv_file.truncate(0)                         # clear the contents of the file
writer = csv.writer(csv_file, delimiter=' ') # make the writer beforehand for efficiency

for n in paragraph:
    matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
    try:
        irate = str(matches[0])
        array2 = np.append(link, irate)  # one row: [link, rate]
        print(array2)

        writer.writerow(array2)          # pass the whole row (a sequence), not a bare string

    except IndexError:
        pass

# close the file
csv_file.close()

If this doesn't work, please let me know!
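
Another way to avoid the overwrite problem, without truncate, is to keep write-mode but open the file exactly once, before any loop, in a with block, and pass newline='' as the csv module's docs suggest. The rough sketch below reuses the question's link list and regex; it is not a drop-in replacement for the original snippet:

with open('test.csv', 'w', newline='') as csv_file:   # opened once, so 'w' only truncates once
    writer = csv.writer(csv_file)                      # default comma delimiter
    writer.writerow(['link', 'rate'])                  # optional header row
    for link in plcompetitors:
        soup = bs(r.get(link).text, 'html.parser')
        for n in soup.find_all(text=re.compile('[0-9]%')):
            matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', str(n))
            if matches:
                writer.writerow([link, matches[0]])    # a sequence of fields, not a bare string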

3 Comments

Switching it to 'a' worked and is adding all the lines; however, it's now writing each character with a space between it. Ex: h t t p s : / / w w w . l e n d i n g c l u b . c o m / l o a n s / p e r s o n a l - l o a n s 6 . 1 6 % " " t o " " 3 5 . 8 9 %
I don’t work with csv files often, but my guess would be that that is caused by the delimiter parameter on line 4. Try removing that and see what happens.
Turning a string into a list will split it into characters, e.g. ' '.join(list('abc')).
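
To make that last point concrete: csv.writer treats whatever you hand to writerow() as a sequence of fields, so a bare string gets split into individual characters. A tiny self-contained demo:

import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter=' ')
writer.writerow('6.16%')               # a bare string: one field per character
writer.writerow(['6.16%', '35.89%'])   # a list: one field per element
print(buf.getvalue())
# 6 . 1 6 %
# 6.16% 35.89%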
