
I want this output written to a CSV file:

['https://www.lendingclub.com/loans/personal-loans' '6.16% to 35.89%']
['https://www.lendingclub.com/loans/personal-loans' '1% to 6%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.marcus.com/us/en/personal-loans' '6.99% to 24.99%']
['https://www.discover.com/personal-loans/' '6.99% to 24.99%']

However, when I run the code to write the output to CSV, only the last line ends up in the file:

['https://www.discover.com/personal-loans/' '6.99% to 24.99%']

Could it be because my printed output is not comma-separated? I tried to get around needing a comma by using a space as the delimiter. Let me know your thoughts. I would love some help on this because I am having the hardest time reshaping this collected data.

import datetime
import csv
import re
import requests as r
import numpy as np
from bs4 import BeautifulSoup as bs

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                 'https://www.marcus.com/us/en/personal-loans',
                 'https://www.discover.com/personal-loans/']

#cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    #captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        try:
            irate = str(matches[0])
            array = np.asarray(irate)
            array2 = np.append(link,irate)
            array2 = np.asarray(array2)
            print(array2)
            #with open('test.csv', "w") as csv_file:
            #    writer = csv.writer(csv_file, delimiter=' ')
            #    for line in test:
            #        writer.writerow(line)
        except IndexError:
            pass

2 Answers


When it comes to writing CSV files, pandas comes in handy.

import datetime
import requests as r
from bs4 import BeautifulSoup as bs
import numpy as np
import re
import pandas as pd

plcompetitors = ['https://www.lendingclub.com/loans/personal-loans',
                'https://www.marcus.com/us/en/personal-loans',
                'https://www.discover.com/personal-loans/']

df = pd.DataFrame({'Link':[],'APR Rate':[]})
#cycle through links in array until it finds APR rates/fixed or variable using regex
for link in plcompetitors:
    cdate = datetime.date.today()
    l = r.get(link)
    l.encoding = 'utf-8'
    data = l.text
    soup = bs(data, 'html.parser')
    #captures Discover's rate perfectly but catches too much for lightstream/prosper
    paragraph = soup.find_all(text=re.compile('[0-9]%'))
    for n in paragraph:
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
        irate = ''
        try:
            irate = str(matches[0])
            df2 = pd.DataFrame({'Link':[link],'APR Rate':[irate]})
            df = pd.concat([df,df2],join="inner")
        except IndexError:
            pass
df.to_csv('CSV_File.csv', index=False)

I store each link and its irate value in a data frame df2 and concatenate it onto the parent data frame df. At the end, I write the parent data frame df to a csv file.
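
One note on performance: pd.concat inside a loop copies the growing frame on every iteration. If that ever becomes a bottleneck, a common alternative (just a sketch reusing the same names and regex as above, not a tested drop-in) is to collect plain rows in a list and build the DataFrame once at the end:

rows = []
for link in plcompetitors:
    soup = bs(r.get(link).text, 'html.parser')
    for n in soup.find_all(text=re.compile('[0-9]%')):
        matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', str(n))
        if matches:
            rows.append({'Link': link, 'APR Rate': matches[0]})  # one dict per match

df = pd.DataFrame(rows, columns=['Link', 'APR Rate'])  # built once, no repeated copying
df.to_csv('CSV_File.csv', index=False)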


1 Comment

Clever way of going about this. I kept trying to put everything in the same dataframe in a previous iteration of code and couldn't get it to work. Thanks!

I think the problem is that you are opening the file in write-mode (the "w" in open('test.csv', "w")), meaning that Python overwrites what's already written in the file. I think you're looking for append-mode:

# open the file before the loop, and close it after
csv_file = open("test.csv", 'a')             # change the 'w' to an 'a'
csv_file.truncate(0)                         # clear the contents of the file
writer = csv.writer(csv_file, delimiter=' ') # make the writer beforehand for efficiency

for n in paragraph:
    matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', n.string)
    try:
        irate = str(matches[0])
        array2 = np.append(link, irate)  # one row: [link, rate]
        print(array2)

        writer.writerow(array2)          # pass the whole row (a sequence), not a bare string

    except IndexError:
        pass

# close the file
csv_file.close()

If this doesn't work, please let me know!
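
Another way to avoid the overwrite problem, without truncate, is to keep write-mode but open the file exactly once, before any loop, in a with block, and pass newline='' as the csv module's docs suggest. The rough sketch below reuses the question's link list and regex; it is not a drop-in replacement for the original snippet:

with open('test.csv', 'w', newline='') as csv_file:   # opened once, so 'w' only truncates once
    writer = csv.writer(csv_file)                      # default comma delimiter
    writer.writerow(['link', 'rate'])                  # optional header row
    for link in plcompetitors:
        soup = bs(r.get(link).text, 'html.parser')
        for n in soup.find_all(text=re.compile('[0-9]%')):
            matches = re.findall(r'(?i)\d+(?:\.\d+)?%\s*(?:to|-)\s*\d+(?:\.\d+)?%', str(n))
            if matches:
                writer.writerow([link, matches[0]])    # a sequence of fields, not a bare string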

3 Comments

Switching it to 'a' worked and is adding all the lines; however, it's now writing each character with a space between it. Ex: h t t p s : / / w w w . l e n d i n g c l u b . c o m / l o a n s / p e r s o n a l - l o a n s 6 . 1 6 % " " t o " " 3 5 . 8 9 %
I don’t work with csv files often, but my guess would be that that is caused by the delimiter parameter on line 4. Try removing that and see what happens.
Turning a string into a list will split it into characters, e.g. ' '.join(list('abc')).
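
To make that last point concrete: csv.writer treats whatever you hand to writerow() as a sequence of fields, so a bare string gets split into individual characters. A tiny self-contained demo:

import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter=' ')
writer.writerow('6.16%')               # a bare string: one field per character
writer.writerow(['6.16%', '35.89%'])   # a list: one field per element
print(buf.getvalue())
# 6 . 1 6 %
# 6.16% 35.89%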
