Advice
0 votes
4 replies
87 views

I am working with a university faculty salary dataset where the same person appears across many years, but their name strings are inconsistent. The dataset has about 8,000 unique people and years from ...
Mengyang Cao
-1 votes
2 answers
167 views

I'd like to process some input queries in 3 possible ways: query: select * from People query: select * from People exclude addresses query: select * from People include department I have two regex1 ...
DayaMoon
  • 364
0 votes
2 answers
87 views

I'm working with many tabular datasets (Excel, CSV) that contain inconsistent or messy column names due to typos, different naming conventions, spacing, punctuation, etc. I have a standard schema (as ...
Ste347789
0 votes
1 answer
56 views

In SLES15 SP6 on x86_64 I'm using a bash script and expect-5.45.4 to do automated program testing. Basically I'm checking whether the program to test (./pwg.pl) outputs a specific string. Starting to ...
U. Windl
  • 4,748
-2 votes
1 answer
116 views

I'm working with two datasets for German NUTS-3 level regions: A shapefile from Eurostat via the giscoR package: > library(giscoR) > nuts3_germany <- gisco_get_nuts(country = "Germany&...
Saïd Maanan
4 votes
4 answers
169 views

Let's say I want to match any sequence of the hash sign # at the start of a string; so I'd want to match ## here: local mystr = "##First line\nSecond line\nThird line" ... and ### here: ...
sdbbs
  • 5,948
2 votes
3 answers
123 views

I have a column in a pandas DataFrame (Names) with a large collection of names. I have another DataFrame (Title) with a text column, and the names from the Names frame appear within that text. What would be the ...
Totura
  • 167
2 votes
0 answers
88 views

This question is a little complicated, so I'll try to describe it through an example. First, we get a string foo, and put it into collection S. Then we get a string sample, and put it into S too. Next, ...
differentrain
1 vote
1 answer
71 views

I have a database with three columns: name, occupation, and organization. In these columns, I have duplicates with slightly different names. For example, Anne Sue Frank and Anne S. Frank refer to the ...
Vitoria Sanchez
0 votes
2 answers
86 views

Savvy people, I will have participants of an event sign up where they, aside from their personal details, also provide a duo partner's name or leave that blank. So, I will have two columns, ...
Lex Plantenga
1 vote
3 answers
94 views

I have a large pandas DataFrame like below. import pandas as pd import numpy as np df = pd.DataFrame( [ ("1", "Dixon Street", "Auckland"), ("2&...
Totura
  • 167
0 votes
1 answer
90 views

This seems like it should be an easy problem to solve, but I've been battling with it and cannot seem to find a solution. I have two dataframes of different sizes and different column names. I am ...
Rose_Trojan
1 vote
1 answer
79 views

I'm trying to write a regex that matches every occurrence of some_function(...), but it should not match when it's part of an object method like my.some_function(...) or if it is a substring of ...
JVS
  • 2,682
2 votes
2 answers
88 views

Do Kotlin's List/Array data structures have a findSublist method analogous to String.indexOf(CharSequence), that takes a List/Array/Sequence to match against the list?
tpdi
  • 35.3k
1 vote
0 answers
78 views

What I'm trying to do is find and correct similar names in my database, like 'Patrick Maxwell' and 'Patrick Maxwel.' However, the issue I'm facing is that the best match for each name is often itself, ...
Kauan Randall Oliveira Ferreir
1 vote
1 answer
273 views

I wish to list only the signed certificates for our application and not the chain signing certificate from a Java store, i.e. <jdk_home>/jre/lib/security/cacerts or any such JKS store. The idea ...
Ashar
  • 3,195
1 vote
1 answer
2k views

I have a string that is returned from an API call; the string is something like ".\controllers\myaction c:\test\path". I want to use PowerShell to check if the string contains c:\ ...
Kate
  • 61
0 votes
1 answer
123 views

Working with irregular Excel tables, I am trying to match questions by looking at a string in a column in a dataframe and if it is a close match to my target string, score the % match. The way I tried ...
Arthur D. Howland
0 votes
2 answers
80 views

I have a DataFrame with a column of publisher names that contains various minor variations of the same publisher. For example, entries such as "Harlequin Romance", "Harlequin Blaze"...
Claudine U
1 vote
2 answers
360 views

I have a function that is used in a script that I am writing to remove redundant blocking keywords from a list. Basically, with the input (in any order) of: {"apple", "bapple", &...
user26571886
0 votes
1 answer
62 views

Major edit: Apparently it is difficult to understand my question, so I'll do my best to make it concrete. I have two dataframes, "df1" and "df2". These are quite large, larger than in ...
Calle Flygare
0 votes
2 answers
136 views

I have a SortedDictionary<string, string>, ordered by key length descending, of the form: red fox - address1 weasel - address2 foxes - address3 fox - address3 etc. and a list of phrases e.g. &...
alexb
  • 33
0 votes
2 answers
75 views

I have searched high and low and nobody seems to have asked that exact question, so I'm at a loss. I have a data frame with a couple of columns. One of these columns contains various sentences that don't ...
Laue28
  • 1
0 votes
1 answer
67 views

I have a dataframe as the following, showing the relationship of different entities in each row. Child Parent Ult_Parent Full_Family A032 A001 A039 A001, A032, A039, A040, A041, A043, A043, A045, A046 ...
L H
  • 27
2 votes
6 answers
143 views

I have a series of strings in a vector and need to remove the matching starting pattern from each string. However, I don't know the pattern or how long it is. stringa <- c("apple_tart", &...
Katie Helm
1 vote
1 answer
353 views

Hi, I recently came across an interesting question and had a hard time trying to optimize it beyond O(N*N!). Here is the question: Given a string, return the number of possible combinations that satisfy ...
Zi Ming
  • 13
2 votes
2 answers
341 views

Goal: I'd like to find all exact occurrences of a string, or close matches of it, in a longer string in Python. I'd also like to know the location of these occurrences in the longer string. To define ...
Franck Dernoncourt
1 vote
1 answer
95 views

I'm testing fuzzywuzzy's process.extractBests() as follows: from fuzzywuzzy import process # Define the query string query = "Apple" # Define the list of choices choices = ["Apple&...
Franck Dernoncourt
0 votes
0 answers
85 views

I'm working on a problem involving string matching where I need to compute the similarity scores for each prefix of a string C against another string S. The similarity score for a prefix P of C and S ...
NatsumiStar
-2 votes
2 answers
101 views

How can I count the number of cells in a column that contain partial text? I want the result to be 6, since the text AB (A and B) can be found in all those rows except Row 4, which has only C in it. COL ...
Harvey
  • 137
0 votes
1 answer
637 views

I have 2 pandas dataframes that both contain company names. I want to merge these 2 dataframes on company names using a fuzzy match. But the problem is 1 dataframe contains 5m rows and the other 1 ...
L H
  • 27
1 vote
0 answers
138 views

I have a paragraph: In today's world, keeping your personal information safe online is more important than ever. With cyber-attacks on the rise, having a strong cybersecurity strategy is essential. ...
Manoj Kamble
2 votes
3 answers
171 views

I have 2 dataframes that captured the hierarchy of the same dataset. Df1 is more complete compared to Df2, so I want to use Df1 as the standard to analyze if the hierarchy in Df2 is correct. However, ...
L H
  • 27
-1 votes
1 answer
77 views

Master dataframe filled with a specific match's players and statistics. 34 columns and variable number of rows. Column "Player" has full names Player Goals Assists Dominic Calvert-Lewin 1 1 ...
filipakous
-1 votes
1 answer
92 views

Reading a text file with the format: e2c=["(vsim-86)" ,'kkk', "pppp", "bbbbbb", #"old", "uio", " sds # sds", #"old2", " sds #...
taquionbcn
0 votes
1 answer
76 views

I have two dataframes: df1 is based on survey responses and includes a non-restricted field for users to add their location in the UK (or refuse to do so) formatted as so (not real data): Name ...
Edward Blackburn
0 votes
0 answers
65 views

I have implemented a string matching function in Python utilizing n-grams and similarity ratios. The function signature is as follows: # concise version of the function def match_strings(...
NIDHI SHASTRY
-2 votes
1 answer
58 views

I have a Python function, match_strings, which is designed to match names from two different data sources. Here is the function definition: python def match_strings(strings1, strings2, ngram_n=2, ...
Rahul T
1 vote
1 answer
62 views

As my question indicates, I would like to convert a vector of strings into a new vector containing one of two values that appear in every string. Here is an example of a very simple data frame I have: data <-...
jdenn0514
  • 109
0 votes
1 answer
217 views

I am trying to filter a list of properties based on multiple keywords (e.g. "Cool Interior," "Terrace/Patio"). Here's a basic interpretation: The range I want to filter is on a ...
John Lane
0 votes
0 answers
704 views

I'm trying to write code to check whether my predictions for games and the actual results of the games are the same. I was going to create a point value, like March Madness has, but I can't actually get ...
Dixon Gerber
3 votes
1 answer
179 views

I have programmed an Aho-Corasick algorithm with a transition table that searches for a set of words in a text and displays the number of occurrences by using malloc(), but I am encountering this ...
Zazou
  • 31
1 vote
1 answer
321 views

Been trying to use thefuzz to compare two different lists, and got the above error, which doesn't seem right. I've commented everything else out in my code except the below two test lines and still ...
0 votes
1 answer
478 views

I have a list of words which I am searching for in a PDF document using fitz in Python. The code generally works for most of the words, except for a few like "efficiency". My code is given below: ...
vani
  • 29
0 votes
0 answers
34 views

PS C:\Users\Administrator> $string = "hello world" PS C:\Users\Administrator> $string -ilike "hello" False The above is outputting False, and not True. Not sure what I am ...
ctappy
  • 177
0 votes
0 answers
118 views

I am just looking at various algorithms' efficiency. Not just big-O efficiency, but practical efficiency. Anyway, I was testing a Rabin-Karp algorithm I wrote against a brute force string comparison ...
Alex
  • 43
0 votes
2 answers
103 views

I am trying to join several messy datasets together without using "fuzzy matching". In the core dataset (example dataset1 below), I have simple names for companies. In the datasets I would ...
lyd-m
  • 3
-1 votes
2 answers
74 views

I need to compare two columns in a resulting data frame; the two columns come from separate sources. Now, I would like to compare them and have a resulting (tag) column based on ...
sebekkg
  • 17
1 vote
1 answer
577 views

I have a table with address1, city, state, and postal code. However, some address1 values also contain city, state and postal code (separated by either comma or space or both). Example: Address1: 9999 ...
shano
  • 23
-1 votes
3 answers
234 views

Trying to strip server name from: //some.server.name/path/to/a/dir (finishing with /path/to/a/dir) I have tried 3 different regexes (hardcoded works), but the other two look like they should work but ...
Andy Knipp
