173 questions
0
votes
1
answer
70
views
fuzzy_join of two dataframes based on ZIP codes
I am trying to fuzzyjoin two dataframes. Both contain a column with ZIP codes and some other columns. However, in the parent dataframe there are more ZIP codes than in the secondary one. I would ...
1
vote
1
answer
82
views
How to merge two datasets based on different time interval conditions?
I have two different datasets and would like to join Dataset2 to Dataset1.
In Dataset1, there are different CBs in each day, and there are IV 1 to 3 in each CB.
In Dataset2, there are different time ...
1
vote
1
answer
352
views
Fuzzy and exact matching using Arrow and Duckdb R
I have a large dataset of over 43 million rows and 3.84 GB and another dataset of over 6000 rows and 459 KB. I am trying to do an inner_join() based on two columns: One exact column based on a common ...
0
votes
1
answer
637
views
How to do fuzzy merge with 2 large pandas dataframes?
I have 2 pandas dataframes that both contain company names. I want to merge these 2 dataframes on company names using a fuzzy match. But the problem is 1 dataframe contains 5m rows and the other 1 ...
0
votes
2
answers
505
views
How can I parse out city and country information from a messy and non-uniform partial address column in an R data frame?
I have a very large dataset that contains a very messy and non-uniform address field. I only care about extracting the country name from it. Most of the records contain country and city and some contain other ...
0
votes
0
answers
49
views
Fuzzyjoin two dataframes
I have two data frames where I want to apply fuzzyjoin in R. I have written the code like this.
library(tidyverse)
library(fuzzyjoin)
library(readxl)
ex_hotels<-readRDS("expedia_hotels....
2
votes
3
answers
103
views
Rewrite Time interval Fuzzy join to be less memory intensive
This question is expanding on this post: Pairing Time series Data with Batch Data in R
A good solution was given to me that worked for the dputs I provided but the problem is that my dataset is quite ...
0
votes
1
answer
80
views
How can I filter one dataframe based on two columns of another dataframe where one column is the exact match and the other one is a substring match?
I have two dataframes, both have a Last_Name column. First dataframe has a column Contains_First_Name and the second has a column called First_Name. I want to join the two on the exact spelling of ...
0
votes
1
answer
72
views
Join two tables by index range but table length increased
I'm trying to follow this answer that joins two tables by range: https://stackoverflow.com/a/46341899/6636572
I want to join two tables where one has some ranges and the other has numbers and I want to ...
1
vote
1
answer
271
views
How to maximize R fuzzyjoin/stringdist speed and memory efficiency
I have 2 data frames containing short (length == 20) sequences that I want to compare with string distance analysis techniques, returning highly similar sequences with a hamming distance of no greater ...
2
votes
2
answers
216
views
Merge 2 data frames by the columns that do not match exactly
I have 2 data frames. I am trying to merge/join them together while specifying how I want rows to align. Mock data below.
df <- data.frame(Race = c("White", "NHPI", "AA&...
0
votes
1
answer
63
views
How to merge two dataframes on two variables - the first an exact match on factor variable, the second a fuzzy match for numeric variable
Here are the dataframes
library(dplyr)
set.seed(123)
id <- rep(c("A", "B", "C"), each = 5)
score <- sample(1:50, 15)
label <- paste(sample(LETTERS, 15 * 5, ...
0
votes
0
answers
53
views
Assign Id to fuzzy match name in new table - R
I have two tables. Table one has an id column and a full_name column. Table two has only a full name column but the names are near-matches and not full matches. I would like to apply the id column to ...
0
votes
1
answer
49
views
Is there an R function that joins a key that is contained within another key
I am trying to join two tables based on a code created within each table that identifies a prescribed drug. The problem is that the drug code sometimes has additional numbers at the end in one table. ...
1
vote
2
answers
274
views
R: How to left join two tables based on fuzzy matching strings that are not exactly the same
I am trying to left join table 1 'Person Name' to table 2 'Name' and get the values from the Work Group column in Table 2
df1 <- read.table(text="
Person_Name
PEREZ, MINDY
PEREZ, ABA
CLARKE, ...
0
votes
1
answer
91
views
reverse table order in R fuzzy anti join match_fun
I am trying to run this code :
main_df %>%
fuzzy_anti_join(secondary_df, match_fun = list(`==`, `%within%`),
by = c("ID","Date" = "Date_Interval"))...
0
votes
0
answers
53
views
Left join data frames by group and interval
I need to interval_left_join two dataframes by groups (the grouping variable is File), but using this code I get this error:
library(BiocManager)
library(fuzzyjoin)
df1 %>%
group_by(File) ...
1
vote
1
answer
186
views
How to group similar spelled character strings together?
I have a table of 10,000 unique names. Using the package(fuzzyjoin) I would like to match these unique names to names that are only spelled with one different letter. I would like to group the ...
0
votes
2
answers
101
views
R join two data.table with exact on one column and fuzzy on second
I am working with two data.tables,
predicted yields over age based on a variety of stand condition
field measurements of yields at a particular field location, with a measured age
I would like to ...
0
votes
1
answer
205
views
Table joins with conditional "fuzzy" string matching in R
I'm attempting to join two tables, one is a smaller table with a column of names of common food items (e.g. "Corn", "Peppers", "Squash"...etc...), and the other is a ...
1
vote
1
answer
163
views
Return anti-join of two data frames with values outside a certain percentage difference
I would like to compare two mixed-type data frames and return the rows that are different between them--but I would like numeric values to only be returned within a certain percentage.
tbl1 <- ...
1
vote
1
answer
504
views
Fixing fuzzyjoin error message: vector memory exhausted
I'm trying to join two data sets using fuzzy matching through the stringdist_left_join function from the library fuzzy join, but I keep getting the error message "Error: vector memory exhausted (...
0
votes
0
answers
302
views
Fuzzy Matching player names in R
In R, I have two dataframes, one with full names and one with abbreviated names, I want to dplyr join them to see which one has a flag.
However, it is very hard to get matched names, even when I match ...
1
vote
2
answers
1k
views
Joining dataframes on text strings using fuzzy string matching (stringdist_join())
I'm trying to join two datasets based on the values of two variables. Both datasets have the same variable names/number of columns but may have a different number of rows. I want to join them based ...
0
votes
0
answers
61
views
stringdist_join not merging data
I have three data frames that need to be merged. There are a few small differences between the competitor names in each data frame. For instance, one name might not have a space between their middle ...
1
vote
0
answers
35
views
Data consolidation and cleaning using fuzzy string comparisons with -matchit- command
I have two databases, one designated data and another data1 (reference), where I want to compare the codes of each data designation and data2, I have to do it by writing the designations, if they are ...
1
vote
2
answers
149
views
Join tables with inexact match in R. Only match if a whole word matches
I have a problem that can be reproduced in the following way:
library(tidyverse)
a <- tibble(navn=c("Oslo kommune", "Oslo kommune", "Kommunen i Os", "Kommunen i ...
2
votes
1
answer
525
views
Why is fuzzyjoin slower than data.table in R
When I want to join two data frames based on two intervals, I prefer to use the fuzzyjoin package because it is easy to read in my opinion. But when I need to work with large datasets, the fuzzyjoin ...
2
votes
2
answers
72
views
Inexact joining data based on greater equal condition
I have some values in
df:
# A tibble: 7 × 1
var1
<dbl>
1 0
2 10
3 20
4 210
5 230
6 266
7 267
that I would like to compare to a second dataframe called
value_lookup
# A ...
2
votes
1
answer
980
views
Join with closest value between two values in R
I was working on the following problem. I've got monthly data from a survey, let's call it df:
df1 = tibble(ID = c('1','2'), reported_value = c(1200, 31000), anchor_month = c(3,5))
ID ...
0
votes
1
answer
68
views
How to merge 2 dataframes with partial character strings?
I have a dataset that lists several possible genera of plants, and another dataset that lists all the species with their functional forms. I would like to merge these datasets in such a way that IF ...
0
votes
1
answer
272
views
dplyr::full_join two data frames with part-match in the "by" argument in R
I would like to join two data sets that look like the following data sets. The matching rule would be that the Item variable from mykey matches the first part of the Item entry in mydata to some ...
0
votes
1
answer
44
views
Fuzzy join on substring dask
I have two data frames with columns of interest 'ParseCom', which is the left index of this fuzzy join, and 'REF' which should be a substring of 'ParseCom' during a join.
This is iterating over the ...
2
votes
0
answers
233
views
confused about multi_by and multi_match_fun in R fuzzy_join
Can someone help me understand what "multi_by" and "multi_match_fun" actually do in comparison to "by" and "match_fun" in the R package fuzzyjoin?
This is from ...
1
vote
1
answer
130
views
interval join with extra key
I would like to do an interval join with an additional key. The simplest way in dplyr is quite slow
intervalDf <- tibble(id = rep(seq(1, 100000, 1), 10),
k1 = rep(seq(1, 1000, ...
1
vote
1
answer
672
views
regex_left_join (fuzzyjoin) not working as expected
I am trying to perform a join in R based on a regex pattern from one table. From what I understand, the fuzzyjoin package should be exactly what I need, but I can't get it to work. Here is an example ...
-1
votes
2
answers
809
views
How to merge based on a string in a column?
I would like to do exact joins for the columns state and name, but a fuzzy join for the "name" and "versus" columns:
year <- c("2002", "2002", "1999&...
0
votes
3
answers
2k
views
R - Fuzzy Inner Join on two fields, matching to a date range
I'm fairly new to R, and have been sifting through other questions all morning trying to figure this out, but can't find anything related enough or my knowledge of R is not good enough to understand ...
0
votes
2
answers
208
views
Merge two data frames in R by variable that is regular expression in one and string in other
I have two data frames I would like to merge
a<- data.frame(x=c(1,4,6,8,1,6,7,2),ID=c("132","14.","732","2..","132","14.","732",...
0
votes
0
answers
59
views
Using fuzzy join to insert one column from dataframe to another dataframe and match by a column in both dataframes
I have been trying to use the fuzzy join package to join the "conservation status" column from the con_filtered_report_groups data frame to the report_groups_order dataframe that has the ...
3
votes
2
answers
90
views
Joining two datasets by (non-uniform) names
I need to join two datasets and the only identifier in both are the company names. For example:
db1 <- tibble(
Company = c('Bombardier Inc.','Honeywell Development Corp','The Pepsi Bottling Group ...
2
votes
1
answer
607
views
Fuzzy matching two data frames
I want to merge two data frames df1 and df2.
df1<-tibble(x=c("FIDELITY FREEDOM 2015 FUND", "VANGUARD WELLESLEY INCOME FUND"),y=c(1,2))
df2<-tibble(x=c("FIDELITY ...
1
vote
1
answer
89
views
Partial matching in R
Is there a way I can partially match the two data frames in R?
df1<-data.frame("FIDELITY FREEDOM 2015 FUND", "ID")
df2<-data.frame("FIDELITY ABERDEEN STREET TRUST: ...
0
votes
1
answer
65
views
Complex join between two dataframes
I am working on a very advanced join of dataframes that is complex for me. I would like to ask you for some help if possible. I have two dataframes, df1 and df2 which I include at the end as dput(). ...
1
vote
1
answer
166
views
fuzzy joining a column with a list
The data is as follows:
library(fuzzyjoin)
nr <- c(1,2)
col2 <- c("b","a")
dat <- cbind.data.frame(
nr, col2
)
thelist <- list(
aa=c(1,2,3),
bb=c(1,2,3)
)
I would ...
0
votes
0
answers
135
views
Have R warn me when a match from Fuzzy Join is too far off
I previously asked a question here about how to use R to automatically "spellcheck" a big list of department names before I export a file and send it off. (Same data can be used as ...
1
vote
2
answers
511
views
"fuzzy" inner_join in dplyr to keep both rows that do AND not exactly match
I am working with two datasets that I would like to join based not on exact matches between them, but rather on approximate matches. My question is similar to this OP.
Here are examples of what my two ...
4
votes
1
answer
410
views
Return multiple possible matches when fuzzy joining two dataframes or vectors in R if they share a word in common
Is there a way of joining two dataframes where a row in the first dataframe is joined with every row in the second dataframe if they share a word in common?
For example:
companies1 <- data....
2
votes
3
answers
2k
views
Join two dataframes on one column that contains substring of other
I am trying to left-join df2 onto df1.
df1 is my dataframe of interest, df2 contains additional information I need.
Example:
#df of interest onto which the other should be joined
key1 <- c("...
1
vote
2
answers
217
views
Match two tables based on a time difference criterion
I have a data table (lv_timest) with time stamps every 3 hours for each date:
# A tibble: 6 × 5
LV0_mean LV1_mean LV2_mean Date_time Date
<dbl> <dbl> <...