
I have two files, shown below. I've included only a few example lines here, but file1 has almost 100 lines in this format.

The variables in file2 are used as input to one of our internal scripts. At the moment we populate file2 by hand, manually mapping each value from file1.

I'd like to know whether a Bash script can do this: identify the keywords in file1, match them to the keywords in file2, and write the values of the matched file1 keywords into the matching keys of file2.

File 1

system-service:10/AF/GH/100-2020
mint-value-daemon:10/GH/KL-19GA1-2020
event-count-svc:10/LL/GH/LL-2020
node-daemon:ABCD_201612_1900_139

File 2

MINT_SERVICE=10/GH/KL-19GA1-2020 //value2 from file1
SYSTEM_SERVICE=10/AF/GH/100-2020 //Value1 from file1
EVENT_SERVICE=10/LL/GH/LL-2020 //value3 from file1
NODE_SERVER_TAG=ABCD_201612_1900_139 //value 4 from file1

NOTE: The keywords in file1 are usually 2-3 hyphen-separated words, so I was thinking of a partial match, e.g. matching event in file2 to event in file1, and taking the corresponding value.

  • Would a case-insensitive match on the first word (separated by _ or -) work for you? Commented Mar 18, 2020 at 11:45
  • Also, you've described the expected result for File 2, but what is its initial state? Lines of VAR_NAME= with an empty value? If a value is already present, should it be updated based on File 1? Commented Mar 18, 2020 at 11:50
  • A match in any case should be fine, as I only need to populate data into file2. In many cases file2 holds old data, so yes, the previous values need to be overwritten. Commented Mar 18, 2020 at 11:56

1 Answer

If file2 needs to be created from scratch:

awk -F: -v OFS='=' '{
    split($1, a, "-")
    print toupper(a[1])"_SERVICE", $2
}' file1 > file2
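
As a quick check, here is a sketch that runs the same awk program on the four sample lines from the question. The temp directory and the generated variable are only there for the demonstration.

```shell
# Demonstration only: recreate the sample file1 from the question in a temp dir.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
system-service:10/AF/GH/100-2020
mint-value-daemon:10/GH/KL-19GA1-2020
event-count-svc:10/LL/GH/LL-2020
node-daemon:ABCD_201612_1900_139
EOF

# Same program as above, writing into the temp dir instead of ./file2.
awk -F: -v OFS='=' '{
    split($1, a, "-")
    print toupper(a[1])"_SERVICE", $2
}' "$tmp/file1" > "$tmp/file2"

generated=$(cat "$tmp/file2")
echo "$generated"
rm -r "$tmp"
```

Note that every name gets the _SERVICE suffix, so node-daemon comes out as NODE_SERVICE rather than the NODE_SERVER_TAG shown in the question. If the names in file2 must be kept as they are, the second approach below is the one to use.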

If file2 already has variables that need to be overwritten:

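#! /usr/bin/env bash
awk -F: -v OFS='=' '{
    if (FNR == NR) {
        # first file (file1): remember each value under its upper-cased first word
        split($1, a, "-")
        var[toupper(a[1])] = $2
    } else {
        # second file (file2): look up the word before the first underscore
        split($0, a, "=")
        split(a[1], b, "_")
        if (var[b[1]] != "") {
            print a[1], var[b[1]]
        } else {
            print
        }
    }
}' file1 file2 > file2_new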

mv file2_new file2

Explanation of the second snippet:

1) awk loops through the input files line by line.
2) While doing so, $0 holds the complete line, and $1, $2, $3… hold the individual :-delimited fields. We made : the delimiter with the -F: argument.
3) NR is a variable that contains the number of the line awk is currently on, counted across all input files.
4) As we are passing 2 files to awk, there is a second counter to know about. FNR resets to 1 when awk moves to the next file, whereas NR keeps incrementing regardless of which file it is reading.
5) The condition FNR == NR therefore holds only while we are on the first file.
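
To see how the two counters behave, here is a small sketch that prints NR, FNR and the line for two throwaway files. The file names first and second and their contents are made up for the illustration.

```shell
# Demonstration only: two tiny files in a temp dir.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\n' > "$tmp/second"

# NR counts lines across all files, FNR counts lines within the current file.
nr_demo=$(awk '{ print NR, FNR, $0 }' "$tmp/first" "$tmp/second")
echo "$nr_demo"
# 1 1 a
# 2 2 b
# 3 1 c
# 4 2 d
rm -r "$tmp"
```

The two numbers are equal only on the lines of the first file, which is what the FNR == NR test relies on.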

Taking an example from file1, let's say we are on the line

mint-value-daemon:10/GH/KL-19GA1-2020
$0 = mint-value-daemon:10/GH/KL-19GA1-2020
$1 = mint-value-daemon
$2 = 10/GH/KL-19GA1-2020


6) Inside this branch we use the split() function to split $1 on - into an array named a.

So basically this line:

split($1, a, "-")

Gives us:

a[1] = mint
a[2] = value
a[3] = daemon

We now use an associative array named var.

var[toupper(a[1])] = $2

This stores the value of $2 under the key MINT, i.e.

var["MINT"] = 10/GH/KL-19GA1-2020

Continuing this way, it stores the value from every line of file1.
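
To inspect what ends up in var, here is a sketch that runs only the first-file branch on the sample file1 and dumps the array in an END block. The END block, the " -> " formatting and the sort are added for the demonstration and are not part of the script above.

```shell
# Demonstration only: recreate the sample file1 from the question in a temp dir.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
system-service:10/AF/GH/100-2020
mint-value-daemon:10/GH/KL-19GA1-2020
event-count-svc:10/LL/GH/LL-2020
node-daemon:ABCD_201612_1900_139
EOF

lookup=$(awk -F: '{
    split($1, a, "-")
    var[toupper(a[1])] = $2
}
END {
    # for-in visits keys in an unspecified order, so the output is sorted afterwards
    for (k in var) print k " -> " var[k]
}' "$tmp/file1" | sort)
echo "$lookup"
# EVENT -> 10/LL/GH/LL-2020
# MINT -> 10/GH/KL-19GA1-2020
# NODE -> ABCD_201612_1900_139
# SYSTEM -> 10/AF/GH/100-2020
rm -r "$tmp"
```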

When the condition FNR == NR becomes false, i.e. we start traversing the 2nd file, we follow a similar approach.

Remember that the field separator is still :, which is not the separator used in file2.

So in the else branch:

Let's say we are on the line:

MINT_SERVICE=10/GH/KL-19GA1-2020 //value2 from file1

We use:

split($0, a, "=")

Which gives us:

a[1] = MINT_SERVICE
a[2] = 10/GH/KL-19GA1-2020 //value2 from file1

Then we split a[1] again, this time on _.

split(a[1], b, "_")

This gives us:

b[1] = MINT
b[2] = SERVICE

Now:

if (var[b[1]] != "") {
    print a[1], var[b[1]]
} else {
    print
}

We look in the array var, which we filled while FNR == NR was true, and check whether

var["MINT"] != ""

If this holds true, we run

print a[1], var[b[1]] 

which prints:

MINT_SERVICE=10/GH/KL-19GA1-2020

The = comes from -v OFS='=', which sets the output field separator that print places between its comma-separated arguments.
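
A minimal sketch of that behaviour, using a BEGIN block so no input file is needed. The variable names with_comma and without_comma are only for the demonstration.

```shell
# With a comma, print joins its arguments with OFS.
with_comma=$(awk -v OFS='=' 'BEGIN { print "MINT_SERVICE", "10/GH/KL-19GA1-2020" }')
# Without a comma, the two strings are simply concatenated and OFS is not used.
without_comma=$(awk -v OFS='=' 'BEGIN { print "MINT_SERVICE" "10/GH/KL-19GA1-2020" }')

echo "$with_comma"      # MINT_SERVICE=10/GH/KL-19GA1-2020
echo "$without_comma"   # MINT_SERVICE10/GH/KL-19GA1-2020
```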

In the else branch:

print

which is the same as

print $0

i.e. if the key is not in var, the line is printed without any modification.


All this output goes to file2_new.

In the end we rename file2_new to file2.
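
Since the comments under the question ask for a match in any case, here is one possible variant, offered as a sketch rather than a replacement for the script above. It upper-cases the file2 key before the lookup and uses awk's in operator to test for the key. The file2 contents here (old-value, the lower-case system_service and UNRELATED_SETTING) are invented for the demonstration.

```shell
# Demonstration only: sample file1 from the question plus a made-up file2.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
system-service:10/AF/GH/100-2020
mint-value-daemon:10/GH/KL-19GA1-2020
event-count-svc:10/LL/GH/LL-2020
node-daemon:ABCD_201612_1900_139
EOF
cat > "$tmp/file2" <<'EOF'
MINT_SERVICE=old-value
system_service=old-value
UNRELATED_SETTING=keep-me
EOF

awk -F: -v OFS='=' '
FNR == NR {
    # file1: store each value under its upper-cased first word
    split($1, a, "-")
    var[toupper(a[1])] = $2
    next
}
{
    # file2: upper-case the word before the first underscore, then look it up
    split($0, a, "=")
    split(a[1], b, "_")
    key = toupper(b[1])
    if (key != "" && (key in var)) print a[1], var[key]
    else print
}' "$tmp/file1" "$tmp/file2" > "$tmp/file2_new" && mv "$tmp/file2_new" "$tmp/file2"

updated=$(cat "$tmp/file2")
echo "$updated"
# MINT_SERVICE=10/GH/KL-19GA1-2020
# system_service=10/AF/GH/100-2020
# UNRELATED_SETTING=keep-me
rm -r "$tmp"
```

Chaining mv with && means file2 is only replaced when awk exits successfully.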


2 Comments

If you look at line 4, which I added to both files, it should show that only the word before the first underscore needs to be matched and only the value needs to be written into file2. Also, can you explain the logic of your 2nd snippet, so I can give it a try as well?
@theborngeek, I have corrected that issue and explained it. HTH
