python pattern recognition in data

Question

I have a large (order of 10k) set of data, let’s say in the form of key-value:

A -> 2
B -> 5
C -> 7
D -> 1
E -> 13
F -> 1
G -> 3
. . .

Also a smaller sample set (order of 10):

X -> 6
Y -> 8
Z -> 14
. . .

Although the values are shifted, the same pattern can be found in the original data. What would be the best approach to matching (pattern recognition) so that the machine identifies the corresponding keys in the original data:

X -> B
Y -> C
Z -> E
. . .

I have been reading about TensorFlow and have been doing some exercises, but as a total noob I am not quite sure this is the right tool, or if it is, how exactly to go about the problem.

Thanks for any hints.

I mean shifted and not exact, i.e. the sample values are actually 6.1, 7.9 and 14.001. :-) – xaratustra, Commented Oct 5, 2016 at 15:25
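A minimal brute-force sketch for the inexact-shift case, assuming the data fits in a Python dict and that the first sample point (in insertion order) really does have a partner in the data: try every shift that lines that first sample up with some data value, map each sample point to its nearest data value under that shift, and keep the shift with the smallest total error. The function names are hypothetical, and this version does not enforce a one-to-one correspondence.

```python
from bisect import bisect_left

def nearest_index(values, target):
    """Index of the element in the sorted list `values` closest to `target`."""
    i = bisect_left(values, target)
    if i == 0:
        return 0
    if i == len(values):
        return len(values) - 1
    # Pick whichever neighbour is closer to the target.
    return i if values[i] - target < target - values[i - 1] else i - 1

def best_shift_match(data, sample):
    """Try every shift that aligns the first sample point with some data value
    and keep the one with the smallest total nearest-neighbour error.
    Returns (total_error, shift, mapping sample_key -> data_key)."""
    items = sorted(data.items(), key=lambda kv: kv[1])
    keys = [k for k, _ in items]
    values = [v for _, v in items]
    sample_items = list(sample.items())
    first_value = sample_items[0][1]
    best = None
    for v in values:
        shift = first_value - v  # assumes sample value = data value + shift
        total = 0.0
        mapping = {}
        for s_key, s_val in sample_items:
            target = s_val - shift
            j = nearest_index(values, target)
            total += abs(values[j] - target)
            mapping[s_key] = keys[j]
        if best is None or total < best[0]:
            best = (total, shift, mapping)
    return best
```

On the example values with the inexact sample (6.1, 7.9, 14.001), the best shift comes out at about 1.1 and maps X, Y and Z to B, C and E. Sorting once and using `bisect` keeps each lookup at O(log n), so 10k data points times 10 sample points is cheap.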
This course (coursera.org/learn/cryptography) covers the intrinsic patterns in the underlying data/ASCII; the assignments cover breaking Vigenère, the one-time pad, padding oracles, CBC-MAC, etc. I don't think that's exactly what you're asking, but the concepts may be relevant. – ǝɲǝɲbρɯͽ, Commented Oct 5, 2016 at 15:44

MMN · Accepted Answer · 2016-10-05 16:21:20Z


First, you need to think about a loss function, i.e. why is solution 1 better than solution 2? Can you come up with an objective score function such that lower scores are always better?

E.g. in your example, is this solution any worse:

X -> C
Y -> C
Z -> E

Once you've defined what you are trying to optimize, we can tell you if tensorflow is the right tool.
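To make the idea concrete, here is one possible loss, a hypothetical choice for illustration rather than the only sensible one: given a proposed mapping and a shift, add up how far each shifted sample value lands from the data value it was mapped to. An optional flag rules out reusing a data key, which is exactly the question the alternative solution above raises.

```python
def mapping_loss(data, sample, mapping, shift, one_to_one=False):
    """Hypothetical score for a proposed mapping sample_key -> data_key:
    the sum of |(sample value - shift) - data value|. Lower is better.
    With one_to_one=True, a mapping that reuses a data key scores infinity."""
    if one_to_one and len(set(mapping.values())) < len(mapping):
        return float("inf")
    return sum(abs((sample[s] - shift) - data[d]) for s, d in mapping.items())
```

With the example values (X -> 6, Y -> 8, Z -> 14) and a shift of 1, X -> B, Y -> C, Z -> E scores 0, while X -> C, Y -> C, Z -> E scores 2 (X's shifted value 5 is 2 away from C = 7). So under this particular loss the alternative is worse; whether it should be allowed at all is the separate one-to-one question.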



3 Comments

xaratustra Over a year ago

Good point! I forgot to mention that the correspondence is one-to-one, so your suggestion is not possible. The quantity to be minimized is just the arithmetic distance, i.e. simply the closest number. Does that give any clues?

MMN Over a year ago

Ah, so you just want to minimize the sum of the distances from your 10 points to 10 unique targets in your 10,000 points? Like, you have a string with ten thousand beads on it, and ten flies land on it at their preferred locations on the string, then they each pick the best bead to sit on to maximize overall fly happiness? Tensorflow probably isn't the right tool for that: it's more a (kinda fun) classic search and optimization problem.
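For the "flies on beads" version with a known shift, one standard approach (not necessarily what MMN had in mind) is a dynamic program: on a line, with cost equal to the absolute distance, some optimal one-to-one matching never has two assignments crossing each other, so after sorting both sides the best matching can be built left to right in O(m·n) time. A minimal sketch, assuming the shift is already known (for example from a shift search) and that there are at least as many data points as sample points; the names are hypothetical. If SciPy is available, `scipy.optimize.linear_sum_assignment` solves the general form of this assignment problem.

```python
def assign_one_to_one(data, sample, shift=0.0):
    """Minimum total |distance| one-to-one assignment of each sample point
    (fly) to a distinct data point (bead), after subtracting `shift`.
    Returns (total_cost, mapping sample_key -> data_key)."""
    beads = sorted(data.items(), key=lambda kv: kv[1])
    flies = sorted(sample.items(), key=lambda kv: kv[1])
    m, n = len(flies), len(beads)
    if m > n:
        raise ValueError("need at least as many data points as sample points")
    INF = float("inf")
    # cost[i][j]: best total for the first i flies using only the first j beads.
    cost = [[INF] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        cost[0][j] = 0.0
    for i in range(1, m + 1):
        target = flies[i - 1][1] - shift
        for j in range(i, n + 1):
            skip = cost[i][j - 1]  # leave beads[j-1] empty
            take = cost[i - 1][j - 1] + abs(target - beads[j - 1][1])  # fly i-1 sits on beads[j-1]
            cost[i][j] = min(skip, take)
    # Walk back through the table to recover which bead each fly took.
    mapping = {}
    i, j = m, n
    while i > 0:
        if cost[i][j] == cost[i][j - 1]:
            j -= 1
        else:
            mapping[flies[i - 1][0]] = beads[j - 1][0]
            i -= 1
            j -= 1
    return cost[m][n], mapping
```

With the integer example sample (6, 8, 14) and shift 1, this returns a total of 0 with X, Y, Z mapped to B, C, E. The table has (m+1)·(n+1) entries, roughly 100k for 10 sample points and 10k data points.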

xaratustra Over a year ago

Thanks. The analogy is nice, but, ehm... Imagine you are a treasure hunter in a forest. Your treasure map says that there is a tree in the forest, 10 feet from which there is another tree, and 15 feet from that one there is yet another tree. That’s where the treasure is. So you check the trees one by one against that pattern and see whether the distance condition to the other trees is met. If not, you move on to the next tree, and so on. Note that it doesn’t matter how many trees you find in between; what matters is that a tree exists at each of the given distances.
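The treasure-map search can be sketched directly: take the sample's offsets relative to its smallest value, try each data point as the first tree, and accept it if every other offset lands within a tolerance of some data point. This is a minimal sketch assuming a fixed absolute tolerance `tol` (a hypothetical parameter; 0.25 in the usage note is an arbitrary choice), and, as described, it only checks that a suitable tree exists: it does not stop two offsets closer than 2·tol from landing on the same tree.

```python
from bisect import bisect_left

def pattern_from_sample(sample):
    """Offsets of the sorted sample values relative to the smallest one."""
    vals = sorted(sample.values())
    return [v - vals[0] for v in vals]

def has_value_near(values, target, tol):
    """True if the sorted list `values` holds some v with |v - target| <= tol."""
    i = bisect_left(values, target - tol)
    return i < len(values) and values[i] <= target + tol

def find_pattern_anchors(data, pattern, tol):
    """Try every data point as the first tree; keep the keys for which every
    later offset in `pattern` also lands within `tol` of some data point."""
    values = sorted(data.values())
    anchors = []
    for key, start in data.items():
        if all(has_value_near(values, start + off, tol) for off in pattern[1:]):
            anchors.append(key)
    return anchors
```

On the example data with the inexact sample (6.1, 7.9, 14.001), the offsets are roughly 0, 1.8 and 7.9, and with tol=0.25 only B passes (B = 5 has trees near 6.8 and 12.9, namely C and E). Each anchor costs one binary search per offset, so the whole scan is O(n·m·log n).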
