Algorithm for largest word formed from perodic table elements
I want to write an algorithm for the following problem scenario
Taking periodic table elements' names, find largest word that can be formed? The symbols such as Na
, Ne
etc should be regarded as single elements.
This was asked in a reputed company's job interview. Can anybody please help me out solve this.
How to express a given word as a chemical compound? Here's a dynamic programming solution. The important line is "Let progress[i] be the solution to the subproblem for word[:i]". The rest follows.
elements = "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Uut Fl Uup Lv Uus Uuo".split()
def decompose(word):
"""Express given word as chemical compound. If there are multiple solutions, return one of minimal weight."""
progress = [False for x in range(len(word)+1)] # solution for word[:i]
progress[0] = []
for i in range(1, len(word)+1):
possibles = list()
for j in range(max(i-3,0), i):
if progress[j] == False:
continue
alchemical = word[j:i].title()
if alchemical in elements:
possibles.append(progress[j] + [alchemical])
if possibles:
# choose minimal solution
progress[i] = min(possibles, key=len)
if progress[-1] == False:
return False
return "".join(progress[-1])
assert decompose("sine") == "SiNe" # rather than S-I-Ne
assert decompose("bismuth") == "BiSmUTh"
assert decompose("jam") == False
https://gist.github.com/hickford/750135db633832913703
Ran this on a whole dictionary, the longest compounds were:
NoNRePReSeNTaTiONaLiSm
ThErMoPHOsPHoReSCeNCe
BrONCHoEsOPHAgOsCoPY
HYPErPHOsPHoReSCeNCe
HYPSiBrAcHYCePHAlISm
I think a better way is to check every word in a dictionary and see if it can be made from the names of the elements. To check every permutation of the elements would be harder to program and would be less efficient.
While I agree it is easier to produce the combinations, there are just way too many of them, and as you said tend to infinity if you don't give a limit. The production of the word with the symbols would be slightly more difficult and finicky, but I don't think it would be too difficult.
Eg when you get a word you could search through the elements looking for elements that could make up your word, then using that set of elements try and fill in the letters from start to finish. Obviously this gets harder with elements that aren't 2 letters and words that are odd in length.
You can actually use a sort of graph. Say you have 'silicon'. You can start with the letter 'S' or 'SI' which are both in the table. From there choose 'SI' because it is closer to your solution. If 'SI' does not lead to your solution you will have to come back and see if 'S' would work.
So it works as a kind of depth first search.
Generating all words and checking if they exist is impractical. There are 118^L words of length L, a too fast growing function. 1,643,032 three symbols words !
The other way round, as suggested by Makoto, is much more efficient. Try to reconstruct every word from a dictionary. There are on the order of 250,000 English words.
Checking a word is straightforward: see if any element symbol matches the start of the word, and continue with the remaining characters.
You must try all possible matches as a match could hide another. (I mean word ABC could be matched by symbol A and then be stuck because BC is not available, but it could be that AB and C are.) So the matching function will be recursive. I don't expect an exponential explosion in this matching process.
For maximum efficiency, you will store the symbols in a trie structure http://en.wikipedia.org/wiki/Trie.
Final hint: as you are just asked to find the longest match, try the words by decreasing lengths and stop at the first match.
Here is a simple solution in Python, without any optimization. Matching is done right to left, to allow printing the sequence of symbols in post-order:
Words= ['K', 'NaH', 'NaNaNaNa', 'HaKuNa']
Symbols= ['Na','K','H']
def Match(Word):
# Match the end of the word with every symbol
for S in Symbols:
# In case of a partial match, recurse on the prefix
if S == Word or (S == Word[-len(S):] and Match(Word[:-len(S)])):
print S,
return True
# No match, report failure
return False
for W in Words:
if Match(W):
print
>>>
K
Na H
Na Na Na Na
链接地址: http://www.djcxy.com/p/79130.html
上一篇: 向表格添加新行(IE友好版本)
下一篇: 算法的最大单词形成perodic表元素