Extract tokens from string

I have an HTML file with an unknown number of tokens. The user will later assign data to these keywords. I want to determine how many tokens the HTML contains. Tokens can look like:
 
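To make the goal concrete, here is a minimal sketch in Python of the kind of counting I have in mind. The `{{name}}` delimiter format, the `TOKEN_PATTERN` regex, and the `find_tokens` helper are all assumptions made up for illustration; they are not necessarily the format used in my file, and the pattern would need to be adapted to the real delimiters.

```python
import re

# ASSUMPTION: tokens look like {{name}}. This format is hypothetical;
# change the pattern to match the delimiters actually used in the HTML.
TOKEN_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_tokens(html):
    """Return every token name in order of appearance, duplicates included."""
    return TOKEN_PATTERN.findall(html)


# Made-up sample input, only to show the idea.
html = "<p>Hello {{first_name}} {{last_name}}, welcome back {{first_name}}</p>"

tokens = find_tokens(html)          # all occurrences, in document order
total_count = len(tokens)           # how many tokens the HTML contains
unique_tokens = sorted(set(tokens)) # distinct keywords the user must fill in
```

Keeping the total count and the set of distinct names separate seems useful here, since the user only needs to assign data once per keyword even if it appears several times.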