Help with a multiple text replacement script

Even though this forum so far has been for sharing scripts and not for writing them, I was wondering if I could pick the brains of the AHK experts here to figure something out.

I'm trying to put together a script to do multiple text replacements at a segment level, i.e., for the active segment only, so I can easily see what has been changed, without calling up the Find & Replace window.

I've managed to put together this script (silly examples included):

#r::
ClipSaved := ClipboardAll
Clipboard =
SendInput, ^a^c
ClipWait, 30
FixString := Clipboard
vList := " ;continuation section
(
dog perro

house casa
¿ ¿
/ /
, ,
? ?
. .
pie 2 pie2
m 2 m2
)"
Loop, Parse, vList, `n
{
oTemp := StrSplit(A_LoopField, "`t")
FixString := StrReplace(FixString, oTemp.1, oTemp.2)
}
oTemp := ""
Clipboard := FixString ; load the new string to clipboard
Sleep 200
Send ^v
Return

This works fine in segments with no tags, but when there are tags, they get stripped at some point during the replacement operation and the text that is pasted back into the segment has all the necessary replacements but no tags. Is there any way of preserving the tags in the clipboard?

I came up with a very clumsy workaround for this, which involves using Studio's Delete to Next Tag shortcut, so instead of Select All-Copy, the script would do Delete to Next Tag-Undo-Copy:

#r::
ClipSaved := ClipboardAll
Clipboard =
;SendInput, ^a^c
;ClipWait, 30
Send ^+D ;delete to next tag
Sleep 100
Send ^z ;undo
Sleep 50
Send ^c
ClipWait, 30
FixString := Clipboard
vList := " ;continuation section
(
organisation organization
¿ ¿
/ /
, ,
? ?
. .
pie 2 pie2
m 2 m2
)"
Loop, Parse, vList, `n
{
oTemp := StrSplit(A_LoopField, "`t")
FixString := StrReplace(FixString, oTemp.1, oTemp.2)
}
oTemp := ""
Clipboard := FixString ; load the new string to clipboard
Sleep 200
Send ^v
Return

While this also works in segments with no tags, I would like to optimize it.

My second question is: how would I go about creating a list of all these replacements (CSV? Excel?) and getting the script to take them from there instead of having to add them manually to the script? I've been reading up on arrays but I'm still far away from being able to implement what I need.

I have another simpler script attempt with just multiple StringReplace lines (see below), but again, that would require creating possibly hundreds of replacement lines and I imagine it's not the best solution.

#p:: 
Send, ^a
Send, ^c
StringReplace, clipboard, clipboard, dog, perro, All
StringReplace, clipboard, clipboard, cat, gato, All
StringReplace, clipboard, clipboard, raining, lloviendo, All
Send ^v
Return

So, any help with this would be greatly appreciated.

Thank you!

  • Hi Nora,

    I'll start with some suggestions for your second question, about how to better manage the search and replacement items:

    Separating search and replacement parts by a tab is smart, but you don't need to put these into your script. It is way easier to maintain a simple text file where you add them and then read that text file into a variable like this:

    FileRead, vList, vListFile.txt

    In order to run your search and replace, just use a simple loop:

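    Clipboard := ""
    Send, ^a
    Send, ^c
    ClipWait, 2
    FixString := Clipboard

    ; Split the list on line feeds and strip any carriage returns
    Loop, Parse, vList, `n, `r
    {
        oTemp := StrSplit(A_LoopField, "`t")
        FixString := StrReplace(FixString, oTemp.1, oTemp.2)
    }

    SendInput, %FixString%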
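
    Putting the two pieces together, a minimal sketch for AutoHotkey v1.1 could look like the following. The function name ApplyReplacements is made up for illustration, and pasting via the clipboard instead of SendInput is just one possible choice. The file is assumed to hold one search/replacement pair per line, separated by a tab.

    ```autohotkey
    ; Apply every "search<TAB>replacement" line of list to text.
    ApplyReplacements(text, list)
    {
        ; Split on LF and strip CR so both Windows and Unix line endings work
        Loop, Parse, list, `n, `r
        {
            pair := StrSplit(A_LoopField, A_Tab)
            ; Skip blank lines and lines without a tab or without a search term
            if (pair.Length() < 2 || pair[1] = "")
                continue
            text := StrReplace(text, pair[1], pair[2])
        }
        return text
    }

    #r::
        FileRead, vList, vListFile.txt
        if ErrorLevel
        {
            MsgBox, Could not read vListFile.txt
            return
        }
        Clipboard := ""          ; empty the clipboard so ClipWait can detect the copy
        Send, ^a^c
        ClipWait, 2
        if ErrorLevel            ; nothing was copied within 2 seconds
            return
        Clipboard := ApplyReplacements(Clipboard, vList)
        Sleep, 200
        Send, ^v
    return
    ```

    Because blank lines and lines without a tab are skipped, the list file can be spaced out for readability without affecting the result.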

    As for the tag question, I need to look a bit deeper into this, but I fear it won't be easy without (again) the help of Studio APIs.

    Kind regards,
    Raphaël

  • Thank you very much, Raphaël! I'll look into this now, and probably will come back with more questions. I guess the tag issue is not a big deal. After all, it's a matter of making a quick decision as to what would be faster: replacing the tags in the segment manually or making all the necessary replacements.
  • Hi Raphaël,

    So, a couple of questions:

    - Where should I store the text file, i.e., how will AHK know where to look for it? And what happens if I move it to a different location later on?

    - Should the format of the contents in the file be:
    Old string TAB New string
    or should I use a different separator and not a tab?

    Thanks!
  • By the way, do you know this Studio app, TermInjector? I haven't played around with it for a very long time, but depending on what the exact purpose of your script is, the app might also do the trick natively, and then the tags should be fine.

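  • 1) If you give only the file name in the script rather than the complete path, the file needs to be in the same folder as the script (strictly speaking, in the script's working directory, which is normally that folder); with a complete path, you can store it wherever you want: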

    • FileRead, vList, vListFile.txt → the file "vListFile.txt" must be in the same folder as the script
    • FileRead, vList, C:\Users\ndia\Documents\vListFile.txt → the file is stored in the folder "My Documents"

    If you move the file later on, then you have to update the file location accordingly in your script.
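
    One way to avoid editing the path after every move is to keep the list next to the script and build the path from the built-in variable A_ScriptDir. This is only a sketch for AutoHotkey v1.1; the variable name listFile is an arbitrary choice.

    ```autohotkey
    ; Build the path from the folder the script itself is stored in,
    ; so it does not depend on the current working directory.
    listFile := A_ScriptDir . "\vListFile.txt"

    if !FileExist(listFile)
    {
        MsgBox, % "Replacement list not found:`n" . listFile
        ExitApp
    }
    FileRead, vList, %listFile%
    ```

    As long as the script and the list travel together in one folder, nothing needs to be updated when the folder is moved or copied to a colleague's machine.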

    2) That is up to you too, since you define the separator yourself in the script:

    • StrSplit(A_LoopField, A_Tab) → a tab is used, but you could replace that with a pipe character or whatever you like, both here and in the file

     


  • I have used TermInjector in the past, but not for a couple of years; it may be worth taking another look.

    I need this script for several things:

    1. MT post edit job clean-up: lots of unwanted spaces before specific punctuation marks and around backslashes and mixed quotation marks that are a pain to delete/fix manually but still need to be checked segment by segment (for those times I don't want to run a global Find and Replace on the document)
    2. Also in MT post edit jobs: term replacement, which can be recurrent depending on the MT engine used by the client on the file
    3. Regular editing jobs: multiple term replacement but only for the current segment
    4. Localization jobs: I have a list of words that always need to be localized

    I also need to be able to share the solution with my team and an AHK script + a file that can be easily edited would be a lot easier to share than TermInjector, which is a bit more complicated to implement, from what I remember.
  • Thank you Raphaël, a couple more questions:

    - Should SendInput, %FixString% paste the modified contents of the clipboard back to the segment or do I need to add a Ctrl+V there at the end?

    It seems that SendInput, %FixString% by itself won't paste the contents back, but if I add a Ctrl+V, both the modified contents and the original segment are pasted in. I'm not sure what I need to add, or where, to make it work reliably.

    - Should the text file have a special encoding to allow Spanish characters such as ñ and á? They're not being passed through correctly.

    - If I want to include characters such as question marks, commas, periods, etc. in the text file, do they need to be escaped somehow or can they just be added literally?
  • Hi Nora,

    Since you use ClipboardAll at the start of your script, you know that it is different from Clipboard: ClipboardAll is binary, while Clipboard is just text.

    See https://autohotkey.com/docs/misc/Clipboard.htm

    When you use "FixString := Clipboard" in your script, you have already reduced your string to text, i.e. the tags are gone.

    The AutoHotKey webpage above indicates that "altering a binary-clipboard variable (by means such as StringReplace) will revert it to a normal variable, resulting in the loss of its clipboard data", so even if you use "FixString := ClipboardAll", subsequent StringReplace commands would reduce FixString to text.

    The webpage also states that  "binary-clipboard variables may be passed to functions by value (formerly they only worked ByRef)", so if you could figure out the binary format used you might be able to write your own routine to make changes to the clipboard data, but I suspect it wouldn't be all that easy.

    I have never tried to deal with this binary data, but maybe one of the AutoHotKey experts has some experience in this area ...
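
    To illustrate the difference in practice, here is a minimal sketch for AutoHotkey v1.1 that uses ClipboardAll only to save and restore whatever was on the clipboard before the hotkey ran, while the replacements themselves work on the plain-text Clipboard. It does not preserve tags; it only shows where each variable fits.

    ```autohotkey
    #r::
        ClipSaved := ClipboardAll    ; binary snapshot of everything on the clipboard
        Clipboard := ""
        Send, ^a^c
        ClipWait, 2
        if !ErrorLevel
        {
            FixString := Clipboard   ; plain text only, as described above
            ; ... replacements on FixString go here ...
            Clipboard := FixString
            Send, ^v
            Sleep, 200               ; give the paste time to finish before restoring
        }
        Clipboard := ClipSaved       ; put the original clipboard contents back
        ClipSaved := ""              ; free the memory held by the snapshot
    return
    ```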

    Best regards,
    Bruce Campbell
    ASAP Language Services

  • Aaaahhh, thank you Bruce, that shows how little I know about programming. : )

    The script is not my own code, except for the very simple bits, I put it together from things I found here and there while trying to research this. I got a similar explanation (a bit over my head, really) from a helpful user of the AHK forum who was also trying to help me with this. He also mentioned something about binary clipboard contents and modifying that somehow, but I didn't even realize there was a difference between Clipboard and ClipboardAll. The possible solution to preserve the tags seems a bit overkill and too much effort for what I need to do, though (especially considering that I have no clue where to start to do that). Unless the segment is heavily tagged, it's just faster to replace the tags if needed, I guess.
  • And I forgot to add that this is turning out to be a very educational and informative thread!
  • Hi Nora,

    About the special characters ñ and á, I had a similar problem when I processed strings in a Dragon command.

    The problem occurred because the string was being converted from Unicode to ASCII.

    I was using the Windows registry to pass strings between Dragon commands. Unfortunately strings in the Windows registry are ASCII only, so when another command retrieved the string the special characters had been converted to regular characters without the accents.

    I had to write a more complicated interface that converted each Unicode character in the string into three permissible "ASCII characters" (excluding ASCII zero, which is used as the terminating byte for the registry string) and then reversed the conversion when the string was retrieved.

    Maybe a similar Unicode to ASCII conversion is happening somewhere in your situation.

    Best regards,
    Bruce Campbell
    ASAP Language Services

  • I guess that's possible. I also had something similar in KnowBrainer, but my workaround was to write all my commands involving accented and special characters directly in Dragon (no string manipulation, though, just simple commands like copy-paste).
  • Hi Nora,

    I just noticed that you are using a text file. (Sorry, I am not following all the details of what you are doing...)

    Are you saving strings to a text file and then retrieving them?

    If so, maybe check the encoding of the text file.

    Check by opening the file, and then opening the "Save As" dialog (File->Save As).

    At the bottom of the dialog box you will see a drop-down list for "Encoding:"

    I think "ANSI" is the default. Try changing it to "Unicode" and see whether this helps.

    Best regards,
    Bruce Campbell
    ASAP Language Services

  • Hi Bruce,

    That was it! I had created the file in Notepad++ without making any changes to the encoding. I've now saved it as Unicode (was UTF-8 originally) and it's working fine. Thank you!
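
    A possible explanation, assuming a Unicode build of AutoHotkey v1.1: FileRead recognizes UTF-8 and UTF-16 by their byte order mark, and a file without one is read using the system's ANSI code page. If that is what happened here, the list could also stay in UTF-8 without a byte order mark, provided the script states the encoding explicitly, as sketched below.

    ```autohotkey
    ; Option 1: treat files without a byte order mark as UTF-8 from here on
    FileEncoding, UTF-8
    FileRead, vList, %A_ScriptDir%\vListFile.txt

    ; Option 2: override the code page for this one read only (65001 = UTF-8)
    FileRead, vList, *P65001 %A_ScriptDir%\vListFile.txt
    ```

    Only one of the two options is needed; the second leaves the default encoding for other file commands untouched.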
  • Good morning, Nora!

    Here we go:

    1. When you use SendInput, %FixString%, AHK sends the content of the variable directly to the currently active control in the active window, so there is no need to first copy the content of the variable to the clipboard and then the clipboard content to Studio by sending [Ctrl]+[ V ].
      It is even possible to send the content of a variable directly to a control that is in a non-active window via the command ControlSend, but in the case of Studio, this wouldn't work too well because of the changing control names.
      In principle, SendInput, %FixString% should type the corrected string back into the target segment, overwriting the existing target since it would still be selected by the initial Send, ^a command. If you add a Send, ^v after that, it is normal that the original unmodified target also gets pasted, since the clipboard still contains only the unmodified target from the command Send, ^c.
    2. As already pointed out by Jesus, if you need to support special characters like accents, diacritic marks or other alphabets like Cyrillic or Greek, both the AHK script and any external file need to be saved in a Unicode encoding that AHK recognizes (UTF-8 with a byte order mark, or UTF-16, which Notepad calls "Unicode"); otherwise you risk encountering corrupted characters. Advanced text editors like Notepad++ usually default to UTF-8 without a byte order mark, which AHK reads as ANSI unless told otherwise, and the standard Windows Notepad uses ANSI.
    3. Since we are not talking about a CSV file, the only character that would need to be escaped or that might cause trouble would be the tab character itself since it is used as a delimiter. All other characters should be handled correctly when added literally.
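
    One caveat applies to the sending step rather than to the list file: Send and SendInput give a special meaning to the characters ! + ^ # { }, so a segment containing them may not be typed back literally. A small sketch for AutoHotkey v1.1, using {Raw} to switch that interpretation off; the hotkey Win+T and the sample text are placeholders.

    ```autohotkey
    #t::
        FixString := "Done! 50+ items {checked}"
        ; Without {Raw}, "! " would be sent as Alt+Space, "+ " as Shift+Space,
        ; and "{checked}" would be parsed as a key name instead of being typed.
        SendInput, {Raw}%FixString%
    return
    ```

    For long segments, loading the string into the clipboard and sending Ctrl+V, as in the original script, avoids the issue as well and is usually faster.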

    Don't hesitate to get back to me if anything is still unclear ;-)

     

    Have a great day!

    Raphaël