
The transforms in this category access the Document Object Model (DOM) of the current document:

Transform     Description
attribute     Searches the current HTML document for an element that matches the selector and returns the value of the named attribute of that element
query         Searches the current HTML document for an element that matches the selector and returns the element's text, using textContent
queryInner    Searches the current HTML document for an element that matches the selector and returns the element's text, using innerText
queryValue    Searches the current HTML document for a form element that matches the selector and returns its value

attribute:selector:index:attribute-name

The following information is intended for advanced users.

Returns the value of the HTML attribute named attribute-name from the indexth HTML element that matches the selector, counting from 1 (index 1 is the first matching element). All the parameters are required.

The attribute transform is intended to allow ORA users to write templates that access attributes, such as HREF values, that ORA does not extract. The selector and index parameters have the same purpose as for the query transform.

The selector parameter must be a valid CSS selector. It is beyond the scope of this help page to explain CSS Selectors.
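As a rough illustration of what the attribute transform does, here is a hypothetical TypeScript sketch built on the standard DOM calls querySelectorAll() and getAttribute(). It is not ORA's source code: the function name attributeTransform, the Mini* stand-in types (used so the sketch runs outside a browser), and the empty-string result for a missing element or attribute are assumptions. In a browser console you would pass the real document, for example attributeTransform(document, "a", 1, "href").

```typescript
// Minimal stand-ins for the DOM types this sketch needs, so it runs outside a browser.
// A real browser `document` provides the same querySelectorAll/getAttribute calls.
interface MiniAttrElement {
  getAttribute(attrName: string): string | null;
}

interface MiniAttrDocument {
  querySelectorAll(selector: string): ArrayLike<MiniAttrElement>;
}

// Hypothetical sketch of attribute:selector:index:attribute-name (not ORA's source).
// `index` counts from 1, so index 1 means the first matching element.
function attributeTransform(
  doc: MiniAttrDocument,
  selector: string,
  index: number,
  attrName: string,
): string {
  const matches = doc.querySelectorAll(selector);
  // Assumption: a missing element yields an empty string.
  if (index < 1 || index > matches.length) return "";
  // getAttribute returns null when the attribute is absent; map that to "".
  return matches[index - 1].getAttribute(attrName) ?? "";
}
```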

The attribute transform has two unusual characteristics:

  • The attribute transform may only be used with a special Field named "DOM". If you attempt to use it with any other Field, the result will be an empty string.
  • The attribute transform cannot be tested on the OraSettings page. It needs the HTML of the collection page, and that HTML is not available from the OraSettings page; if you test it there, it inspects the HTML of the OraSettings page itself, not the collection page for which it is intended.

Example

To return the HREF attribute of the first "A" (link) element on the page: [DOM:attribute:a:1:href]. If the page has an A element, the result is the value of its HREF attribute.

Selectors are usually more involved than the selector used in the example above.

HREFs

To convert an HREF attribute value to a full URL, pass the attribute result to the hrefToUrl transform:

[DOM:attribute:a:1:href:hrefToUrl]
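An HREF attribute often holds a relative address such as "/wiki/Page", which only becomes a full URL when it is resolved against the address of the page it came from. The hypothetical sketch below shows one plausible way to do that with the standard URL constructor; it is not ORA's hrefToUrl implementation, and the function name hrefToUrlSketch and the empty-string result for an unparseable HREF are assumptions. In a browser, document.baseURI would supply the page URL.

```typescript
// Hypothetical sketch of an hrefToUrl-style step (ORA's actual code is not shown here).
// Resolves a raw HREF, which may be relative, against the URL of the page it came from.
function hrefToUrlSketch(href: string, pageUrl: string): string {
  try {
    // The WHATWG URL constructor resolves `href` relative to the base `pageUrl`.
    return new URL(href, pageUrl).href;
  } catch {
    // Assumption: an HREF that cannot be parsed yields an empty string.
    return "";
  }
}
```

For example, hrefToUrlSketch("/wiki/Main_Page", "https://example.org/a/b.html") resolves to "https://example.org/wiki/Main_Page", while an HREF that is already absolute is returned unchanged.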

query:selector:index

The following information is intended for advanced users.

Returns the text of the indexth HTML element that matches the selector, using the element's textContent property. The index parameter is optional; it counts from 1 and defaults to 1, which selects the first matching element.

The query transform is intended to allow ORA users to write templates that access text on the page that ORA does not extract.

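The query and queryInner transforms are very similar; they differ only in which property supplies the element's text. The query transform uses textContent, which returns all the text inside the element, including text in hidden elements and whitespace exactly as written in the HTML source. The queryInner transform uses innerText, which returns the text as it is rendered on the page, so hidden text is omitted and line breaks follow the page layout. You may have to experiment to see which one better suits your usage.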

The query, queryInner, and queryValue transforms use the Document.querySelectorAll() method to select the element of interest. The selector parameter must be a valid CSS selector. It is beyond the scope of this help page to explain CSS Selectors.
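The lookup described above can be sketched in TypeScript as follows. This is a hypothetical sketch, not ORA's code: the function name queryTransform and the Mini* stand-in types (which let it run outside a browser) are invented, and returning an empty string when there is no indexth match is an assumption. It shows how a 1-based index maps onto the list returned by querySelectorAll() and how the text comes from textContent.

```typescript
// Stand-ins for the DOM pieces the sketch uses; a browser Element has textContent.
interface MiniTextElement {
  textContent: string | null;
}

interface MiniTextDocument {
  querySelectorAll(selector: string): ArrayLike<MiniTextElement>;
}

// Hypothetical sketch of query:selector:index (not ORA's source).
function queryTransform(
  doc: MiniTextDocument,
  selector: string,
  index: number = 1, // optional; counts from 1
): string {
  const matches = doc.querySelectorAll(selector);
  // Assumption: no indexth match yields an empty string.
  if (index < 1 || index > matches.length) return "";
  // Convert the 1-based index to the 0-based position in the match list.
  return matches[index - 1].textContent ?? "";
}
```

With two H2 elements, "Part One" and "Part Two", queryTransform(doc, "h2") gives "Part One" and queryTransform(doc, "h2", 2) gives "Part Two", matching the examples below.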

The query transform has two unusual characteristics:

  • The query transform may only be used with a special Field named "DOM". If you attempt to use it with any other Field, the result will be an empty string.
  • The query transform cannot be tested on the OraSettings page. It needs the HTML of the collection page, and that HTML is not available from the OraSettings page; if you test it there, it inspects the HTML of the OraSettings page itself, not the collection page for which it is intended.

Examples

  1. Return the text of the first "H2" element on the page:
    [DOM:query:h2]

    If the page has an H2 element with the text "Part One", the result is "Part One".

  2. Return the text of the second "H2" element on the page:
    [DOM:query:h2:2]

    If the page has two H2 elements, "Part One" and "Part Two", the result is "Part Two".

Selectors are usually more involved than the selectors used in the examples above.

queryInner:selector:index

The following information is intended for advanced users.

Returns the text of the indexth HTML element that matches the selector, using the element's innerText property. The index parameter is optional; it counts from 1 and defaults to 1, which selects the first matching element.

The queryInner transform is intended to allow ORA users to write templates that access text on the page that ORA does not extract.

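The query and queryInner transforms are very similar; they differ only in which property supplies the element's text. The query transform uses textContent, which returns all the text inside the element, including text in hidden elements and whitespace exactly as written in the HTML source. The queryInner transform uses innerText, which returns the text as it is rendered on the page, so hidden text is omitted and line breaks follow the page layout. You may have to experiment to see which one better suits your usage.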

The query, queryInner, and queryValue transforms use the Document.querySelectorAll() method to select the element of interest. The selector parameter must be a valid CSS selector. It is beyond the scope of this help page to explain CSS Selectors.
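The hypothetical TypeScript sketch below shows the queryInner lookup and why its result can differ from query. It is not ORA's code; the function name queryInnerTransform and the Mini* stand-in types are invented, and the empty-string result for a missing match is an assumption. Because innerText depends on the browser's layout, it cannot be computed outside a browser, so the test's stand-in element hard-codes both values for a heading whose middle word is hidden with display:none.

```typescript
// Stand-ins for browser HTML elements, which expose both text properties.
interface MiniRenderedElement {
  textContent: string | null; // raw text of all descendants, hidden or not
  innerText: string; // text as rendered; computed by the browser from the layout
}

interface MiniRenderedDocument {
  querySelectorAll(selector: string): ArrayLike<MiniRenderedElement>;
}

// Hypothetical sketch of queryInner:selector:index (not ORA's source).
// Same lookup as query, but the result comes from innerText instead of textContent.
function queryInnerTransform(
  doc: MiniRenderedDocument,
  selector: string,
  index: number = 1, // optional; counts from 1
): string {
  const matches = doc.querySelectorAll(selector);
  if (index < 1 || index > matches.length) return ""; // assumption: no match gives ""
  return matches[index - 1].innerText;
}
```

For the markup <h2>Part <span style="display:none">hidden </span>One</h2>, textContent is "Part hidden One" while innerText is "Part One", so query and queryInner would return different text for the same element.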

The queryInner transform has two unusual characteristics:

  • The queryInner transform may only be used with a special Field named "DOM". If you attempt to use it with any other Field, the result will be an empty string.
  • The queryInner transform cannot be tested on the OraSettings page. It needs the HTML of the collection page, and that HTML is not available from the OraSettings page; if you test it there, it inspects the HTML of the OraSettings page itself, not the collection page for which it is intended.

Examples

  1. Return the text of the first "H2" element on the page:
    [DOM:queryInner:h2]

    If the page has an H2 element with the text "Part One", the result is "Part One".

  2. Return the text of the second "H2" element on the page:
    [DOM:queryInner:h2:2]

    If the page has two H2 elements, "Part One" and "Part Two", the result is "Part Two".

Selectors are usually more involved than the selectors used in the examples above.

queryValue:selector:index

The following information is intended for advanced users.

Returns the value of the indexth form element (such as an INPUT or SELECT element) that matches the selector. The index parameter is optional; it counts from 1 and defaults to 1, which selects the first matching element.

The queryValue transform is intended to allow ORA users to write templates that access the values of form elements that ORA does not extract. Form elements include text boxes, drop-down menus, checkboxes, and radio buttons.

The result is generally based on the element's value property. For text boxes, the value is the text shown in the box; for other elements, the result depends on the element type, as described below.

For drop-down menus created with the HTML SELECT element, the queryValue result may not match the visible text. The value of a SELECT element is determined by the value= attribute of the selected OPTION element, not by the visible text. (If the selected OPTION has no value= attribute, its text is used instead.)

For checkboxes, the queryValue result will be "1" (one) if the checkbox is checked or "0" (zero) if the checkbox is not checked.

For radio buttons, the queryValue result will be the value attribute of the selected radio button.

Radio Button Value

If the page HTML defines these radio buttons:

<input type="radio" name="gender" value="F">Female
<input type="radio" name="gender" value="M">Male
<input type="radio" name="gender" value="O">Other

The queryValue result will be "F", "M", or "O", depending on which is selected. The queryValue result will not be "Female", "Male", or "Other".
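The per-type rules above can be summarized in a small TypeScript model. This is a hypothetical sketch that restates the rules on this page, not ORA's implementation: the FormControl type is an invented stand-in for real form elements, and returning an empty string when no radio button or OPTION is selected is an assumption. Note that the checkbox rule follows this page rather than the DOM value property, which for a checkbox is its value= attribute (or "on" by default).

```typescript
// Hypothetical model of the form elements described above (not real DOM types).
type FormControl =
  | { kind: "text"; value: string }
  | { kind: "checkbox"; checked: boolean }
  | { kind: "radioGroup"; buttons: { value: string; checked: boolean }[] }
  | { kind: "select"; options: { value?: string; text: string; selected: boolean }[] };

// Sketch of the documented queryValue results for each kind of element.
function queryValueResult(control: FormControl): string {
  switch (control.kind) {
    case "text":
      return control.value; // the text shown in the box
    case "checkbox":
      return control.checked ? "1" : "0"; // "1" if checked, "0" if not
    case "radioGroup": {
      // The value= attribute of the selected button, not its visible label.
      const chosen = control.buttons.find((button) => button.checked);
      return chosen ? chosen.value : ""; // assumption: nothing selected gives ""
    }
    case "select": {
      const chosen = control.options.find((option) => option.selected);
      if (!chosen) return ""; // assumption: nothing selected gives ""
      // An OPTION without a value= attribute falls back to its text.
      return chosen.value ?? chosen.text;
    }
    default: {
      // Exhaustiveness check: every kind of control is handled above.
      const unreachable: never = control;
      return unreachable;
    }
  }
}
```

With the gender radio buttons above and "Male" selected, the model returns "M", not "Male".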

The query, queryInner, and queryValue transforms use the Document.querySelectorAll() method to select the element of interest. The selector parameter must be a valid CSS selector. It is beyond the scope of this help page to explain CSS Selectors.

The queryValue transform has two unusual characteristics:

  • The queryValue transform may only be used with a special Field named "DOM". If you attempt to use it with any other Field, the result will be an empty string.
  • The queryValue transform cannot be tested on the OraSettings page. It needs the HTML of the collection page, and that HTML is not available from the OraSettings page; if you test it there, it inspects the HTML of the OraSettings page itself, not the collection page for which it is intended.

Examples

  1. Return the value of the first text input element on the page:
    [DOM:queryValue:input[type=text]]
  2. Return the value of the drop-down (SELECT) element with the id "Gender":
    [DOM:queryValue:#Gender]

Selectors are often more involved than the selectors used in the examples above.

Advice

The attribute, query, queryInner, and queryValue transforms (always applied to the "DOM" pseudo-field) are examples of "giving users enough rope". They are a doorway into a technical world that most end users will find confusing, daunting, tedious, or worse. It's best to avoid these transforms. I only recommend using them when absolutely necessary, and only when I think some of the issues I describe below can be avoided.

The first challenge associated with using these transforms is defining the selector parameter, which is a CSS selector. CSS selectors are very powerful, but they are intended for use by programmers or HTML and CSS authors.

To use the transforms, in addition to having some familiarity with CSS selectors, you also have to inspect the HTML page to see how it is structured. The Developer Tools facility built into most browsers is invaluable for that task.

If you right-click on a part of a web page and choose the "inspect" item from the context menu, that will open the Developer Tools panel and highlight an HTML node. That is the typical starting point for writing a selector that matches the current node, but it is only a starting point. You have to review the HTML node in context to determine the appropriate CSS selector value. There are many selectors that will work, but many will be too loose or too tight.

  • A "too loose" selector will select many other nodes, and that makes the selector fragile. The selector may work for the exact current page but fail on pages associated with other records in the same collection.

    For example, if your transform uses a selector that matches 28 DIV elements, and you specify index 17, the 17 may be correct on the current page, but on some other page, you may need 16, or 18, or some other index value.

  • A "too tight" selector will select only one or a handful of nodes on the current page, but may not match any nodes on other pages. The issue is that web servers often customize the HTML and CSS they send, and even a slight change by the server will invalidate CSS selectors that depend on those details.

    For example, you may see an HTML element where the class includes a prefix or suffix with a value that seems computed like class="rightSideCss_r1y3a6mc". If you write a selector that includes the suffix ("_r1y3a6mc"), the next time you visit the site, that suffix may change and your selector will no longer work.
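One way to loosen a selector that is too tight is to match only the stable part of such a class name. CSS attribute selectors can express this: [class*="rightSideCss_"] matches when the class attribute contains that text anywhere, whereas [class^="rightSideCss_"] matches only when the whole attribute value starts with it, which fails if another class comes first. Whether such selectors behave as expected inside an ORA template has not been verified here, and the stable prefix itself could still change. The hypothetical TypeScript helper below (the name hasClassWithPrefix is invented) shows the underlying idea as a check on individual class tokens.

```typescript
// Sketch: does any class token start with a stable prefix?
// The computed suffix (for example "_r1y3a6mc") is ignored, so the check keeps
// working if the server generates a different suffix later.
function hasClassWithPrefix(classAttr: string, stablePrefix: string): boolean {
  return classAttr
    .trim()
    .split(/\s+/) // class attributes hold whitespace-separated tokens
    .some((token) => token.startsWith(stablePrefix));
}
```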

So, like Goldilocks, you need to find the "just right" selector between too loose and too tight. Doing so often requires a lot of technical knowledge and experience.
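A quick way to judge how loose or tight a selector is: in the Developer Tools console, document.querySelectorAll("div").length reports how many nodes the selector "div" matches on the current page. Repeating that check on pages for several records in the same collection shows whether a selector and index hold up. The hypothetical TypeScript helper below wraps the idea; its name, the stand-in type, and the wording of its labels are invented for illustration.

```typescript
// Stand-in for a browser `document`; only the match count is needed here.
interface MiniQueryable {
  querySelectorAll(selector: string): ArrayLike<unknown>;
}

// Hypothetical helper: describe how tightly a selector fits one page.
// Run it on pages for several records in the same collection and compare.
function selectorFit(doc: MiniQueryable, selector: string): string {
  const count = doc.querySelectorAll(selector).length;
  if (count === 0) return "no match on this page (possibly too tight)";
  if (count === 1) return "exactly one match";
  return `${count} matches (possibly too loose; the index must be right on every page)`;
}
```

A selector that reports exactly one match on every page you try is usually a safer choice than one that needs an index such as 17.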