Opening the ScrapyFSharp.CssSelectorExtensions module enables CSS selectors.
Practice 1: Search something on Google
We will parse the links of a Google search for FSharp.Data, as in the HTML Parser article.
To be sure we get search results, we will parse the links in the div with id search.
Then, for example, we can check that the HTML structure really matches what the parser expects
by using the direct descendants selector.
The selector "li.g > div.s" skips the 4 sub-results that target GitHub pages.
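The difference between the descendant selector and the direct descendants selector can be sketched on a small inline document. This is only an illustration under assumptions: it uses the FSharp.Data NuGet package and its `CssSelect` extension method rather than ScrapyFSharp itself, and the HTML fragment and `href` values are hypothetical.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical fragment standing in for a real search page.
let sample =
    """<html><body>
        <div id="search">
            <a href="/direct">direct child</a>
            <span><a href="/nested">nested link</a></span>
        </div>
        <a href="/outside">outside the div</a>
    </body></html>"""
    |> HtmlDocument.Parse

// Helper (illustrative name): read the href attribute of each selected node.
let hrefs (nodes: HtmlNode list) =
    nodes |> List.map (fun n -> n.AttributeValue("href"))

// Descendant selector: links anywhere inside div#search.
let allLinks = sample.CssSelect "div#search a"

// Direct descendants selector: only links that are immediate children of div#search.
let directLinks = sample.CssSelect "div#search > a"

printfn "descendants: %A" (hrefs allLinks)
printfn "direct children: %A" (hrefs directLinks)
```

The stricter `>` form is useful when the page nests secondary links inside the same container and only the top-level ones are wanted.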
Now we may want the page titles paired with their URLs, using List.zip.
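As a minimal sketch of that pairing step, the snippet below zips two lists by position. The titles and URLs are hypothetical placeholders for the values that would come out of two selector queries.

```fsharp
// Hypothetical values standing in for two parsed result lists.
let titles = [ "First result title"; "Second result title" ]
let urls = [ "https://example.com/a"; "https://example.com/b" ]

// List.zip pairs elements by position. Both lists must have the same length,
// otherwise it raises an ArgumentException.
let results : (string * string) list = List.zip titles urls

results |> List.iter (fun (title, url) -> printfn "%s -> %s" title url)
```

Because List.zip fails on lists of different lengths, the two selectors should be written so that each title has exactly one matching URL.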
Practice 2: Search FSharp books on Youscribe
We will parse the links of a Youscribe search result for F#.
We simply make sure to match the right links through their CSS classes and their position in the DOM hierarchy.
jQuery selectors
Attribute Contains Prefix Selector
Finds all links with an English hreflang attribute.
Attribute Contains Selector
Finds all inputs with a name containing "man".
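A minimal sketch of the attribute contains selector follows. It assumes the `CssSelect` extension method from the FSharp.Data NuGet package (used here in place of ScrapyFSharp), and the input names are hypothetical test data.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical inputs: three names contain "man", one does not.
let inputsDoc =
    """<html><body>
        <input name="man-news" />
        <input name="milkman" />
        <input name="letterman2" />
        <input name="newmilk" />
    </body></html>"""
    |> HtmlDocument.Parse

// [name*='man'] keeps inputs whose name attribute contains the substring "man".
let matchedNames =
    inputsDoc.CssSelect "input[name*='man']"
    |> List.map (fun n -> n.AttributeValue("name"))

printfn "%A" matchedNames
```

The other attribute selectors in this section differ only in the operator placed before the `=` sign.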
Attribute Contains Word Selector
Finds all inputs with a name containing the word "man".
Attribute Ends With Selector
Finds all inputs with a name ending with "man".
Attribute Equals Selector
Finds all inputs with a name equal to "man".
Attribute Not Equal Selector
Finds all inputs with a name different from "man".
Attribute Starts With Selector
Finds all inputs with a name starting with "man".
Forms helpers
There are some syntax shortcuts for finding form controls.
Find all buttons.
Find all checkboxes.
Find all checked checkboxes or radio buttons.
Find all disabled controls.
Find all inputs with type hidden.
Find all inputs with type radio.
Find all inputs with type password.
Find all file upload controls.
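The form shortcuts above can be sketched on a small hypothetical form. As before, this assumes the `CssSelect` extension method from the FSharp.Data NuGet package and that it accepts the jQuery-style `:checkbox` and `:password` shortcuts on their own. The form markup and control names are invented for illustration.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical form with one control of each kind used below.
let sampleForm =
    """<html><body><form>
        <input type="text" name="login" />
        <input type="password" name="pwd" />
        <input type="checkbox" name="newsletter" />
    </form></body></html>"""
    |> HtmlDocument.Parse

// Helper (illustrative name): read the name attribute of each selected control.
let names (nodes: HtmlNode list) =
    nodes |> List.map (fun n -> n.AttributeValue("name"))

// :checkbox is a shortcut for inputs whose type is "checkbox".
let checkboxNames = sampleForm.CssSelect ":checkbox" |> names

// :password is a shortcut for inputs whose type is "password".
let passwordNames = sampleForm.CssSelect ":password" |> names

printfn "checkboxes: %A" checkboxNames
printfn "passwords: %A" passwordNames
```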
Implemented and missing features
Basic CSS selectors are implemented, but some jQuery selectors are missing.
This table lists all jQuery selectors and their status.
Selector name | Status | Specification |
---|---|---|
All Selector | | |
:animated Selector | | |
Attribute Contains Prefix Selector | | |
Attribute Contains Selector | | |
Attribute Contains Word Selector | | |
Attribute Ends With Selector | | |
Attribute Equals Selector | | |
Attribute Not Equal Selector | | |
Attribute Starts With Selector | | |
:button Selector | | |
:checkbox Selector | | |
:checked Selector | | |
Child Selector (“parent > child”) | | |
Class Selector (“.class”) | | |
:contains() Selector | | |
Descendant Selector (“ancestor descendant”) | | |
:disabled Selector | | |
Element Selector (“element”) | | |
:empty Selector | | |
:enabled Selector | | |
:eq() Selector | | |
:even Selector | | |
:file Selector | | |
:first-child Selector | | |
:first-of-type Selector | | |
:first Selector | | |
:focus Selector | | |
:gt() Selector | | |
Has Attribute Selector [name] | | |
:has() Selector | | |
:header Selector | | |
:hidden Selector | | |
ID Selector (“#id”) | | |
:image Selector | | |
:input Selector | | |
:lang() Selector | | |
:last-child Selector | | |
:last-of-type Selector | | |
:last Selector | | |
:lt() Selector | | |
Multiple Attribute Selector [name="value"][name2="value2"] | | |
Multiple Selector (“selector1, selector2, selectorN”) | | |
Next Adjacent Selector (“prev + next”) | | |
Next Siblings Selector (“prev ~ siblings”) | | |
:not() Selector | | |
:nth-child() Selector | | |
:nth-last-child() Selector | | |
:nth-last-of-type() Selector | | |
:nth-of-type() Selector | | |
:odd Selector | | |
:only-child Selector | | |
:only-of-type Selector | | |
:parent Selector | | |
:password Selector | | |
:radio Selector | | |
:reset Selector | | |
:root Selector | [1] | |
:selected Selector | | |
:submit Selector | | |
:target Selector | | |
:text Selector | | |
:visible Selector | | |
[1] :root Selector seems to be useless in our case because with the HTML parser the root is always the html node.