Channel: Parsing multidimensional data in paragraphs - Unix & Linux Stack Exchange

Parsing multidimensional data in paragraphs

July 14, 2017, 3:41 pm


I'm trying to parse data from a PDF report and filter out certain interesting elements. Using pdftotext -layout I get data in this format as my starting point:

Record   Info           Interesting  
123      apple          yep         
         orange         nope         
         lemon          yep          
----------------------------------------------- 
456      dragonfruit    yep
         cucumber       nope         
-----------------------------------------------
789      kumquat        nope         
         lychee         yep          
         passionfruit   yep          
         yam            nope         
-----------------------------------------------
987      grapefruit     nope

My intended output is this - every 'Interesting' fruit and its record number except when the fruit is the first fruit in its record:

Record   Info
123      lemon
789      lychee
789      passionfruit

Currently, inspired by this question, I'm replacing the ------ record delimiters with \n\n and stripping out the record headers using sed. Then I can find paragraphs with matching records with awk:

awk -v RS='' '/\n   .....................yep/'

(Figuring out how to write {3}.{21} or similar with one of the awks is definitely a battle for another day :/ )

This produces the cleaned-up paragraphs like so:

123      apple          yep         
         orange         nope         
         lemon          yep          

789      kumquat        nope         
         lychee         yep          
         passionfruit   yep          
         yam            nope
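The preprocessing described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names clean_paragraphs and interesting_paragraphs are hypothetical, the header line is assumed to start with "Record", and the delimiter lines are assumed to consist of five or more dashes.

```shell
# Hypothetical helper: turn pdftotext -layout output into
# blank-line-separated paragraphs, one paragraph per record.
clean_paragraphs() {
    # Delete the column header line, and empty out each dashed
    # delimiter line so that it becomes a paragraph separator.
    sed -e '/^Record[[:space:]]/d' \
        -e 's/^-\{5,\}[[:space:]]*$//'
}

# Hypothetical helper: keep only paragraphs that contain an
# interesting continuation row (a row without a record number).
interesting_paragraphs() {
    # RS='' switches awk to paragraph mode; ORS='\n\n' keeps the
    # surviving paragraphs separated by a blank line on output.
    awk -v RS='' -v ORS='\n\n' '/\n   .....................yep/'
}
```

Usage would be something like pdftotext -layout report.pdf - | clean_paragraphs | interesting_paragraphs, where report.pdf is a placeholder name. With an awk that supports POSIX interval expressions (gawk 4.0 or later enables them by default), the pattern can probably be shortened to /\n {3}.{21}yep/.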

From here I could get the desired output by:

adding a second record number column, populated from the first record number column or from the previous row's second record number column
deleting rows which have a record number in the first column
deleting rows which aren't interesting
cutting out the final columns
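The four steps above can probably be folded into a single awk pass over the cleaned-up paragraphs. The following is a minimal sketch, not a definitive solution: the function name interesting_followups is hypothetical, and it assumes that fruit names contain no whitespace and that rows carrying a record number start in the first column.

```shell
# Hypothetical helper: reads the cleaned-up paragraphs on stdin and
# prints "record fruit" for interesting rows that are not the first
# row of their record.
interesting_followups() {
    awk '
        BEGIN { print "Record   Info" }

        # Blank lines only separate paragraphs; nothing to do.
        NF == 0 { next }

        # Steps 1 and 2: a row that starts with a record number is
        # remembered for the rows that follow, then dropped.
        /^[^[:space:]]/ { rec = $1; next }

        # Steps 3 and 4: keep only interesting continuation rows and
        # print the remembered record number with the fruit name.
        $NF == "yep" { printf "%-9s%s\n", rec, $(NF - 1) }
    '
}
```

Carrying the record number in a variable avoids materialising the extra column described in the first step; the effect should be the same.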

Am I going broadly in the right direction here, or is there a more straightforward way to parse multidimensional data? Perhaps by grepping for an interesting row (one that has yep and no record number), then searching backwards from there to the nearest row with a nonblank record number?
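The backwards-search idea could be sketched by reversing the input so that the row holding the record number is met after its continuation rows. This is only a rough illustration under assumptions: the function name backward_lookup is hypothetical, tac is taken from GNU coreutils (it may be missing on other systems), and fruit names are assumed to contain no whitespace.

```shell
# Hypothetical helper: reads raw pdftotext -layout output on stdin.
backward_lookup() {
    tac | awk '
        # Skip delimiter lines, the column header and blank lines.
        /^-+[[:space:]]*$/ || /^Record[[:space:]]/ || NF == 0 { next }

        # Continuation row (no record number): queue it if interesting.
        /^[[:space:]]/ {
            if ($NF == "yep") pending[++n] = $(NF - 1)
            next
        }

        # Row with a record number: in reversed input this is reached
        # after all continuation rows of the same record, so the queue
        # can be flushed with this record number.
        {
            for (i = 1; i <= n; i++) printf "%-9s%s\n", $1, pending[i]
            n = 0
        }
    ' | tac    # restore the original row order
}
```

The second tac restores the original order. Compared with a forward pass that simply remembers the last record number seen, this needs two extra processes, so it is probably only worth it if the backwards lookup is wanted for other reasons.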
